Skip to article frontmatterSkip to article content

Smart Factors

Smart factors in GTSAM provide an efficient way to handle constraints involving landmarks (like 3D points) in Structure from Motion (SfM) or SLAM problems without explicitly including the landmark variables in the optimized state. Instead, the landmark is implicitly represented and marginalized out, leading to factors that directly constrain only the camera-related variables (poses and/or calibrations). This often results in a smaller state space and can significantly speed up optimization, especially when using iterative linear solvers.

The core idea is based on the Schur complement. If we consider a factor graph involving cameras CiC_i and a single landmark pp, the Hessian matrix of the linearized system has a block structure:

H=[HppHpcHcpHcc]H = \begin{bmatrix} H_{pp} & H_{pc} \\ H_{cp} & H_{cc} \end{bmatrix}

where HppH_{pp} relates to the landmark, HccH_{cc} relates to the cameras, and HpcH_{pc} (HcpH_{cp}) are the off-diagonal blocks. Marginalizing out the landmark variable δp\delta_p corresponds to solving the system using the Schur complement S=HccHcpHpp1HpcS = H_{cc} - H_{cp} H_{pp}^{-1} H_{pc}. Smart factors effectively represent or compute this Schur complement system directly.

Key Reference: Carlone et al. (2014).

Mathematical Details

1. Triangulation

The core internal operation of a smart factor is triangulation. Given a set of cameras CiC_i (which include pose and calibration) and corresponding 2D measurements ziz_i, the factor finds the most likely 3D landmark position pp^* by solving:

p=argminpiΠ(Ci,p)ziΣi2p^* = \underset{p}{\arg \min} \sum_i \| \Pi(C_i, p) - z_i \|_{\Sigma_i}^2

where Π(Ci,p)\Pi(C_i, p) is the projection function of the ii-th camera, and Σi\Sigma_i is the noise covariance for the ii-th measurement (though typically a shared noise model Σ is used). GTSAM uses the robust gtsam.triangulateSafe function internally, which returns a gtsam.TriangulationResult containing the point estimate pp^* and status information (valid, degenerate, behind camera, etc.).

2. Error Function

The nonlinear error minimized by the smart factor is the sum of squared reprojection errors using the implicitly triangulated point pp^*:

error({Ci})=12iΠ(Ci,p)ziΣ2\text{error}( \{C_i\} ) = \frac{1}{2} \sum_i \| \Pi(C_i, p^*) - z_i \|_{\Sigma}^2

Note that pp^* itself is a function of the cameras {Ci}\{C_i\}.

3. Linearization

When a smart factor is linearized (e.g., during optimization), it computes a Gaussian factor that only involves the camera variables {δCi}\{\delta_{C_i}\}. This is achieved by implicitly performing the Schur complement.

Let the standard projection factor error for camera ii be ei(δCi,δp)=FiδCi+Eiδpbie_i(\delta_{C_i}, \delta_p) = F_i \delta_{C_i} + E_i \delta_p - b_i, where FiF_i and EiE_i are the Jacobians with respect to the camera and point, evaluated at the current estimates Ci0C_i^0 and the triangulated point pp^*, and bi=ziΠ(Ci0,p)b_i = z_i - \Pi(C_i^0, p^*) is the residual. The combined linearized system is:

iFiδCi+EiδpbiΣ2=[FE][δCδp]bΣ2\sum_i \| F_i \delta_{C_i} + E_i \delta_p - b_i \|_{\Sigma}^2 = \left\| \begin{bmatrix} F & E \end{bmatrix} \begin{bmatrix} \delta_C \\ \delta_p \end{bmatrix} - b \right\|_{\Sigma}^2

where F=blkdiag(Fi)F = \text{blkdiag}(F_i), E=[E1T,...,EmT]TE = [E_1^T, ..., E_m^T]^T, b=[b1T,...,bmT]Tb = [b_1^T, ..., b_m^T]^T, and δC=[δC1T,...,δCmT]T\delta_C = [\delta_{C_1}^T, ..., \delta_{C_m}^T]^T. The noise model Σ is typically block-diagonal with identical blocks.

Marginalizing δp\delta_p yields a Gaussian factor on δC\delta_C with Hessian HSH_S and information vector ηS\eta_S:

HS=FTΣ1FFTΣ1E(ETΣ1E)1ETΣ1FH_S = F^T \Sigma^{-1} F - F^T \Sigma^{-1} E (E^T \Sigma^{-1} E)^{-1} E^T \Sigma^{-1} F
ηS=FTΣ1bFTΣ1E(ETΣ1E)1ETΣ1b\eta_S = F^T \Sigma^{-1} b - F^T \Sigma^{-1} E (E^T \Sigma^{-1} E)^{-1} E^T \Sigma^{-1} b

Let P=(ETΣ1E)1P = (E^T \Sigma^{-1} E)^{-1} be the point covariance and define the projection matrix Q=Σ1(IEPETΣ1)Q = \Sigma^{-1} (I - E P E^T \Sigma^{-1}). Then the system simplifies to:

HS=FTQFH_S = F^T Q F
ηS=FTQb\eta_S = F^T Q b

The smart factor’s linearization computes or represents this Schur complement system (HS,ηS)(H_S, \eta_S).

The Smart Factor Family

GTSAM provides several types of smart factors, primarily for projection measurements. They inherit common functionality from gtsam.SmartFactorBase.

gtsam.SmartProjectionFactor

This factor is used when both camera poses and camera calibration are unknown and being optimized. It connects multiple CAMERA variables, where the CAMERA type (e.g., gtsam.PinholeCameraCal3_S2) encapsulates both the pose and the intrinsic calibration.

See also: [gtsam/slam/SmartFactorBase.h], SmartProjectionFactor Notebook.

gtsam.SmartProjectionPoseFactor

This factor is optimized for the common case where camera calibration is known and fixed, and only the camera poses (gtsam.Pose3) are being optimized. It assumes a single, shared calibration object (gtsam.Cal3_S2, gtsam.Cal3Bundler, etc.) for all measurements within the factor.

See also: [gtsam/slam/SmartProjectionPoseFactor.h], SmartProjectionPoseFactor Notebook.

gtsam.SmartProjectionRigFactor

This factor extends the concept to multi-camera rigs where the relative poses and calibrations of the cameras within the rig are fixed, but the rig’s body pose (gtsam.Pose3) is being optimized. It allows multiple measurements from different cameras within the rig, potentially associated with the same body pose key.

Note: Due to implementation details, the CAMERA template parameter used with this factor should typically be gtsam.PinholePose rather than gtsam.PinholeCamera.

See also: [gtsam/slam/SmartProjectionRigFactor.h], SmartProjectionRigFactor Notebook.

Configuration

The behavior of smart factors is controlled by gtsam.SmartProjectionParams. Key parameters include:

See also: [gtsam/slam/SmartFactorParams.h], SmartFactorParams Notebook.

Linear Factor Representations

Depending on the linearizationMode specified in gtsam.SmartProjectionParams, the linearize method of a smart factor returns different types of gtsam.GaussianFactor objects, all representing the same underlying Schur complement system but optimized for different solvers:

1. gtsam.RegularHessianFactor (HESSIAN mode, default)

When a smart factor is linearized using the HESSIAN mode, it computes and returns a gtsam.RegularHessianFactor. In this specific context, this factor explicitly represents the dense Schur complement system obtained after marginalizing out the 3D landmark.

Source: [gtsam/linear/RegularHessianFactor.h].

2. gtsam.RegularImplicitSchurFactor (IMPLICIT_SCHUR mode)

This factor stores the ingredients FiF_i, EiE_i, PP, and bib_i (or rather, their whitened versions) instead of the final Hessian HSH_S. Its key feature is an efficient multiplyHessianAdd method that computes the Hessian-vector product yy+αHSxy \leftarrow y + \alpha H_S x without ever forming the potentially large HSH_S matrix. It computes this via:

vi=Fixiw=ETΣ1vd=Pwui=Eidyiyi+αFiTΣ1(viui)v_i = F_i x_i \\ w = E^T \Sigma^{-1} v \\ d = P w \\ u_i = E_i d \\ y_i \leftarrow y_i + \alpha F_i^T \Sigma^{-1} (v_i - u_i)

This is highly efficient for iterative linear solvers like Conjugate Gradient (CG), which primarily rely on Hessian-vector products.

Source: [gtsam/slam/RegularImplicitSchurFactor.h].

3. gtsam.JacobianFactorQ (JACOBIAN_Q mode)

This factor represents the Schur complement system as a Jacobian factor. It computes a projection matrix Qproj=Σ1/2(IEPETΣ1)Σ1/2Q_{proj} = \Sigma^{-1/2} (I - E P E^T \Sigma^{-1}) \Sigma^{-1/2} (where Σ=blkdiag(Σi)\Sigma = \text{blkdiag}(\Sigma_i)) and represents the error term Qproj1/2(FδCb)2\| Q_{proj}^{1/2} (F \delta_C - b) \|^2. It stores the projected Jacobians Ai=Qproj,i1/2FiA_i = Q_{proj, i}^{1/2} F_i and the projected right-hand side b=Qproj1/2bb' = Q_{proj}^{1/2} b.

Source: [gtsam/slam/JacobianFactorQ.h].

4. gtsam.JacobianFactorSVD (JACOBIAN_SVD mode)

This factor uses the “Nullspace Trick”. It computes EnullE_{null}, a matrix whose columns form an orthonormal basis for the left nullspace of EE (i.e., EnullTE=0E_{null}^T E = 0). Multiplying the original linearized system by EnullTΣ1E_{null}^T \Sigma^{-1} yields a smaller system involving only δC\delta_C:

EnullTΣ1FδCEnullTΣ1b(EnullTΣ1Enull)12\| E_{null}^T \Sigma^{-1} F \delta_C - E_{null}^T \Sigma^{-1} b \|^2_{(E_{null}^T \Sigma^{-1} E_{null})^{-1}}

The factor stores the projected Jacobian A=EnullTΣ1FA = E_{null}^T \Sigma^{-1} F, the projected right-hand side b=EnullTΣ1bb' = E_{null}^T \Sigma^{-1} b, and an appropriate noise model derived from (EnullTΣ1Enull)1(E_{null}^T \Sigma^{-1} E_{null})^{-1}. This method is mathematically equivalent to the Schur complement but can offer computational advantages in certain scenarios.

Source: [gtsam/slam/JacobianFactorSVD.h].

Conclusion

Smart factors provide a powerful mechanism for efficiently handling landmark-based constraints in SLAM and SfM. By implicitly marginalizing landmarks, they reduce the size of the state space and enable the use of specialized linear factor representations like gtsam.RegularImplicitSchurFactor, which are highly effective when combined with iterative solvers like Conjugate Gradient. Understanding the underlying mathematical connection to the Schur complement and the different linearization options allows users to choose the most appropriate configuration for their specific problem and solver.

References
  1. Carlone, L., Kira, Z., Beall, C., Indelman, V., & Dellaert, F. (2014, May). Eliminating conditionally independent sets in factor graphs: A unifying perspective based on smart factors. 2014 IEEE International Conference on Robotics and Automation (ICRA). 10.1109/icra.2014.6907483