Smart factors in GTSAM provide an efficient way to handle constraints involving landmarks (like 3D points) in Structure from Motion (SfM) or SLAM problems without explicitly including the landmark variables in the optimized state. Instead, the landmark is implicitly represented and marginalized out, leading to factors that directly constrain only the camera-related variables (poses and/or calibrations). This often results in a smaller state space and can significantly speed up optimization, especially when using iterative linear solvers.
The core idea is based on the Schur complement. If we consider a factor graph involving cameras and a single landmark , the Hessian matrix of the linearized system has a block structure:
where relates to the landmark, relates to the cameras, and () are the off-diagonal blocks. Marginalizing out the landmark variable corresponds to solving the system using the Schur complement . Smart factors effectively represent or compute this Schur complement system directly.
Key Reference: Carlone et al. (2014).
Mathematical Details¶
1. Triangulation¶
The core internal operation of a smart factor is triangulation. Given a set of cameras (which include pose and calibration) and corresponding 2D measurements , the factor finds the most likely 3D landmark position by solving:
where is the projection function of the -th camera, and is the noise covariance for the -th measurement (though typically a shared noise model Σ is used). GTSAM uses the robust gtsam.triangulateSafe
function internally, which returns a gtsam.TriangulationResult
containing the point estimate and status information (valid, degenerate, behind camera, etc.).
2. Error Function¶
The nonlinear error minimized by the smart factor is the sum of squared reprojection errors using the implicitly triangulated point :
Note that itself is a function of the cameras .
3. Linearization¶
When a smart factor is linearized (e.g., during optimization), it computes a Gaussian factor that only involves the camera variables . This is achieved by implicitly performing the Schur complement.
Let the standard projection factor error for camera be , where and are the Jacobians with respect to the camera and point, evaluated at the current estimates and the triangulated point , and is the residual. The combined linearized system is:
where , , , and . The noise model Σ is typically block-diagonal with identical blocks.
Marginalizing yields a Gaussian factor on with Hessian and information vector :
Let be the point covariance and define the projection matrix . Then the system simplifies to:
The smart factor’s linearization computes or represents this Schur complement system .
The Smart Factor Family¶
GTSAM provides several types of smart factors, primarily for projection measurements. They inherit common functionality from gtsam.SmartFactorBase
.
gtsam.SmartProjectionFactor
¶
This factor is used when both camera poses and camera calibration are unknown and being optimized. It connects multiple CAMERA
variables, where the CAMERA
type (e.g., gtsam.PinholeCameraCal3_S2
) encapsulates both the pose and the intrinsic calibration.
See also: [gtsam
gtsam.SmartProjectionPoseFactor
¶
This factor is optimized for the common case where camera calibration is known and fixed, and only the camera poses (gtsam.Pose3
) are being optimized. It assumes a single, shared calibration object (gtsam.Cal3_S2
, gtsam.Cal3Bundler
, etc.) for all measurements within the factor.
See also: [gtsam
gtsam.SmartProjectionRigFactor
¶
This factor extends the concept to multi-camera rigs where the relative poses and calibrations of the cameras within the rig are fixed, but the rig’s body pose (gtsam.Pose3
) is being optimized. It allows multiple measurements from different cameras within the rig, potentially associated with the same body pose key.
Note: Due to implementation details, the CAMERA
template parameter used with this factor should typically be gtsam.PinholePose
rather than gtsam.PinholeCamera
.
See also: [gtsam
Configuration¶
The behavior of smart factors is controlled by gtsam.SmartProjectionParams
. Key parameters include:
linearizationMode
: Selects which type ofGaussianFactor
is returned bylinearize
, default isHESSIAN
, see below.degeneracyMode
: Defines how to handle cases where triangulation fails or is ill-conditioned (e.g., collinear cameras). Options include ignoring it, returning a zero-information factor, or treating the point as being at infinity.triangulation
: Parameters passed totriangulateSafe
, such as rank tolerance for SVD and outlier rejection thresholds.retriangulationThreshold
: If the camera poses change significantly between linearizations (measured by Frobenius norm), the point is re-triangulated.throwCheirality
/verboseCheirality
: Control behavior when the triangulated point ends up behind one or more cameras.
See also: [gtsam
Linear Factor Representations¶
Depending on the linearizationMode
specified in gtsam.SmartProjectionParams
, the linearize
method of a smart factor returns different types of gtsam.GaussianFactor
objects, all representing the same underlying Schur complement system but optimized for different solvers:
1. gtsam.RegularHessianFactor
(HESSIAN
mode, default)¶
When a smart factor is linearized using the HESSIAN
mode, it computes and returns a gtsam.RegularHessianFactor
. In this specific context, this factor explicitly represents the dense Schur complement system obtained after marginalizing out the 3D landmark.
- Defining Feature: This factor requires that all involved camera variables have the same, fixed dimension D (e.g., for
gtsam.Pose3
), specified via a template parameter. This contrasts with the basegtsam.HessianFactor
which handles variable dimensions. - Stored Information: It stores the blocks of the effective Hessian matrix (denoted internally as ) and the effective information vector (denoted internally as ) that act only on the camera variables.
- Optimized Operators (Benefit for CG): Crucially, this regularity allows
RegularHessianFactor
to provide highly optimized implementations for key linear algebra operations, especially the raw-memory (double*
) version ofmultiplyHessianAdd
. This operation computes the Hessian-vector product () extremely efficiently by leveragingEigen::Map
and fixed-size matrix operations, making it ideal for iterative solvers like Conjugate Gradient (CG) which rely heavily on this product. The baseHessianFactor
’s implementations are more general and typically involve more overhead. - Formation Cost: Creating this factor from a smart factor can still be computationally expensive (forming the Schur complement ), especially for landmarks observed by many cameras.
- Solver Suitability: While efficient for CG (using the
double*
methods), it is also suitable for direct linear solvers (like Cholesky or QR factorization) as it provides the explicit system matrix .
Source: [gtsam
2. gtsam.RegularImplicitSchurFactor
(IMPLICIT_SCHUR
mode)¶
This factor stores the ingredients , , , and (or rather, their whitened versions) instead of the final Hessian . Its key feature is an efficient multiplyHessianAdd
method that computes the Hessian-vector product without ever forming the potentially large matrix. It computes this via:
This is highly efficient for iterative linear solvers like Conjugate Gradient (CG), which primarily rely on Hessian-vector products.
Source: [gtsam
3. gtsam.JacobianFactorQ
(JACOBIAN_Q
mode)¶
This factor represents the Schur complement system as a Jacobian factor. It computes a projection matrix (where ) and represents the error term . It stores the projected Jacobians and the projected right-hand side .
Source: [gtsam
4. gtsam.JacobianFactorSVD
(JACOBIAN_SVD
mode)¶
This factor uses the “Nullspace Trick”. It computes , a matrix whose columns form an orthonormal basis for the left nullspace of (i.e., ). Multiplying the original linearized system by yields a smaller system involving only :
The factor stores the projected Jacobian , the projected right-hand side , and an appropriate noise model derived from . This method is mathematically equivalent to the Schur complement but can offer computational advantages in certain scenarios.
Source: [gtsam
Conclusion¶
Smart factors provide a powerful mechanism for efficiently handling landmark-based constraints in SLAM and SfM. By implicitly marginalizing landmarks, they reduce the size of the state space and enable the use of specialized linear factor representations like gtsam.RegularImplicitSchurFactor
, which are highly effective when combined with iterative solvers like Conjugate Gradient. Understanding the underlying mathematical connection to the Schur complement and the different linearization options allows users to choose the most appropriate configuration for their specific problem and solver.
- Carlone, L., Kira, Z., Beall, C., Indelman, V., & Dellaert, F. (2014, May). Eliminating conditionally independent sets in factor graphs: A unifying perspective based on smart factors. 2014 IEEE International Conference on Robotics and Automation (ICRA). 10.1109/icra.2014.6907483