Stereo Visual Odometry with GTSAM

This notebook demonstrates a simple 3D stereo visual odometry problem using the GTSAM Python wrapper. The scenario is as follows:

A robot starts at the origin (Pose 1: X(1)).
The robot moves approximately one meter forward (Pose 2: X(2)).
From both poses, the robot observes three landmarks (L(1), L(2), L(3)) with a stereo camera.

We will:

Define camera calibration and noise models.
Create a factor graph representing the problem.
Provide initial estimates for poses and landmark positions.
Optimize the graph using Levenberg-Marquardt to find the most probable configuration.

Note: This example is also available in C++. If you’re interested in the C++ implementation, refer to StereoVOExample.cpp in the GTSAM examples directory.

1. Setup Factor Graph and Priors¶

We start by creating an empty NonlinearFactorGraph. The first pose X(1) is assumed to be at the world origin. To reflect this, we add a hard constraint using NonlinearEqualityPose3, which fixes X(1) to the identity pose. This ensures that the optimization has a known reference point for the rest of the variables. In GTSAM, factor keys are typically integers, and X(1) is a symbolic shorthand for such a key.

# Create an empty nonlinear factor graph
graph = gtsam.NonlinearFactorGraph()

# Define the first pose at the origin
first_pose = gtsam.Pose3() # Default constructor is identity

# Add a prior constraint on X(1)
graph.add(gtsam.NonlinearEqualityPose3(X(1), first_pose))

print("Graph size after adding prior:", graph.size())

Graph size after adding prior: 1

2. Define Camera Calibration and Measurement Noise¶

To accurately model the stereo observations, we must define both the measurement noise model and the intrinsic calibration of the stereo camera system.

Measurement Noise: We assume an isotropic Gaussian noise model for the stereo measurements. Since each observation consists of three pixel values $(u_L, u_R, v)$ , we use Isotropic.Sigma(3, 1.0) to define a noise model with unit standard deviation in each of the three dimensions.
Camera Calibration (Cal3_S2Stereo): We define a stereo calibration model with the following properties:
- Focal lengths: $f_x=f_y=1000$
- Skew: $s=0$
- Principal point: $(u,v)=(320,\,240)$
- Baseline: $b=0.2$

These parameters represent a synthetic stereo camera setup for this minimal example.

# Create stereo camera calibration object
K = gtsam.Cal3_S2Stereo(1000, 1000, 0, 320, 240, 0.2)
print(K)

# Define the stereo measurement noise model (isotropic, 1 pixel standard deviation)
measurement_noise = gtsam.noiseModel.Isotropic.Sigma(3, 1.0)

K: 1000    0  320
   0 1000  240
   0    0    1
Baseline: 0.2

3. Add Stereo Factors¶

We now add stereo factors to the graph. Each factor connects a camera pose to a landmark. The GenericStereoFactor3D takes:

measured: The StereoPoint2 measurement $(u_L, u_R, v)$ .
model: The noise model for the pixel measurement.
poseKey: Key for the camera pose.
landmarkKey: Key for the landmark.
K: The stereo camera calibration.

We’ll use L(1), L(2), L(3) to represent our three landmarks.

# Factors from X(1) to landmarks
graph.add(gtsam.GenericStereoFactor3D(
    gtsam.StereoPoint2(520, 480, 440), measurement_noise, X(1), L(1), K
))
graph.add(gtsam.GenericStereoFactor3D(
    gtsam.StereoPoint2(120, 80, 440), measurement_noise, X(1), L(2), K
))
graph.add(gtsam.GenericStereoFactor3D(
    gtsam.StereoPoint2(320, 280, 140), measurement_noise, X(1), L(3), K
))

print("Graph size after adding X(1) factors:", graph.size())

Graph size after adding X(1) factors: 4

# Factors from X(2) to landmarks
graph.add(gtsam.GenericStereoFactor3D(
    gtsam.StereoPoint2(570, 520, 490), measurement_noise, X(2), L(1), K
))
graph.add(gtsam.GenericStereoFactor3D(
    gtsam.StereoPoint2(70, 20, 490), measurement_noise, X(2), L(2), K
))
graph.add(gtsam.GenericStereoFactor3D(
    gtsam.StereoPoint2(320, 270, 115), measurement_noise, X(2), L(3), K
))

print("Graph size after adding all factors:", graph.size())

Graph size after adding all factors: 7

4. Create Initial Estimates¶

Nonlinear optimization requires initial estimates for all variables (poses and landmarks). We create a Values object named initial_estimate to store these. To see the effects of optimization in the later sections, we introduce a small error when inserting the initial estimate of pose X(2). In summary, this is what we insert into initial_estimate:

Pose X(1): Identity pose at the origin.
Pose X(2): Camera pose after the robot moves forward by one meter, slightly off-axis and with a small overshoot.
Landmarks (L(1), L(2), L(3)): Placed at estimated 3D positions.

initial_estimate = gtsam.Values()

# Initial estimate for X(1)
initial_estimate.insert(X(1), first_pose)

# Initial estimate for X(2) - robot moved forward ~1m, with some small error
# Pose3(rotation, translation)
# gtsam.Rot3() is identity rotation
initial_pose2 = gtsam.Pose3(gtsam.Rot3(), gtsam.Point3(0.1, -0.1, 1.1))
initial_estimate.insert(X(2), initial_pose2)

# Initial estimates for landmark positions (Point3)
initial_estimate.insert(L(1), gtsam.Point3(1.0, 1.0, 5.0))
initial_estimate.insert(L(2), gtsam.Point3(-1.0, 1.0, 5.0))
initial_estimate.insert(L(3), gtsam.Point3(0.0, -0.5, 5.0))

5. Optimize the Factor Graph¶

We use the Levenberg-Marquardt optimizer to find the solution that minimizes the sum of squared errors defined by the factors in the graph, starting from the initial_estimate.

# Create a Levenberg-Marquardt optimizer
optimizer = gtsam.LevenbergMarquardtOptimizer(graph, initial_estimate)

# Optimize the graph
result = optimizer.optimize()

6. Analyze Results¶

After optimization, we extract the refined poses and landmark positions from the result returned by the optimizer.

Recall that in this example, the robot begins at the origin, and we expect the second pose X(2) to be apprixmately one meter forward along the Z-axis, so X(2) should ideally be near Pose3(Rot3(), Point3(0, 0, 1.0)).

GTSAM minimizes the total reprojection error (i.e. the discrepancy between observed and predicted stereo image coordinates) across all factors. We can evaluate this using graph.error() to compute the total error before and after optimization. A significant drop in this error indicates that the optimizer successfully refined the estimates.

To better understand the outcome of this example, we print the error, both prior- and post-optimization, along with the optimized poses and landmark positions. This helps verify:

That the optimizer is successful in refining the estimates.
That the second pose X(2) aligns with the expected forward motion.
That the landmark locations are consistent and reasonable.

print(f"Initial Error: {graph.error(initial_estimate):.4f}")
print(f"Final Error  : {graph.error(result):.4f}")

# Extract and print optimized values for clarity
optimized_pose1 = result.atPose3(X(1))
optimized_pose2 = result.atPose3(X(2))
optimized_lm1 = result.atPoint3(L(1))
optimized_lm2 = result.atPoint3(L(2))
optimized_lm3 = result.atPoint3(L(3))

print("\nOptimized Pose X(1):\n", optimized_pose1)
print("\nOptimized Pose X(2):\n", optimized_pose2)
print(f"Translation component of X(2): {optimized_pose2.translation()}")

print("\nOptimized Landmark (L(1)):\n", optimized_lm1)
print("\nOptimized Landmark (L(2)):\n", optimized_lm2)
print("\nOptimized Landmark (L(3)):\n", optimized_lm3)

Initial Error: 3434.6236
Final Error  : 0.0000

Optimized Pose X(1):
 R: [
	1, 0, 0;
	0, 1, 0;
	0, 0, 1
]
t: 0 0 0


Optimized Pose X(2):
 R: [
	1, 4.90433e-17, -3.66864e-16;
	-4.90433e-17, 1, 1.63701e-15;
	3.66864e-16, -1.63701e-15, 1
]
t:  6.11736e-13 -6.16862e-13            1

Translation component of X(2): [ 6.11735920e-13 -6.16861807e-13  1.00000000e+00]

Optimized Landmark (L(1)):
 [1. 1. 5.]

Optimized Landmark (L(2)):
 [-1.  1.  5.]

Optimized Landmark (L(3)):
 [-1.66287119e-16 -5.00000000e-01  5.00000000e+00]

GTSAM Docs

SL(4) SLAM with GTSAM

GTSAM Docs

Stereo Visual Odometry Example (Large Dataset)