Skip to article frontmatterSkip to article content

Pose3

The Pose3 class implements two things: a pose manifold and a Lie group of transforms. They look awfully similar, and are often treated as the same thing, but they are really not.

Open In Colab

%pip install --quiet gtsam-develop
import gtsam
import numpy as np
from gtsam import Pose3, Rot3, Point3

Manifold SE(3)\mathcal{SE}(3)

The manifold SE(3)\mathcal{SE}(3) is defined by elements (C,r)(C, r), where:

  • CSO(3)C \in \mathcal{SO}(3): The orientation or attitude.
  • rR3r \in \mathbb{R}^3: The position.

This manifold represents poses in 3D space, with CC encoding orientation/attitude and rr encoding the position of a body in some reference frame.

This manifold can be thought of all possible values that can specify the position and orientation of a rigid body in some reference frame. It does not have a notion of composition: we would never think to compose two poses of two different rigid bodies.

The simplest thing you can do in GTSAM is just creating a pose with identity orientation and position:

P = Pose3()
print(f"P=\n{P}")
P=
R: [
	1, 0, 0;
	0, 1, 0;
	0, 0, 1
]
t: 0 0 0

But of course, it is more useful to specify some orientation and position. To do that, you need to create a Rot3 instance, see Rotations in 3D. For example, here is a pose of a robot at r=(3,4,0)r=(3,4,0) and rotated by 30 degrees around the Z-axis:

C = Rot3.Yaw(np.deg2rad(30))
r = Point3(3,4,0)
print(C,r)
P = Pose3(C,r)
print(f"P=\n{P}")
R: [
	0.866025, -0.5, 0;
	0.5, 0.866025, 0;
	0, 0, 1
]
 [3. 4. 0.]
P=
R: [
	0.866025, -0.5, 0;
	0.5, 0.866025, 0;
	0, 0, 1
]
t: 3 4 0

A great many things can be done using just the manifold SE(3)\mathcal{SE}(3), as it provides a flexible and foundational framework for representing and reasoning about poses in 3D space. Below are some key use cases and examples:

Trajectories on SE(3)\mathcal{SE}(3)

A trajectory over time can be represented as a continuous or discrete curve on the manifold:

P(t)=(C(t),r(t)),P(t) = (C(t), r(t)),

where:

  • C(t)C(t) is a time-varying orientation.
  • r(t)r(t) is a time-varying position vector.

This representation is useful for describing motion paths of rigid bodies, such as robots, drones, or vehicles.

Optimization on SE(3)\mathcal{SE}(3)

Many problems involve optimization over the pose manifold, such as:

  • Localization: Estimating the pose of a robot or sensor in a global or local frame.
  • SLAM (Simultaneous Localization and Mapping): Optimizing a graph of poses and landmarks to minimize error.

Lie Group SE(3)SE(3)

The Lie group SE(3)SE(3) consists of elements (R,t)(R, t), where:

  • RSO(3)R \in SO(3): A rotation matrix.
  • tR3t \in \mathbb{R}^3: A translation vector.

The group operation in SE(3)SE(3) is defined as:

(R1,t1)(R2,t2)=(R1R2,R1t2+t1).(R_1, t_1) \cdot (R_2, t_2) = (R_1 R_2, R_1 t_2 + t_1).

This operation combines two transformations, where R1R2R_1 R_2 composes the rotations and R1t2+t1R_1 t_2 + t_1 composes the translations.

Note that this particular form of composition defines SE(3)SE(3) as the semi-product of the group SO(3)SO(3) and the vector space R3\mathbb{R}^3. This should be contrasted with a direct product, which would yield (R1R2,t2+t1)(R_1 R_2, t_2 + t_1) as the composition.

Group Action on 3D Points

The group SE(3)SE(3) acts on a 3D point pR3p \in \mathbb{R}^3 as follows:

p=(R,t)p=Rp+t.p' = (R, t) \cdot p = R p + t.

Here, the rotation RR is applied first, followed by the translation tt.

Group Action on Poses

The group SE(3)SE(3) also acts on a pose (C,r)SE(3)(C,r) \in \mathcal{SE}(3) as follows:

(R,t)(C,r)=(RC,Rr+t).(R,t) \cdot (C,r) = (R C, R r +t).
  • RCR C rotates the orientation into a different frame.
  • Rr+tR r +t transforms the position, by first rotating the position in the old frame, then translating to account for the origin tt of the new frame.

4x4 Matrix Lie Group View of Transforms

A transform in SE(3)SE(3) can also be represented as a 4×44 \times 4 homogeneous transformation matrix: $$ T =

[Rt01]\begin{bmatrix} R & t \\ 0^\top & 1 \end{bmatrix}

$$

The composition of two transforms T1T_1 and T2T_2 in SE(3)SE(3) is then simply matrix multiplication $$ T_1 \cdot T_2 =

[R1t101]\begin{bmatrix} R_1 & t_1 \\ 0^\top & 1 \end{bmatrix}
[R2t201]\begin{bmatrix} R_2 & t_2 \\ 0^\top & 1 \end{bmatrix}

=

[R1R2R1t2+t101]\begin{bmatrix} R_1 R_2 & R_1 t_2 + t_1 \\ 0^\top & 1 \end{bmatrix}

$$

and the inverse is given by: $$ T^{-1} =

[RRt01]\begin{bmatrix} R^\top & -R^\top t \\ 0^\top & 1 \end{bmatrix}

$where where R^\topisthetranspose(andalsotheinverse)oftherotationmatrix is the transpose (and also the inverse) of the rotation matrix R$.

To implement the action of SE(3)SE(3) on R3\mathbb{R}^3 we need to embed a point pp in homogeneous coordinates as p~=[p,1]\tilde{p} = [p^\top, 1]^\top, where pR3p \in \mathbb{R}^3. The action of a transform TT on a point p~\tilde{p} is then given by: $$ \tilde{p}’ = T \tilde{p} =

[Rt01]\begin{bmatrix} R & t \\ 0^\top & 1 \end{bmatrix}
[p1]\begin{bmatrix} p \\ 1 \end{bmatrix}

=

[Rp+t1]\begin{bmatrix} R p + t \\ 1 \end{bmatrix}

p’ \ 1 \end{bmatrix}. $$

The matrix representation provides an elegant way to handle transformations in 3D space. This view also emphasizes that SE(3)SE(3) is a matrix Lie group, i.e., a subgroup of invertible d×dd\times d matrices, with in this case d=4d=4. However, it is by no means required to implement the Lie group SE(3)SE(3) and in fact GTSAM internally does not use 4×44\times 4 matrices.

Reference Frame Hierarchies

Poses can sometimes be used as transforms when hierarchical relationships exist between frames. For example, consider the pose PibP^b_i of an IMU relative to a rigid body B, which itself has a pose PbnP^n_b in a navigation frame N.

If we seek is the pose PinP^n_i of the IMU relative to the navigation frame, we need to upgrade the body pose PbnP^n_b to a rigid transform TbnT^n_b. The pose of the IMU relative to the navigation frame is then given by:

Pin=TbnPib.P^n_i = T^n_b \cdot P^b_i.

Note how the bb indices cancel out.

Retractions and the Exponential Map

The Tangent Space of SE(3)SE(3)

The tangent space of SE(3)SE(3) at the identity, known as its Lie algebra se(3)\mathfrak{se}(3), can be thought of informally as encoding angular and linear velocities. Elements of se(3)\mathfrak{se}(3) are twists ξ=(ω,v)\xi = (\omega, v), where:

  • ωR3\omega \in \mathbb{R}^3: Represents angular velocity.
  • vR3v \in \mathbb{R}^3: Represents linear velocity.

These quantities describe instantaneous motion in 3D space. A twist ξ can be represented as a skew-symmetric matrix: $$ \xi^\wedge =

[ωv00]\begin{bmatrix} \omega^\wedge & v \\ 0^\top & 0 \end{bmatrix}

$where where \omega^\wedgeistheskewsymmetricmatrixof is the skew-symmetric matrix of \omega$, encoding rotational motion.

To describe finite motions, the twist ξ is scaled by a time or displacement parameter Δt\Delta t, resulting in a scaled twist ξΔt=(ϕ,ρ)\xi \Delta t = (\phi, \rho):

  • ϕ=ωΔt\phi = \omega \Delta t: A finite angular displacement.
  • ρ=vΔt\rho = v \Delta t: A finite linear displacement.

We used Δt\Delta t to indicate a time interval: not this has nothing to do with the translation tt. This finite motion ξΔt\xi \Delta t is the input to the exponential map, which we explain next.

The Exponential Map for SE(3)SE(3)

The exponential map for SE(3)SE(3) takes a finite motion ξΔt=(ϕ,ρ)\xi \Delta t = (\phi, \rho), derived from a twist ξ=(ω,v)\xi = (\omega, v) scaled by a parameter tt, and maps it to a rigid body transform (R,t)SE(3)(R, t) \in SE(3). The exponential map produces a transformation (R,t)(R, t) in SE(3)SE(3): $$ \exp((\xi \Delta t)^\wedge) =

[Rt01]\begin{bmatrix} R & t \\ 0^\top & 1 \end{bmatrix}

$or or (R,t)\in SE(3)$, where

  • R=exp(ϕ)R = \exp(\phi^\wedge), where ϕ\phi^\wedge is the skew-symmetric matrix of ϕ.
  • t=Jl(ϕ)ρt = J_l(\phi) \rho, where Jl(ϕ)J_l(\phi) is the left Jacobian of SO(3)SO(3), a 3×33 \times 3 matrix. Jl(ϕ)J_l(\phi) handles the semi-direct product nature of SE(3)SE(3) by coupling the rotational and translational components appropriately.

An often-used shorthand notation is defining Exp:R6SE(3)\text{Exp}: \mathbb{R}^6 \rightarrow SE(3) or a two-argument variant as:

Exp(ϕ,ρ)=Exp(ξΔt)exp((ξΔt))=(Exp(ϕ),Jl(ϕ)ρ)\text{Exp}(\phi,\rho) = \text{Exp}(\xi \Delta t) \doteq \exp((\xi \Delta t)^\wedge) = (\text{Exp}(\phi), J_l(\phi) \rho)

where we used the same notation for SO(3)SO(3), i.e., Exp(ϕ)exp(ϕ)\text{Exp}(\phi)\doteq \exp(\phi^\wedge).

Retractions to the Manifold SE(3)\mathcal{SE}(3)

The exponential map can also be used as a retraction for optimization on the SE(3)SE(3) manifold. Retractions are mappings from the tangent space to the manifold, e.g., the exponential map implements the following retraction:

(C,r)×(ϕ,ρ)exp((ξΔt))(C,r)=(Cexp(ϕ),r+CJl(ϕ)ρ),(C, r) \times (\phi, \rho) \mapsto \exp((\xi \Delta t)^\wedge) \cdot (C,r) = (C \exp(\phi^\wedge), r + C J_l(\phi) \rho),

Alternative retractions exists, however, and the only restriction on them is that they need to have the same behavior as the exponential map near the identity. One alternative retraction simply treats the attitude and position separately:

(C,r)×(ϕ,ρ)(Cexp(ϕ),r+ρ),(C, r) \times (\phi, \rho) \mapsto (C\exp(\phi^\wedge), r + \rho),

This alternative retraction behaves identically to the exponential map at t=0t = 0 and is computationally simpler.