To discuss the general problem of computing pose from feature points detected in an image, some knowledge of camera models and perspective projection is necessary. The principle of the pinhole camera model is illustrated in Figure 6. A pinhole camera consists of a plane $I$, the so-called image plane or retina, and a point $C$, the center of projection. The image of a point $P$ in three-dimensional space is defined as the intersection of the ray from $P$ through $C$ with the image plane. This projection is most elegantly described in projective geometry using homogeneous coordinates, so that the image-plane coordinates $u'$, $v'$ of a point can be obtained from its coordinates $x$, $y$, $z$ in three-dimensional space as follows:
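$$
\begin{pmatrix} u \\ v \\ w \end{pmatrix} = \mathbf{M} \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}, \qquad u' = \frac{u}{w}, \qquad v' = \frac{v}{w} \tag{34}
$$
Here $\mathbf{M}$ denotes a $3 \times 4$ projection matrix that accounts for the position and orientation of the camera in space, the so-called extrinsic parameters, as well as for the physical properties of the modeled camera, such as the focal length; the latter are expressed by the intrinsic parameters. The matrix $\mathbf{M}$ can be factored into a product of two matrices, $\mathbf{M} = \mathbf{M}_{\text{int}}\,\mathbf{M}_{\text{ext}}$, where $\mathbf{M}_{\text{ext}}$ is a $4 \times 4$ matrix that depends only on the extrinsic parameters and $\mathbf{M}_{\text{int}}$ is a $3 \times 4$ matrix that depends only on the intrinsic parameters.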
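A minimal sketch of Equation (34) with the intrinsic/extrinsic factorization, in plain Python. It assumes a simple intrinsic matrix with focal length `f` in pixels, principal point (`cx`, `cy`), zero skew and square pixels, and an extrinsic matrix built from a rotation `R` and translation `t` that map world coordinates into camera coordinates. The helper names `intrinsic_matrix`, `extrinsic_matrix` and `project`, and the numbers in the example, are hypothetical and chosen for illustration only.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def intrinsic_matrix(f, cx, cy):
    """Hypothetical 3x4 intrinsic matrix: focal length f (pixels),
    principal point (cx, cy); zero skew and square pixels are assumed."""
    return [[f, 0.0, cx, 0.0],
            [0.0, f, cy, 0.0],
            [0.0, 0.0, 1.0, 0.0]]


def extrinsic_matrix(R, t):
    """Hypothetical 4x4 rigid transformation (rotation R, translation t)
    mapping world coordinates to camera coordinates."""
    return [R[0] + [t[0]],
            R[1] + [t[1]],
            R[2] + [t[2]],
            [0.0, 0.0, 0.0, 1.0]]


def project(M, point):
    """Equation (34): multiply the homogeneous point by the 3x4 matrix M,
    then divide by the third homogeneous coordinate w."""
    x, y, z = point
    u, v, w = (row[0] for row in matmul(M, [[x], [y], [z], [1.0]]))
    return u / w, v / w


# Example: identity rotation, world origin 5 units in front of the camera.
M_int = intrinsic_matrix(f=800.0, cx=320.0, cy=240.0)
M_ext = extrinsic_matrix([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
                         [0.0, 0.0, 5.0])
M = matmul(M_int, M_ext)            # 3x4 projection matrix
print(project(M, (1.0, 0.5, 0.0)))  # (480.0, 320.0)
```

Keeping the two factors separate mirrors the text: `M_int` is fixed once the camera is calibrated, while only `M_ext` changes as the camera or the objects move.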
To use Equation (34), the intrinsic matrix $\mathbf{M}_{\text{int}}$ must be known. In practice this poses no problem, since it can be obtained with a standard camera calibration method. Calibration needs to be performed only once, because the intrinsic properties of the camera do not change as long as the focal length is fixed.
The extrinsic matrix $\mathbf{M}_{\text{ext}}$, on the other hand, depends on the current position and orientation of the camera relative to the scene, so that for a fixed camera position $\mathbf{M}_{\text{ext}}$ encodes the pose of the objects in the scene. Estimating pose therefore means estimating $\mathbf{M}_{\text{ext}}$.
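To illustrate why a calibrated camera reduces pose estimation to the extrinsic part, the sketch below assumes, beyond what the text states, that the intrinsic matrix has the form K [I | 0] with K = [[f, 0, c_x], [0, f, c_y], [0, 0, 1]] (zero skew, square pixels). The full projection matrix is then K [R | t], so if it is known up to a positive scale, multiplying by the inverse of K and normalizing the rows of R to unit length recovers R and t. How the projection matrix is estimated from detected feature points is not shown here; `inverse_intrinsic`, `recover_pose` and the example values are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse_intrinsic(f, cx, cy):
    """Closed-form inverse of K = [[f, 0, cx], [0, f, cy], [0, 0, 1]]."""
    return [[1.0 / f, 0.0, -cx / f],
            [0.0, 1.0 / f, -cy / f],
            [0.0, 0.0, 1.0]]


def recover_pose(M, f, cx, cy):
    """Given a 3x4 matrix M = s * K [R | t] with unknown scale s > 0,
    return the rotation R (3x3) and translation t (3-vector)."""
    Rt = matmul(inverse_intrinsic(f, cx, cy), M)   # equals s * [R | t]
    # Each row of a rotation matrix has unit length; use the third row to find s.
    s = sum(Rt[2][j] ** 2 for j in range(3)) ** 0.5
    R = [[Rt[i][j] / s for j in range(3)] for i in range(3)]
    t = [Rt[i][3] / s for i in range(3)]
    return R, t


# Example: M = 2 * K [I | (0, 0, 5)] with f = 800 and principal point (320, 240).
M = [[1600.0, 0.0, 640.0, 3200.0],
     [0.0, 1600.0, 480.0, 2400.0],
     [0.0, 0.0, 2.0, 10.0]]
R, t = recover_pose(M, f=800.0, cx=320.0, cy=240.0)
print(R, t)
```

Fixing the scale from a single row is exact only when the input really has the form s K [R | t]; with a noisy estimate the recovered R is in general not exactly orthonormal and would need to be re-orthogonalized.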