
Pinhole Cameras and Perspective Projection

To discuss the general problem of pose calculation from feature points detected in an image, some knowledge of camera models and perspective projection is necessary. The principle of the pinhole camera model is illustrated in Figure 6. A pinhole camera consists of a plane I, the so-called image plane or retina, and a point C, the center of projection. The image of a point P in the three-dimensional world is defined as the intersection of the ray from P through C with the image plane. This projection of a point onto the image plane is described most elegantly by means of projective geometry, using homogeneous coordinates. The coordinates u', v' of a point on the image plane can thus be obtained from its coordinates x, y and z in three-dimensional space using the following equations:

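  \begin{eqnarray}
  \left( \begin{array}{c} U \\ V \\ S \end{array} \right) & = & \mathbf{T} \left( \begin{array}{c} x \\ y \\ z \\ 1 \end{array} \right) \\
  u' & = & U / S \\
  v' & = & V / S
  \end{eqnarray}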
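As a rough illustration of this projection, the following Python sketch applies a 3x4 matrix to a point in homogeneous coordinates and then divides by the third component. The function name `project`, the intermediate names `U`, `V`, `S`, and the example matrix (a camera at the origin looking along the z axis with focal length 2) are assumptions made for this illustration and are not taken from the text.

```python
def project(T, point):
    """Project a 3D point with a 3x4 matrix T, given as a list of three rows."""
    x, y, z = point
    homogeneous = (x, y, z, 1.0)
    # Matrix-vector product in homogeneous coordinates.
    U, V, S = (sum(t * h for t, h in zip(row, homogeneous)) for row in T)
    if S == 0:
        raise ValueError("point has no finite image (third component is zero)")
    # Dividing by the third component yields the image coordinates u', v'.
    return U / S, V / S


f = 2.0  # hypothetical focal length
T_example = [
    [f, 0.0, 0.0, 0.0],
    [0.0, f, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

u, v = project(T_example, (1.0, 2.0, 4.0))
print(u, v)  # 0.5 1.0
```

For this simple matrix the result reduces to u' = f x / z and v' = f y / z, which is the intersection of the ray through the center of projection with an image plane at distance f.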

Here $\mathbf{T}$ denotes a $3 \times 4$ projection matrix that accounts for the position and orientation of the camera in space, the so-called extrinsic parameters, as well as for the physical properties of the camera being modeled, such as the focal length. The latter information is expressed by the intrinsic parameters. The matrix $\mathbf{T}$ can be written as a product of two matrices, $\mathbf{T} = \mathbf{T}_{int} \mathbf{T}_{ext}$, where the extrinsic matrix $\mathbf{T}_{ext}$ is a $3 \times 4$ matrix that depends on the extrinsic parameters only and the intrinsic matrix $\mathbf{T}_{int}$ is a $3 \times 3$ matrix that depends on the intrinsic parameters only.
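The factorization into an intrinsic and an extrinsic part can be sketched in Python as follows. The concrete matrix forms are assumptions for illustration only: the intrinsic part is taken to be a 3x3 matrix built from a focal length `f` and a principal point `(u0, v0)` (square pixels, no skew), and the extrinsic part a 3x4 matrix `[R | t]` with a rotation about the y axis. The text does not spell out these forms, and all function and variable names here are hypothetical.

```python
import math


def matmul(A, B):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def intrinsic_matrix(f, u0=0.0, v0=0.0):
    """Assumed 3x3 intrinsic matrix: focal length f, principal point (u0, v0)."""
    return [[f, 0.0, u0], [0.0, f, v0], [0.0, 0.0, 1.0]]


def extrinsic_matrix(angle, t):
    """Assumed 3x4 extrinsic matrix [R | t]: rotation about y by `angle`, then translation t."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        [c, 0.0, s, t[0]],
        [0.0, 1.0, 0.0, t[1]],
        [-s, 0.0, c, t[2]],
    ]


T_int = intrinsic_matrix(f=2.0)
T_ext = extrinsic_matrix(angle=0.0, t=(0.0, 0.0, 5.0))
T = matmul(T_int, T_ext)  # 3x3 times 3x4 gives the 3x4 projection matrix

for row in T:
    print(row)
```

Keeping the two factors separate means that only `T_ext` has to be recomputed when the relative position of camera and scene changes.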

In order to make use of Equation (34), the intrinsic matrix $\mathbf{T}_{int}$ must be known. In practice this does not pose a problem, since it can be obtained with a standard calibration method. Calibration needs to be done only once, as the properties of the camera do not change, assuming a fixed focal length.

The extrinsic matrix $\mathbf{T}_{ext}$, on the other hand, depends on the actual position of the camera relative to the scene, so that for a fixed camera position $\mathbf{T}_{ext}$ encodes the pose of the objects in the scene. Estimating pose therefore means estimating $\mathbf{T}_{ext}$.
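To make the last point concrete, the sketch below keeps a set of intrinsic parameters fixed and projects the same object point under two different extrinsic matrices. The numerical values, the pure-translation poses, and the names `T_INT`, `translation_pose` and `project` are all hypothetical and chosen only for illustration.

```python
T_INT = [
    [2.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 1.0],
]  # assumed result of a one-time calibration


def translation_pose(t):
    """Assumed 3x4 extrinsic matrix for a pose without rotation: [I | t]."""
    return [
        [1.0, 0.0, 0.0, t[0]],
        [0.0, 1.0, 0.0, t[1]],
        [0.0, 0.0, 1.0, t[2]],
    ]


def project(T_int, T_ext, point):
    """Image coordinates of an object point under the product T_int * T_ext."""
    hom = (*point, 1.0)
    # The extrinsic matrix maps object coordinates into the camera frame.
    cam = [sum(r * h for r, h in zip(row, hom)) for row in T_ext]
    # The intrinsic matrix maps camera-frame coordinates to the image plane.
    U, V, S = (sum(a * c for a, c in zip(row, cam)) for row in T_int)
    return U / S, V / S


corner = (1.0, 1.0, 0.0)  # a feature point on the object
near = project(T_INT, translation_pose((0.0, 0.0, 4.0)), corner)
far = project(T_INT, translation_pose((0.0, 0.0, 8.0)), corner)
print(near, far)  # (0.5, 0.5) (0.25, 0.25)
```

With the intrinsic matrix held fixed, the change in image coordinates comes entirely from the extrinsic matrix, which is why observed feature positions carry information about the pose.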



Markus Weber
Tue Jan 7 15:44:13 PST 1997