Model-Based Object Tracking in Road Traffic Scenes
Dieter Koller

An output of the model-based tracking approach: the left figure shows the last
frame of a 2-second video sequence. The right figure shows the detected cars
(which appeared right at the beginning of the sequence) with their associated
tracks.
Introduction
Image sequence analysis provides intermediate results for a conceptual
description of events in a scene. A system that establishes such
higher level descriptions based on tracking of moving objects in the
image domain has been described in [koller 91]. Here we introduce
three-dimensional models about the structure and the motion of the
moving objects as well as about the illumination of the scene in order
to verify the hypotheses for object candidates and to robustly extract
smooth trajectories of such objects.
In order to record and analyze nontrivial events in road traffic
scenes we have to cope with the following dilemma: either we fixate
the camera on an interesting agent by applying gaze control so
that the agent remains in the field of view, or we use a
stationary camera with a field of view that is large enough to capture
significant actions of moving agents. The immediate shortcoming of the
passive approach, which is pursued in this work, is the small size and
the low resolution of the area covered by the projection of the moving
agent. Image domain cues like grayvalue edges and corners are short
and can hardly be detected. Additionally, in a road traffic scene, we
have to cope with a highly cluttered environment full of background
features as well as with occlusions and disocclusions. This renders
the task of figure-background discrimination extremely difficult. The
use of models representing our a priori knowledge appears necessary in
order to accomplish the hard task of detecting and tracking under
real-world conditions.
Our Approach
Our approach consists of the following main steps (a one-page overview
is also available as a block diagram):
- Motion segmentation:
The first step is a motion segmentation, which segments moving objects from
the stationary background. We apply a discrete feature-based approach to
compute displacement vectors between consecutive frames.
A cluster of coherently moving image features then provides a rough estimate
of the moving regions in the image.
[Figure panels, left to right: image section, displacement vectors, vector cluster]
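The grouping of coherently moving features can be sketched as follows. This is only an illustration under stated assumptions, not the method used in the cited work: the linking rule (spatial proximity plus similar displacement), the thresholds `max_dist`, `max_diff`, and `min_motion`, and the function names are all hypothetical.

```python
import math

def cluster_displacements(features, max_dist, max_diff, min_motion):
    """Group moving features into clusters of coherent motion.

    features: list of (x, y, dx, dy) tuples (image position and displacement).
    Two features are linked if their positions are at most max_dist apart
    and their displacement vectors differ by at most max_diff (assumed rule).
    """
    # Discard features that barely move: they belong to the background.
    moving = [f for f in features if math.hypot(f[2], f[3]) >= min_motion]

    def linked(a, b):
        close = math.hypot(a[0] - b[0], a[1] - b[1]) <= max_dist
        similar = math.hypot(a[2] - b[2], a[3] - b[3]) <= max_diff
        return close and similar

    clusters, unvisited = [], set(range(len(moving)))
    while unvisited:
        # Grow one connected component from the lowest unvisited index.
        seed = min(unvisited)
        unvisited.remove(seed)
        stack, members = [seed], [seed]
        while stack:
            i = stack.pop()
            neighbours = [j for j in unvisited if linked(moving[i], moving[j])]
            for j in neighbours:
                unvisited.remove(j)
                stack.append(j)
                members.append(j)
        clusters.append([moving[i] for i in sorted(members)])
    return clusters

def cluster_center(cluster):
    """Mean image position of a cluster, a rough estimate of the region."""
    n = len(cluster)
    return (sum(f[0] for f in cluster) / n, sum(f[1] for f in cluster) / n)
```

Each returned cluster is a candidate moving region; its center is the kind of rough position estimate that a subsequent hypothesis step can start from.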
- Model Hypothesis: The assumption that such a cluster is due to a
hypothetical object moving on a planar road in the scene yields a
rough estimate for the position of the hypothetical object in the
scene: the center of the group of the moving image features is
projected back into the scene, based on a calibration of the camera.
The assumption of a forward motion yields the orientation of the
principal axis of the model which is assumed to be parallel to the
motion.
Backprojection of the vector cluster and its enclosing rectangle, overlaid on a
digitized image of an official map.
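The back-projection of an image point onto the road plane can be illustrated with a ray-plane intersection. The sketch below assumes an ideal pinhole camera (focal length `f` in pixels, principal point `cx`, `cy`), a camera-to-world rotation matrix `R`, a camera center `C`, and a road plane at z = 0; these names and the parameterization are assumptions for illustration, not the calibration model of the cited work.

```python
import math

def backproject_to_road(u, v, f, cx, cy, R, C):
    """Intersect the viewing ray of pixel (u, v) with the road plane z = 0.

    R is a 3x3 camera-to-world rotation (nested lists), C the camera
    center in world coordinates. Returns the (x, y) position on the road.
    """
    # Ray direction in the camera frame (pinhole model, z axis = optical axis).
    d_cam = ((u - cx) / f, (v - cy) / f, 1.0)
    # Rotate the ray direction into the world frame.
    d_world = [sum(R[i][j] * d_cam[j] for j in range(3)) for i in range(3)]
    if abs(d_world[2]) < 1e-12:
        raise ValueError("viewing ray is parallel to the road plane")
    # Solve C_z + t * d_z = 0 for the ray parameter t.
    t = -C[2] / d_world[2]
    if t <= 0:
        raise ValueError("road plane lies behind the camera")
    return (C[0] + t * d_world[0], C[1] + t * d_world[1])

def heading_from_motion(p_prev, p_curr):
    """Orientation of the principal axis, assuming forward motion."""
    return math.atan2(p_curr[1] - p_prev[1], p_curr[0] - p_prev[0])
```

Applying `backproject_to_road` to the cluster center in two consecutive frames and passing both results to `heading_from_motion` gives a position and an orientation for the initial model instantiation.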
- Generic polyhedral vehicle model:
We use a 3D generic vehicle model parameterized by 12 length parameters.
This enables the instantiation of different vehicles, for example sedan,
hatchback, station wagon, bus, or van, from the same generic vehicle model.
The model shape parameters can be estimated by including them in the
state estimation process (see below).
- Object recognition and alignment:
Straight line edge segments extracted from the image are matched to
the 2D model edge segments - a view sketch - obtained by projecting
a 3D polyhedral model of the vehicle into the image plane,
using a hidden-line algorithm to determine their visibility.
The matching of image edge segments and model segments is based on the Mahalanobis
distance of line segment attributes.
The midpoint representation of line segments is
suitable for using different uncertainties parallel and perpendicular
to the line segments, which emerge in the edge detection process.
These figures show the alignment results: the left column shows the initial
model instantiation and the right column the optimal pose estimate. The
figures in the upper row show the image edge segments (red), the
model instantiation (green dashed lines), and the matched image edge segments
(thick pink lines).
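To illustrate matching by Mahalanobis distance on midpoint-based segment attributes, the following sketch uses the attribute vector (midpoint, orientation, length) and a diagonal covariance expressed in the frame of the model segment. The diagonal covariance, the standard deviations `sigma_par`, `sigma_perp`, `sigma_theta`, `sigma_len`, and the gating threshold `gate` are simplifying assumptions for illustration only.

```python
import math

def segment_attributes(p1, p2):
    """Midpoint representation: (x_mid, y_mid, orientation, length)."""
    (x1, y1), (x2, y2) = p1, p2
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0,
            math.atan2(y2 - y1, x2 - x1), math.hypot(x2 - x1, y2 - y1))

def mahalanobis_sq(model_seg, image_seg,
                   sigma_par, sigma_perp, sigma_theta, sigma_len):
    """Squared Mahalanobis distance between two segments (diagonal covariance)."""
    xm, ym, th_m, len_m = segment_attributes(*model_seg)
    xi, yi, th_i, len_i = segment_attributes(*image_seg)
    dx, dy = xi - xm, yi - ym
    # Split the midpoint offset into components parallel and perpendicular
    # to the model segment, so each can carry its own uncertainty.
    d_par = dx * math.cos(th_m) + dy * math.sin(th_m)
    d_perp = -dx * math.sin(th_m) + dy * math.cos(th_m)
    # Segments are undirected: wrap the angle difference into [-pi/2, pi/2).
    d_theta = (th_i - th_m + math.pi / 2.0) % math.pi - math.pi / 2.0
    d_len = len_i - len_m
    return ((d_par / sigma_par) ** 2 + (d_perp / sigma_perp) ** 2 +
            (d_theta / sigma_theta) ** 2 + (d_len / sigma_len) ** 2)

def best_match(model_seg, image_segs, sigmas, gate):
    """Index of the closest image segment within the gate, or None."""
    best_index, best_dist = None, gate
    for index, seg in enumerate(image_segs):
        dist = mahalanobis_sq(model_seg, seg, *sigmas)
        if dist <= best_dist:
            best_index, best_dist = index, dist
    return best_index
```

Choosing `sigma_par` larger than `sigma_perp` reflects that edge detection typically locates a segment more precisely across its direction than along it, which is what the midpoint representation makes easy to express.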
- Illumination model and shadows:
In order to avoid incorrect matches between
model segments and image edge segments which arise from shadows of the
vehicles, we enrich the applied a priori knowledge by including an
illumination model. This provides us with a geometrical description of
the shadows of the vehicles projected onto the street plane.
Effect of including an illumination model - casting a shadow on the road - in
the pose estimation: left image without shadows, right image including shadow edges.
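The geometry of a shadow cast onto the street plane can be sketched as a parallel projection of the model vertices along the light direction. The sketch assumes a distant light source (parallel rays), a street plane at z = 0, and a hypothetical `light_dir` vector pointing from the light towards the scene; the actual illumination model of the cited work may differ.

```python
def project_shadow(vertex, light_dir):
    """Project a 3D vertex onto the street plane z = 0 along light_dir.

    light_dir points from the light source towards the scene, so its
    z component must be negative for a shadow to fall on the ground.
    """
    x, y, z = vertex
    lx, ly, lz = light_dir
    if lz >= 0:
        raise ValueError("light direction must point downwards (lz < 0)")
    # Solve z + t * lz = 0 for the ray parameter t.
    t = -z / lz
    return (x + t * lx, y + t * ly, 0.0)

def shadow_vertices(model_vertices, light_dir):
    """Shadow positions of all model vertices on the street plane."""
    return [project_shadow(v, light_dir) for v in model_vertices]
```

The projected vertices give the shadow edges in the scene; projecting them into the image in the same way as the vehicle model yields additional model segments that can absorb shadow edges during matching.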
- Motion model:
We establish a motion model which
describes the dynamic vehicle motion in the absence of knowledge about the
intention of the driver.
In the stationary case, in which the steering angle remains constant,
the result is a simple circular motion with constant
magnitude of velocity and constant angular velocity around the normal of
a plane on which the motion is assumed to take place. The unknown intention
of the driver in maneuvering the car is captured by the introduction of
process noise.
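The stationary case of this motion model (constant speed and constant angular velocity on a plane) has a closed-form state transition. The sketch below assumes a state made of position `x`, `y`, orientation `phi`, speed `v`, and angular velocity `omega`; the names and the handling of the straight-line limit are assumptions for illustration, and process noise is not modeled here.

```python
import math

def propagate_circular(x, y, phi, v, omega, dt):
    """Advance a constant-speed, constant-turn-rate state by dt seconds."""
    if abs(omega) < 1e-9:
        # Straight-line limit of the circular motion.
        x_new = x + v * dt * math.cos(phi)
        y_new = y + v * dt * math.sin(phi)
    else:
        # Exact integration of dx/dt = v cos(phi + omega t),
        # dy/dt = v sin(phi + omega t) over the interval dt.
        x_new = x + (v / omega) * (math.sin(phi + omega * dt) - math.sin(phi))
        y_new = y - (v / omega) * (math.cos(phi + omega * dt) - math.cos(phi))
    return (x_new, y_new, phi + omega * dt, v, omega)
```

For example, a vehicle starting at the origin with heading 0, unit speed, and unit angular velocity traverses a quarter circle of radius 1 in pi/2 seconds.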
- Kalman filtering:
The motion parameters for this motion model as well as the shape parameters
of our generic polyhedral model are estimated using a recursive
maximum a posteriori (MAP) estimator, which is implemented by an iterated version of
the Extended Kalman Filter (IEKF). Furthermore, we use the Levenberg-Marquardt
method to minimize the objective function of the MAP
estimator.
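The iterated measurement update can be illustrated in the simplest possible setting, a scalar state with a scalar measurement. This is a textbook IEKF update, which corresponds to Gauss-Newton iterations on the MAP objective; it is not the implementation of the cited work, it omits the Levenberg-Marquardt damping mentioned above, and the function and parameter names are hypothetical.

```python
def iekf_update(x_pred, p_pred, z, r, h, h_jac, iterations=10, tol=1e-10):
    """Scalar iterated EKF measurement update.

    x_pred, p_pred: predicted state and its variance.
    z, r: measurement and its noise variance.
    h, h_jac: measurement function and its derivative.
    Returns the updated state and variance.
    """
    x = x_pred
    for _ in range(iterations):
        # Relinearize the measurement function at the current iterate.
        H = h_jac(x)
        K = p_pred * H / (H * p_pred * H + r)
        x_new = x_pred + K * (z - h(x) - H * (x_pred - x))
        converged = abs(x_new - x) < tol
        x = x_new
        if converged:
            break
    # Variance update using the linearization at the final iterate.
    H = h_jac(x)
    K = p_pred * H / (H * p_pred * H + r)
    return x, (1.0 - K * H) * p_pred
```

With a linear measurement function the iteration reproduces the ordinary Kalman update after one step; the relinearization only matters when `h` is non-linear, as it is for projected model edges.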
- Model interpretation loop:
The key feature of our approach is a model interpretation loop which copes
with the non-linear relation between the model features and the image features
(due to visibility and perspective projection). A model interpretation is defined
as a set of correspondences between model and image features. For this interpretation
we compute an optimal pose and shape parameter set according to step 7 (Kalman
filtering), back-project it again into the image, and continue with step 4 (object
recognition and alignment) until the process converges towards an optimal estimate
(see the block diagram).
- Classification:
Classification is based on the assumption that differences between
class members can be considered as deformations of the shape of a stored prototype.
For that purpose we apply a Bayes classifier that compares an estimated shape
parameter instantiation with the shape parameters of the 5 prototypes.
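A Bayes classifier over shape prototypes can be sketched as follows, assuming independent Gaussian deviations of each shape parameter from the prototype. The three-parameter prototypes (length, width, height in metres), their values, the shared standard deviations `sigma`, and the uniform default priors are all hypothetical and only stand in for the 12 length parameters of the generic model.

```python
import math

# Hypothetical prototypes: (length, width, height) in metres.
PROTOTYPES = {
    "sedan": (4.5, 1.70, 1.40),
    "hatchback": (4.0, 1.70, 1.45),
    "station wagon": (4.7, 1.75, 1.50),
    "bus": (12.0, 2.50, 3.00),
    "van": (5.0, 1.90, 2.00),
}

def classify_shape(shape, prototypes, sigma, priors=None):
    """Posterior class probabilities for an estimated shape vector.

    Each class likelihood is a Gaussian centered on its prototype with
    per-parameter standard deviations sigma shared by all classes, so the
    Gaussian normalization constant cancels in the posterior.
    """
    names = list(prototypes)
    if priors is None:
        priors = {name: 1.0 / len(names) for name in names}
    log_post = {}
    for name in names:
        sq = sum(((s - m) / sd) ** 2
                 for s, m, sd in zip(shape, prototypes[name], sigma))
        log_post[name] = math.log(priors[name]) - 0.5 * sq
    # Normalize in a numerically stable way (subtract the largest log value).
    peak = max(log_post.values())
    weights = {name: math.exp(lp - peak) for name, lp in log_post.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}
```

The class with the largest posterior is taken as the classification result, for example `max(post, key=post.get)` for a dictionary `post` returned by `classify_shape`.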
Related Publications:
- Moving Object Recognition and Classification based on Recursive Shape
  Parameter Estimation.
  D. Koller.
  In Proc. 12th Israel Conference on Artificial Intelligence,
  Computer Vision, pp. 359-368, Ramat Gan, Israel, December 27-28, 1993.
- Model-Based Object Tracking in Monocular Image Sequences of Road Traffic
  Scenes.
  D. Koller, K. Daniilidis, H.-H. Nagel.
  International Journal of Computer Vision 10:3 (1993) 257-281.
- Detektion, Verfolgung und Klassifikation bewegter Objekte in monokularen
  Bildfolgen am Beispiel von Straßenverkehrsszenen.
  D. Koller.
  Dissertationen zur Künstlichen Intelligenz
  DISKI 13 (in German), infix-Verlag, Sankt Augustin, 1992.
- Algorithmic Characterization of Vehicle Trajectories from Image Sequences by
  Motion Verbs.
  D. Koller, N. Heinze, and H.-H. Nagel.
  In Proc. IEEE Conf. Computer Vision and Pattern Recognition, pp. 90-95,
  June 3-6, 1991.
Last modified on Tuesday, November 20, 1996,
Dieter Koller
(koller@vision.caltech.edu)