Deformable Templates & Active Contours

One recent work on lip tracking, by Hennecke et al. [HPS94], employs a contour-finding scheme known as a deformable template. As introduced by Yuille et al. [YCH89], a deformable template is a parametrized mathematical model used to track the movements of a given object. Specifically, Hennecke et al. use a piecewise parabolic/quartic template which seeks to lock on to the upper and lower edges of each lip. In a manner similar to that of snakes (active contours), the deformable lip template adjusts its shape according to the values of a number of integrals along the relevant contours. In addition, the authors use several configurational and temporal penalty terms, which keep erroneous template deviations under control.
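The idea of scoring a template by an integral along its contour, offset by a penalty term, can be illustrated with a minimal sketch. It is not the formulation of [HPS94]: it assumes a single parabola with one free parameter (its height), a sum of vertical intensity differences in place of the contour integrals, and one quadratic temporal penalty. All function names, the synthetic frame, and the weight 0.1 are hypothetical.

```python
def parabola_row(col, centre_col, base_row, height, half_width):
    """Row of a parabolic contour at a column; the apex lies `height` rows above base_row."""
    u = (col - centre_col) / half_width
    return base_row - height * (1.0 - u * u)


def edge_strength(image, row, col):
    """Absolute vertical intensity difference around (row, col), clamped to the image."""
    r = min(max(int(round(row)), 1), len(image) - 2)
    return abs(image[r + 1][col] - image[r - 1][col])


def template_energy(image, height, prev_height, centre_col, base_row,
                    half_width, penalty_weight=0.1):
    """Low energy = strong edges under the contour and little change since the last frame."""
    # Discrete stand-in for the contour integral (an assumption of this sketch).
    edge_sum = sum(
        edge_strength(image, parabola_row(c, centre_col, base_row, height, half_width), c)
        for c in range(centre_col - half_width, centre_col + half_width + 1)
    )
    # Temporal penalty: discourage large jumps from the previous frame's height.
    temporal_penalty = penalty_weight * (height - prev_height) ** 2
    return -edge_sum + temporal_penalty


def fit_height(image, prev_height, candidates, centre_col, base_row, half_width):
    """Exhaustive search over candidate heights; real systems use gradient-style updates."""
    return min(
        candidates,
        key=lambda h: template_energy(image, h, prev_height,
                                      centre_col, base_row, half_width),
    )


# Synthetic frame: an intensity step (0.0 above, 1.0 below) along a parabola of height 6.
CENTRE, BASE, HALF_WIDTH, TRUE_HEIGHT = 10, 12, 8, 6
frame = [
    [1.0 if r >= parabola_row(c, CENTRE, BASE, TRUE_HEIGHT, HALF_WIDTH) else 0.0
     for c in range(21)]
    for r in range(20)
]

# Start from the previous frame's estimate (5) and search nearby heights.
best = fit_height(frame, prev_height=5, candidates=range(2, 11),
                  centre_col=CENTRE, base_row=BASE, half_width=HALF_WIDTH)
```

On this synthetic frame the search settles on the height used to draw the edge. The penalty weight sets how strongly the previous estimate holds the template in place when the edge evidence is weak, which is the role the temporal terms play in the text above.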

Other works that have dealt with lip tracking based on contour detection include those of Matsuoka et al. [MFK86], Tamura et al. [ea89], and Kass et al. [KWT88]. As noted in a number of works, e.g. [MP91] and [HPS94], robustly localizing a contour model can be quite difficult when the intensity changes at the lip edges are very gradual. Moreover, the appearance of the teeth and tongue can threaten the correctness of the template placement. Despite these potential difficulties, Hennecke et al. achieved encouraging tracking results for a set of 10-12 individuals.

Another system which incorporates auditory information along with visual lip information is that of Wolff et al. [WPSH94]. In this work, the authors make use of one vertical and one horizontal intensity profile through the center of the mouth. The region of interest around the mouth, through which the profiles are extracted, is located using a succession of efficient filtering and thresholding steps. By analysing the motion of peaks and valleys in the extracted profiles, the authors obtain a number of phonologically relevant descriptors, which are then fed to a time-delay neural network architecture. Results are provided which indicate a notable increase in performance relative to speech-only systems.
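As a rough illustration of profile-based analysis, the sketch below extracts a vertical intensity profile from a small image and lists its local peaks and valleys. It is not the procedure of [WPSH94]: the extremum rule, the function names, and the valley-spacing descriptor are all assumptions made for the example.

```python
def vertical_profile(image, col):
    """Intensities down one column of a row-major image."""
    return [row[col] for row in image]


def find_extrema(profile):
    """Indices of interior local peaks and valleys.

    A sample counts as a peak if it is strictly above its left neighbour and
    not below its right neighbour (and symmetrically for valleys), so each
    plateau is reported once, at its left end.
    """
    peaks, valleys = [], []
    for i in range(1, len(profile) - 1):
        left, mid, right = profile[i - 1], profile[i], profile[i + 1]
        if mid > left and mid >= right:
            peaks.append(i)
        elif mid < left and mid <= right:
            valleys.append(i)
    return peaks, valleys


def valley_spacing(valleys):
    """Hypothetical descriptor: distance between the outermost valleys (0 if fewer than two)."""
    return valleys[-1] - valleys[0] if len(valleys) >= 2 else 0


# Toy mouth region, 9 rows by 3 columns; the middle column crosses two dark bands.
region = [
    [9, 9, 9],
    [8, 7, 8],
    [5, 3, 5],
    [7, 6, 7],
    [8, 8, 8],
    [6, 4, 6],
    [4, 2, 4],
    [6, 5, 6],
    [9, 9, 9],
]

profile = vertical_profile(region, col=1)   # [9, 7, 3, 6, 8, 4, 2, 5, 9]
peaks, valleys = find_extrema(profile)      # peaks [4], valleys [2, 6]
spacing = valley_spacing(valleys)           # 4
```

Computing such positions frame by frame and differencing them over time gives one simple notion of the "motion of peaks and valleys"; a practical system would likely smooth the profile first to suppress spurious extrema.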

One of the earlier works in computer-aided lipreading is that of Petajan et al. [PBB+88]. In this work, the lips of the speaker are tracked indirectly by tracking the nostrils. During training, the mouth images are binarised and put together to form a large codebook. Then, during testing, a vector quantisation algorithm is used to associate the appearance of the mouth in a given frame of a sequence with the closest codeword in the codebook. The authors also experimented with a more direct minimum image distance method which did not employ vector quantisation, and the results were actually better. Experimental evidence is also provided indicating the usefulness of visual lip information as an aid in speech recognition.
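The codebook lookup described above can be sketched in a few lines. The text does not state which distance measure or codebook construction [PBB+88] used, so the example assumes a fixed binarisation threshold, a Hamming distance between binary images, and a hand-built two-entry codebook; all names and values are hypothetical.

```python
def binarise(image, threshold=128):
    """Flatten a grey-level image into a tuple of 0/1 values."""
    return tuple(1 if pixel > threshold else 0 for row in image for pixel in row)


def hamming(a, b):
    """Number of positions at which two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_codeword(vector, codebook):
    """Index of the codeword closest to `vector` (first one wins on ties)."""
    return min(range(len(codebook)), key=lambda i: hamming(vector, codebook[i]))


# "Training": two toy mouth images, binarised and stored as codewords.
closed_mouth = [[200, 200, 200],
                [200, 200, 200]]
open_mouth = [[200, 30, 200],
              [200, 20, 200]]
codebook = [binarise(closed_mouth), binarise(open_mouth)]

# "Testing": quantise a new frame to its nearest codeword.
new_frame = [[190, 40, 210],
             [180, 35, 90]]
index = nearest_codeword(binarise(new_frame), codebook)   # 1, the open-mouth codeword
```

Each frame of a sequence is thereby reduced to a single codeword index, which is what makes the representation compact. The minimum image distance alternative mentioned above would instead compare frames against stored images directly.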

While not specifically addressing the task of lipreading per se, the work of Chen et al. presents an interesting application of lip tracking for improved low bit-rate video transmission of talking heads. In this system, which makes use of colour information in the video signal, the lips are located via nostril tracking (as in [PBB+88]). The audio signal is then used to render a set of synthetic lips on the image of the speaker's face. With the use of some temporal smoothing, the authors note a perceptual improvement in transmission quality of talking head sequences which employ the synthetic lips.



Markus Weber
Tue Jan 7 15:44:13 PST 1997