
Discussion of the Proposed System

We now discuss the system we have developed for lipreading a set of spoken digits. A block diagram of the system is shown in Figure 8.

Figure 8: System block diagram.

A typical input image from a monitor-mounted Sun colour video camera is shown in Figure 10.

Figure 9: The setup of our system.

The image resolution is 240 rows by 320 columns, and the frame rate is 30 Hz. Our test sequences were stored using the 8-bit colour MPEG-1 standard on a Sparc 20. At our compression quality, 1000 frames took up approximately 8-9 MB of hard disk space. During processing, we discard all colour information, which accounts for 4 of the 8 bits per pixel. Note that these operating conditions are far from ideal, but they are typical of many potential practical applications, such as computer-assisted teleconferencing.
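To put these figures in perspective, the following is a rough back-of-the-envelope sketch in Python, not part of the system itself. It takes the resolution, frame rate, bit depth, and stored size quoted above, and assumes (our assumption, not stated in the text) that 1 MB means 10^6 bytes and that "8-bit" means 8 bits per uncompressed pixel. The function names are hypothetical and chosen for illustration only.

```python
# Back-of-the-envelope storage arithmetic for the sequences described above.
# Assumptions (not from the original text): 1 MB = 1,000,000 bytes, and the
# uncompressed frame uses exactly 8 bits per pixel.

ROWS, COLS = 240, 320      # image resolution quoted in the text
FPS = 30                   # frame rate in Hz
BITS_PER_PIXEL = 8         # 8-bit colour
INTENSITY_BITS = 4         # bits left after colour is discarded (8 - 4)
N_FRAMES = 1000            # sequence length used for the size estimate


def raw_size_bytes(n_frames, rows=ROWS, cols=COLS, bits_per_pixel=BITS_PER_PIXEL):
    """Uncompressed size in bytes of n_frames frames."""
    return n_frames * rows * cols * bits_per_pixel // 8


def compression_ratio(n_frames, stored_mb):
    """Ratio of uncompressed size to stored size (stored_mb in units of 10^6 bytes)."""
    return raw_size_bytes(n_frames) / (stored_mb * 1_000_000)


raw_bytes = raw_size_bytes(N_FRAMES)            # uncompressed size of 1000 frames
duration_s = N_FRAMES / FPS                     # playing time of 1000 frames
ratio_at_9mb = compression_ratio(N_FRAMES, 9)   # lower bound on the ratio
ratio_at_8mb = compression_ratio(N_FRAMES, 8)   # upper bound on the ratio
grey_levels = 2 ** INTENSITY_BITS               # distinct intensities after discarding colour

print(f"raw size: {raw_bytes / 1e6:.1f} MB for {duration_s:.1f} s of video")
print(f"compression ratio: {ratio_at_9mb:.1f}:1 to {ratio_at_8mb:.1f}:1")
print(f"grey levels available: {grey_levels}")
```

Under these assumptions, 1000 frames correspond to about 33 seconds of video and roughly 77 MB of raw data, so the stored sequences are compressed by a factor of somewhat less than ten, and only 16 intensity levels remain once colour is discarded.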

We assume that at the beginning of the sequence the head of the speaker is at rest and facing the camera, and that subsequent motion of the head will be only side-to-side and/or up-and-down. Although there is no strict scale requirement, the inter-pupil separation for our test subjects is approximately 40-60 pixels.

Figure 10: Input frame from typical sequence.





Markus Weber
Tue Jan 7 15:44:13 PST 1997