Recognising Spoken Words
The task of automatic lipreading requires two major preprocessing steps: localising the speaker's mouth and tracking the salient lip movements. For the first step, we employ a technique we term Orientation Template Correlation (OTC), which searches for facial features based on their characteristic orientation maps. In the spirit of Mase and Pentland [MP91], we focus on estimating two measures in particular: vertical lip motion and mouth elongation. To estimate the vertical lip motion, we compute the 1D optical flow along a vertical intensity profile through the middle of the mouth. The mouth elongation, meanwhile, is estimated by means of a novel gradient-based filtering technique. The descriptors extracted in these two steps are encoded as 1D waveforms and passed to a classification stage based on principal component analysis. The performance of the proposed system is demonstrated for word recognition on a small database of spoken digits.
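The vertical lip motion estimate can be illustrated with a minimal sketch of 1D optical flow along an intensity profile. Under the brightness-constancy constraint, a single vertical velocity satisfies I_y · v + I_t ≈ 0, which can be solved in a least-squares sense over the whole profile. The function name and formulation below are illustrative assumptions, not the paper's exact method:

```python
import numpy as np

def vertical_lip_motion(profile_prev, profile_next):
    """Estimate a single vertical velocity (pixels/frame) between two
    1D intensity profiles taken through the middle of the mouth.

    Solves the 1D brightness-constancy constraint I_y * v + I_t = 0
    by least squares over all samples of the profile. This is a generic
    sketch of 1D optical flow, not the authors' exact formulation.
    """
    p0 = np.asarray(profile_prev, dtype=float)
    p1 = np.asarray(profile_next, dtype=float)
    I_y = np.gradient(p0)   # spatial intensity gradient along the profile
    I_t = p1 - p0           # temporal intensity difference between frames
    denom = np.sum(I_y * I_y)
    if denom < 1e-12:       # flat profile: motion is unobservable
        return 0.0
    return float(-np.sum(I_y * I_t) / denom)
```

Because the estimate is a single scalar per frame pair, concatenating it over an utterance directly yields the kind of 1D motion waveform described above.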
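The classification stage can likewise be sketched: the per-utterance 1D waveforms are projected onto their leading principal components, and a test waveform is labelled by its nearest neighbour in the reduced space. The class name, component count, and nearest-neighbour decision rule are assumptions for illustration; the paper specifies only that the stage is based on principal component analysis:

```python
import numpy as np

class PCAWordClassifier:
    """Nearest-neighbour word classifier in a PCA-reduced feature space.

    Fixed-length 1D descriptor waveforms (one per training utterance) are
    centred and projected onto the leading principal components obtained
    via SVD; a test waveform is assigned the label of its closest training
    projection. A generic PCA sketch, not the paper's exact pipeline.
    """

    def __init__(self, n_components=8):
        self.n_components = n_components

    def fit(self, X, labels):
        X = np.asarray(X, dtype=float)          # shape (n_utterances, n_samples)
        self.mean_ = X.mean(axis=0)
        # Rows of Vt are the principal axes of the centred training data
        _, _, Vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = Vt[: self.n_components]
        self.train_proj_ = (X - self.mean_) @ self.components_.T
        self.labels_ = list(labels)
        return self

    def predict(self, waveform):
        proj = (np.asarray(waveform, dtype=float) - self.mean_) @ self.components_.T
        dists = np.linalg.norm(self.train_proj_ - proj, axis=1)
        return self.labels_[int(np.argmin(dists))]
```

For a small digit vocabulary, a handful of components typically suffices, since the between-word variation dominates the leading directions of the training set.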