We have presented a system which can recognize a small set of spoken words based solely on lip movement. On the preprocessing end of the system, we have demonstrated a technique for localizing the lips of the speaker with high accuracy on low-quality image sequences. To this end, we have made extensive use of local orientation information as a means of achieving robustness to illumination changes and noise. We have proposed a classification technique based on waveforms describing the vertical and horizontal lip movement, which extends the classical principal components analysis techniques found in the literature. Very promising speaker-dependent classification results have been presented for a small set of spoken English digits, whereas the performance for speaker-independent recognition is not yet satisfactory.
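To illustrate the flavour of this classification scheme, the following is a minimal sketch, not the system's actual implementation: each utterance is represented by concatenated vertical and horizontal lip-movement waveforms, projected onto the leading principal components, and classified by the nearest class centroid in the reduced space. The synthetic templates, noise levels, and dimensions are all hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each utterance yields two waveforms over T frames
# (vertical and horizontal lip opening), concatenated into one vector.
# Two synthetic "digit" templates stand in for real measurements.
T = 32
t = np.linspace(0.0, 1.0, T)
templates = {
    0: np.concatenate([np.sin(2 * np.pi * t), 0.5 * np.cos(2 * np.pi * t)]),
    1: np.concatenate([np.sin(4 * np.pi * t), 0.5 * np.cos(4 * np.pi * t)]),
}

# Build a small noisy training set from the templates.
X, y = [], []
for label, tmpl in templates.items():
    for _ in range(20):
        X.append(tmpl + 0.1 * rng.standard_normal(2 * T))
        y.append(label)
X, y = np.array(X), np.array(y)

# PCA via SVD of the mean-centred training data.
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
components = Vt[:4]                 # keep the 4 leading components
Z = (X - mean) @ components.T       # project training set

# Nearest-centroid classification in the reduced space.
centroids = {c: Z[y == c].mean(axis=0) for c in templates}

def classify(waveform):
    z = (waveform - mean) @ components.T
    return min(centroids, key=lambda c: np.linalg.norm(z - centroids[c]))

# Classify a fresh noisy instance of template 1.
probe = templates[1] + 0.1 * rng.standard_normal(2 * T)
print(classify(probe))
```

The projection onto a few principal components discards frame-level noise while retaining the gross opening/closing dynamics that distinguish the words, which is the intuition behind applying PCA to such waveforms.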
Future work includes the incorporation of auditory information, larger databases of words, speaker independent classification, and real-time implementation.
Since we are able to locate not only the mouth but any facial feature with high accuracy, and since our flow-based approach is completely general with respect to the object that is to be tracked, our system can easily be extended to recognize any kind of facial expression.