Next: Discussion of Related Work Up: A System for Recognising Previous: Contents

Introduction

Lipreading is a communication skill which involves the interpretation of lip movements and, more generally, of facial expressions as a substitute or aid for speech understanding. The extent to which an individual uses lipreading depends largely on the hearing ability of that individual and/or the noise level of the environment. For the hearing impaired, lipreading often serves as a valuable form of communication when sign language is not an option.

In computer-based systems for speech understanding, the ability to accurately track lip movements offers a great deal of promise both as a standalone means of input and as a complement to audio-based recognition systems. It is well known that visual information about the lips can resolve a number of ambiguities between highly similar vocal sounds, e.g. /m/ and /ng/. Conversely, many ambiguities exist in the visual lip information which can in principle be resolved by the audio signal, e.g. /b/, /p/ and /m/. It stands to reason, therefore, that the use of auditory and visual information in tandem could very well be the recipe for highly successful speech recognition (or "speechreading").

In this work, we focus specifically on the use of visual information about the lips for recognition over a small vocabulary of spoken words. Our techniques are efficient and offer certain advantages over many of the existing lipreading systems reported in the literature. Moreover, we demonstrate our system on image sequences of comparatively poor quality to emphasize the practical nature of our proposed techniques.

The structure of this paper is as follows. We begin with a detailed discussion of related work on lipreading, speechreading, facial expression recognition and similar problems. We then summarize the basic theoretical and experimental tools that are used in both the related work and our own work. Next, we present the motivation, theory and application of our proposed method, after which a number of possible extensions and known pitfalls are discussed. Finally, we conclude with a summary of our important findings.



Markus Weber
Tue Jan 7 15:44:13 PST 1997