The methods of Kirby et al. [KWD93] and Bregler & Konig [BK94] both apply Principal Components Analysis (PCA), also known as the Karhunen-Loève expansion, to the task of lipreading. In these techniques, the first step is to create a finite set of orthogonal images that span a subspace capable of representing, up to a certain accuracy, all likely lip configurations. These images are referred to as the ``eigenlips,'' a term inspired by the ``eigenface'' methods of Turk and Pentland [TP91]. Once the set of eigenlips has been created, images of subsequent lip configurations can in principle be represented quite compactly in terms of their projections onto the set of eigenlips. This concept is analogous to the representation of a periodic waveform as a sum of sinusoids weighted by Fourier series coefficients.
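The eigenlip construction and projection described above can be sketched in a few lines of NumPy. This is a minimal illustration of generic PCA on images, not the procedure of [KWD93] or [BK94]: the function names (`compute_eigenlips`, `project`, `reconstruct`), the 16 x 24 image size, and the random training data are all assumptions made here for the example, and the basis is obtained from a singular value decomposition of the mean-centered data.

```python
import numpy as np

def compute_eigenlips(images, k):
    """Return the mean image (flattened) and the first k principal directions.

    images: array of shape (n_samples, height, width).
    The returned basis has shape (k, height * width) with orthonormal rows.
    """
    n = images.shape[0]
    X = images.reshape(n, -1).astype(float)
    mean = X.mean(axis=0)
    # Rows of vt are orthonormal directions ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def project(image, mean, basis):
    """Coefficients of one image with respect to the eigenlip basis."""
    return basis @ (image.ravel().astype(float) - mean)

def reconstruct(coeffs, mean, basis, shape):
    """Approximate image rebuilt from its projection coefficients."""
    return (mean + coeffs @ basis).reshape(shape)

# Synthetic stand-in for a training set of mouth images (hypothetical data).
rng = np.random.default_rng(0)
train = rng.random((50, 16, 24))

mean, eigenlips = compute_eigenlips(train, k=10)
coeffs = project(train[0], mean, eigenlips)          # 10 numbers per image
approx = reconstruct(coeffs, mean, eigenlips, train[0].shape)
```

Here each 384-pixel image is summarized by 10 coefficients. Keeping more basis images can only reduce the reconstruction error, in the same way that adding terms to a truncated Fourier series does.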
In both of the above works, the moving sequence of mouth images is represented as a temporal sequence of projections onto the set of eigenlips. Bregler and Konig employ 10 eigenlips, while Kirby et al. make use of 20. Kirby et al. explore a potential recognition technique for the extracted waveforms based on temporal eigenfunctions. While no results are cited, the approach appears to be promising and extensible. Bregler and Konig, by contrast, use a number of supporting descriptors to help in the recognition procedure, the most notable of which is auditory speech information. In this regard, the visual lip information is used to resolve ambiguities in the audio waveform. For recognition purposes, Bregler and Konig employ a hybrid connectionist architecture which yields a significant decrease in classification error relative to a comparable, purely audio-based speech recognition system.
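To make the idea of a temporal sequence of projections concrete, the sketch below maps a short sequence of frames onto an orthonormal basis and collects one coefficient waveform per basis image. It is illustrative only: `project_sequence` is a hypothetical helper, the frames are random, and the basis is a random orthonormal stand-in (obtained by QR factorization) rather than a set of eigenlips learned from real mouth images.

```python
import numpy as np

def project_sequence(frames, mean, basis):
    """Project every frame of a sequence onto the basis.

    frames: array of shape (T, height, width).
    Returns an array of shape (T, k); column j is the waveform over time
    of the coefficient for basis image j.
    """
    T = frames.shape[0]
    X = frames.reshape(T, -1).astype(float)
    return (X - mean) @ basis.T

# Hypothetical sizes: 30 frames of 16 x 24 pixels, 10 basis images.
rng = np.random.default_rng(1)
h, w, k, T = 16, 24, 10, 30

mean = rng.random(h * w)
# Stand-in orthonormal basis; in practice this would come from PCA.
q, _ = np.linalg.qr(rng.standard_normal((h * w, k)))
basis = q.T                                  # shape (k, h * w)

frames = rng.random((T, h, w))
trajectories = project_sequence(frames, mean, basis)
```

With 10 basis images, a sequence of T frames reduces to 10 waveforms of length T, which is the kind of compact signal a downstream recognizer would consume.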