The lipreading system proposed by Mase and Pentland [MP91] computes two-dimensional optical flow around the mouth of the speaker. After this first step, which is performed using the method of Horn and Schunck, the flow vectors are integrated within a set of boxes around the lips in order to estimate various components of the lip movement on a frame-by-frame basis. Principal component analysis showed that two measures, the vertical lip separation O(t) and the mouth elongation E(t), capture most of the information present in the integrated flow. In the recognition procedure, these two waveforms are first run through several postprocessing steps (including temporal warping) and then compared to a set of precomputed word templates. The reported results, although preliminary, are encouraging. One caveat the authors mention is that the word beginnings, which are critical for a successful match, can be quite difficult to locate accurately.
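The first two stages described above can be illustrated with a minimal sketch, assuming NumPy and two grayscale frames of equal size. It follows the standard Horn and Schunck iteration and then averages the flow inside one rectangular box. The function names, the smoothness weight `alpha`, the iteration count, and the box layout are illustrative assumptions; the actual box placement in [MP91] and the mapping from box flows to O(t) and E(t) are not reproduced here.

```python
import numpy as np


def local_average(f):
    """Weighted neighbourhood average used in the Horn-Schunck update
    (1/6 for the four edge neighbours, 1/12 for the four diagonal ones)."""
    p = np.pad(f, 1, mode="edge")
    sides = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
    corners = p[:-2, :-2] + p[:-2, 2:] + p[2:, :-2] + p[2:, 2:]
    return sides / 6.0 + corners / 12.0


def horn_schunck(frame1, frame2, alpha=1.0, n_iter=300):
    """Estimate dense optical flow (u, v) between two frames.

    u is the horizontal component (along columns) and v the vertical
    component (along rows, positive toward increasing row index).
    alpha and n_iter are assumed values, not taken from [MP91].
    """
    f1 = np.asarray(frame1, dtype=float)
    f2 = np.asarray(frame2, dtype=float)

    # Spatial derivatives of the mean frame; temporal derivative as a difference.
    Iy, Ix = np.gradient(0.5 * (f1 + f2))
    It = f2 - f1

    u = np.zeros_like(f1)
    v = np.zeros_like(f1)
    denom = alpha ** 2 + Ix ** 2 + Iy ** 2

    for _ in range(n_iter):
        u_bar = local_average(u)
        v_bar = local_average(v)
        # Residual of the brightness-constancy constraint at the averaged flow.
        t = (Ix * u_bar + Iy * v_bar + It) / denom
        u = u_bar - Ix * t
        v = v_bar - Iy * t
    return u, v


def box_mean_flow(u, v, box):
    """Average the flow inside a box given as (row0, row1, col0, col1),
    with half-open ranges. The box geometry is a hypothetical choice."""
    r0, r1, c0, c1 = box
    return float(u[r0:r1, c0:c1].mean()), float(v[r0:r1, c0:c1].mean())
```

Applying `box_mean_flow` to each consecutive frame pair yields one pair of numbers per box and per frame, which is the kind of frame-by-frame signal from which waveforms such as O(t) and E(t) could be derived.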
Another system that makes use of 2D optical flow is that of Martin and Shah [MS92]. In this work, the authors perform word matching by means of correlation with precomputed sequences of optical flow fields, each of which is associated with a particular spoken word. Unfortunately, the high computational cost of this algorithm makes it impractical for use with a database of more than a few words.
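To make the idea of correlation-based word matching concrete, the following is a minimal sketch, not the algorithm of [MS92] itself. It assumes that the observed flow sequence and every stored template are NumPy arrays of identical shape (frames, rows, columns, 2) and that they are already aligned in time and space; the function names and the use of zero-mean normalized correlation are illustrative assumptions.

```python
import numpy as np


def normalized_correlation(a, b):
    """Zero-mean normalized correlation of two equally shaped arrays.
    Returns a value in [-1, 1], or 0.0 if either array is constant."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    if norm == 0.0:
        return 0.0
    return float(np.dot(a, b) / norm)


def match_word(observed, templates):
    """Score an observed flow sequence against every word template.

    templates maps a word (str) to a precomputed flow sequence with the
    same shape as observed. Returns the best-scoring word and all scores.
    """
    scores = {
        word: normalized_correlation(observed, template)
        for word, template in templates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores
```

In this sketch every query requires one full correlation per stored word over all frames and pixels, so the work grows in proportion to the vocabulary size.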