Our method of tracking vertical lip motion is based on the 1D optical flow along a vertical intensity profile through the center of the lip images. For robustness to noise, we average over five adjacent vertical profiles through the center of each lip image, using the binomial weights 1, 4, 6, 4, 1. An example of an extracted lip profile over time is shown in Figure 12, where time runs along the horizontal axis. The wavy dark trail running approximately through the middle from left to right corresponds to the opening between the upper and lower lip.
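The weighted averaging of the five adjacent columns can be sketched as follows. This is an illustrative implementation, not code from the paper; the function name and the synthetic test image are our own.

```python
import numpy as np

def extract_profile(frame, center_col):
    """Extract a noise-robust vertical intensity profile from a lip image.

    Averages the five columns around `center_col` with the binomial
    weights 1, 4, 6, 4, 1 (normalized), as described in the text.
    `frame` is a 2D grayscale array; `center_col` must lie at least
    two pixels from either image border.
    """
    weights = np.array([1, 4, 6, 4, 1], dtype=float)
    weights /= weights.sum()
    cols = frame[:, center_col - 2:center_col + 3].astype(float)
    return cols @ weights  # weighted sum across the five columns

# Example: a synthetic 8x7 image whose columns are all identical,
# so the weighted average must reproduce any single column.
frame = np.tile(np.arange(8.0)[:, None], (1, 7))
profile = extract_profile(frame, 3)
```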
Figure 12: Example of extracted profile over 200 frames.
Before calculating the optical flow, the profile image is blurred with a gauss3 kernel, which provides a degree of both spatial and temporal smoothing (the profile image's two axes are position and time). We then apply Equation (15) on a two-frame basis. To obtain more reliable flow estimates, we compute the partial derivatives of the image intensity as averages over small 5-pixel windows. Furthermore, we iterate the equation up to five times, which yields an accuracy better than 0.1 pixels.
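A minimal sketch of such a windowed, iterative 1D gradient-based flow estimate is shown below. Equation (15) is not reproduced in this excerpt, so the update rule here is the standard 1D brightness-constancy constraint; the function name, window size, and iteration scheme (warping the second profile back by the current estimate) are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def flow_1d(p0, p1, x, half_win=2, iters=5):
    """Estimate the vertical displacement at position `x` between two
    intensity profiles `p0` and `p1` taken from consecutive frames.

    Spatial and temporal derivatives are averaged over a small
    (2*half_win + 1)-pixel window, and the estimate is refined
    iteratively by warping `p1` back by the current displacement.
    """
    idx = np.arange(len(p0), dtype=float)
    v = 0.0
    for _ in range(iters):
        # Warp the second profile back by the current flow estimate.
        warped = np.interp(idx + v, idx, p1)
        win = slice(x - half_win, x + half_win + 1)
        ix = np.gradient(p0)[win].mean()   # windowed spatial derivative
        it = (warped - p0)[win].mean()     # windowed temporal derivative
        if abs(ix) < 1e-6:                 # no texture: flow is undefined
            break
        v -= it / ix                       # brightness-constancy update
    return v

# Example: a smooth intensity bump shifted down the profile by half a pixel.
idx = np.arange(21.0)
p0 = np.exp(-(idx - 10.0) ** 2 / 8.0)
p1 = np.exp(-(idx - 10.5) ** 2 / 8.0)
v = flow_1d(p0, p1, 8)
```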
In Figure 13 we show an example of optical flow calculated for a small number of vertical profiles (200 frames). In this figure the directions of the flow vectors are represented by curve segments.
Figure 13: An example of optical flow calculated over a sequence of 200 frames.
The next step condenses the optical flow information so that only two main movements, one for the upper and one for the lower lip, need to be considered in the recognition phase. We found that the darkest point of a profile tends to lie in the gap between the lips. We can therefore assume that the part of each profile above this intensity minimum belongs to the upper lip and the part below it to the lower lip. This allows us to place two windows on the profile, in which we integrate the flow data to obtain two waveforms that accurately represent the lip motion.
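The condensation step above can be sketched as follows. This is a hedged illustration, assuming a simple sum over fixed-size windows; the function name and the window size `win` are our own choices, not parameters given in the paper.

```python
import numpy as np

def lip_motion(profile, flow, win=8):
    """Condense a profile's 1D flow into two scalars per frame:
    upper- and lower-lip motion.

    The darkest profile pixel is taken as the gap between the lips;
    the flow is integrated (summed) in a window above and a window
    below that point.
    """
    gap = int(np.argmin(profile))                # darkest point = lip opening
    upper = flow[max(0, gap - win):gap].sum()    # motion of the upper lip
    lower = flow[gap + 1:gap + 1 + win].sum()    # motion of the lower lip
    return upper, lower

# Example: a V-shaped profile with its minimum at index 10, with upward
# flow above the gap and downward flow below it.
profile = np.abs(np.arange(21.0) - 10.0)
flow = np.zeros(21)
flow[2:10] = 1.0
flow[11:19] = -1.0
upper, lower = lip_motion(profile, flow)
```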