For the purpose of word recognition, each utterance is represented by three separate waveforms, which describe the movement of the two lips and the mouth elongation over time. A simple time-warping step normalises the length of each waveform to 50 sample points. In this representation the waveforms are treated as patterns (pattern vectors of dimension 50), to which we apply the classification method developed in Section 3.6.
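The length normalisation can be illustrated with a minimal sketch. The text does not specify how the time-warping step is carried out, so the linear interpolation used here is an assumption, as are the names normalise_length and utterance_to_patterns.

```python
import numpy as np

def normalise_length(waveform, n_points=50):
    """Resample a 1-D waveform to n_points samples.

    Assumption: uniform linear interpolation stands in for the
    'simple time-warping step'; the text does not name the method.
    """
    waveform = np.asarray(waveform, dtype=float)
    if waveform.size < 2:
        raise ValueError("need at least two samples to interpolate")
    # Place the original and the target samples on the unit interval.
    src = np.linspace(0.0, 1.0, waveform.size)
    dst = np.linspace(0.0, 1.0, n_points)
    return np.interp(dst, src, waveform)

def utterance_to_patterns(upper_lip, lower_lip, elongation, n_points=50):
    """Return a (3, n_points) array: one pattern vector per waveform."""
    waveforms = (upper_lip, lower_lip, elongation)
    return np.stack([normalise_length(w, n_points) for w in waveforms])
```

With this choice, utterances of different duration all map to three pattern vectors of dimension 50, and the first and last samples of each waveform are preserved.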
During the training phase we calculate for each class of waveforms a set of about 10 basis vectors, the covariance matrix reflecting the distribution of the patterns in the eigenspace, as well as the variance of the reconstruction error. All this is done separately for each class of words that is to be recognised.
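As a rough illustration of the training quantities, the sketch below derives them from a principal component analysis of the training patterns of one class. This is an assumption about how the basis vectors are obtained; the function name train_class_model, the dictionary keys, and the choice to measure the reconstruction error as the mean squared residual norm are likewise illustrative only.

```python
import numpy as np

def train_class_model(patterns, n_basis=10):
    """Estimate the per-class quantities from training patterns.

    patterns: array of shape (n_samples, dim), e.g. dim = 50.
    Assumption: the basis vectors are the leading principal
    components of the class; the text does not confirm this.
    """
    X = np.asarray(patterns, dtype=float)
    mean = X.mean(axis=0)
    centred = X - mean
    # Rows of vt are orthonormal directions ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    basis = vt[:n_basis]                      # (n_basis, dim)
    coeffs = centred @ basis.T                # coordinates in the eigenspace
    coeff_cov = np.atleast_2d(np.cov(coeffs, rowvar=False))
    # Part of each pattern that the basis cannot represent.
    residual = centred - coeffs @ basis
    # Assumption: error variance taken as the mean squared residual norm.
    residual_var = float(np.mean(np.sum(residual ** 2, axis=1)))
    return {"mean": mean, "basis": basis,
            "coeff_cov": coeff_cov, "residual_var": residual_var}
```

The function would be called once per class of waveforms, so that every word to be recognised ends up with its own set of models.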
In the recognition step, we calculate for each incoming utterance the distance of its three waveforms to each of the corresponding word classes. As the distance measure we apply the expression obtained in Section 3.6.
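The recognition step can be sketched as follows. The distance expression of Section 3.6 is not reproduced in this passage, so the code substitutes a commonly used stand-in: a Mahalanobis distance inside the eigenspace plus the squared reconstruction error scaled by its variance. Summing the three waveform distances and choosing the word with the smallest total are also assumptions, as are the names class_distance and recognise.

```python
import numpy as np

def class_distance(pattern, mean, basis, coeff_cov, residual_var):
    """Distance of one pattern vector to one waveform class.

    Assumption: this is a stand-in measure, not the expression
    from Section 3.6, which is not given in this passage.
    """
    centred = np.asarray(pattern, dtype=float) - mean
    coeffs = basis @ centred                  # projection onto the basis
    # Mahalanobis distance of the coefficients within the eigenspace.
    in_space = coeffs @ np.linalg.solve(coeff_cov, coeffs)
    # Squared reconstruction error, scaled by its variance.
    residual = centred - basis.T @ coeffs
    out_of_space = (residual @ residual) / residual_var
    return float(in_space + out_of_space)

def recognise(patterns, word_models):
    """Pick the word whose three waveform models are closest.

    patterns: the three pattern vectors of one utterance.
    word_models: dict mapping a word to three parameter tuples
    (mean, basis, coeff_cov, residual_var), one per waveform.
    Assumption: the three distances are combined by summation.
    """
    scores = {
        word: sum(class_distance(p, *m) for p, m in zip(patterns, models))
        for word, models in word_models.items()
    }
    return min(scores, key=scores.get), scores
```

Here recognise returns both the winning word and the per-word totals, which makes it easy to inspect how close the runner-up classes were.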