This section describes how the speaker's lips are located on a frame-by-frame basis. The process comprises an initialisation stage, in which the facial feature templates are extracted, and a tracking stage, which uses OTC to determine the mouth location relative to the eyes and nose.
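To make the two-stage structure concrete, the following is a minimal sketch rather than the method itself. It assumes greyscale frames stored as nested lists and hand-supplied bounding boxes for the four features. Because OTC is not specified at this point, plain normalised cross-correlation is swapped in as the matching score. Taking the mean of the eye and nose positions as the reference point for the mouth is also an assumption made for illustration, and all function names are hypothetical.

```python
import math

def crop(frame, r, c, h, w):
    """Return the h-by-w patch whose top-left corner is (r, c)."""
    return [row[c:c + w] for row in frame[r:r + h]]

def extract_templates(frame, boxes):
    """Initialisation stage: crop one template per facial feature.
    boxes maps a feature name to (row, col, height, width)."""
    return {name: crop(frame, *box) for name, box in boxes.items()}

def _centred(patch):
    """Flatten a patch and subtract its mean intensity."""
    flat = [v for row in patch for v in row]
    mean = sum(flat) / len(flat)
    return [v - mean for v in flat]

def match_template(frame, template):
    """Exhaustive search for the best-matching position.
    ASSUMPTION: normalised cross-correlation stands in for OTC."""
    th, tw = len(template), len(template[0])
    t = _centred(template)
    t_norm = math.sqrt(sum(v * v for v in t))
    best_score, best_pos = -math.inf, (0, 0)
    for r in range(len(frame) - th + 1):
        for c in range(len(frame[0]) - tw + 1):
            p = _centred(crop(frame, r, c, th, tw))
            denom = t_norm * math.sqrt(sum(v * v for v in p))
            if denom == 0:
                continue  # flat patch: correlation undefined, skip it
            score = sum(a * b for a, b in zip(t, p)) / denom
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos

def track_frame(frame, templates):
    """Tracking stage: locate every feature, then express the mouth
    position relative to the eyes and nose.
    ASSUMPTION: the reference point is the mean of the three positions."""
    pos = {name: match_template(frame, t) for name, t in templates.items()}
    refs = [pos["left_eye"], pos["right_eye"], pos["nose"]]
    ref_r = sum(p[0] for p in refs) / 3
    ref_c = sum(p[1] for p in refs) / 3
    offset = (pos["mouth"][0] - ref_r, pos["mouth"][1] - ref_c)
    return pos, offset
```

In this sketch, `extract_templates` would be called once on the first frame and `track_frame` on each subsequent frame. Under these assumptions, the returned mouth offset is unchanged when the whole face translates, which is the reason for expressing the mouth location relative to the other features.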