There are several ways in which the performance and flexibility of the proposed system could be improved. Since it is quite natural for a speaker to tilt or rotate the head while speaking, the incorporation of full 3D pose estimation and compensation would be a welcome addition. It is also essential that word breaks be detected automatically rather than marked by hand prior to the recognition stage. Furthermore, a general time-warping algorithm should be implemented to accommodate larger variations in speaking rate.
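One candidate for such a time-warping algorithm is classic dynamic time warping (DTW), which aligns two feature sequences non-linearly in time. The sketch below is only an illustration of the general technique under assumed scalar features and an absolute-difference cost; the system's actual feature vectors and cost function would differ.

```python
import numpy as np

def dtw_distance(a, b):
    """Minimal cumulative alignment cost between two 1-D sequences
    under dynamic time warping (illustrative sketch only)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)  # cumulative cost table
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # step in a only
                                 D[i, j - 1],      # step in b only
                                 D[i - 1, j - 1])  # step in both
    return D[n, m]

# A sequence and a time-stretched copy of it align with zero cost,
# which is exactly the invariance to talking speed that is wanted:
fast = [0, 1, 2, 3]
slow = [0, 0, 1, 1, 2, 2, 3, 3]
print(dtw_distance(fast, slow))  # → 0.0
```

The quadratic table fill is adequate for short utterance segments; a banded (Sakoe–Chiba) constraint would bound the warp and the cost for longer sequences.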
In future work, we aim to incorporate auditory information into the proposed system, which should improve both segmentation and overall classification accuracy. Another high priority is automatic facial feature template extraction, based either on correlation with a set of averaged templates or on existing methods for the general task of face localization, e.g. [LBP94]. A real-time implementation of the proposed system is also under development.
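The correlation-based variant of template extraction mentioned above can be sketched as an exhaustive normalized cross-correlation search. The function below is a hypothetical illustration, not the system's implementation; the window scan, mean-removal, and scoring are assumptions about how such a matcher would typically be built.

```python
import numpy as np

def best_template_match(image, template):
    """Locate the window of `image` most correlated with `template`
    under normalized cross-correlation (illustrative sketch)."""
    th, tw = template.shape
    t = template - template.mean()          # zero-mean template
    t_norm = np.sqrt((t ** 2).sum())
    best_score, best_pos = -np.inf, (0, 0)
    for y in range(image.shape[0] - th + 1):
        for x in range(image.shape[1] - tw + 1):
            w = image[y:y + th, x:x + tw]
            wc = w - w.mean()               # zero-mean window
            denom = np.sqrt((wc ** 2).sum()) * t_norm
            if denom == 0:                  # flat window: no correlation
                continue
            score = (wc * t).sum() / denom  # in [-1, 1]
            if score > best_score:
                best_score, best_pos = score, (y, x)
    return best_pos, best_score
```

An exact copy of the template embedded in the image scores 1.0 at its true position, so averaged feature templates would localize eyes or mouth corners as the argmax of this score map; in practice the search would be restricted to a region predicted by the face localizer.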