|
Hershey, John Audio-vision: seeing sounds Evidence from psychophysical experiments shows that the localization of sound sources is strongly influenced by their synchrony with visual signals. This result, known as the ventriloquism effect, is at work when sound coming from the side of your TV feels as if it were coming from the mouths of the actors. The ventriloquism effect suggests that natural audio-visual (AV) sources carry important information about sound location, and that this information is encoded in the synchrony between the audio and video signals. Based on this idea, we demonstrate a novel technique to help track the areas of an image that render a sound source (e.g., a person talking). The system combines information from one audio channel and one video channel, looking for regions of the visual landscape that correlate highly with the acoustic signal. The output is a time-varying AV correlation map that can be used to help determine the location of acoustic sources. We show a practical example in which audio-visual correlation helps improve the performance of face tracking systems that use video information alone (e.g., systems that track faces based on color information or template matching). Movies displaying AV correlation maps over time will be shown, along with potential applications of the approach.
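The idea of an AV correlation map can be illustrated with a minimal sketch. The abstract does not say which correlation measure or audio feature the system uses, so the code below rests on assumptions: it uses the absolute Pearson correlation between each pixel's intensity and a single per-frame audio energy value, computed over a sliding window of frames. The names `pearson`, `av_correlation_map`, and `sliding_maps` are hypothetical and chosen only for this illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def av_correlation_map(frames, audio_energy):
    """Correlate every pixel's intensity over time with the audio signal.

    frames: list of T frames, each an H x W nested list of intensities.
    audio_energy: list of T values, one per video frame (assumed feature).
    Returns an H x W nested list of absolute correlations.
    """
    height = len(frames[0])
    width = len(frames[0][0])
    return [
        [abs(pearson([f[r][c] for f in frames], audio_energy)) for c in range(width)]
        for r in range(height)
    ]


def sliding_maps(frames, audio_energy, window):
    """Yield one correlation map per window position, giving a time-varying map."""
    for start in range(len(frames) - window + 1):
        yield av_correlation_map(
            frames[start:start + window], audio_energy[start:start + window]
        )


# Toy data: a 2 x 2 video in which only pixel (0, 0) follows the audio.
audio = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
frames = [[[a * 10.0 + 1.0, 5.0], [float(t % 3), 2.0]] for t, a in enumerate(audio)]

corr_map = av_correlation_map(frames, audio)
# Location of the strongest audio-visual correlation in the map.
peak = max(
    ((r, c) for r in range(2) for c in range(2)),
    key=lambda rc: corr_map[rc[0]][rc[1]],
)
```

In this toy example the peak of the map falls on the one pixel that changes in step with the audio. A tracker that relies on video alone could use such a peak as an extra cue, for instance by weighting candidate face locations by their map value, though how the two sources are actually combined is not specified in the abstract.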
|