
Audio-visual unit selection for the synthesis of photo-realistic talking-heads

IEEE International Conference on Multimedia and Expo (ICME), pages 619-622. New York, NY, USA (August 2000)
DOI: 10.1109/ICME.2000.871439

Abstract

This paper investigates audio-visual unit selection for the synthesis of photo-realistic, speech-synchronized talking-head animations. These animations are synthesized from recorded video samples of a subject speaking in front of a camera, resulting in a photo-realistic appearance. The lip-synchronization is obtained by optimally selecting and concatenating variable-length video units of the mouth area. Synthesizing a new speech animation from these recorded units starts with audio speech and its phonetic annotation from a text-to-speech synthesizer. Then, optimal image units are selected from the recorded set using a Viterbi search through a…
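
The abstract is cut off before it describes the search space and the cost terms, so the following is only a minimal sketch of how a Viterbi search can pick one recorded unit per target position. It assumes a generic unit-selection formulation with a per-unit target cost and a pairwise concatenation cost; the names `select_units`, `target_cost`, and `join_cost`, and the toy integer "units", are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

U = TypeVar("U")  # a recorded unit (hypothetical representation)


def select_units(
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[int, U], float],
    join_cost: Callable[[U, U], float],
) -> Tuple[List[U], float]:
    """Viterbi search: pick one candidate per position, minimizing
    the sum of target costs plus concatenation (join) costs."""
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every position needs at least one candidate")

    # best[i][j]: cheapest total cost of any path ending at candidates[i][j]
    best = [[target_cost(0, u) for u in candidates[0]]]
    # back[i][j]: index of the predecessor at position i - 1 on that path
    back = [[-1] * len(candidates[0])]

    for i in range(1, len(candidates)):
        row, ptr = [], []
        for u in candidates[i]:
            costs = [
                best[i - 1][k] + join_cost(p, u)
                for k, p in enumerate(candidates[i - 1])
            ]
            k_best = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[k_best] + target_cost(i, u))
            ptr.append(k_best)
        best.append(row)
        back.append(ptr)

    # Trace back from the cheapest final state.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path: List[U] = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    path.reverse()
    return path, total


# Toy example (assumed data): each unit is one integer "mouth opening".
targets = [0, 5, 0]
candidates = [[0], [5, 2], [0]]
path, total = select_units(
    candidates,
    target_cost=lambda i, u: abs(u - targets[i]),  # mismatch with the target
    join_cost=lambda a, b: 2 * abs(a - b),         # penalty for visible jumps
)
print(path, total)  # → [0, 2, 0] 11
```

In this toy run the unit that matches its target exactly (5) is rejected in favor of a smoother sequence, which illustrates the trade-off between fitting the target and concatenating cleanly that a search of this kind is meant to resolve.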
