Article

Near-videorealistic synthetic talking faces: implementation and evaluation

B. Theobald, J. Bangham, I. Matthews, and G. Cawley.
Speech Communication, 44 (1–4): 127–140 (October 2004). Special Issue on Audio Visual speech processing.
DOI: 10.1016/j.specom.2004.07.002

Abstract

The application of two-dimensional (2D) shape and appearance models to the problem of creating realistic synthetic talking faces is presented. A sample-based approach is adopted, where the face of a talker articulating a series of phonetically balanced training sentences is mapped to a trajectory in a low-dimensional model-space that has been learnt from the training data. Segments extracted from this trajectory corresponding to the synthesis units (e.g. triphones) are temporally normalised, blended, concatenated and smoothed to form a new trajectory, which is mapped back to the image domain to provide a natural, realistic sequence corresponding to the desired (arbitrary) utterance. The system has undergone early subjective evaluation to determine the naturalness of this synthesis approach. Described are tests to determine the suitability of the parameter smoothing method used to remove discontinuities introduced during synthesis at the concatenation boundaries, and tests used to determine how well long term coarticulation effects are reproduced during synthesis using the adopted unit selection scheme. The system has been extended to animate the face of a 3D virtual character (avatar) and this is also described.
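The concatenative step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, linear interpolation stands in for the temporal normalisation, and a moving average stands in for the parameter smoothing, since the abstract names neither method. Blending of segments and the mapping back to the image domain are omitted.

```python
from typing import Iterable, List, Sequence

Frame = List[float]  # one point in the low-dimensional model space


def resample(segment: Sequence[Frame], n_frames: int) -> List[Frame]:
    """Temporally normalise a segment to n_frames (assumed: linear interpolation)."""
    if n_frames < 1 or not segment:
        raise ValueError("need a non-empty segment and n_frames >= 1")
    if len(segment) == 1 or n_frames == 1:
        return [list(segment[0]) for _ in range(n_frames)]
    out = []
    for i in range(n_frames):
        # Fractional position of output frame i within the source segment.
        pos = i * (len(segment) - 1) / (n_frames - 1)
        lo = int(pos)
        hi = min(lo + 1, len(segment) - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(segment[lo], segment[hi])])
    return out


def concatenate(segments: Iterable[Sequence[Frame]]) -> List[Frame]:
    """Join the normalised segments end to end into one trajectory."""
    return [list(frame) for seg in segments for frame in seg]


def smooth(trajectory: Sequence[Frame], half_width: int = 1) -> List[Frame]:
    """Assumed smoother: moving average per parameter, window clipped at the ends."""
    out = []
    for t in range(len(trajectory)):
        window = trajectory[max(0, t - half_width): t + half_width + 1]
        out.append([sum(col) / len(window) for col in zip(*window)])
    return out


# Two made-up segments in a 2-parameter model space, each paired with a
# target duration in frames (illustrative values, not data from the paper).
units = [
    ([[0.0, 0.0], [1.0, 2.0]], 3),
    ([[4.0, 2.0], [4.0, 0.0], [2.0, 0.0]], 2),
]
raw = concatenate(resample(seg, n) for seg, n in units)
trajectory = smooth(raw)
```

Here `raw` jumps from `[1.0, 2.0]` to `[4.0, 2.0]` at the join between the two units; the smoothing pass spreads that discontinuity over neighbouring frames, which is the kind of effect the evaluation in the paper is concerned with.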

BibTeX key: Theobald2004
entry type: article
year: 2004
month: oct
journal: Speech Communication
number: 1–4
pages: 127-140
volume: 44
owner: schabus
file: :pdfs/theobald_specom_2004.pdf:PDF
issn: 0167-6393
DOI: 10.1016/j.specom.2004.07.002
note: Special Issue on Audio Visual speech processing

Cite this publication

@article{Theobald2004,
  abstract = {The application of two-dimensional (2D) shape and appearance models to the problem of creating realistic synthetic talking faces is presented. A sample-based approach is adopted, where the face of a talker articulating a series of phonetically balanced training sentences is mapped to a trajectory in a low-dimensional model-space that has been learnt from the training data. Segments extracted from this trajectory corresponding to the synthesis units (e.g. triphones) are temporally normalised, blended, concatenated and smoothed to form a new trajectory, which is mapped back to the image domain to provide a natural, realistic sequence corresponding to the desired (arbitrary) utterance. The system has undergone early subjective evaluation to determine the naturalness of this synthesis approach. Described are tests to determine the suitability of the parameter smoothing method used to remove discontinuities introduced during synthesis at the concatenation boundaries, and tests used to determine how well long term coarticulation effects are reproduced during synthesis using the adopted unit selection scheme. The system has been extended to animate the face of a 3D virtual character (avatar) and this is also described.},
  added-at = {2023-12-13T01:25:16.000+0100},
  author = {Theobald, Barry-John and Bangham, J. Andrew and Matthews, Ian A. and Cawley, Gavin C.},
  biburl = {https://www.bibsonomy.org/bibtex/2721a8c0e7ca347ee0e7118e71ede352f/admin},
  doi = {10.1016/j.specom.2004.07.002},
  file = {:pdfs/theobald_specom_2004.pdf:PDF},
  interhash = {715292c035353c1709341198af8fdbcd},
  intrahash = {721a8c0e7ca347ee0e7118e71ede352f},
  issn = {0167-6393},
  journal = {Speech Communication},
  keywords = {},
  month = oct,
  note = {Special Issue on Audio Visual speech processing},
  number = {1--4},
  owner = {schabus},
  pages = {127--140},
  timestamp = {2023-12-13T01:25:16.000+0100},
  title = {Near-videorealistic synthetic talking faces: implementation and evaluation},
  volume = 44,
  year = 2004
}
