  • Research Article

Automatic Speechreading with Applications to Human-Computer Interfaces

Abstract

There has been growing interest in introducing speech as a new modality into the human-computer interface (HCI). Speech is inherently multimodal, and its visual component carries information that is not always present in the acoustic signal, enabling improved system performance over acoustic-only methods, especially in noisy environments. In this paper, we investigate the usefulness of visual speech information in HCI-related applications. We first introduce a new algorithm that automatically locates the mouth region using color and motion information and segments the lip region using both color and edge information within a Markov random field framework. We then derive a relevant set of visual speech parameters and incorporate them into a recognition engine. We compare the performance of various visual features, including the inner lip contour and the visibility of the tongue and teeth, to assess their impact on recognition accuracy. Using a common visual feature set, we demonstrate two applications that exploit speechreading in a joint audio-visual speech signal processing task: speech recognition and speaker verification. Experimental results on two databases show that the visual information is highly effective at improving recognition performance over a variety of acoustic noise levels.
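
As a concrete illustration of the first step described above, the sketch below locates a candidate mouth window by combining a color cue (lips tend to be redder than the surrounding skin) with a motion cue (frame-to-frame difference). The abstract does not specify the actual measures, so the redness score, the multiplicative combination of the two cues, and the fixed window search are illustrative assumptions, not the authors' algorithm.

```python
# Minimal sketch of mouth localization from color + motion cues.
# NOTE: the redness measure, cue combination, and window size below
# are assumptions for illustration; the paper's exact method differs.
import numpy as np

def lip_color_map(frame):
    """Crude lip-likeness from color: lips are redder than skin.
    frame: (H, W, 3) float RGB in [0, 1]."""
    r, g, b = frame[..., 0], frame[..., 1], frame[..., 2]
    return np.clip(r - 0.5 * (g + b), 0.0, 1.0)

def motion_map(prev_frame, frame):
    """Per-pixel motion energy from a simple frame difference."""
    return np.abs(frame - prev_frame).mean(axis=-1)

def locate_mouth(prev_frame, frame, win=32):
    """Return (row, col) of the win x win window maximizing the
    combined color + motion score."""
    score = lip_color_map(frame) * (1.0 + motion_map(prev_frame, frame))
    # Integral image: every window sum in O(1) after one pass.
    ii = np.pad(score.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    h, w = score.shape
    sums = (ii[win:h + 1, win:w + 1] - ii[:h + 1 - win, win:w + 1]
            - ii[win:h + 1, :w + 1 - win] + ii[:h + 1 - win, :w + 1 - win])
    return np.unravel_index(np.argmax(sums), sums.shape)

# Usage with synthetic frames (stand-ins for real video data):
rng = np.random.default_rng(0)
prev_frame = rng.random((120, 160, 3))
frame = rng.random((120, 160, 3))
print(locate_mouth(prev_frame, frame))  # top-left corner of best window
```

In a real system the located window would then be passed to the lip segmentation stage (color and edge cues in a Markov random field), which is beyond the scope of this sketch.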

Author information

Corresponding author

Correspondence to Xiaozheng Zhang.

About this article

Cite this article

Zhang, X., Broun, C.C., Mersereau, R.M. et al. Automatic Speechreading with Applications to Human-Computer Interfaces. EURASIP J. Adv. Signal Process. 2002, 240192 (2002). https://doi.org/10.1155/S1110865702206137

  • Received:

  • Revised:

  • Published:

  • DOI: https://doi.org/10.1155/S1110865702206137
