Abstract
In this paper, the performance of the pitch detection algorithm in the ETSI ES-202-212 XAFE standard is evaluated on a Mandarin digit string recognition task. Experimental results show that the performance of the pitch detection algorithm degrades seriously when the SNR of the speech signal is lower than 10 dB. As a result, in low-SNR environments the recognizer that uses pitch information performs worse than the original recognizer that does not. A modification of the pitch detection algorithm is therefore proposed to improve pitch detection in low-SNR environments. Recognition performance can be improved at most SNR levels by integrating the recognizers with and without pitch information. Overall recognition rates of 82.1% and 86.8% were achieved for the clean and multi-condition training cases, respectively.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, YR., Lu, BX., Liao, YF., Chen, SH. (2006). Distributed Speech Recognition of Mandarin Digits String. In: Huo, Q., Ma, B., Chng, ES., Li, H. (eds) Chinese Spoken Language Processing. ISCSLP 2006. Lecture Notes in Computer Science, vol 4274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11939993_40
DOI: https://doi.org/10.1007/11939993_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49665-6
Online ISBN: 978-3-540-49666-3
eBook Packages: Computer Science, Computer Science (R0)