Abstract
We show, using results from recent studies in our laboratory, that conventional speech analysis techniques for ASR (such as Mel cepstrum or PLP), combined with dynamic features (such as estimates of the derivatives of cepstral feature trajectories), are suboptimal and can be improved. The improvements can be obtained by using large labeled databases, which make it possible to study how linguistic information is distributed in time and in frequency, and to design discriminative spectral bases and temporal RASTA filters.
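For readers unfamiliar with the dynamic features mentioned in the abstract, the following is a minimal sketch of one common way to estimate the derivative of a cepstral trajectory: a regression-slope formula applied over a short window of frames. It is not taken from the paper. The half-window `K=2`, the edge handling (repeating the first and last frame), and the function names `delta_taps` and `delta` are all assumptions made for illustration. The sketch shows that such an estimate amounts to a fixed FIR filter applied along time, which is the kind of hand-chosen temporal filter that a data-driven design would replace with taps derived from labeled data.

```python
def delta_taps(K=2):
    """FIR taps of the regression-based slope estimate for half-window K.

    The taps run over frame offsets -K..+K; they are antisymmetric and
    zero at the centre frame. K=2 is an illustrative choice.
    """
    norm = 2 * sum(k * k for k in range(1, K + 1))
    return [k / norm for k in range(-K, K + 1)]


def delta(trajectory, K=2):
    """Slope estimate of one cepstral coefficient's trajectory over time."""
    taps = delta_taps(K)
    T = len(trajectory)
    out = []
    for t in range(T):
        acc = 0.0
        for k, w in zip(range(-K, K + 1), taps):
            # Clamp the index so edge frames are repeated (an assumption).
            idx = min(max(t + k, 0), T - 1)
            acc += w * trajectory[idx]
        out.append(acc)
    return out
```

Applied to a trajectory that rises by one unit per frame, `delta` returns a slope of one for every frame far enough from the edges, and it returns zero everywhere for a constant trajectory.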
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hermansky, H. (1999). Data-Driven Analysis of Speech. In: Matousek, V., Mautner, P., Ocelíková, J., Sojka, P. (eds) Text, Speech and Dialogue. TSD 1999. Lecture Notes in Computer Science, vol 1692. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48239-3_2
DOI: https://doi.org/10.1007/3-540-48239-3_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66494-9
Online ISBN: 978-3-540-48239-0
eBook Packages: Springer Book Archive