Web resources for language modeling in conversational speech recognition

Authors:

Mari Ostendorf,

Andreas Stolcke,

Özgür Çetin

ACM Transactions on Speech and Language Processing (TSLP), Volume 5, Issue 1

Article No.: 1, Pages 1 - 25

https://doi.org/10.1145/1322391.1322392

Published: 12 December 2007

Abstract

This article describes a methodology for collecting text from the Web to match a target sublanguage both in style (register) and topic. Unlike other work that estimates n-gram statistics from page counts, the approach here is to select and filter documents, which provides more control over the type of material contributing to the n-gram counts. The data can be used in a variety of ways; here, the different sources are combined in two types of mixture models. Focusing on conversational speech, where data collection can be quite costly, experiments demonstrate the positive impact of Web collections on several tasks with varying amounts of data, including Mandarin and English telephone conversations and English meetings and lectures.
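
As a rough illustration of how several text sources can be combined in a mixture model, the sketch below estimates linear interpolation weights for component language models with EM on held-out data. It is not the article's implementation: the abstract does not say which two mixture types are used, and the function names `em_mixture_weights` and `heldout_log_likelihood`, the toy probabilities, and the iteration count are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def heldout_log_likelihood(
    component_probs: Sequence[Sequence[float]], weights: Sequence[float]
) -> float:
    """Log-likelihood of held-out tokens under the weighted mixture."""
    return sum(
        math.log(sum(w * p for w, p in zip(weights, probs)))
        for probs in component_probs
    )


def em_mixture_weights(
    component_probs: Sequence[Sequence[float]], iterations: int = 50
) -> List[float]:
    """Estimate weights for a linear mixture of language models with EM.

    component_probs[t][k] is the probability that component model k assigns
    to held-out token t given its history (hypothetical, precomputed inputs).
    """
    num_components = len(component_probs[0])
    weights = [1.0 / num_components] * num_components  # uniform start

    for _ in range(iterations):
        expected = [0.0] * num_components
        for probs in component_probs:
            # E-step: posterior responsibility of each component for this token.
            mixture = sum(w * p for w, p in zip(weights, probs))
            for k, (w, p) in enumerate(zip(weights, probs)):
                expected[k] += w * p / mixture
        # M-step: new weights are the normalized expected counts.
        total = sum(expected)
        weights = [e / total for e in expected]
    return weights


# Toy held-out data (made up): column 0 stands for a small in-domain model,
# column 1 for a model trained on filtered Web text.
heldout = [
    [0.20, 0.05],
    [0.01, 0.10],
    [0.15, 0.02],
    [0.02, 0.08],
]
weights = em_mixture_weights(heldout)
print(weights)
```

Each EM iteration cannot decrease the held-out likelihood, so the returned weights score at least as well as the uniform starting point on the same data.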



Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Speech recognition


Published In

ACM Transactions on Speech and Language Processing Volume 5, Issue 1

December 2007

80 pages

ISSN: 1550-4875

EISSN: 1550-4883

DOI: 10.1145/1322391

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 December 2007

Accepted: 01 August 2007

Received: 01 November 2005

Published in TSLP Volume 5, Issue 1

Qualifiers

Research-article
Research
Refereed
