research-article

Web resources for language modeling in conversational speech recognition

Published: 12 December 2007

Abstract

This article describes a methodology for collecting text from the Web to match a target sublanguage both in style (register) and topic. Unlike other work that estimates n-gram statistics from page counts, the approach here is to select and filter documents, which provides more control over the type of material contributing to the n-gram counts. The data can be used in a variety of ways; here, the different sources are combined in two types of mixture models. Focusing on conversational speech, where data collection can be quite costly, experiments demonstrate the positive impact of Web collections on several tasks with varying amounts of data, including Mandarin and English telephone conversations and English meetings and lectures.
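
The abstract does not specify the two mixture models, so the following is only a minimal sketch of one common way to combine text sources: static linear interpolation, with the mixture weights estimated by EM on held-out target-domain text. Everything here is an illustrative assumption rather than the article's method: the toy sentences, the add-alpha unigram models standing in for full n-gram models, and the function names `unigram_model`, `em_mixture_weights`, and `mixture_prob`.

```python
from collections import Counter


def unigram_model(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def em_mixture_weights(models, heldout, iterations=20):
    """Estimate interpolation weights that maximize held-out likelihood."""
    k = len(models)
    weights = [1.0 / k] * k  # start from a uniform mixture
    for _ in range(iterations):
        expected = [0.0] * k
        for w in heldout:
            # E-step: posterior probability that each source generated w.
            joint = [weights[i] * models[i][w] for i in range(k)]
            norm = sum(joint)
            for i in range(k):
                expected[i] += joint[i] / norm
        # M-step: new weights are the average posteriors.
        weights = [e / len(heldout) for e in expected]
    return weights


def mixture_prob(word, models, weights):
    """Probability of a word under the linearly interpolated model."""
    return sum(lam * m[word] for lam, m in zip(weights, models))


# Hypothetical toy data: a small conversational sample and a Web-style sample.
in_domain = "yeah i know um it is like you know yeah".split()
web_text = "the report describes the quarterly results of the company".split()
heldout = "yeah you know it is like um".split()

vocab = set(in_domain) | set(web_text) | set(heldout)
models = [unigram_model(in_domain, vocab), unigram_model(web_text, vocab)]
weights = em_mixture_weights(models, heldout)
print("mixture weights (in-domain, web):", weights)
```

Because the held-out text is conversational, EM assigns most of the weight to the in-domain component in this toy setup. With Web text that has been filtered to match the target style, the Web component would be expected to earn a larger share.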



    Published In

    ACM Transactions on Speech and Language Processing, Volume 5, Issue 1
    December 2007
    80 pages
    ISSN: 1550-4875
    EISSN: 1550-4883
    DOI: 10.1145/1322391
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 12 December 2007
    Accepted: 01 August 2007
    Received: 01 November 2005
    Published in TSLP Volume 5, Issue 1


    Author Tags

    1. Conversational speech
    2. Web data
    3. Language modeling

    Qualifiers

    • Research-article
    • Research
    • Refereed
