DOI: 10.1145/1571941.1571959
research-article

Combining LVCSR and vocabulary-independent ranked utterance retrieval for robust speech search

Published: 19 July 2009 Publication History

Abstract

Well-tuned Large-Vocabulary Continuous Speech Recognition (LVCSR) has generally been shown to be more effective than vocabulary-independent techniques for ranked retrieval of spoken content when either approach is used alone. Tuning an LVCSR system to a topic domain can be costly, however, and the experiments in this paper show that Out-Of-Vocabulary (OOV) query terms can significantly reduce retrieval effectiveness when that tuning is not performed. Further experiments demonstrate that retrieval effectiveness for queries with OOV terms can be substantially improved by combining evidence from LVCSR with additional evidence from vocabulary-independent Ranked Utterance Retrieval (RUR). The combination uses relevance judgments from held-out topics to learn generic (i.e., topic-independent), smooth, non-decreasing transformations from LVCSR and RUR system scores to probabilities of topical relevance. Evaluated on a CLEF collection that includes topics, spontaneous conversational speech audio, and relevance judgments, the system recovers 57% of the mean uninterpolated average precision that could have been obtained through LVCSR domain tuning for very short queries (or 41% for longer queries).
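The score-to-probability idea in the abstract can be illustrated with a small sketch. This is not the paper's fitting procedure, which the abstract does not spell out. As a stand-in it uses isotonic regression (pool-adjacent-violators), which gives a non-decreasing step function rather than a smooth one. The function names (`fit_monotone`, `to_probability`, `combine`), the toy scores and judgments, and the choice to average the two calibrated probabilities are all assumptions made for illustration.

```python
from bisect import bisect_left
from collections import defaultdict


def fit_monotone(scores, labels):
    """Fit a non-decreasing step function from scores to P(relevant).

    Hypothetical helper: isotonic regression via pool-adjacent-violators,
    used here in place of the smooth transformation the abstract mentions.
    """
    # Aggregate judgments that share the same score.
    agg = defaultdict(lambda: [0.0, 0])
    for s, y in zip(scores, labels):
        agg[s][0] += y
        agg[s][1] += 1

    # Each block holds [sum of labels, count, largest score in block].
    blocks = []
    for s in sorted(agg):
        total, count = agg[s]
        blocks.append([total, count, s])
        # Merge backwards while the previous block's mean exceeds this one's.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper

    uppers = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return uppers, values


def to_probability(model, score):
    """Map a raw system score to an estimated probability of relevance."""
    uppers, values = model
    idx = min(bisect_left(uppers, score), len(uppers) - 1)
    return values[idx]


def combine(p_lvcsr, p_rur):
    """Assumed combination rule: plain average of the two probabilities."""
    return 0.5 * (p_lvcsr + p_rur)


# Invented held-out judgments (1 = relevant, 0 = not relevant).
lvcsr_model = fit_monotone([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], [0, 1, 0, 0, 1, 1])
rur_model = fit_monotone([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1])

# Score a new utterance with both systems and merge the evidence.
combined = combine(to_probability(lvcsr_model, 0.3),
                   to_probability(rur_model, 2.5))
```

Because each transformation is learned from held-out topics and maps scores onto a common probability scale, the two systems' outputs become comparable even though their raw score ranges differ. A smooth monotone fit could be substituted for the step function without changing the rest of the sketch.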

    Published In

    SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
    July 2009
    896 pages
    ISBN:9781605584836
    DOI:10.1145/1571941
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. ranked utterance retrieval
    2. speech retrieval
    3. vocabulary-independent spoken term detection

    Qualifiers

    • Research-article

    Conference

    SIGIR '09
    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%
