Combining LVCSR and vocabulary-independent ranked utterance retrieval for robust speech search

Authors:

J. Scott Olsson,

Douglas W. Oard

SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval

Pages 91 - 98

https://doi.org/10.1145/1571941.1571959

Published: 19 July 2009

Abstract

Well tuned Large-Vocabulary Continuous Speech Recognition (LVCSR) has been shown to generally be more effective than vocabulary-independent techniques for ranked retrieval of spoken content when one or the other approach is used alone. Tuning LVCSR systems to a topic domain can be costly, however, and the experiments in this paper show that Out-Of-Vocabulary (OOV) query terms can significantly reduce retrieval effectiveness when that tuning is not performed. Further experiments demonstrate, however, that retrieval effectiveness for queries with OOV terms can be substantially improved by combining evidence from LVCSR with additional evidence from vocabulary-independent Ranked Utterance Retrieval (RUR). The combination is performed by using relevance judgments from held-out topics to learn generic (i.e., topic-independent), smooth, non-decreasing transformations from LVCSR and RUR system scores to probabilities of topical relevance. Evaluated using a CLEF collection that includes topics, spontaneous conversational speech audio, and relevance judgments, the system recovers 57% of the mean uninterpolated average precision that could have been obtained through LVCSR domain tuning for very short queries (or 41% for longer queries).
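The combination step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: as a stand-in for the smooth, non-decreasing transformations mentioned above, it uses isotonic regression (pool-adjacent-violators), which is non-decreasing but piecewise constant rather than smooth. The function names (`fit_monotone`, `predict`, `combine`), the toy held-out scores and judgments, and the noisy-OR rule used to merge the two probabilities are all assumptions made for illustration.

```python
from bisect import bisect_left
from itertools import groupby


def fit_monotone(scores, labels):
    """Learn a non-decreasing step map from system score to relevance rate.

    Uses pool-adjacent-violators on held-out (score, 0/1 judgment) pairs.
    Returns (uppers, means): the largest score in each block and its rate.
    """
    pairs = sorted(zip(scores, labels))
    blocks = []  # each block: [sum of labels, count, largest score]
    for score, group in groupby(pairs, key=lambda p: p[0]):
        # Tied scores must map to one value, so pool them first.
        tied = [label for _, label in group]
        blocks.append([float(sum(tied)), len(tied), score])
        # Merge backwards while adjacent block means decrease.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    uppers = [b[2] for b in blocks]
    means = [b[0] / b[1] for b in blocks]
    return uppers, means


def predict(model, score):
    """Map a new score to an estimated probability of relevance."""
    uppers, means = model
    # First block whose largest score is >= the query; clamp at the top.
    index = min(bisect_left(uppers, score), len(means) - 1)
    return means[index]


def combine(p_lvcsr, p_rur):
    """Noisy-OR merge of two probabilities (an illustrative assumption)."""
    return 1.0 - (1.0 - p_lvcsr) * (1.0 - p_rur)


# Toy held-out data: scores from each system with binary relevance judgments.
lvcsr_model = fit_monotone([0.1, 0.4, 0.35, 0.8, 0.9], [0, 0, 1, 1, 1])
rur_model = fit_monotone([-5.0, -3.0, -2.5, -1.0], [0, 1, 0, 1])

# Score one new utterance with both systems, then combine the evidence.
p_combined = combine(predict(lvcsr_model, 0.38), predict(rur_model, -2.8))
print(p_combined)
```

Because each map is fitted on judgments pooled across held-out topics, it is topic-independent in the sense the abstract describes. A spline-based or additive model could replace `fit_monotone` where a smooth map is needed.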

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Information & Contributors

Information

Published In

SIGIR '09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval

July 2009

896 pages

ISBN:9781605584836

DOI:10.1145/1571941

General Chairs: James Allan (University of Massachusetts Amherst, USA), Javed Aslam (Northeastern University, USA)

Program Chairs: Mark Sanderson (University of Sheffield, UK), ChengXiang Zhai (University of Illinois at Urbana-Champaign, USA), Justin Zobel (University of Melbourne, Australia)

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 July 2009

Qualifiers

Research-article

Conference

SIGIR '09

SIGIR '09: The 32nd International ACM SIGIR conference on research and development in Information Retrieval

July 19 - 23, 2009

Boston, MA, USA

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%
