DOI: 10.1145/1076034.1076098

A phonotactic-semantic paradigm for automatic spoken document classification

Published: 15 August 2005

Abstract

We demonstrate a phonotactic-semantic paradigm for spoken document categorization. In this framework, we define a set of acoustic words instead of lexical words to represent acoustic activities in spoken languages. The strategy for acoustic vocabulary selection is studied by comparing different feature selection methods. With an appropriate acoustic vocabulary, a voice tokenizer converts a spoken document into a text-like document of acoustic words. Thus, a spoken document can be represented by a count vector, named a bag-of-sounds vector, which characterizes a spoken document's semantic domain. We study two phonotactic-semantic classifiers, the support vector machine classifier and the latent semantic analysis classifier, and their properties. The phonotactic-semantic framework constitutes a new paradigm in spoken document classification, as demonstrated by its success in the spoken language identification task. It achieves 18.2% error reduction over state-of-the-art benchmark performance on the 1996 NIST Language Recognition Evaluation database.
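
The bag-of-sounds representation described in the abstract can be illustrated with a minimal sketch. It assumes a voice tokenizer has already produced a sequence of acoustic-word labels for each spoken document; the labels (`a1`, `a2`, ...), the class names, and the toy training data are all hypothetical. The abstract names support vector machine and latent semantic analysis classifiers; this sketch substitutes a much simpler cosine nearest-centroid rule purely to show how count vectors could feed a classifier, and it is not the authors' method.

```python
from collections import Counter
from math import sqrt


def bag_of_sounds(tokens, n_max=2):
    """Count acoustic-word n-grams (n = 1..n_max) in one tokenized document."""
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(labeled_docs, n_max=2):
    """Sum the bag-of-sounds vectors of each class (stand-in for SVM/LSA training)."""
    centroids = {}
    for label, tokens in labeled_docs:
        centroids.setdefault(label, Counter()).update(bag_of_sounds(tokens, n_max))
    return centroids


def classify(tokens, centroids, n_max=2):
    """Return the class whose summed vector is closest in cosine similarity."""
    vec = bag_of_sounds(tokens, n_max)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical tokenizer output for two made-up classes.
training = [
    ("lang_a", ["a1", "a2", "a1", "a3"]),
    ("lang_a", ["a2", "a1", "a3", "a1"]),
    ("lang_b", ["a4", "a5", "a4", "a4"]),
    ("lang_b", ["a5", "a4", "a5", "a2"]),
]
centroids = train_centroids(training)
predicted = classify(["a1", "a3", "a1", "a2"], centroids)
```

Because cosine similarity ignores vector length, summing the per-class counts gives the same decision as averaging them. A real system would replace the centroid rule with the discriminative or latent-space classifiers the abstract describes.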



Published In

SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
August 2005
708 pages
ISBN: 1595930345
DOI: 10.1145/1076034

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. acoustic words
  2. n-gram
  3. phonotactic-semantic
  4. semantic domain
  5. spoken document classification
  6. voice tokenizer

Qualifiers

  • Article

Conference

SIGIR05
Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Cited By

  • (2017) ALBAYZIN 2016 spoken term detection evaluation. EURASIP Journal on Audio, Speech, and Music Processing, 2017:1 (1-23). DOI: 10.1186/s13636-017-0119-z. Online publication date: 1-Dec-2017
  • (2012) Direct posterior confidence for out-of-vocabulary spoken term detection. ACM Transactions on Information Systems, 30:3 (1-34). DOI: 10.1145/2328967.2328969. Online publication date: 6-Sep-2012
  • (2012) Term-Dependent Confidence Normalisation for Out-of-Vocabulary Spoken Term Detection. Journal of Computer Science and Technology, 27:2 (358-375). DOI: 10.1007/s11390-012-1228-x. Online publication date: 5-Mar-2012
  • (2009) Audio Clips Content Comparison Using Latent Semantic Indexing. Proceedings of the 2009 IEEE International Conference on Semantic Computing (509-512). DOI: 10.1109/ICSC.2009.21. Online publication date: 14-Sep-2009
  • (2008) Language recognition with discriminative keyword selection. 2008 IEEE International Conference on Acoustics, Speech and Signal Processing (4145-4148). DOI: 10.1109/ICASSP.2008.4518567. Online publication date: Mar-2008
  • (2007) Type-II dialogue systems for information access from unstructured knowledge sources. 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU) (544-549). DOI: 10.1109/ASRU.2007.4430170. Online publication date: Dec-2007
  • (2006) Music structure based vector space retrieval. Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval (67-74). DOI: 10.1145/1148170.1148185. Online publication date: 6-Aug-2006
  • (2006) Integrating Acoustic, Prosodic and Phonotactic Features for Spoken Language Identification. 2006 IEEE International Conference on Acoustics Speed and Signal Processing Proceedings (I-205-I-208). DOI: 10.1109/ICASSP.2006.1659993. Online publication date: 2006
