research-article

A historical perspective of speech recognition

Published: 01 January 2014

Abstract

What do we know now that we did not know 40 years ago?

Information

Published In

Communications of the ACM, Volume 57, Issue 1
January 2014
107 pages
ISSN:0001-0782
EISSN:1557-7317
DOI:10.1145/2541883
Editor: Moshe Y. Vardi
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published in CACM Volume 57, Issue 1

Qualifiers

  • Research-article
  • Popular
  • Refereed

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 872
  • Downloads (last 6 weeks): 77
Reflects downloads up to 24 Nov 2024.

Citations

Cited By

  • (2024) Understanding the Impact of Artificial Intelligence and Robotics in the Tourism and Hospitality Industry Through Customer Experience. Impact of AI and Tech-Driven Solutions in Hospitality and Tourism, 329-350. DOI: 10.4018/979-8-3693-6755-1.ch017. Online publication date: 30-Jun-2024.
  • (2024) Efficient and Robust Arabic Automotive Speech Command Recognition System. Algorithms 17:9, 385. DOI: 10.3390/a17090385. Online publication date: 2-Sep-2024.
  • (2024) Determining the Largest Overlap between Tables. Proceedings of the ACM on Management of Data 2:1, 1-26. DOI: 10.1145/3639303. Online publication date: 26-Mar-2024.
  • (2024) Automated Speech Recognition: Spurring Artificial Intelligence Innovation [Circuits from a Systems Perspective]. IEEE Solid-State Circuits Magazine 16:4, 29-116. DOI: 10.1109/MSSC.2024.3473747. Online publication date: Nov-2025.
  • (2024) Speech-Based Human-Exoskeleton Interaction for Lower Limb Motion Planning. 2024 IEEE 4th International Conference on Human-Machine Systems (ICHMS), 1-6. DOI: 10.1109/ICHMS59971.2024.10555587. Online publication date: 15-May-2024.
  • (2024) Improved spell corrector algorithm and deepspeech2 model for enhancing end-to-end Gujarati language ASR performance. e-Prime - Advances in Electrical Engineering, Electronics and Energy 7, 100441. DOI: 10.1016/j.prime.2024.100441. Online publication date: Mar-2024.
  • (2024) Improved mini-batch multiple augmentation for low-resource spoken word recognition. Expert Systems with Applications: An International Journal 252:PA. DOI: 10.1016/j.eswa.2024.124157. Online publication date: 24-Jul-2024.
  • (2024) Automatic Speech Recognition Based on Improved Deep Learning. Automatic Speech Recognition and Translation for Low Resource Languages, 405-426. DOI: 10.1002/9781394214624.ch18. Online publication date: 29-Mar-2024.
  • (2023) Environment-Aware Knowledge Distillation for Improved Resource-Constrained Edge Speech Recognition. Applied Sciences 13:23, 12571. DOI: 10.3390/app132312571. Online publication date: 22-Nov-2023.
  • (2023) “Is it Even Giving the Correct Reading or Not?”: How Trust and Relationships Mediate Blood Pressure Management in India. ACM Transactions on Computer-Human Interaction 30:6, 1-27. DOI: 10.1145/3609327. Online publication date: 25-Sep-2023.
