Connectionist Speech Recognition: A Hybrid Approach
October 1993
Publisher:
  • Kluwer Academic Publishers
  • 101 Philip Drive Assinippi Park Norwell, MA
  • United States
ISBN: 978-0-7923-9396-2
Published: 01 October 1993
Pages: 352
Abstract

From the Publisher:

Connectionist Speech Recognition: A Hybrid Approach describes the theory and implementation of a method for incorporating neural networks into state-of-the-art continuous speech recognition systems based on Hidden Markov Models (HMMs), with the aim of improving their performance. In this framework, neural networks (in particular, multilayer perceptrons, or MLPs) are restricted to well-defined subtasks of the overall system, namely HMM emission probability estimation and feature extraction.

The book describes a successful five-year international collaboration between the authors. The lessons learned form a case study that demonstrates how hybrid systems can be developed to combine neural networks with more traditional statistical approaches. The book illustrates both the advantages and the limitations of neural networks within a statistical framework. Using standard databases and comparing against some conventional approaches, it shows that MLP probability estimation can improve recognition performance. Other approaches are also discussed, although the experimental results for these methods are not as unequivocal.

Connectionist Speech Recognition: A Hybrid Approach is useful to anyone who intends to use neural networks for speech recognition, or to use them within the framework of an existing, successful statistical approach. This includes research and development groups working on speech recognition with either standard or neural network approaches, as well as other researchers in pattern recognition and neural networks. The book is also suitable as a text for advanced courses on neural networks or speech processing.
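The core hybrid idea named above, using an MLP to supply HMM emission probabilities, is commonly described as follows: the network is trained to output per-frame state posteriors P(q | x_t); dividing each posterior by the state prior P(q) (estimated, for example, from state-label frequencies in the training alignment) gives, by Bayes' rule, the likelihood p(x_t | q) scaled by p(x_t), and these scaled likelihoods stand in for emission densities during Viterbi decoding. The Python sketch below illustrates that recipe under stated assumptions: the function names, the two-state left-to-right topology, and every number (posteriors, priors, transition probabilities) are hypothetical toy values, not taken from the book, and this is not the authors' implementation.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """Natural log that maps a zero probability to -infinity instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def scaled_log_likelihoods(posteriors, priors, floor=1e-10):
    """Turn per-frame MLP posteriors P(q|x_t) into scaled log-likelihoods.

    By Bayes' rule, P(q|x_t) / P(q) = p(x_t|q) / p(x_t), so subtracting the
    log prior gives log p(x_t|q) up to a per-frame constant, log p(x_t).
    Posteriors are floored to avoid log(0); priors are assumed to be > 0.
    """
    return [
        [math.log(max(post, floor)) - math.log(prior)
         for post, prior in zip(frame, priors)]
        for frame in posteriors
    ]


def viterbi(log_emissions, log_trans, log_init):
    """Return the best state path and its log score.

    log_emissions[t][j]: log emission score of state j at frame t
    log_trans[i][j]:     log P(state j at t+1 | state i at t)
    log_init[j]:         log P(state j at t = 0)
    """
    n_states = len(log_init)
    delta = [log_init[j] + log_emissions[0][j] for j in range(n_states)]
    backpointers = []
    for frame in log_emissions[1:]:
        new_delta, pointers = [], []
        for j in range(n_states):
            # Score of reaching state j from each possible predecessor i.
            scores = [delta[i] + log_trans[i][j] for i in range(n_states)]
            best_i = max(range(n_states), key=scores.__getitem__)
            new_delta.append(scores[best_i] + frame[j])
            pointers.append(best_i)
        delta = new_delta
        backpointers.append(pointers)
    # Trace back from the best-scoring final state.
    state = max(range(n_states), key=delta.__getitem__)
    best_score = delta[state]
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, best_score


# Hypothetical toy example: 2 states, left-to-right topology, 4 frames.
posteriors = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.1, 0.9]]  # MLP outputs
priors = [0.7, 0.3]    # state priors, e.g. from training-label frequencies
trans = [[0.5, 0.5],   # state 0 may stay or advance
         [0.0, 1.0]]   # state 1 is final (no return to state 0)
init = [1.0, 0.0]      # decoding must start in state 0

log_trans = [[safe_log(p) for p in row] for row in trans]
log_init = [safe_log(p) for p in init]
path, score = viterbi(scaled_log_likelihoods(posteriors, priors), log_trans, log_init)
print(path)  # [0, 1, 1, 1]
```

Because p(x_t) is the same for every state at a given frame, leaving it out shifts all path scores at that frame by the same amount, so the best state sequence is unchanged. Working in the log domain avoids numerical underflow on long utterances, and the posterior floor guards against log(0) when the network outputs an exact zero.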

Cited By

  1. Prabhavalkar R, Hori T, Sainath T, Schlüter R and Watanabe S (2024). End-to-End Speech Recognition: A Survey, IEEE/ACM Transactions on Audio, Speech and Language Processing, 32, (325-351), Online publication date: 1-Jan-2024.
  2. Wong J, Zhang H and Chen N (2023). Modelling Inter-Rater Uncertainty in Spoken Language Assessment, IEEE/ACM Transactions on Audio, Speech and Language Processing, 31, (2886-2898), Online publication date: 1-Jan-2023.
  3. Koller O, Camgoz N, Ney H and Bowden R (2020). Weakly Supervised Learning with Multi-Stream CNN-LSTM-HMMs to Discover Sequential Parallelism in Sign Language Videos, IEEE Transactions on Pattern Analysis and Machine Intelligence, 42:9, (2306-2320), Online publication date: 1-Sep-2020.
  4. Becerra A, Rosa J, González E, Pedroza A, Escalante N and Santos E (2020). A comparative case study of neural network training by using frame-level cost functions for automatic speech recognition purposes in Spanish, Multimedia Tools and Applications, 79:27-28, (19669-19715), Online publication date: 1-Jul-2020.
  5. Waibel A. Multimodal dialogue processing for machine translation, The Handbook of Multimodal-Multisensor Interfaces, (577-620)
  6. Kadyan V, Mantri A, Aggarwal R and Singh A (2019). A comparative study of deep neural network based Punjabi-ASR system, International Journal of Speech Technology, 22:1, (111-119), Online publication date: 1-Mar-2019.
  7. Jha A, Namboodiri V and Jawahar C (2019). Spotting words in silent speech videos, Machine Vision and Applications, 30:2, (217-229), Online publication date: 1-Mar-2019.
  8. Becerra A, Rosa J, González E, Pedroza A and Escalante N (2018). Training deep neural networks with non-uniform frame-level cost function for automatic speech recognition, Multimedia Tools and Applications, 77:20, (27231-27267), Online publication date: 1-Oct-2018.
  9. Wen Z, Li K, Huang Z, Lee C and Tao J (2018). Improving Deep Neural Network Based Speech Synthesis through Contextual Feature Parametrization and Multi-Task Learning, Journal of Signal Processing Systems, 90:7, (1025-1037), Online publication date: 1-Jul-2018.
  10. Tüske Z, Schlüter R and Ney H. Acoustic Modeling of Speech Waveform Based on Multi-Resolution, Neural Network Signal Processing, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (4859-4863)
  11. Ichikawa E, Sawada K, Hashimoto K, Nankaku Y and Tokuda K. Image Recognition Based on Separable Lattice HMMs Using a Deep Neural Network for Output Probability Distributions, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (3021-3025)
  12. Lu H, Li Y, Chen M, Kim H and Serikawa S (2018). Brain Intelligence, Mobile Networks and Applications, 23:2, (368-375), Online publication date: 1-Apr-2018.
  13. Manjunath K and Sreenivasa Rao K (2018). Improvement of Phone Recognition Accuracy Using Articulatory Features, Circuits, Systems, and Signal Processing, 37:2, (704-728), Online publication date: 1-Feb-2018.
  14. Ochiai T, Watanabe S, Hori T and Hershey J. Multichannel end-to-end speech recognition, Proceedings of the 34th International Conference on Machine Learning - Volume 70, (2632-2641)
  15. Backurs A and Tzamos C. Improving Viterbi is hard, Proceedings of the 34th International Conference on Machine Learning - Volume 70, (311-321)
  16. Potamianos G, Marcheret E, Mroueh Y, Goel V, Koumbaroulis A, Vartholomaios A and Thermos S. Audio and visual modality combination in speech processing applications, The Handbook of Multimodal-Multisensor Interfaces, (489-543)
  17. Katsamanis A, Pitsikalis V, Theodorakis S and Maragos P. Multimodal gesture recognition, The Handbook of Multimodal-Multisensor Interfaces, (449-487)
  18. Zhang C and Woodland P. Joint optimisation of tandem systems using Gaussian mixture density neural network discriminative sequence training, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (5015-5019)
  19. Zweig G, Yu C, Droppo J and Stolcke A. Advances in all-neural speech recognition, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (4805-4809)
  20. Bell P, Swietojanski P and Renals S (2017). Multitask Learning of Context-Dependent Targets in Deep Neural Network Acoustic Models, IEEE/ACM Transactions on Audio, Speech and Language Processing, 25:2, (238-247), Online publication date: 1-Feb-2017.
  21. Huang Z, Siniscalchi S and Lee C (2017). Bayesian Unsupervised Batch and Online Speaker Adaptation of Activation Function Parameters in Deep Models for Automatic Speech Recognition, IEEE/ACM Transactions on Audio, Speech and Language Processing, 25:1, (64-75), Online publication date: 1-Jan-2017.
  22. Maas A, Qi P, Xie Z, Hannun A, Lengerich C, Jurafsky D and Ng A (2017). Building DNN acoustic models for large vocabulary speech recognition, Computer Speech and Language, 41:C, (195-213), Online publication date: 1-Jan-2017.
  23. Ansari Z and Seyyedsalehi S (2017). Toward growing modular deep neural networks for continuous speech recognition, Neural Computing and Applications, 28:1, (1177-1196), Online publication date: 1-Jan-2017.
  24. Cernak M, Lazaridis A, Asaei A and Garner P (2016). Composition of Deep and Spiking Neural Networks for Very Low Bit Rate Speech Coding, IEEE/ACM Transactions on Audio, Speech and Language Processing, 24:12, (2301-2312), Online publication date: 1-Dec-2016.
  25. España-Bonet C and Fonollosa J. Automatic Speech Recognition with Deep Neural Networks for Impaired Speech, Advances in Speech and Language Technologies for Iberian Languages, (97-107)
  26. Swietojanski P and Renals S (2016). Differentiable Pooling for Unsupervised Acoustic Model Adaptation, IEEE/ACM Transactions on Audio, Speech and Language Processing, 24:10, (1773-1784), Online publication date: 1-Oct-2016.
  27. Lavania C, Thulasidasan S, LaMarca A, Scofield J and Bilmes J. A weakly supervised activity recognition framework for real-time synthetic biology laboratory assistance, Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing, (37-48)
  28. Swietojanski P, Li J and Renals S (2016). Learning hidden unit contributions for unsupervised acoustic model adaptation, IEEE/ACM Transactions on Audio, Speech and Language Processing, 24:8, (1450-1463), Online publication date: 1-Aug-2016.
  29. Chen K and Huo Q (2016). Training deep bidirectional LSTM acoustic model for LVCSR by a context-sensitive-chunk BPTT approach, IEEE/ACM Transactions on Audio, Speech and Language Processing, 24:7, (1185-1193), Online publication date: 1-Jul-2016.
  30. Amodei D, Ananthanarayanan S, Anubhai R, Bai J, Battenberg E, Case C, Casper J, Catanzaro B, Cheng Q, Chen G, Chen J, Chen J, Chen Z, Chrzanowski M, Coates A, Diamos G, Ding K, Du N, Elsen E, Engel J, Fang W, Fan L, Fougner C, Gao L, Gong C, Hannun A, Han T, Johannes L, Jiang B, Ju C, Jun B, LeGresley P, Lin L, Liu J, Liu Y, Li W, Li X, Ma D, Narang S, Ng A, Ozair S, Peng Y, Prenger R, Qian S, Quan Z, Raiman J, Rao V, Satheesh S, Seetapun D, Sengupta S, Srinet K, Sriram A, Tang H, Tang L, Wang C, Wang J, Wang K, Wang Y, Wang Z, Wang Z, Wu S, Wei L, Xiao B, Xie W, Xie Y, Yogatama D, Yuan B, Zhan J and Zhu Z. Deep Speech 2, Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48, (173-182)
  31. Phan H, Hertel L, Maass M, Mazur R and Mertins A (2016). Learning representations for nonspeech audio events through their similarities to speech patterns, IEEE/ACM Transactions on Audio, Speech and Language Processing, 24:4, (807-822), Online publication date: 1-Apr-2016.
  32. Cheng J, Chen X and Metallinou A (2015). Deep neural network acoustic models for spoken assessment applications, Speech Communication, 73:C, (14-27), Online publication date: 1-Oct-2015.
  33. Kwon Y, Kim K, Tompkin J, Kim J and Theobalt C (2015). Efficient Learning of Image Super-Resolution and Compression Artifact Removal with Semi-Local Gaussian Processes, IEEE Transactions on Pattern Analysis and Machine Intelligence, 37:9, (1792-1805), Online publication date: 1-Sep-2015.
  34. Costa-jussá M and Fonollosa J (2015). Latest trends in hybrid machine translation and its applications, Computer Speech and Language, 32:1, (3-10), Online publication date: 1-Jul-2015.
  35. Noda K, Yamaguchi Y, Nakadai K, Okuno H and Ogata T (2015). Audio-visual speech recognition using deep learning, Applied Intelligence, 42:4, (722-737), Online publication date: 1-Jun-2015.
  36. Rasipuram R and Magimai-Doss M (2015). Acoustic and lexical resource constrained ASR using language-independent acoustic model and language-dependent probabilistic lexical model, Speech Communication, 68:C, (23-40), Online publication date: 1-Apr-2015.
  37. Wang Y and Lee L (2015). Supervised detection and unsupervised discovery of pronunciation error patterns for computer-assisted language learning, IEEE/ACM Transactions on Audio, Speech and Language Processing, 23:3, (564-579), Online publication date: 1-Mar-2015.
  38. Schmidhuber J (2015). Deep learning in neural networks, Neural Networks, 61:C, (85-117), Online publication date: 1-Jan-2015.
  39. De-La-Calle-Silos F, Gallardo-Antolín A and Peláez-Moreno C. Deep Maxout Networks Applied to Noise-Robust Speech Recognition, Proceedings of the Second International Conference on Advances in Speech and Language Technologies for Iberian Languages - Volume 8854, (109-118)
  40. Ray J, Thompson B and Shen W. Comparing a high and low-level deep neural network implementation for automatic speech recognition, Proceedings of the 1st First Workshop for High Performance Technical Computing in Dynamic Languages, (41-46)
  41. Siniscalchi S, Yu D, Deng L and Lee C (2013). Exploiting deep neural networks for detection-based speech recognition, Neurocomputing, 106, (148-157), Online publication date: 1-Apr-2013.
  42. Ordóñez F, Duque A, de Toledo P and Sanchis A. A hybrid HMM/ANN model for activity recognition in the home using binary sensors, Proceedings of the 4th international conference on Ambient Assisted Living and Home Care, (98-105)
  43. Grézl F. The role of neural network size in TRAP/HATS feature extraction, Proceedings of the 14th international conference on Text, speech and dialogue, (315-322)
  44. Wöllmer M, Schuller B, Batliner A, Steidl S and Seppi D (2011). Tandem decoding of children's speech for keyword detection in a child-robot interaction scenario, ACM Transactions on Speech and Language Processing, 7:4, (1-22), Online publication date: 1-Aug-2011.
  45. Pompili A, Abad A, Trancoso I, Fonseca J, Martins I, Leal G and Farrajota L. An on-line system for remote treatment of aphasia, Proceedings of the Second Workshop on Speech and Language Processing for Assistive Technologies, (1-10)
  46. Mohamed A and Nair K. Continuous Malayalam speech recognition using Hidden Markov Models, Proceedings of the 1st Amrita ACM-W Celebration on Women in Computing in India, (1-4)
  47. Sun Y, Ten Bosch L and Boves L. Hybrid HMM/BLSTM-RNN for robust speech recognition, Proceedings of the 13th international conference on Text, speech and dialogue, (400-407)
  48. Scanzio S, Cumani S, Gemello R, Mana F and Laface P (2010). Parallel implementation of Artificial Neural Network training for speech recognition, Pattern Recognition Letters, 31:11, (1302-1309), Online publication date: 1-Aug-2010.
  49. Almeida L and Ludermir T (2010). A multi-objective memetic and hybrid methodology for optimizing the parameters and performance of artificial neural networks, Neurocomputing, 73:7-9, (1438-1450), Online publication date: 1-Mar-2010.
  50. Trentin E and Di Iorio E (2009). Classification of graphical data made easy, Neurocomputing, 73:1-3, (204-212), Online publication date: 1-Dec-2009.
  51. Zamora-Martínez F, Castro-Bleda M, España-Boquera S and Gorbe J. Improving isolated handwritten word recognition using a specialized classifier for short words, Proceedings of the Current topics in artificial intelligence, and 13th conference on Spanish association for artificial intelligence, (61-70)
  52. Siniscalchi S and Lee C (2009). A study on integrating acoustic-phonetic information into lattice rescoring for automatic speech recognition, Speech Communication, 51:11, (1139-1153), Online publication date: 1-Nov-2009.
  53. Trentin E and Freno A. Unsupervised nonparametric density estimation, Proceedings of the 2009 international joint conference on Neural Networks, (2983-2990)
  54. Alvanitopoulos P, Andreadis I and Elenas A. A new algorithm for the classification of earthquake damages in structures, Proceedings of the Fifth IASTED International Conference on Signal Processing, Pattern Recognition and Applications, (151-156)
  55. Fernández S, Graves A and Schmidhuber J. An application of recurrent neural networks to discriminative keyword spotting, Proceedings of the 17th international conference on Artificial neural networks, (220-229)
  56. Aradilla G and Bourlard H. Posterior-based features and distances in template matching for speech recognition, Proceedings of the 4th international conference on Machine learning for multimodal interaction, (204-214)
  57. Krüger V and Grest D. Using hidden Markov models for recognizing action primitives in complex actions, Proceedings of the 15th Scandinavian conference on Image analysis, (203-212)
  58. García-Moral A, Solera-Ureña R, Peláez-Moreno C and Díaz-de-María F. Hybrid models for automatic speech recognition, Proceedings of the 2007 international conference on Advances in nonlinear speech processing, (152-160)
  59. Solera-Ureña R, Padrell-Sendra J, Martín-Iglesias D, Gallardo-Antolín A, Peláez-Moreno C and Díaz-De-María F. SVMs for automatic speech recognition, Progress in nonlinear speech processing, (190-216)
  60. Bozkurt B, Dutoit T and Couvreur L. Spectral analysis of speech signals using chirp group delay, Progress in nonlinear speech processing, (41-57)
  61. Krüger V. Recognizing action primitives in complex actions using hidden Markov models, Proceedings of the Second international conference on Advances in Visual Computing - Volume Part I, (538-547)
  62. Trentin E. A novel connectionist-oriented feature normalization technique, Proceedings of the 16th international conference on Artificial Neural Networks - Volume Part II, (410-416)
  63. Graves A, Fernández S, Gomez F and Schmidhuber J. Connectionist temporal classification, Proceedings of the 23rd international conference on Machine learning, (369-376)
  64. Kim M, Park J, Kim W and Joo Y. Identification of T–S fuzzy classifier via linear matrix inequalities, Proceedings of the 18th Australian Joint conference on Advances in Artificial Intelligence, (1134-1137)
  65. Tóth L and Kocsor A. Explicit duration modelling in HMM/ANN hybrids, Proceedings of the 8th international conference on Text, Speech and Dialogue, (310-317)
  66. Graves A, Fernández S and Schmidhuber J. Bidirectional LSTM networks for improved phoneme classification and recognition, Proceedings of the 15th international conference on Artificial neural networks: formal models and their applications - Volume Part II, (799-804)
  67. Tóth L and Kocsor A. Training HMM/ANN hybrid speech recognizers by probabilistic sampling, Proceedings of the 15th international conference on Artificial Neural Networks: biological Inspirations - Volume Part I, (597-603)
  68. Kim M, Park J, Joo Y and Lee H. Design of T–S fuzzy classifier via linear matrix inequality approach, Proceedings of the Second international conference on Fuzzy Systems and Knowledge Discovery - Volume Part I, (406-415)
  69. Ketabdar H, Bourlard H and Bengio S. Hierarchical multi-stream posterior based speech recognition system, Proceedings of the Second international conference on Machine Learning for Multimodal Interaction, (294-306)
  70. Graves A and Schmidhuber J (2005). Framewise phoneme classification with bidirectional LSTM and other neural network architectures, Neural Networks, 18:5-6, (602-610), Online publication date: 1-Jun-2005.
  71. Gangashetty S, Sekhar C and Yegnanarayana B. Spotting multilingual consonant-vowel units of speech using neural network models, Proceedings of the 3rd international conference on Non-Linear Analyses and Algorithms for Speech Processing, (303-317)
  72. Martín-Iglesias D, Bernal-Chaves J, Peláez-Moreno C, Gallardo-Antolín A and Díaz-de-María F. A speech recognizer based on multiclass SVMs with HMM-Guided segmentation, Proceedings of the 3rd international conference on Non-Linear Analyses and Algorithms for Speech Processing, (257-266)
  73. Petek B. Predictive connectionist approach to speech recognition, Nonlinear Speech Modeling and Applications, (219-243)
  74. Magimai-Doss M and Bourlard H. On the adequacy of baseform pronunciations and pronunciation variants, Proceedings of the First international conference on Machine Learning for Multimodal Interaction, (209-222)
  75. Hagen A and Neto J. HMM/MLP hybrid speech recognizer for the Portuguese telephone SpeechDat corpus, Proceedings of the 6th international conference on Computational processing of the Portuguese language, (126-134)
  76. Stolcke A, Coccaro N, Bates R, Taylor P, Van Ess-Dykema C, Ries K, Shriberg E, Jurafsky D, Martin R and Meteer M (2000). Dialogue act modeling for automatic tagging and recognition of conversational speech, Computational Linguistics, 26:3, (339-373), Online publication date: 1-Sep-2000.
  77. Palmer D and Hearst M (1997). Adaptive multilingual sentence boundary disambiguation, Computational Linguistics, 23:2, (241-267), Online publication date: 1-Jun-1997.
  78. Lazzaro J and Wawrzynek J (1997). Speech Recognition Experiments with Silicon Auditory Models, Analog Integrated Circuits and Signal Processing, 13:1-2, (37-51), Online publication date: 1-May-1997.
  79. Fragnière E, van Schaik A and Vittoz E (1997). Design of an Analogue VLSI Model of an Active Cochlea, Analog Integrated Circuits and Signal Processing, 13:1-2, (19-35), Online publication date: 1-May-1997.
  80. Brodley C and Smyth P (1997). Applying classification algorithms in practice, Statistics and Computing, 7:1, (45-56), Online publication date: 1-Jan-1997.
  81. Wand M, Koutník J and Schmidhuber J. Lipreading with long short-term memory, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (6115-6119)
  82. Tüske Z, Irie K, Schlüter R and Ney H. Investigation on log-linear interpolation of multi-domain neural network language model, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (6005-6009)
Contributors
  • Institut Dalle Molle D'intelligence Artificielle Perceptive
  • International Computer Science Institute