Learning to Listen... On-Device: Present and future perspectives of on-device ASR

Published: 18 May 2020

Abstract

We have reached an important milestone in Automatic Speech Recognition (ASR) technology: major industrial AI companies, such as Samsung, Google, Apple, and Amazon, have released high-quality ASR models that run entirely on-device, e.g., on consumer smartphones. This milestone follows a series of major technological advances: first making commercial-grade ASR systems feasible, then deploying them at large scale in the cloud, and now producing state-of-the-art models that run on resource-constrained devices.

Published In

GetMobile: Mobile Computing and Communications, Volume 23, Issue 4
December 2019
34 pages
ISSN: 2375-0529
EISSN: 2375-0537
DOI: 10.1145/3400713
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States
