Abstract
Keyword Spotting (KWS) is a significant branch of Automatic Speech Recognition (ASR) and is widely used on edge computing devices. The goal of KWS is to provide high accuracy at a low false alarm rate (FAR) while reducing the costs of memory, computation, and latency. However, the limited resources of edge computing devices make KWS applications challenging. Lightweight deep learning models and structures have achieved good results in KWS, maintaining high accuracy with low computational cost and low latency. In this paper, we present a new Convolutional Recurrent Neural Network (CRNN) architecture named EdgeCRNN for edge computing devices. EdgeCRNN is based on depthwise separable convolution (DSC) and a residual structure, and it uses a feature enhancement method. Experimental results on the Google Speech Commands Dataset show that EdgeCRNN can process 11.1 audio samples per second on a Raspberry Pi 3B+, which is 2.2 times the rate of Tpool2. Compared with Tpool2, EdgeCRNN reaches an accuracy of 98.05% while its performance remains competitive.
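To illustrate why a depthwise separable convolution suits resource-limited devices, the following minimal sketch compares the weight count of a standard convolution with that of a depthwise convolution followed by a 1 x 1 pointwise convolution. The layer sizes (`c_in`, `c_out`, `k`) and the helper function names are hypothetical values chosen for illustration; they are not taken from the EdgeCRNN architecture, and biases are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, for illustration only (not EdgeCRNN's).
c_in, c_out, k = 64, 128, 3

std = standard_conv_params(c_in, c_out, k)
dsc = depthwise_separable_params(c_in, c_out, k)

# The ratio simplifies to 1/c_out + 1/k**2, independent of c_in.
ratio = dsc / std

print(f"standard: {std} weights")
print(f"depthwise separable: {dsc} weights")
print(f"ratio: {ratio:.3f}")
```

Under these assumed sizes the separable form needs roughly one ninth of the weights, since the ratio approaches 1/k² as the number of output channels grows.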
This paper is supported by the National Natural Science Foundation of China (No. 61572028), the National Cryptography Development Fund (No. MMJJ20180206), the Project of Science and Technology of Guangzhou (No. 201802010044), and the Guangdong Basic and Applied Basic Research Foundation (No. 2019A1515011797).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Wei, Y., Gong, Z., Yang, S., Ye, K., Wen, Y. (2020). A New Lightweight CRNN Model for Keyword Spotting with Edge Computing Devices. In: Chen, X., Yan, H., Yan, Q., Zhang, X. (eds) Machine Learning for Cyber Security. ML4CS 2020. Lecture Notes in Computer Science, vol 12486. Springer, Cham. https://doi.org/10.1007/978-3-030-62223-7_17
DOI: https://doi.org/10.1007/978-3-030-62223-7_17
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-62222-0
Online ISBN: 978-3-030-62223-7
eBook Packages: Computer Science, Computer Science (R0)