DOI: 10.1145/3394885.3431512
Research article

Puncturing the memory wall: Joint optimization of network compression with approximate memory for ASR application

Published: 29 January 2021

Abstract

The automatic speech recognition (ASR) system is becoming increasingly irreplaceable in smart speech interaction applications. Nonetheless, these applications run into the memory wall when embedded in energy- and memory-constrained Internet of Things (IoT) devices. It is therefore extremely challenging, yet imperative, to design a memory-saving and energy-saving ASR system. This paper proposes a jointly optimized scheme of network compression with approximate memory for an economical ASR system. At the algorithm level, this work presents block-based pruning and quantization with error model (BPQE), an optimized compression framework that couples a novel pruning technique with low-precision quantization and the approximate memory scheme. The BPQE-compressed recurrent neural network (RNN) model achieves an ultra-high compression rate and a fine-grained structured sparsity pattern that greatly reduces the amount of memory access. At the hardware level, this work presents an ASR-adapted incremental retraining method that yields further power savings: the retraining increases the benefit of the approximate memory scheme while maintaining considerable accuracy. According to the experimental results, the proposed jointly optimized scheme achieves 58.6% power saving and 40x memory saving with a phone error rate of 20%.
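
The abstract outlines, but does not detail, the BPQE flow. The sketch below is only a rough illustration of the two ingredients it names: block-based pruning combined with low-precision quantization, plus random bit errors injected into the stored codes as a crude stand-in for an approximate (voltage-scaled) memory. The block size, sparsity ratio, bit width, bit-error rate, and function names are all assumptions made for this example, not the paper's settings; the error-model coordination and the incremental retraining step are omitted.

```python
# Minimal, illustrative sketch (NOT the paper's BPQE implementation): block-based
# pruning, uniform low-bit quantization, and random bit-error injection to mimic
# an approximate (voltage-scaled) memory. All settings below -- block size,
# sparsity ratio, bit width, bit-error rate -- are placeholder assumptions.
import numpy as np

def block_prune(weights, block=(4, 4), sparsity=0.9):
    """Zero the fraction `sparsity` of (br x bc) tiles with the smallest L2 norm,
    yielding a fine-grained structured sparse weight matrix."""
    rows, cols = weights.shape
    br, bc = block
    assert rows % br == 0 and cols % bc == 0, "matrix shape must be divisible by block"
    tiles = weights.reshape(rows // br, br, cols // bc, bc).transpose(0, 2, 1, 3)
    scores = np.linalg.norm(tiles, axis=(2, 3))              # one score per tile
    mask = (scores > np.quantile(scores, sparsity))[:, :, None, None]
    return (tiles * mask).transpose(0, 2, 1, 3).reshape(rows, cols)

def quantize(weights, n_bits=4):
    """Symmetric uniform quantization; returns integer codes and the scale factor."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(weights)) / qmax if np.any(weights) else 1.0
    codes = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return codes, scale

def inject_bit_errors(codes, n_bits=4, ber=1e-3, rng=None):
    """Flip each of the low n_bits of every stored code with probability `ber`,
    a toy stand-in for the error behaviour of an approximate (low-voltage) SRAM."""
    rng = np.random.default_rng() if rng is None else rng
    flips = np.zeros_like(codes, dtype=np.int8)
    for b in range(n_bits):
        flips |= (rng.random(codes.shape) < ber).astype(np.int8) << b
    return codes ^ flips                                     # XOR applies the flips

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64)).astype(np.float32)        # stand-in RNN weight matrix
codes, scale = quantize(block_prune(W), n_bits=4)
noisy_W = inject_bit_errors(codes, n_bits=4, ber=1e-3, rng=rng).astype(np.float32) * scale
print("nonzero fraction:", np.count_nonzero(codes) / codes.size)
```

A retraining step in the spirit of the abstract's incremental retraining would fine-tune the network while such perturbations are applied, so that accuracy is maintained under approximate storage; the sketch stops at producing the compressed, perturbed weights.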

Cited By

  • (2023) A Selective Bit Dropping and Encoding Co-Strategy in Image Processing for Low-Power Design in DRAM and SRAM. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 13(1), 48-57. DOI: 10.1109/JETCAS.2023.3234402. Online publication date: March 2023.

    Published In

    ASPDAC '21: Proceedings of the 26th Asia and South Pacific Design Automation Conference
    January 2021
    930 pages
    ISBN:9781450379991
    DOI:10.1145/3394885

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Memory wall
    2. approximate memory
    3. automatic speech recognition
    4. joint optimization
    5. network compression

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • National Key R&D Program of China
    • National Natural Science Foundation of China
    • Beijing Innovation Center for Future Chips

    Conference

    ASPDAC '21

    Acceptance Rates

    ASPDAC '21 Paper Acceptance Rate: 111 of 368 submissions, 30%
    Overall Acceptance Rate: 466 of 1,454 submissions, 32%

