DOI: 10.1145/3394885.3431512
Research article

Puncturing the memory wall: Joint optimization of network compression with approximate memory for ASR application

Published: 29 January 2021

Abstract

The automatic speech recognition (ASR) system is becoming increasingly irreplaceable in smart speech interaction applications. Nonetheless, these applications run into the memory wall when embedded in energy- and memory-constrained Internet of Things (IoT) devices. It is therefore extremely challenging, yet imperative, to design a memory-saving and energy-saving ASR system. This paper proposes a jointly optimized scheme of network compression with approximate memory for an economical ASR system. At the algorithm level, this work presents block-based pruning and quantization with error model (BPQE), an optimized compression framework that couples a novel pruning technique with low-precision quantization and the approximate memory scheme. The BPQE-compressed recurrent neural network (RNN) model achieves an ultra-high compression rate and a fine-grained structured sparsity pattern that greatly reduces the amount of memory access. At the hardware level, this work presents an ASR-adapted incremental retraining method that yields further power savings: the retraining increases the benefit of the approximate memory scheme while maintaining considerable accuracy. According to the experimental results, the proposed jointly optimized scheme achieves 58.6% power saving and 40x memory saving with a phone error rate of 20%.
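
The abstract outlines, but does not detail, the BPQE flow. The sketch below is only a rough illustration of the two ingredients it names: block-based pruning combined with low-precision quantization, plus random bit errors injected into the stored codes as a crude stand-in for an approximate (voltage-scaled) memory. The block size, sparsity ratio, bit width, bit-error rate, and function names are all assumptions made for this example, not the paper's settings; the error-model coordination and the incremental retraining step are omitted.

```python
# Minimal, illustrative sketch (NOT the paper's BPQE implementation): block-based
# pruning, uniform low-bit quantization, and random bit-error injection to mimic
# an approximate (voltage-scaled) memory. All settings below -- block size,
# sparsity ratio, bit width, bit-error rate -- are placeholder assumptions.
import numpy as np

def block_prune(weights, block=(4, 4), sparsity=0.9):
    """Zero the fraction `sparsity` of (br x bc) tiles with the smallest L2 norm,
    yielding a fine-grained structured sparse weight matrix."""
    rows, cols = weights.shape
    br, bc = block
    assert rows % br == 0 and cols % bc == 0, "matrix shape must be divisible by block"
    tiles = weights.reshape(rows // br, br, cols // bc, bc).transpose(0, 2, 1, 3)
    scores = np.linalg.norm(tiles, axis=(2, 3))              # one score per tile
    mask = (scores > np.quantile(scores, sparsity))[:, :, None, None]
    return (tiles * mask).transpose(0, 2, 1, 3).reshape(rows, cols)

def quantize(weights, n_bits=4):
    """Symmetric uniform quantization; returns integer codes and the scale factor."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(weights)) / qmax if np.any(weights) else 1.0
    codes = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return codes, scale

def inject_bit_errors(codes, n_bits=4, ber=1e-3, rng=None):
    """Flip each of the low n_bits of every stored code with probability `ber`,
    a toy stand-in for the error behaviour of an approximate (low-voltage) SRAM."""
    rng = np.random.default_rng() if rng is None else rng
    flips = np.zeros_like(codes, dtype=np.int8)
    for b in range(n_bits):
        flips |= (rng.random(codes.shape) < ber).astype(np.int8) << b
    return codes ^ flips                                     # XOR applies the flips

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64)).astype(np.float32)        # stand-in RNN weight matrix
codes, scale = quantize(block_prune(W), n_bits=4)
noisy_W = inject_bit_errors(codes, n_bits=4, ber=1e-3, rng=rng).astype(np.float32) * scale
print("nonzero fraction:", np.count_nonzero(codes) / codes.size)
```

A retraining step in the spirit of the abstract's incremental retraining would fine-tune the network while such perturbations are applied, so that accuracy is maintained under approximate storage; the sketch stops at producing the compressed, perturbed weights.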

Cited By

  • (2023) A Selective Bit Dropping and Encoding Co-Strategy in Image Processing for Low-Power Design in DRAM and SRAM. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 13(1), 48-57. DOI: 10.1109/JETCAS.2023.3234402. Online publication date: March 2023.

    Published In

    ASPDAC '21: Proceedings of the 26th Asia and South Pacific Design Automation Conference
    January 2021
    930 pages
    ISBN:9781450379991
    DOI:10.1145/3394885

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. Memory wall
    2. approximate memory
    3. automatic speech recognition
    4. joint optimization
    5. network compression

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • National Key R&D Program of China
    • National Natural Science Foundation of China
    • Beijing Innovation Center for Future Chips

    Conference

    ASPDAC '21

    Acceptance Rates

    ASPDAC '21 Paper Acceptance Rate: 111 of 368 submissions, 30%
    Overall Acceptance Rate: 466 of 1,454 submissions, 32%

