Research Article | Open Access

Lane Compression: A Lightweight Lossless Compression Method for Machine Learning on Embedded Systems

Published: 18 March 2021

Abstract

This article presents Lane Compression, a lightweight lossless compression technique for machine learning that is based on a detailed study of the statistical properties of machine learning data. The proposed technique profiles machine learning data gathered ahead of run-time and partitions values bit-wise into different lanes, each with more distinctive statistical characteristics. The most appropriate compression technique is then chosen for each lane from a small number of low-cost compression techniques. Lane Compression’s compute and memory requirements are very low, yet it achieves a compression rate comparable to or better than Huffman coding. We evaluate and analyse Lane Compression on a wide range of machine learning networks for both inference and re-training. We also demonstrate that the profiling prior to run-time, and the ability to configure the hardware based on that profiling, guarantee robust performance across different models and datasets. Hardware implementations are described, and the scheme’s simplicity makes it suitable for compressing both on-chip and off-chip traffic.
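As a rough, non-authoritative illustration of the idea described above, the Python sketch below splits 8-bit values bit-wise into lanes and estimates each lane's empirical entropy as a proxy for deciding how compressible it is. The lane widths, the sample values, and the entropy-based heuristic are assumptions chosen for clarity; the paper's actual lane boundaries, profiling procedure, and candidate per-lane coders are defined in the article itself, not here.

```python
# Illustrative sketch only. Lane widths, sample data, and the entropy proxy
# are hypothetical choices; they are not the paper's exact method.
from collections import Counter
import math

def split_into_lanes(values, lane_widths=(4, 4)):
    """Partition each 8-bit value bit-wise into lanes (here: two assumed 4-bit lanes)."""
    lanes = [[] for _ in lane_widths]
    for v in values:
        shift = 0
        for i, w in enumerate(lane_widths):
            lanes[i].append((v >> shift) & ((1 << w) - 1))
            shift += w
    return lanes

def entropy_bits(symbols):
    """Empirical entropy in bits per symbol, a rough proxy for a lane's compressibility."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Profile sample data gathered ahead of run-time; a real implementation would
# then pick the cheapest adequate coder for each lane from a small menu.
sample = [0x03, 0x05, 0x00, 0x81, 0x02, 0x00, 0x07, 0x00]
for i, lane in enumerate(split_into_lanes(sample)):
    print(f"lane {i}: empirical entropy ~{entropy_bits(lane):.2f} bits/symbol")
```

In practice the lanes typically expose very different statistics (for example, high-order bits of small-magnitude values are mostly zero), which is what makes a per-lane choice of low-cost coder worthwhile.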


Cited By

  • (2023) LifeLearner: Hardware-Aware Meta Continual Learning System for Embedded Computing Platforms. In Proceedings of the 21st ACM Conference on Embedded Networked Sensor Systems, 138-151. https://doi.org/10.1145/3625687.3625804. Online publication date: 12 November 2023.
  • (2022) Vibration Edge Computing in Maritime IoT. ACM Transactions on Internet of Things 3, 1, 1-18. https://doi.org/10.1145/3484717. Online publication date: 28 February 2022.


Information

Published In

ACM Transactions on Embedded Computing Systems, Volume 20, Issue 2
March 2021
230 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3446664
Editor: Tulika Mitra
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States


Publication History

Published: 18 March 2021
Accepted: 01 October 2020
Revised: 01 August 2020
Received: 01 June 2020
Published in TECS Volume 20, Issue 2


Author Tags

  1. ASIC
  2. Machine learning
  3. deep neural networks

Qualifiers

  • Research-article
  • Research
  • Refereed


Bibliometrics

Article Metrics

  • Downloads (last 12 months): 1,065
  • Downloads (last 6 weeks): 154
Reflects downloads up to 10 Nov 2024

