Research Article | Open Access

Lane Compression: A Lightweight Lossless Compression Method for Machine Learning on Embedded Systems

Published: 18 March 2021

Abstract

This article presents Lane Compression, a lightweight lossless compression technique for machine learning that is based on a detailed study of the statistical properties of machine learning data. The proposed technique profiles machine learning data gathered ahead of run-time and partitions values bit-wise into different lanes, each with more distinctive statistical characteristics. The most appropriate compression technique is then chosen for each lane from a small number of low-cost compression techniques. Lane Compression’s compute and memory requirements are very low, yet it achieves a compression rate comparable to or better than Huffman coding. We evaluate and analyse Lane Compression on a wide range of machine learning networks for both inference and re-training. We also demonstrate that the profiling prior to run-time, and the ability to configure the hardware based on that profiling, guarantee robust performance across different models and datasets. Hardware implementations are described, and the scheme’s simplicity makes it suitable for compressing both on-chip and off-chip traffic.
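As a rough, non-authoritative illustration of the idea described above, the Python sketch below splits 8-bit values bit-wise into lanes and estimates each lane's empirical entropy as a proxy for deciding how compressible it is. The lane widths, the sample values, and the entropy-based heuristic are assumptions chosen for clarity; the paper's actual lane boundaries, profiling procedure, and candidate per-lane coders are defined in the article itself, not here.

```python
# Illustrative sketch only. Lane widths, sample data, and the entropy proxy
# are hypothetical choices; they are not the paper's exact method.
from collections import Counter
import math

def split_into_lanes(values, lane_widths=(4, 4)):
    """Partition each 8-bit value bit-wise into lanes (here: two assumed 4-bit lanes)."""
    lanes = [[] for _ in lane_widths]
    for v in values:
        shift = 0
        for i, w in enumerate(lane_widths):
            lanes[i].append((v >> shift) & ((1 << w) - 1))
            shift += w
    return lanes

def entropy_bits(symbols):
    """Empirical entropy in bits per symbol, a rough proxy for a lane's compressibility."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Profile sample data gathered ahead of run-time; a real implementation would
# then pick the cheapest adequate coder for each lane from a small menu.
sample = [0x03, 0x05, 0x00, 0x81, 0x02, 0x00, 0x07, 0x00]
for i, lane in enumerate(split_into_lanes(sample)):
    print(f"lane {i}: empirical entropy ~{entropy_bits(lane):.2f} bits/symbol")
```

In practice the lanes typically expose very different statistics (for example, high-order bits of small-magnitude values are mostly zero), which is what makes a per-lane choice of low-cost coder worthwhile.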


Cited By

  • (2023) LifeLearner: Hardware-Aware Meta Continual Learning System for Embedded Computing Platforms. In Proceedings of the 21st ACM Conference on Embedded Networked Sensor Systems, 138-151. https://doi.org/10.1145/3625687.3625804. Online publication date: 12 November 2023.
  • (2022) Vibration Edge Computing in Maritime IoT. ACM Transactions on Internet of Things 3, 1, 1-18. https://doi.org/10.1145/3484717. Online publication date: 28 February 2022.


Information

Published In

ACM Transactions on Embedded Computing Systems, Volume 20, Issue 2
March 2021
230 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3446664
Editor: Tulika Mitra
This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States


Publication History

Published: 18 March 2021
Accepted: 01 October 2020
Revised: 01 August 2020
Received: 01 June 2020
Published in TECS Volume 20, Issue 2


Author Tags

  1. ASIC
  2. Machine learning
  3. deep neural networks

Qualifiers

  • Research-article
  • Research
  • Refereed


Bibliometrics

Article Metrics

  • Downloads (last 12 months): 1,065
  • Downloads (last 6 weeks): 154
Reflects downloads up to 10 Nov 2024

