Abstract
Deep Convolutional Neural Networks (CNNs) have been successfully used to process images, videos, sounds, and more general sensor data for detecting objects, patterns, and events. In this work, we propose MEPAD, a memory-efficient parallelized direct convolution algorithm for CNNs. We compare MEPAD with several convolution implementations proposed in the literature by optimally mapping each of them onto two implementations of the same SIMD target architecture. Taking the VGG-16 and TinyYOLOv2 CNNs as use cases, we focus on optimizing the memory behavior and energy consumption of the algorithm in each layer of the CNNs and show that MEPAD achieves a reduction of up to 85% in the energy-delay product (EDP) compared to alternative approaches.
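For context, direct convolution computes each output activation straight from the input tensor and the filter weights, without first materializing the intermediate im2col matrix used by GEMM-based lowering; the memory savings that MEPAD targets stem from avoiding such intermediate buffers. The sketch below is a minimal, unoptimized direct-convolution loop nest in C for a single layer (CHW activation layout, OIHW weight layout, square kernel, unit dilation, no padding). It illustrates only the baseline computation, not the authors' parallelization or blocking scheme, and all names in it are hypothetical.

    /* Minimal direct-convolution sketch for one CNN layer.
     * This is a plain baseline, NOT the authors' MEPAD algorithm:
     * an output-stationary loop nest over output channels, output
     * positions, input channels, and the kernel window. */
    void direct_conv(const float *in,   /* C_in  x H_in  x W_in  */
                     const float *w,    /* C_out x C_in x K x K  */
                     const float *bias, /* C_out, may be NULL    */
                     float *out,        /* C_out x H_out x W_out */
                     int c_in, int h_in, int w_in,
                     int c_out, int k, int stride)
    {
        /* Output size for a valid (unpadded) convolution. */
        int h_out = (h_in - k) / stride + 1;
        int w_out = (w_in - k) / stride + 1;

        for (int oc = 0; oc < c_out; oc++)
            for (int oy = 0; oy < h_out; oy++)
                for (int ox = 0; ox < w_out; ox++) {
                    float acc = bias ? bias[oc] : 0.0f;
                    for (int ic = 0; ic < c_in; ic++)
                        for (int ky = 0; ky < k; ky++)
                            for (int kx = 0; kx < k; kx++) {
                                int iy = oy * stride + ky;
                                int ix = ox * stride + kx;
                                acc += in[(ic * h_in + iy) * w_in + ix]
                                     * w[((oc * c_in + ic) * k + ky) * k + kx];
                            }
                    out[(oc * h_out + oy) * w_out + ox] = acc;
                }
    }

A GEMM-based implementation would instead expand the input into a (C_in * K * K) x (H_out * W_out) matrix before the multiplication, which for a 3x3 kernel inflates the activation footprint roughly ninefold; this is the memory overhead that direct approaches avoid.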
Acknowledgement
This work has been supported by the Spoke 1 on Future HPC of the Italian Research Center on High-Performance Computing, Big Data and Quantum Computing (ICSC) funded by MUR Mission 4 - Next Generation EU.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Fiorin, L., Silvano, C. (2024). MEPAD: A Memory-Efficient Parallelized Direct Convolution Algorithm for Deep Neural Networks. In: Carretero, J., Shende, S., Garcia-Blas, J., Brandic, I., Olcoz, K., Schreiber, M. (eds) Euro-Par 2024: Parallel Processing. Euro-Par 2024. Lecture Notes in Computer Science, vol 14802. Springer, Cham. https://doi.org/10.1007/978-3-031-69766-1_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-69765-4
Online ISBN: 978-3-031-69766-1
eBook Packages: Computer Science (R0)