MEPAD: A Memory-Efficient Parallelized Direct Convolution Algorithm for Deep Neural Networks

  • Conference paper
  • First Online:
Euro-Par 2024: Parallel Processing (Euro-Par 2024)

Abstract

Deep Convolutional Neural Networks (CNNs) have been successfully used to process images, videos, sounds, and more generic sensor data for detecting objects, patterns, and events. In this work, we propose MEPAD, a memory-efficient parallelized direct convolution algorithm for CNNs. We compare MEPAD with several convolution implementations proposed in the literature by optimally mapping each of them onto two implementations of the same SIMD target architecture. Taking the VGG-16 and TinyYOLOv2 CNNs as use cases, we focus on optimizing the memory behavior and energy consumption of the algorithm in each layer of the CNNs, and show that MEPAD can achieve a reduction of up to 85% in the energy-delay product (EDP) compared to the alternative approaches.
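
For context on the approach named in the title: a direct convolution computes each output element straight from the input and weight tensors, whereas GEMM-based lowering (im2col) first expands the input into a large intermediate matrix and then calls a matrix-multiply kernel, trading extra memory for reuse of tuned BLAS code. The sketch below is a minimal, unblocked direct convolution loop nest in C (NCHW layout, unit stride, no padding); all names are illustrative, and it does not reproduce MEPAD's blocking, parallelization, or SIMD mapping, which are described in the paper itself.

    /* Minimal sketch of a plain direct convolution: NCHW layout, unit stride,
     * no padding. Names and loop order are illustrative only. */
    #include <stddef.h>

    void direct_conv(const float *in,   /* input:   [C_in][H][W]          */
                     const float *wgt,  /* weights: [C_out][C_in][K][K]   */
                     float *out,        /* output:  [C_out][H-K+1][W-K+1] */
                     size_t C_in, size_t C_out,
                     size_t H, size_t W, size_t K)
    {
        size_t Ho = H - K + 1, Wo = W - K + 1;
        for (size_t co = 0; co < C_out; co++)          /* output channels */
            for (size_t y = 0; y < Ho; y++)            /* output rows     */
                for (size_t x = 0; x < Wo; x++) {      /* output columns  */
                    float acc = 0.0f;
                    for (size_t ci = 0; ci < C_in; ci++)       /* input channels */
                        for (size_t ky = 0; ky < K; ky++)      /* kernel rows    */
                            for (size_t kx = 0; kx < K; kx++)  /* kernel columns */
                                acc += in[(ci * H + y + ky) * W + (x + kx)]
                                     * wgt[((co * C_in + ci) * K + ky) * K + kx];
                    out[(co * Ho + y) * Wo + x] = acc;
                }
    }

An im2col-based alternative would additionally allocate a lowered matrix of roughly C_in·K·K·Ho·Wo elements per layer; avoiding intermediate buffers of this kind is the sort of memory overhead that direct approaches target.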

Acknowledgement

This work has been supported by Spoke 1 on Future HPC of the Italian Research Center on High-Performance Computing, Big Data and Quantum Computing (ICSC), funded by MUR Mission 4 - Next Generation EU.

Author information

Corresponding author

Correspondence to Leandro Fiorin.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Fiorin, L., Silvano, C. (2024). MEPAD: A Memory-Efficient Parallelized Direct Convolution Algorithm for Deep Neural Networks. In: Carretero, J., Shende, S., Garcia-Blas, J., Brandic, I., Olcoz, K., Schreiber, M. (eds) Euro-Par 2024: Parallel Processing. Euro-Par 2024. Lecture Notes in Computer Science, vol 14802. Springer, Cham. https://doi.org/10.1007/978-3-031-69766-1_12

  • DOI: https://doi.org/10.1007/978-3-031-69766-1_12

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-69765-4

  • Online ISBN: 978-3-031-69766-1

  • eBook Packages: Computer Science, Computer Science (R0)
