
A hardware accelerator design of the Mobile-Net model on FPGA

Published: 16 May 2023
DOI: 10.1145/3564121.3564124

Abstract

Domain-specific hardware architectures and hardware accelerators have become a vital part of modern system design. Especially for math-intensive applications involving machine-perception tasks, hardware accelerators that work in tandem with general-purpose microprocessors can prove energy efficient in both server and edge scenarios. FPGAs, thanks to their reconfigurability, make it possible to design customized hardware tailored to the computational and memory requirements of a specific application. This work proposes an optimized, low-latency hardware accelerator implementation of the Mobile-net V2 CNN on an FPGA.
This paper presents an implementation of Mobile-net V2 inference on a Xilinx UltraScale+ MPSoC platform that uses half-precision floating-point arithmetic for both the parameters and the activations of the network. The implementation is further optimized by merging every batch-norm layer into its preceding convolutional layer. For applications that cannot compromise the algorithm's accuracy for execution speed and efficiency, an optimized floating-point inference is proposed. Compared to running inference on the processor alone, the implementation offers an overall performance improvement of at least 20X with moderate resource utilization, minimal variance in inference latency, and almost no degradation in model accuracy.
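The batch-norm merge mentioned above is a standard algebraic folding. The sketch below is not the authors' code, only a minimal NumPy illustration with hypothetical names, showing how a batch-norm layer's statistics can be absorbed into the preceding convolution's weights and bias, with the result stored in half precision as in the proposed design.

```python
import numpy as np

def fold_batchnorm(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold an inference-time BatchNorm into the preceding convolution.

    BN(conv(x)) = gamma * (W*x + b - mean) / sqrt(var + eps) + beta
    is equivalent to a single convolution with
        W' = W * gamma / sqrt(var + eps)   (scaled per output channel)
        b' = (b - mean) * gamma / sqrt(var + eps) + beta

    W has shape (out_ch, in_ch, kh, kw); all BN vectors have shape (out_ch,).
    """
    scale = gamma / np.sqrt(var + eps)
    W_folded = W * scale[:, None, None, None]   # broadcast over each filter
    b_folded = (b - mean) * scale + beta
    # Store folded parameters in half precision, mirroring the FP16 design.
    return W_folded.astype(np.float16), b_folded.astype(np.float16)

# Toy usage with random shapes, purely for illustration.
out_ch, in_ch, k = 16, 8, 3
W = np.random.randn(out_ch, in_ch, k, k).astype(np.float32)
b = np.zeros(out_ch, dtype=np.float32)
gamma, beta = np.ones(out_ch), np.zeros(out_ch)
mean, var = np.zeros(out_ch), np.ones(out_ch)
W16, b16 = fold_batchnorm(W, b, gamma, beta, mean, var)
```

Because inference-time batch norm is a fixed per-channel affine transform, this folding is exact: it removes a layer from the network without changing its outputs, up to floating-point rounding.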


Cited By

  • (2023) A real-time and high-performance MobileNet accelerator based on adaptive dataflow scheduling for image classification. Journal of Real-Time Image Processing 21(1). https://doi.org/10.1007/s11554-023-01378-5. Online publication date: 24 Nov 2023.



Published In

AIMLSystems '22: Proceedings of the Second International Conference on AI-ML Systems
October 2022
209 pages
ISBN: 9781450398473
DOI: 10.1145/3564121

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. AI on FPGA
  2. FPGA implementation
  3. Mobile-Net
  4. hardware accelerator

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

AIMLSystems 2022


Article Metrics

  • Downloads (last 12 months): 104
  • Downloads (last 6 weeks): 5
Reflects downloads up to 25 Nov 2024

