
A hardware accelerator design of the Mobile-Net model on FPGA

Published: 16 May 2023
DOI: 10.1145/3564121.3564124

Abstract

Domain-specific hardware architectures and hardware accelerators have become a vital part of modern system design. Especially for math-intensive applications involving machine-perception tasks, hardware accelerators that work in tandem with general-purpose microprocessors can prove energy efficient in both server and edge scenarios. FPGAs, thanks to their reconfigurability, make it possible to design customized hardware tailored to the computational and memory requirements of a specific application. This work proposes an optimized, low-latency hardware accelerator implementation of the Mobile-net V2 CNN on an FPGA.
This paper presents an implementation of Mobile-net V2 inference on a Xilinx UltraScale+ MPSoC platform that uses half-precision floating-point arithmetic for both the parameters and the activations of the network. The implementation is further optimized by merging every batch-norm layer into its preceding convolutional layer. For applications that cannot compromise the algorithm's accuracy for execution speed and efficiency, an optimized floating-point inference is proposed. Compared to running inference on the processor alone, the implementation offers an overall performance improvement of at least 20X with moderate resource utilization, minimal variance in inference latency, and almost no degradation in model accuracy.
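The batch-norm merge mentioned above is a standard algebraic folding. The sketch below is not the authors' code, only a minimal NumPy illustration with hypothetical names, showing how a batch-norm layer's statistics can be absorbed into the preceding convolution's weights and bias, with the result stored in half precision as in the proposed design.

```python
import numpy as np

def fold_batchnorm(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold an inference-time BatchNorm into the preceding convolution.

    BN(conv(x)) = gamma * (W*x + b - mean) / sqrt(var + eps) + beta
    is equivalent to a single convolution with
        W' = W * gamma / sqrt(var + eps)   (scaled per output channel)
        b' = (b - mean) * gamma / sqrt(var + eps) + beta

    W has shape (out_ch, in_ch, kh, kw); all BN vectors have shape (out_ch,).
    """
    scale = gamma / np.sqrt(var + eps)
    W_folded = W * scale[:, None, None, None]   # broadcast over each filter
    b_folded = (b - mean) * scale + beta
    # Store folded parameters in half precision, mirroring the FP16 design.
    return W_folded.astype(np.float16), b_folded.astype(np.float16)

# Toy usage with random shapes, purely for illustration.
out_ch, in_ch, k = 16, 8, 3
W = np.random.randn(out_ch, in_ch, k, k).astype(np.float32)
b = np.zeros(out_ch, dtype=np.float32)
gamma, beta = np.ones(out_ch), np.zeros(out_ch)
mean, var = np.zeros(out_ch), np.ones(out_ch)
W16, b16 = fold_batchnorm(W, b, gamma, beta, mean, var)
```

Because inference-time batch norm is a fixed per-channel affine transform, this folding is exact: it removes a layer from the network without changing its outputs, up to floating-point rounding.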


Cited By

  • (2023) A real-time and high-performance MobileNet accelerator based on adaptive dataflow scheduling for image classification. Journal of Real-Time Image Processing 21(1). https://doi.org/10.1007/s11554-023-01378-5. Online publication date: 24 Nov 2023.



Published In

AIMLSystems '22: Proceedings of the Second International Conference on AI-ML Systems
October 2022
209 pages
ISBN: 9781450398473
DOI: 10.1145/3564121

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. AI on FPGA
  2. FPGA implementation
  3. Mobile-Net
  4. hardware accelerator

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

AIMLSystems 2022


Article Metrics

  • Downloads (last 12 months): 104
  • Downloads (last 6 weeks): 5
Reflects downloads up to 25 Nov 2024

