research-article
DOI: 10.5555/3199700.3199810

Machine learning on FPGAs to face the IoT revolution

Published: 13 November 2017

Abstract

FPGAs have been rapidly adopted for acceleration of Deep Neural Networks (DNNs) with improved latency and energy efficiency compared to CPU- and GPU-based implementations. High-level synthesis (HLS) is an effective design flow for DNNs due to improved productivity, debugging, and design space exploration ability. However, optimizing large neural networks under resource constraints for FPGAs is still a key challenge. In this paper, we present a series of effective design techniques for implementing DNNs on FPGAs with high performance and energy efficiency. These include the use of configurable DNN IPs, performance and resource modeling, resource allocation across DNN layers, and DNN reduction and re-training. We showcase several design solutions, including a Long-term Recurrent Convolutional Network (LRCN) for video captioning, an Inception module for FaceNet face recognition, and a Long Short-Term Memory (LSTM) network for sound recognition. These and other similar DNN solutions are well suited for deployment in vision- or sound-based IoT applications.
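
The abstract names "resource allocation across DNN layers" without describing the method, so the following is only an illustrative sketch and not the authors' algorithm. It assumes a pipelined design in which each layer owns its own DSP units, each DSP performs one multiply-accumulate per cycle, and the goal is to balance estimated per-layer latency. The `ConvLayer` type, `allocate_dsps`, and `estimated_cycles` are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class ConvLayer:
    """Hypothetical description of one convolutional layer."""
    name: str
    out_h: int
    out_w: int
    in_ch: int
    out_ch: int
    kernel: int

    def macs(self) -> int:
        # Multiply-accumulates for one forward pass of a standard convolution.
        return self.out_h * self.out_w * self.in_ch * self.out_ch * self.kernel ** 2


def estimated_cycles(layer: ConvLayer, dsps: int) -> int:
    # Assumption: one MAC per DSP per cycle, perfect utilization (ceiling division).
    return -(-layer.macs() // dsps)


def allocate_dsps(layers, dsp_budget):
    """Greedily hand out DSPs so the slowest pipeline stage gets the next unit.

    Every layer starts with one DSP; each remaining DSP goes to the layer
    with the largest MACs-per-DSP ratio, which is the current bottleneck
    under the one-MAC-per-cycle assumption.
    """
    if dsp_budget < len(layers):
        raise ValueError("need at least one DSP per layer")
    macs = {layer.name: layer.macs() for layer in layers}
    alloc = {layer.name: 1 for layer in layers}
    for _ in range(dsp_budget - len(layers)):
        slowest = max(alloc, key=lambda n: macs[n] / alloc[n])
        alloc[slowest] += 1
    return alloc


if __name__ == "__main__":
    layers = [
        ConvLayer("conv1", out_h=10, out_w=10, in_ch=3, out_ch=1, kernel=1),
        ConvLayer("conv2", out_h=10, out_w=10, in_ch=1, out_ch=1, kernel=1),
    ]
    alloc = allocate_dsps(layers, dsp_budget=8)
    for layer in layers:
        print(layer.name, alloc[layer.name], estimated_cycles(layer, alloc[layer.name]))
```

In this toy model the allocation ends up roughly proportional to each layer's compute, so no single stage dominates the pipeline. A real flow would also have to account for on-chip memory, bandwidth, and the achievable parallelism of each layer, none of which is modeled here.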

Published In

ICCAD '17: Proceedings of the 36th International Conference on Computer-Aided Design
November 2017
1077 pages

Sponsors

In-Cooperation

  • IEEE-EDS: Electron Devices Society

Publisher

IEEE Press

Author Tags

  1. FPGAs
  2. internet of things
  3. machine learning

Qualifiers

  • Research-article

Conference

ICCAD '17

Acceptance Rates

Overall Acceptance Rate 457 of 1,762 submissions, 26%

Cited By

  • (2023) A Survey on Edge Intelligence and Lightweight Machine Learning Support for Future Applications and Services. Journal of Data and Information Quality 15:2 (1-30). DOI: 10.1145/3581759. Online publication date: 25-Jan-2023
  • (2019) FPGA/DNN Co-Design. Proceedings of the 56th Annual Design Automation Conference 2019 (1-6). DOI: 10.1145/3316781.3317829. Online publication date: 2-Jun-2019
  • (2019) Cloud-DNN. Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (73-82). DOI: 10.1145/3289602.3293915. Online publication date: 20-Feb-2019
  • (2018) Resource and data optimization for hardware implementation of deep neural networks targeting FPGA-based edge devices. Proceedings of the 20th System Level Interconnect Prediction Workshop (1-8). DOI: 10.1145/3225209.3225214. Online publication date: 23-Jun-2018
  • (2018) Face Recognition with Hybrid Efficient Convolution Algorithms on FPGAs. Proceedings of the 2018 Great Lakes Symposium on VLSI (123-128). DOI: 10.1145/3194554.3194597. Online publication date: 30-May-2018
  • (2018) A network-centric hardware/algorithm co-design to accelerate distributed training of deep neural networks. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture (175-188). DOI: 10.1109/MICRO.2018.00023. Online publication date: 20-Oct-2018
