research-article

Machine learning on FPGAs to face the IoT revolution

Authors: Anand Ramachandran, Chuanhao Zhuge, Deming Chen

ICCAD '17: Proceedings of the 36th International Conference on Computer-Aided Design

Pages 819 - 826

Published: 13 November 2017

Abstract

FPGAs have been rapidly adopted for accelerating Deep Neural Networks (DNNs), offering lower latency and better energy efficiency than CPU- and GPU-based implementations. High-level synthesis (HLS) is an effective design flow for DNNs because it improves productivity, simplifies debugging, and supports design space exploration. However, optimizing large neural networks for FPGAs under resource constraints remains a key challenge. In this paper, we present a series of effective design techniques for implementing DNNs on FPGAs with high performance and energy efficiency. These include configurable DNN IPs, performance and resource modeling, resource allocation across DNN layers, and DNN reduction and re-training. We showcase several design solutions, including a Long-term Recurrent Convolutional Network (LRCN) for video captioning, an Inception module for FaceNet face recognition, and a Long Short-Term Memory (LSTM) network for sound recognition. These and similar DNN solutions are well suited for deployment in vision- or sound-based IoT applications.
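The abstract lists performance and resource modeling and resource allocation across DNN layers without spelling out how the two interact. The sketch below is a minimal illustration under stated assumptions, not the authors' method. It assumes a layer-pipelined accelerator whose throughput is set by its slowest layer, a cycle model of ceil(MACs / parallel factor) per layer that ignores memory stalls, and a fixed DSP cost per parallel MAC unit. A greedy loop then hands each spare MAC unit to the current bottleneck layer. The names `Layer`, `layer_cycles`, `pipeline_interval`, `allocate_dsps`, and `dsp_per_mac`, as well as the layer sizes and DSP budget, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    macs: int  # multiply-accumulate operations per input (hypothetical values)


def layer_cycles(macs: int, pf: int) -> int:
    """Assumed performance model: a layer with `pf` parallel MAC units
    takes ceil(macs / pf) cycles per input; memory stalls are ignored."""
    return math.ceil(macs / pf)


def pipeline_interval(layers, pf):
    """In an assumed layer-pipelined design, the slowest stage sets throughput."""
    return max(layer_cycles(l.macs, p) for l, p in zip(layers, pf))


def allocate_dsps(layers, dsp_budget, dsp_per_mac=1):
    """Greedy allocation: give each spare MAC unit to the bottleneck layer.

    Assumed resource model: every parallel MAC unit costs `dsp_per_mac` DSPs.
    Returns one parallel factor per layer.
    """
    pf = [1] * len(layers)
    used = len(layers) * dsp_per_mac
    if used > dsp_budget:
        raise ValueError("budget cannot fit one MAC unit per layer")
    while used + dsp_per_mac <= dsp_budget:
        cycles = [layer_cycles(l.macs, p) for l, p in zip(layers, pf)]
        # Index of the slowest layer (first one on ties).
        b = max(range(len(layers)), key=cycles.__getitem__)
        if pf[b] >= layers[b].macs:
            break  # bottleneck already takes one cycle; more DSPs cannot help
        pf[b] += 1
        used += dsp_per_mac
    return pf


if __name__ == "__main__":
    net = [Layer("conv1", 1000), Layer("conv2", 4000), Layer("fc", 500)]
    pf = allocate_dsps(net, dsp_budget=14)
    for layer, p in zip(net, pf):
        print(f"{layer.name}: parallel factor {p}, "
              f"{layer_cycles(layer.macs, p)} cycles")
    print("pipeline interval:", pipeline_interval(net, pf), "cycles")
```

With the hypothetical three-layer network and a 14-DSP budget, the allocation comes out roughly proportional to each layer's MAC count (parallel factors 3, 9, and 2, for a 445-cycle interval). Greedy bottleneck balancing is only one simple option; a real flow would also model on-chip memory, off-chip bandwidth, and the DSP cost of the chosen data precision.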




Information

Published In

ICCAD '17: Proceedings of the 36th International Conference on Computer-Aided Design

November 2017

1077 pages

General Chair: Sri Parameswaran

Sponsors

CEDA: Council on Electronic Design Automation
SIGDA: ACM Special Interest Group on Design Automation
IEEE-CAS: Circuits & Systems

In-Cooperation

IEEE-EDS: Electronic Devices Society

Publisher

IEEE Press


Conference

ICCAD '17: IEEE/ACM International Conference on Computer-Aided Design

November 13-16, 2017

Irvine, California

Acceptance Rates

Overall Acceptance Rate 457 of 1,762 submissions, 26%


Cited By

Hoffpauir K, Simmons J, Schmidt N, Pittala R, Briggs I, Makani S, Jararweh Y (2023). A Survey on Edge Intelligence and Lightweight Machine Learning Support for Future Applications and Services. Journal of Data and Information Quality 15(2):1-30. Online publication date: 25-Jan-2023. https://dl.acm.org/doi/10.1145/3581759

Hao C, Zhang X, Li Y, Huang S, Xiong J, Rupnow K, Hwu W, Chen D (2019). FPGA/DNN Co-Design. Proceedings of the 56th Annual Design Automation Conference 2019, 1-6. Online publication date: 2-Jun-2019. https://dl.acm.org/doi/10.1145/3316781.3317829

Chen Y, He J, Zhang X, Hao C, Chen D, Bazargan K, Neuendorffer S (2019). Cloud-DNN. Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 73-82. Online publication date: 20-Feb-2019. https://dl.acm.org/doi/10.1145/3289602.3293915

Liu X, Kim D, Wu C, Chen D, Hu S (2018). Resource and data optimization for hardware implementation of deep neural networks targeting FPGA-based edge devices. Proceedings of the 20th System Level Interconnect Prediction Workshop, 1-8. Online publication date: 23-Jun-2018. https://dl.acm.org/doi/10.1145/3225209.3225214

Zhuge C, Liu X, Zhang X, Gummadi S, Xiong J, Chen D, Chen D, Homayoun H, Taskin B (2018). Face Recognition with Hybrid Efficient Convolution Algorithms on FPGAs. Proceedings of the 2018 Great Lakes Symposium on VLSI, 123-128. Online publication date: 30-May-2018. https://dl.acm.org/doi/10.1145/3194554.3194597

Li Y, Park J, Alian M, Yuan Y, Qu Z, Pan P, Wang R, Schwing A, Esmaeilzadeh H, Kim N, Oskin M, Inoue K (2018). A network-centric hardware/algorithm co-design to accelerate distributed training of deep neural networks. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 175-188. Online publication date: 20-Oct-2018. https://dl.acm.org/doi/10.1109/MICRO.2018.00023
