research-article

Exploration and Tradeoffs of different Kernels in FPGA Deep Learning Applications

Authors: Elliott Delaye, Ashish Sirasao, Ehsan Ghasemi

ISPD '18: Proceedings of the 2018 International Symposium on Physical Design

Pages 42 - 47

https://doi.org/10.1145/3177540.3177559

Published: 25 March 2018

Abstract

In deep learning, efficient computational hardware has become central to the large-scale implementation and deployment of many applications. Hardware designers have studied various characteristics of hardware platforms in order to best meet the high computational demand, high memory bandwidth, and flexibility that networks require. Beyond design space exploration of individual kernels, kernel design must be considered in the context of full system architectures, or in terms of combining deep learning with other types of applications: video encoding/decoding or analytics, speech recognition, or the many other applications that pair deep learning kernels with tightly integrated coprocessor architectures. Kernel sizes, on-chip and off-chip memories, numeric datatypes, and efficient compute architectures must all be merged into optimal design choices that deliver both maximum computational efficiency and programmable flexibility.
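One common way to reason about how kernel size, off-chip memory bandwidth and numeric datatype interact is the roofline model: compare a layer's arithmetic intensity (operations per byte of off-chip traffic) with the platform's compute-to-bandwidth ratio. The sketch below applies that model to a convolution layer. It is an illustration chosen for this page, not the method of the paper, and it assumes an idealized accelerator in which every input, weight and output crosses external memory exactly once. The layer shapes and the 1000 GOP/s peak / 10 GB/s bandwidth platform are hypothetical figures, not measurements of any real FPGA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayer:
    """Hypothetical 2-D convolution layer: stride 1, 'same' padding, square k x k kernel."""
    in_ch: int
    out_ch: int
    height: int
    width: int
    kernel: int


def conv_ops(layer: ConvLayer) -> int:
    """Operations for one pass, counting each multiply-accumulate as 2 ops."""
    macs = layer.out_ch * layer.height * layer.width * layer.in_ch * layer.kernel ** 2
    return 2 * macs


def conv_offchip_bytes(layer: ConvLayer, bytes_per_elem: int) -> int:
    """Off-chip traffic under an idealized assumption: every input activation,
    weight and output activation crosses external memory exactly once, i.e. the
    on-chip buffers are large enough to capture all data reuse."""
    inputs = layer.in_ch * layer.height * layer.width
    weights = layer.out_ch * layer.in_ch * layer.kernel ** 2
    outputs = layer.out_ch * layer.height * layer.width
    return (inputs + weights + outputs) * bytes_per_elem


def roofline(layer: ConvLayer, bytes_per_elem: int,
             peak_gops: float, bandwidth_gbs: float) -> tuple[float, float]:
    """Return (attainable GOP/s, arithmetic intensity in ops/byte).

    Throughput is capped by peak compute or by bandwidth * intensity,
    whichever is lower."""
    intensity = conv_ops(layer) / conv_offchip_bytes(layer, bytes_per_elem)
    return min(peak_gops, intensity * bandwidth_gbs), intensity


if __name__ == "__main__":
    # Hypothetical platform figures, chosen only for illustration.
    PEAK_GOPS, BANDWIDTH_GBS = 1000.0, 10.0
    cases = [
        ("3x3, int8", ConvLayer(64, 64, 56, 56, 3), 1),
        ("1x1, int8", ConvLayer(64, 64, 56, 56, 1), 1),
        ("1x1, fp32", ConvLayer(64, 64, 56, 56, 1), 4),
    ]
    for name, layer, nbytes in cases:
        gops, intensity = roofline(layer, nbytes, PEAK_GOPS, BANDWIDTH_GBS)
        bound = "compute" if gops >= PEAK_GOPS else "memory"
        print(f"{name}: {intensity:.1f} ops/byte -> {gops:.1f} GOP/s ({bound}-bound)")
```

Under these assumptions the 3x3 layer is compute-bound while the 1x1 layer is memory-bound, and moving from int8 to fp32 cuts arithmetic intensity by a factor of four. This is one way to see why kernel size, memory hierarchy and datatype cannot be chosen independently; a real design would replace the idealized traffic model with the tiling and buffering scheme of the actual accelerator.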

Index Terms

Hardware → Integrated circuits → Reconfigurable logic and FPGAs → Hardware accelerators

Published In

ISPD '18: Proceedings of the 2018 International Symposium on Physical Design (March 2018, 178 pages)

ISBN: 9781450356268

DOI: 10.1145/3177540

General Chair: Chris Chu, Iowa State University

Program Chair: Ismail Bustany, Xilinx Inc.

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGDA: ACM Special Interest Group on Design Automation

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ISPD '18: International Symposium on Physical Design

March 25 - 28, 2018

Monterey, California, USA

Sponsor: SIGDA

Acceptance Rates

Overall Acceptance Rate 62 of 172 submissions, 36%
