DOI: 10.1145/3386263.3406956

Effective Algorithm-Accelerator Co-design for AI Solutions on Edge Devices

Published: 07 September 2020

Abstract

High-quality AI solutions require joint optimization of AI algorithms, such as deep neural networks (DNNs), and their hardware accelerators. To improve the overall solution quality and boost design productivity, efficient algorithm-accelerator co-design methodologies are indispensable. In this paper, we first discuss the motivations and challenges of the algorithm/accelerator co-design problem and then provide several effective solutions. In particular, we highlight three leading works on effective co-design methodologies: 1) the first simultaneous DNN/FPGA co-design method; 2) a bi-directional lightweight DNN and accelerator co-design method; 3) a differentiable and efficient DNN and accelerator co-search method. We demonstrate the effectiveness of the proposed co-design approaches through extensive experiments on both FPGAs and GPUs, with comparisons to existing works. This paper emphasizes the importance and efficacy of algorithm-accelerator co-design and calls for more research breakthroughs in this interesting and demanding area.
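To make the differentiable co-search idea in point 3) concrete, the sketch below shows one generic way an architecture/latency trade-off can be expressed: each layer's operator choice is relaxed with softmax-weighted architecture parameters, and a soft latency estimate (the weighted sum of per-operator costs on a given accelerator) is added to the task loss. This is a minimal illustration only, assuming PyTorch and hypothetical per-operator latency numbers (CANDIDATE_LATENCY_MS, lat_target_ms, beta are made up for the example); it is not the implementation of the highlighted works.

# Minimal sketch of a differentiable DNN/accelerator co-search objective.
# All constants and names here are illustrative assumptions, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical per-candidate latency estimates (ms) for one layer on a fixed
# accelerator configuration; in practice these would come from an analytical
# model or from profiling the FPGA/GPU implementation.
CANDIDATE_LATENCY_MS = torch.tensor([0.8, 1.5, 2.9])

class SearchableLayer(nn.Module):
    """One layer whose operator choice is relaxed with softmax weights."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),  # depthwise 3x3
            nn.Conv2d(channels, channels, 3, padding=1),                   # standard 3x3
            nn.Conv2d(channels, channels, 5, padding=2),                   # standard 5x5
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))  # architecture parameters

    def forward(self, x):
        # Soft mixture of candidate operators, differentiable w.r.t. alpha.
        w = F.softmax(self.alpha, dim=0)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

    def expected_latency(self):
        # Soft latency: softmax-weighted sum of per-operator cost estimates.
        return (F.softmax(self.alpha, dim=0) * CANDIDATE_LATENCY_MS).sum()

def co_search_loss(logits, labels, layers, lat_target_ms=5.0, beta=0.1):
    # Joint objective: task loss plus a penalty when the soft latency estimate
    # exceeds a hypothetical latency budget for the edge accelerator.
    task = F.cross_entropy(logits, labels)
    latency = sum(layer.expected_latency() for layer in layers)
    return task + beta * F.relu(latency - lat_target_ms)

Because both the task loss and the latency penalty are differentiable in the architecture parameters, model weights and operator choices can be updated with ordinary gradient descent, which is the essence of the differentiable co-search direction discussed above.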




Published In

GLSVLSI '20: Proceedings of the 2020 Great Lakes Symposium on VLSI
September 2020
597 pages
ISBN: 9781450379441
DOI: 10.1145/3386263

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. AI applications
  2. DNNs
  3. accelerator
  4. algorithm
  5. co-design
  6. edge devices
  7. machine learning

Qualifiers

  • Short-paper

Conference

GLSVLSI '20: Great Lakes Symposium on VLSI 2020
September 7 - 9, 2020
Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 312 of 1,156 submissions, 27%

Cited By

  • (2023) MetaML: Automating Customizable Cross-Stage Design-Flow for Deep Learning Acceleration. In 2023 33rd International Conference on Field-Programmable Logic and Applications (FPL), pp. 248-252. DOI: 10.1109/FPL60245.2023.00042. Online publication date: 4-Sep-2023.
  • (2022) Algorithm/Accelerator Co-Design and Co-Search for Edge AI. IEEE Transactions on Circuits and Systems II: Express Briefs, 69(7), pp. 3064-3070. DOI: 10.1109/TCSII.2022.3179229. Online publication date: Jul-2022.
  • (2022) A Survey of State-of-the-art on Edge Computing: Theoretical Models, Technologies, Directions, and Development Paths. IEEE Access, 10, pp. 54038-54063. DOI: 10.1109/ACCESS.2022.3176106. Online publication date: 2022.
  • (2022) Towards Real-Time and Energy Efficient Siamese Tracking – A Hardware-Software Approach. In Design and Architecture for Signal and Image Processing, pp. 162-173. DOI: 10.1007/978-3-031-12748-9_13. Online publication date: 30-Jul-2022.
  • (2021) Best Practices for the Deployment of Edge Inference: The Conclusions to Start Designing. Electronics, 10(16), 1912. DOI: 10.3390/electronics10161912. Online publication date: 9-Aug-2021.
