DOI: 10.1145/3386263.3406956

Effective Algorithm-Accelerator Co-design for AI Solutions on Edge Devices

Published: 07 September 2020

Abstract

High-quality AI solutions require joint optimization of AI algorithms, such as deep neural networks (DNNs), and their hardware accelerators. To improve the overall solution quality and boost design productivity, efficient algorithm-accelerator co-design methodologies are indispensable. In this paper, we first discuss the motivations and challenges of the algorithm/accelerator co-design problem and then provide several effective solutions. In particular, we highlight three leading works on effective co-design methodologies: 1) the first simultaneous DNN/FPGA co-design method; 2) a bi-directional lightweight DNN and accelerator co-design method; 3) a differentiable and efficient DNN and accelerator co-search method. We demonstrate the effectiveness of the proposed co-design approaches through extensive experiments on both FPGAs and GPUs, with comparisons to existing works. This paper emphasizes the importance and efficacy of algorithm-accelerator co-design and calls for more research breakthroughs in this interesting and demanding area.
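To make the differentiable co-search idea in point 3) concrete, the sketch below shows one generic way an architecture/latency trade-off can be expressed: each layer's operator choice is relaxed with softmax-weighted architecture parameters, and a soft latency estimate (the weighted sum of per-operator costs on a given accelerator) is added to the task loss. This is a minimal illustration only, assuming PyTorch and hypothetical per-operator latency numbers (CANDIDATE_LATENCY_MS, lat_target_ms, beta are made up for the example); it is not the implementation of the highlighted works.

# Minimal sketch of a differentiable DNN/accelerator co-search objective.
# All constants and names here are illustrative assumptions, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical per-candidate latency estimates (ms) for one layer on a fixed
# accelerator configuration; in practice these would come from an analytical
# model or from profiling the FPGA/GPU implementation.
CANDIDATE_LATENCY_MS = torch.tensor([0.8, 1.5, 2.9])

class SearchableLayer(nn.Module):
    """One layer whose operator choice is relaxed with softmax weights."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),  # depthwise 3x3
            nn.Conv2d(channels, channels, 3, padding=1),                   # standard 3x3
            nn.Conv2d(channels, channels, 5, padding=2),                   # standard 5x5
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))  # architecture parameters

    def forward(self, x):
        # Soft mixture of candidate operators, differentiable w.r.t. alpha.
        w = F.softmax(self.alpha, dim=0)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

    def expected_latency(self):
        # Soft latency: softmax-weighted sum of per-operator cost estimates.
        return (F.softmax(self.alpha, dim=0) * CANDIDATE_LATENCY_MS).sum()

def co_search_loss(logits, labels, layers, lat_target_ms=5.0, beta=0.1):
    # Joint objective: task loss plus a penalty when the soft latency estimate
    # exceeds a hypothetical latency budget for the edge accelerator.
    task = F.cross_entropy(logits, labels)
    latency = sum(layer.expected_latency() for layer in layers)
    return task + beta * F.relu(latency - lat_target_ms)

Because both the task loss and the latency penalty are differentiable in the architecture parameters, model weights and operator choices can be updated with ordinary gradient descent, which is the essence of the differentiable co-search direction discussed above.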




Published In

GLSVLSI '20: Proceedings of the 2020 Great Lakes Symposium on VLSI
September 2020
597 pages
ISBN: 9781450379441
DOI: 10.1145/3386263

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. AI applications
  2. DNNs
  3. accelerator
  4. algorithm
  5. co-design
  6. edge devices
  7. machine learning

Qualifiers

  • Short-paper

Conference

GLSVLSI '20: Great Lakes Symposium on VLSI 2020
September 7 - 9, 2020
Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 312 of 1,156 submissions, 27%

Cited By

  • (2023) MetaML: Automating Customizable Cross-Stage Design-Flow for Deep Learning Acceleration. In 2023 33rd International Conference on Field-Programmable Logic and Applications (FPL), pp. 248-252. DOI: 10.1109/FPL60245.2023.00042. Online publication date: 4-Sep-2023.
  • (2022) Algorithm/Accelerator Co-Design and Co-Search for Edge AI. IEEE Transactions on Circuits and Systems II: Express Briefs, 69(7), pp. 3064-3070. DOI: 10.1109/TCSII.2022.3179229. Online publication date: Jul-2022.
  • (2022) A Survey of State-of-the-art on Edge Computing: Theoretical Models, Technologies, Directions, and Development Paths. IEEE Access, 10, pp. 54038-54063. DOI: 10.1109/ACCESS.2022.3176106. Online publication date: 2022.
  • (2022) Towards Real-Time and Energy Efficient Siamese Tracking – A Hardware-Software Approach. In Design and Architecture for Signal and Image Processing, pp. 162-173. DOI: 10.1007/978-3-031-12748-9_13. Online publication date: 30-Jul-2022.
  • (2021) Best Practices for the Deployment of Edge Inference: The Conclusions to Start Designing. Electronics, 10(16), 1912. DOI: 10.3390/electronics10161912. Online publication date: 9-Aug-2021.
