DOI: 10.1145/2742060.2743766

Origami: A Convolutional Network Accelerator

Published: 20 May 2015

Abstract

Today advanced computer vision (CV) systems of ever increasing complexity are being deployed in a growing number of application scenarios with strong real-time and power constraints. Current trends in CV clearly show a rise of neural network-based algorithms, which have recently broken many object detection and localization records. These approaches are very flexible and can be used to tackle many different challenges by only changing their parameters. In this paper, we present the first convolutional network accelerator which is scalable to network sizes that are currently only handled by workstation GPUs, but remains within the power envelope of embedded systems. The architecture has been implemented on 3.09 mm² core area in UMC 65 nm technology, capable of a throughput of 274 GOp/s at 369 GOp/s/W with an external memory bandwidth of just 525 MB/s full-duplex, a decrease of more than 90% from previous work.
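
The headline figures in the abstract can be combined into a few derived quantities. The sketch below is only back-of-the-envelope arithmetic, not anything taken from the paper itself: it assumes that the throughput and energy-efficiency figures describe the same operating point (the abstract does not say so), that 1 MB means 10^6 bytes, and that the 525 MB/s figure applies per direction. The function names are hypothetical.

```python
# Figures quoted in the abstract.
THROUGHPUT_GOPS = 274.0         # GOp/s
EFFICIENCY_GOPS_PER_W = 369.0   # GOp/s/W
CORE_AREA_MM2 = 3.09            # mm^2
EXT_BANDWIDTH_MB_S = 525.0      # MB/s (assumption: per direction, 1 MB = 1e6 bytes)


def implied_power_w(throughput_gops: float, efficiency_gops_per_w: float) -> float:
    """Power implied by throughput / efficiency.

    Assumption: both figures refer to the same operating point.
    """
    return throughput_gops / efficiency_gops_per_w


def area_efficiency(throughput_gops: float, area_mm2: float) -> float:
    """Throughput per unit of core area, in GOp/s/mm^2."""
    return throughput_gops / area_mm2


def ops_per_external_byte(throughput_gops: float, bandwidth_mb_s: float) -> float:
    """Operations performed per byte moved over the external memory interface."""
    return (throughput_gops * 1e9) / (bandwidth_mb_s * 1e6)


if __name__ == "__main__":
    print(f"implied power:   {implied_power_w(THROUGHPUT_GOPS, EFFICIENCY_GOPS_PER_W):.3f} W")
    print(f"area efficiency: {area_efficiency(THROUGHPUT_GOPS, CORE_AREA_MM2):.1f} GOp/s/mm^2")
    print(f"ops per byte:    {ops_per_external_byte(THROUGHPUT_GOPS, EXT_BANDWIDTH_MB_S):.0f} Op/B")
```

Under these assumptions the implied power is roughly 0.74 W, and each byte of external traffic corresponds to roughly 520 operations, which illustrates why the low bandwidth figure matters for an embedded design.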

Published In

GLSVLSI '15: Proceedings of the 25th edition on Great Lakes Symposium on VLSI
May 2015
418 pages
ISBN: 9781450334747
DOI: 10.1145/2742060

In-Cooperation

  • IEEE CEDA
  • IEEE CASS

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. accelerator
  2. classification
  3. computer vision
  4. machine learning
  5. pattern recognition
  6. signal processing
  7. vlsi design

Qualifiers

  • Research-article

Conference

GLSVLSI '15: Great Lakes Symposium on VLSI 2015
May 20 - 22, 2015
Pittsburgh, Pennsylvania, USA

Acceptance Rates

GLSVLSI '15 Paper Acceptance Rate: 41 of 148 submissions, 28%
Overall Acceptance Rate: 312 of 1,156 submissions, 27%

Cited By

  • (2024) ReSA: Reconfigurable Systolic Array for Multiple Tiny DNN Tensors. ACM Transactions on Architecture and Code Optimization 21:3 (1-24). DOI: 10.1145/3653363. Online publication date: 21-Mar-2024
  • (2024) A High-Performance and Energy-Efficient Photonic Architecture for Multi-DNN Acceleration. IEEE Transactions on Parallel and Distributed Systems 35:1 (46-58). DOI: 10.1109/TPDS.2023.3327535. Online publication date: Jan-2024
  • (2023) Neuro-distributed cognitive adaptive optimization for training neural networks in a parallel and asynchronous manner. Integrated Computer-Aided Engineering 31:1 (19-41). DOI: 10.3233/ICA-230718. Online publication date: 16-Nov-2023
  • (2023) A Silicon Photonic Multi-DNN Accelerator. 2023 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT) (238-249). DOI: 10.1109/PACT58117.2023.00028. Online publication date: 21-Oct-2023
  • (2023) Application and Implementation of Convolutional Neural Network Accelerator Based on FPGA in Environmental Sound Classification. 2023 8th International Conference on Computer and Communication Systems (ICCCS) (22-27). DOI: 10.1109/ICCCS57501.2023.10151442. Online publication date: 21-Apr-2023
  • (2023) A Case Study on DNN Accelerators. Proceeding of 2022 International Conference on Wireless Communications, Networking and Applications (WCNA 2022) (787-792). DOI: 10.1007/978-981-99-3951-0_86. Online publication date: 27-Jul-2023
  • (2022) Power-Efficient Deep Neural Network Accelerator Minimizing Global Buffer Access without Data Transfer between Neighboring Multiplier—Accumulator Units. Electronics 11:13 (1996). DOI: 10.3390/electronics11131996. Online publication date: 25-Jun-2022
  • (2022) Arax. Proceedings of the 13th Symposium on Cloud Computing (1-15). DOI: 10.1145/3542929.3563467. Online publication date: 7-Nov-2022
  • (2022) Deep learning on microcontrollers. Proceedings of the 2nd European Workshop on Machine Learning and Systems (54-63). DOI: 10.1145/3517207.3526978. Online publication date: 5-Apr-2022
  • (2022) An Uninterrupted Processing Technique-Based High-Throughput and Energy-Efficient Hardware Accelerator for Convolutional Neural Networks. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 30:12 (1891-1901). DOI: 10.1109/TVLSI.2022.3210963. Online publication date: Dec-2022