Accelerating real-time embedded scene labeling with convolutional networks

Authors:

Lukas Cavigelli,

Luca Benini

DAC '15: Proceedings of the 52nd Annual Design Automation Conference

Article No.: 108, Pages 1 - 6

https://doi.org/10.1145/2744769.2744788

Published: 07 June 2015

Abstract

Today there is a clear trend towards deploying advanced computer vision (CV) systems in a growing number of application scenarios with strong real-time and power constraints. Brain-inspired algorithms capable of achieving record-breaking results, combined with embedded vision systems, are the best candidates for the future of CV and video systems due to their flexibility and high accuracy in image understanding. In this paper, we present an optimized convolutional network implementation suitable for real-time scene labeling on embedded platforms. We show that our algorithm can achieve up to 96 GOp/s running on the Nvidia Tegra K1 embedded SoC. We present experimental results, compare them to the state of the art, and demonstrate that for scene labeling our approach achieves a 1.5x improvement in throughput over a modern desktop CPU at a power budget of only 11 W.
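The throughput and power figures quoted in the abstract can be related with some simple arithmetic. The following is a minimal sketch, not the authors' method: it assumes the common convention of counting one multiply-accumulate as two operations, and the function names are hypothetical. The abstract does not say how operations were counted.

```python
def conv_layer_ops(in_channels, out_channels, kernel_h, kernel_w, out_h, out_w):
    """Operation count for one convolutional layer.

    Assumption: each multiply-accumulate (MAC) counts as 2 operations.
    """
    macs = in_channels * out_channels * kernel_h * kernel_w * out_h * out_w
    return 2 * macs


def throughput_gops(total_ops, seconds):
    """Throughput in GOp/s for a workload of total_ops finished in seconds."""
    return total_ops / seconds / 1e9


def efficiency_gops_per_watt(gops, watts):
    """Energy efficiency in GOp/s per watt."""
    return gops / watts


# Figures taken from the abstract: up to 96 GOp/s, power budget of 11 W.
peak_gops = 96.0
power_budget_w = 11.0
print(round(efficiency_gops_per_watt(peak_gops, power_budget_w), 2))
```

Dividing the two abstract figures gives roughly 8.7 GOp/s/W, but only under the assumption that the peak throughput and the power budget describe the same operating point, which the abstract does not state.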

Index Terms

Accelerating real-time embedded scene labeling with convolutional networks
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Object recognition
      2. Computer vision tasks
        Scene understanding

Information & Contributors

Information

Published In

DAC '15: Proceedings of the 52nd Annual Design Automation Conference

June 2015

1204 pages

ISBN:9781450335201

DOI:10.1145/2744769

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGDA: ACM Special Interest Group on Design Automation

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

armasuisse Science & Technology

Conference

DAC '15

Sponsor:

SIGDA

DAC '15: The 52nd Annual Design Automation Conference 2015

June 7 - 11, 2015

San Francisco, California

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%


Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 78
Total Downloads: 668
Downloads (Last 12 months): 19
Downloads (Last 6 weeks): 1

Reflects downloads up to 29 Sep 2024

Citations

Cited By

Mahmoud K, Nicolici N (2024). ALPRI-FI: A Framework for Early Assessment of Hardware Fault Resiliency of DNN Accelerators. Electronics, 13:16 (3243). Online publication date: 15-Aug-2024.
https://doi.org/10.3390/electronics13163243
Liu Y, Zhang Y, Hao X, Chen L, Ni M, Chen M, Chen R (2024). Design of a Convolutional Neural Network Accelerator Based on On-Chip Data Reordering. Electronics, 13:5 (975). Online publication date: 4-Mar-2024.
https://doi.org/10.3390/electronics13050975
Wang J, Bai Y, Wang H, Hao Z, Wang G, Zhang K, Zhang Y, Lv W, Zhang Y (2022). Reconfigurable Bit-Serial Operation Using Toggle SOT-MRAM for High-Performance Computing in Memory Architecture. IEEE Transactions on Circuits and Systems I: Regular Papers, 69:11 (4535-4545). Online publication date: Nov-2022.
https://doi.org/10.1109/TCSI.2022.3192165
Scherer M, Rutishauser G, Cavigelli L, Benini L (2022). CUTIE: Beyond PetaOp/s/W Ternary DNN Inference Acceleration With Better-Than-Binary Energy Efficiency. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 41:4 (1020-1033). Online publication date: Apr-2022.
https://doi.org/10.1109/TCAD.2021.3075420
Hafner F, Zeller M, Schutera M, Abhau J, Kooij J (2022). BackboneAnalysis: Structured Insights into Compute Platforms from CNN Inference Latency. 2022 IEEE Intelligent Vehicles Symposium (IV) (1801-1809). Online publication date: 5-Jun-2022.
https://doi.org/10.1109/IV51971.2022.9827260
Wang J, Park S, Park C (2022). Spatial Data Dependence Graph Based Pre-RTL Simulator for Convolutional Neural Network Dataflows. IEEE Access, 10 (11382-11403). Online publication date: 2022.
https://doi.org/10.1109/ACCESS.2022.3146413
Roohi A, Angizi S, Fan D (2022). Enabling Edge Computing Using Emerging Memory Technologies: From Device to Architecture. Frontiers of Quality Electronic Design (QED) (415-464). Online publication date: 6-Sep-2022.
https://doi.org/10.1007/978-3-031-16344-9_11
Kodukula V, Katrawala S, Jones B, Wu C, LiKamWa R (2021). Dynamic Temperature Management of Near-Sensor Processing for Energy-Efficient High-Fidelity Imaging. Sensors, 21:3 (926). Online publication date: 30-Jan-2021.
https://doi.org/10.3390/s21030926
Kitayama A, Ono G, Kishimoto T, Ito H, Kohmu N (2021). Low-Power Implementation Techniques for Convolutional Neural Networks Using Precise and Active Skipping Methods. IEICE Transactions on Electronics, E104.C:7 (330-337). Online publication date: 1-Jul-2021.
https://doi.org/10.1587/transele.2020CDP0003
Zhang Y, Robertson J, Xiang S, Hejda M, Bueno J, Hurtado A (2021). All-optical neuromorphic binary convolution with a spiking VCSEL neuron for image gradient magnitudes. Photonics Research, 9:5 (B201). Online publication date: 14-Apr-2021.
https://doi.org/10.1364/PRJ.412141