DOI:10.1145/1815961.1815993
research-article

A dynamically configurable coprocessor for convolutional neural networks

Published: 19 June 2010

Abstract

Convolutional neural network (CNN) applications range from recognition and reasoning (such as handwriting recognition, facial expression recognition and video surveillance) to intelligent text applications such as semantic text analysis and natural language processing. Two key observations drive the design of a new architecture for CNNs. First, CNN workloads exhibit a widely varying mix of three types of parallelism: parallelism within a convolution operation, intra-output parallelism where multiple input sources (features) are combined to create a single output, and inter-output parallelism where multiple, independent outputs (features) are computed simultaneously. Workloads differ significantly across different CNN applications, and across different layers of a CNN. Second, the number of processing elements in an architecture continues to scale (as per Moore's law) much faster than the off-chip memory bandwidth (or pin count) of chips. Based on these two observations, we show that for a given number of processing elements and off-chip memory bandwidth, a new CNN hardware architecture that dynamically configures the hardware on-the-fly to match the specific mix of parallelism in a given workload gives the best throughput performance. Our CNN compiler automatically translates a high-abstraction network specification into a parallel microprogram (a sequence of low-level VLIW instructions) that is mapped, scheduled and executed by the coprocessor. Compared to a 2.3 GHz quad-core, dual-socket Intel Xeon, a 1.35 GHz C870 GPU, and a 200 MHz FPGA implementation, our 120 MHz dynamically configurable architecture is 4x to 8x faster. This is the first CNN architecture to achieve real-time video stream processing (25 to 30 frames per second) on a wide range of object detection and recognition tasks.
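
The three types of parallelism named in the abstract can be pictured as the loop nest of a single CNN layer. The following is a minimal Python sketch, not the paper's implementation: the function names `conv2d_valid` and `conv_layer`, the list-of-lists data layout, and the omission of bias and nonlinearity are all assumptions made for illustration. The code runs sequentially; the comments only mark which loops could, in principle, be spread across processing elements.

```python
def conv2d_valid(image, kernel):
    """Slide one kernel over one input map (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            # (1) Parallelism within a convolution: the kh * kw products
            # for one output pixel are independent of each other.
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[y + i][x + j] * kernel[i][j]
            out[y][x] = acc
    return out


def conv_layer(inputs, kernels):
    """Compute every output map of one layer.

    inputs:  list of 2D input feature maps
    kernels: kernels[o][i] is the 2D kernel linking input i to output o
    """
    outputs = []
    # (3) Inter-output parallelism: each output map is independent
    # of the others, so this loop could run concurrently.
    for per_output_kernels in kernels:
        # (2) Intra-output parallelism: several input maps are convolved
        # and then combined (summed) into a single output map.
        partials = [
            conv2d_valid(inp, k)
            for inp, k in zip(inputs, per_output_kernels)
        ]
        h, w = len(partials[0]), len(partials[0][0])
        combined = [
            [sum(p[y][x] for p in partials) for x in range(w)]
            for y in range(h)
        ]
        outputs.append(combined)
    return outputs


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    b = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
    ones = [[1, 1], [1, 1]]
    pick = [[1, 0], [0, 0]]
    zero = [[0, 0], [0, 0]]
    for out_map in conv_layer([a, b], [[ones, ones], [pick, zero]]):
        print(out_map)
```

In this picture, a layer with few input maps and many output maps offers mostly inter-output parallelism, while one with many inputs and few outputs offers mostly intra-output parallelism, which is the kind of variation in workload mix that the abstract describes across layers.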



Published In

ISCA '10: Proceedings of the 37th annual international symposium on Computer architecture
June 2010
520 pages
ISBN:9781450300537
DOI:10.1145/1815961
  • ACM SIGARCH Computer Architecture News, Volume 38, Issue 3
    ISCA '10
    June 2010
    508 pages
    ISSN:0163-5964
    DOI:10.1145/1816038
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. convolutional neural networks
  2. dynamic reconfiguration
  3. parallel computer architecture

Qualifiers

  • Research-article

Conference

ISCA '10

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%


