Yanamala et al., 2022 - Google Patents

An Efficient Configurable Hardware Accelerator Design for CNN on Low Memory 32-Bit Edge Device

Document ID: 1180521029484056305
Author: Yanamala R; Pullakandam M
Publication year: 2022
Publication venue: 2022 IEEE International Symposium on Smart Electronic Systems (iSES)

Snippet

Nowadays, the ability of Convolutional Neural Networks (CNNs) to mimic the behavioral characteristics of biological visual neurons makes them a popular choice for image identification. A CNN comprises a deep structure and a high network that performs convolutional …

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G06F17/5009—Computer-aided design using simulation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/50—Computer-aided design
- G06F17/5045—Circuit design
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
- G06F15/78—Architectures of general purpose stored programme computers comprising a single central processing unit
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored programme computers
- G06F15/80—Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
- G06F15/8007—Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/52—Multiplying; Dividing
- G06F7/523—Multiplying only
- G06F7/53—Multiplying only in parallel-parallel fashion, i.e. both operands being entered in parallel
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
- G06F17/141—Discrete Fourier transforms
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
- G06F9/3893—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator
- G06F9/3895—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros
- G06F9/3897—Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled in tandem, e.g. multiplier-accumulator for complex operations, e.g. multidimensional or interleaved address generators, macros with adaptable data path
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2217/00—Indexing scheme relating to computer aided design [CAD]
- G06F2217/68—Processors
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computer systems based on biological models
- G06N3/02—Computer systems based on biological models using neural network models
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
- G06F8/41—Compilation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F19/00—Digital computing or data processing equipment or methods, specially adapted for specific applications
- G06F19/10—Bioinformatics, i.e. methods or systems for genetic or protein-related data processing in computational molecular biology
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image

Similar Documents

Publication	Publication Date	Title
Muslim et al.	2017	Efficient FPGA implementation of OpenCL high-performance computing applications via high-level synthesis
Amiri et al.	2017	FPGA-based soft-core processors for image processing applications
Kułaga et al.	2014	FPGA implementation of decision trees and tree ensembles for character recognition in Vivado HLS
Liang et al.	2020	OMNI: A framework for integrating hardware and software optimizations for sparse CNNs
Hailesellasie et al.	2019	MulNet: A flexible CNN processor with higher resource utilization efficiency for constrained devices
Shahrouzi et al.	2019	Optimized hardware accelerators for data mining applications on embedded platforms: Case study principal component analysis
Yanamala et al.	2022	An Efficient Configurable Hardware Accelerator Design for CNN on Low Memory 32-Bit Edge Device
Xu et al.	2018	CaFPGA: An automatic generation model for CNN accelerator
Viet Huynh	2021	FPGA-based acceleration for convolutional neural networks on PYNQ-Z2
Zhou et al.	2018	Addressing sparsity in deep neural networks
Bernaschi et al.	2016	A factored sparse approximate inverse preconditioned conjugate gradient solver on graphics processing units
Yan et al.	2020	FPGAN: an FPGA accelerator for graph attention networks with software and hardware co-optimization
Tithi et al.	2014	Exploiting spatial architectures for edit distance algorithms
Lou et al.	2019	RV-CNN: Flexible and efficient instruction set for CNNs based on RISC-V processors
Marchisio et al.	2021	FEECA: Design space exploration for low-latency and energy-efficient capsule network accelerators
Wu	2023	Review on FPGA-based accelerators in deep learning
Darbani et al.	2022	RASHT: A partially reconfigurable architecture for efficient implementation of CNNs
Zhao et al.	2020	Machine learning computers with fractal von Neumann architecture
Yanamala et al.	2023	A high-speed reusable quantized hardware accelerator design for CNN on constrained edge device
Gnanasambandapillai et al.	2020	Finder: Find efficient parallel instructions for ASIPs to improve performance of large applications
Miro	2020	FPGA-Based Accelerators for Convolutional Neural Networks on Embedded Devices
Eid et al.	2021	Hardware implementation of YOLOv4-tiny for object detection
Odetola et al.	2022	2L-3W: 2-level 3-way hardware–software co-verification for the mapping of convolutional neural network (CNN) onto FPGA boards
Sousa et al.	2023	Tensor slicing and optimization for multicore NPUs
Gomez-Pulido et al.	2016	Fine-grained parallelization of fitness functions in bioinformatics optimization problems: gene selection for cancer classification and biclustering of gene expression data