Sudrajat et al., 2019 - Google Patents
GEMM-Based Quantized Neural Network FPGA Accelerator Design
- Document ID
- 1817960969226360832
- Authors
- Sudrajat M
- Adiono T
- Syafalni I
- Publication year
- 2019
- Publication venue
- 2019 International Symposium on Electronics and Smart Devices (ISESD)
Snippet
In this study, we will explore neural network FPGA acceleration based on accelerating General Matrix Multiplication (GEMM). GEMM acceleration allows a regularized and modular implementation of the accelerator design, as well as providing the benefits of …
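The GEMM formulation the snippet describes is easy to see in miniature: a quantized fully connected layer collapses into a single integer matrix multiply. The sketch below is illustrative only and not taken from the paper; it assumes a symmetric int8 quantization scheme, and the shapes, scales, and names (`quantize`, `w_scale`, `x_scale`) are hypothetical.

```python
# Illustrative sketch (not from the paper): how a quantized fully connected
# layer reduces to one integer GEMM, the operation such an accelerator
# is built around. Scales and shapes here are made up for the example.
import numpy as np

def quantize(x, scale):
    """Map float values to int8 with a simple symmetric scheme (assumed)."""
    return np.clip(np.round(x / scale), -128, 127).astype(np.int8)

rng = np.random.default_rng(0)
weights = rng.normal(size=(3, 4)).astype(np.float32)  # 3 neurons, 4 inputs
inputs = rng.normal(size=(4, 2)).astype(np.float32)   # batch of 2 columns

w_scale, x_scale = 0.05, 0.05
w_q = quantize(weights, w_scale)  # int8 weight matrix
x_q = quantize(inputs, x_scale)   # int8 activation matrix

# The core of a GEMM-based accelerator: one integer matrix multiply,
# accumulated in int32 to avoid overflow.
acc = w_q.astype(np.int32) @ x_q.astype(np.int32)

# Dequantize the int32 accumulator back to floats.
outputs = acc.astype(np.float32) * (w_scale * x_scale)
print(outputs)
```

Convolution layers map onto the same integer GEMM core once their inputs are unrolled (e.g., via im2col), which is what makes a single GEMM engine the regularized, modular building block the snippet refers to.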
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
        - G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
          - G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
            - G06F7/52—Multiplying; Dividing
              - G06F7/523—Multiplying only
                - G06F7/53—Multiplying only in parallel-parallel fashion, i.e. both operands being entered in parallel
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/10—Complex mathematical operations
          - G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
            - G06F17/141—Discrete Fourier transforms
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/50—Computer-aided design
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F15/00—Digital computers in general; Data processing equipment in general
        - G06F15/76—Architectures of general purpose stored programme computers
          - G06F15/78—Architectures of general purpose stored programme computers comprising a single central processing unit
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00—Computer systems based on biological models
        - G06N3/02—Computer systems based on biological models using neural network models
- H—ELECTRICITY
  - H03—BASIC ELECTRONIC CIRCUITRY
    - H03M—CODING; DECODING; CODE CONVERSION IN GENERAL
      - H03M7/00—Conversion of a code where information is represented by a given sequence or number of digits to a code where the same information or similar information or a subset of information is represented by a different sequence or number of digits
        - H03M7/30—Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
Similar Documents
Publication | Title
---|---
Jang et al. | Sparsity-aware and re-configurable NPU architecture for Samsung flagship mobile SoC
US20210374503A1 (en) | Network-centric architecture and algorithms to accelerate distributed training of neural networks
US20180197084A1 (en) | Convolutional neural network system having binary parameter and operation method thereof
WO2020057161A1 (en) | Split accumulator for convolutional neural network accelerator
CN110555516B (en) | Method for realizing low-delay hardware accelerator of YOLOv2-tiny neural network based on FPGA
CN110543939B (en) | Hardware acceleration realization device for convolutional neural network backward training based on FPGA
US11948069B2 (en) | Compression of neural network activation data
EP3637327B1 (en) | Computing device and method
CN110110852B (en) | Method for transplanting deep learning network to FPGA platform
Struharik et al. | CoNNA–Compressed CNN hardware accelerator
Piyasena et al. | Reducing dynamic power in streaming CNN hardware accelerators by exploiting computational redundancies
Li et al. | An efficient CNN accelerator using inter-frame data reuse of videos on FPGAs
Wu et al. | SkeletonGCN: A simple yet effective accelerator for GCN training
Niu et al. | SPEC2: Spectral sparse CNN accelerator on FPGAs
Zhan et al. | Field programmable gate array-based all-layer accelerator with quantization neural networks for sustainable cyber-physical systems
Sudrajat et al. | GEMM-Based Quantized Neural Network FPGA Accelerator Design
Xiao et al. | Research on FPGA-based convolutional neural network acceleration method
Zhou et al. | Design and implementation of YOLOv3-Tiny accelerator based on PYNQ-Z2 heterogeneous platform
Xiao et al. | A MobileNet accelerator with high processing-element efficiency on FPGA
Zhao et al. | HDSuper: High-quality and high computational utilization edge super-resolution accelerator with hardware-algorithm co-design techniques
Jo et al. | Bit-serial multiplier-based neural processing element with approximate adder tree
Li et al. | A 0.13 mJ/prediction CIFAR-100 raster-scan-based wired-logic processor using non-linear neural network
Sharma et al. | Hardware accelerator for object detection using tiny YOLO-v3
US20220121915A1 (en) | Configurable BNN ASIC using a network of programmable threshold logic standard cells
Huang et al. | A low-bit quantized and HLS-based neural network FPGA accelerator for object detection