Cao et al., 2023 - Google Patents
PP-Transformer: Enable Efficient Deployment of Transformers Through Pattern Pruning
- Document ID
- 321457810320365148
- Authors
- Cao J
- Lin X
- Zhang M
- Shi K
- Yu J
- Wang K
- Publication year
- 2023
- Publication venue
- 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD)
Snippet
Transformer models have been widely adopted in the field of Natural Language Processing (NLP) and Computer Vision (CV). However, the excellent performance of Transformers comes at the cost of heavy memory footprints and gigantic computing complexity. To deploy …
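The snippet cuts off before describing the method itself. As a rough illustration only, and as an assumption about what pattern-based pruning generally means rather than the paper's actual algorithm, the sketch below prunes each block of a weight matrix by choosing, from a small fixed set of candidate masks, the pattern that preserves the most weight magnitude. All names in the sketch are illustrative:

```python
import numpy as np

def pattern_prune(weights: np.ndarray, patterns: np.ndarray) -> np.ndarray:
    """Zero out each row of `weights` (one block per row, shape (R, B))
    using whichever 0/1 mask in `patterns` (shape (P, B)) preserves the
    largest total weight magnitude for that block.

    Hypothetical helper, not the paper's method."""
    pruned = np.empty_like(weights)
    for i, magnitudes in enumerate(np.abs(weights)):
        scores = patterns @ magnitudes         # magnitude kept by each mask, shape (P,)
        best_mask = patterns[np.argmax(scores)]
        pruned[i] = weights[i] * best_mask     # apply the winning pattern
    return pruned

# Example: 4-element blocks, three allowed patterns, each keeping 2 of 4 weights.
patterns = np.array([[1, 1, 0, 0],
                     [0, 1, 1, 0],
                     [0, 0, 1, 1]], dtype=np.float32)
W = np.random.randn(8, 4).astype(np.float32)
W_pruned = pattern_prune(W, patterns)
print((W_pruned != 0).sum(axis=1))  # every block keeps exactly 2 nonzeros
```

Constraining every block to one of a few masks, rather than allowing arbitrary unstructured sparsity, is what typically makes such pruning hardware-friendly: an accelerator only needs to support a handful of fixed sparse layouts, which is consistent with the sum-of-products and SIMD classifications listed below.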
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
        - G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
          - G06F7/48—using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
            - G06F7/52—Multiplying; Dividing
              - G06F7/523—Multiplying only
                - G06F7/53—Multiplying only in parallel-parallel fashion, i.e. both operands being entered in parallel
            - G06F7/544—for evaluating functions by calculation
              - G06F7/5443—Sum of products
      - G06F9/00—Arrangements for programme control, e.g. control unit
        - G06F9/06—using stored programme, i.e. using internal store of processing equipment to receive and retain programme
          - G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
            - G06F9/30003—Arrangements for executing specific machine instructions
              - G06F9/30007—to perform operations on data operands
            - G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
      - G06F15/00—Digital computers in general; Data processing equipment in general
        - G06F15/76—Architectures of general purpose stored programme computers
          - G06F15/80—comprising an array of processing units with common control, e.g. single instruction multiple data processors
            - G06F15/8007—single instruction multiple data [SIMD] multiprocessors
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/10—Complex mathematical operations
          - G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
        - G06F17/50—Computer-aided design
          - G06F17/5045—Circuit design
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00—Computer systems based on biological models
        - G06N3/02—using neural network models
Similar Documents
Publication | Title
---|---
Zhu et al. | An efficient hardware accelerator for structured sparse convolutional neural networks on FPGAs
Deng et al. | GoSPA: An energy-efficient high-performance globally optimized sparse convolutional neural network accelerator
Guo et al. | FBNA: A fully binarized neural network accelerator
Lu et al. | Evaluating fast algorithms for convolutional neural networks on FPGAs
Fang et al. | An algorithm–hardware co-optimized framework for accelerating N:M sparse transformers
Sun et al. | VAQF: Fully automatic software-hardware co-design framework for low-bit vision transformer
TWI795519B (en) | Computing apparatus, machine learning computing apparatus, combined processing device, neural network chip, electronic device, board, and method for performing machine learning calculation
Wang et al. | WinoNN: Optimizing FPGA-based convolutional neural network accelerators using sparse Winograd algorithm
CN110163357B (en) | Computing device and method
Dong et al. | HeatViT: Hardware-efficient adaptive token pruning for vision transformers
You et al. | RSNN: A software/hardware co-optimized framework for sparse convolutional neural networks on FPGAs
Zhang et al. | A low-latency FPGA implementation for real-time object detection
Sun et al. | A high-performance accelerator for large-scale convolutional neural networks
Cao et al. | PP-Transformer: Enable Efficient Deployment of Transformers Through Pattern Pruning
Wang et al. | A low-latency sparse-Winograd accelerator for convolutional neural networks
Zhang et al. | Achieving full parallelism in LSTM via a unified accelerator design
Que et al. | A reconfigurable multithreaded accelerator for recurrent neural networks
Shu et al. | High energy efficiency FPGA-based accelerator for convolutional neural networks using weight combination
Wong et al. | Low bitwidth CNN accelerator on FPGA using Winograd and block floating point arithmetic
Kwon et al. | Mobile Transformer accelerator exploiting various line sparsity and tile-based dynamic quantization
Liu et al. | Tcp-net: Minimizing operation counts of binarized neural network inference
Kang et al. | Design of convolution operation accelerator based on FPGA
Chen et al. | DSSA: Dual-side sparse systolic array architecture for accelerating convolutional neural network training
Singh et al. | A time domain 2D OaA-based convolutional neural networks accelerator
Kabir et al. | FAMOUS: Flexible Accelerator for the Attention Mechanism of Transformer on UltraScale+ FPGAs