Cao et al., 2023 - Google Patents
PP-Transformer: Enable Efficient Deployment of Transformers Through Pattern Pruning
- Document ID
- 321457810320365148
- Authors
- Cao J
- Lin X
- Zhang M
- Shi K
- Yu J
- Wang K
- Publication year
- 2023
- Publication venue
- 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD)
Snippet
Transformer models have been widely adopted in the field of Natural Language Processing (NLP) and Computer Vision (CV). However, the excellent performance of Transformers comes at the cost of heavy memory footprints and gigantic computing complexity. To deploy …
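The snippet cuts off before describing the method itself. As a rough illustration only, and as an assumption about what pattern-based pruning generally means rather than the paper's actual algorithm, the sketch below prunes each block of a weight matrix by choosing, from a small fixed set of candidate masks, the pattern that preserves the most weight magnitude. All names in the sketch are illustrative:

```python
import numpy as np

def pattern_prune(weights: np.ndarray, patterns: np.ndarray) -> np.ndarray:
    """Zero out each row of `weights` (one block per row, shape (R, B))
    using whichever 0/1 mask in `patterns` (shape (P, B)) preserves the
    largest total weight magnitude for that block.

    Hypothetical helper, not the paper's method."""
    pruned = np.empty_like(weights)
    for i, magnitudes in enumerate(np.abs(weights)):
        scores = patterns @ magnitudes         # magnitude kept by each mask, shape (P,)
        best_mask = patterns[np.argmax(scores)]
        pruned[i] = weights[i] * best_mask     # apply the winning pattern
    return pruned

# Example: 4-element blocks, three allowed patterns, each keeping 2 of 4 weights.
patterns = np.array([[1, 1, 0, 0],
                     [0, 1, 1, 0],
                     [0, 0, 1, 1]], dtype=np.float32)
W = np.random.randn(8, 4).astype(np.float32)
W_pruned = pattern_prune(W, patterns)
print((W_pruned != 0).sum(axis=1))  # every block keeps exactly 2 nonzeros
```

Constraining every block to one of a few masks, rather than allowing arbitrary unstructured sparsity, is what typically makes such pruning hardware-friendly: an accelerator only needs to support a handful of fixed sparse layouts, which is consistent with the sum-of-products and SIMD classifications listed below.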
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING; COUNTING
    - G06F—ELECTRICAL DIGITAL DATA PROCESSING
      - G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
        - G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
          - G06F7/48—using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
            - G06F7/52—Multiplying; Dividing
              - G06F7/523—Multiplying only
                - G06F7/53—Multiplying only in parallel-parallel fashion, i.e. both operands being entered in parallel
            - G06F7/544—for evaluating functions by calculation
              - G06F7/5443—Sum of products
      - G06F9/00—Arrangements for programme control, e.g. control unit
        - G06F9/06—using stored programme, i.e. using internal store of processing equipment to receive and retain programme
          - G06F9/30—Arrangements for executing machine-instructions, e.g. instruction decode
            - G06F9/30003—Arrangements for executing specific machine instructions
              - G06F9/30007—to perform operations on data operands
            - G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
      - G06F15/00—Digital computers in general; Data processing equipment in general
        - G06F15/76—Architectures of general purpose stored programme computers
          - G06F15/80—comprising an array of processing units with common control, e.g. single instruction multiple data processors
            - G06F15/8007—single instruction multiple data [SIMD] multiprocessors
      - G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
        - G06F17/10—Complex mathematical operations
          - G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
        - G06F17/50—Computer-aided design
          - G06F17/5045—Circuit design
    - G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00—Computer systems based on biological models
        - G06N3/02—using neural network models
Similar Documents
Publication | Title
---|---
Zhu et al. | An efficient hardware accelerator for structured sparse convolutional neural networks on FPGAs
Deng et al. | GoSPA: An energy-efficient high-performance globally optimized sparse convolutional neural network accelerator
Guo et al. | FBNA: A fully binarized neural network accelerator
Lu et al. | Evaluating fast algorithms for convolutional neural networks on FPGAs
Fang et al. | An algorithm–hardware co-optimized framework for accelerating N:M sparse transformers
Sun et al. | VAQF: Fully automatic software-hardware co-design framework for low-bit vision transformer
TWI795519B (en) | Computing apparatus, machine learning computing apparatus, combined processing device, neural network chip, electronic device, board, and method for performing machine learning calculation
Wang et al. | WinoNN: Optimizing FPGA-based convolutional neural network accelerators using sparse Winograd algorithm
CN110163357B (en) | Computing device and method
Dong et al. | HeatViT: Hardware-efficient adaptive token pruning for vision transformers
You et al. | RSNN: A software/hardware co-optimized framework for sparse convolutional neural networks on FPGAs
Zhang et al. | A low-latency FPGA implementation for real-time object detection
Sun et al. | A high-performance accelerator for large-scale convolutional neural networks
Cao et al. | PP-Transformer: Enable Efficient Deployment of Transformers Through Pattern Pruning
Wang et al. | A low-latency sparse-Winograd accelerator for convolutional neural networks
Zhang et al. | Achieving full parallelism in LSTM via a unified accelerator design
Que et al. | A reconfigurable multithreaded accelerator for recurrent neural networks
Shu et al. | High energy efficiency FPGA-based accelerator for convolutional neural networks using weight combination
Wong et al. | Low bitwidth CNN accelerator on FPGA using Winograd and block floating point arithmetic
Kwon et al. | Mobile Transformer accelerator exploiting various line sparsity and tile-based dynamic quantization
Liu et al. | Tcp-net: Minimizing operation counts of binarized neural network inference
Kang et al. | Design of convolution operation accelerator based on FPGA
Chen et al. | DSSA: Dual-side sparse systolic array architecture for accelerating convolutional neural network training
Singh et al. | A time domain 2D OaA-based convolutional neural networks accelerator
Kabir et al. | FAMOUS: Flexible Accelerator for the Attention Mechanism of Transformer on UltraScale+ FPGAs