PP-Transformer: Enable Efficient Deployment of Transformers Through Pattern Pruning

Cao et al., 2023

Document ID
321457810320365148
Author
Cao J
Lin X
Zhang M
Shi K
Yu J
Wang K
Publication year
2023
Publication venue
2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD)

Snippet

Transformer models have been widely adopted in the field of Natural Language Processing (NLP) and Computer Vision (CV). However, the excellent performance of Transformers comes at the cost of heavy memory footprints and gigantic computing complexity. To deploy …
Continue reading at ieeexplore.ieee.org
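
The snippet breaks off before describing the method itself, so the sketch below illustrates only the general idea behind pattern pruning as the term is used in the pruning literature: each fixed-size block of weights keeps exactly the entries selected by one mask from a small pattern dictionary, chosen to preserve the most weight magnitude, so the hardware only ever sees a handful of sparsity shapes. This is a minimal generic illustration, not PP-Transformer's algorithm; the 4-wide blocks, the four 2-of-4 masks, and the names PATTERNS and pattern_prune are all illustrative assumptions.

```python
# Generic pattern-pruning sketch (illustrative; not the paper's method).
# Every block of 4 consecutive weights is forced to match one of four
# 2-of-4 masks; the mask keeping the most weight magnitude wins.
import numpy as np

PATTERNS = np.array([   # allowed masks over a 4-weight block (50% sparsity each)
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
], dtype=np.float32)

def pattern_prune(w: np.ndarray, block: int = 4) -> np.ndarray:
    """Prune `w` block-by-block; `block` must match the pattern width."""
    flat = w.reshape(-1, block)             # rows of `block` weights
    scores = np.abs(flat) @ PATTERNS.T      # magnitude retained by each pattern
    best = PATTERNS[scores.argmax(axis=1)]  # best mask per block
    return (flat * best).reshape(w.shape)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((8, 8)).astype(np.float32)
    pruned = pattern_prune(w)
    print("sparsity:", float((pruned == 0).mean()))  # ~0.5 by construction
```

By construction the result is 50% sparse, and because every block matches one of only four masks, an accelerator can encode each block's pattern as a 2-bit index instead of a per-element mask.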

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING; COUNTING
    • G06F: ELECTRICAL DIGITAL DATA PROCESSING
    • G06F7/00: Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38: Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48: Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52: Multiplying; Dividing
    • G06F7/523: Multiplying only
    • G06F7/53: Multiplying only in parallel-parallel fashion, i.e. both operands being entered in parallel
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING; COUNTING
    • G06F: ELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for programme control, e.g. control unit
    • G06F9/06: Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/30: Arrangements for executing machine-instructions, e.g. instruction decode
    • G06F9/30003: Arrangements for executing specific machine instructions
    • G06F9/30007: Arrangements for executing specific machine instructions to perform operations on data operands
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING; COUNTING
    • G06F: ELECTRICAL DIGITAL DATA PROCESSING
    • G06F7/00: Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38: Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48: Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/544: Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices for evaluating functions by calculation
    • G06F7/5443: Sum of products
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING; COUNTING
    • G06F: ELECTRICAL DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for programme control, e.g. control unit
    • G06F9/06: Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
    • G06F9/30: Arrangements for executing machine-instructions, e.g. instruction decode
    • G06F9/38: Concurrent instruction execution, e.g. pipeline, look ahead
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING; COUNTING
    • G06F: ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/50: Computer-aided design
    • G06F17/5045: Circuit design
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING; COUNTING
    • G06F: ELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING; COUNTING
    • G06F: ELECTRICAL DIGITAL DATA PROCESSING
    • G06F15/00: Digital computers in general; Data processing equipment in general
    • G06F15/76: Architectures of general purpose stored programme computers
    • G06F15/80: Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G06F15/8007: Architectures of general purpose stored programme computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING; COUNTING
    • G06N: COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computer systems based on biological models
    • G06N3/02: Computer systems based on biological models using neural network models

Similar Documents

Publication · Publication Date · Title
Zhu et al. An efficient hardware accelerator for structured sparse convolutional neural networks on FPGAs
Deng et al. GoSPA: An energy-efficient high-performance globally optimized sparse convolutional neural network accelerator
Guo et al. FBNA: A fully binarized neural network accelerator
Lu et al. Evaluating fast algorithms for convolutional neural networks on FPGAs
Fang et al. An algorithm–hardware co-optimized framework for accelerating N:M sparse transformers
Sun et al. VAQF: Fully automatic software-hardware co-design framework for low-bit vision transformer
TWI795519B (en) Computing apparatus, machine learning computing apparatus, combined processing device, neural network chip, electronic device, board, and method for performing machine learning calculation
Wang et al. WinoNN: Optimizing FPGA-based convolutional neural network accelerators using sparse Winograd algorithm
CN110163357B (en) Computing device and method
Dong et al. HeatViT: Hardware-efficient adaptive token pruning for vision transformers
You et al. RSNN: A software/hardware co-optimized framework for sparse convolutional neural networks on FPGAs
Zhang et al. A low-latency FPGA implementation for real-time object detection
Sun et al. A high-performance accelerator for large-scale convolutional neural networks
Cao et al. PP-Transformer: Enable Efficient Deployment of Transformers Through Pattern Pruning
Wang et al. A low-latency sparse-winograd accelerator for convolutional neural networks
Zhang et al. Achieving full parallelism in LSTM via a unified accelerator design
Que et al. A reconfigurable multithreaded accelerator for recurrent neural networks
Shu et al. High energy efficiency FPGA-based accelerator for convolutional neural networks using weight combination
Wong et al. Low bitwidth CNN accelerator on FPGA using Winograd and block floating point arithmetic
Kwon et al. Mobile Transformer Accelerator Exploiting Various Line Sparsity and Tile-Based Dynamic Quantization
Liu et al. TCP-Net: Minimizing operation counts of binarized neural network inference
Kang et al. Design of convolution operation accelerator based on FPGA
Chen et al. DSSA: Dual-Side Sparse Systolic Array Architecture for Accelerating Convolutional Neural Network Training
Singh et al. A time domain 2D OaA-based convolutional neural networks accelerator
Kabir et al. FAMOUS: Flexible Accelerator for the Attention Mechanism of Transformer on UltraScale+ FPGAs