research-article

NN-LUT: neural approximation of non-linear operations for efficient transformer inference

Authors:

Jungwook Choi

DAC '22: Proceedings of the 59th ACM/IEEE Design Automation Conference

Pages 577 - 582

https://doi.org/10.1145/3489517.3530505

Published: 23 August 2022

Abstract

Non-linear operations such as GELU, layer normalization, and softmax are essential yet costly building blocks of Transformer models. Several prior works simplified these operations with look-up tables or integer computations, but such approximations suffer from inferior accuracy or incur considerable hardware cost and long latency. This paper proposes an accurate and hardware-friendly approximation framework for efficient Transformer inference. Our framework employs a simple neural network as a universal approximator, with its structure equivalently transformed into a look-up table (LUT). The proposed framework, called neural-network-generated LUT (NN-LUT), can accurately replace all the non-linear operations in popular BERT models with significant reductions in area, power consumption, and latency.
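
The abstract does not give the network architecture or the conversion procedure, so the following is only a minimal sketch under stated assumptions. It assumes a one-hidden-layer ReLU network with a scalar input, and it sets the weights by interpolating GELU at uniform knots instead of training (the helper `fit_relu_network` is a hypothetical stand-in for that step, as are all other names below). Such a network is piecewise linear, so it can be rewritten exactly as a table of breakpoints with one slope and intercept per segment, which is one plausible reading of "equivalently transformed into a look-up table".

```python
import bisect
import math


def gelu(x):
    """Exact GELU, used here only as the target function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def nn_forward(x, m, b, n):
    """One-hidden-layer ReLU network: sum_i n[i] * relu(m[i] * x + b[i])."""
    return sum(ni * max(0.0, mi * x + bi) for mi, bi, ni in zip(m, b, n))


def fit_relu_network(f, lo, hi, k, tail_slope):
    """Hypothetical stand-in for training: interpolate f at k uniform knots.

    The output is 0 below `lo` and continues with `tail_slope` above `hi`.
    """
    knots = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    slopes = [(f(r) - f(l)) / (r - l) for l, r in zip(knots, knots[1:])]
    slopes.append(tail_slope)
    m = [1.0] * k
    b = [-d for d in knots]
    # Each neuron adds the change in slope that occurs at its knot.
    n = [slopes[0]] + [s1 - s0 for s0, s1 in zip(slopes, slopes[1:])]
    return m, b, n


def nn_to_lut(m, b, n):
    """Rewrite the network as sorted breakpoints plus (slope, intercept) rows."""
    breakpoints = sorted({-bi / mi for mi, bi in zip(m, b) if mi != 0.0})
    # One probe point strictly inside every segment, including both tails.
    probes = [breakpoints[0] - 1.0]
    probes += [(l + r) / 2.0 for l, r in zip(breakpoints, breakpoints[1:])]
    probes.append(breakpoints[-1] + 1.0)
    table = []
    for p in probes:
        # A neuron contributes to a segment only if its ReLU is active there.
        active = [(mi, bi, ni) for mi, bi, ni in zip(m, b, n) if mi * p + bi > 0.0]
        slope = sum(ni * mi for mi, bi, ni in active)
        intercept = sum(ni * bi for mi, bi, ni in active)
        table.append((slope, intercept))
    return breakpoints, table


def lut_eval(x, breakpoints, table):
    """Evaluate with one segment search, one multiply, and one add."""
    slope, intercept = table[bisect.bisect_right(breakpoints, x)]
    return slope * x + intercept


m, b, n = fit_relu_network(gelu, -4.0, 4.0, 16, tail_slope=1.0)
breakpoints, table = nn_to_lut(m, b, n)
xs = [i / 100.0 for i in range(-600, 601)]
max_gap = max(abs(nn_forward(x, m, b, n) - lut_eval(x, breakpoints, table)) for x in xs)
max_err = max(abs(gelu(x) - lut_eval(x, breakpoints, table)) for x in xs)
print(f"network vs. table gap: {max_gap:.2e}")
print(f"table vs. GELU error:  {max_err:.2e}")
```

In this sketch the table reproduces the network up to floating-point rounding, so any accuracy gained by fitting the network carries over to the table. Uniform knots are a deliberately crude choice; a trained network would be free to place its breakpoints where the target function curves most.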


Published In

DAC '22: Proceedings of the 59th ACM/IEEE Design Automation Conference

July 2022

1462 pages

ISBN: 9781450391429

DOI: 10.1145/3489517

General Chair:
Rob Oshana
NXP

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

Ministry of Trade, Industry and Energy (MOTIE, Korea)

Conference

DAC '22

Sponsor:

SIGDA

DAC '22: 59th ACM/IEEE Design Automation Conference

July 10 - 14, 2022

San Francisco, California

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%
