Computer Science > Hardware Architecture

arXiv:2407.02362 (cs)

[Submitted on 2 Jul 2024 (v1), last revised 7 Jul 2024 (this version, v2)]

Title: Fast, Scalable, Energy-Efficient Non-element-wise Matrix Multiplication on FPGA

Authors: Xuqi Zhu, Huaizhi Zhang, JunKyu Lee, Jiacheng Zhu, Chandrajit Pal, Sangeet Saha, Klaus D. McDonald-Maier, Xiaojun Zhai

Abstract: Modern Neural Network (NN) architectures rely heavily on vast numbers of multiply-accumulate operations, which constitute their predominant computational cost. This paper therefore proposes a high-throughput, scalable, and energy-efficient non-element-wise matrix multiplication unit on FPGAs as a basic component of NNs. We first streamline the inter-layer and intra-layer redundancies of the MADDNESS algorithm, a LUT-based approximate matrix multiplication method, to design a fast, efficient, and scalable approximate matrix multiplication module termed the "Approximate Multiplication Unit (AMU)". The AMU further optimizes LUT-based matrix multiplication through dedicated memory management and access design, decoupling computational overhead from input resolution and significantly boosting the efficiency of FPGA-based NN accelerators. Experimental results show that the AMU achieves up to 9x higher throughput and 112x higher energy efficiency than state-of-the-art solutions for FPGA-based Quantised Neural Network (QNN) accelerators.
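To make the idea of LUT-based approximate matrix multiplication concrete, the following is a minimal software sketch, not the paper's AMU and not the MADDNESS algorithm itself. It assumes a generic product-quantization scheme: the input vector is split into subspaces, each chunk is mapped to the index of its nearest prototype, and the output is formed by summing precomputed table rows. The function names (`build_luts`, `encode`, `approx_matvec`) and the toy prototypes are hypothetical. MADDNESS replaces the nearest-prototype search used here with a learned hashing step, which this sketch does not reproduce.

```python
def build_luts(prototypes, weights):
    """Precompute, per subspace, the dot product of every prototype
    with the matching rows of the weight matrix (weights is D x N)."""
    n_out = len(weights[0])
    luts, offset = [], 0
    for protos in prototypes:
        sub_dim = len(protos[0])
        rows = weights[offset:offset + sub_dim]
        luts.append([
            [sum(p[i] * rows[i][n] for i in range(sub_dim)) for n in range(n_out)]
            for p in protos
        ])
        offset += sub_dim
    return luts


def encode(x, prototypes):
    """Map each chunk of x to the index of its nearest prototype
    (squared Euclidean distance; a stand-in for a hash-based encoder)."""
    codes, offset = [], 0
    for protos in prototypes:
        sub_dim = len(protos[0])
        chunk = x[offset:offset + sub_dim]
        dists = [sum((a - b) ** 2 for a, b in zip(chunk, p)) for p in protos]
        codes.append(dists.index(min(dists)))
        offset += sub_dim
    return codes


def approx_matvec(x, prototypes, luts):
    """Approximate x @ weights using only table lookups and additions."""
    out = [0.0] * len(luts[0][0])
    for table, k in zip(luts, encode(x, prototypes)):
        for n, value in enumerate(table[k]):
            out[n] += value
    return out


# Toy setup: 4-dimensional input, 2 subspaces, 2 prototypes each (all assumed).
prototypes = [[[0, 0], [1, 1]], [[0, 1], [1, 0]]]
weights = [[1, 2], [3, 4], [5, 6], [7, 8]]
luts = build_luts(prototypes, weights)

# This input coincides with prototypes, so the result equals the exact product.
print(approx_matvec([1, 1, 0, 1], prototypes, luts))  # prints [11.0, 14.0]
```

In this sketch the weights are touched only when the tables are built; at inference time the cost per output is one lookup and one addition per subspace, regardless of how many bits the input values carry. Inputs that do not coincide with a prototype produce an approximation whose error depends on how well the prototypes cover the data.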

Subjects:	Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2407.02362 [cs.AR]
	(or arXiv:2407.02362v2 [cs.AR] for this version)
	https://doi.org/10.48550/arXiv.2407.02362

Submission history

From: Xuqi Zhu
[v1] Tue, 2 Jul 2024 15:28:10 UTC (1,121 KB)
[v2] Sun, 7 Jul 2024 17:20:51 UTC (1,121 KB)
