A Comparison of FPGA Implementations of Bit-Level and Word-Level Matrix Multipliers

Radhika S. Grover, Weijia Shang & Qiang Li

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1896)

Included in the following conference series:

International Workshop on Field Programmable Logic and Applications

Abstract

We have implemented a novel bit-level matrix multiplier on a Xilinx FPGA chip. Each processing element performs a simple operation: it adds three to six bits to generate one partial-sum bit and one or two carry-out bits. A speedup over word-level designs is possible because, in a bit-level architecture, the individual bits of a word do not have to be processed as a unit. Previous work has shown that bit-level architectures for fixed-point applications can be O(log p) times faster than the corresponding word-level architectures, where p is the word length. In this paper, we compare the bit-level matrix multiplier with a word-level matrix multiplier composed of the highly optimized multiplier and adder macros available in the Xilinx CORE Generator library. The architecture presented here is faster than previous bit-level designs because it halves the critical path in the dependence graph. Our results show that a speedup by a factor of 2 can be obtained in practice.
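
The processing-element operation described in the abstract can be illustrated with a minimal software sketch. It is not the authors' hardware design: the function names `pe_add` and `bit_level_dot` are hypothetical, the operands are assumed to be unsigned, and the code models only the arithmetic, not the processor array, its schedule, or the dependence graph.

```python
def pe_add(bits):
    """Hypothetical processing element: add three to six single-bit inputs.

    Returns (sum_bit, carry_lo, carry_hi). With three inputs the total is at
    most 3, so carry_hi is always 0 (one carry-out bit). With four to six
    inputs the total can reach 6, so both carry-out bits may be needed.
    """
    if not 3 <= len(bits) <= 6:
        raise ValueError("this sketch assumes three to six input bits")
    if any(b not in (0, 1) for b in bits):
        raise ValueError("inputs must be single bits (0 or 1)")
    total = sum(bits)            # value in 0..6, fits in three bits
    sum_bit = total & 1          # partial-sum bit (weight 1)
    carry_lo = (total >> 1) & 1  # first carry-out bit (weight 2)
    carry_hi = (total >> 2) & 1  # second carry-out bit (weight 4)
    return sum_bit, carry_lo, carry_hi


def bit_level_dot(a, b, p):
    """Inner product of two vectors of unsigned p-bit words.

    Every word product is expanded into p * p single-bit products, each
    weighted by 2**(i + j), so no word is ever handled as a unit.
    """
    acc = 0
    for x, y in zip(a, b):
        for i in range(p):
            for j in range(p):
                bit_product = ((x >> i) & 1) & ((y >> j) & 1)
                acc += bit_product << (i + j)
    return acc
```

One entry of a matrix product is an inner product of a row and a column, so `bit_level_dot([3, 5], [7, 2], 4)` computes 3·7 + 5·2 from single-bit operations only. In hardware, the accumulation written here as `acc +=` would be carried out by many elements of the `pe_add` kind working in parallel.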

References

C. W. Li and B. W. Wah, “Optimal Bit-Level Processor Arrays for Matrix Multiplication,” Proc. 11th Int’l Conf. on Systems Engineering, Univ. of Nevada, NV, July 1996, pp. 596–601.
W. Shang and B. W. Wah, “Dependence Analysis and Architecture Design for Bit-Level Algorithms,” Int’l Conf. on Parallel Processing, vol. I, 1993, pp. 30–38.
R. S. Grover, W. Shang, and Q. Li, “An Improved Architecture for Bit-Level Matrix Multiplication,” to be published in Conf. Proc. PDPTA ’2000, Las Vegas, USA, June 26–29, 2000.
R. S. Grover, W. Shang, and Q. Li, Technical Report coen00-01, Department of Computer Engineering, Santa Clara University, Santa Clara, CA 95053, May 2000.
Z. Yang, W. Shang, and J. A. B. Fortes, “Conflict-Free Scheduling of Nested Loop Algorithms on Lower Dimensional Processor Arrays,” Proc. 6th IEEE Int’l Parallel Processing Symposium, Beverly Hills, CA, March 1992, pp. 156–164.
W. Shang and J. A. B. Fortes, “On Mapping of Uniform Dependence Algorithms into Lower Dimensional Processor Arrays,” IEEE Trans. on Parallel and Distributed Systems, Vol. 3, No. 3, May 1992, pp. 350–363.
J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann Publishers, Inc., 1990.
XC4000E and XC4000X FPGA Series — Description v 1.6, Xilinx Data Book, 1999.
CORE Generator & IP Modules — Documentation & Data Sheets, Xilinx, 1999.

Author information

Authors and Affiliations

Department of Computer Engineering, Santa Clara University, Santa Clara, CA, USA
Radhika S. Grover, Weijia Shang & Qiang Li

Editor information

Editors and Affiliations

Computer Science Department, University of Kaiserslautern, P. O. Box. 30 49, 67653, Kaiserslautern, Germany
Reiner W. Hartenstein
Carinthia Tech Institute, Richard-Wagner-Str. 19, 9500, Villach, Austria
Herbert Grünbacher

About this paper

Cite this paper

Grover, R.S., Shang, W., Li, Q. (2000). A Comparison of FPGA Implementations of Bit-Level and Word-Level Matrix Multipliers. In: Hartenstein, R.W., Grünbacher, H. (eds) Field-Programmable Logic and Applications: The Roadmap to Reconfigurable Computing. FPL 2000. Lecture Notes in Computer Science, vol 1896. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44614-1_45

DOI: https://doi.org/10.1007/3-540-44614-1_45
Published: 12 April 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67899-1
Online ISBN: 978-3-540-44614-9
eBook Packages: Springer Book Archive
