Abstract
We have implemented a novel bit-level matrix multiplier on a Xilinx FPGA chip in which each processing element performs a simple operation: adding three to six bits to generate one partial-sum bit and one or two carry-out bits. The speedup over a word-level architecture is possible because, in a bit-level architecture, the individual bits of a word do not have to be processed as a unit. Previous work has shown that bit-level architectures for fixed-point applications can be O(log p) times faster than the corresponding word-level architecture, where p is the word length. In this paper, we compare this FPGA implementation with a word-level matrix multiplier composed of highly optimized multiplier and adder macros available in the Xilinx CORE Generator library. The architecture presented here is faster than previous ones because it cuts the critical path in the dependence graph in half. Our results show that a speedup by a factor of 2 can be obtained in practice.
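As a rough illustration of the processing-element operation described in the abstract, the following is a minimal software sketch, not the authors' hardware design. The function name `processing_element`, the list-based interface, and the least-significant-first ordering of the carry bits are assumptions made here for illustration only. It adds three to six input bits and splits the result into one partial-sum bit and one or two carry-out bits.

```python
def processing_element(bits):
    """Add three to six input bits (hypothetical software model of one PE).

    Returns (sum_bit, carry_bits), where carry_bits is listed
    least-significant first (an assumption of this sketch).
    """
    if not 3 <= len(bits) <= 6:
        raise ValueError("expected three to six input bits")
    if any(b not in (0, 1) for b in bits):
        raise ValueError("every input must be 0 or 1")

    total = sum(bits)      # ranges over 0..6, so it fits in three bits
    sum_bit = total & 1    # the partial-sum bit stays in this bit position
    carry = total >> 1     # the remainder moves to higher bit positions

    # Three inputs give a total of at most 3, so one carry bit is enough;
    # four to six inputs can reach a total of 4..6 and need two carry bits.
    carry_width = 1 if len(bits) == 3 else 2
    carry_bits = [(carry >> i) & 1 for i in range(carry_width)]
    return sum_bit, carry_bits


if __name__ == "__main__":
    print(processing_element([1, 1, 1]))           # three inputs
    print(processing_element([1, 1, 1, 1, 1, 1]))  # six inputs
```

In this model the input count is reconstructed as `sum_bit + 2 * carry_bits[0] + 4 * carry_bits[1]`, which is why no more than two carry-out bits are ever needed for six inputs.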
© 2000 Springer-Verlag Berlin Heidelberg
Grover, R.S., Shang, W., Li, Q. (2000). A Comparison of FPGA Implementations of Bit-Level and Word-Level Matrix Multipliers. In: Hartenstein, R.W., Grünbacher, H. (eds) Field-Programmable Logic and Applications: The Roadmap to Reconfigurable Computing. FPL 2000. Lecture Notes in Computer Science, vol 1896. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44614-1_45
DOI: https://doi.org/10.1007/3-540-44614-1_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67899-1
Online ISBN: 978-3-540-44614-9
eBook Packages: Springer Book Archive