research-article

Low-Power Multiple-Precision Iterative Floating-Point Multiplier with SIMD Support

Published: 01 February 2009

Abstract

The demand for improved SIMD floating-point performance on general-purpose x86-compatible microprocessors is rising. At the same time, there is a conflicting demand in the low-power computing market for a reduction in power consumption. Along with this, there is the absolute necessity of backward compatibility for x86-compatible microprocessors, which includes the support of x87 scientific floating-point instructions. The combined effect is that there is a need for low-power, low-cost floating-point units that are still capable of delivering good SIMD performance while maintaining full x86 functionality. This paper presents the design of an x86-compatible floating-point multiplier (FPM) that is compliant with the IEEE-754 Standard for Binary Floating-Point Arithmetic [12] and is specifically tailored to provide good SIMD performance in a low-cost, low-power solution while maintaining full x87 backward compatibility. The FPM efficiently supports multiple precisions using an iterative rectangular multiplier. The FPM can perform two parallel single-precision multiplies every cycle with a latency of two cycles, one double-precision multiply every two cycles with a latency of four cycles, or one extended-double-precision multiply every three cycles with a latency of five cycles. The iterative FPM also supports division, square-root, and transcendental functions. Compared to a previous design with similar functionality, the proposed iterative FPM has 60 percent less area and 59 percent less dynamic power dissipation.
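The issue rates and latencies quoted in the abstract can be turned into a small back-of-the-envelope model. The sketch below is illustrative only and is not taken from the paper: it assumes independent multiplies issued back to back with no stalls, and the names `Mode`, `MODES`, `cycles_for`, and `peak_throughput` are hypothetical. Only the per-mode numbers come from the abstract.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class Mode:
    """Timing of one precision mode, using the figures quoted in the abstract."""
    ops_per_issue: int   # multiplies started together
    issue_interval: int  # cycles between successive issues
    latency: int         # cycles from issue to result


# Numbers from the abstract; the dictionary keys are illustrative labels.
MODES = {
    "single": Mode(ops_per_issue=2, issue_interval=1, latency=2),
    "double": Mode(ops_per_issue=1, issue_interval=2, latency=4),
    "extended": Mode(ops_per_issue=1, issue_interval=3, latency=5),
}


def cycles_for(mode: Mode, n_multiplies: int) -> int:
    """Cycles to finish n independent multiplies.

    Assumption (not from the paper): issues are back to back with no stalls,
    so the last issue starts (issues - 1) * issue_interval cycles after the
    first and completes one latency later.
    """
    if n_multiplies <= 0:
        return 0
    issues = ceil(n_multiplies / mode.ops_per_issue)
    return (issues - 1) * mode.issue_interval + mode.latency


def peak_throughput(mode: Mode) -> float:
    """Steady-state multiplies per cycle."""
    return mode.ops_per_issue / mode.issue_interval


if __name__ == "__main__":
    for name, mode in MODES.items():
        print(name, cycles_for(mode, 8), peak_throughput(mode))
```

Under these assumptions, the steady-state rates are 2 single-precision, 1/2 double-precision, and 1/3 extended-double-precision multiplies per cycle. Real code with dependent operations, or with division and square-root sharing the unit, would see lower rates.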


Published In

IEEE Transactions on Computers, Volume 58, Issue 2
February 2009
144 pages

Publisher

IEEE Computer Society

United States

Author Tags

  1. Algorithms
  2. Arithmetic and Logic Structures
  3. Computer arithmetic
  4. Cost/performance
  5. High-Speed Arithmetic
  6. Parallel
  7. Pipeline
  8. floating-point arithmetic
  9. low-power
  10. multimedia
  11. multiplying circuits
  12. rectangular multiplier
  13. very-large-scale integration

Qualifiers

  • Research-article

Cited By

  • (2024) M3XU: Achieving High-Precision and Complex Matrix Multiplication with Low-Precision MXUs. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, pp. 1-16. DOI: 10.1109/SC41406.2024.00016. Online publication date: 17-Nov-2024
  • (2020) New Flexible Multiple-Precision Multiply-Accumulate Unit for Deep Neural Network Training and Inference. IEEE Transactions on Computers 69:1, pp. 26-38. DOI: 10.1109/TC.2019.2936192. Online publication date: 3-Jan-2020
  • (2020) Implementation of high precision/low latency FP divider using Urdhva–Tiryakbhyam multiplier for SoC applications. Design Automation for Embedded Systems 24:2, pp. 111-125. DOI: 10.1007/s10617-019-09225-2. Online publication date: 1-Jun-2020
  • (2018) Throughput enhancement of SISO parallel LTE turbo decoders using floating point turbo decoding algorithm. International Journal of Wireless and Mobile Computing 15:1, pp. 58-66. DOI: 10.5555/3282783.3282791. Online publication date: 1-Jan-2018
  • (2014) Ultra-low-power adder stage design for exascale floating point units. ACM Transactions on Embedded Computing Systems 13:3s, pp. 1-24. DOI: 10.1145/2567932. Online publication date: 28-Mar-2014
  • (2013) An exact method for estimating maximum errors of multi-mode floating-point iterative booth multiplier. International Journal of Computational Science and Engineering 8:4, pp. 306-315. DOI: 10.1504/IJCSE.2013.057295. Online publication date: 1-Oct-2013
  • (2013) Energy-Efficient Multiple-Precision Floating-Point Multiplier for Embedded Applications. Journal of Signal Processing Systems 72:1, pp. 43-55. DOI: 10.1007/s11265-012-0695-1. Online publication date: 1-Jul-2013
