
Xvpfloat: RISC-V ISA Extension for Variable Extended Precision Floating Point Computation

Published: 02 April 2024

Abstract

A key concern in scientific computation is the convergence of numerical solvers when applied to large problems. The numerical workarounds used to improve convergence are often problem specific, time consuming, and require skilled numerical analysts. An alternative is simply to increase the working precision of the computation, but this is difficult due to the lack of efficient hardware support for extended precision. We propose Xvpfloat, a RISC-V ISA extension for dynamically variable and extended precision computation, together with a hardware implementation and a full software stack. Our architecture provides a comprehensive implementation of this ISA, with up to 512 bits of significand, including full support for common rounding modes and heterogeneous-precision arithmetic operations. The memory subsystem handles IEEE 754 extendable formats, and features specialized indexed loads and stores with hardware-assisted prefetching. The processor can operate either standalone or as an accelerator for a general-purpose host. We demonstrate that the number of solver iterations can be reduced by up to 5×, and that for certain difficult problems, convergence is only possible at very high precision (≥ 384 bits). This accelerator offers a new approach to accelerating large-scale scientific computing.
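The abstract's core argument — that raising the working precision can rescue a computation where binary64 silently loses information — can be illustrated with a minimal software sketch. This uses Python's stdlib `decimal` module purely as a stand-in for variable-precision arithmetic; it is an illustration of the precision effect, not the paper's binary extended-precision hardware, and the function name `delta` is ours.

```python
from decimal import Decimal, getcontext

def delta(prec: int) -> Decimal:
    """Compute (10^16 + 1) - 10^16 at `prec` significant decimal digits."""
    getcontext().prec = prec
    big = Decimal(10) ** 16
    return (big + Decimal(1)) - big

# binary64 cannot represent 10^16 + 1 exactly: the gap between adjacent
# doubles at that magnitude is 2, so the +1 is absorbed by rounding.
lost = (1e16 + 1.0) - 1e16   # not 1.0

# With enough working precision (17+ significant digits here),
# the same expression is computed exactly.
exact = delta(20)            # Decimal('1')
```

The same mechanism underlies the solver results in the abstract: residuals and updates that cancel catastrophically at 64 bits remain meaningful at higher working precision, so iterative methods keep making progress instead of stagnating.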



Published In

IEEE Transactions on Computers, Volume 73, Issue 7
July 2024
243 pages

Publisher

IEEE Computer Society

United States


Qualifiers

  • Research-article
