
Xvpfloat: RISC-V ISA Extension for Variable Extended Precision Floating Point Computation

Published: 02 April 2024

Abstract

A key concern in scientific computation is the convergence of numerical solvers when applied to large problems. The numerical workarounds used to improve convergence are often problem specific, time consuming, and require skilled numerical analysts. An alternative is simply to increase the working precision of the computation, but this is difficult due to the lack of efficient hardware support for extended precision. We propose Xvpfloat, a RISC-V ISA extension for dynamically variable and extended precision computation, together with a hardware implementation and a full software stack. Our architecture provides a comprehensive implementation of this ISA, with up to 512 bits of significand, including full support for common rounding modes and heterogeneous-precision arithmetic operations. The memory subsystem handles IEEE 754 extendable formats, and features specialized indexed loads and stores with hardware-assisted prefetching. The processor can operate either standalone or as an accelerator for a general-purpose host. We demonstrate that the number of solver iterations can be reduced by up to 5×, and that for certain difficult problems, convergence is only possible at very high precision (≥ 384 bits). This accelerator offers a new approach to accelerating large-scale scientific computing.
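The abstract's core argument — that raising the working precision can rescue a computation where binary64 silently loses information — can be illustrated with a minimal software sketch. This uses Python's stdlib `decimal` module purely as a stand-in for variable-precision arithmetic; it is an illustration of the precision effect, not the paper's binary extended-precision hardware, and the function name `delta` is ours.

```python
from decimal import Decimal, getcontext

def delta(prec: int) -> Decimal:
    """Compute (10^16 + 1) - 10^16 at `prec` significant decimal digits."""
    getcontext().prec = prec
    big = Decimal(10) ** 16
    return (big + Decimal(1)) - big

# binary64 cannot represent 10^16 + 1 exactly: the gap between adjacent
# doubles at that magnitude is 2, so the +1 is absorbed by rounding.
lost = (1e16 + 1.0) - 1e16   # not 1.0

# With enough working precision (17+ significant digits here),
# the same expression is computed exactly.
exact = delta(20)            # Decimal('1')
```

The same mechanism underlies the solver results in the abstract: residuals and updates that cancel catastrophically at 64 bits remain meaningful at higher working precision, so iterative methods keep making progress instead of stagnating.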



Published In

IEEE Transactions on Computers, Volume 73, Issue 7
July 2024
243 pages

Publisher

IEEE Computer Society

United States


Qualifiers

  • Research-article
