
A Mixed Precision Randomized Preconditioner for the LSQR Solver on GPUs

  • Conference paper
High Performance Computing (ISC High Performance 2023)

Abstract

Randomized preconditioners for large-scale regression problems have become extremely popular over the past decade. Such preconditioners are known to accelerate large-scale regression solvers from both a theoretical and a practical perspective. In this paper, we present a mixed precision randomized preconditioner for LSQR solvers, focusing on overdetermined, dense least squares problems. We implement and evaluate our method on GPUs, and we demonstrate that it outperforms the standard double precision version of randomized, preconditioned LSQR by up to 20% on the NVIDIA A100. We present extensive numerical experiments utilizing the half precision and tensor core units to demonstrate that, in many cases, constructing the preconditioner in reduced precision does not affect the convergence of LSQR solvers. This leads to important speedups without loss of accuracy.
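The sketch-and-precondition idea summarized in the abstract can be illustrated with a minimal CPU sketch. This is not the authors' GPU implementation: it uses a plain Gaussian sketch and float32 as a stand-in for the half precision/tensor core arithmetic of the paper, and the sketch size `s = 4n` is an illustrative choice.

```python
import numpy as np
from scipy.linalg import solve_triangular
from scipy.sparse.linalg import LinearOperator, lsqr

rng = np.random.default_rng(0)
m, n = 2000, 50

# An ill-conditioned, overdetermined, dense least squares problem.
A = rng.standard_normal((m, n)) * np.logspace(0, 4, n)  # scale the columns
b = rng.standard_normal(m)

# Step 1: sketch A down to s = 4n rows with a Gaussian sketch.
s = 4 * n
S = rng.standard_normal((s, m)) / np.sqrt(s)

# Step 2: QR-factorize the small sketch in reduced precision; float32
# stands in here for the half precision used in the paper. The upper
# triangular factor R serves as a right preconditioner.
R = np.linalg.qr((S @ A).astype(np.float32), mode="r").astype(np.float64)

# Step 3: run LSQR on the right-preconditioned operator A @ inv(R),
# which is well conditioned with high probability.
op = LinearOperator(
    (m, n),
    matvec=lambda y: A @ solve_triangular(R, y),
    rmatvec=lambda z: solve_triangular(R.T, A.T @ z, lower=True),
)
y, *_ = lsqr(op, b, atol=1e-12, btol=1e-12)
x = solve_triangular(R, y)  # map back to the original variables
```

Because only the preconditioner construction (the sketch and its QR factorization) runs in reduced precision, while the LSQR iteration itself stays in double precision, the final solution accuracy is typically unaffected, which is the effect the paper quantifies.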


Notes

  1. cuRAND v12.0.0 https://docs.nvidia.com/cuda/curand/index.html.

  2. cuBLAS v12.0 https://developer.nvidia.com/cublas.


Acknowledgements

PD and CB were partially supported by NSF grants CCF-2209509, CCF-1814041, DMS-1760353, and DOE grant DE-SC0022085. This research was also supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. The authors would like to thank the Innovative Computing Lab at the University of Tennessee for providing access to their compute cluster to run the numerical experiments. They are also grateful to the reviewers for their insightful comments that helped improve this paper. CB and VG would like to thank Eugenia Kontopoulou for motivating them to pursue the topic of this paper and Efstratios Gallopoulos for introducing them to the Blendenpik algorithm.

Author information

Corresponding author

Correspondence to Vasileios Georgiou.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Georgiou, V., Boutsikas, C., Drineas, P., Anzt, H. (2023). A Mixed Precision Randomized Preconditioner for the LSQR Solver on GPUs. In: Bhatele, A., Hammond, J., Baboulin, M., Kruse, C. (eds) High Performance Computing. ISC High Performance 2023. Lecture Notes in Computer Science, vol 13948. Springer, Cham. https://doi.org/10.1007/978-3-031-32041-5_9


  • DOI: https://doi.org/10.1007/978-3-031-32041-5_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-32040-8

  • Online ISBN: 978-3-031-32041-5

  • eBook Packages: Computer Science, Computer Science (R0)
