
A Mixed Precision Randomized Preconditioner for the LSQR Solver on GPUs

  • Conference paper
High Performance Computing (ISC High Performance 2023)

Abstract

Randomized preconditioners for large-scale regression problems have become extremely popular over the past decade. Such preconditioners are known to accelerate large-scale regression solvers from both a theoretical and a practical perspective. In this paper, we present a mixed precision randomized preconditioner for LSQR solvers, focusing on overdetermined, dense least squares problems. We implement and evaluate our method on GPUs, and we demonstrate that it outperforms the standard double precision version of randomized, preconditioned LSQR by up to 20% on the NVIDIA A100. We present extensive numerical experiments utilizing the half precision and tensor core units to demonstrate that, in many cases, constructing the preconditioner in reduced precision does not affect the convergence of LSQR solvers. This leads to important speedups without loss of accuracy.
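The sketch-and-precondition idea summarized in the abstract can be illustrated with a minimal CPU sketch. This is not the authors' GPU implementation: it uses a plain Gaussian sketch and float32 as a stand-in for the half precision/tensor core arithmetic of the paper, and the sketch size `s = 4n` is an illustrative choice.

```python
import numpy as np
from scipy.linalg import solve_triangular
from scipy.sparse.linalg import LinearOperator, lsqr

rng = np.random.default_rng(0)
m, n = 2000, 50

# An ill-conditioned, overdetermined, dense least squares problem.
A = rng.standard_normal((m, n)) * np.logspace(0, 4, n)  # scale the columns
b = rng.standard_normal(m)

# Step 1: sketch A down to s = 4n rows with a Gaussian sketch.
s = 4 * n
S = rng.standard_normal((s, m)) / np.sqrt(s)

# Step 2: QR-factorize the small sketch in reduced precision; float32
# stands in here for the half precision used in the paper. The upper
# triangular factor R serves as a right preconditioner.
R = np.linalg.qr((S @ A).astype(np.float32), mode="r").astype(np.float64)

# Step 3: run LSQR on the right-preconditioned operator A @ inv(R),
# which is well conditioned with high probability.
op = LinearOperator(
    (m, n),
    matvec=lambda y: A @ solve_triangular(R, y),
    rmatvec=lambda z: solve_triangular(R.T, A.T @ z, lower=True),
)
y, *_ = lsqr(op, b, atol=1e-12, btol=1e-12)
x = solve_triangular(R, y)  # map back to the original variables
```

Because only the preconditioner construction (the sketch and its QR factorization) runs in reduced precision, while the LSQR iteration itself stays in double precision, the final solution accuracy is typically unaffected, which is the effect the paper quantifies.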


Notes

  1. cuRAND v12.0.0 https://docs.nvidia.com/cuda/curand/index.html.

  2. cuBLAS v12.0 https://developer.nvidia.com/cublas.


Acknowledgements

PD and CB were partially supported by NSF grants CCF-2209509, CCF-1814041, DMS-1760353, and DOE grant DE-SC0022085. This research was also supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. The authors would like to thank the Innovative Computing Lab at the University of Tennessee for providing access to their compute cluster to run the numerical experiments. They are also grateful to the reviewers for their insightful comments that helped improve this paper. CB and VG would like to thank Eugenia Kontopoulou for motivating them to pursue the topic of this paper and Efstratios Gallopoulos for introducing them to the Blendenpik algorithm.

Author information

Corresponding author

Correspondence to Vasileios Georgiou.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Georgiou, V., Boutsikas, C., Drineas, P., Anzt, H. (2023). A Mixed Precision Randomized Preconditioner for the LSQR Solver on GPUs. In: Bhatele, A., Hammond, J., Baboulin, M., Kruse, C. (eds) High Performance Computing. ISC High Performance 2023. Lecture Notes in Computer Science, vol 13948. Springer, Cham. https://doi.org/10.1007/978-3-031-32041-5_9


  • DOI: https://doi.org/10.1007/978-3-031-32041-5_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-32040-8

  • Online ISBN: 978-3-031-32041-5

  • eBook Packages: Computer Science, Computer Science (R0)
