Abstract
Randomized preconditioners for large-scale regression problems have become extremely popular over the past decade, and are known to accelerate large-scale regression solvers from both a theoretical and a practical perspective. In this paper, we present a mixed precision randomized preconditioner for LSQR solvers, focusing on overdetermined, dense least squares problems. We implement and evaluate our method on GPUs and demonstrate that it outperforms the standard double precision version of randomized, preconditioned LSQR by up to \(20\%\) on the NVIDIA A100. We present extensive numerical experiments utilizing half-precision arithmetic and tensor core units to demonstrate that, in many cases, constructing the preconditioner in reduced precision does not affect the convergence of the LSQR solver. This leads to important speedups without loss of accuracy.
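The sketch-and-precondition idea summarized in the abstract can be illustrated with a minimal NumPy/SciPy toy. This is an assumption-laden sketch, not the paper's GPU implementation: float32 stands in for half precision, a dense Gaussian sketch stands in for whatever randomized transform the paper uses, and explicitly inverting the triangular factor R is acceptable only because n is tiny here.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, lsqr

# Toy sketch of randomized preconditioning for LSQR (not the paper's code).
rng = np.random.default_rng(0)
m, n = 2000, 50
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# 1) Sketch A and build the preconditioner in reduced precision
#    (float32 here, standing in for half precision / tensor cores on a GPU).
s = 4 * n                                    # sketch size: a few multiples of n
S = rng.standard_normal((s, m)).astype(np.float32) / np.float32(np.sqrt(s))
SA = S @ A.astype(np.float32)                # reduced-precision sketch of A
_, R = np.linalg.qr(SA)                      # R acts as a right preconditioner
R_inv = np.linalg.inv(R.astype(np.float64))  # fine here: R is only n x n

# 2) Run LSQR in double precision on the preconditioned operator A @ R^{-1};
#    a well-conditioned preconditioned operator means few LSQR iterations.
op = LinearOperator(
    (m, n),
    matvec=lambda y: A @ (R_inv @ y),
    rmatvec=lambda z: R_inv.T @ (A.T @ z),
)
y = lsqr(op, b, atol=1e-12, btol=1e-12)[0]
x = R_inv @ y                                # undo the preconditioning
```

The paper's observation is precisely that computing R in reduced precision, as step 1 does, typically leaves the convergence of the double precision LSQR in step 2 unaffected.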
Notes
1. cuRAND v12.0.0: https://docs.nvidia.com/cuda/curand/index.html
2. cuBLAS v12.0: https://developer.nvidia.com/cublas
Acknowledgements
PD and CB were partially supported by NSF grants CCF-2209509, CCF-1814041, DMS-1760353, and DOE grant DE-SC0022085. This research was also supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. The authors would like to thank the Innovative Computing Lab at the University of Tennessee for providing access to their compute cluster to run the numerical experiments. They are also grateful to the reviewers for their insightful comments, which helped improve this paper. CB and VG would like to thank Eugenia Kontopoulou for motivating them to pursue the topic of this paper and Efstratios Gallopoulos for introducing them to the Blendenpik algorithm.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Georgiou, V., Boutsikas, C., Drineas, P., Anzt, H. (2023). A Mixed Precision Randomized Preconditioner for the LSQR Solver on GPUs. In: Bhatele, A., Hammond, J., Baboulin, M., Kruse, C. (eds) High Performance Computing. ISC High Performance 2023. Lecture Notes in Computer Science, vol 13948. Springer, Cham. https://doi.org/10.1007/978-3-031-32041-5_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-32040-8
Online ISBN: 978-3-031-32041-5
eBook Packages: Computer Science, Computer Science (R0)