Abstract
The locally optimal block preconditioned conjugate gradient (LOBPCG) algorithm is a popular approach for computing a few smallest eigenvalues and the corresponding eigenvectors of a large Hermitian positive definite matrix \(A\). In this work, we propose a mixed precision variant of LOBPCG that uses a (sparse) Cholesky factorization of \(A\) computed in lower precision as the preconditioner. To further enhance performance, a mixed precision orthogonalization strategy is proposed. To analyze the impact of reducing precision in the preconditioner on performance, we carry out a rounding error and convergence analysis of PINVIT, a simplified variant of LOBPCG. Our theoretical results predict and our numerical experiments confirm that the impact on convergence remains marginal. In practice, our mixed precision LOBPCG algorithm typically reduces the computation time by a factor of \(1.4\)–\(2.0\) on both CPUs and GPUs.
Similar content being viewed by others
Data Availability
The results/data/figures in this manuscript have not been published elsewhere, nor are they under consideration (from you or one of your Contributing Authors) by another publisher. All of the material is owned by the authors and/or no permissions are required.
Notes
For MPLOBPCG-schol, the number of LOBPCG iterations for constructing the initial guess is also counted.
References
Balcan, D., Gonçalves, B., Hu, H., Ramasco, J.J., Colizza, V., Vespignani, A.: Modeling the spatial spread of infectious diseases: the GLobal Epidemic and Mobility computational model. J. Comput. Sci. 1(3), 132–145 (2010). https://doi.org/10.1016/j.jocs.2010.07.002
Knyazev, A.: Recent implementations, applications, and extensions of the locally optimal block preconditioned conjugate gradient method (LOBPCG). arXiv:1708.08354(2017)
Saad, Y.: Numerical Methods for Large Eigenvalue Problems, Revised SIAM, Philadelphia, PA, USA (2011)
Neymeyr, K.: A geometric theory for preconditioned inverse iteration applied to a subspace. Math. Comp. 71(237), 197–216 (2002). https://doi.org/10.1090/S0025-5718-01-01357-6
Argentati, M., Knyazev, A., Neymeyr, K., Ovtchinnikov, E., Zhou, M.: Convergence theory for preconditioned eigenvalue solvers in a nutshell. Found. Comput. Math. 17, 713–727 (2017). https://doi.org/10.1007/s10208-015-9297-1
Knyazev, A.V.: Toward the optimal preconditioned eigensolver: Locally optimal block preconditioned conjugate gradient method. SIAM J. Sci. Comput. 23(2), 517–541 (2001). https://doi.org/10.1137/S1064827500366124
Abdelfattah, A., Anzt, H., Boman, E.G., Carson, E., Cojean, T., Dongarra, J., Fox, A., Gates, M., Higham, N.J., Li, X.S., Loe, J., Luszczek, P., Pranesh, S., Rajamanickam, S., Ribizel, T., Smith, B.F., Swirydowicz, K., Thomas, S., Tomov, S., Tsai, Y.M., Yang, U.M.: A survey of numerical linear algebra methods utilizing mixed-precision arithmetic. Int. J. High Perform. Comput. Appl. 35(4), 344–369 (2021). https://doi.org/10.1177/10943420211003313
Higham, N.J., Mary, T.: Mixed precision algorithms in numerical linear algebra. Acta Numer. 31, 347–414 (2022). https://doi.org/10.1017/S0962492922000022
Carson, E., Higham, N.J.: Accelerating the solution of linear systems by iterative refinement in three precisions. SIAM J. Sci. Comput. 40(2), 817–847 (2018). https://doi.org/10.1137/17M1140819
Ogita, T., Aishima, K.: Iterative refinement for symmetric eigenvalue decomposition. Japan J. Indust. Appl. Math. 35(3), 1007–1035 (2018). https://doi.org/10.1007/s13160-018-0310-3
Ogita, T., Aishima, K.: Iterative refinement for symmetric eigenvalue decomposition II: clustered eigenvalues. Japan J. Indust. Appl. Math. 36(2), 435–459 (2019). https://doi.org/10.1007/s13160-019-00348-4
Ogita, T., Aishima, K.: Iterative refinement for singular value decomposition based on matrix multiplication. J. Comput. Appl. Math. 369, 112512 (2020). https://doi.org/10.1016/j.cam.2019.112512
Bujanović, Z., Kressner, D., Schröder, C.: Iterative refinement of Schur decompositions. Numer. Algorithms 92(1), 247–267 (2023). https://doi.org/10.1007/s11075-022-01327-6
Gao, W., Ma, Y., Shao, M.: A mixed precision Jacobi SVD algorithm. arXiv:2209.04626 (2022)
Dongarra, J.J.: Algorithm 589: SICEDR: A FORTRAN subroutine for improving the accuracy of computed matrix eigenvalues. ACM Trans. Math. Software 8(4), 371–375 (1982). https://doi.org/10.1145/356012.356016
Golub, G.H., Van Loan, C.F.: Matrix Computations, 4th edn. Johns Hopkins University Press, Baltimore, MD, USA (2013)
Duersch, J.A., Shao, M., Yang, C., Gu, M.: A robust and efficient implementation of LOBPCG. SIAM J. Sci. Comput. 40(5), 655–676 (2018). https://doi.org/10.1137/17M1129830
Hetmaniuk, U., Lehoucq, R.: Basis selection in LOBPCG. J. Comput. Phys. 218(1), 324–332 (2006). https://doi.org/10.1016/j.jcp.2006.02.007
Yamazaki, I., Tomov, S., Dongarra, J.: Mixed-precision Cholesky QR factorization and its case studies on multicore CPU with multiple GPUs. SIAM J. Sci. Comput. 37(3), 307–330 (2015). https://doi.org/10.1137/14M0973773
Yamazaki, I., Tomov, S., Kurzak, J., Dongarra, J., Barlow, J.: Mixed-precision block Gram Schmidt orthogonalization. In: ScalA ’15: Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, pp. 2–128 (2015). https://doi.org/10.1145/2832080.2832082
Rohwedder, T., Schneider, R., Zeiser, A.: Perturbed preconditioned inverse iteration for operator eigenvalue problems with applications to adaptive wavelet discretization. Adv. Comput. Math. 34(1), 43–66 (2011)
Higham, N.J.: Accuracy and Stability of Numerical Algorithms, 2nd edn. SIAM, Philadelphia, PA, USA (2002)
Chen, Y., Davis, T.A., Hager, W.W., Rajamanickam, S.: Algorithm 887: CHOLMOD, supernodal sparse Cholesky factorization and update/downdate. ACM Trans. Math. Software 35(3), 22–12214 (2008). https://doi.org/10.1145/1391989.1391995
Shao, M., Oryspayev, D., Yang, C., Maris, P., Cook, B.: Fault-tolerant LOBPCG for nuclear CI calculations. In: International Conference on High Performance Computing in Asia-Pacific Region (HPC ASIA 2023), February 27–March 2, 2023, Singapore, Singapore, pp. 88–95. ACM, New York, NY, USA (2023). https://doi.org/10.1145/3578178.3578240
Fadel, S., Ghoniemy, S., Abdallah, M., Sorra, H.A., Ashour, A., Ansary, A.: Investigating the effect of different kernel functions on the performance of SVM for recognizing Arabic characters. Int. J. Adv. Comput. Sci. Appl. 7(1) (2016). https://doi.org/10.14569/IJACSA.2016.070160
Acknowledgements
The authors thank Erin Carson for helpful discussions. Part of this work was performed when the second author was visiting EPF Lausanne in 2022.
Funding
Yuxin Ma is partially supported by the State Scholarship Fund of China Scholarship Council (CSC) under Grant No. 202106100093, National Key R &D Program of China under Grant No. 2021YFA1003305 and National Natural Science Foundation of China under Grant No. 71991471. Meiyue Shao is partially supported by the National Natural Science Foundation of China under grant No. 11971118.
Author information
Authors and Affiliations
Contributions
These authors contributed equally to this work.
Corresponding author
Ethics declarations
Ethics approval
Not applicable
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Daniel Kressner, Yuxin Ma and Meiyue Shao are contributed equally to this work.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kressner, D., Ma, Y. & Shao, M. A mixed precision LOBPCG algorithm. Numer Algor 94, 1653–1671 (2023). https://doi.org/10.1007/s11075-023-01550-9
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11075-023-01550-9