Abstract
The low multilinear rank approximation, also known as the truncated Tucker decomposition, has been extensively utilized in many applications that involve higher-order tensors. Popular methods for low multilinear rank approximation usually rely directly on the matrix SVD and therefore often suffer from the notorious intermediate data explosion issue; they are also not easy to parallelize, especially when the input tensor is large. In this paper, we propose a new class of truncated HOSVD algorithms based on alternating least squares (ALS) for efficiently computing the low multilinear rank approximation of tensors. The proposed ALS-based approaches eliminate the redundant computations of the singular vectors of intermediate matrices and are therefore free of data explosion. Moreover, the new methods are more flexible, with an adjustable convergence tolerance, and are intrinsically parallelizable on high-performance computers. Theoretical analysis reveals that the ALS iteration in the proposed algorithms is q-linearly convergent with a relatively wide convergence region. Numerical experiments with large-scale tensors from both synthetic and real-world applications demonstrate that the ALS-based methods substantially reduce the total cost of the original ones and are highly scalable for parallel computing.
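To make the idea concrete, the following is a minimal NumPy sketch (not the paper's implementation; the function names are ours) of an ALS-based truncated HOSVD: each factor matrix is obtained from a few ALS sweeps on the corresponding mode-n unfolding, and the core tensor is formed by contracting the tensor with the transposed factors.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: rows indexed by `mode`, columns by the remaining modes."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def als_left_factor(A, r, sweeps=10, seed=0):
    """Orthonormal basis of an approximate rank-r dominant column subspace of A,
    computed by alternating least squares on A ~= L R^T (no SVD of A is formed)."""
    rng = np.random.default_rng(seed)
    L = rng.standard_normal((A.shape[0], r))
    for _ in range(sweeps):
        R = np.linalg.solve(L.T @ L, L.T @ A).T      # fix L: R = A^T L (L^T L)^{-1}
        L = np.linalg.solve(R.T @ R, R.T @ A.T).T    # fix R: L = A R (R^T R)^{-1}
    Q, _ = np.linalg.qr(L)
    return Q

def truncated_hosvd_als(T, ranks):
    """Truncated Tucker approximation with ALS-computed factor matrices."""
    factors = [als_left_factor(unfold(T, n), r) for n, r in enumerate(ranks)]
    core = T
    for n, Q in enumerate(factors):                  # core = T x_1 Q1^T x_2 Q2^T ...
        core = np.moveaxis(np.tensordot(Q.T, core, axes=(1, n)), 0, n)
    return core, factors

# Example: compress a random third-order tensor to multilinear rank (5, 5, 5).
T = np.random.default_rng(1).standard_normal((30, 40, 50))
core, factors = truncated_hosvd_als(T, (5, 5, 5))
```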
Availability of Data and Material
The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
The authors would like to thank the anonymous reviewers for their valuable comments and suggestions that have greatly improved the quality of this paper.
Funding
This study was funded in part by Guangdong Key R&D Project (#2019B121204008), Beijing Natural Science Foundation (#JQ18001) and Beijing Academy of Artificial Intelligence.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Code Availability
The code used in the current study is available from the corresponding author on reasonable request.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
A Proof of Theorem 1
Based on the assumption of Theorem 1, \(\varvec{L}_{k}\) is nonsingular and \(\varvec{L}_{k}^{T}\varvec{L}_{k}\) is positive definite. Thus, the iterative form of Algorithm 3 is
i.e.,
Suppose that the full SVD of \(\varvec{A}\) is \(\varvec{A} = \varvec{U}\varvec{\varSigma } \varvec{V}^{T}\), where
Then from (A.2), we have
which can be rewritten into block form
It then follows that
Furthermore,
Here we suppose that the distance between \({\mathcal {R}}(\varvec{L}_{k})\) and \({\mathcal {R}}(\varvec{U}_{2})\) is small enough and denote it by \(\delta _{k}\); it depends only on \(\Vert \varvec{L}_{k}^{(2)}\Vert _{2}\), i.e., there exist constants \(\alpha ,\ \beta >0\) such that \(\alpha \delta _{k}\le \Vert \varvec{L}_{k}^{(2)}\Vert _{2}\le \beta \delta _{k}\). The distance between \({\mathcal {R}}(\varvec{L}_{k})\) and \({\mathcal {R}}(\varvec{U}_{1})\) is then bounded from below by \(\sqrt{1-\delta _{k}^{2}}\), and
where \(C_{1},\ C_{2}\) are constants independent of k and \(\delta _{k}\). From (A.5) and (A.6), there exists a constant C, depending only on \(C_{1}\) and \(C_{2}\), such that the following inequality holds.
Further, from (A.7) and (A.4), and assuming that \(\sigma _{r}>\sigma _{r+1}\), we have
where \({\hat{C}}\) is a constant. Clearly,
and by Lemma 1, we have
Since \(\delta _{k}\) is small enough, we obtain
where \(\alpha ,\ {\tilde{C}}\) do not depend on k or \(\Vert \varvec{L}_{k}^{(2)}\Vert _{2}\).
Denote
Since we assume that \({\mathcal {R}}(\varvec{L}_{0})\) is sufficiently close to \({\mathcal {R}}(\varvec{U}_{1})\), \(\Vert \varvec{U}_{2}^{T}\varvec{L}_{0}\Vert _{2}\) is sufficiently small, i.e., \(\Vert \varvec{L}_{0}^{(2)}\Vert _{2} = o(1)\); in other words, we assume that \(q<1\). From (A.8), we have
for all k, which leads to
Combined with the assumption on \(\varvec{L}_{k}\), this verifies that \({\mathcal {R}}(\varvec{L}_{k})\) becomes orthogonal to \({\mathcal {R}}(\varvec{U}_{2})\) as \(k\rightarrow +\infty \). Since the orthogonal complement of \({\mathcal {R}}(\varvec{U}_{2})\) is unique, we have
where \({\mathcal {R}}(\varvec{U}_{1})\) is the dominant subspace of \(\varvec{A}\). In other words, we have
where \(\varvec{L}_{k}^{\dag }\) is the pseudo-inverse of \(\varvec{L}_{k}\). Further, from the iterative form of the ALS method, we have
thus
Since the Frobenius norm \(\Vert \cdot \Vert _{F}\) is continuous,
Since \(\varvec{U}_{1}\varvec{U}_{1}^{T}\varvec{A}\) is the exact solution of the low rank approximation problem for \(\varvec{A}\), the convergence of the ALS method is proved.
From (A.8), we further confirm the q-linear convergence of the ALS method, with approximate convergence ratio \(\sigma _{r+1}^{2}/\sigma _{r}^{2}\).
\(\square \)
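As an illustration (our own numerical sketch, not part of the original proof), the predicted ratio \(\sigma _{r+1}^{2}/\sigma _{r}^{2}\) can be checked empirically on a small matrix with a prescribed spectrum:

```python
import numpy as np

# Empirical check of the q-linear convergence ratio of the ALS iteration.
rng = np.random.default_rng(0)
m, n, r = 200, 150, 5
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = np.concatenate([np.linspace(10.0, 6.0, r), np.linspace(3.0, 0.1, n - r)])
A = (U[:, :n] * s) @ V.T                 # prescribed singular values, gap at index r
U1 = U[:, :r]                            # exact dominant subspace

L = rng.standard_normal((m, r))
prev = None
for k in range(15):
    R = np.linalg.solve(L.T @ L, L.T @ A).T      # ALS half-step for R
    L = np.linalg.solve(R.T @ R, R.T @ A.T).T    # ALS half-step for L
    Q, _ = np.linalg.qr(L)
    dist = np.linalg.norm(Q - U1 @ (U1.T @ Q), 2)   # distance to span(U1)
    if prev is not None:
        print(k, dist, dist / prev)              # ratio approaches (s[r]/s[r-1])**2
    prev = dist
print("predicted ratio:", (s[r] / s[r - 1]) ** 2)   # = (3/6)^2 = 0.25
```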
B Proof of Theorem 2
The assumption of Theorem 1 implies that \(\varvec{L}_{k}\) is nonsingular at every iteration k. We assume that
is positive definite. Let \(\varepsilon \) be a positive number such that \(\sigma _{r}>\sigma _{r+1}-\varepsilon \). By (A.4), we know
which means
Clearly it holds that
under the condition that
If \(\varvec{L}_{k}^{(1)}\) is nonsingular, then (B.3) implies
It is then easy to see that
is a sufficient condition of (B.4).
Next we will prove that if the initial guess \(\varvec{L}_{0}\) satisfies condition (B.5), then \(\varvec{L}_{k}^{(1)}\) is nonsingular and (B.2) is satisfied at every iteration k.
Provided that \(\varvec{L}_{0}\) satisfies (B.5), we obtain
According to the proof of Theorem 1, we know that \(\varvec{L}_{1}^{(1)}\) is also nonsingular, which implies that
is positive definite. Then by (B.1), we have
Analogously, we can prove that for every iteration k, \(\varvec{L}_{k}^{(1)}\) is nonsingular, i.e., \(\varvec{L}_{k}^{(1)T}\varvec{\varSigma }_{1}^{2}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}\varvec{L}_{k}^{(2)}\) is positive definite, and (B.5) is satisfied. Since (B.5) is a sufficient condition for (B.4) and (B.3), inequality (B.2) holds at every iteration k, which implies that
The rest of the proof is analogous to that of Theorem 1 and is omitted for brevity.
\(\square \)
C Proof of Theorem 3
In Algorithm 4, \(\varvec{U}^{(n)}\) is obtained from the rank-\(R_{n}\) approximation of \(\varvec{A}_{(n)}\), which is done in an iterative manner and allows a tolerance parameter \(\eta _{n}\). Therefore, we have
where \(\varvec{L}^{*}\) and \(\varvec{R}^{*}\) are the same as in Algorithm 4. Note that \(\varvec{R}\) is updated by solving a least squares problem with multiple right-hand sides,
whose exact solution is \(\varvec{R} = \varvec{A}_{(n)}^{T}(\varvec{L}^{\dagger })^{T}\). Thus
where \(\varvec{L}^{*}\varvec{L}^{*\dagger }\) represents the orthogonal projection onto the subspace \({\mathcal {R}}(\varvec{L}^{*})\). Consequently, by (C.1), (C.2) and (4.3), we have
which means
Combining with (4.3), we obtain (4.4).
The error analysis of Algorithm 5 can be carried out analogously. \(\square \)
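For readers who want to sanity-check the projection step used above, the following short NumPy snippet (ours, purely illustrative) verifies that \(\varvec{R} = \varvec{A}_{(n)}^{T}(\varvec{L}^{\dagger })^{T}\) solves the least squares problem and that \(\varvec{L}\varvec{R}^{T}\) is the orthogonal projection of \(\varvec{A}_{(n)}\) onto \({\mathcal {R}}(\varvec{L})\):

```python
import numpy as np

# Illustrative check: for fixed L, the problem  min_R || A - L R^T ||_F
# is solved by R = A^T (L^+)^T, and L R^T = L L^+ A is the orthogonal
# projection of A onto the column space R(L).
rng = np.random.default_rng(1)
A = rng.standard_normal((60, 40))        # stands in for a mode-n unfolding A_(n)
L = rng.standard_normal((60, 8))
R = A.T @ np.linalg.pinv(L).T
R_ls = np.linalg.lstsq(L, A, rcond=None)[0].T    # column-wise least squares solution
Q, _ = np.linalg.qr(L)
print(np.allclose(R, R_ls))                      # True
print(np.allclose(L @ R.T, Q @ (Q.T @ A)))       # projection onto R(L): True
```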
About this article
Cite this article
Xiao, C., Yang, C. & Li, M. Efficient Alternating Least Squares Algorithms for Low Multilinear Rank Approximation of Tensors. J Sci Comput 87, 67 (2021). https://doi.org/10.1007/s10915-021-01493-0
Keywords
- Low multilinear rank approximation
- Truncated Tucker decomposition
- Alternating least squares
- Parallelization