
Efficient Alternating Least Squares Algorithms for Low Multilinear Rank Approximation of Tensors


Abstract

The low multilinear rank approximation, also known as the truncated Tucker decomposition, has been extensively utilized in many applications involving higher-order tensors. Popular methods for low multilinear rank approximation usually rely directly on the matrix SVD and therefore often suffer from the notorious intermediate data explosion issue and are not easy to parallelize, especially when the input tensor is large. In this paper, we propose a new class of truncated HOSVD algorithms based on alternating least squares (ALS) for efficiently computing the low multilinear rank approximation of tensors. The proposed ALS-based approaches eliminate the redundant computation of singular vectors of intermediate matrices and are therefore free of data explosion. The new methods are also more flexible, with an adjustable convergence tolerance, and are intrinsically parallelizable on high-performance computers. Theoretical analysis reveals that the ALS iteration in the proposed algorithms is q-linearly convergent with a relatively wide convergence region. Numerical experiments with large-scale tensors from both synthetic and real-world applications demonstrate that the ALS-based methods substantially reduce the total cost of the original algorithms and are highly scalable for parallel computing.
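For orientation, the sketch below (illustrative NumPy code, not part of the paper; all function names and sizes are ours) shows the classical SVD-based truncated HOSVD that the proposed ALS-based algorithms are designed to replace: each mode-\(n\) unfolding is factored by a truncated SVD, and the core tensor is obtained by multiplying the tensor with the transposed factor matrices.

```python
import numpy as np

def unfold(A, n):
    """Mode-n unfolding: move axis n to the front and flatten the rest."""
    return np.moveaxis(A, n, 0).reshape(A.shape[n], -1)

def mode_mult(A, U, n):
    """Mode-n product A x_n U (U has shape new_dim x A.shape[n])."""
    moved = np.moveaxis(A, n, 0)
    return np.moveaxis(np.tensordot(U, moved, axes=(1, 0)), 0, n)

def truncated_hosvd(A, ranks):
    """Classical truncated HOSVD: truncated SVD of every unfolding, then project."""
    factors = []
    for n, r in enumerate(ranks):
        # Leading r left singular vectors of the mode-n unfolding
        U, _, _ = np.linalg.svd(unfold(A, n), full_matrices=False)
        factors.append(U[:, :r])
    core = A
    for n, U in enumerate(factors):
        core = mode_mult(core, U.T, n)   # core = A x_1 U1^T x_2 U2^T ...
    return core, factors

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 30, 40))
    core, factors = truncated_hosvd(A, (5, 5, 5))
    A_hat = core
    for n, U in enumerate(factors):
        A_hat = mode_mult(A_hat, U, n)   # reconstruct the rank-(5,5,5) approximation
    print(np.linalg.norm(A - A_hat) / np.linalg.norm(A))
```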


Availability of Data and Material

The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions that have greatly improved the quality of this paper.

Funding

This study was funded in part by Guangdong Key R&D Project (#2019B121204008), Beijing Natural Science Foundation (#JQ18001) and Beijing Academy of Artificial Intelligence.

Author information

Corresponding author

Correspondence to Chao Yang.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Code Availability

The code used in the current study is available from the corresponding author on reasonable request.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

A Proof of Theorem 1

Based on the assumption of Theorem 1, \(\varvec{L}_{k}\) is nonsingular and \(\varvec{L}_{k}^{T}\varvec{L}_{k}\) is positive definite. Thus, the iterative form of Algorithm 3 is

$$\begin{aligned} \begin{aligned} \varvec{R}_{k} = \varvec{A}^{T}\varvec{L}_{k}(\varvec{L}_{k}^{T}\varvec{L}_{k})^{-1},\\ \varvec{L}_{k+1} = \varvec{AR}_{k}(\varvec{R}_{k}^{T}\varvec{R}_{k})^{-1}, \end{aligned} \end{aligned}$$
(A.1)

i.e.,

$$\begin{aligned} \varvec{L}_{k+1} = \varvec{AA}^{T}\varvec{L}_{k}(\varvec{L}_{k}^{T}\varvec{AA}^{T}\varvec{L}_{k})^{-1}(\varvec{L}_{k}^{T}\varvec{L}_{k}). \end{aligned}$$
(A.2)
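To make (A.1) concrete, the following minimal sketch (ours, for illustration only; explicit inverses are used for readability, whereas a practical code would solve the normal equations by QR or Cholesky factorizations) carries out the two alternating least squares updates for a rank-\(r\) approximation \(\varvec{A}\approx \varvec{L}\varvec{R}^{T}\).

```python
import numpy as np

def als_low_rank(A, r, iters=50, seed=0):
    """ALS iteration (A.1) for a rank-r approximation A ~ L R^T (illustrative)."""
    rng = np.random.default_rng(seed)
    L = rng.standard_normal((A.shape[0], r))      # initial guess L_0
    for _ in range(iters):
        R = A.T @ L @ np.linalg.inv(L.T @ L)      # R_k   = A^T L_k (L_k^T L_k)^{-1}
        L = A @ R @ np.linalg.inv(R.T @ R)        # L_{k+1} = A R_k (R_k^T R_k)^{-1}
    return L, R

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 8)) @ rng.standard_normal((8, 150))
    A += 1e-3 * rng.standard_normal(A.shape)      # a nearly rank-8 matrix
    L, R = als_low_rank(A, 8)
    print(np.linalg.norm(A - L @ R.T) / np.linalg.norm(A))
```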

Suppose that the full SVD of \(\varvec{A}\) is \(\varvec{A} = \varvec{U}\varvec{\varSigma } \varvec{V}^{T}\), where

$$\begin{aligned} \varvec{U} = [\varvec{U}_{1},\varvec{U}_{2}],\ \varvec{V} = [\varvec{V}_{1},\varvec{V}_{2}], \, \varvec{\varSigma } = \left( \begin{array}{cc} \varvec{\varSigma }_{1} &{} 0 \\ 0 &{} \varvec{\varSigma }_{2} \\ \end{array} \right) . \end{aligned}$$

Then from (A.2), we have

$$\begin{aligned} \varvec{U}^{T}\varvec{L}_{k+1} = \varvec{U}^{T}\varvec{AVV}^{T}\varvec{A}^{T}\varvec{UU}^{T}\varvec{L}_{k}(\varvec{L}_{k}^{T}\varvec{UU}^{T}\varvec{AVV}^{T}\varvec{A}^{T}\varvec{UU}^{T}\varvec{L}_{k})^{-1}(\varvec{L}_{k}^{T}\varvec{UU}^{T}\varvec{L}_{k}), \end{aligned}$$
(A.3)

which can be rewritten in block form

$$\begin{aligned} \left( \begin{array}{c} \varvec{L}_{k+1}^{(1)} \\ \varvec{L}_{k+1}^{(2)} \\ \end{array} \right) = \left( \begin{array}{cc} \varvec{\varSigma }_{1}^{2} &{} 0 \\ 0 &{} \varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T} \\ \end{array} \right) \left( \begin{array}{c} \varvec{L}_{k}^{(1)} \\ \varvec{L}_{k}^{(2)} \\ \end{array} \right) (\varvec{L}_{k}^{(1)T}\varvec{\varSigma }_{1}^{2}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}\varvec{L}_{k}^{(2)})^{-1}(\varvec{L}_{k}^{(1)T}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\varvec{L}_{k}^{(2)}). \end{aligned}$$

It then follows that

$$\begin{aligned} \begin{aligned} \varvec{L}_{k+1}^{(1)}&= \varvec{\varSigma }_{1}^{2}\varvec{L}_{k}^{(1)}(\varvec{L}_{k}^{(1)T}\varvec{\varSigma }_{1}^{2}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}\varvec{L}_{k}^{(2)})^{-1}(\varvec{L}_{k}^{(1)T}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\varvec{L}_{k}^{(2)}),\\ \varvec{L}_{k+1}^{(2)}&= \varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}\varvec{L}_{k}^{(2)}(\varvec{L}_{k}^{(1)T}\varvec{\varSigma }_{1}^{2}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}\varvec{L}_{k}^{(2)})^{-1}(\varvec{L}_{k}^{(1)T}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\varvec{L}_{k}^{(2)}). \end{aligned} \end{aligned}$$
(A.4)

Furthermore,

$$\begin{aligned} \begin{aligned}&(\varvec{L}_{k}^{(1)T}\varvec{\varSigma }_{1}^{2}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}\varvec{L}_{k}^{(2)})^{-1} \\&\quad = (\varvec{I}+(\varvec{L}_{k}^{(1)T}\varvec{\varSigma }_{1}^{2}\varvec{L}_{k}^{(1)})^{-1}(\varvec{L}_{k}^{(2)T}\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}\varvec{L}_{k}^{(2)}))^{-1}(\varvec{L}_{k}^{(1)T}\varvec{\varSigma }_{1}^{2}\varvec{L}_{k}^{(1)})^{-1} \\&\quad = (\varvec{I}+\sum \limits _{n=1}^{\infty }(-1)^{n}((\varvec{L}_{k}^{(1)T}\varvec{\varSigma }_{1}^{2}\varvec{L}_{k}^{(1)})^{-1}(\varvec{L}_{k}^{(2)T}\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}\varvec{L}_{k}^{(2)}))^{n})(\varvec{L}_{k}^{(1)T}\varvec{\varSigma }_{1}^{2}\varvec{L}_{k}^{(1)})^{-1}. \end{aligned} \end{aligned}$$
(A.5)

Suppose now that the distance between \({\mathcal {R}}(\varvec{L}_{k})\) and \({\mathcal {R}}(\varvec{U}_{2})\) is small enough and denote it by \(\delta _{k}\); it is determined by \(\Vert \varvec{L}_{k}^{(2)}\Vert _{2}\) in the sense that there exist two constants \(\alpha ,\ \beta >0\) such that \(\alpha \delta _{k}\le \Vert \varvec{L}_{k}^{(2)}\Vert _{2}\le \beta \delta _{k}\). The distance between \({\mathcal {R}}(\varvec{L}_{k})\) and \({\mathcal {R}}(\varvec{U}_{1})\) is then bounded below by \(\sqrt{1-\delta _{k}^{2}}\), and

$$\begin{aligned} \begin{aligned} \Vert \varvec{L}_{k}^{(1)}\Vert _{2}\le C_{1}, \quad \Vert (\varvec{L}_{k}^{(1)})^{-1}\Vert _{2}\le \frac{C_{2}}{\sqrt{1-\delta _{k}^{2}}}, \end{aligned} \end{aligned}$$
(A.6)

where \(C_{1},\ C_{2}\) are constants independent of k and \(\delta _{k}\). From (A.5) and (A.6), there exists a constant C, depending only on \(C_{1}\) and \(C_{2}\), such that the following inequality holds:

$$\begin{aligned} (\varvec{L}_{k}^{(1)T}\varvec{\varSigma }_{1}^{2}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}\varvec{L}_{k}^{(2)})^{-1}\le (\varvec{L}_{k}^{(1)T}\varvec{\varSigma }_{1}^{2}\varvec{L}_{k}^{(1)})^{-1}+\frac{C\delta _{k}^{2}}{(1-2\delta _{k}^{2})^{2}}, \end{aligned}$$
(A.7)

Further, from (A.7) and (A.4), and assuming that \(\sigma _{r}>\sigma _{r+1}\), we have

$$\begin{aligned} \begin{aligned} \varvec{L}_{k+1}^{(2)}&\le \frac{\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}}{\sigma _{r}^{2}}\varvec{L}_{k}^{(2)}(\varvec{L}_{k}^{(1)T}\frac{\varvec{\varSigma }_{1}^{2}}{\sigma _{r}^{2}}\varvec{L}_{k}^{(1)})^{-1} (\varvec{L}_{k}^{(1)T}\varvec{L}_{k}^{(1)})\\&\quad +{\hat{C}}\left( \frac{\delta _{k}^{2}}{(1-2\delta _{k}^{2})^{2}}+\frac{\delta _{k}^{2}}{1-\delta _{k}^{2}}+\frac{\delta _{k}^{4}}{(1-2\delta _{k}^{2})^{2}}\right) , \end{aligned} \end{aligned}$$

where \({\hat{C}}\) is a constant. Clearly,

$$\begin{aligned} 0<\varvec{L}_{k}^{(1)T}\varvec{L}_{k}^{(1)}\le \varvec{L}_{k}^{(1)T}\frac{\varvec{\varSigma }_{1}^{2}}{\sigma _{r}^{2}}\varvec{L}_{k}^{(1)}, \end{aligned}$$

and by Lemma 1, we have

$$\begin{aligned} \Vert (\varvec{L}_{k}^{(1)T}\frac{\varvec{\varSigma }_{1}^{2}}{\sigma _{r}^{2}}\varvec{L}_{k}^{(1)})^{-1}(\varvec{L}_{k}^{(1)T}\varvec{L}_{k}^{(1)})\Vert _{2}\le 1. \end{aligned}$$

Since \(\delta _{k}\) is small enough, we obtain

$$\begin{aligned} \begin{aligned} \Vert \varvec{L}_{k+1}^{(2)}\Vert _{2}\le \frac{\sigma _{r+1}^{2}}{\sigma _{r}^{2}}\Vert \varvec{L}_{k}^{(2)}\Vert _{2}+{\tilde{C}}\delta _{k}^{2} \le \frac{\sigma _{r+1}^{2}}{\sigma _{r}^{2}}\Vert \varvec{L}_{k}^{(2)}\Vert _{2}+\frac{{\tilde{C}}}{\alpha ^{2}}\Vert \varvec{L}_{k}^{(2)}\Vert _{2}^{2}, \end{aligned} \end{aligned}$$
(A.8)

where \(\alpha ,\ {\tilde{C}}\) do not depend on k or \(\Vert \varvec{L}_{k}^{(2)}\Vert _{2}\).

Denote

$$\begin{aligned} q = \frac{\sigma _{r+1}^{2}}{\sigma _{r}^{2}}+\frac{{\tilde{C}}}{\alpha ^{2}}\Vert \varvec{L}_{0}^{(2)}\Vert _{2}. \end{aligned}$$

Since \({\mathcal {R}}(\varvec{L}_{0})\) is assumed to be sufficiently close to \({\mathcal {R}}(\varvec{U}_{1})\), \(\Vert \varvec{U}_{2}^{T}\varvec{L}_{0}\Vert _{2}\) is small, i.e., \(\Vert \varvec{L}_{0}^{(2)}\Vert _{2} = o(1)\); in other words, we may assume that \(q<1\). From (A.8), we have

$$\begin{aligned} \Vert \varvec{L}_{k+1}^{(2)}\Vert _{2}\le q\Vert \varvec{L}_{k}^{(2)}\Vert _{2} \end{aligned}$$

for all k, which leads to

$$\begin{aligned} \lim \limits _{k\rightarrow +\infty }\Vert \varvec{L}_{k}^{(2)}\Vert _{2}= 0. \end{aligned}$$

Combined with the assumption on \(\varvec{L}_{k}\), this verifies that \({\mathcal {R}}(\varvec{L}_{k})\) becomes orthogonal to \({\mathcal {R}}(\varvec{U}_{2})\) as \(k\rightarrow +\infty \). Since the orthogonal complement of \({\mathcal {R}}(\varvec{U}_{2})\) is unique, we have

$$\begin{aligned} {\mathcal {R}}(\varvec{L}_{k}) = {\mathcal {R}}(\varvec{U}_{1}),\ \ k\rightarrow +\infty , \end{aligned}$$

where \({\mathcal {R}}(\varvec{U}_{1})\) is the dominant subspace of \(\varvec{A}\). In other words, we have

$$\begin{aligned} \begin{aligned} \lim \limits _{k\rightarrow +\infty }\Vert \varvec{L}_{k}\varvec{L}_{k}^{\dag }-\varvec{U}_{1}\varvec{U}_{1}^{T}\Vert _{2}=0, \end{aligned} \end{aligned}$$

where \(\varvec{L}_{k}^{\dag }\) is the pseudo-inverse of \(\varvec{L}_{k}\). Further, from the iterative form of the ALS method, we have

$$\begin{aligned} \varvec{R}_{k} = \varvec{A}^{T}(\varvec{L}_{k}^{\dag })^{T}, \end{aligned}$$

thus

$$\begin{aligned} \varvec{L}_{k}\varvec{R}_{k}^{T}=\varvec{L}_{k}\varvec{L}_{k}^{\dag }\varvec{A}\rightarrow \varvec{U}_{1}\varvec{U}_{1}^{T}\varvec{A},\ k\rightarrow +\infty . \end{aligned}$$

Since the Frobenius norm \(\Vert \cdot \Vert _{F}\) is continuous,

$$\begin{aligned} \lim \limits _{k\rightarrow +\infty }\Vert \varvec{A} - \varvec{L}_{k}\varvec{R}_{k}^{T}\Vert _{F}=\Vert \varvec{A} - \varvec{U}_{1}\varvec{U}_{1}^{T}\varvec{A}\Vert _{F}. \end{aligned}$$

Since \(\varvec{U}_{1}\varvec{U}_{1}^{T}\varvec{A}\) is the exact solution of the low rank approximation problem for \(\varvec{A}\), the convergence of the ALS method is proved.

From (A.8), we further confirm the q-linear convergence of the ALS method, with approximate convergence ratio \(\sigma _{r+1}^{2}/\sigma _{r}^{2}\).

\(\square \)
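The rate established above is easy to observe numerically. By (A.2), \({\mathcal {R}}(\varvec{L}_{k+1})={\mathcal {R}}(\varvec{AA}^{T}\varvec{L}_{k})\), so the sine of the largest principal angle between \({\mathcal {R}}(\varvec{L}_{k})\) and \({\mathcal {R}}(\varvec{U}_{1})\) should contract by roughly \(\sigma _{r+1}^{2}/\sigma _{r}^{2}\) per sweep. The check below is a hypothetical illustration with prescribed singular values (our construction, not an experiment from the paper).

```python
import numpy as np

# Build A with prescribed singular values: sigma_r = 5, sigma_{r+1} = 1.
rng = np.random.default_rng(0)
m, n, r = 60, 50, 5
sigma = np.concatenate([np.linspace(10.0, 5.0, r), np.linspace(1.0, 0.1, n - r)])
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U[:, :n] @ np.diag(sigma) @ V.T

L = rng.standard_normal((m, r))
Q, _ = np.linalg.qr(L)
prev = np.linalg.norm(U[:, r:].T @ Q, 2)      # sine of largest principal angle
for k in range(8):
    R = A.T @ L @ np.linalg.inv(L.T @ L)      # R_k, as in (A.1)
    L = A @ R @ np.linalg.inv(R.T @ R)        # L_{k+1}, as in (A.1)
    Q, _ = np.linalg.qr(L)
    cur = np.linalg.norm(U[:, r:].T @ Q, 2)
    print(f"sweep {k}: contraction {cur / prev:.3e}")
    prev = cur
print("predicted ratio:", (sigma[r] / sigma[r - 1]) ** 2)
```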

B Proof of Theorem 2

The assumption of Theorem 1 implies that \(\varvec{L}_{k}\) is nonsingular at every iteration k. We assume that

$$\begin{aligned} \varvec{L}_{k}^{(1)T}\varvec{\varSigma }_{1}^{2}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}\varvec{L}_{k}^{(2)} \end{aligned}$$

is positive definite. Let \(\varepsilon \) be a positive number such that \(\sigma _{r}-\varepsilon >\sigma _{r+1}\). By (A.4), we know

$$\begin{aligned} \begin{aligned} \varvec{L}_{k+1}^{(1)}&= \varvec{\varSigma }_{1}^{2}\varvec{L}_{k}^{(1)}(\varvec{L}_{k}^{(1)T}\varvec{\varSigma }_{1}^{2}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}\varvec{L}_{k}^{(2)})^{-1}(\varvec{L}_{k}^{(1)T}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\varvec{L}_{k}^{(2)}),\\ \varvec{L}_{k+1}^{(2)}&= \varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}\varvec{L}_{k}^{(2)}(\varvec{L}_{k}^{(1)T}\varvec{\varSigma }_{1}^{2}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}\varvec{L}_{k}^{(2)})^{-1}(\varvec{L}_{k}^{(1)T}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\varvec{L}_{k}^{(2)}), \end{aligned} \end{aligned}$$
(B.1)

which means

$$\begin{aligned} \begin{aligned} \varvec{L}_{k+1}^{(1)}&= \frac{\varvec{\varSigma }_{1}^{2}}{(\sigma _{r}-\varepsilon )^{2}}\varvec{L}_{k}^{(1)}(\varvec{L}_{k}^{(1)T}\frac{\varvec{\varSigma }_{1}^{2}}{(\sigma _{r}-\varepsilon )^{2}}\varvec{L}_{k}^{(1)}\\&\quad +\varvec{L}_{k}^{(2)T}\frac{\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}}{(\sigma _{r}-\varepsilon )^{2}}\varvec{L}_{k}^{(2)})^{-1}(\varvec{L}_{k}^{(1)T}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\varvec{L}_{k}^{(2)}),\\ \varvec{L}_{k+1}^{(2)}&= \frac{\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}}{(\sigma _{r}-\varepsilon )^{2}}\varvec{L}_{k}^{(2)}(\varvec{L}_{k}^{(1)T}\frac{\varvec{\varSigma }_{1}^{2}}{(\sigma _{r}-\varepsilon )^{2}}\varvec{L}_{k}^{(1)}\\&\quad +\varvec{L}_{k}^{(2)T}\frac{\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}}{(\sigma _{r}-\varepsilon )^{2}}\varvec{L}_{k}^{(2)})^{-1}(\varvec{L}_{k}^{(1)T}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\varvec{L}_{k}^{(2)}). \end{aligned} \end{aligned}$$

Clearly it holds that

$$\begin{aligned} \begin{aligned} \Vert \varvec{L}_{k+1}^{(2)}\Vert _{2}\le \frac{\sigma _{r+1}^{2}}{(\sigma _{r}-\varepsilon )^{2}}\Vert \varvec{L}_{k}^{(2)}\Vert _{2} \end{aligned} \end{aligned}$$
(B.2)

under the condition that

$$\begin{aligned} \varvec{L}_{k}^{(1)T}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\varvec{L}_{k}^{(2)}\le \varvec{L}_{k}^{(1)T}\frac{\varvec{\varSigma }_{1}^{2}}{(\sigma _{r}-\varepsilon )^{2}}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\frac{\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}}{(\sigma _{r}-\varepsilon )^{2}}\varvec{L}_{k}^{(2)}. \end{aligned}$$
(B.3)

If \(\varvec{L}_{k}^{(1)}\) is nonsingular, then (B.3) implies

$$\begin{aligned} (\varvec{L}_{k}^{(2)}(\varvec{L}_{k}^{(1)})^{-1})^{T}\left( \varvec{I}-\frac{\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}}{(\sigma _{r}-\varepsilon )^{2}}\right) (\varvec{L}_{k}^{(2)}(\varvec{L}_{k}^{(1)})^{-1})\le \frac{\varvec{\varSigma }_{1}^{2}}{(\sigma _{r}-\varepsilon )^{2}}-\varvec{I}. \end{aligned}$$
(B.4)

It is then easy to see that

$$\begin{aligned} \Vert \varvec{L}_{k}^{(2)}(\varvec{L}_{k}^{(1)})^{-1}\Vert _{2}\le \sqrt{\frac{\sigma _{r}^{2}-(\sigma _{r}-\varepsilon )^{2}}{(\sigma _{r}-\varepsilon )^{2}-\sigma _{min}^{2}}} \end{aligned}$$
(B.5)

is a sufficient condition of (B.4).

Next we will prove that if the initial guess \(\varvec{L}_{0}\) satisfies condition (B.5), then \(\varvec{L}_{k}^{(1)}\) is nonsingular and (B.2) is satisfied at every iteration k.

Provided that \(\varvec{L}_{0}\) satisfies (B.5), we obtain

$$\begin{aligned} \Vert \varvec{L}_{1}^{(2)}\Vert _{2}\le \frac{\sigma _{r+1}^{2}}{(\sigma _{r}-\varepsilon )^{2}}\Vert \varvec{L}_{0}^{(2)}\Vert _{2}. \end{aligned}$$
(B.6)

According to the proof of Theorem 1, \(\varvec{L}_{1}^{(1)}\) is also nonsingular, which implies that

$$\begin{aligned} \varvec{L}_{1}^{(1)T}\varvec{\varSigma }_{1}^{2}\varvec{L}_{1}^{(1)}+\varvec{L}_{1}^{(2)T}\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}\varvec{L}_{1}^{(2)} \end{aligned}$$

is positive definite. Then by (B.1), we have

$$\begin{aligned} \Vert \varvec{L}_{1}^{(2)}(\varvec{L}_{1}^{(1)})^{-1}\Vert _{2}\le & {} \Vert \varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}\varvec{L}_{0}^{(2)}(\varvec{L}_{0}^{(1)})^{-1}\varvec{\varSigma }_{1}^{-2}\Vert _{2}\nonumber \\\le & {} \frac{\sigma _{r+1}^{2}}{\sigma _{r}^{2}}\Vert \varvec{L}_{0}^{(2)}(\varvec{L}_{0}^{(1)})^{-1}\Vert _{2}\nonumber \\\le & {} \sqrt{\frac{\sigma _{r}^{2}-(\sigma _{r}-\varepsilon )^{2}}{(\sigma _{r}-\varepsilon )^{2}-\sigma _{min}^{2}}}. \end{aligned}$$
(B.7)

Analogously, we can prove that, at every iteration k, \(\varvec{L}_{k}^{(1)}\) is nonsingular, \(\varvec{L}_{k}^{(1)T}\varvec{\varSigma }_{1}^{2}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}\varvec{L}_{k}^{(2)}\) is positive definite, and (B.5) is satisfied. Since (B.5) is a sufficient condition for (B.4) and (B.3), inequality (B.2) holds at every iteration k, which implies that

$$\begin{aligned} \lim \limits _{k\rightarrow +\infty }\Vert \varvec{L}_{k}\varvec{L}_{k}^{\dagger }-\varvec{U}_{1}\varvec{U}_{1}^{T}\Vert _{2}=0. \end{aligned}$$

The rest of the proof is analogous to that of Theorem 1 and is omitted for brevity.

\(\square \)
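Condition (B.5) can be evaluated directly for a candidate initial guess, given the SVD of \(\varvec{A}\). The sketch below is an illustration under the same notation (here \(\sigma _{min}\) is taken as the smallest singular value of \(\varvec{A}\), and \(\varepsilon \) must satisfy \(\sigma _{min}<\sigma _{r}-\varepsilon <\sigma _{r}\) for the bound to be well defined); it is not code from the paper.

```python
import numpy as np

def in_convergence_region(A, L0, r, eps):
    """Evaluate condition (B.5) for an initial guess L0 (illustrative sketch)."""
    U, s, _ = np.linalg.svd(A, full_matrices=True)
    L1 = U[:, :r].T @ L0                      # L0^(1) = U_1^T L0
    L2 = U[:, r:].T @ L0                      # L0^(2) = U_2^T L0
    lhs = np.linalg.norm(L2 @ np.linalg.inv(L1), 2)
    sr, smin = s[r - 1], s[-1]                # sigma_r and sigma_min of A
    rhs = np.sqrt((sr**2 - (sr - eps)**2) / ((sr - eps)**2 - smin**2))
    return lhs <= rhs, lhs, rhs

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    A = rng.standard_normal((40, 30))
    L0 = rng.standard_normal((40, 4))
    print(in_convergence_region(A, L0, r=4, eps=0.1))
```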

C Proof of Theorem 3

In Algorithm 4, \(\varvec{U}^{(n)}\) is obtained from the rank-\(R_{n}\) approximation of \(\varvec{A}_{(n)}\), which is done in an iterative manner and allows a tolerance parameter \(\eta _{n}\). Therefore, we have

$$\begin{aligned} \Vert \varvec{A}_{(n)} - \varvec{L}^{*}\varvec{R}^{*T}\Vert _{F}^{2}\le \eta _{n}^{2}\Vert \varvec{{\mathcal {A}}}\Vert _{F}^{2} + \gamma _{n}, \end{aligned}$$
(C.1)

where \(\varvec{L}^{*}\) and \(\varvec{R}^{*}\) are the same as in Algorithm 4. Note that \(\varvec{R}\) is updated by solving a least squares problem with multiple right-hand sides

$$\begin{aligned} \min \limits _{\varvec{R}}\Vert \varvec{A}_{(n)} - \varvec{LR}^{T}\Vert _{F}, \end{aligned}$$

whose exact solution is \(\varvec{R} = \varvec{A}_{(n)}^{T}(\varvec{L}^{\dagger })^{T}\). Thus

$$\begin{aligned} \Vert \varvec{A}_{(n)} - \varvec{L}^{*}\varvec{R}^{*T}\Vert _{F} = \Vert \varvec{A}_{(n)} - \varvec{L}^{*}\varvec{L}^{*\dagger }\varvec{A}_{(n)}\Vert _{F}, \end{aligned}$$
(C.2)

where \(\varvec{L}^{*}\varvec{L}^{*\dagger }\) represents the orthogonal projection onto the subspace \({\mathcal {R}}(\varvec{L}^{*})\). Consequently, by (C.1), (C.2) and (4.3), we have

$$\begin{aligned} \Vert \hat{\varvec{{\mathcal {A}}}} - \varvec{{\mathcal {A}}}\Vert _{F}^{2}\le \sum \limits _{n=1}^{N}(\eta _{n}^{2}\Vert \varvec{A}_{(n)}\Vert _{F}^{2}+\gamma _{n}), \end{aligned}$$

which means

$$\begin{aligned} \frac{\Vert \hat{\varvec{{\mathcal {A}}}} - \varvec{{\mathcal {A}}}\Vert _{F}^{2}}{\Vert \varvec{{\mathcal {A}}}\Vert _{F}^{2}}\le \sum \limits _{n=1}^{N}\left( \eta _{n}^{2}+\frac{\gamma _{n}}{\Vert \varvec{{\mathcal {A}}}\Vert _{F}^{2}}\right) . \end{aligned}$$

Combining with (4.3), we obtain (4.4).

The error analysis of Algorithm 5 can be done analogously. \(\square \)
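To summarize the construction analyzed in this appendix, the sketch below outlines an ALS-based truncated Tucker approximation in which each factor is obtained from an ALS low-rank approximation of the corresponding unfolding, with a per-mode tolerance playing the role of \(\eta _{n}\) in (C.1). It is an illustration of the general idea only, not a transcription of Algorithm 4 or 5, and the stopping rule is deliberately simplified.

```python
import numpy as np

def unfold(A, n):
    """Mode-n unfolding."""
    return np.moveaxis(A, n, 0).reshape(A.shape[n], -1)

def mode_mult(A, U, n):
    """Mode-n product A x_n U."""
    return np.moveaxis(np.tensordot(U, np.moveaxis(A, n, 0), axes=(1, 0)), 0, n)

def als_factor(An, r, eta, max_iter=100, seed=0):
    """Orthonormal basis of an approximate rank-r dominant subspace of An,
    computed with the ALS iteration (A.1) and a simple relative-residual test."""
    rng = np.random.default_rng(seed)
    L = rng.standard_normal((An.shape[0], r))
    nrm = np.linalg.norm(An)
    for _ in range(max_iter):
        R = An.T @ L @ np.linalg.inv(L.T @ L)
        L = An @ R @ np.linalg.inv(R.T @ R)
        if np.linalg.norm(An - L @ R.T) <= eta * nrm:
            break
    return np.linalg.qr(L)[0]                 # orthonormalize the factor

def als_hosvd(A, ranks, eta=1e-2):
    """ALS-based truncated Tucker approximation (illustrative sketch)."""
    factors = [als_factor(unfold(A, n), rn, eta, seed=n) for n, rn in enumerate(ranks)]
    core = A
    for n, Un in enumerate(factors):
        core = mode_mult(core, Un.T, n)       # core = A x_1 U1^T x_2 U2^T ...
    return core, factors
```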


About this article

Cite this article

Xiao, C., Yang, C. & Li, M. Efficient Alternating Least Squares Algorithms for Low Multilinear Rank Approximation of Tensors. J Sci Comput 87, 67 (2021). https://doi.org/10.1007/s10915-021-01493-0
