
Efficient Alternating Least Squares Algorithms for Low Multilinear Rank Approximation of Tensors


Abstract

The low multilinear rank approximation, also known as the truncated Tucker decomposition, has been extensively utilized in many applications involving higher-order tensors. Popular methods for low multilinear rank approximation usually rely directly on the matrix SVD and therefore often suffer from the notorious intermediate data explosion issue and are not easy to parallelize, especially when the input tensor is large. In this paper, we propose a new class of truncated HOSVD algorithms based on alternating least squares (ALS) for efficiently computing the low multilinear rank approximation of tensors. The proposed ALS-based approaches eliminate the redundant computation of singular vectors of intermediate matrices and are therefore free of data explosion. The new methods are also more flexible, with an adjustable convergence tolerance, and are intrinsically parallelizable on high-performance computers. Theoretical analysis reveals that the ALS iteration in the proposed algorithms is q-linearly convergent with a relatively wide convergence region. Numerical experiments with large-scale tensors from both synthetic and real-world applications demonstrate that the ALS-based methods substantially reduce the total cost of the original algorithms and are highly scalable for parallel computing.
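For orientation, the sketch below (illustrative NumPy code, not part of the paper; all function names and sizes are ours) shows the classical SVD-based truncated HOSVD that the proposed ALS-based algorithms are designed to replace: each mode-\(n\) unfolding is factored by a truncated SVD, and the core tensor is obtained by multiplying the tensor with the transposed factor matrices.

```python
import numpy as np

def unfold(A, n):
    """Mode-n unfolding: move axis n to the front and flatten the rest."""
    return np.moveaxis(A, n, 0).reshape(A.shape[n], -1)

def mode_mult(A, U, n):
    """Mode-n product A x_n U (U has shape new_dim x A.shape[n])."""
    moved = np.moveaxis(A, n, 0)
    return np.moveaxis(np.tensordot(U, moved, axes=(1, 0)), 0, n)

def truncated_hosvd(A, ranks):
    """Classical truncated HOSVD: truncated SVD of every unfolding, then project."""
    factors = []
    for n, r in enumerate(ranks):
        # Leading r left singular vectors of the mode-n unfolding
        U, _, _ = np.linalg.svd(unfold(A, n), full_matrices=False)
        factors.append(U[:, :r])
    core = A
    for n, U in enumerate(factors):
        core = mode_mult(core, U.T, n)   # core = A x_1 U1^T x_2 U2^T ...
    return core, factors

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 30, 40))
    core, factors = truncated_hosvd(A, (5, 5, 5))
    A_hat = core
    for n, U in enumerate(factors):
        A_hat = mode_mult(A_hat, U, n)   # reconstruct the rank-(5,5,5) approximation
    print(np.linalg.norm(A - A_hat) / np.linalg.norm(A))
```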


Availability of Data and Material

The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions that have greatly improved the quality of this paper.

Funding

This study was funded in part by Guangdong Key R&D Project (#2019B121204008), Beijing Natural Science Foundation (#JQ18001) and Beijing Academy of Artificial Intelligence.

Author information

Corresponding author

Correspondence to Chao Yang.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Code Availability

The code used in the current study is available from the corresponding author on reasonable request.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

A Proof of Theorem 1

Based on the assumption of Theorem 1, \(\varvec{L}_{k}\) is nonsingular and \(\varvec{L}_{k}^{T}\varvec{L}_{k}\) is positive definite. Thus, the iterative form of Algorithm 3 is

$$\begin{aligned} \begin{aligned} \varvec{R}_{k} = \varvec{A}^{T}\varvec{L}_{k}(\varvec{L}_{k}^{T}\varvec{L}_{k})^{-1},\\ \varvec{L}_{k+1} = \varvec{AR}_{k}(\varvec{R}_{k}^{T}\varvec{R}_{k})^{-1}, \end{aligned} \end{aligned}$$
(A.1)

i.e.,

$$\begin{aligned} \varvec{L}_{k+1} = \varvec{AA}^{T}\varvec{L}_{k}(\varvec{L}_{k}^{T}\varvec{AA}^{T}\varvec{L}_{k})^{-1}(\varvec{L}_{k}^{T}\varvec{L}_{k}). \end{aligned}$$
(A.2)
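To make (A.1) concrete, the following minimal sketch (ours, for illustration only; explicit inverses are used for readability, whereas a practical code would solve the normal equations by QR or Cholesky factorizations) carries out the two alternating least squares updates for a rank-\(r\) approximation \(\varvec{A}\approx \varvec{L}\varvec{R}^{T}\).

```python
import numpy as np

def als_low_rank(A, r, iters=50, seed=0):
    """ALS iteration (A.1) for a rank-r approximation A ~ L R^T (illustrative)."""
    rng = np.random.default_rng(seed)
    L = rng.standard_normal((A.shape[0], r))      # initial guess L_0
    for _ in range(iters):
        R = A.T @ L @ np.linalg.inv(L.T @ L)      # R_k   = A^T L_k (L_k^T L_k)^{-1}
        L = A @ R @ np.linalg.inv(R.T @ R)        # L_{k+1} = A R_k (R_k^T R_k)^{-1}
    return L, R

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 8)) @ rng.standard_normal((8, 150))
    A += 1e-3 * rng.standard_normal(A.shape)      # a nearly rank-8 matrix
    L, R = als_low_rank(A, 8)
    print(np.linalg.norm(A - L @ R.T) / np.linalg.norm(A))
```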

Suppose that the full SVD of \(\varvec{A}\) is \(\varvec{A} = \varvec{U}\varvec{\varSigma } \varvec{V}^{T}\), where

$$\begin{aligned} \varvec{U} = [\varvec{U}_{1},\varvec{U}_{2}],\ \varvec{V} = [\varvec{V}_{1},\varvec{V}_{2}], \, \varvec{\varSigma } = \left( \begin{array}{cc} \varvec{\varSigma }_{1} &{} 0 \\ 0 &{} \varvec{\varSigma }_{2} \\ \end{array} \right) . \end{aligned}$$

Then from (A.2), we have

$$\begin{aligned} \varvec{U}^{T}\varvec{L}_{k+1} = \varvec{U}^{T}\varvec{AVV}^{T}\varvec{A}^{T}\varvec{UU}^{T}\varvec{L}_{k}(\varvec{L}_{k}^{T}\varvec{UU}^{T}\varvec{AVV}^{T}\varvec{A}^{T}\varvec{UU}^{T}\varvec{L}_{k})^{-1}(\varvec{L}_{k}^{T}\varvec{UU}^{T}\varvec{L}_{k}), \end{aligned}$$
(A.3)

which can be rewritten in block form

$$\begin{aligned} \left( \begin{array}{c} \varvec{L}_{k+1}^{(1)} \\ \varvec{L}_{k+1}^{(2)} \\ \end{array} \right) = \left( \begin{array}{cc} \varvec{\varSigma }_{1}^{2} &{} 0 \\ 0 &{} \varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T} \\ \end{array} \right) \left( \begin{array}{c} \varvec{L}_{k}^{(1)} \\ \varvec{L}_{k}^{(2)} \\ \end{array} \right) (\varvec{L}_{k}^{(1)T}\varvec{\varSigma }_{1}^{2}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}\varvec{L}_{k}^{(2)})^{-1}(\varvec{L}_{k}^{(1)T}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\varvec{L}_{k}^{(2)}). \end{aligned}$$

It then follows that

$$\begin{aligned} \begin{aligned} \varvec{L}_{k+1}^{(1)}&= \varvec{\varSigma }_{1}^{2}\varvec{L}_{k}^{(1)}(\varvec{L}_{k}^{(1)T}\varvec{\varSigma }_{1}^{2}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}\varvec{L}_{k}^{(2)})^{-1}(\varvec{L}_{k}^{(1)T}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\varvec{L}_{k}^{(2)}),\\ \varvec{L}_{k+1}^{(2)}&= \varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}\varvec{L}_{k}^{(2)}(\varvec{L}_{k}^{(1)T}\varvec{\varSigma }_{1}^{2}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}\varvec{L}_{k}^{(2)})^{-1}(\varvec{L}_{k}^{(1)T}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\varvec{L}_{k}^{(2)}). \end{aligned} \end{aligned}$$
(A.4)

Furthermore,

$$\begin{aligned} \begin{aligned}&(\varvec{L}_{k}^{(1)T}\varvec{\varSigma }_{1}^{2}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}\varvec{L}_{k}^{(2)})^{-1} \\&\quad = (\varvec{I}+(\varvec{L}_{k}^{(1)T}\varvec{\varSigma }_{1}^{2}\varvec{L}_{k}^{(1)})^{-1}(\varvec{L}_{k}^{(2)T}\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}\varvec{L}_{k}^{(2)}))^{-1}(\varvec{L}_{k}^{(1)T}\varvec{\varSigma }_{1}^{2}\varvec{L}_{k}^{(1)})^{-1} \\&\quad = (\varvec{I}+\sum \limits _{n=1}^{\infty }(-1)^{n}((\varvec{L}_{k}^{(1)T}\varvec{\varSigma }_{1}^{2}\varvec{L}_{k}^{(1)})^{-1}(\varvec{L}_{k}^{(2)T}\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}\varvec{L}_{k}^{(2)}))^{n})(\varvec{L}_{k}^{(1)T}\varvec{\varSigma }_{1}^{2}\varvec{L}_{k}^{(1)})^{-1}. \end{aligned} \end{aligned}$$
(A.5)

Suppose now that the distance between \({\mathcal {R}}(\varvec{L}_{k})\) and \({\mathcal {R}}(\varvec{U}_{2})\) is small enough and denote it by \(\delta _{k}\); it is determined by \(\Vert \varvec{L}_{k}^{(2)}\Vert _{2}\) in the sense that there exist two constants \(\alpha ,\ \beta >0\) such that \(\alpha \delta _{k}\le \Vert \varvec{L}_{k}^{(2)}\Vert _{2}\le \beta \delta _{k}\). The distance between \({\mathcal {R}}(\varvec{L}_{k})\) and \({\mathcal {R}}(\varvec{U}_{1})\) is then bounded below by \(\sqrt{1-\delta _{k}^{2}}\), and

$$\begin{aligned} \begin{aligned} \Vert \varvec{L}_{k}^{(1)}\Vert _{2}\le C_{1}, \quad \Vert (\varvec{L}_{k}^{(1)})^{-1}\Vert _{2}\le \frac{C_{2}}{\sqrt{1-\delta _{k}^{2}}}, \end{aligned} \end{aligned}$$
(A.6)

where \(C_{1},\ C_{2}\) are constants independent of k and \(\delta _{k}\). From (A.5) and (A.6), there exists a constant C, depending only on \(C_{1}\) and \(C_{2}\), such that the following inequality holds:

$$\begin{aligned} (\varvec{L}_{k}^{(1)T}\varvec{\varSigma }_{1}^{2}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}\varvec{L}_{k}^{(2)})^{-1}\le (\varvec{L}_{k}^{(1)T}\varvec{\varSigma }_{1}^{2}\varvec{L}_{k}^{(1)})^{-1}+\frac{C\delta _{k}^{2}}{(1-2\delta _{k}^{2})^{2}}, \end{aligned}$$
(A.7)

Further, from (A.7) and (A.4), and assuming that \(\sigma _{r}>\sigma _{r+1}\), we have

$$\begin{aligned} \begin{aligned} \varvec{L}_{k+1}^{(2)}&\le \frac{\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}}{\sigma _{r}^{2}}\varvec{L}_{k}^{(2)}(\varvec{L}_{k}^{(1)T}\frac{\varvec{\varSigma }_{1}^{2}}{\sigma _{r}^{2}}\varvec{L}_{k}^{(1)})^{-1} (\varvec{L}_{k}^{(1)T}\varvec{L}_{k}^{(1)})\\&\quad +{\hat{C}}\left( \frac{\delta _{k}^{2}}{(1-2\delta _{k}^{2})^{2}}+\frac{\delta _{k}^{2}}{1-\delta _{k}^{2}}+\frac{\delta _{k}^{4}}{(1-2\delta _{k}^{2})^{2}}\right) , \end{aligned} \end{aligned}$$

where \({\hat{C}}\) is a constant. Clearly,

$$\begin{aligned} 0<\varvec{L}_{k}^{(1)T}\varvec{L}_{k}^{(1)}\le \varvec{L}_{k}^{(1)T}\frac{\varvec{\varSigma }_{1}^{2}}{\sigma _{r}^{2}}\varvec{L}_{k}^{(1)}, \end{aligned}$$

and by Lemma 1, we have

$$\begin{aligned} \Vert (\varvec{L}_{k}^{(1)T}\frac{\varvec{\varSigma }_{1}^{2}}{\sigma _{r}^{2}}\varvec{L}_{k}^{(1)})^{-1}(\varvec{L}_{k}^{(1)T}\varvec{L}_{k}^{(1)})\Vert _{2}\le 1. \end{aligned}$$

Since \(\delta _{k}\) is small enough, we obtain

$$\begin{aligned} \begin{aligned} \Vert \varvec{L}_{k+1}^{(2)}\Vert _{2}\le \frac{\sigma _{r+1}^{2}}{\sigma _{r}^{2}}\Vert \varvec{L}_{k}^{(2)}\Vert _{2}+{\tilde{C}}\delta _{k}^{2} \le \frac{\sigma _{r+1}^{2}}{\sigma _{r}^{2}}\Vert \varvec{L}_{k}^{(2)}\Vert _{2}+\frac{{\tilde{C}}}{\alpha ^{2}}\Vert \varvec{L}_{k}^{(2)}\Vert _{2}^{2}, \end{aligned} \end{aligned}$$
(A.8)

where \(\alpha ,\ {\tilde{C}}\) do not depend on k or \(\Vert \varvec{L}_{k}^{(2)}\Vert _{2}\).

Denote

$$\begin{aligned} q = \frac{\sigma _{r+1}^{2}}{\sigma _{r}^{2}}+\frac{{\tilde{C}}}{\alpha ^{2}}\Vert \varvec{L}_{0}^{(2)}\Vert _{2}. \end{aligned}$$

Since \({\mathcal {R}}(\varvec{L}_{0})\) is assumed to be sufficiently close to \({\mathcal {R}}(\varvec{U}_{1})\), \(\Vert \varvec{U}_{2}^{T}\varvec{L}_{0}\Vert _{2}\) is small, i.e., \(\Vert \varvec{L}_{0}^{(2)}\Vert _{2} = o(1)\); in other words, we may assume that \(q<1\). From (A.8), we have

$$\begin{aligned} \Vert \varvec{L}_{k+1}^{(2)}\Vert _{2}\le q\Vert \varvec{L}_{k}^{(2)}\Vert _{2} \end{aligned}$$

for all k, which leads to

$$\begin{aligned} \lim \limits _{k\rightarrow +\infty }\Vert \varvec{L}_{k}^{(2)}\Vert _{2}= 0. \end{aligned}$$

Combined with the assumption on \(\varvec{L}_{k}\), this verifies that \({\mathcal {R}}(\varvec{L}_{k})\) becomes orthogonal to \({\mathcal {R}}(\varvec{U}_{2})\) as \(k\rightarrow +\infty \). Since the orthogonal complement of \({\mathcal {R}}(\varvec{U}_{2})\) is unique, we have

$$\begin{aligned} {\mathcal {R}}(\varvec{L}_{k}) = {\mathcal {R}}(\varvec{U}_{1}),\ \ k\rightarrow +\infty , \end{aligned}$$

where \({\mathcal {R}}(\varvec{U}_{1})\) is the dominant subspace of \(\varvec{A}\). In other words, we have

$$\begin{aligned} \begin{aligned} \lim \limits _{k\rightarrow +\infty }\Vert \varvec{L}_{k}\varvec{L}_{k}^{\dag }-\varvec{U}_{1}\varvec{U}_{1}^{T}\Vert _{2}=0, \end{aligned} \end{aligned}$$

where \(\varvec{L}_{k}^{\dag }\) is the pseudo-inverse of \(\varvec{L}_{k}\). Further, from the iterative form of the ALS method, we have

$$\begin{aligned} \varvec{R}_{k} = \varvec{A}^{T}(\varvec{L}_{k}^{\dag })^{T}, \end{aligned}$$

thus

$$\begin{aligned} \varvec{L}_{k}\varvec{R}_{k}^{T}=\varvec{L}_{k}\varvec{L}_{k}^{\dag }\varvec{A}\rightarrow \varvec{U}_{1}\varvec{U}_{1}^{T}\varvec{A},\ k\rightarrow +\infty . \end{aligned}$$

Since the Frobenius norm \(\Vert \cdot \Vert _{F}\) is continuous,

$$\begin{aligned} \lim \limits _{k\rightarrow +\infty }\Vert \varvec{A} - \varvec{L}_{k}\varvec{R}_{k}^{T}\Vert _{F}=\Vert \varvec{A} - \varvec{U}_{1}\varvec{U}_{1}^{T}\varvec{A}\Vert _{F}. \end{aligned}$$

Since \(\varvec{U}_{1}\varvec{U}_{1}^{T}\varvec{A}\) is the exact solution of the low rank approximation problem for \(\varvec{A}\), the convergence of the ALS method is proved.

From (A.8), we further confirm the q-linear convergence of the ALS method, with approximate convergence ratio \(\sigma _{r+1}^{2}/\sigma _{r}^{2}\).

\(\square \)
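The rate established above is easy to observe numerically. By (A.2), \({\mathcal {R}}(\varvec{L}_{k+1})={\mathcal {R}}(\varvec{AA}^{T}\varvec{L}_{k})\), so the sine of the largest principal angle between \({\mathcal {R}}(\varvec{L}_{k})\) and \({\mathcal {R}}(\varvec{U}_{1})\) should contract by roughly \(\sigma _{r+1}^{2}/\sigma _{r}^{2}\) per sweep. The check below is a hypothetical illustration with prescribed singular values (our construction, not an experiment from the paper).

```python
import numpy as np

# Build A with prescribed singular values: sigma_r = 5, sigma_{r+1} = 1.
rng = np.random.default_rng(0)
m, n, r = 60, 50, 5
sigma = np.concatenate([np.linspace(10.0, 5.0, r), np.linspace(1.0, 0.1, n - r)])
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U[:, :n] @ np.diag(sigma) @ V.T

L = rng.standard_normal((m, r))
Q, _ = np.linalg.qr(L)
prev = np.linalg.norm(U[:, r:].T @ Q, 2)      # sine of largest principal angle
for k in range(8):
    R = A.T @ L @ np.linalg.inv(L.T @ L)      # R_k, as in (A.1)
    L = A @ R @ np.linalg.inv(R.T @ R)        # L_{k+1}, as in (A.1)
    Q, _ = np.linalg.qr(L)
    cur = np.linalg.norm(U[:, r:].T @ Q, 2)
    print(f"sweep {k}: contraction {cur / prev:.3e}")
    prev = cur
print("predicted ratio:", (sigma[r] / sigma[r - 1]) ** 2)
```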

B Proof of Theorem 2

The assumption of Theorem 1 implies that \(\varvec{L}_{k}\) is nonsingular at every iteration k. We assume that

$$\begin{aligned} \varvec{L}_{k}^{(1)T}\varvec{\varSigma }_{1}^{2}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}\varvec{L}_{k}^{(2)} \end{aligned}$$

is positive definite. Let \(\varepsilon \) be a positive number such that \(\sigma _{r}-\varepsilon >\sigma _{r+1}\). By (A.4), we know

$$\begin{aligned} \begin{aligned} \varvec{L}_{k+1}^{(1)}&= \varvec{\varSigma }_{1}^{2}\varvec{L}_{k}^{(1)}(\varvec{L}_{k}^{(1)T}\varvec{\varSigma }_{1}^{2}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}\varvec{L}_{k}^{(2)})^{-1}(\varvec{L}_{k}^{(1)T}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\varvec{L}_{k}^{(2)}),\\ \varvec{L}_{k+1}^{(2)}&= \varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}\varvec{L}_{k}^{(2)}(\varvec{L}_{k}^{(1)T}\varvec{\varSigma }_{1}^{2}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}\varvec{L}_{k}^{(2)})^{-1}(\varvec{L}_{k}^{(1)T}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\varvec{L}_{k}^{(2)}), \end{aligned} \end{aligned}$$
(B.1)

which means

$$\begin{aligned} \begin{aligned} \varvec{L}_{k+1}^{(1)}&= \frac{\varvec{\varSigma }_{1}^{2}}{(\sigma _{r}-\varepsilon )^{2}}\varvec{L}_{k}^{(1)}(\varvec{L}_{k}^{(1)T}\frac{\varvec{\varSigma }_{1}^{2}}{(\sigma _{r}-\varepsilon )^{2}}\varvec{L}_{k}^{(1)}\\&\quad +\varvec{L}_{k}^{(2)T}\frac{\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}}{(\sigma _{r}-\varepsilon )^{2}}\varvec{L}_{k}^{(2)})^{-1}(\varvec{L}_{k}^{(1)T}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\varvec{L}_{k}^{(2)}),\\ \varvec{L}_{k+1}^{(2)}&= \frac{\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}}{(\sigma _{r}-\varepsilon )^{2}}\varvec{L}_{k}^{(2)}(\varvec{L}_{k}^{(1)T}\frac{\varvec{\varSigma }_{1}^{2}}{(\sigma _{r}-\varepsilon )^{2}}\varvec{L}_{k}^{(1)}\\&\quad +\varvec{L}_{k}^{(2)T}\frac{\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}}{(\sigma _{r}-\varepsilon )^{2}}\varvec{L}_{k}^{(2)})^{-1}(\varvec{L}_{k}^{(1)T}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\varvec{L}_{k}^{(2)}). \end{aligned} \end{aligned}$$

Clearly it holds that

$$\begin{aligned} \begin{aligned} \Vert \varvec{L}_{k+1}^{(2)}\Vert _{2}\le \frac{\sigma _{r+1}^{2}}{(\sigma _{r}-\varepsilon )^{2}}\Vert \varvec{L}_{k}^{(2)}\Vert _{2} \end{aligned} \end{aligned}$$
(B.2)

under the condition that

$$\begin{aligned} \varvec{L}_{k}^{(1)T}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\varvec{L}_{k}^{(2)}\le \varvec{L}_{k}^{(1)T}\frac{\varvec{\varSigma }_{1}^{2}}{(\sigma _{r}-\varepsilon )^{2}}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\frac{\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}}{(\sigma _{r}-\varepsilon )^{2}}\varvec{L}_{k}^{(2)}. \end{aligned}$$
(B.3)

If \(\varvec{L}_{k}^{(1)}\) is nonsingular, then (B.3) implies

$$\begin{aligned} (\varvec{L}_{k}^{(2)}(\varvec{L}_{k}^{(1)})^{-1})^{T}\left( \varvec{I}-\frac{\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}}{(\sigma _{r}-\varepsilon )^{2}}\right) (\varvec{L}_{k}^{(2)}(\varvec{L}_{k}^{(1)})^{-1})\le \frac{\varvec{\varSigma }_{1}^{2}}{(\sigma _{r}-\varepsilon )^{2}}-\varvec{I}. \end{aligned}$$
(B.4)

It is then easy to see that

$$\begin{aligned} \Vert \varvec{L}_{k}^{(2)}(\varvec{L}_{k}^{(1)})^{-1}\Vert _{2}\le \sqrt{\frac{\sigma _{r}^{2}-(\sigma _{r}-\varepsilon )^{2}}{(\sigma _{r}-\varepsilon )^{2}-\sigma _{min}^{2}}} \end{aligned}$$
(B.5)

is a sufficient condition of (B.4).

Next we will prove that if the initial guess \(\varvec{L}_{0}\) satisfies condition (B.5), then \(\varvec{L}_{k}^{(1)}\) is nonsingular and (B.2) is satisfied at every iteration k.

Provided that \(\varvec{L}_{0}\) satisfies (B.5), we obtain

$$\begin{aligned} \Vert \varvec{L}_{1}^{(2)}\Vert _{2}\le \frac{\sigma _{r+1}^{2}}{(\sigma _{r}-\varepsilon )^{2}}\Vert \varvec{L}_{0}^{(2)}\Vert _{2}. \end{aligned}$$
(B.6)

According to the proof of Theorem 1, \(\varvec{L}_{1}^{(1)}\) is also nonsingular, which implies that

$$\begin{aligned} \varvec{L}_{1}^{(1)T}\varvec{\varSigma }_{1}^{2}\varvec{L}_{1}^{(1)}+\varvec{L}_{1}^{(2)T}\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}\varvec{L}_{1}^{(2)} \end{aligned}$$

is positive definite. Then by (B.1), we have

$$\begin{aligned} \Vert \varvec{L}_{1}^{(2)}(\varvec{L}_{1}^{(1)})^{-1}\Vert _{2}\le & {} \Vert \varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}\varvec{L}_{0}^{(2)}(\varvec{L}_{0}^{(1)})^{-1}\varvec{\varSigma }_{1}^{-2}\Vert _{2}\nonumber \\\le & {} \frac{\sigma _{r+1}^{2}}{\sigma _{r}^{2}}\Vert \varvec{L}_{0}^{(2)}(\varvec{L}_{0}^{(1)})^{-1}\Vert _{2}\nonumber \\\le & {} \sqrt{\frac{\sigma _{r}^{2}-(\sigma _{r}-\varepsilon )^{2}}{(\sigma _{r}-\varepsilon )^{2}-\sigma _{min}^{2}}}. \end{aligned}$$
(B.7)

Analogously, we can prove that, at every iteration k, \(\varvec{L}_{k}^{(1)}\) is nonsingular, \(\varvec{L}_{k}^{(1)T}\varvec{\varSigma }_{1}^{2}\varvec{L}_{k}^{(1)}+\varvec{L}_{k}^{(2)T}\varvec{\varSigma }_{2}\varvec{\varSigma }_{2}^{T}\varvec{L}_{k}^{(2)}\) is positive definite, and (B.5) is satisfied. Since (B.5) is a sufficient condition for (B.4) and (B.3), inequality (B.2) holds at every iteration k, which implies that

$$\begin{aligned} \lim \limits _{k\rightarrow +\infty }\Vert \varvec{L}_{k}\varvec{L}_{k}^{\dagger }-\varvec{U}_{1}\varvec{U}_{1}^{T}\Vert _{2}=0. \end{aligned}$$

The rest of the proof is analogous to that of Theorem 1 and is omitted for brevity.

\(\square \)
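Condition (B.5) can be evaluated directly for a candidate initial guess, given the SVD of \(\varvec{A}\). The sketch below is an illustration under the same notation (here \(\sigma _{min}\) is taken as the smallest singular value of \(\varvec{A}\), and \(\varepsilon \) must satisfy \(\sigma _{min}<\sigma _{r}-\varepsilon <\sigma _{r}\) for the bound to be well defined); it is not code from the paper.

```python
import numpy as np

def in_convergence_region(A, L0, r, eps):
    """Evaluate condition (B.5) for an initial guess L0 (illustrative sketch)."""
    U, s, _ = np.linalg.svd(A, full_matrices=True)
    L1 = U[:, :r].T @ L0                      # L0^(1) = U_1^T L0
    L2 = U[:, r:].T @ L0                      # L0^(2) = U_2^T L0
    lhs = np.linalg.norm(L2 @ np.linalg.inv(L1), 2)
    sr, smin = s[r - 1], s[-1]                # sigma_r and sigma_min of A
    rhs = np.sqrt((sr**2 - (sr - eps)**2) / ((sr - eps)**2 - smin**2))
    return lhs <= rhs, lhs, rhs

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    A = rng.standard_normal((40, 30))
    L0 = rng.standard_normal((40, 4))
    print(in_convergence_region(A, L0, r=4, eps=0.1))
```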

C Proof of Theorem 3

In Algorithm 4, \(\varvec{U}^{(n)}\) is obtained from the rank-\(R_{n}\) approximation of \(\varvec{A}_{(n)}\), which is done in an iterative manner and allows a tolerance parameter \(\eta _{n}\). Therefore, we have

$$\begin{aligned} \Vert \varvec{A}_{(n)} - \varvec{L}^{*}\varvec{R}^{*T}\Vert _{F}^{2}\le \eta _{n}^{2}\Vert \varvec{{\mathcal {A}}}\Vert _{F}^{2} + \gamma _{n}, \end{aligned}$$
(C.1)

where \(\varvec{L}^{*}\) and \(\varvec{R}^{*}\) are the same as in Algorithm 4. Note that \(\varvec{R}\) is updated by solving a least squares problem with multiple right-hand sides

$$\begin{aligned} \min \limits _{\varvec{R}}\Vert \varvec{A}_{(n)} - \varvec{LR}^{T}\Vert _{F}, \end{aligned}$$

whose exact solution is \(\varvec{R} = \varvec{A}_{(n)}^{T}(\varvec{L}^{\dagger })^{T}\). Thus

$$\begin{aligned} \Vert \varvec{A}_{(n)} - \varvec{L}^{*}\varvec{R}^{*T}\Vert _{F} = \Vert \varvec{A}_{(n)} - \varvec{L}^{*}\varvec{L}^{*\dagger }\varvec{A}_{(n)}\Vert _{F}, \end{aligned}$$
(C.2)

where \(\varvec{L}^{*}\varvec{L}^{*\dagger }\) represents the orthogonal projection onto the subspace \({\mathcal {R}}(\varvec{L}^{*})\). Consequently, by (C.1), (C.2) and (4.3), we have

$$\begin{aligned} \Vert \hat{\varvec{{\mathcal {A}}}} - \varvec{{\mathcal {A}}}\Vert _{F}^{2}\le \sum \limits _{n=1}^{N}(\eta _{n}^{2}\Vert \varvec{A}_{(n)}\Vert _{F}^{2}+\gamma _{n}), \end{aligned}$$

which means

$$\begin{aligned} \frac{\Vert \hat{\varvec{{\mathcal {A}}}} - \varvec{{\mathcal {A}}}\Vert _{F}^{2}}{\Vert \varvec{{\mathcal {A}}}\Vert _{F}^{2}}\le \sum \limits _{n=1}^{N}\left( \eta _{n}^{2}+\frac{\gamma _{n}}{\Vert \varvec{{\mathcal {A}}}\Vert _{F}^{2}}\right) . \end{aligned}$$

Combining with (4.3), we obtain (4.4).

The error analysis of Algorithm 5 can be done analogously. \(\square \)
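To summarize the construction analyzed in this appendix, the sketch below outlines an ALS-based truncated Tucker approximation in which each factor is obtained from an ALS low-rank approximation of the corresponding unfolding, with a per-mode tolerance playing the role of \(\eta _{n}\) in (C.1). It is an illustration of the general idea only, not a transcription of Algorithm 4 or 5, and the stopping rule is deliberately simplified.

```python
import numpy as np

def unfold(A, n):
    """Mode-n unfolding."""
    return np.moveaxis(A, n, 0).reshape(A.shape[n], -1)

def mode_mult(A, U, n):
    """Mode-n product A x_n U."""
    return np.moveaxis(np.tensordot(U, np.moveaxis(A, n, 0), axes=(1, 0)), 0, n)

def als_factor(An, r, eta, max_iter=100, seed=0):
    """Orthonormal basis of an approximate rank-r dominant subspace of An,
    computed with the ALS iteration (A.1) and a simple relative-residual test."""
    rng = np.random.default_rng(seed)
    L = rng.standard_normal((An.shape[0], r))
    nrm = np.linalg.norm(An)
    for _ in range(max_iter):
        R = An.T @ L @ np.linalg.inv(L.T @ L)
        L = An @ R @ np.linalg.inv(R.T @ R)
        if np.linalg.norm(An - L @ R.T) <= eta * nrm:
            break
    return np.linalg.qr(L)[0]                 # orthonormalize the factor

def als_hosvd(A, ranks, eta=1e-2):
    """ALS-based truncated Tucker approximation (illustrative sketch)."""
    factors = [als_factor(unfold(A, n), rn, eta, seed=n) for n, rn in enumerate(ranks)]
    core = A
    for n, Un in enumerate(factors):
        core = mode_mult(core, Un.T, n)       # core = A x_1 U1^T x_2 U2^T ...
    return core, factors
```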


About this article

Cite this article

Xiao, C., Yang, C. & Li, M. Efficient Alternating Least Squares Algorithms for Low Multilinear Rank Approximation of Tensors. J Sci Comput 87, 67 (2021). https://doi.org/10.1007/s10915-021-01493-0
