Abstract
We give an overview of various methods based on tensor structured techniques for the solution of linear systems in high spatial dimensions. In particular, we discuss the role of multi-grid variants.
1 Introduction
In the standard setting, linear systems
\(Ax=b \qquad \qquad (1.1)\)
are described by a regular matrix \(A\in \mathbb {R}^{n\times n}\) and vectors \(x,b\in \mathbb {R}^{n}\). Often we are interested in large-scale problems with n between about 100,000 and 100,000,000. In that case, vectors and matrices are represented differently. Vectors of this size can still be described explicitly in the full format, i.e., each entry is stored. Matrices of size \(n\times n\), however, must be represented by fewer than \(n^{2}\) units. In the fortunate case of sparse matrices, the sparse format is an easy remedy. For fully populated matrices one can use, e.g., the technique of hierarchical matrices, resulting in a storage requirement of \(\mathcal {O}(n\log ^{*}n)\) (cf. [14, 17]; see Footnote 1).
The situation changes if n becomes much larger. A concrete example is the Lyapunov matrix equation. Consider a parabolic problem \(y_{t}=Ly+Bu\) with the elliptic operator L discretised by a matrix \(A\in \mathbb {R}^{n\times n}\) as considered above. The right-hand side contains the control variable u. Linear-quadratic optimal control problems lead to a description of the control involving the solution \(X\in \mathbb {R}^{n\times n}\) of a quadratic matrix Riccati equation (cf. Grasedyck–Hackbusch–Khoromskij [11]). For simplicity we consider its linearisation, the linear Lyapunov equation
(cf. Benner–Breiten [3]). Even if A and B are sparse, the solution X is a fully populated matrix. Equation (1.2) is an unusual notation for a linear system with \(n^{2}\) equations involving a matrix \(\mathbf {A}\in \mathbb {R}^{n^{2}\times n^{2}}\):
Now the full representation of the vector \(\mathbf {x}\in \mathbb {R}^{n^{2}}\) is as problematic as the full representation of the fully populated matrix \(X\in \mathbb {R}^{n\times n}.\) This shows that for much larger sizes \(N:=n^{2}\) the treatment of a linear system (1.1) must be changed substantially.
Concerning the spatial dimension, we add the following remark. Let L be a differential operator of a 3D problem, while A is the discrete version of L. Then the matrix \(\mathbf {A}\) in (1.3) discretises a six-dimensional pde. The matrix \(\mathbf {A}\) can be defined by means of the tensor product (here also called Kronecker product):
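The precise form depends on the chosen vec convention and on the transposes in (1.2); as one common realisation (stated here only as an illustrative assumption), with the column-major vec and the Lyapunov operator \(X\mapsto AX+XA^{\mathsf {T}}\) one obtains \(\mathbf {A}=I\otimes A+A\otimes I\). A minimal numpy sketch checking this identity for a tiny n (all names and sizes are illustrative):

```python
import numpy as np

n = 4
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))          # illustrative n x n matrix
X = rng.standard_normal((n, n))

def vec(M):                              # column-major vec: stack the columns of M
    return M.flatten(order="F")

I = np.eye(n)
A_big = np.kron(I, A) + np.kron(A, I)    # Kronecker matrix of size n^2 x n^2

# vec(A X + X A^T) = (I⊗A + A⊗I) vec(X) for the column-major vec convention
print(np.allclose(vec(A @ X + X @ A.T), A_big @ vec(X)))   # True
```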
2 Tensors
The tensor space \(\mathbf {V}:=\bigotimes _{j=1}^{d}V_{j}\) exists for all vector spaces \(V_{j}\) (details in Hackbusch [15, 16]). In the following, we discuss the tensor space for particular vector spaces \(V_{j}\).
2.1 Grid functions (\(V_{j}=\mathbb {R}^{n_{j}}\))
Consider a regular grid in d spatial dimensions. More precisely, let \(\omega _{j}=\{x_{1}^{(j)},\ldots ,x_{n_{j}}^{(j)}\}\) be a regular grid consisting of \(n_{j}\) nodal points and define the product grid \(\omega =\omega _{1}\times \omega _{2}\times \cdots \times \omega _{d}.\) The nodal points \(\mathbf {x}\in \omega \) are indexed by a d-tuple \(\left( i_{1} ,\ldots ,i_{d}\right) \) with \(1\le i_{j}\le n_{j}\): \(\mathbf {x}[i_{1} ,\ldots ,i_{d}]= \big ( x_{i_{1}}^{(1)},\ldots ,x_{i_{d}}^{(d)} \big ).\) Grid functions u defined on \(\omega \) are characterised by the nodal values \(u[i_{1},\ldots ,i_{d}],\) i.e., u belongs to \(\mathbb {R}^{N}\) with \(N:=\#\omega =\prod _{j=1}^{d}n_{j}.\) The d-dimensional structure is described by the tensor product \(\mathbb {R}^{n_{1}}\otimes \mathbb {R}^{n_{2}} \otimes \cdots \otimes \mathbb {R}^{n_{d}}\) which is isomorphic to \(\mathbb {R} ^{N}\):
The tensor product of vectors \(v^{(j)}\in \mathbb {R}^{n_{j}}\) is defined entrywise by \(\big ( v^{(1)}\otimes v^{(2)}\otimes \cdots \otimes v^{(d)}\big ) [i_{1},\ldots ,i_{d}]=\prod _{j=1}^{d}v^{(j)}[i_{j}].\)
Usually, the dimension \(N:=n^{d}\) (for \(n_{j}=n\)) exceeds the available storage by far.
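A tiny numpy illustration of an elementary tensor, i.e. the tensor product of vectors just defined (my own sketch; the sizes are arbitrary): the full array has \(\prod _{j}n_{j}\) entries, although it is determined by only \(\sum _{j}n_{j}\) numbers.

```python
import numpy as np
from functools import reduce

# Elementary tensor: (v1 ⊗ v2 ⊗ v3)[i1, i2, i3] = v1[i1] * v2[i2] * v3[i3]
vs = [np.arange(1.0, n + 1) for n in (3, 4, 5)]
t = reduce(np.multiply.outer, vs)        # full array with 3*4*5 = 60 entries

# the full array is determined by 3 + 4 + 5 = 12 numbers only
print(t.shape, np.isclose(t[2, 1, 4], vs[0][2] * vs[1][1] * vs[2][4]))
```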
2.2 Functions (\(V_{j}=L^{2}(\Omega _{j})\))
Functions f defined on a Cartesian product \(\Omega =\Omega _{1} \times \Omega _{2}\times \cdots \times \Omega _{d}\) are also tensors. For instance, \(L^{2}(\Omega )\) can be considered as the tensor space
The tensor product of univariate functions is defined pointwise by \(\left( f_{1}\otimes f_{2}\otimes \cdots \otimes f_{d}\right) (x_{1},\ldots ,x_{d})=\prod _{j=1}^{d}f_{j}(x_{j}).\)
In contrast to the finite-dimensional case, the (topological) tensor space depends on the chosen norm.
2.3 Matrices (\(V_{j}=\mathbb {R}^{n_{j}\times m_{j}}\))
A tensor space can also be generated by matrices, since the set of matrices in \(\mathbb {R}^{n_{j}\times m_{j}}\) is a particular vector space. Let \(A_{j}:V_{j}\rightarrow W_{j}\) (\(1\le j\le d\)) be linear mappings and form the tensor spaces \(\mathbf {V}=V_{1}\otimes V_{2}\otimes \cdots \otimes V_{d}\) and \(\mathbf {W}=W_{1}\otimes W_{2}\otimes \cdots \otimes W_{d}\). The tensor product (Kronecker product)
can be interpreted as the linear map \(\mathbf {A}:\mathbf {V}\rightarrow \mathbf {W}\) defined by \(\mathbf {A}\big ( v^{(1)}\otimes v^{(2)}\otimes \cdots \otimes v^{(d)}\big ) =\big ( A_{1}v^{(1)}\big ) \otimes \big ( A_{2}v^{(2)}\big ) \otimes \cdots \otimes \big ( A_{d}v^{(d)}\big )\)
for all \(v^{(j)}\in V_{j}.\) Using matrices \(A_{j}\in \mathbb {R}^{n\times n}\) of moderate size, one can construct huge matrices \(\mathbf {A}\in \mathbb {R} ^{n^{d}\times n^{d}}\).
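A quick numpy check of this rule for d = 2 (sizes are illustrative): applying the Kronecker matrix to an elementary tensor equals applying each factor \(A_{j}\) to the corresponding vector \(v^{(j)}\).

```python
import numpy as np

rng = np.random.default_rng(1)
A1, A2 = rng.standard_normal((3, 3)), rng.standard_normal((4, 4))
v1, v2 = rng.standard_normal(3), rng.standard_normal(4)

lhs = np.kron(A1, A2) @ np.kron(v1, v2)        # (A1 ⊗ A2) applied to v1 ⊗ v2
rhs = np.kron(A1 @ v1, A2 @ v2)                # (A1 v1) ⊗ (A2 v2)
print(np.allclose(lhs, rhs))                   # True
```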
2.4 r-Term format (Canonical format)
By definition, any algebraic tensor \(\mathbf {v}\in \mathbf {V}=V_{1}\otimes V_{2}\otimes \cdots \otimes V_{d}\) has a representation
\(\mathbf {v}=\sum _{\nu =1}^{r}\bigotimes _{j=1}^{d}v_{\nu }^{(j)} \qquad \qquad (2.3)\)
with \(v_{\nu }^{(j)}\in V_{j}\) and suitable \(r\in \mathbb {N}\). Fixing some \(r\in \mathbb {N}\), we set \(\mathcal {R}_{r}:=\big \{ \sum _{\nu =1}^{r}\bigotimes _{j=1}^{d}v_{\nu }^{(j)}:v_{\nu }^{(j)}\in V_{j}\big \} .\)
The number r is called the representation rank (see Footnote 2) of \(\mathbf {v}\in \mathcal {R}_{r}.\) Obviously, the storage size for any \(\mathbf {v}\in \mathcal {R}_{r}\) is bounded by rdn, where \(n=\max _{j}\dim V_{j}\). If r is of moderate size, this format is advantageous. Often, a tensor \(\mathbf {v}\) requires a high rank, but there is some approximation \(\mathbf {v}_{\varepsilon }\in \mathcal {R}_{r}\) with moderate \(r=r(\varepsilon ).\) An example will follow in Sect. 3.1.
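A sketch of how such an r-term tensor can be stored and evaluated entrywise (the data layout and function name are my own choices): a tensor in \(\mathcal {R}_{r}\) needs \(r\cdot d\cdot n\) numbers, and a single entry costs \(\mathcal {O}(rd)\) operations, whereas the full tensor has \(n^{d}\) entries.

```python
import numpy as np

def cp_entry(terms, idx):
    """Entry v[i1,...,id] of v = sum_nu  v_nu^(1) ⊗ ... ⊗ v_nu^(d),
    stored as terms[nu][j] = v_nu^(j) (a list of r lists of d vectors)."""
    return sum(np.prod([vj[i] for vj, i in zip(term, idx)]) for term in terms)

d, n, r = 6, 10, 3
rng = np.random.default_rng(2)
terms = [[rng.standard_normal(n) for _ in range(d)] for _ in range(r)]

# one entry of a tensor with 10^6 entries, without ever forming the full array
print(cp_entry(terms, (0, 1, 2, 3, 4, 5)))
```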
Note that (2.3) holds for all vector spaces \(V_{j}.\) In the case of matrix spaces, the r-term representation of a matrix reads as
\(\mathbf {A}=\sum _{\nu =1}^{r}\bigotimes _{j=1}^{d}A_{\nu ,j} \qquad \qquad (2.4)\)
with matrices \(A_{\nu ,j}\in \mathbb {R}^{n_{j}\times m_{j}}\).
2.5 Numerical tensor calculus
The numerical treatment of tensors is called ‘numerical tensor calculus’. Besides the representation and storage of tensors, the efficient performance of the tensor operations is required. Since the tensor space \(\mathbf {V}\) is a vector space, the addition is fundamental:
Using operands of the form (2.3) or (2.4), the result has the same structure with increased representation rank. Note that this implementation of the addition requires no computational work.
Let \(\left\langle \cdot ,\cdot \right\rangle _{j}\) be the scalar product in \(V_{j}.\) Then the canonical scalar product \(\left\langle \cdot ,\cdot \right\rangle \) in \(\mathbf {V}=\bigotimes _{j=1}^{d}V_{j}\) is defined via \(\big \langle \bigotimes \nolimits _{j=1}^{d}v^{(j)},\bigotimes \nolimits _{j=1}^{d}w^{(j)}\big \rangle =\prod _{j=1}^{d}\big \langle v^{(j)},w^{(j)}\big \rangle _{j}\) (extended bilinearly to general tensors).
For instance, the Euclidean scalar products \(\left\langle \cdot ,\cdot \right\rangle _{j}\) lead to the Euclidean scalar product \(\left\langle \mathbf {v},\mathbf {w}\right\rangle =\sum _{\mathbf {i}}\mathbf {v[i]w[i]}\) in \(\mathbf {V}\). Using the tensor constructions for matrices and vectors, the matrix-vector multiplication is defined by
(cf. (2.2)). The formulae (2.5b, 2.5c) show that the tensor operations applied to elementary tensors can easily be computed by operations in the simpler vector spaces \(V_{j}\). The same statement holds for further operations such as the Hadamard product (entrywise product) and the convolution \(\mathbf {u}=\mathbf {v}\star \mathbf {w}\) with \(\mathbf {u} _{\mathbf {i}}=\sum _{\mathbf {0}\le \mathbf {k}\le \mathbf {i}}\mathbf {v} _{\mathbf {i}-\mathbf {k}}\mathbf {w}_{\mathbf {k}}\).
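As an illustration of these elementary-tensor rules, here is a sketch (the helper functions are my own) computing the canonical scalar product of two r-term tensors from one-dimensional scalar products only, checked against a full expansion for tiny sizes:

```python
import numpy as np
from functools import reduce

def full(terms):
    """Expand an r-term tensor into a full array (feasible only for tiny examples)."""
    return sum(reduce(np.multiply.outer, term) for term in terms)

def cp_inner(v_terms, w_terms):
    """<v, w> = sum_{nu, mu} prod_j <v_nu^(j), w_mu^(j)>: only 1D scalar products appear."""
    return sum(np.prod([np.dot(a, b) for a, b in zip(tv, tw)])
               for tv in v_terms for tw in w_terms)

rng = np.random.default_rng(3)
d, n = 3, 4
v = [[rng.standard_normal(n) for _ in range(d)] for _ in range(2)]   # 2-term tensor
w = [[rng.standard_normal(n) for _ in range(d)] for _ in range(3)]   # 3-term tensor
print(np.isclose(cp_inner(v, w), np.sum(full(v) * full(w))))         # True
```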
2.6 Truncation
The representation rank r (see Footnote 3) is the critical quantity in the storage size rdn. Unfortunately, all operations have the property that their result has a (much) larger representation rank than their operands. As an example we consider the matrix-vector multiplication, where the matrix \(\mathbf {A}\) has the representation rank \(r_{A}\), while the vector \(\mathbf {v}\) uses the representation rank \(r_{v}\):
Obviously, the product \(\mathbf {w}\) has the enlarged representation rank \(r:=r_{A}r_{v}\). Repeating the operations, the calculations become more and more costly. A remedy is the rank truncation: the result \(\mathbf {w}\) has to be approximated by \(\mathbf {w}^{\prime }\) with a much smaller rank \(r^{\prime }\). This truncation should be controlled, i.e., we should be able to estimate the truncation error.
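A sketch of the matrix-vector multiplication in the r-term format (same illustrative data layout as in the sketch above): every matrix term acts factorwise on every vector term, so the result has \(r_{A}\cdot r_{v}\) terms.

```python
import numpy as np

def cp_matvec(A_terms, v_terms):
    """w = A v in the r-term format: A_terms[nu][j] is a matrix, v_terms[mu][j] a vector.
    Every matrix term acts factorwise on every vector term -> r_A * r_v result terms."""
    return [[Aj @ vj for Aj, vj in zip(termA, termV)]
            for termA in A_terms for termV in v_terms]

d, n = 4, 5
rng = np.random.default_rng(4)
A_terms = [[rng.standard_normal((n, n)) for _ in range(d)] for _ in range(2)]   # r_A = 2
v_terms = [[rng.standard_normal(n) for _ in range(d)] for _ in range(3)]        # r_v = 3
print(len(cp_matvec(A_terms, v_terms)))   # 6 = r_A * r_v
```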
Unfortunately, truncation within the format \(\mathcal {R}_{r}\) is a difficult procedure. Moreover, instabilities may occur. The reason is that \(\mathcal {R}_{r}\) is not closed and any approximation of \(\mathbf {v} \in \overline{\mathcal {R}_{r}}\backslash \mathcal {R}_{r}\) by a sequence \(\mathcal {R}_{r}\ni \mathbf {v}_{n}\rightarrow \mathbf {v}\) leads to numerical cancellation problems (cf. [15, Sect. 9.4.3]). Because of these shortcomings one is looking for other tensor representations.
In spite of the negative statement from above, we have to emphasise that the format \(\mathcal {R}_{r}\) is perfect if, by some method, we are able to find a suitable r-term representation. Here ‘suitable’ means stable and with moderate representation rank r. In (3.1) we shall see such an example.
2.7 Other tensor representations
An alternative traditional representation is the subspace representation or Tucker format (cf. Tucker [30]). Instead of a scalar representation rank we have a d-tuple \(\mathbf {r}=\left( r_{1},\ldots ,r_{d}\right) \) of integers. In each vector space \(V_{j}\) we choose \(r_{j}\) vectors (usually, linearly independent ones): \(b_{1}^{(j)},\ldots ,b_{r_{j}}^{(j)}\in V_{j} \qquad (2.6)\)
with the intention that \(r_{j}\ll n_{j}:=\dim (V_{j}).\) Then we try to represent or approximate a tensor by \(\mathbf {v}=\sum _{i_{1}=1}^{r_{1}}\cdots \sum _{i_{d}=1}^{r_{d}}\mathbf {a}[i_{1},\ldots ,i_{d}]\;b_{i_{1}}^{(1)}\otimes b_{i_{2}}^{(2)}\otimes \cdots \otimes b_{i_{d}}^{(d)}.\)
This approach was originally introduced for the case of \(d=3\). For large d, this approach is rather costly since the involved core tensor \(\mathbf {a}\in \mathbb {R}^{r_{1}}\otimes \mathbb {R}^{r_{2}}\otimes \cdots \otimes \mathbb {R}^{r_{d}}\) requires the storage size \(\prod _{j=1}^{d}r_{j}.\) On the other hand, this format has very favourable properties. The basis (2.6) can be chosen orthonormal and, using the so-called matricisation, the core tensor \(\mathbf {a}\) can be isomorphically mapped into a matrix. This allows us to use matrix techniques like the singular value decomposition. The arising ‘higher order SVD’ (HOSVD, cf. [6], [15, Sect. 8.3]) is a perfect tool for the truncation as required in Sect. 2.6.
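For a small full tensor, the HOSVD-based truncation can be sketched as follows (a sequentially truncated variant written from scratch for illustration; the function names and the chosen ranks are not from the paper):

```python
import numpy as np

def hosvd_truncate(t, ranks):
    """Truncate a full d-way array to multilinear ranks `ranks` via SVDs of matricisations."""
    core, bases = t.copy(), []
    for j, r in enumerate(ranks):
        unfold = np.moveaxis(core, j, 0).reshape(core.shape[j], -1)   # mode-j matricisation
        U = np.linalg.svd(unfold, full_matrices=False)[0][:, :r]      # dominant left singular vectors
        bases.append(U)
        core = np.moveaxis(np.tensordot(U.T, np.moveaxis(core, j, 0), axes=1), 0, j)
    return core, bases

def expand(core, bases):
    """Apply the mode products core x_1 U_1 ... x_d U_d to recover the approximation."""
    t = core
    for j, U in enumerate(bases):
        t = np.moveaxis(np.tensordot(U, np.moveaxis(t, j, 0), axes=1), 0, j)
    return t

rng = np.random.default_rng(5)
t = rng.standard_normal((8, 9, 10))
core, bases = hosvd_truncate(t, (4, 4, 4))
# relative error (sizeable here, since a random tensor has no low-rank structure)
print(np.linalg.norm(t - expand(core, bases)) / np.linalg.norm(t))
```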
A more recent format is the hierarchical format (cf. Hackbusch–Kühn [19] and [15, Sect. 11]). It provides an easy truncation, while the storage size \(dr^{3}+dnr\) is linear in d (r is the maximum of a rank tuple). As long as r is bounded, a tensor size of \(n^{d}\) for \(n=d=1000\) is harmless, since only the product dn appears. For later purposes, we mention that the cost of the typical tensor operations amounts to
\(\mathcal {O}(dr^{4}+dnr^{2}) \qquad \qquad (2.8)\)
elementary floating-point operations (r defined as above).
3 Linear systems
3.1 Model problem: Laplace operator
We consider the d-dimensional Laplace operator in a cube,
with Dirichlet condition \(u=0\) on \(\partial \Omega ,\) and discretise by a finite difference scheme with the grid size \(h=1/\left( n-1\right) \) in a regular product grid. Let \(A_{j}\) be the (positive definite) tridiagonal matrix built by the (one-dimensional) negative second divided differences. Then the system matrix has the form \(\mathbf {A}=A_{1}\otimes I\otimes \cdots \otimes I+\cdots +I\otimes \cdots \otimes I\otimes A_{d},\)
i.e., it is a d-term representation of the form (2.4). A similar result can be obtained for finite elements using a regular grid, but then the identity I is replaced with the mass matrix. The following considerations hold also for other positive definite matrices \(A_{j}.\)
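As a small illustration (my own helper, only feasible in full Kronecker form for tiny n and d), the d-term Kronecker-sum structure can be assembled explicitly:

```python
import numpy as np
from functools import reduce

def kron_sum(A_list):
    """A = A_1⊗I⊗...⊗I + ... + I⊗...⊗I⊗A_d for the 1D matrices in A_list."""
    d = len(A_list)
    eyes = [np.eye(A.shape[0]) for A in A_list]
    return sum(reduce(np.kron, [A_list[j] if k == j else eyes[k] for k in range(d)])
               for j in range(d))

n, d = 5, 3
# 1D second-difference stencil [-1, 2, -1] (the scaling by 1/h^2 is omitted here)
A1 = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A = kron_sum([A1] * d)
print(A.shape)   # (125, 125): already n^d rows, even for tiny n and d
```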
One might fear that the huge system \(\mathbf {Ax}=\mathbf {b}\) has a huge condition number. This is not the case; instead, the condition is determined by the one-dimensional problems: the spectrum of \(\mathbf {A}\) consists of the sums \(\sum _{j=1}^{d}\lambda ^{(j)}\) of eigenvalues \(\lambda ^{(j)}\in \sigma (A_{j})\).
Obviously, the spectrum of \(\mathbf {A}\) is contained in \([a,\infty )\) for \(a:=\sum _{j=1}^{d}\lambda _{1}^{(j)}\), where \(\lambda _{1}^{(j)}\) is the smallest eigenvalue of \(A_{j}.\) Using a suitable scaling, we may assume that \(a=1\).
Lemma 3.1
There are (computable) coefficients \(\alpha _{\nu },\beta _{\nu }>0\) such that the r-term matrix \(\mathbf {B}_{r}:=\sum _{\nu =1}^{r}\alpha _{\nu }\bigotimes _{j=1}^{d}\exp (-\beta _{\nu }A_{j})\in \mathcal {R}_{r}\) approximates \(\mathbf {A}^{-1}\), with an error \(\left\| \mathbf {A}^{-1}-\mathbf {B}_{r}\right\| _{2}\) decaying exponentially in \(\sqrt{r}\).
The stronger bound \(\left\| \mathbf {A}^{-1}-\mathbf {B}_{r}\right\| _{2}\le C_{1}\exp (-C_{2}r)\) even holds if \(\sigma (\mathbf {A})\subset [1,R],\) \(R<\infty \) (\(C_{1},C_{2}\) depend on R).
The proof uses the approximation of \(\frac{1}{x}\) in \([a,\infty )\) by the exponential sum \(E_{r}(x)=\sum _{\nu =1}^{r}\alpha _{\nu }\exp (-\beta _{\nu }x).\) Choosing the optimal coefficients \(\alpha _{\nu },\beta _{\nu },\) one proves an error bound with respect to the maximum norm on \([a,\infty )\) that decays exponentially in \(\sqrt{r}\)
(cf. Braess–Hackbusch [4, 5]). For positive definite matrices \(\mathbf {A}\) with \(\sigma (\mathbf {A})\subset [1,\infty )\), \(E_{r}(\mathbf {A})\) approximates \(\mathbf {A}^{-1}\) with the same bound: \(\left\| E_{r}(\mathbf {A})-\mathbf {A}^{-1}\right\| _{2} \le \Vert \frac{1}{x}-E_{r}(x)\Vert _{\infty ,[1,\infty )}\). In the case of \(\mathbf {A}=A_{1}\otimes I\otimes \cdots \otimes I+\cdots +I\otimes \cdots \otimes I\otimes A_{d}\) one has to evaluate \(\exp (-\beta _{\nu }\mathbf {A})\) and obtains the r-term approximation \(\mathbf {B}_{r}:=E_{r}(\mathbf {A})=\sum _{\nu =1}^{r}\alpha _{\nu }\exp (-\beta _{\nu }A_{1})\otimes \exp (-\beta _{\nu }A_{2})\otimes \cdots \otimes \exp (-\beta _{\nu }A_{d})\in \mathcal {R}_{r},\) since the exponential of a Kronecker sum factorises into the Kronecker product of the one-dimensional matrix exponentials.
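A small numerical sketch of this construction for d = 2 (the sinc-quadrature coefficients below are a simple, non-optimal choice of \(\alpha _{\nu },\beta _{\nu }\) used only for illustration; the optimal coefficients of [4, 5] achieve comparable accuracy with far fewer terms):

```python
import numpy as np
from scipy.linalg import expm

# Simple (non-optimal) exponential sum for 1/x on [1, ∞):
# 1/x = ∫_0^∞ e^{-t x} dt ≈ Σ_k h e^{k h} exp(-e^{k h} x)   (trapezoidal rule after t = e^s)
h, K = 0.5, 30
betas = np.exp(h * np.arange(-K, K + 1))
alphas = h * betas

x = np.linspace(1.0, 100.0, 500)
E = sum(a * np.exp(-b * x) for a, b in zip(alphas, betas))
print(np.max(np.abs(1.0 / x - E)))                   # small uniform error on [1, 100]

# For A = A1 ⊗ I + I ⊗ A1 the exponentials factorise, giving an r-term approximation of A^{-1}
n = 6
A1 = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A1 = A1 / (2 * (1 - np.cos(np.pi / (n + 1))))        # scale so that the smallest eigenvalue is 1
A = np.kron(A1, np.eye(n)) + np.kron(np.eye(n), A1)  # spectrum contained in [2, ∞) ⊂ [1, ∞)
B = sum(a * np.kron(expm(-b * A1), expm(-b * A1)) for a, b in zip(alphas, betas))
print(np.linalg.norm(B - np.linalg.inv(A), 2))       # B is close to A^{-1} in the spectral norm
```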
3.2 Solution of linear systems
Consider a linear system \(\mathbf {Ax}=\mathbf {b},\)
where \(\mathbf {x},\mathbf {b}\in \mathbf {V}= {\textstyle \bigotimes \nolimits _{j=1}^{d}} V_{j}\) and \(\mathbf {A}\in {\textstyle \bigotimes \nolimits _{j=1}^{d}} \mathcal {L}(V_{j},V_{j})\subset \mathcal {L}(\mathbf {V},\mathbf {V})\) are represented in one of the formats. The general form of a linear iteration is \(\mathbf {x}^{m+1}:=\mathbf {x}^{m}-\mathbf {B}\left( \mathbf {A}\mathbf {x}^{m}-\mathbf {b}\right)\)
with any matrix \(\mathbf {B}\) (cf. [12, (3.2.4)]). If this algorithm is applied, e.g., with the starting value \(\mathbf {x}^{0}=0\), the representation ranks of \(\mathbf {x}^{m}\) blow up. Therefore, according to Sect. 2.6, a truncation T must be applied. This yields the ‘truncated iteration’ \(\mathbf {x}^{m+1}:=T\left( \mathbf {x}^{m}-\mathbf {B}\left( \mathbf {A}\mathbf {x}^{m}-\mathbf {b}\right) \right) .\)
The cost per step is \(nd\ \)times powers of the involved representation ranks.
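A sketch of such a truncated iteration for d = 2, where a tensor is a matrix and the truncation T is the best rank-r approximation by the SVD (the model problem, the trivial choice \(\mathbf {B}=\theta \mathbf {I}\), and all parameters are illustrative assumptions; as Remark 3.2 points out, a good preconditioner \(\mathbf {B}\) is what makes such an iteration practical):

```python
import numpy as np

def truncate(X, r):
    """Rank truncation T: best rank-r approximation via the SVD (d = 2: tensors are matrices)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

n, r = 50, 8
A1 = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)    # 1D model matrix
C = np.ones((n, n))                                      # rank-1 right-hand side
theta = 1.0 / (2 * np.linalg.eigvalsh(A1)[-1])           # damping: B = theta*I (Richardson)

X = np.zeros((n, n))
for _ in range(3000):
    # truncated Richardson step for A1 X + X A1 = C, i.e. x <- T(x - theta*(A x - b))
    X = truncate(X - theta * (A1 @ X + X @ A1 - C), r)

print(np.linalg.norm(A1 @ X + X @ A1 - C) / np.linalg.norm(C))   # residual; convergence is slow
print(np.linalg.matrix_rank(X))                                  # the rank stays bounded by r
```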
Remark 3.2
Let \(\mathbf {A}_{\Delta }\) be the discretisation of \(\Delta \) from above and \(\mathbf {B}_{r}\in \mathcal {R}_{r}\) the approximation of \(\mathbf {A}_{\Delta }^{-1}\) by (3.1). If the matrix \(\mathbf {A}\) corresponds to an elliptic pde of order 2, \(\mathbf {A}\) and \(\mathbf {A}_{\Delta }\) are spectrally equivalent (cf. [12, Lemma 8.4.1]). Therefore \(\mathbf {B:} =\mathbf {B}_{r}\in \mathcal {R}_{r}\) is an appropriate preconditioner.
The approach described above is considered, e.g., by Khoromskij [22].
3.3 cg-Like methods
The exact cg recursion yields a sequence of conjugate (i.e., A-orthogonal) vectors. It is well known that, because of floating-point errors, orthogonality is lost after several iterations. This effect is even worse for the conjugate gradient algorithm combined with tensor rank truncations, since the relative truncation errors are usually larger than the machine precision. There are two remedies:
1. The cg method is restricted to a few iterations so that the loss of orthogonality is still weak enough.
2. As soon as necessary, a re-orthogonalisation is performed.
The first approach requires fast convergence, i.e., a very good preconditioning. The second approach has a drawback which will be discussed next. First we mention the following articles about conjugate gradient methods in the context of tensor methods: Tobler [29], Kressner–Tobler [23–25], and Savas–Eldén [28].
Assume that \(\mathbf {u}_{k}\) (\(1\le k\le m\)) are tensors belonging to \(\mathcal {R}_{r}\) for a fixed r. The orthonormalisation changes \(\mathbf {u}_{2}\) into a linear combination \(\tilde{\mathbf {u}}_{2} =\alpha \mathbf {u}_{1}+\beta \mathbf {u}_{2}\). Obviously, the result \(\tilde{\mathbf {u}}_{2}\) belongs to \(\mathcal {R}_{2r}\). In general, the orthonormalisation process produces \(\tilde{\mathbf {u}}_{k}\in \mathcal {R} _{kr}\) with increasing representation rank kr. A reduction of the rank by a truncation is not possible without again losing orthonormality.
To avoid this increase of the rank, Ballani–Grasedyck [2] propose a variant of the GMRES method with tensor truncation and without orthonormalisation. Nevertheless, the iterates are built in such a way that the residuals are decreasing.
4 Multigrid approach
First we transfer the setting of the geometric multigrid method to the tensor case. In Sect. 4.7 we discuss the pros and cons of the method.
4.1 Grid hierarchy
Since the tensors directly correspond to grid functions associated to product grids (cf. Sect. 2.1), the traditional geometric multigrid structure is appropriate.
Let \(V_{j}^{0},V_{j}^{1},\ldots ,V_{j}^{\ell },\ldots \) be a one-dimensional grid hierarchy. In the case of a difference scheme, \(V_{j}^{\ell }=\mathbb {R} ^{n_{\ell }^{(j)}}\) corresponds to a grid with \(n_{\ell }^{(j)}\) points, where \(n_{0}^{(j)}<n_{1}^{(j)}<\cdots \;\). In the case of a finite element discretisation, \(V_{j}^{\ell }\) may be the space of piecewise linear functions defined on a grid of \(n_{\ell }^{(j)}\) points.
The d-dimensional grid hierarchy is given by the tensor spaces \(\mathbf {V}^{\ell }:=\bigotimes \nolimits _{j=1}^{d}V_{j}^{\ell }\) (\(\ell =0,1,2,\ldots \)).
In the case of the difference scheme, the elements of \(\mathbf {V}^{\ell }\) are grid functions on a d-dimensional grid with \(N:=\prod _{j=1}^{d}n_{\ell }^{(j)}\) nodal points (cf. Sect. 2.1), while, in the finite element case, \(\mathbf {V}^{\ell }\) consists of piecewise d-linear functions on the cubical grid of the dimension \(N:=\prod _{j=1}^{d}n_{\ell }^{(j)}\) (cf. Sect. 2.2). Note that the corresponding finite element basis functions are elementary tensors of the form (2.1).
4.2 Prolongation, restriction
Let \(p_{j}:V_{j}^{\ell }\rightarrow V_{j}^{\ell +1}\) be the one-dimensional linear prolongation (cf. [13, Sect. 3.4.2]). The d-linear prolongation \(\mathbf {p}:\mathbf {V}^{\ell }\rightarrow \mathbf {V}^{\ell +1}\) is the elementary (Kronecker) tensor product of the one-dimensional linear prolongations: \(\mathbf {p}=p_{1}\otimes p_{2}\otimes \cdots \otimes p_{d}.\)
Similarly, the restriction satisfies \(\mathbf {r}=r_{1}\otimes r_{2}\otimes \cdots \otimes r_{d}\) with one-dimensional restrictions \(r_{j}:V_{j}^{\ell +1}\rightarrow V_{j}^{\ell }.\)
Note that usually \(\mathbf {r}\) is the adjoint of \(\mathbf {p}\) (cf. [13, Sect. 3.5]).
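A sketch of this factorwise application (the particular 1D prolongation — linear interpolation on interior grid points with zero boundary values — and the data layout are assumptions made for illustration): applying \(\mathbf {p}\) to an r-term tensor again yields an r-term tensor.

```python
import numpy as np

def prolongate_1d(v):
    """Linear interpolation from n coarse interior points to 2n+1 fine interior points
    (zero boundary values assumed)."""
    n = len(v)
    fine = np.zeros(2 * n + 1)
    fine[1::2] = v                               # coarse nodes are copied
    fine[2:-1:2] = 0.5 * (v[:-1] + v[1:])        # interior midpoints: average of neighbours
    fine[0], fine[-1] = 0.5 * v[0], 0.5 * v[-1]  # midpoints next to the (zero) boundary
    return fine

def cp_prolongate(v_terms):
    """p = p_1 ⊗ ... ⊗ p_d applied factorwise: the number of terms is unchanged."""
    return [[prolongate_1d(vj) for vj in term] for term in v_terms]

coarse = [[np.ones(7) for _ in range(4)] for _ in range(3)]   # 3-term tensor, d = 4, n = 7
fine = cp_prolongate(coarse)
print(len(fine), len(fine[0]), len(fine[0][0]))               # 3 terms, d = 4, n = 15
```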
An important observation is the fact that the application of \(\mathbf {p}\) and \(\mathbf {r}\) does not increase the representation ranks. Let \(\delta \mathbf {x}^{\ell -1}\) be the correction obtained from the coarse grid. Then the multigrid iteration determines the new iterate by \(\mathbf {x}^{\ell } \leftarrow \mathbf {x}^{\ell }-\mathbf {p}\ \delta \mathbf {x}^{\ell -1}.\) To restrict the representation rank, we have to replace this assignment by the truncated version \(\mathbf {x}^{\ell }\leftarrow T\left( \mathbf {x}^{\ell }-\mathbf {p}\ \delta \mathbf {x}^{\ell -1}\right) .\)
4.3 Coarse-grid matrices
Let \(\mathbf {A}^{L}\) be the fine-grid matrix (L is the level number of the finest grid). The auxiliary matrices \(\mathbf {A}^{\ell }\) (\(\ell <L\)) can be defined by the Galerkin approach (cf. [13, Sect. 3.7]): \(\mathbf {A}^{\ell -1}:=\mathbf {r}\,\mathbf {A}^{\ell }\,\mathbf {p}.\)
Assume that the fine-grid matrix \(\mathbf {A}^{L}\) has the representation \(\mathbf {A}^{L}=\sum \nolimits _{\nu =1}^{r_{A}}\bigotimes \nolimits _{j=1}^{d}A_{\nu ,j}^{L}.\) The Galerkin approach leads to \(\mathbf {A}^{\ell -1}=\sum \nolimits _{\nu =1}^{r_{A}}\bigotimes \nolimits _{j=1}^{d}A_{\nu ,j}^{\ell -1}\) with the same representation rank \(r_{A}\), involving the factorwise Galerkin products \(A_{\nu ,j}^{\ell -1}:=r_{j}\,A_{\nu ,j}^{\ell }\,p_{j}.\)
The defect \(\mathbf {d}^{(\ell -1)}:=\mathbf {A}^{\ell -1}\mathbf {x}^{\ell -1}-\mathbf {b}^{\ell -1}\) computed in the coarse grid has an increased rank because of the matrix-vector multiplication (see the example in Sect. 2.6). Hence, this step has to be replaced by the truncated version \(\mathbf {d}^{(\ell -1)}:=T\left( \mathbf {A}^{\ell -1}\mathbf {x}^{\ell -1}-\mathbf {b}^{\ell -1}\right) .\)
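A sketch of the factorwise Galerkin products (data layout as in the earlier sketches; the prolongation matrices below are random placeholders and the restrictions are taken as their adjoints): the number of terms, i.e. the rank \(r_{A}\), is preserved.

```python
import numpy as np

def galerkin_coarse(A_terms, p_mats, r_mats):
    """Coarse-grid matrix r A p computed factorwise:
    A^{l-1}_{nu,j} = r_j A^l_{nu,j} p_j, so the number of terms (rank r_A) is unchanged."""
    return [[Rj @ Aj @ Pj for Rj, Aj, Pj in zip(r_mats, term, p_mats)]
            for term in A_terms]

rng = np.random.default_rng(6)
d, n_fine, n_coarse, rA = 3, 9, 4, 2
A_terms = [[rng.standard_normal((n_fine, n_fine)) for _ in range(d)] for _ in range(rA)]
p_mats = [rng.standard_normal((n_fine, n_coarse)) for _ in range(d)]   # placeholder prolongations
r_mats = [P.T for P in p_mats]                                         # restriction = adjoint
A_coarse = galerkin_coarse(A_terms, p_mats, r_mats)
print(len(A_coarse), A_coarse[0][0].shape)                             # 2 terms, (4, 4) factors
```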
4.4 Smoothing
4.4.1 Richardson iteration
The exact Richardson iteration is \(\mathbf {x}^{m+1}:=\mathbf {x}^{m}-\omega \left( \mathbf {A}^{\ell }\mathbf {x}^{m}-\mathbf {b}^{\ell }\right) \) with a damping factor \(\omega >0\).
As discussed in Sect. 3.2, this approach is not advisable in the tensor case because of the increasing representation ranks. Instead, the truncated Richardson iteration \(\mathbf {x}^{m+1}:=T\left( \mathbf {x}^{m}-\omega \left( \mathbf {A}^{\ell }\mathbf {x}^{m}-\mathbf {b}^{\ell }\right) \right) \)
is the method of choice.
It can be regarded as the semilinear variant of the Richardson iteration in the sense of [13, Sect. 3.3.5].
4.4.2 Jacobi iteration
The damped Jacobi iteration is an often applied smoothing iteration. However, in the tensor case, already performing this iteration is too complicated, as pointed out below.
The diagonal of the (Kronecker) matrix \(\mathbf {A}^{\ell }\!=\!\sum \nolimits _{\nu =1}^{r_{A}}\bigotimes \nolimits _{j=1}^{d}A_{\nu ,j}^{\ell }\) is given by \(\mathbf {D}^{\ell }=\sum \nolimits _{\nu =1}^{r_{A}}\bigotimes \nolimits _{j=1}^{d}{\text {diag}}\big ( A_{\nu ,j}^{\ell }\big ) .\)
In the tensor case, diagonal matrices may be called sparse because of the many zero entries, but in fact they are not sparse since the number of nonzero elements is still huge. The number of nonzero entries of \(\mathbf {D}^{\ell } \in \mathbb {R}^{N\times N}\) is \(N=\prod _{j=1}^{d}n_{\ell }^{(j)}\), which is far too large for practical use. Instead one may try to use an approximation
as in (3.1). However, Kronecker exponentials of the form \(\exp \! \big ( \bigotimes _{j=1}^{d}D_{j} \big ) \) are not easy to evaluate or to apply to a tensor. The only exception arises when at most one \(D_{j}\) is different from the identity I.
4.5 Treatment of the coarsest grid
We recall that \(\dim (\mathbf {V}^{\ell })=\prod _{j=1}^{d}n_{j}^{(\ell )}\) with \(n_{j}^{(\ell )}:=\dim (V_{j}^{\ell })\). Since the dimension of the coarsest grid is \(\prod _{j=1}^{d}n_{j}^{(0)}\), a direct solution of the system at the lowest level requires
- either \(n_{j}^{(0)}=1,\)
- or \(n_{j}^{(0)}\) and d so small that also \(\prod _{j=1}^{d}n_{j}^{(0)}\) is sufficiently small.
The choice \(n_{j}^{(0)}=1\) is possible for positive definite problems (cf. [13, Sect. 7.2]). Only for \(n_{j}^{(0)}=1\) does the coarse-grid problem have the size \(\prod _{j=1}^{d}n_{j}^{(0)}=1\), which does not depend on d. Otherwise, some other method from Sects. 3.2–3.3 must be applied.
4.6 Convergence
Choosing the multigrid parameters as above, we can apply the standard convergence results (e.g., [13, Theorems 7.2.2 and 7.2.3]), provided that no tensor truncation is applied. The additional truncation effect is similar to the truncated iteration analysed by Hackbusch–Khoromskij–Tyrtyshnikov [18] (see also [14, Sect. 14.3.2] or [17, Sect. 15.3.2]).
Numerical examples for the use of a tensor multigrid iteration can be seen in Ballani–Grasedyck [2, Example 7.5]. These examples with \(n_{\mathsf {fine}}:=n^{(L)}=1023\) and dimensions up to \(d=32\) demonstrate that the convergence behaviour does not depend on d.
In (1.2) we mentioned the Lyapunov equation. A multigrid solution of the more general Sylvester matrix equation \(AX-XB=C\) is described in Grasedyck–Hackbusch [10]. A nonlinear multigrid approach to the quadratic Riccati equation is given in [9].
4.7 Cost
A disadvantage is the following effect. In traditional 3D multigrid applications, computations at level \(\ell -1\) with the step size \(h_{\ell -1}=2h_{\ell }\) cost only \(\frac{1}{8}\) of the computations at level \(\ell \). This explains why the cost of one V- or W-cycle is dominated by the operations spent on the finest grid.
In tensor applications, the cost of the operations is described in (2.8): \(\mathcal {O}(dr^{4}+dnr^{2})\). In the multigrid case, n has to be replaced with \(n^{(\ell )}:=\max \{n_{j}^{(\ell )}:1\le j\le d\}.\) For the finest grid, we use the notation \(n_{\mathsf {fine}}:=n^{(L)}.\) Assuming \(n_{j}^{(\ell )}\approx 2n_{j}^{(\ell -1)},\) we obtain the following work per V- or W-cycle:
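One way to carry out this count (my own bookkeeping under the stated assumptions, not the paper's exact formula): level \(\ell \) is visited \(C_{W}^{L-\ell }\) times per cycle (\(C_{W}=1\) for the V-cycle, \(C_{W}=2\) for the W-cycle), and each visit costs \(\mathcal {O}(dr^{4}+dn^{(\ell )}r^{2})\) by (2.8), so that

```latex
% cost per cycle, summed over the levels (C_W = 1: V-cycle, C_W = 2: W-cycle),
% using n^(l) ≈ 2^(l-L) n_fine and 2^L ≈ n_fine:
\sum_{\ell=0}^{L} C_W^{\,L-\ell}\,\mathcal{O}\!\big(dr^{4}+d\,n^{(\ell)}r^{2}\big)
  = \begin{cases}
      \mathcal{O}\big(L\,dr^{4}+d\,n_{\mathsf{fine}}\,r^{2}\big), & \text{V-cycle},\\
      \mathcal{O}\big(d\,n_{\mathsf{fine}}\,r^{4}+L\,d\,n_{\mathsf{fine}}\,r^{2}\big), & \text{W-cycle}.
    \end{cases}
```

In both cases the dependence on \(n_{\mathsf {fine}}\) stays essentially linear, up to the logarithmic factor \(L\approx \log _{2}n_{\mathsf {fine}}\).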
This result can be interpreted in different ways. On the negative side, the portion of the work spent on the auxiliary grids \(\ell <L\) is larger than in the traditional multigrid case, but on the positive side, even for the W-cycle, the cost with respect to \(n_{\mathsf {fine}}\) is less than \(\mathcal {O}(n_{\mathsf {fine}}^{1+\varepsilon })\) for every \(\varepsilon >0\). Nevertheless, the factor \(r^{4}\) may become more problematic than \(n_{\mathsf {fine}}\).
A possible reduction of the cost may be obtained by choosing different representation ranks r for the finest grid and for the auxiliary problems at the levels \(\ell <L\). For \(\ell =L\) a possibly larger rank \(r_{\mathsf {fine}}\) is needed to approximate the solution of the discretised pde. The size of \(r_{\mathsf {fine}}\) depends on the nature of the solution. However, for \(\ell <L\) only corrections are to be represented and the rank required by the corrections is not related to \(r_{\mathsf {fine}}.\)
4.8 Parabolic problems
Since the number of coordinate directions is no limitation, simultaneous space-time discretisations involving the additional time variable are not disadvantageous. In this respect, the results of Andreev–Tobler [1] are of interest. In that paper, a BPX preconditioner is used.
5 Variational approach
Finally we mention a quite different approach to solving \(\mathbf {Ax}=\mathbf {b}\) approximately. If \(\mathbf {A}\) is positive definite, we may minimise the quadratic cost function \(\Phi (\mathbf {x}):=\tfrac{1}{2}\left\langle \mathbf {Ax},\mathbf {x}\right\rangle -\left\langle \mathbf {b},\mathbf {x}\right\rangle .\)
In the general case of a regular \(\mathbf {A}\), define \(\Phi (\mathbf {x}):=\left\| \mathbf {B}\left( \mathbf {Ax}-\mathbf {b}\right) \right\| ^{2}\)
with a suitable preconditioner \(\mathbf {B}\) and try to minimise \(\Phi (\mathbf {x})\). The minimisation over all tensors \(\mathbf {x}\) is not feasible because of the huge dimension. Instead, one fixes a certain format for the representation of \(\mathbf {x}\) and minimises over all representation parameters of \(\mathbf {x}\). In practice, this works quite well, although the theoretical understanding of the convergence properties is still incomplete. Another difficulty is that the minimisation problem is nonconvex, so \(\Phi \) may have many local minima.
Concerning the variational approach we refer to the following papers: Espig–Hackbusch–Rohwedder–Schneider [7], Falcó–Nouy [8], Holtz–Rohwedder–Schneider [21], Mohlenkamp [26], Oseledets [27] and others cited in these papers.
Notes
The asterisk in \(\log ^{*}n\) replaces a variable exponent. This factor also contains the bound of the local rank of the hierarchical method, which usually depends indirectly on n.
The representation rank has to be distinguished from \({\text {rank}} (\mathbf {v})\), the rank of the tensor \(\mathbf {v}\), which is the minimal r such that \(\mathbf {v}\in \mathcal {R}_{r}.\) Since the determination of \({\text {rank}}(\mathbf {v})\) is in general NP-hard (cf. Håstad [20]), the representation rank used in practice will in general be larger than \({\text {rank}}(\mathbf {v})\).
The theoretical upper bound \(r\le n^{d-1}\) is useless, since then \(rdn=O(n^{d})\) is no better than the full representation by all entries.
References
Andreev, R., Tobler, C.: Multilevel preconditioning and low-rank tensor iteration for space-time simultaneous discretizations of parabolic PDEs. Numer. Linear Algebra Appl. 22, 317–337 (2015)
Ballani, J., Grasedyck, L.: A projection method to solve linear systems in tensor format. Numer. Linear Algebra Appl. 20, 27–43 (2013)
Benner, P., Breiten, T.: Low rank methods for a class of generalized Lyapunov equations and related issues. Numer. Math. 124, 441–470 (2013)
Braess, D., Hackbusch, W.: Approximation of \(1/x\) by exponential sums in \([1,\infty )\). IMA J. Numer. Anal. 25, 685–697 (2005)
Braess, D., Hackbusch, W.: On the efficient computation of high-dimensional integrals and the approximation by exponential sums. In: DeVore, R.A., Kunoth, A. (eds.) Multiscale, Nonlinear and Adaptive Approximation, pp. 39–74. Springer, Berlin (2009)
De Lathauwer, L., De Moor, B., Vandewalle, J.: A multilinear singular value decomposition. SIAM J. Matrix Anal. Appl. 21, 1253–1278 (2000)
Espig, M., Hackbusch, W., Rohwedder, T., Schneider, R.: Variational calculus with sums of elementary tensors of fixed rank. Numer. Math. 122, 469–488 (2012)
Falcó, A., Nouy, A.: Proper generalized decomposition for nonlinear convex problems in tensor Banach spaces. Numer. Math. 121, 503–530 (2012)
Grasedyck, L.: Nonlinear multigrid for the solution of large-scale Riccati equations in low-rank and \({\cal {H}}\)-matrix format. Numer. Linear Algebra Appl. 15, 779–807 (2008)
Grasedyck, L., Hackbusch, W.: A multigrid method to solve large scale Sylvester equations. SIAM J. Matrix Anal. Appl. 29, 870–894 (2007)
Grasedyck, L., Hackbusch, W., Khoromskij, B.: Solution of large scale algebraic matrix Riccati equations by use of hierarchical matrices. Computing 70, 121–165 (2003)
Hackbusch, W.: Iterative Solution of Large Sparse Systems of Equations. Springer, New York (1994)
Hackbusch, W.: Multi-grid Methods and Applications, SCM, vol. 4. Springer, Berlin (2003)
Hackbusch, W.: Hierarchische Matrizen: Algorithmen und Analysis. Springer, Berlin (2009)
Hackbusch, W.: Tensor Spaces and Numerical Tensor Calculus, SCM, vol. 42. Springer, Berlin (2012)
Hackbusch, W.: Numerical tensor calculus. Acta Numer. 23, 651–742 (2014)
Hackbusch, W.: Hierarchical Matrices: Algorithms and Analysis. Springer, Berlin (2015)
Hackbusch, W., Khoromskij, B., Tyrtyshnikov, E.E.: Approximate iterations for structured matrices. Numer. Math. 109, 365–383 (2008)
Hackbusch, W., Kühn, S.: A new scheme for the tensor representation. J. Fourier Anal. Appl. 15, 706–722 (2009)
Håstad, J.: Tensor rank is NP-complete. J. Algorithms 11, 644–654 (1990)
Holtz, S., Rohwedder, T., Schneider, R.: The alternating linear scheme for tensor optimization in the tensor train format. SIAM J. Sci. Comput. 34, A683–A713 (2012)
Khoromskij, B.: Tensor-structured preconditioners and approximate inverse of elliptic operators in \(\mathbb{R}^d\). Constr. Approx. 30, 599–620 (2009)
Kressner, D., Tobler, C.: Krylov subspace methods for linear systems with tensor product structure. SIAM J. Matrix Anal. Appl. 31, 1688–1714 (2010)
Kressner, D., Tobler, C.: Low-rank tensor Krylov subspace methods for parametrized linear systems. SIAM J. Matrix Anal. Appl. 32, 1288–1316 (2011)
Kressner, D., Tobler, C.: Preconditioned low-rank methods for high-dimensional elliptic PDE eigenvalue problems. Comput. Methods Appl. Math. 11, 363–381 (2011)
Mohlenkamp, M.J.: Musing on multilinear fitting. Linear Algebra Appl. 438, 834–852 (2013)
Oseledets, I.V.: DMRG approach to fast linear algebra in the TT-format. Comput. Methods Appl. Math. 11, 382–393 (2011)
Savas, B., Eldén, L.: Krylov-type methods for tensor computations I. Linear Algebra Appl. 438, 891–918 (2013)
Tobler, C.: Low-rank tensor methods for linear systems and eigenvalue problems. Doctoral thesis, ETH Zürich (2012)
Tucker, L.R.: Some mathematical notes on three-mode factor analysis. Psychometrika 31, 279–311 (1966)
Communicated by Artem Napov, Yvan Notay, and Stefan Vandewalle.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Cite this article
Hackbusch, W. Solution of linear systems in high spatial dimensions. Comput. Visual Sci. 17, 111–118 (2015). https://doi.org/10.1007/s00791-015-0252-0