1. Introduction
Many problems in science and engineering are hard to compute due their numerical complexity. Moreover, in the analysis of complex systems under a real time constraint, the evaluation of all possible scenarios appears as a necessity [
1]. Despite the improvements in techniques used in high dimensional problems, some challenging questions remain unresolved due to the efficiency of our computers. However, a novel technique called Proper Generalized Decomposition (PGD) [
2,
3] has been developed to provide an answer of these difficult tasks. It was initially proposed to compute, in a separated representation framework, the variational solution of partial differential equations (PDE) defined over a tensor product space [
4]. It is possible to distinguish two different benefits. The first one is the possibility of managing high dimensional problems, and the second is the possibility to include the model’s parameters as extra-coordinates. This last fact gives a powerful strategy to deal with classical problems because the PGD framework facilitates an efficient design and a real-time decision-making [
5,
6].
This novel technique allows for computing the whole set of solutions of a parametrized problem. The strategy is to include in an equivalent non-parametrized problem all possible parameter values as extra-coordinates. The name of this particular PGD based approach is Progressive Variational Vademecum [
1], and it can be implemented offline. As a consequence, the PGD based approach opens the possibility of solving problems in industry with a different strategy not envisioned until now.
The mathematical analysis of the PGD was given by Falcó and Nouy, Ref. [
4] in a Hilbert space framework and in [
7] for a more general setting.
The Greedy Rank One Algorithm (GROA) [
8] used to solve high-dimensional linear systems (with a full rank matrix) is the procedure of choice in the engineering community to implement the PGD. It is an iterative method made up of two steps that we can cyclically repeat until convergence. The first one consists of computing the minimal residual of the linear system over the set of tensors with bounded rank-one. In the second step, we use this optimal rank-one solution to update the residual. In the following, we return to the first step. Ammar et al. [
8] propose an Alternating Least Squares (ALS) Algorithm for the practical implementation of the first step of the GROA. In the aforementioned paper, the authors justify the choice of the ALS showing that its convergence to a critical point of the optimization problem (not necessarily an optimal one) is assured under very weak conditions. In addition, the convergence has been studied by El Hamidi, Osman and Jazar [
9] in the framework of Sobolev tensor spaces.
In this work, we want to study the optimization problem of the first step the GROA. To this end, even though our problem is not convex, we will take into account the relationship between the underlying convex optimization problem and the behaviour of its associated gradient flow. It is well-known that the vector field constructed by using a convex functional (for example related with a convex minimization problem defined over a finite dimensional vector space), has a gradient flow that provides a dynamical system with a unique stationary point. Moreover, it can be shown that it is a sink and its stable manifold coincides with the whole domain of the convex functional. This fact motivates the classical paradigm about the convergence of gradient-based numerical optimization algorithms.
In this work, we want to adapt the above paradigm in the framework of the GROA [
8]. The main goal is to find a vector field in a low-dimensional vector space related to the gradient flow of a convex functional defined over the set of tensors of fixed rank one. The idea is to use this vector field to characterize the behaviour of the solutions of the non-convex optimization problem associated with the first step of GROA. To achieve this, we will prove that the set of critical points of the optimization problem over the set of tensors with fixed rank-one can be identified with the set of stationary points of that vector field. In order to construct it, we will proceed as follows. First, we will endow the set tensors of fixed rank-one with an explicit structure of smooth manifold. Second, by the help of this geometric structure, we will explicitly construct a vector field over a low dimensional vector space related with the first step of GROA. Finally, we will show that the set of stationary points of this vector field coincides with the set of critical points of the optimization problem associated with the PGD algorithm. In consequence, this vector field allows us to explain the dynamical behaviour around each of its stationary points. Moreover, we can get explicit information, in a neighbourhood of each of these stationary points, about the structure of its stable and unstable manifolds. In our opinion, a more precise knowledge of these invariant sets can help us develop better and more efficient PGD approaches.
The paper is organised as follows.
Section 2 provides some preliminary definitions and results used along this paper.
Section 3 shows a geometric approach to the PGD. In
Section 4, the characterization of the smooth manifold of the set of tensors of fixed rank-one is given. After that,
Section 5 shows the first order optimality conditions for the PGD which is the main result of this paper. Finally,
Section 6 provides some conclusions of the work.
2. Preliminary Definitions and Results
First of all, we introduce some notation used along this paper. We denote by
the set of
-matrices and by
the transpose of a given matrix
As usual, we use
to denote the Euclidean inner product in
and its corresponding 2-norm, by
Let
be the
-identity matrix and when the dimension is clear from the context, we simply denote it by
Given a sequence
we say that a vector
can be written as
if and only if
in the
-topology. Now, we recall the definition and some properties of the Kronecker product. The Kronecker product of
and
written
is the tensor algebraic operation defined as
To conclude, we list some of the well-known properties of the Kronecker product (see, for example, [
10] or [
11]).
- 1.
- 2.
- 3.
- 4.
- 5.
- 6.
If A and B are banded, then is banded.
- 7.
If A and B are symmetric, then is symmetric.
- 8.
If A and B are definite positive, then is definite positive.
The concept of separated representation was introduced by Beylkin and Mohlenkamp in [
12], and it is related to the problem of constructing the approximate solutions of some classes of problems in high-dimensional spaces by means of a separable function. In particular, for a given map
we say that it has a
separable representation if
Now, consider a mesh of
in the
-variable given by
-mesh points,
then we can write a discrete version of (
1) by
where
for
Observe that, for each
if
denotes the vector with components
for
then (
2) is equivalent to
We point out that (
3) is a useful expression to implement numerical algorithms using the
Matlab and
Octave function
kron.
Suppose that, for given a linear Partial Differential Equation, and after a discretization by means of Finite Elements, we need to solve the linear system:
where
A is a
-dimensional invertible matrix, for some
that is,
Then, from all said above, a low rank approximation
with sufficient approximation exists, for some
and where
for
and
Moreover, we would show that
that is,
Thus, in a first approach to solve it, we would like to determine vectors
for
that minimizes
or, in short
by using the notation introduced in [
13].
The Proper Generalised Decomposition (PGD in short) appears when we consider solving the linear Equation (
4) as an optimization problem as follows. For each fixed
and
, we define a map
hence
holds. The goal is to use (
5) to approximate the solution of (
4). To this end, for each
we define the set
introduced in [
13], in the following way. Given
, we say that
if
where
for
For
, we define inductively
that is,
Note that for all
Unfortunately, from Proposition 4.1 (a) of [
13], we have that the set
is not necessarily (or even usually) closed for each
However, from Proposition 4.2 of [
13], it follows that
is a closed set in any norm-topology. This fact implies (see Lemma 1 in [
8]), that given
, then, for every
, we have that the set
This allows for considering the following iterative scheme. Let
and, for each
, take
Note that, given
and
, we can construct for each
by using (
8) and (
9), a vector
Here, we assume that
for
that is,
Since
we define the
for
obtained by the Greedy Rank-One Update Algorithms (
8) and (
9) as
The following theorem (see Theorem 1 in [
8]) gives the convergence of the Greedy Rank-One Update Approximation for solving linear systems with full rank matrix.
Theorem 1. Let and Then, by using the iterative scheme (8) and (9), we obtain that the sequence is strictly decreasing and Moreover, the rate of convergence is given byfor wherefor From (
10), we obtain that, if
then
for all
Thus, the above theorem allows for us to construct a procedure, which we give in the pseudo-code form in Algorithm 1, under the assumption that we have a numerical method in order to find a
solving (
7) (see the step 5 in Algorithm 1) and that we introduce below.
Algorithm 1 Greedy Rank-One Update |
1: procedure GROU () 2: 3: 4: for do 5: 6: 7: 8: if or then goto 13 9: end if 10: end for 11: return and 12: break 13: return and 14:
end procedure |
3. A Geometric Approach to the PGD
In this section, to study the procedure given in the line 5 of the Algorithm 1, we introduce a smooth manifold. To this end, introduce first the set of tensors of fixed rank-one in the tensor space
defined as
where
Observe that the set
Then, our first result is the following theorem of the alternative.
Theorem 2. Let and Either or but not both.
Proof. Assume that
and that there exists
Since we can write
and
holds for all
we have
Now, consider the map
defined as
Then,
for all
holds. Observe that
and hence the map
f has a global minimum for
a contradiction. □
The main consequence of this result is the following. It says that the output of the procedure given in step 5 in Algorithm 1 always remains in the set before it gives us the final output.
Corollary 1. Let and such that Then, for some if and only if and hence for all
As a consequence of the above corollary, the situation of interest is when given and , we have that Thus, in order to study the vectors in we need to characterize the structure of the critical points of the map restricted to the set To see this in the next section, we provide to of a structure of smooth manifold.
4. The Set of Tensors of Fixed Rank-One as a Smooth Manifold
Along this paper, we will consider a manifold as a pair , where is a subset of some finite-dimensional vector space V and is an atlas representing the local coordinate system of We recall the definition of an atlas associated with a set
Definition 1. Let be a set. An atlas of class or analytic on is a family of charts with some indexing set namely having the following properties (see [14]): - AT1
is a covering of that is, for all and
- AT2
For each stands for a bijection of onto an open set of a finite dimensional normed space and for any α and β the set is open in
- AT3
Finally, if we let and the transition mapping is a diffeomorphism of class or analytic.
Observe that the condition of an open covering is not used, see [
14]. Moreover, in AT2, we do not require that the normed spaces to be the same for all indices
or even to be isomorphic. If
is linearly isomorphic to some finite dimensional normed space
X for all
we have the following definition.
Definition 2. Let be a set and X be a finite dimensional normed space. We say that is a (respectively, analytic) manifold modelled on X if there exists an atlas of class (respectively, analytic) over with linearly isomorphic to X for all
Since different atlases can give the same manifold, we say that two atlases are compatible if each chart of one atlas is compatible with the charts of the other atlas in the sense of AT3. One verifies that the relation of compatibility between atlases is an equivalence relation.
Definition 3. An equivalence class of atlases of class on also denoted by is said to define a structure of a -manifold on and hence we say that is a finite dimensional manifold. In a similar way, if an equivalence class of atlases is given by analytic maps, then we say that is an analytic finite dimensional manifold.
For each
, we construct a local chart as follows. Let be
the orthogonal complement of the linear space
for
Let us consider the set
which is an open and dense set of the finite-dimensional vector space
Observe that the vector space is linearly isomorphic to the vector space for all
Now, we introduce the set
in
for which we can construct a natural bijection:
Then, we can state the following result.
Theorem 3. The set is an atlas for and hence is a -manifold modelled on Proof. Clearly, AT1 holds. To prove AT2 and AT3, let us consider
be such that
Without loss of generality we may assume that
and
, where
for
Then, for each
, there exists a unique
and a unique
such that
Since
holds, there exists a unique
such that
Thus, multiplying (
13) on the left side by
and, by using that
and
we obtain
Hence,
defines a
-function from the open set
to
for each
Moreover,
holds for
Observe that
can be written as
where
has norm one for
In addition,
where
has norm one for
Thus,
clearly defines a
-function from the open set
to
Finally, we conclude that
the map
is given by
and it is
This follows AT2, AT3 and concludes the proof of the theorem. □
The construction of
as an algebraic variety is well-known (see, for example, [
15]). More recently, in [
16], a structure of smooth manifold is given, in the framework of Banach spaces, for the set of tensors of fixed rank-one. Following [
16], it can be shown that the manifold
is also a principal bundle as follows. Consider the Grassmann manifold of one-dimensional, subspaces of
denoted by
for
and define the surjective map
Then, for each
, it holds that
Consequently,
is also a principal bundle with base space
and fibre
It allows for decomposing the tangent space at
denoted
into the vertical and horizontal spaces:
where
and
5. On the First Order Optimality Conditions for the PGD
The goal of this section is given
and
characterize the points in the manifold
satisfying the first order optimality conditions of the problem
Recall that the map
is defined in the whole ambient space
We will denote its derivative at
by
which is a bounded linear map from
to
From Theorem 3, we known that
is a
-manifold and hence it allows us to write the constrained map
as follows. Since
and
we can take into account the standard inclusion map
in order to write
Definition 4. We say that is a critical point for in ifholds for all Clearly, if is an extremal point for in , then it is also a critical point for in Observe that we can write
where on the left side of the equality, we consider
over
, whereas, in the right one,
is considered defined over the whole space. Thus, by using the chain rule, we have
that is,
In order to compute (
15), we consider first the standard inclusion map
i that in local coordinates in a neighbourhood of
looks like
Here,
Hence, its derivative as a morphism (a map between manifolds)
is given by
where
for
From (
15), we have that
is a critical point for
in
if and only if
that is, it is equivalent to state that
Now, we will prove the following result that characterizes the set of critical points for in as the stationary points of a vector field in , and it is the main result of this paper.
Theorem 4. Given and Then, is a critical point for in if and only if , where is a vector field in given byHere,for Proof. Since
is a critical point for
in
if and only if (
17) holds for
Then, (
17) is equivalent to prove that
and
Since
when
for
we conclude that
is a critical point for
in
if and only if
Writing the equality in (
20) as
we conclude that it is equivalent to state that
for
This concludes the proof of theorem. □
Remark 1. From (17), we can deduce that is a critical point for in if and only ifStatement (21) was first introduced in [2,3] as a step to enrich the approximation basis in the PGD algorithm. From the proof of Theorem 4, we have that (21) is also equivalent to (18) and (19). There are several strategies in the literature that can be used to solve
. The first one, closely related to the one used in [
2,
3], is to find a fixed point of the map
One of the most popular numerical strategies to compute an approximated value of
such that
is founded under the following argument. Since the
is equal to
we can choose randomly some vectors
for
and then try to solve the linear system
By using Least Squares, we can compute a
and next we can proceed in a similar way iteratively with each of the other components
as follows. After the step
, we know that
Then, by choosing randomly
, we can solve, by using Least Squares, the linear system
to compute
We can proceed cyclically until
holds. This strategy is known by the name of the Alternating Least Squares (ALS).
From (
17), we have that
is a critical point of
in
if and only if
holds, that is, the residual at
is orthogonal to linear subspace
Observe that
where
is an
-dimensional subspace of
for
Then, the next result explicitly describes the linear subspace
Proposition 1. The linear map is injective. Moreover, Proof. Since
for
and
for all
then
Since
, then
for
This follows the first statement. To prove the second one, we remark that the inclusion
between both subspaces is trivial. Clearly,
holds and then the second statement is also proved. □