Abstract
Dynamic mode decomposition (DMD) has attracted much attention in recent years as an analysis method for time series data. In this paper, we perform an asymptotic analysis of the DMD and prove strong consistency in the statistical sense. More specifically, we first give a statistical model of random noise for data contaminated with observation noise. Among the many variants of the DMD, the total least squares DMD (TLS-DMD) is known to be robust against such random observation noise. We focus on the consistency analysis of the TLS-DMD in the statistical sense and aim to generalize the analysis to projected methods. This paper gives a general framework for designing projection methods based on efficient dimensionality reduction, analogous to the proper orthogonal decomposition, under the statistical model and proves the strong consistency of the resulting estimation.
1 Introduction
Dynamic mode decomposition (DMD) [16] is an analysis method originally proposed in the field of fluid analysis, and in recent years it has attracted much attention as an analysis method for time series data alongside the rise of data science. The effectiveness of dynamic mode decomposition has been verified on various time series data by many researchers, and the algorithmic development in recent years has been remarkable [11, 12].
Time series data are often contaminated with random noise in the fields of data science. Due to its practical importance, the development of numerical methods that are robust against noise has been promoted in recent years. The first such attempt appeared in 2016 [7], where the noise-corrected DMD, the forward-backward DMD, and the total least squares DMD were proposed. The total least squares DMD was also proposed in [10] at about the same time, and detailed results are given specifically for this method. In 2018 [3], the optimized DMD [6], proposed around 2012, was revisited from the viewpoint of robustness to noise, and its good performance was reported. Furthermore, in 2019, the consistent DMD [4] was proposed with an emphasis on noise resistance. Apart from such studies, [17] gives a statistical model that expresses the noise, presents a framework for removing the noise by estimating its probability distribution from the data, and proposes the Bayesian DMD among other methods. As described above, although the development of algorithms aimed at removing noise has been promoted recently, their effects have mainly been verified by numerical experiments. Asymptotic analysis in the statistical sense remains scarce, with its few existing results originating from the statistical analysis by Gleser [8, Lemma 3.3].
With such a background, we focus on the statistical model of the DMD for noisy datasets with the aid of numerical analysis for matrix computation, i.e., numerical linear algebra. From the viewpoint of numerical linear algebra, the algorithmic aspect of the DMD is explained as follows. The DMD is a dimensionality reduction algorithm using matrix datasets \(X,Y\in {\mathbb {R}}^{m\times n}\) regarded as the snapshots of the dynamical system. The aim is to find \(r(\le \min (m,n))\) eigenvalues and the corresponding eigenvectors of an unknown matrix A with \(\mathrm{rank}(A)\approx r\) such that \(AX\approx Y\). Usually, the matrix datasets \(X,Y\in {\mathbb {R}}^{m\times n}\) are contaminated with noise. Under a statistical model of the noise, the total least squares (TLS) method is successfully applied to the DMD [7, 10]. In [1], a statistical model of the noise suitable for convergence theory is presented, under which the strong convergence of the DMD based on the TLS can be proved in a straightforward manner.
The aim of this paper is to develop the convergence theory in a more general situation, where an efficient dimensionality reduction technique is applied to the DMD. The above technique is interpreted as the proper orthogonal decomposition (POD) for dimensionality reduction. Importantly, the strong convergence can be straightforwardly proved for the TLS-DMD combined with the POD under reasonable assumptions.
This paper is organized as follows. Section 2 is devoted to a description of the previous study. In Sect. 3, we construct a general framework of the projected TLS-DMD with the POD basis. In Sect. 4, we prove strong convergence of the proposed method in a statistical sense. Section 5 describes the conclusion of this paper.
2 Previous study
We consider a dynamical system, producing a sequence of vectors \({\varvec{z}}_{k}\in {\mathbb {R}}^{m}\ (k=0,1,\ldots )\) such that
$$\begin{aligned} {\varvec{z}}_{k+1}\approx A{\varvec{z}}_{k}\quad (k=0,1,\ldots ) \end{aligned}$$ (1)
for an \(m \times m\) matrix A, where the meaning of \(\approx \) is defined in the following sections with mathematical rigor. In addition, we assume that \({\varvec{z}}_{k}\in {\mathbb {R}}^{m}\ (k=0,1,\ldots )\) can be partially observed as follows. For positive integers \(k_{1},k_{2},\ldots ,k_{n}\), data pairs \(({\varvec{z}}_{k_{\ell }},{\varvec{z}}_{k_{\ell }+1})\) are given for \(\ell =1,2,\ldots , n\). Define \(X,Y \in {\mathbb {R}}^{m\times n}\) such that
$$\begin{aligned} X:=[{\varvec{z}}_{k_{1}},{\varvec{z}}_{k_{2}},\ldots ,{\varvec{z}}_{k_{n}}],\quad Y:=[{\varvec{z}}_{k_{1}+1},{\varvec{z}}_{k_{2}+1},\ldots ,{\varvec{z}}_{k_{n}+1}]. \end{aligned}$$ (2)
Then we have \(AX\approx Y\) from (1).
Our problem setting is that datasets presented as \(m\times n\) matrices X, Y are given. We aim to find the r eigenvalues with large absolute values and the corresponding eigenvectors of unknown matrix A, where r is considerably smaller than \(\min (m,n)\). The aim is due to the assumption that A can be approximated by a rank r matrix. Basically, the DMD is used to compute the above eigenpairs without explicitly constructing the large scale matrix A. For this purpose, the singular value decomposition (SVD) [9, Sect. 6.3] is applied. More precisely, the truncated SVD is computed to approximate X in the algorithm of the DMD. The algorithm is written as follows.
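Since the algorithm box is not reproduced here, the following MATLAB fragment is a minimal sketch of the standard SVD-based DMD in the spirit of Algorithm 1; the function name dmd_basic and the choice of returning the projected modes \(U_{r}W\) are our own, not notation from the paper.

```matlab
% Minimal sketch of the standard SVD-based DMD (in the spirit of Algorithm 1).
% X, Y are m x n snapshot matrices, r is the truncation rank.
function [lambda, W] = dmd_basic(X, Y, r)
    [U, S, V] = svd(X, 'econ');          % truncated SVD of X
    Ur = U(:, 1:r); Sr = S(1:r, 1:r); Vr = V(:, 1:r);
    Atilde = Ur' * Y * Vr / Sr;          % r x r projected operator
    [Wt, D] = eig(Atilde);               % eigenpairs of the projected operator
    lambda = diag(D);                    % approximate eigenvalues of A
    W = Ur * Wt;                         % corresponding (projected) DMD modes
end
```

The \(r\times r\) matrix Atilde is the Rayleigh–Ritz projection of \(YX^{\dagger }\) onto the leading left singular subspace of X, which is the interpretation used repeatedly below.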
Remark 1
Throughout the paper, the first r singular values and vectors mean the r largest singular values and the corresponding singular vectors. In addition, we assume that \(\mathrm{rank}(X)\) and \(\mathrm{rank}(Y)\) are not smaller than r. This feature is consistent with the problem setting below.
The above matrix expression of the DMD shows relations to other decomposition techniques based on the least squares methods. See [16, Sect. 2.2] for the details.
2.1 Noise model of the DMD
Usually, the observed snapshot datasets are contaminated with noise. In addition, with (1) and (2) in mind, we consider the following simple statistical model. Let
$$\begin{aligned} A{\bar{X}}={\bar{Y}}, \end{aligned}$$
where \({\bar{X}}\) and \({\bar{Y}}\) are not contaminated with noise. In addition, let
$$\begin{aligned} X={\bar{X}}+E,\quad Y={\bar{Y}}+F, \end{aligned}$$
where all the elements of E, F are random variables. Hence, we have \(AX\approx Y\) for the above exact A associated with the matrix datasets without noise. The above noise model can be applied to time series data related to the Koopman spectral analysis for dynamical systems; see [2, 5, 14, 17, 18] for the details.
Noting a particular feature of the Koopman spectral analysis, we can discover the following normalization for our problem setting. Letting \({\mathcal {R}}\) denote the range of the argument matrix, we assume \({\mathcal {R}}({\bar{X}})={\mathcal {R}}({\bar{Y}})\) in the following discussion. The theoretical justification of this assumption is explained in [1].
Simply, if the dynamical system forming the time series is \({\varvec{z}}_{k+1}=A{\varvec{z}}_{k}\ (k=0,1,\ldots )\), we can see that \({\mathcal {R}}(X)={\mathcal {R}}(Y)(\subset {\mathcal {R}}(A))\) by the definitions as in (2). However, in practical situations, \({\varvec{z}}_{k+1}=A{\varvec{z}}_{k}\) for each step k holds only approximately as in (1), and thus X and Y are defined based on \({\bar{X}}\) and \({\bar{Y}}\) above, taking stochastic perturbations into account.
With such a background, we consider the next noise model, slightly different from [1].
Problem 1
Find r nonzero eigenvalues \(\lambda _{i}\ (i=1,\ldots ,r)\) and the corresponding eigenvectors \({\varvec{w}}_{i}\ (i=1,\ldots ,r)\) of \(A \in {\mathbb {R}}^{m\times m}\) with rank \(r(\le m)\) under the following assumptions:
- Assume that unknown matrices \({\bar{X}},{\bar{Y}}\in {\mathbb {R}}^{m\times n}\) and \(A \in {\mathbb {R}}^{m\times m}\) satisfy
$$\begin{aligned} A{\bar{X}}={\bar{Y}},\ {\bar{Z}}=[{\bar{X}}^{\top },{\bar{Y}}^{\top }]^{\top },\ \mathrm{rank}({\bar{Z}})=r,\ {\mathcal {R}}({\bar{X}})={\mathcal {R}}({\bar{Y}}). \end{aligned}$$
- Let \(\lambda _{i}\ (i=1,\ldots ,r)\) denote nonzero eigenvalues of A. Assume that the eigenvalues \(\lambda _{i}\ (i=1,\ldots ,r)\) are all distinct, i.e., \(\lambda _{i}\not =\lambda _{j}\ (1\le i\not = j \le r)\). Moreover, let \({\varvec{w}}_{i}\ (i=1,\ldots ,r)\) denote the eigenvectors corresponding to \(\lambda _{i}\ (i=1,\ldots ,r)\). In other words,
$$\begin{aligned} A{\varvec{w}}_{i}=\lambda _{i}{\varvec{w}}_{i}\ (i=1,\ldots ,r),\quad \lambda _{i}\not =\lambda _{j}\ (1\le i\not = j \le r). \end{aligned}$$
- Observed matrices are \(X,Y\in {\mathbb {R}}^{m\times n}\) such that
$$\begin{aligned} X={\bar{X}}+E,\quad Y={\bar{Y}}+F, \end{aligned}$$
where \(E,F \in {\mathbb {R}}^{m\times n}\) are random matrices.
In the above problem setting, we assume that the nonzero eigenvalues of A are all simple, which differs from the statistical model in [1], because we aim at another generalization concerning algorithmic development and its convergence theory under the above simple condition. Our purpose is to perform an asymptotic convergence analysis for \(n\rightarrow \infty \) under reasonable assumptions. To this end, we consider the next assumptions in this paper.
Assumption 1
The columns of \([E^{\top },F^{\top }]^{\top }\) are i.i.d. with common mean vector \({\varvec{0}}\in {\mathbb {R}}^{2m}\) and common covariance matrix \(\sigma ^{2}I \in {\mathbb {R}}^{2m\times 2m}\). Moreover, \(\lim _{n\rightarrow \infty }n^{-1}{\bar{X}}{\bar{X}}^{\top }\) exists.
Assumption 2
Under Assumption 1, the rank of the matrix \(\lim _{n\rightarrow \infty }n^{-1}{\bar{X}}{\bar{X}}^{\top }\) is r in Problem 1.
Under the above assumptions, we consider numerical methods for solving Problem 1. The parameter estimation of the above noise model is relevant to the total least squares (TLS) problems as in [8, 13]. In the following, we explain the TLS and its application to the DMD.
Remark 2
In Problem 1, the condition \({\mathcal {R}}({\bar{X}})={\mathcal {R}}({\bar{Y}})\) might be too strong. In addition, one may think that Assumptions 1 and 2 are unrealistic assumptions for time series data. While the above criticisms are valid, it should be noted that, to the best of the author’s knowledge, there are currently no analytical results for stochastic perturbations when projection is used for the DMD and its variants. This study aims to prove consistency under the simplest of conditions, which is a significant contribution in itself. Each of the conditions imposed in Problem 1 has the following background and is therefore not unreasonable.
The condition \({\mathcal {R}}({\bar{X}})={\mathcal {R}}({\bar{Y}})\) is due to (1) and (2). From the theoretical viewpoint of Problem 1, this condition leads to the uniqueness of the target eigenpairs of A, even though A itself is not unique [1, Lemma 9].
Assumptions 1 and 2 originally come from the asymptotic analysis of an errors-in-variables linear regression model [8, Assumptions A, B, and C]. It is an important future issue to examine how natural these assumptions are for time series data from dynamical systems. The following is an intuitive reason why the assumptions seem to hold. In the DMD, we assume that the nonzero eigenvalues of matrix A are approximately on the unit circle of the complex plane. This is because the goal is to analyze the stable behavior of the system over time. The time series \({\varvec{z}}_{k_{\ell }}\ (\ell =1,\ldots ,n)\) in the columns of the matrix X then have approximately the same magnitude, and \(n^{-1}XX^{\top }\) is the average of \({\varvec{z}}_{k_{\ell }}{\varvec{z}}_{k_{\ell }}{}^{\top }\ (\ell =1,\ldots ,n)\). Since the absolute values of the nonzero eigenvalues of A are approximately equal, it is unlikely that the information on a particular eigenvector will disappear, and we can expect that the necessary information on the r eigenvectors will remain in X. This property leads to Assumptions 1 and 2 for \({\bar{X}}\). Formulating and proving this property with mathematical rigor is an issue to be addressed in the future.
2.2 Total least squares DMD (TLS-DMD)
For given \(X,Y \in {\mathbb {R}}^{m\times n}\), the total least squares (TLS) problem is formulated as
$$\begin{aligned} A_{\mathrm{tls}}:=\mathop {\mathrm{argmin}}\limits _{A}\ \min \left\{ \Vert [E^{\top },F^{\top }]^{\top }\Vert _{\mathrm{F}}\ :\ A(X+E)=Y+F\right\} . \end{aligned}$$
For computing \(A_{\mathrm{tls}}\), there is a sophisticated method using the singular value decomposition (SVD) [9, Sect. 6.3]. Although \(A_{\mathrm{tls}}\) is usually assumed to be nonsingular, the rank-deficient case is investigated in [13]. In [7, 10], such a method is successfully applied to the DMD, where \(m\le n\) is assumed. We consider the method in [10] as follows.
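As an illustration of the SVD-based computation of \(A_{\mathrm{tls}}\) mentioned above, the following MATLAB fragment is a minimal sketch for the square, full-rank case \(m\le n\); it assumes that the top \(m\times m\) block of the leading left singular vectors is nonsingular, and the function name is ours.

```matlab
% Minimal sketch of the SVD-based TLS solution of A*X ~ Y for m <= n,
% assuming the top m x m block N1 below is nonsingular (generic case).
function Atls = tls_solve(X, Y)
    m = size(X, 1);
    Z = [X; Y];                      % stacked (augmented) data, 2m x n
    [N, ~, ~] = svd(Z, 'econ');
    Nm = N(:, 1:m);                  % first m left singular vectors of Z
    N1 = Nm(1:m, :);                 % block corresponding to X
    N2 = Nm(m+1:2*m, :);             % block corresponding to Y
    Atls = N2 / N1;                  % A_tls = N2 * inv(N1)
end
```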
Important features of this algorithm are described as follows. In the following, \(\dag \) means the Moore–Penrose inverse. In the above algorithm, let
$$\begin{aligned} {\widehat{A}}_{\mathrm{tls}}:={\widehat{Y}}{\widehat{X}}^{\dagger }. \end{aligned}$$ (3)
Note that, in the case of \(r=m\le n\), the matrix \({\widehat{A}}_{\mathrm{tls}}\) is the solution of the TLS problem. In general, for any \(r\le \min (m,n)\), it is easy to see that \({\widehat{A}}_{\mathrm{tls}}\) in (3) satisfies \({\widehat{A}}_{\mathrm{tls}}{\widehat{X}}={\widehat{Y}}\) for \(m\times r\) matrices \({\widehat{X}},{\widehat{Y}}\) in line 3 of Algorithm 2, where we assume \(\mathrm{rank}({\widehat{X}})=r\). More importantly,
This implies that Algorithm 2 computes eigenpairs of \({\widehat{A}}_{\mathrm{tls}}\) by the Rayleigh–Ritz procedure [15, Sect. 4.3] using the r dimensional subspace.
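For reference, the Rayleigh–Ritz procedure [15, Sect. 4.3] mentioned here can be sketched in MATLAB as follows; S is any matrix with orthonormal columns spanning the chosen r-dimensional subspace, and the function name is ours.

```matlab
% Minimal sketch of the Rayleigh-Ritz procedure for a square matrix B and an
% orthonormal basis S (m x r) of the chosen subspace.
function [theta, W] = rayleigh_ritz(B, S)
    H = S' * B * S;        % r x r projected matrix
    [V, D] = eig(H);       % eigenpairs of the projected matrix
    theta = diag(D);       % Ritz values (approximate eigenvalues of B)
    W = S * V;             % Ritz vectors (approximate eigenvectors of B)
end
```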
2.3 Strong consistency of the TLS-DMD
First of all, note that, in general, the strong convergence of random variables is defined as follows.
Definition 1
(Strong convergence) The sequence of random variables \({\mathcal {X}}^{(n)}\ (n=0,1,\ldots )\) converges to \({\mathcal {X}}\) with probability one if and only if
$$\begin{aligned} \lim _{n\rightarrow \infty }{\mathcal {X}}^{(n)}(\omega )={\mathcal {X}}(\omega )\quad (\omega \in \varOmega \setminus \mathrm{N}), \end{aligned}$$
where \(\varOmega \) is a sample space and \(\mathrm{N}\) is a null set for a probability measure.
On the basis of probabilistic analysis of the TLS in [8, 13], the strong convergence of eigenpairs computed by Algorithm 2 is proved in [1]. In other words, the strong consistency of the estimation by the TLS-DMD is theoretically guaranteed for Problem 1, where we assume that \(\mathrm{rank}(A)\) can be estimated by some methods to set \(r:=\mathrm{rank}(A)\).
Theorem 1
(Strong consistency of the TLS-DMD [1]) Suppose that Algorithm 2 is applied to Problem 1 with \(r(<m)\) under Assumptions 1 and 2. Let \(\lambda _{\mathrm{dmd}}^{(n)},{\varvec{w}}_{\mathrm{dmd}}^{(n)}\) denote \(\lambda _{\mathrm{dmd}},{\varvec{w}}_{\mathrm{dmd}}\) in Algorithm 2 for n, respectively. Then each eigenvalue \(\lambda _{\mathrm{dmd}}^{(n)}\) is convergent to a nonzero eigenvalue of A with probability one, and the accumulation points of \({\varvec{w}}_{\mathrm{dmd}}^{(n)}\) are eigenvectors corresponding to \(\lambda _{\mathrm{dmd}}^{(\infty )}\) with probability one.
More precisely, in [1], the convergence theorem states that the eigenvectors (or generalized eigenvectors) of a perturbed matrix go toward the corresponding invariant subspace. However, if we assume that all the nonzero eigenvalues are simple, all the computed eigenpairs converge to the target eigenpairs, as in the above theorem. The proof is due to the continuity of eigenpairs for simple eigenvalues.
3 General framework with the use of the POD approach
Proper orthogonal decomposition (POD) is an efficient technique for dimensionality reduction, which naturally leads to the idea of using it together with the DMD. Such a method is often called the projected DMD [10, 18] because the POD basis spans the subspace to be projected onto. A typical usage of the POD is based on the truncated SVD of data matrices. As an example,
$$\begin{aligned} X\approx U_{r}\varSigma _{r}V_{r}^{\top }, \end{aligned}$$
where \(\varSigma _{r}\) is a diagonal matrix comprising the r largest singular values of X and \(U_{r},V_{r}\) are the corresponding singular vector matrices. Then the columns of \(U_{r}\) form a basis of the POD. As another example, one might choose \(U_{r}\) from the truncated SVD
$$\begin{aligned} Y\approx U_{r}\varSigma _{r}V_{r}^{\top } \end{aligned}$$
due to \({\mathcal {R}}({\bar{Y}})\subset {\mathcal {R}}(A)\) in general cases. If we aim to reduce errors based on the law of large numbers, we might use all the snapshots
$$\begin{aligned} {}[X,Y]\approx U_{r}\varSigma _{r}V_{r}^{\top } \end{aligned}$$
for the POD basis.
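In MATLAB, the three choices of the POD basis described above can be sketched as follows; the variable names are ours.

```matlab
% Sketch of the three POD-basis choices: M_r from the truncated SVD of X,
% of Y, or of all snapshots [X, Y].
[Ux,  ~, ~] = svd(X,      'econ');  Mr_from_X  = Ux(:,  1:r);
[Uy,  ~, ~] = svd(Y,      'econ');  Mr_from_Y  = Uy(:,  1:r);
[Uxy, ~, ~] = svd([X, Y], 'econ');  Mr_from_XY = Uxy(:, 1:r);
```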
In this paper, to cover the above various cases, we construct a general framework for computing the POD basis, resulting in the projected TLS-DMD.
3.1 General projection methods using the POD basis
The aim of this section is to design an algorithm for computing the exact eigenpairs of A in Problem 1 in the asymptotic regime \(n\rightarrow \infty \). To this end, we consider the next algorithm, regarded as a general framework to use the POD basis, where the basis vectors are the columns of a matrix \(M_{r}\in {\mathbb {R}}^{m\times r}.\)
The matrix \(M_{r}\) is obtained by the SVD as in the above discussion. The use of \(M_{r}\) is related to the Rayleigh–Ritz procedure. More specifically, \(M_{r}\) comprises orthonormal basis vectors for the subspace of the Rayleigh–Ritz procedure. If \(\mathrm{rank}(X)=m\),
where \(\dag \) means the Moore–Penrose inverse. In other words, the Rayleigh–Ritz procedure for computing the eigenpairs of \(YX^{\dagger }\) is performed. Due to \(A\approx {Y}{X}^{\dagger }\), the above Rayleigh–Ritz procedure aims to compute the eigenpairs of A.
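To make the framework concrete, the following MATLAB fragment sketches one possible realization of the projected TLS-DMD (Algorithm 3); it assumes that \(M_{r}\) is taken from the first r left singular vectors of [X, Y] and that the \(r\times r\) block \({\widehat{X}}\) is nonsingular, and the function and variable names are ours.

```matlab
% Minimal sketch of the projected TLS-DMD (Algorithm 3), assuming M_r comes
% from the SVD of [X, Y] and the r x r block Xh below is nonsingular.
function [lambda, W] = projected_tls_dmd(X, Y, r)
    [U, ~, ~] = svd([X, Y], 'econ');     % POD basis of all snapshots
    Mr = U(:, 1:r);                      % m x r orthonormal POD basis
    Xp = Mr' * X;  Yp = Mr' * Y;         % projected data, r x n
    [N, ~, ~] = svd([Xp; Yp], 'econ');   % augmented matrix of projected data
    Nr = N(:, 1:r);                      % first r left singular vectors
    Xh = Nr(1:r, :);  Yh = Nr(r+1:2*r, :);
    Atls = Yh / Xh;                      % TLS estimate in the projected space
    [V, D] = eig(Atls);
    lambda = diag(D);                    % approximate eigenvalues of A
    W = Mr * V;                          % eigenvectors lifted back to R^m
end
```

This corresponds to applying the TLS-DMD to the projected \(r\times n\) data \(M_{r}^{\top }X, M_{r}^{\top }Y\) and lifting the computed eigenvectors back by \(M_{r}\), as discussed in Sect. 4.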
3.2 Numerical tests
Practical efficiency of the DMD and the POD approach is investigated in previous studies [7, 10]. Here we present simple numerical results to validate the convergence theory of Algorithm 3, proved in the next section, and to show its efficiency. All the tests are performed in MATLAB.
Let \(L\in {\mathbb {R}}^{m\times r}\) denote a random matrix given by MATLAB's built-in function randn. In our test, let \(m=200,r=2\) and
implying \(\mathrm{rank}(A)=2\), where \(L^{\dagger }\) is computed by MATLAB's built-in function pinv. In this case, the exact eigenvalues are \(\pm \mathrm{i}\). The matrices \({\bar{X}}\) and \({\bar{Y}}\) are given as follows. Generate \(K \in {\mathbb {R}}^{m\times n}\) as a random matrix with MATLAB's built-in function randn and let \({\bar{X}}:=AK\) and \({\bar{Y}}:=A{\bar{X}}\) to satisfy the condition \({\mathcal {R}}({\bar{X}})={\mathcal {R}}({\bar{Y}})(={\mathcal {R}}(A))\) in Problem 1.
The results of the numerical tests are shown in Tables 1 and 2. We used three methods: the original DMD (Algorithm 1), the TLS-DMD (Algorithm 2), and the projected TLS-DMD (Algorithm 3). In Algorithm 3, \(M_{r}\) is given by the SVD of [X, Y]. The error of the eigenvalues is defined as the square root of the sum of squares of the errors of each computed eigenvalue. The error of each eigenvector is defined as follows. Let \(\theta \) denote the angle between \({\varvec{w}}\) and \({\varvec{w}}_{\mathrm{dmd}}\). Here \({\varvec{w}}\) and \({\varvec{w}}_{\mathrm{dmd}}\) are normalized such that \(\Vert {\varvec{w}}\Vert = \Vert {\varvec{w}}_{\mathrm{dmd}}\Vert =1\) for the Euclidean norm \(\Vert \cdot \Vert \). Then we have
$$\begin{aligned} \cos \theta =|{\varvec{w}}^{\mathrm{H}}{\varvec{w}}_{\mathrm{dmd}}|, \end{aligned}$$
where \(\mathrm{H}\) means the Hermitian transpose due to \({\varvec{w}},{\varvec{w}}_{\mathrm{dmd}}\in {\mathbb {C}}^{m}\). On the basis of this feature, we compute \(\sin \theta \)
for each eigenvector. The square root of their sum of squares is defined as the errors of eigenvectors. The random noise for each component of X, Y is given by the Gaussian distribution \(\mathrm{N}(0,0.1^2)\). Tables 1 and 2 display the averages of 100 trials of the errors for sample size \(n=50,100\).
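For concreteness, the following MATLAB fragment sketches one way to reproduce this test setup; the specific \(2\times 2\) block J with eigenvalues \(\pm \mathrm{i}\) is our assumption, since the displayed definition of A is not reproduced here.

```matlab
% Sketch of the test setup in Sect. 3.2 (the 2x2 block J is our assumption:
% any real 2x2 matrix with eigenvalues +-i would do).
m = 200; r = 2; n = 100; sigma = 0.1;
L = randn(m, r);
J = [0 -1; 1 0];                  % eigenvalues +-i
A = L * J * pinv(L);              % rank(A) = 2, nonzero eigenvalues +-i
K    = randn(m, n);
Xbar = A * K;                     % noise-free snapshots
Ybar = A * Xbar;                  % so that R(Xbar) = R(Ybar) = R(A)
X = Xbar + sigma * randn(m, n);   % observed noisy snapshots
Y = Ybar + sigma * randn(m, n);
```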
From Tables 1 and 2, we see the efficiency of the projected TLS-DMD in terms of accuracy and its convergence behavior. More specifically, the projection in Algorithm 3 appears to improve the estimation accuracy. Since the matrix \(M_{r}\) is given by the SVD of [X, Y] comprising all the snapshots, the subspace spanned by the columns of \(M_{r}\) can be a good approximation of \({\mathcal {R}}({\bar{X}})(={\mathcal {R}}({\bar{Y}}))\). Such practical performance should be investigated in more depth with mathematical rigor in the future.
We would like to emphasize here that the main result of this paper is to prove strong consistency of the projected TLS-DMD (Algorithm 3), and thus the detailed numerical and theoretical comparisons with existing methods from the practical viewpoints are beyond the scope of this paper. The theoretical justification of the convergence of the projected TLS-DMD is presented in the next section.
4 Consistency analysis
In this section, we prove strong consistency for Algorithm 3. More specifically, we prove that the computed eigenpairs converge to the exact eigenpairs of A under reasonable assumptions. To this end, we first show an important feature of the Rayleigh–Ritz procedure for denoised matrices in Sect. 4.1. Next, we prove consistency in the statistical sense of Algorithm 3 for noisy datasets in Sect. 4.2. This entire mathematical analysis is the contribution of this paper. Although the strong consistency presented below is a development of the result in [1], the consistency analysis itself takes a completely new approach. From a technical perspective, the discovery of Lemma 5 is crucial, and to take advantage of it, the key properties of projection methods are organized to complete the proof.
As a preparation, we note a basic feature of Algorithm 2. In Algorithm 2, let \(N_{r}\) be a \(2m\times r\) matrix comprising the first r left singular vectors of Z. Even if we define \({\widehat{X}},{\widehat{Y}} \in {\mathbb {R}}^{m\times r}\) such that \([{\widehat{X}}^{\top },{\widehat{Y}}^{\top }]:=N_{r}^{\top }\) in line 3 of Algorithm 2, the eigenpairs computed by this modified algorithm are equal to those in the original algorithm.
The above feature is easily proved as follows. Let Q denote the matrix comprising the first r right singular vectors of Z used in Algorithm 2. Using a diagonal matrix \(\varSigma _{Z}\in {\mathbb {R}}^{r\times r}\) comprising the nonzero singular values of Z, we have
$$\begin{aligned} N_{r}=ZQ\varSigma _{Z}^{-1} \end{aligned}$$
due to the relationship between the left singular vectors and the right singular vectors. Thus, we have
$$\begin{aligned} {\widehat{X}}=XQ\varSigma _{Z}^{-1},\quad {\widehat{Y}}=YQ\varSigma _{Z}^{-1} \end{aligned}$$
in the modified algorithm. Therefore, \({\widehat{Y}}{\widehat{X}}^{\dagger }=YQ\varSigma _{Z}^{-1}\varSigma _{Z}(XQ)^{\dagger }=YQ(XQ)^{\dagger }\) is equal to \({\widehat{Y}}{\widehat{X}}^{\dagger }\) in Algorithm 2, where \(\dag \) means the Moore–Penrose inverse. Since the TLS-DMD is the Rayleigh–Ritz procedure for \({\widehat{Y}}{\widehat{X}}^{\dagger }\), the modified algorithm is mathematically equivalent to the original algorithm.
Note that the first r left singular vectors of Z and the first r eigenvectors of \(ZZ^{\top }\) are the same. Hence, the above modified algorithm can be described as follows.
The next lemma is a basic tool of our mathematical analysis.
Lemma 1
Algorithm 4 is mathematically equivalent to Algorithm 2.
We use Algorithm 4 in the following discussion.
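As a quick numerical sanity check of Lemma 1, the following MATLAB fragment compares the matrix \({\widehat{Y}}{\widehat{X}}^{\dagger }\) built from the first r right singular vectors of Z (as in Algorithm 2) with the one built from the first r left singular vectors of Z (as in Algorithm 4); the dimensions are arbitrary, and the check assumes \(\mathrm{rank}(XQ)=r\), which holds generically.

```matlab
% Sanity check of Lemma 1 (a sketch): both constructions of Yhat*pinv(Xhat)
% coincide, assuming rank(X*Q) = r.
m = 20; n = 50; r = 3;
X = randn(m, n); Y = randn(m, n);
Z = [X; Y];
[N, ~, Q] = svd(Z, 'econ');
Qr = Q(:, 1:r);                              % first r right singular vectors
A2 = (Y * Qr) * pinv(X * Qr);                % as in Algorithm 2
Nr = N(:, 1:r);                              % first r left singular vectors
A4 = Nr(m+1:2*m, :) * pinv(Nr(1:m, :));      % as in Algorithm 4
disp(norm(A2 - A4, 'fro'))                   % close to machine precision
```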
4.1 Deterministic analysis
Our mathematical analysis begins with deterministic analysis. We aim to analyze Algorithm 3. In line 2 of Algorithm 3, Algorithm 2 is applied to \(r \times n\) matrices X, Y. Noting Lemma 1, we investigate the case of \(r=m\) in Algorithm 4, resulting in the next lemma.
Lemma 2
Suppose that Algorithm 4 is applied with input matrices \(X:={\bar{X}},Y:={\bar{Y}}\), where \({\bar{X}},{\bar{Y}}\) are defined in Problem 1 with \(r=m\). Assume that there exist a nonsingular matrix \(C \in {\mathbb {C}}^{r \times r}\), a diagonal nonsingular matrix \(D \in {\mathbb {C}}^{r \times r}\), and a full-row rank matrix \(K \in {\mathbb {C}}^{r \times n}\) such that \(X = CK\) and \(Y = C D K\). Then
$$\begin{aligned} AC=CD. \end{aligned}$$ (4)
Moreover, each computed eigenvalue \(\lambda _{\mathrm{dmd}}\) in Algorithm 4 is an eigenvalue of A, and \({\varvec{w}}_{\mathrm{dmd}}\) is the corresponding eigenvector that is a column of C.
Proof
Using \(AX=Y,\ X=CK,\ Y=CDK\), we see
$$\begin{aligned} ACK=CDK. \end{aligned}$$
From \(\mathrm{rank}(K)=r\), it is easy to see that \(AC=CD\) holds, where D is a diagonal matrix. Therefore, we obtain (4).
In the following, we prove that each computed eigenvalue \(\lambda _{\mathrm{dmd}}\) in Algorithm 4 is an eigenvalue of A, and \({\varvec{w}}_{\mathrm{dmd}}\) is the corresponding eigenvector. First of all, noting the definitions of X, Y, Z, we have
$$\begin{aligned} Z=[X^{\top },Y^{\top }]^{\top }=[(CK)^{\top },(CDK)^{\top }]^{\top }=[C^{\top },(CD)^{\top }]^{\top }K. \end{aligned}$$
In the following, \(\mathrm{H}\) means the Hermitian transpose for complex matrices. Thus,
$$\begin{aligned} ZZ^{\top }=ZZ^{\mathrm{H}}=[C^{\top },(CD)^{\top }]^{\top }\,KK^{\mathrm{H}}\,\left( [C^{\top },(CD)^{\top }]^{\top }\right) ^{\mathrm{H}} \end{aligned}$$
holds. In addition, from
the eigenspace spanned by the first r eigenvectors of \(ZZ^{\top }\) is the range of
$$\begin{aligned} {}[C^{\top },(CD)^{\top }]^{\top }. \end{aligned}$$
Therefore, letting \(L \in {\mathbb {C}}^{r\times r}\) denote a nonsingular matrix, we have
$$\begin{aligned} N_{r}=[C^{\top },(CD)^{\top }]^{\top }L, \end{aligned}$$
where \(N_{r}\in {\mathbb {R}}^{2m\times r}\) is an eigenvector matrix of \(ZZ^{\top }\). In other words,
$$\begin{aligned} {\widehat{X}}=CL,\quad {\widehat{Y}}=CDL \end{aligned}$$
in line 3 of Algorithm 4. Note that \({\widehat{X}}=CL \in {\mathbb {R}}^{r\times r}\) is nonsingular, where we now assume \(r=m\). It then follows that
$$\begin{aligned} {\widehat{Y}}{\widehat{X}}^{\dagger }={\widehat{Y}}{\widehat{X}}^{-1}=CDL(CL)^{-1}=CDC^{-1}=A. \end{aligned}$$
Thus, Algorithm 4 computes the eigenpairs of A. Therefore, each computed eigenvalue \(\lambda _{\mathrm{dmd}}\) in Algorithm 4 is an eigenvalue of A, and \({\varvec{w}}_{\mathrm{dmd}}\) is the corresponding eigenvector. \(\square \)
Using the above lemma for Algorithm 4, we see that, in an ideal case, Algorithm 3 computes the exact eigenpairs of A defined in Problem 1, as in the next lemma.
Lemma 3
Suppose that Algorithm 3 is applied with input matrices \(X:={\bar{X}},Y:={\bar{Y}}\), where \({\bar{X}},{\bar{Y}}\) are defined in Problem 1. In addition, assume
$$\begin{aligned} {\mathcal {R}}(M_{r})={\mathcal {R}}({\bar{X}}). \end{aligned}$$ (5)
Then each computed eigenvalue \(\lambda _{\mathrm{dmd}}\) is an eigenvalue of A, and \({\varvec{w}}_{\mathrm{dmd}}\) is the corresponding eigenvector.
Proof
Let
$$\begin{aligned} W:=[{\varvec{w}}_{1},\ldots ,{\varvec{w}}_{r}],\quad \varLambda :=\mathrm{diag}(\lambda _{1},\ldots ,\lambda _{r}),\quad AW=W\varLambda , \end{aligned}$$
where W is an eigenvector matrix corresponding to the nonzero eigenvalues of A. Noting \(\mathrm{rank}(A)=r\), we have \({\mathcal {R}}(A)={\mathcal {R}}(W)\). In addition, due to \(\mathrm{rank}({\bar{Z}})=\mathrm{rank}([{\bar{X}}^{\top },{\bar{Y}}^{\top }]^{\top })=r\) and \({\mathcal {R}}({\bar{X}})={\mathcal {R}}({\bar{Y}})\), we see \(\mathrm{rank}({\bar{X}})=\mathrm{rank}({\bar{Y}})=r\). Hence, \({\mathcal {R}}({\bar{X}})={\mathcal {R}}({\bar{Y}})={\mathcal {R}}(W)\). Thus, noting (5), we have \({\mathcal {R}}(W)={\mathcal {R}}(M_{r})\), leading to the existence of a nonsingular matrix \(B\in {\mathbb {C}}^{r\times r}\) such that
$$\begin{aligned} M_{r}=WB. \end{aligned}$$ (6)
In addition, \(M_{r}^{\top }M_{r}=I\) yields
$$\begin{aligned} M_{r}^{\top }W=B^{-1}. \end{aligned}$$ (7)
Moreover, from \({\mathcal {R}}({\bar{X}})={\mathcal {R}}({\bar{Y}})\), using \(K_{W}\in {\mathbb {C}}^{r \times n}\) with \(\mathrm{rank}(K_{W})=r\), we have
$$\begin{aligned} {\bar{X}}=WK_{W},\quad {\bar{Y}}=A{\bar{X}}=W\varLambda K_{W}. \end{aligned}$$ (8)
The above Eqs. (6)–(8) indicate
$$\begin{aligned} M_{r}^{\top }{\bar{X}}=B^{-1}K_{W},\quad M_{r}^{\top }{\bar{Y}}=B^{-1}\varLambda K_{W}. \end{aligned}$$
Thus, noting Lemma 2, we see that Algorithm 4 with \(X:=M_{r}^{\top }{\bar{X}},Y:=M_{r}^{\top }{\bar{Y}}\) computes the diagonal elements of \(\varLambda \) and the columns of \(B^{-1}\) as eigenpairs of \({\widetilde{A}}_{\mathrm{dmd}}\). Let \(W_{\mathrm{dmd}}\in {\mathbb {C}}^{r\times r}\) denote an eigenvector matrix comprising the r eigenvectors \({\varvec{w}}_{\mathrm{dmd}}\) computed by Algorithm 4 with \(m=r\). Then Algorithm 3 computes the columns of
$$\begin{aligned} M_{r}W_{\mathrm{dmd}}, \end{aligned}$$
which is the target eigenvector matrix of A. \(\square \)
In a real situation, the exact \({\bar{X}},{\bar{Y}}\) are unknown, and only the perturbed matrices X, Y can be observed as in Problem 1. In the following, we analyze this situation.
4.2 Statistical analysis
This section is devoted to statistical analysis for Algorithm 3, where the data matrices X, Y are contaminated with random noise.
Recall that E and F are random matrices, and thus \(X:={\bar{X}}+E\) and \(Y:={\bar{Y}}+F\) are also random matrices. In addition, any matrix computed from X and Y, such as Z and \(M_{r}\), is a random matrix. The basic tool for the asymptotic analysis as \(n\rightarrow \infty \) is the next lemma.
Lemma 4
[8, Lemma 3.1] Under Assumption 1, we have
$$\begin{aligned} \lim _{n\rightarrow \infty }\left( \frac{1}{n}ZZ^{\top }-\frac{1}{n}{\bar{Z}}{\bar{Z}}^{\top }\right) =\sigma ^{2}I \end{aligned}$$
with probability one. More specifically, letting \(Z^{(n)}:=Z\) comprising random variables with the use of the sample size n, we have
$$\begin{aligned} \lim _{n\rightarrow \infty }\left( \frac{1}{n}Z^{(n)}(\omega )Z^{(n)}(\omega )^{\top }-\frac{1}{n}{\bar{Z}}{\bar{Z}}^{\top }\right) =\sigma ^{2}I\quad (\omega \in \varOmega \setminus \mathrm{N}_{1}), \end{aligned}$$
where \(\varOmega \) is a sample space and \(\mathrm{N}_{1}\) is a null set as in Definition 1.
Thus, \(n^{-1}ZZ^{\top }\) is not convergent to \(n^{-1}{\bar{Z}}{\bar{Z}}^{\top }\). At first glance, this property makes an analysis of convergence less straightforward. However, we can easily avoid this difficulty as follows. Importantly, the eigenvectors of \(n^{-1}ZZ^{\top }\) are convergent to those of \(n^{-1}{\bar{Z}}{\bar{Z}}^{\top }\) from the above equality. Algorithm 4 computes the eigenvector matrix \(N_{r}\) to obtain the eigenpairs of A. Thus, Algorithm 4 can compute the exact eigenpairs in the asymptotic regime \(n\rightarrow \infty \).
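The key observation, stated explicitly, is that a shift by \(\sigma ^{2}I\) changes the eigenvalues but not the eigenvectors of a symmetric matrix: if \(S{\varvec{v}}=\mu {\varvec{v}}\), then
$$\begin{aligned} (S+\sigma ^{2}I){\varvec{v}}=(\mu +\sigma ^{2}){\varvec{v}}, \end{aligned}$$
so the leading eigenvectors of the limit of \(n^{-1}ZZ^{\top }\) coincide with those of \(\lim _{n\rightarrow \infty }n^{-1}{\bar{Z}}{\bar{Z}}^{\top }\), even though the eigenvalues are shifted by \(\sigma ^{2}\).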
In the following, as in Lemma 4, we use superscripts to denote matrices for a fixed n, as in \(Z^{(n)}\). In addition, if a discussion of the sample space is required, \(\omega \in \varOmega \) is explicitly indicated, as in \(Z^{(n)}(\omega )\). The same notation is used for random matrices other than Z.
The next lemma states that Algorithm 3 has a similar feature to Lemma 4.
Lemma 5
Suppose that Algorithm 3 is applied to Problem 1 under Assumptions 1 and 2. In addition, we assume
$$\begin{aligned} \lim _{n\rightarrow \infty }M_{r}^{(n)}(\omega )={\bar{M}}_{r}\quad (\omega \in \varOmega \setminus \mathrm{N}_{2}), \end{aligned}$$
where \(\mathrm{N}_{2}\) is a null set. Then letting
$$\begin{aligned} Z_{M}:=[(M_{r}^{\top }X)^{\top },(M_{r}^{\top }Y)^{\top }]^{\top },\quad {\bar{Z}}_{{\bar{M}}}:=[({\bar{M}}_{r}^{\top }{\bar{X}})^{\top },({\bar{M}}_{r}^{\top }{\bar{Y}})^{\top }]^{\top },\quad \varPsi :=\lim _{n\rightarrow \infty }\frac{1}{n}{\bar{Z}}_{{\bar{M}}}{\bar{Z}}_{{\bar{M}}}{}^{\top }+\sigma ^{2}I, \end{aligned}$$ (9)
we have
$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n}Z_{M}^{(n)}(\omega )Z_{M}^{(n)}(\omega )^{\top }=\varPsi \quad (\omega \in \varOmega \setminus (\mathrm{N}_{1}\cup \mathrm{N}_{2})), \end{aligned}$$
where \(\mathrm{N}_{1}\cup \mathrm{N}_{2}\) is a null set.
Proof
For \({Z}_{M}^{(n)}\) comprising the random variables, we see
in view of \(Z^{(n)}=[X^{(n) {\top }},Y^{(n) {\top }}]^{\top }\). Noting
and Lemma 4, we have
where \(\varPsi \) is defined in (9). \(\square \)
The above lemma implies
$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n}Z_{M}^{(n)}Z_{M}^{(n)\top }=\varPsi \quad \text {with probability one} \end{aligned}$$
because \(\mathrm{N}_{1}\cup \mathrm{N}_{2}\) is a null set. In other words, this indicates the strong convergence in Definition 1.
We are now in a position to present the main theorem, which states the strong consistency of Algorithm 3.
Theorem 2
Suppose that Algorithm 3 is applied to Problem 1 under Assumptions 1 and 2. In addition, letting \(M_{r}^{(n)}\) denote \(M_{r}\) for n, we assume
$$\begin{aligned} \lim _{n\rightarrow \infty }M_{r}^{(n)}(\omega )={\bar{M}}_{r}\quad (\omega \in \varOmega \setminus \mathrm{N}_{2}),\quad {\mathcal {R}}({\bar{M}}_{r})={\mathcal {R}}({\bar{X}}), \end{aligned}$$ (10)
where \(\mathrm{N}_{2}\) is a null set as in Lemma 5.
Let \(\lambda _{\mathrm{dmd}}^{(n)},{\varvec{w}}_{\mathrm{dmd}}^{(n)}\) denote \(\lambda _{\mathrm{dmd}},{\varvec{w}}_{\mathrm{dmd}}\) in Algorithm 3 for n. Then each eigenvalue \(\lambda _{\mathrm{dmd}}^{(n)}\) is convergent to a nonzero eigenvalue of A with probability one, and the accumulation points of \({\varvec{w}}_{\mathrm{dmd}}^{(n)}\) are eigenvectors corresponding to \(\lambda _{\mathrm{dmd}}^{(\infty )}\) with probability one.
Proof
Algorithm 3 uses Algorithm 2, and Algorithm 2 is mathematically equivalent to Algorithm 4. Recall that Lemma 3 states that Algorithm 3 computes the exact eigenpairs in an ideal case \(X={\bar{X}},Y={\bar{Y}}\), where the TLS-DMD (Algorithm 4) is applied to \(X={{\bar{M}}_{r}}^{\top }{\bar{X}},Y={{\bar{M}}_{r}}^{\top }{\bar{Y}}\). Under the condition, \(N_{r}\) in line 2 of Algorithm 4 is an eigenvector matrix of \({\bar{Z}}_{{\bar{M}}}{\bar{Z}}_{{\bar{M}}}{}^{\top }\) in Lemma 5. Let \({\bar{N}}_{r}\) be this \(N_{r}\). Note that the eigenpairs computed from \({\bar{N}}_{r}\) in Algorithm 4 coincide with the eigenpairs of A, independent of n, as shown in Lemma 3.
Below we consider general \(X:={\bar{X}}+E, Y:={\bar{Y}}+F\) in Algorithm 3. Then the TLS-DMD (Algorithm 4) is applied to \(X:={{M}_{r}}^{\top }{X},Y:={{M}_{r}}^{\top }{Y}\). Using the matrices, define
$$\begin{aligned} Z_{M}:=[({M}_{r}^{\top }{X})^{\top },({M}_{r}^{\top }{Y})^{\top }]^{\top }. \end{aligned}$$
Then \(N_{r}\) in line 2 of Algorithm 4 is an eigenvector matrix of \(Z_{M}Z_{M}^{\top }\). Let \(N_{r}^{(n)}\) denote this \(N_{r}\) for n. Due to these definitions and Lemma 5, as \(n\rightarrow \infty \), the accumulation points of \(N_{r}^{(n)}\) are equal to \({\bar{N}}_{r}\) with probability one under the assumptions in (10). Since the eigenvalues \(\lambda _{\mathrm{dmd}}^{(n)}\) and the corresponding eigenvectors \({\varvec{w}}_{\mathrm{dmd}}^{(n)}\) are computed by \(N_{r}^{(n)}\) and \(M_{r}^{(n)}\), the continuity of eigenvalues and eigenvectors yields the strong convergence of \(\lambda _{\mathrm{dmd}}^{(n)}\) and \({\varvec{w}}_{\mathrm{dmd}}^{(n)}\). \(\square \)
Finally, we discuss how to construct \(M_{r}\) so that the assumption for the strong consistency is satisfied. The simplest choice is to take \(M_{r}\) as the matrix of the first r left singular vectors of [X, Y]. In this case, \((2n)^{-1}[X,Y] [X,Y]^{\top }\) is convergent to \((2n)^{-1}[{\bar{X}},{\bar{Y}}] [{\bar{X}},{\bar{Y}}]^{\top }+\sigma ^2 I\) with probability one, which is proved in the same manner as Lemma 4 [8, Lemma 3.1]. Thus, \(M_{r}\) is convergent to some \({\bar{M}}_{r}\) such that \({\mathcal {R}}({\bar{M}}_{r})={\mathcal {R}}({\bar{X}})\) with probability one. Consequently, we have the next corollary.
Corollary 1
Suppose that Algorithm 3 is applied to Problem 1 under Assumptions 1 and 2. In addition, assume that \(M_{r}\) comprises the first r left singular vectors of [X, Y]. Moreover, let \(\lambda _{\mathrm{dmd}}^{(n)},{\varvec{w}}_{\mathrm{dmd}}^{(n)}\) denote \(\lambda _{\mathrm{dmd}},{\varvec{w}}_{\mathrm{dmd}}\) in Algorithm 3 for n, respectively. Then each eigenvalue \(\lambda _{\mathrm{dmd}}^{(n)}\) is convergent to a nonzero eigenvalue of A with probability one, and the accumulation points of \({\varvec{w}}_{\mathrm{dmd}}^{(n)}\) are eigenvectors corresponding to \(\lambda _{\mathrm{dmd}}^{(\infty )}\) with probability one.
Note that the SVD of [X, Y] used for \(M_{r}\) can be replaced by the SVD of X or of Y, and the strong consistency is established in the same manner. In this sense, our consistency analysis covers general situations concerning the subspace used for the projection.
5 Conclusion
In this paper, we gave a general framework for the algorithm design of the TLS-DMD using the projection with the POD basis and proved its strong consistency. This is a pioneering work on the asymptotic statistical analysis of noisy datasets in the situation where projection methods are combined with the DMD. More precisely, the contribution of this paper is that we formulated Algorithm 3, based on the DMD widely used for the analysis of time series data, and proved its strong consistency in a straightforward manner under a simple statistical model of noise, as presented in Sect. 4. Theorem 2 can be interpreted as a fundamental theorem for applying the TLS-DMD with the aid of projections to noisy datasets.
References
Aishima, K.: Strong convergence for the dynamic mode decomposition based on the total least squares to noisy datasets. JSIAM Lett. 12, 33–36 (2020)
Arbabi, H., Mezić, I.: Ergodic theory, dynamic mode decomposition, and computation of spectral properties of the Koopman operator. SIAM J. Appl. Dyn. Syst. 16, 2096–2126 (2017)
Askham, T., Kutz, J.N.: Variable projection methods for an optimized dynamic mode decomposition. SIAM J. Appl. Dyn. Syst. 17, 380–416 (2018)
Azencot, O., Yin, W., Bertozzi, A.: Consistent dynamic mode decomposition. SIAM J. Appl. Dyn. Syst. 18, 1565–1585 (2019)
Brunton, S.L., Brunton, B.W., Proctor, J.L., Kutz, J.N.: Koopman invariant subspaces and finite linear representations of nonlinear dynamical systems for control. PLoS ONE 11(2), e0150171 (2016)
Chen, K.K., Tu, J.H., Rowley, C.W.: Variants of dynamic mode decomposition: boundary condition, Koopman, and Fourier analyses. J. Nonlinear Sci. 22, 887–915 (2012)
Dawson, S.T.M., Hemati, M.S., Williams, M.O., Rowley, C.W.: Characterizing and correcting for the effect of sensor noise in the dynamic mode decomposition. Exp. Fluids 57, 1–19 (2016)
Gleser, L.J.: Estimation in a multivariate “errors in variables’’ regression model: large sample results. Ann. Stat. 9, 24–44 (1981)
Golub, G.H., Van Loan, C.F.: Matrix Computations, 4th edn. Johns Hopkins University Press, Baltimore (2013)
Hemati, M.S., Rowley, C.W., Deem, E.A., Cattafesta, L.N.: De-biasing the dynamic mode decomposition for applied Koopman spectral analysis of noisy datasets. Theor. Comput. Fluid Dyn. 31, 349–368 (2017)
Hirsh, S.M., Harris, K.D., Kutz, J.N., Brunton, B.W.: Centering data improves the dynamic mode decomposition. SIAM J. Appl. Dyn. Syst. 19, 1920–1955 (2020)
Kutz, J.N., Brunton, S.L., Brunton, B.W., Proctor, J.L.: Dynamic Mode Decomposition: Data-Driven Modeling of Complex Systems. SIAM, Philadelphia (2016)
Park, S., O’Leary, D.P.: Implicitly-weighted total least squares. Linear Algebra Appl. 435, 560–577 (2011)
Rowley, C.W., Mezić, I., Bagheri, S., Schlatter, P., Henningson, D.S.: Spectral analysis of nonlinear flows. J. Fluid Mech. 641, 115–127 (2009)
Saad, Y.: Numerical Methods for Large Eigenvalue Problems. SIAM, Philadelphia (2011)
Schmid, P.J.: Dynamic mode decomposition of numerical and experimental data. J. Fluid Mech. 656, 5–28 (2010)
Takeishi, N., Kawahara, Y., Tabei, Y., Yairi, T.: Bayesian dynamic mode decomposition. In: Proceedings of the 26th IJCAI, pp. 2814–2821 (2017)
Tu, J.H., Rowley, C.W., Luchtenburg, D.M., Brunton, S.L., Kutz, J.N.: On dynamic mode decomposition: theory and applications. J. Comput. Dyn. 1, 391–421 (2014)
Acknowledgements
The author thanks the reviewers for many valuable comments.
This study was supported by JSPS KAKENHI Grant Nos. JP17K14143, JP21K11909, and JP22K03422.
Keywords
- Dynamic mode decomposition
- Singular value decomposition
- Total least squares
- Eigenvalue problems
- Strong convergence