1. A Review of the Rigorous Information Flow-Based Causality Analysis
Causal inference is a fundamental problem in scientific research. Recently it has been shown that the problem can be recast into the framework of information flow, another fundamental notion in general physics which has wide applications in different disciplines (see [1]), and hence can be put on a rigorous footing. The causality between two time series can then be analyzed in a quantitative sense, and, besides, the resulting formula is very concise in form. In the linear limit, it involves only the usual statistics, namely sample covariances [2], making the important and otherwise difficult problem an easy task.
To briefly review the theory, consider a two-dimensional continuous-time stochastic system for the state variables $\mathbf{X} = (X_1, X_2)^T$:
$$
\frac{d\mathbf{X}}{dt} = \mathbf{F}(\mathbf{X}, t) + \mathbf{B}(\mathbf{X}, t)\,\dot{\mathbf{w}},
$$
where $\mathbf{F} = (F_1, F_2)^T$ may be arbitrary nonlinear functions of $\mathbf{X}$ and $t$, $\dot{\mathbf{w}}$ is a vector of white noise, and $\mathbf{B} = (b_{ij})$ is the matrix of perturbation amplitudes, which may also be any functions of $\mathbf{X}$ and $t$. Here we adopt the convention in physics and do not distinguish deterministic and random variables; in probability theory, they are usually distinguished with capital and lower-case symbols. Assume that $\mathbf{F}$ and $\mathbf{B}$ are both differentiable with respect to $\mathbf{X}$ and $t$. Then the information flow from $X_2$ to $X_1$ (in nats per unit time) can be explicitly found in a closed form [3] (the multi-dimensional case is referred to [1]):
$$
T_{2\to1} = -E\left[\frac{1}{\rho_1}\frac{\partial (F_1\rho_1)}{\partial x_1}\right]
+ \frac{1}{2}\,E\left[\frac{1}{\rho_1}\frac{\partial^2 (g_{11}\rho_1)}{\partial x_1^2}\right],
\qquad (1)
$$
where $E$ stands for mathematical expectation, $g_{11} = \sum_k b_{1k}^2$, and $\rho_1$ is the marginal probability density function (pdf) of $X_1$. The rate of information flowing from $X_1$ to $X_2$ can be obtained by switching the indices. If $T_{2\to1} = 0$, then $X_2$ is not causal to $X_1$; otherwise it is causal, and the absolute value measures the magnitude of the causality from $X_2$ to $X_1$. For discrete-time mappings, the information flow takes a much more complicated form; see [1].
In the case with only two time series (no dynamical system is given), $X_1$ and $X_2$, under the assumption of a linear model with additive noise, the maximum likelihood estimator (MLE) of the rate of information flowing from $X_2$ to $X_1$ is [2]
$$
\hat T_{2\to1} = \frac{C_{11} C_{12} C_{2,d1} - C_{12}^2 C_{1,d1}}{C_{11}^2 C_{22} - C_{11} C_{12}^2},
\qquad (3)
$$
where $C_{ij}$ is the sample covariance between $X_i$ and $X_j$, and $C_{i,d1}$ the sample covariance between $X_i$ and a series derived from $X_1$ using the Euler forward differencing scheme (also see the Euler–Maruyama scheme in [4]): $\dot X_{1,n} = (X_{1,n+k} - X_{1,n})/(k\,\Delta t)$, with $k \geq 1$ some integer. Note that Equation (3) is rather concise in form; it only involves the common statistics, i.e., sample covariances. In other words, a combination of some sample covariances gives a quantitative measure of the causality between the time series. This makes causality analysis, which otherwise would be complicated with the classical empirical/half-empirical methods, very easy. Nonetheless, note that Equation (3) cannot replace (1); it is just the MLE of the latter. A statistical significance test must be performed before a causal inference is made based on the computed $\hat T_{2\to1}$. For details, refer to [2].
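To make the estimation step concrete, below is a minimal Python sketch of the sample-covariance estimator (3) as reviewed above. The function name, its interface, and the toy driving model in the usage lines are illustrative choices of this note, not part of [2]; the statistical significance test discussed there is also omitted.

```python
import numpy as np

def info_flow_x2_to_x1(x1, x2, dt=1.0, k=1):
    """Sample-covariance estimate (per the linear MLE formula (3)) of the
    rate of information flow from series x2 to series x1, in nats per unit
    time. The name and interface are illustrative, not taken from [2]."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n = x1.size - k
    dx1 = (x1[k:] - x1[:n]) / (k * dt)        # Euler forward differencing of x1
    C = np.cov(np.vstack([x1[:n], x2[:n], dx1]))
    C11, C22, C12 = C[0, 0], C[1, 1], C[0, 1]
    C1d1, C2d1 = C[0, 2], C[1, 2]             # cov(X1, dX1/dt), cov(X2, dX1/dt)
    return (C11 * C12 * C2d1 - C12**2 * C1d1) / (C11**2 * C22 - C11 * C12**2)

# Toy usage with a hypothetical linear model in which x2 drives x1 but not
# vice versa; the parameter values are arbitrary illustrative choices.
rng = np.random.default_rng(0)
dt, n = 0.01, 50_000
x1, x2 = np.zeros(n), np.zeros(n)
for i in range(1, n):
    x2[i] = x2[i-1] - 0.2 * x2[i-1] * dt + 0.3 * np.sqrt(dt) * rng.normal()
    x1[i] = x1[i-1] + (x2[i-1] - x1[i-1]) * dt + 0.1 * np.sqrt(dt) * rng.normal()
print(info_flow_x2_to_x1(x1, x2, dt))   # clearly nonzero: x2 -> x1
print(info_flow_x2_to_x1(x2, x1, dt))   # near zero: no feedback x1 -> x2
```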
Considering the long-standing debate ever since Berkeley (1710) [5] over correlation versus causation, we may rewrite (3) in terms of linear correlation coefficients, which immediately implies [2]:
Causation implies correlation, but correlation does not imply causation.
The above formalism has been validated with many benchmark systems (e.g., [1]) such as the baker transformation, the Hénon map, the Kaplan-Yorke map, the Rössler system, etc. It has also been successfully applied to the studies of many real-world problems, such as those in financial economics (e.g., the “Seven Dwarfs vs. a Giant” problem [6]), earth system science (e.g., the Antarctica mass balance problem [7] and the global warming problem [8]), and neuroscience (e.g., the concussion problem [9]), to name but a few.
3. The Solution
The problem can be more formally stated with the harmonic system
$$
\frac{dx_1}{dt} = \omega x_2, \qquad \frac{dx_2}{dt} = -\omega x_1.
\qquad (4)
$$
If the system is initialized with $x_1(0) = 0$, $x_2(0) = 1$, the solution is $x_1 = \sin\omega t$, $x_2 = \cos\omega t$, so that $x_1$ lags $x_2$ by $\pi/2$. Thus the population covariance $\sigma_{12} = \frac{1}{T}\int_0^T x_1 x_2\,dt = 0$ ($T$ is one period or many periods). This yields a vanishing information flow from $x_2$ to $x_1$:
$$
T_{2\to1} = 0.
$$
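For concreteness, here is the elementary computation behind the vanishing covariance (a quick check assuming the sine/cosine solution written above; over whole periods the means of $x_1$ and $x_2$ vanish, so the mean of the product is already the covariance):
$$
\sigma_{12} = \frac{1}{T}\int_0^T \sin(\omega t)\cos(\omega t)\,dt
= \frac{1}{2T}\int_0^T \sin(2\omega t)\,dt
= \frac{1 - \cos(2\omega T)}{4\omega T} = 0
\quad\text{for } T = \frac{2\pi n}{\omega},\ n = 1, 2, \ldots
$$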
Fundamentally, the above problem arises from the fact that this is a deterministic system. In the Granger causality test [10] (also see a recent reference [11]), this case has been explicitly excluded, as in such a case the trajectories do not form appropriate ensembles in the sample space. A harmonic series shows up on a Poincaré section as only one single point, so the total information does not accrue. If the total information does not change, the information flow to $x_1$ must also vanish. However, the vanishing information flow does not mean that there is no influence of $x_2$ on $x_1$. As argued in Liang (2015), the so-obtained information flow must be normalized, just as covariance needs to be normalized into correlation, for one to assess the causal influence. Here, if the normalizer is zero, the normalized flow $\tau_{2\to1}$ involves the indeterminate form $0/0$. We may then approach it by taking a limit: specifically, we enlarge the sample space slightly by adding some stochasticity to the system, and then take the limit by letting the stochastic perturbation amplitude go to zero.
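This limiting procedure can be illustrated numerically. The sketch below is a toy illustration only: it assumes the harmonic system (4) with $\omega = 1$, the initial state $(x_1, x_2) = (0, 1)$, and an isotropic noise amplitude $b$ (all illustrative choices), and integrates the perturbed system with the Euler–Maruyama scheme. The sample correlation between $x_1$ and $x_2$ stays far from $\pm 1$ and tends toward zero as $b$ shrinks, which is precisely the singular situation that the normalization is meant to resolve.

```python
import numpy as np

def simulate(b, omega=1.0, dt=1e-3, n_steps=200_000, seed=1):
    """Euler-Maruyama integration of the harmonic system (4) perturbed by
    isotropic noise of amplitude b:
        dx1 =  omega * x2 * dt + b * dW1
        dx2 = -omega * x1 * dt + b * dW2
    starting from the illustrative initial state (x1, x2) = (0, 1)."""
    rng = np.random.default_rng(seed)
    x = np.empty((n_steps, 2))
    x[0] = (0.0, 1.0)
    for i in range(1, n_steps):
        x1, x2 = x[i - 1]
        dw1, dw2 = rng.normal(scale=np.sqrt(dt), size=2)
        x[i] = (x1 + omega * x2 * dt + b * dw1,
                x2 - omega * x1 * dt + b * dw2)
    return x

# Shrink the noise amplitude toward the deterministic limit: the sample
# correlation stays far from +/-1 and tends toward zero as b does.
for b in (0.2, 0.05, 0.01, 0.0):
    x = simulate(b)
    r = np.corrcoef(x[:, 0], x[:, 1])[0, 1]
    print(f"b = {b:4.2f}   corr(x1, x2) = {r:+.3f}")
```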
By Liang (2015), the normalizer for $T_{2\to1}$ is
$$
Z_{2\to1} = \big|T_{2\to1}\big| + \left|\frac{dH_1^*}{dt}\right| + \left|\frac{dH_1^{\mathrm{noise}}}{dt}\right|,
$$
where, on the right-hand side, the second term is the contribution from $x_1$ itself, and the third term the contribution from noise; the normalized (relative) flow is $\tau_{2\to1} = T_{2\to1}/Z_{2\to1}$. In Liang (2015), it has been shown that $dH_1^*/dt$ is a Lyapunov exponent-like, phase-space stretching rate, and $dH_1^{\mathrm{noise}}/dt$ a noise-to-signal ratio. In this problem, no noise is taken into account. However, in reality, noise is ubiquitous. We may hence view a deterministic system as a limit, or extreme case, as the amplitude of stochastic perturbation goes to zero. For this case, we add to (4) a stochastic term:
$$
dx_1 = \omega x_2\,dt + b_{11}\,dw_1 + b_{12}\,dw_2, \qquad
dx_2 = -\omega x_1\,dt + b_{21}\,dw_1 + b_{22}\,dw_2,
$$
where $\mathbf{w} = (w_1, w_2)$ is a vector of standard Wiener processes. For simplicity, let the perturbation amplitude $\mathbf{B} = (b_{ij})$ be a constant matrix. Further let $\mathbf{G} = \mathbf{B}\mathbf{B}^T$, with elements $g_{ij} = \sum_k b_{ik} b_{jk}$.
Liang (2008) established that, for a linear system like this, the information flow reduces to $T_{2\to1} = \omega\,C_{12}/C_{11}$, where $C_{ij}$ denotes the covariance between $x_i$ and $x_j$; for the resulting Gaussian process, moreover, $dH_1^*/dt = E(\partial F_1/\partial x_1) = 0$ and $dH_1^{\mathrm{noise}}/dt = g_{11}/(2C_{11})$. So in this case, the normalized flow from $x_2$ to $x_1$ is
$$
\tau_{2\to1} = \frac{\omega C_{12}/C_{11}}{\big|\omega C_{12}/C_{11}\big| + g_{11}/(2C_{11})}.
$$
Note that $T_{2\to1}$ (or $\tau_{2\to1}$) may be positive or negative. In causal inference, this does not matter; we need only consider the absolute value, although the sign does carry a meaning according to the original formulation. (A positive $T_{2\to1}$ means that $x_2$ causes the marginal entropy of $x_1$ to grow, and vice versa; see [1,2].)
Now, for the stochastic equation, the covariance matrix $\mathbf{C} = (C_{ij})$ evolves as
$$
\frac{d\mathbf{C}}{dt} = \mathbf{A}\mathbf{C} + \mathbf{C}\mathbf{A}^T + \mathbf{B}\mathbf{B}^T,
\qquad
\mathbf{A} = \begin{pmatrix} 0 & \omega \\ -\omega & 0 \end{pmatrix}.
$$
We hence obtain the following equation set:
$$
\frac{dC_{11}}{dt} = 2\omega C_{12} + g_{11}, \qquad
\frac{dC_{22}}{dt} = -2\omega C_{12} + g_{22}, \qquad
\frac{dC_{12}}{dt} = \omega (C_{22} - C_{11}) + g_{12}.
$$
Solving this equation set gives $C_{11}(t)$, $C_{22}(t)$, and $C_{12}(t)$ up to integration constants. If the initial state is certain, so that $C_{11}(0) = C_{22}(0) = 0$ and $C_{12}(0) = 0$, then the integration constants are thereby determined, and the covariances, and hence $\tau_{2\to1}$, follow explicitly.
Two cases are distinguished:
- Case I. $C_{12}(t)$ vanishes, and hence $T_{2\to1} = 0$.
- Case II. $C_{12}(t)$ does not vanish. As $t$ goes to infinity, $\tau_{2\to1}$ also approaches its limiting value.
If initially there exists some covariance, say, $C_{12}(0) \neq 0$, then $C_{12}(t)$ does not vanish in general, and hence neither does $T_{2\to1}$. In this case, as the noise amplitude goes to zero, we always have $|\tau_{2\to1}| \to 100\%$. Either way, the relative information flow approaches 100% in the limit of a deterministic system.
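As a sanity check on the covariance evolution written above, the sketch below (again under illustrative assumptions: $\omega = 1$, isotropic constant noise $\mathbf{B} = b\,\mathbf{I}$ so that $g_{11} = g_{22} = b^2$ and $g_{12} = 0$, and zero initial covariance) integrates the three covariance equations numerically and compares the result with the ensemble covariances of Euler–Maruyama trajectories of the perturbed system; the two agree closely.

```python
import numpy as np
from scipy.integrate import solve_ivp

omega, b, dt, n_steps, n_ens = 1.0, 0.2, 1e-3, 5_000, 4_000
g11 = g22 = b**2          # isotropic constant noise, B = b * I
g12 = 0.0

def cov_rhs(t, c):
    """Right-hand side of the covariance equation set written above."""
    c11, c22, c12 = c
    return [2*omega*c12 + g11, -2*omega*c12 + g22, omega*(c22 - c11) + g12]

t_end = n_steps * dt
ode = solve_ivp(cov_rhs, (0.0, t_end), [0.0, 0.0, 0.0], dense_output=True)

# Monte Carlo ensemble of Euler-Maruyama trajectories, all starting from the
# same (certain) state (x1, x2) = (0, 1), so the initial covariances vanish.
rng = np.random.default_rng(0)
x = np.tile([0.0, 1.0], (n_ens, 1))
for _ in range(n_steps):
    dW = rng.normal(scale=np.sqrt(dt), size=(n_ens, 2))
    drift = np.column_stack([omega * x[:, 1], -omega * x[:, 0]])
    x = x + drift * dt + b * dW

C = np.cov(x.T)                              # ensemble covariance at t = t_end
c11, c22, c12 = ode.sol(t_end)
print("ensemble C11, C22, C12:", C[0, 0], C[1, 1], C[0, 1])
print("ODE      C11, C22, C12:", c11, c22, c12)   # should agree closely
```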
In the other direction, we now need to consider the uncertainty growth of $x_2$, and hence perturb the equation for $x_2$. Repeating the above procedure, when $C_{12}(0) = 0$, the normalized information flow $\tau_{1\to2}$ follows in the same way. If $C_{12}(t)$ vanishes, then so does $\tau_{1\to2}$; otherwise $|\tau_{1\to2}|$ approaches 1 for a long enough time. On the other hand, if initially there exists some covariance such that $C_{12}(0) \neq 0$, then $T_{1\to2}$ remains finite as the noise amplitude goes to zero, which implies that $|\tau_{1\to2}|$ again approaches 100%. This is indeed what we expect. So even for this extreme case, there is no contradiction at all for causal inference using information flow.
4. Discussion
To summarize, a recent, rigorously formulated causality analysis asserts that, in the linear limit, causation implies correlation, while correlation does not necessarily mean causation. In this short note, an extreme case which apparently contradicts the assertion is examined. In this case an event $X_2$ takes a harmonic form (sine/cosine) and generates, through some process, another event $X_1$, so that $X_1$ is always out of phase with $X_2$, i.e., lags $X_2$ by $\pi/2$. Obviously $X_2$ causes $X_1$, but by computation the correlation between $X_1$ and $X_2$ is zero. In this study we show that this is an extreme case, with only one point in the phase space, and hence the problem becomes singular. We re-examine the problem by enlarging the ensemble space slightly through adding some noise. A stochastic differential equation is then solved for the corresponding covariances, which allows us to obtain the information flows for the perturbed system. Then, as the noisy perturbation goes to zero, the normalized information flow rate from $X_2$ to $X_1$ is established to be 100%, just as one would have expected. So actually no contradiction exists. (See [12] for how a stochastic differential equation is solved by perturbing it with noise.)
One thing that merits mentioning is that, although it seems that $X_2$ causes $X_1$, here the normalized information flow rate from $X_1$ to $X_2$ is actually also 100%. That is to say, for such a harmonic system with a circular cause-effect relation, it is impossible to differentiate the causality by simply assessing which event takes place first; anyhow, leading by $3\pi/2$ is equivalent to lagging by $\pi/2$. The moral is that, for a process that is nonsequential (e.g., that in nonsequential stochastic control systems), where circular cause and consequence coexist, it is essentially impossible to distinguish a delay from an advance.