Open AccessArticle

A Note on Causation versus Correlation in an Extreme Situation

X. San Liang

^1,*

and

Xiu-Qun Yang

Nanjing Institute of Meteorology, 219 Ningliu Blvd, Nanjing 210044, China

School of Atmospheric Sciences, Nanjing University, 163 Xianlin Avenue, Nanjing 210023, China

Author to whom correspondence should be addressed.

Entropy 2021, 23(3), 316; https://doi.org/10.3390/e23030316

Submission received: 10 February 2021 / Revised: 2 March 2021 / Accepted: 4 March 2021 / Published: 7 March 2021

Download Versions Notes

Abstract

Recently, it has been shown that the information flow and causality between two time series can be inferred in a rigorous and quantitative sense, and, besides, the resulting causality can be normalized. A corollary that follows is, in the linear limit, causation implies correlation, while correlation does not imply causation. Now suppose there is an event A taking a harmonic form (sine/cosine), and it generates through some process another event B so that B always lags A by a phase of

π / 2

. Here the causality is obviously seen, while by computation the correlation is, however, zero. This apparent contradiction is rooted in the fact that a harmonic system always leaves a single point on the Poincaré section; it does not add information. That is to say, though the absolute information flow from A to B is zero, i.e.,

T_{A \to B} = 0

, the total information increase of B is also zero, so the normalized

T_{A \to B}

, denoted as

τ_{A \to B}

, takes the form of

\frac{0}{0}

. By slightly perturbing the system with some noise, solving a stochastic differential equation, and letting the perturbation go to zero, it can be shown that

τ_{A \to B}

approaches 100%, just as one would have expected.

Keywords:

causality; time series; information flow; correlation

1. A Review of the Rigorous Information Flow-Based Causality Analysis

Causal inference is a fundamental problem in scientific research. Recently it has been shown that the problem can be recast into the framework of information flow, another fundamental notion in general physics which has wide applications in different disciplines (see [1]), and hence can be put on a rigorous footing. The causality between two time series can then be analyzed in a quantitative sense, and, besides, the resulting formula is very concise in form. In the linear limit, it involves only the usual statistics namely sample covariances [2], making the important and otherwise difficult problem an easy task.

To briefly review the theory, consider a two-dimensional continuous-time stochastic system for state variables

x = (x_{1}, x_{2})

\begin{matrix} \frac{d x}{d t} = F (x, t) + B (x, t) \dot{w}, \end{matrix}

(1)

where

F = (F_{1}, F_{2})

may be arbitrary nonlinear functions of

x

and t,

\dot{w}

is a vector of white noise, and

B = (b_{i j})

is the matrix of perturbation amplitudes which may also be any functions of

x

and t. Here we adopt the convention in physics and do not distinguish deterministic and random variables; in probability theory, they are ususally distinguished with capital and lower-case symbols. Assume that

F

and

B

are both differentiable with respect to

x

and t. Then the information flow from

x_{2}

x_{1}

(in nats per unit time) can be explicitly found in a closed form [3] (the multiple-dimensional case is referred to [1]):

\begin{matrix} T_{2 \to 1} = - E [\frac{1}{ρ_{1}} \frac{\partial (F_{1} ρ_{1})}{\partial x_{1}}] + \frac{1}{2} E [\frac{1}{ρ_{1}} \frac{\partial^{2} g_{11} ρ_{1}}{\partial x_{1}^{2}}], \end{matrix}

(2)

where E stands for mathematical expectation, and

g_{i i} = \sum_{k = 1}^{n} b_{i k} b_{i k}

ρ_{i} = ρ_{i} (x_{i})

is the marginal probability density function (pdf) of

x_{i}

. The rate of information flowing from

x_{1}

x_{2}

can be obtained by switching the indices. If

T_{j \to i} = 0

, then

x_{j}

is not causal to

x_{i}

; otherwise it is causal, and the absolute value measures the magnitude of the causality from

x_{j}

x_{i}

. For discrete-time mappings, the information flow is in much more complicated a form; see [1].

In the case with only two time series (no dynamical system is given)

X_{1}

and

X_{2}

, under the assumption of a linear model with additive noise, the maximum likelihood estimator (MLE) of the rate of information flowing from

X_{2}

X_{1}

is [2]

\begin{matrix} {\hat{T}}_{2 \to 1} = \frac{C_{11} C_{12} C_{2, d 1} - C_{12}^{2} C_{1, d 1}}{C_{11}^{2} C_{22} - C_{11} C_{12}^{2}}, \end{matrix}

(3)

where

C_{i j}

is the sample covariance between

X_{i}

and

X_{j}

, and

C_{i, d j}

the sample covariance between

X_{i}

and a series derived from

X_{j}

using the Euler forward differencing scheme (also see the Euler–Maruyama scheme in [4]):

{\dot{X}}_{j, n} = (X_{j, n + k} - X_{j, n}) / (k Δ t)

, with

k \geq 1

some integer. Note that Equation (3) is rather concise in form; it only involves the common statistics, i.e., sample covariances. In other words, a combination of some sample convariances will give a quantiative measure of the causality between the time series. This makes causality analysis, which otherwise would be complicated with the classical empirical/half-empirical methods, very easy. Nonetheless, note that Equation (3) cannot replace (1); it is just the mle of the latter. Statistical significance test must be performed before a causal inference is made based on the computed

T_{2 \to 1}

. For details, refer to [2].

Considering the long-standing debate ever since Berkeley (1710) [5] over correlation versus causation, we may rewrite (3) in terms of linear correlation coefficients, which immediately implies [2]:

Causation implies correlation, but correlation does not imply causation.

The above formalism has been validated with many benchmark systems (e.g., [1]) such as baker transformation, Hénon map, Kaplan-Yorke map, Rössler system, etc. It also has been successfully applied to the studies of many real world problems such those in financial economics (e.g., the “Seven Dwarfs vs. a Giant” problem [6]), earth system science (e.g., the Antarctica mass balance problem [7] and the global warming problem [8]), neuroscience (e.g., the concussion problem [9]), to name but a few.

2. The Question

Now suppose we have a dynamic event A which drives another event B. The former has a harmonic form, leading the latter by a phase of

π / 2

. That is to say, the time series, say,

{x_{A} (t)}

and

{x_{B} (t)}

resulting from the two, are in quadrature. Then the correlation between the two is zero. Here by zero-correlation we mean a zero integral

\begin{matrix} \int_{Ω} (x_{A} (t) - {\bar{x}}_{A}) (x_{B} (t) - {\bar{x}}_{B}) d t, \end{matrix}

with the integration domain

Ω

being one period or more periods, and the overbar being the mean over the domain. However, since A causes B, the result is apparently in contradiction to the above corollary that “causation implies correlation”.

3. The Solution

The problem can be more formally stated with the harmonic system:

\frac{d x}{d t} = F (x, t) = A x = [\begin{matrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{matrix}] [\begin{matrix} x_{1} \\ x_{2} \end{matrix}] = [\begin{matrix} 0 & - 1 \\ 1 & 0 \end{matrix}] [\begin{matrix} x_{1} \\ x_{2} \end{matrix}] .

(4)

If the system is initialized with

x_{1} (0) = 1

{\dot{x}}_{1} (0) = 1

, the solution is,

x_{1} = cos t

x_{2} = sin t

. Thus, the population covariance

σ_{12} = \int_{Ω} cos t sin t d t = 0

(

Ω

is one period or many periods). This yields an information flow from

x_{2}

x_{1}

\begin{matrix} T_{2 \to 1} = a_{12} \frac{σ_{12}}{σ_{11}} = 0 . \end{matrix}

(5)

Fundamentally the above problem arises from the fact that it is a deterministic system. In Granger causality test [10] (also see a recent reference [11]), this case has been explicitly excluded, as in such case the trajectories do not form appropriate ensembles in the sample space. For a harmonic series, it shows on a Poincaré section only one single point; so the total information does not accrue. If the total information does not change, the information flow to

x_{1}

must also vanish. However, the vanishing information flow does not mean that there is no influence of

x_{2}

x_{1}

. As we argued in Liang (2015), the so-obtained information must be normalized, just as covariance needs to be normalized into correlation, for one to assess the causal influence. Here if the normalizer is zero,

T_{2 \to 1}

involves the indeterminate form

\frac{0}{0}

. We may then approach it by taking the limit. Specifically, we may approach it by enlarging the sample space slightly, i.e., by adding some stochasticity to the system, then take the limit by letting the stochastic perturbation amplitude go zero.

By Liang (2015), the normalizer for

T_{2 \to 1}

\begin{matrix} Z_{2 \to 1} = | T_{2 \to 1} | + | \frac{d H_{1}^{*}}{d t} | + | \frac{d H_{1}^{noise}}{d t} |, \end{matrix}

(6)

where on the right hand side, the second term is the contribution from

x_{1}

itself, and the third term the contribution from noise. In Liang (2015), it has shown that

\frac{d H_{1}^{*}}{d t}

is a Lyapunov exponent-like, phase-space stretching rate, and

\frac{d H_{1}^{n o i s e}}{d t}

a noise-to-signal ratio. In this problem, we do not have noise taken into account. However, in reality, noise is ubiquitous. We may hence view a deterministic system as a limit or extreme case as the amplitude of stochastic perturbation goes to zero. For this case, we add to (4) a stochastic term:

\begin{matrix} \frac{d x}{d t} = A x + B \dot{w}, \end{matrix}

(7)

where

w

is a vector of standard Wiener processes. For simplicity, let the perturbation amplitude

B

a constant matrix. Further let

G = B B^{T}

, with elements

(g_{i j}) = \sum_{k = 1}^{2} b_{i k} b_{j k} .

Liang (2008) established that

\begin{matrix} \frac{d H_{1}^{*}}{d t} = a_{11} = 0, \end{matrix}

(8)

\begin{matrix} \frac{d H_{1}^{noise}}{d t} = \frac{1}{2} \frac{g_{11}}{σ_{11}} . \end{matrix}

(9)

So in this case, the normalized flow from

x_{2}

x_{1}

\begin{matrix} τ_{2 \to 1} = \frac{a_{12} \frac{σ_{12}}{σ_{11}}}{| a_{12} \frac{σ_{12}}{σ_{11}} | + 0 + | \frac{1}{2} \frac{g_{11}}{σ_{11}} |} = \frac{- σ_{12}}{| σ_{12} | + \frac{g_{11}}{2}} . \end{matrix}

(10)

Likewise,

\begin{matrix} τ_{1 \to 2} = \frac{a_{21} \frac{σ_{12}}{σ_{22}}}{| a_{21} \frac{σ_{12}}{σ_{22}} | + | a_{22} | + | \frac{1}{2} \frac{g_{22}}{σ_{22}} |} = \frac{+ σ_{12}}{| σ_{12} | + \frac{g_{22}}{2}} . \end{matrix}

(11)

Note that

τ_{2 \to 1}

(or

τ_{1 \to 2}

) may be positive or negative. In causal inference, this does not matter; we need only consider the absolute value, although the sign does carry a meaning according to the original formulation. (A positive

τ_{2 \to 1}

means

x_{2}

causes the marginal entropy of

x_{1}

to grow and vice versa; see [1,2].)

Now for the stochastic equation, the covariance matrix

Σ

evolves as

\begin{matrix} \frac{d Σ}{d t} = A Σ + Σ A^{T} + B B^{T} = A Σ + Σ A^{T} + G \end{matrix}

(12)

Expanding, this is

\frac{d}{d t} [\begin{matrix} σ_{11} & σ_{12} \\ σ_{12} & σ_{22} \end{matrix}] = [\begin{matrix} - σ_{12} & - σ_{22} \\ σ_{11} & σ_{12} \end{matrix}] + [\begin{matrix} - σ_{12} & σ_{11} \\ - σ_{22} & σ_{12} \end{matrix}] + [\begin{matrix} g_{11} & g_{12} \\ g_{12} & g_{22} \end{matrix}] .

We hence obtain the following equation set:

\begin{matrix} \frac{d σ_{11}}{d t} = - 2 σ_{12} + g_{11}, \\ \frac{d σ_{12}}{d t} = - σ_{22} + σ_{11} + g_{12}, \\ \frac{d σ_{22}}{d t} = 2 σ_{12} + g_{22} . \end{matrix}

Solving, we get

\begin{matrix} \frac{d^{2} σ_{12}}{d t^{2}} = - 4 σ_{12} + (g_{11} - g_{22} + g_{12}) . \end{matrix}

So the solution is

\begin{matrix} σ_{12} = C_{1} cos 2 t + C_{2} sin 2 t + \frac{1}{2} (g_{11} - g_{22} + g_{12}) t^{2} . \end{matrix}

σ_{12} (0) = 0

{\dot{σ}}_{12} (0) = 0

, then the integration constants

C_{1} = C_{2} = 0

. So

\begin{matrix} τ_{2 \to 1} = \frac{- σ_{12}}{| σ_{12} | + \frac{1}{2} g_{11}} = \frac{- 1}{1 + \frac{g_{11}}{(g_{11} - g_{22} + g_{12}) t^{2}}} \end{matrix}

Two cases are distinguished:

Case I: $g_{12} - g_{22} = const \neq 0$ .

$lim_{g_{11} \to 0} τ_{2 \to 1} = - 1 .$
Case II: $g_{12} - g_{22} = 0$ .

$lim_{g_{11} \to 0} τ_{2 \to 1} = \frac{- 1}{1 + 1 / t^{2}} .$

As t goes to infinity, $τ_{2 \to 1}$ also approaches $- 1$ .

If initially there exists some covariance, say,

σ_{12} (0) = c

, then

C_{1} = c

, and hence

\begin{matrix} τ_{2 \to 1} = \frac{- 1}{1 + \frac{g_{11}}{2 c cos 2 t + (g_{11} - g_{22} + g_{12}) t^{2}}} . \end{matrix}

In this case, as

g_{11} \to 0

, we always have

τ_{2 \to 1} \to - 1

. Either way, the relative information flow

τ_{2 \to 1}

approaches

- 1

in the limit of deterministic system.

In the other direction, we now need to consider the uncertainty growth of

x_{2}

and hence perturb

g_{22}

. Repeating the above procedure, when

σ_{12} (0) = {\dot{σ}}_{12} (0) = 0

, the normalized information flow is

\begin{matrix} τ_{1 \to 2} = \frac{+ σ_{12}}{| σ_{12} | + \frac{1}{2} g_{22}} = \frac{+ 1}{1 + \frac{g_{22}}{(g_{11} - g_{22} + g_{12}) t^{2}}} . \end{matrix}

g_{11} + g_{12} = const \neq 0

, then

\begin{matrix} lim_{g_{22} \to 0} τ_{1 \to 2} = + 1; \end{matrix}

else (

g_{11} + g_{12} = 0

)

\begin{matrix} lim_{g_{22} \to 0} τ_{1 \to 2} = \frac{1}{1 - 1 / t^{2}} \end{matrix}

which approaches to 1 for enough long time (

t \to \infty

). On the other hand, if initially there exists some covariance such that

σ_{12} (0) = c

then

\begin{matrix} τ_{12} = \frac{1}{1 + \frac{g_{22}}{2 c cos 2 t + (g_{11} - g_{22} + g_{12}) t^{2}}} \end{matrix}

which implies

\begin{matrix} lim_{g_{22} \to 0} τ_{1 \to 2} = 1 . \end{matrix}

This is indeed what we expect. So even for this extreme case, there is no contradiction at all for causal inference using information flow.

4. Discussion

To summarize, a recent rigorously formulated causality analysis asserts that, in the linear limit, causation implies correlation, while correlation does not necessarily mean causation. In this short note, an extreme case which apparently contradicts to the assertion is examined. In this case an event

x_{1}

takes a harmonic form (sine/cosine), and generates through some process another event

x_{2}

so that

x_{2}

is always out of phase with

x_{1}

, i.e., lag

x_{1}

π / 2

. Obviously

x_{1}

causes

x_{2}

, but by computation the correlation between

x_{1}

and

x_{2}

is zero. In this study we show that this is an extreme case, with only one point in the phase space and hence the problem becomes singular. We re-examine the problem by enlarging the ensemble space slightly through adding some noise. A stochastic differential equation is then solved for the corresponding covariances, which allows us to obtaint the information flows for the perturbed system. Then as the noisy perturbation goes to zero, the normalized information flow rate from

x_{1}

x_{2}

is established to be 100%, just as one would have expected. So actually no contradiction exists. (see [12] for how a stochastic differential equation is solved by perturbing it with noise [12].)

One thing that merits mentioning is that, here although it seems that

x_{1}

causes

x_{2}

, actually here the normalized information flow rate from

x_{2}

x_{1}

is also 100%. That is to say, for such a harmonic system with circular cause-effect relation, it is actually impossible to differentiate causality by simply assessing which takes place first; anyhow, taking lead by

π / 2

is equivalent to lagging by

3 π / 2

. The moral is, for a process that is nonsequential (e.g., that in the nonsequential stochastic control systems), circular cause and consequence coexist, it is essentially impossible to distinguish a delay from an advance.

Author Contributions

Conceptualization, X.S.L. and X.-Q.Y.; methodology, X.S.L.; formal analysis, X.S.L.; investigation, X.S.L. and X.-Q.Y.; validation, X.S.L. and X.-Q.Y.; writing, X.S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This study was partially supported by the National Science Foundation of China (NSFC) under Grant No. 41975064, and the 2015 Jiangsu Program for Innovation Research and Entrepreneurship Groups.

Conflicts of Interest

The authors declare no conflict of interest.

References

Liang, X.S. Information flow and causality as rigorous notions ab initio. Phys. Rev. E 2016, 94, 052201. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Liang, X.S. Unraveling the cause-effect relation between time series. Phy. Rev. E 2014, 90, 052150. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Liang, X.S. Information flow within stochastic dynamical systems. Phys. Rev. E 2008, 78, 031113. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Iacus, S.M. Simulation and Inference for Stochastic Differential Equations: With R Examples; Springer: New York, NY, USA, 2008. [Google Scholar]
Berkeley, G. A Treatise Concerning the Principles of Human Knowledge; originally published in 1710; Hackett Publishing Company, Inc.: Indianapolis, IN, USA, 1982. [Google Scholar]
Liang, X.S. Normalizing the causality between time series. Phys. Rev. E 2015, 92, 022126. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Vannitsem, S.; Dalaiden, Q.; Goosse, H. Testing for dynamical dependence—Application to the surface mass balance over Antarctica. Geophys. Res. Lett. 2019. [Google Scholar] [CrossRef]
Stips, A.; Macias, D.; Coughlan, C.; Garcia-Gorriz, E.; Liang, X.S. On the causal structure between CO₂ and global temperature. Sci. Rep. 2016, 6, 21691. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Hristopulos, D.T.; Babul, A.; Babul, S.; Brucar, L.R.; Virji-Babul, N. Disrupted information flow in resting-state in adoloscents with sports related concussion. Front. Hum. Neurosci. 2019, 13, 419. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Granger, C.W. Investigating causal relations by econometric models and cross-spectral methods. Econometrica 1969, 37, 424–438. [Google Scholar] [CrossRef]
Contreras-Reyes, J.E.; Hernández-Santoro, C. Assessing Granger-causality in the southern Humboldt current ecosystem using cross-spectral methods. Entropy 2020, 22, 1071. [Google Scholar] [CrossRef] [PubMed]
Argyris, J.; Andreadis, I.; Pavlos, G.; Athanasiou, M. The influence of noise on the correlation dimension of chaotic attractors. Chaos Solit. Fract. 1998, 9, 343–361. [Google Scholar] [CrossRef]

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Liang, X.S.; Yang, X.-Q. A Note on Causation versus Correlation in an Extreme Situation. Entropy 2021, 23, 316. https://doi.org/10.3390/e23030316

AMA Style

Liang XS, Yang X-Q. A Note on Causation versus Correlation in an Extreme Situation. Entropy. 2021; 23(3):316. https://doi.org/10.3390/e23030316

Chicago/Turabian Style

Liang, X. San, and Xiu-Qun Yang. 2021. "A Note on Causation versus Correlation in an Extreme Situation" Entropy 23, no. 3: 316. https://doi.org/10.3390/e23030316

APA Style

Liang, X. S., & Yang, X. -Q. (2021). A Note on Causation versus Correlation in an Extreme Situation. Entropy, 23(3), 316. https://doi.org/10.3390/e23030316

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

A Note on Causation versus Correlation in an Extreme Situation

Abstract

1. A Review of the Rigorous Information Flow-Based Causality Analysis

2. The Question

3. The Solution

4. Discussion

Author Contributions

Funding

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI