Causal Discovery with Attention-Based Convolutional Neural Networks
1. Introduction
- We present a new temporal causal discovery method (TCDF) that uses attention-based CNNs to discover causal relationships in time series data, to discover the time delay between each cause and effect, and to construct a temporal causal graph of causal relationships with delays.
- We evaluate TCDF and several other temporal causal discovery methods on two benchmarks: financial data describing stock returns, and FMRI data measuring brain blood flow.
2. Problem Statement
- The method should distinguish direct from indirect causes. A vertex is seen as an indirect cause of another vertex if the graph contains no direct edge from the first to the second and if there is a two-edge path between them (Figure 2a). Pairwise methods, i.e., methods that only find causal relationships between two variables, are often unable to make this distinction [10]. In contrast, multivariate methods take all variables into account to distinguish between direct and indirect causality [11].
- The method should learn instantaneous causal effects, where the delay between cause and effect is 0 time steps. Neglecting instantaneous influences can lead to misleading interpretations [13]. In practice, instantaneous effects mostly occur when the time scale is too coarse, so that cause and effect fall in the same time step and cannot be causally ordered a priori.
- The presence of a confounder, a common cause of at least two variables, is a well-known challenge for causal discovery methods (Figure 2b). Although confounders are quite common in real-world situations, they complicate causal discovery because the confounder's effects are correlated but not causally related. Especially when the delays between the confounder and its effects are not equal, one should be careful not to include a spurious causal relationship between the confounder's effects (the grey edge in Figure 2b).
- A particular challenge occurs when a confounder is not observed (a hidden (or latent) confounder). Although it might not even be known how many hidden confounders exist, it is important that a causal discovery method can hypothesise the existence of a hidden confounder to prevent learning an incorrect causal relation between its effects.
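The confounder problem described above can be illustrated with a small simulation. The following is a minimal sketch, not taken from the paper: the function names, the delays (1 and 3 time steps) and the noise level are all assumptions chosen for illustration. A hidden series `z` drives two effects `x` and `y` with unequal delays, so `x` appears to predict `y` two steps ahead although neither influences the other.

```python
import numpy as np

def simulate_confounded_pair(n=2000, delay_x=1, delay_y=3, noise=0.1, seed=0):
    """Return (x, y): two effects of one hidden confounder z.

    x and y never influence each other; all names and values are illustrative.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)          # hidden confounder
    x = np.zeros(n)
    y = np.zeros(n)
    x[delay_x:] = z[:n - delay_x]       # x[t] = z[t - delay_x]
    y[delay_y:] = z[:n - delay_y]       # y[t] = z[t - delay_y]
    x += noise * rng.standard_normal(n)
    y += noise * rng.standard_normal(n)
    return x, y

def lagged_corr(a, b, lag):
    """Pearson correlation between a[t - lag] and b[t], for lag >= 0."""
    if lag == 0:
        return float(np.corrcoef(a, b)[0, 1])
    return float(np.corrcoef(a[:-lag], b[lag:])[0, 1])

x, y = simulate_confounded_pair()
corr_lag2 = lagged_corr(x, y, 2)   # strong: x "leads" y by the delay difference
corr_lag0 = lagged_corr(x, y, 0)   # weak: no instantaneous relation
```

The strong correlation at lag 2 comes entirely from the difference between the two delays, which is why a method relying on temporal precedence alone could wrongly draw an edge from `x` to `y`.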
3. Related Work
3.1. Temporal Causal Discovery
3.2. Deep Learning for Non-Temporal Causal Discovery
3.3. Time Series Prediction
3.4. Attention Mechanism in Neural Networks
4. TCDF—Temporal Causal Discovery Framework
4.1. The Architecture for Time Series Prediction
4.1.1. Dilations
4.1.2. Adaption for Discovering Self-Causation
4.1.3. Adaption for Multivariate Causal Discovery
4.1.4. The Attention Mechanism
4.1.5. Residual Connections
4.2. Attention Interpretation
- We require that the threshold is at least 1, since all scores are initialized at 1 and a score will only be increased through backpropagation if the network attends to that time series.
- Since a temporal causal graph is usually sparse, we require that the gap selected for lies in the first half of (if ) to ensure that the algorithm does not include low attention scores in the selection. At most 50% of the input time series can be a potential cause of target . By this requirement, we limit the number of time series labeled as potential causes. Although this number can be configured, we experimentally estimated that 50% gives good results.
- We require that the gap selected for the threshold cannot be in first position (i.e., between the highest and second-highest attention score). This ensures that the algorithm does not truncate to zero the scores of time series that were actually a cause of the target time series but were weaker than the top scorer. Thus, the potential causes for a target will include at least two time series.
- Neither X_i nor X_j is selected as a potential cause of the other: X_i is not correlated with X_j and vice versa.
- X_i is selected for X_j, but X_j is not selected for X_i: X_i is added to the potential causes of X_j because of:
- (a) an (in)direct causal relation from X_i to X_j, or
- (b) the presence of a (hidden) confounder between X_i and X_j where the delay from the confounder to X_i is smaller than the delay to X_j.
- X_j is selected for X_i, but X_i is not selected for X_j: X_j is added to the potential causes of X_i because of:
- (a) an (in)direct causal relation from X_j to X_i, or
- (b) the presence of a (hidden) confounder between X_j and X_i where the delay from the confounder to X_j is smaller than the delay to X_i.
- X_i is selected for X_j and X_j is selected for X_i: each is added to the potential causes of the other because of:
- (a) the presence of a 2-cycle where X_i causes X_j and X_j causes X_i, or
- (b) the presence of a (hidden) confounder with equal delays to its effects X_i and X_j.
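The three requirements on the attention threshold listed earlier in this section (threshold at least 1, gap in the first half, gap not in first position) can be sketched as a gap-based selection over the ranked attention scores. This is a hedged illustration only: the function name `select_potential_causes`, the exact meaning of "first half", and the fallback used when no gap qualifies are assumptions, not the framework's actual code.

```python
def select_potential_causes(scores):
    """Return indices of series whose attention score lies above the largest admissible gap.

    scores: one attention score per input time series for a single target.
    Hypothetical helper; the details are assumptions made for illustration.
    """
    # Rank series by attention score, highest first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [scores[i] for i in order]

    # gaps[k] separates ranked[k] from ranked[k + 1]. The lowest selected
    # score ranked[k] acts as the threshold, so it must be at least 1.
    gaps = [(ranked[k] - ranked[k + 1], k)
            for k in range(len(ranked) - 1) if ranked[k] >= 1.0]

    # Admissible gaps: not in first position (k > 0) and in the first half.
    half = (len(ranked) - 1) / 2
    admissible = [(gap, k) for gap, k in gaps if 0 < k < half]

    if not admissible:
        # Assumed fallback: keep every series whose score is at least 1.
        return [i for i in order if scores[i] >= 1.0]

    _, k = max(admissible)      # largest admissible gap
    return order[:k + 1]        # everything ranked above that gap

example_scores = [1.9, 0.4, 1.7, 0.9, 0.3, 1.1, 0.2, 0.5]
selected = select_potential_causes(example_scores)
```

In this example the ranked scores are 1.9, 1.7, 1.1, 0.9, and so on; the largest admissible gap lies between 1.7 and 1.1, so the series at indices 0 and 2 are kept as potential causes.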
4.3. Causal Validation
- Temporal precedence: the cause precedes its effect,
- Physical influence: manipulation of the cause changes its effect.
4.3.1. Permutation Importance Validation Method
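The idea that manipulating a cause should change its effect can be sketched with generic permutation importance: shuffle the values of one candidate cause and check whether the prediction error for the target rises. The example below is an assumption-laden illustration, not the validation procedure of the paper: a least-squares linear predictor stands in for the trained network, and the function name, the synthetic data, and the coefficient 0.8 are all hypothetical.

```python
import numpy as np

def loss_increase_when_permuted(features, target, column, seed=0):
    """Fit a linear predictor, then return (baseline_mse, permuted_mse) for one column.

    Hypothetical helper: the linear model is a stand-in for a trained network.
    """
    rng = np.random.default_rng(seed)
    design = np.column_stack([features, np.ones(len(target))])  # add intercept
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    baseline = float(np.mean((design @ coef - target) ** 2))

    # Shuffle one input column, which destroys its temporal alignment with the target.
    shuffled = design.copy()
    shuffled[:, column] = rng.permutation(shuffled[:, column])
    permuted = float(np.mean((shuffled @ coef - target) ** 2))
    return baseline, permuted

# Synthetic data: `a` causes `effect` with a delay of 1 step, `b` is unrelated.
rng = np.random.default_rng(1)
n = 1000
a = rng.standard_normal(n)
b = rng.standard_normal(n)
effect = np.zeros(n)
effect[1:] = 0.8 * a[:-1]
effect += 0.05 * rng.standard_normal(n)

features = np.column_stack([a[:-1], b[:-1]])   # values at time t - 1
target = effect[1:]                            # value at time t

base_a, perm_a = loss_increase_when_permuted(features, target, column=0)
base_b, perm_b = loss_increase_when_permuted(features, target, column=1)
```

Permuting the true cause `a` raises the error sharply, while permuting the unrelated series `b` leaves it almost unchanged; a candidate whose permutation does not hurt the prediction would be rejected as a cause under this kind of check.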
4.3.2. Dealing with Hidden Confounders
4.4. Delay Discovery
5. Experiments
5.1. Data Sets
5.2. Experimental Setup
5.3. Evaluation Measures
5.4. Results
5.4.1. Overall Performance
5.4.2. Impact of the Causal Validation
5.4.3. Case Study: Detection of Hidden Confounders
5.5. Summary
6. Discussion
6.1. Hyperparameters
6.2. Limitations of Experiments
7. Summary and Future Work
Author Contributions
Conflicts of Interest
Property | FINANCE (9 Data Sets) | FMRI (27 Data Sets) | FMRI (6 Data Sets) |
#datasets | 9 | 27 | 6 |
#non-stationary datasets | 0 | 1 | 0 |
#variables (time series) | 25 | {5, 10} | |
#causal relationships | |||
time series length | 4000 | 50–5000 (mean: 774) | 1000–5000 (mean: 2867) |
delays [timesteps] | 1–3 | n.a. | n.a. |
self-causation | ✓ | ✓ | ✓ |
confounders | ✓ | ✓ | ✓ |
type of relationship | linear | non-linear | non-linear |
TCDF () | 0.38 ± 0.09 | 0.84 ± 0.38 | 0.71 ± 0.05 |
TCDF () | 0.38 ± 0.10 | 1.06 ± 0.49 | 0.72 ± 0.07 |
TCDF () | 0.40 ± 0.10 | 1.13 ± 0.45 | 0.74 ± 0.08 |
Method | FINANCE (9 Data Sets) F1 | FINANCE (9 Data Sets) F1′ | FMRI (27 Data Sets) F1 | FMRI (27 Data Sets) F1′ | FMRI (6 Data Sets) F1 | FMRI (6 Data Sets) F1′ |
TCDF () | 0.64 ± 0.06 | 0.77 ± 0.08 | 0.60 ± 0.09 | 0.63 ± 0.09 | 0.68 ± 0.05 | 0.68 ± 0.05 |
TCDF () | 0.65 ± 0.09 | 0.78 ± 0.10 | 0.58 ± 0.15 | 0.62 ± 0.14 | 0.65 ± 0.13 | 0.68 ± 0.11 |
TCDF () | 0.64 ± 0.09 | 0.77 ± 0.09 | 0.55 ± 0.13 | 0.63 ± 0.11 | 0.70 ± 0.09 | 0.73 ± 0.08 |
PCMCI | 0.55 ± 0.22 | 0.56 ± 0.22 | 0.63 ± 0.10 | 0.67 ± 0.11 | 0.67 ± 0.04 | 0.67 ± 0.04 |
tsFCI | 0.37 ± 0.11 | 0.37 ± 0.12 | 0.49 ± 0.22 | 0.49 ± 0.22 | 0.48 ± 0.28 | 0.48 ± 0.28 |
TiMINo | 0.13 ± 0.05 | 0.21 ± 0.10 | 0.23 ± 0.12 | 0.37 ± 0.14 | 0.23 ± 0.11 | 0.37 ± 0.15 |
Benchmark | TCDF () | PCMCI | tsFCI | TiMINo |
FINANCE | 318 s | 10 s | 93 s | 499 s |
FMRI | 74 s | 1 s | 1 s | 14 s |
Benchmark | TCDF () | TCDF () | TCDF () | PCMCI | tsFCI | TiMINo |
FINANCE | 97.79% ± 2.56 | 96.42% ± 3.68 | 95.49% ± 4.15 | 100.00% ± 0.00 | 98.77% ± 3.49 | n.a. |
Method | FINANCE (9 Data Sets) F1 | FINANCE (9 Data Sets) F1′ | FMRI (27 Data Sets) F1 | FMRI (27 Data Sets) F1′ | FMRI (6 Data Sets) F1 | FMRI (6 Data Sets) F1′ |
TCDF () | 0.64 ± 0.06 | 0.77 ± 0.08 | 0.60 ± 0.09 | 0.63 ± 0.09 | 0.68 ± 0.05 | 0.68 ± 0.05 |
TCDF () w/o PIVM | 0.22 ± 0.09 | 0.30 ± 0.13 | 0.60 ± 0.09 | 0.63 ± 0.09 | 0.68 ± 0.05 | 0.68 ± 0.05 |
Change without PIVM | −66% | −61% | 0% | 0% | 0% | 0% |
Dataset | Hidden Conf. | Effects | Equal Delays | Conf. Discovered | Learnt Causal Relationships |
20-1A | , | ✓ | ✓ | , | |
40-1-3 | , | ✓ | ✓ | , | |
40-1-3 | , | ✗ | ✗ | ||
40-1-3 | , | ✓ | ✓ | , | |
40-1-3 | , | ✗ | ✗ | - | |
40-1-3 | , | ✗ | ✗ | - | |
40-1-3 | , | ✗ | ✗ | - | |
40-1-3 | , | ✗ | ✗ | ||
40-1-3 | , | ✗ | ✗ | - |
# Incorrect Causal Relationships | 2 | 0 | 3 | 8 |
# Discovered Hidden Confounders | 3 | 0 | 0 | 0 |
