
Causal inference using the algorithmic Markov condition

Published: 01 October 2010

Abstract

Inferring the causal structure that links n observables is usually based upon detecting statistical dependences and choosing simple graphs that make the joint measure Markovian. Here we argue why causal inference is also possible when the sample size is one. We develop a theory of how to generate causal graphs explaining similarities between single objects. To this end, we replace the notion of conditional stochastic independence in the causal Markov condition with the vanishing of conditional algorithmic mutual information and describe the corresponding causal inference rules. We explain why a consistent reformulation of causal inference in terms of algorithmic complexity implies a new inference principle that also takes into account the complexity of conditional probability densities, making it possible to select among Markov-equivalent causal graphs. This insight provides a theoretical foundation for a heuristic principle proposed in earlier work. We also sketch some ideas on how to replace Kolmogorov complexity with decidable complexity criteria. This can be seen as an algorithmic analog of replacing the empirically undecidable question of statistical independence with practical independence tests that rest on implicit or explicit assumptions about the underlying distribution.




    Published In

    IEEE Transactions on Information Theory, Volume 56, Issue 10
    October 2010
    610 pages

    Publisher

    IEEE Press

    Publication History

    Published: 01 October 2010
    Revised: 13 January 2010
    Received: 23 April 2008

    Author Tags

    1. Algorithmic information
    2. Church–Turing thesis
    3. Data compression
    4. Graphical models
    5. Probability-free causal inference

    Qualifiers

    • Research-article


    Cited By

    • (2024) Learning Causal Networks from Episodic Data. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 2224-2235. DOI: 10.1145/3637528.3671999. Online publication date: 25-Aug-2024.
    • (2024) CA-FER: Mitigating Spurious Correlation With Counterfactual Attention in Facial Expression Recognition. IEEE Transactions on Affective Computing, vol. 15, no. 3, pp. 977-989. DOI: 10.1109/TAFFC.2023.3312768. Online publication date: 1-Jul-2024.
    • (2023) Invariant learning via probability of sufficient and necessary causes. In Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 79832-79857. DOI: 10.5555/3666122.3669618. Online publication date: 10-Dec-2023.
    • (2023) Learning causal models under independent changes. In Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 75595-75622. DOI: 10.5555/3666122.3669426. Online publication date: 10-Dec-2023.
    • (2023) Detecting hidden confounding in observational data using multiple environments. In Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 44280-44309. DOI: 10.5555/3666122.3668038. Online publication date: 10-Dec-2023.
    • (2023) Causal de Finetti. In Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 36463-36475. DOI: 10.5555/3666122.3667705. Online publication date: 10-Dec-2023.
    • (2023) A measure-theoretic axiomatisation of causality. In Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 28510-28540. DOI: 10.5555/3666122.3667361. Online publication date: 10-Dec-2023.
    • (2023) Evaluating instrument validity using the principle of independent mechanisms. The Journal of Machine Learning Research, vol. 24, no. 1, pp. 8339-8394. DOI: 10.5555/3648699.3648875. Online publication date: 1-Jan-2023.
    • (2023) Causal discovery with hidden confounders using the algorithmic Markov condition. In Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, pp. 1016-1026. DOI: 10.5555/3625834.3625930. Online publication date: 31-Jul-2023.
    • (2023) Nonlinear causal discovery with latent confounders. In Proceedings of the 40th International Conference on Machine Learning, pp. 15639-15654. DOI: 10.5555/3618408.3619046. Online publication date: 23-Jul-2023.
