
Causal inference using the algorithmic Markov condition

Published: 01 October 2010

Abstract

Inferring the causal structure that links n observables is usually based upon detecting statistical dependences and choosing simple graphs that make the joint measure Markovian. Here we argue why causal inference is also possible when the sample size is one. We develop a theory of how to generate causal graphs explaining similarities between single objects. To this end, we replace the notion of conditional stochastic independence in the causal Markov condition with the vanishing of conditional algorithmic mutual information and describe the corresponding causal inference rules. We explain why a consistent reformulation of causal inference in terms of algorithmic complexity implies a new inference principle that also takes into account the complexity of conditional probability densities, making it possible to select among Markov-equivalent causal graphs. This insight provides a theoretical foundation for a heuristic principle proposed in earlier work. We also sketch some ideas on how to replace Kolmogorov complexity with decidable complexity criteria. This can be seen as an algorithmic analog of replacing the empirically undecidable question of statistical independence with practical independence tests that rest on implicit or explicit assumptions about the underlying distribution.




    Published In

    IEEE Transactions on Information Theory, Volume 56, Issue 10
    October 2010
    610 pages

    Publisher

    IEEE Press

    Publication History

    Published: 01 October 2010
    Revised: 13 January 2010
    Received: 23 April 2008

    Author Tags

    1. Algorithmic information
    2. Church–Turing thesis
    3. Data compression
    4. Graphical models
    5. Probability-free causal inference

    Qualifiers

    • Research-article


    Cited By

    • (2024) Learning Causal Networks from Episodic Data. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 2224-2235. DOI: 10.1145/3637528.3671999. Online publication date: 25-Aug-2024.
    • (2024) CA-FER: Mitigating Spurious Correlation With Counterfactual Attention in Facial Expression Recognition. IEEE Transactions on Affective Computing, vol. 15, no. 3, pp. 977-989. DOI: 10.1109/TAFFC.2023.3312768. Online publication date: 1-Jul-2024.
    • (2023) Invariant learning via probability of sufficient and necessary causes. In Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 79832-79857. DOI: 10.5555/3666122.3669618. Online publication date: 10-Dec-2023.
    • (2023) Learning causal models under independent changes. In Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 75595-75622. DOI: 10.5555/3666122.3669426. Online publication date: 10-Dec-2023.
    • (2023) Detecting hidden confounding in observational data using multiple environments. In Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 44280-44309. DOI: 10.5555/3666122.3668038. Online publication date: 10-Dec-2023.
    • (2023) Causal de Finetti. In Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 36463-36475. DOI: 10.5555/3666122.3667705. Online publication date: 10-Dec-2023.
    • (2023) A measure-theoretic axiomatisation of causality. In Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 28510-28540. DOI: 10.5555/3666122.3667361. Online publication date: 10-Dec-2023.
    • (2023) Evaluating instrument validity using the principle of independent mechanisms. The Journal of Machine Learning Research, vol. 24, no. 1, pp. 8339-8394. DOI: 10.5555/3648699.3648875. Online publication date: 1-Jan-2023.
    • (2023) Causal discovery with hidden confounders using the algorithmic Markov condition. In Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, pp. 1016-1026. DOI: 10.5555/3625834.3625930. Online publication date: 31-Jul-2023.
    • (2023) Nonlinear causal discovery with latent confounders. In Proceedings of the 40th International Conference on Machine Learning, pp. 15639-15654. DOI: 10.5555/3618408.3619046. Online publication date: 23-Jul-2023.
