DOI: 10.5555/3524938.3525801

Distinguishing cause from effect using quantiles: bivariate quantile causal discovery

Published: 13 July 2020

Abstract

Causal inference using observational data is challenging, especially in the bivariate case. Through the minimum description length principle, we link quantile regression to the postulate that the mechanism generating the cause is independent of the mechanism generating the effect given the cause. Based on this theory, we develop Bivariate Quantile Causal Discovery (bQCD), a new method to distinguish cause from effect under the assumptions of no confounding, selection bias, or feedback. Because it uses multiple quantile levels instead of the conditional mean only, bQCD is adaptive not only to additive, but also to multiplicative or even location-scale generating mechanisms. To illustrate the effectiveness of our approach, we perform an extensive empirical comparison on both synthetic and real datasets. This study shows that bQCD is robust across different implementations of the method (i.e., of the underlying quantile regression), computationally efficient, and compares favorably to state-of-the-art methods.
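
To make the abstract's mechanism concrete, the sketch below fits quantile regressions at several levels in both candidate directions and prefers the direction with the lower total quantile (pinball) loss, which under the paper's MDL link loosely corresponds to a shorter description of the conditional. This is not the authors' implementation: the paper couples quantile scores with marginal code lengths and uses copula-based quantile regression, whereas this sketch substitutes scikit-learn's gradient-boosted quantile regression and a crude rank transform, and the function names (pinball_loss, direction_score, infer_direction) are illustrative, not from the paper's code.

```python
# Minimal sketch of the directional comparison behind bQCD (an illustration,
# not the authors' implementation). Assumption: scikit-learn's gradient-
# boosted quantile regression stands in for the paper's quantile learner.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def to_unif(v):
    # Normalized ranks in (0, 1): a crude stand-in for a copula-style
    # transform, making losses comparable across the two directions.
    ranks = np.argsort(np.argsort(v))
    return (ranks + 1) / (len(v) + 1.0)

def pinball_loss(y, q, tau):
    # Average quantile (pinball) loss of predictions q at level tau.
    diff = y - q
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

def direction_score(x, y, taus=(0.25, 0.5, 0.75)):
    # Total quantile loss of predicting y from x across several levels;
    # under the MDL link, lower loss ~ shorter conditional description.
    X = x.reshape(-1, 1)
    score = 0.0
    for tau in taus:
        model = GradientBoostingRegressor(loss="quantile", alpha=tau)
        model.fit(X, y)
        score += pinball_loss(y, model.predict(X), tau)
    return score

def infer_direction(x, y):
    # Prefer the direction whose conditional is cheaper to describe.
    u, v = to_unif(x), to_unif(y)
    return "x->y" if direction_score(u, v) < direction_score(v, u) else "y->x"

# Toy location-scale pair: x causes y with input-dependent noise scale.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = x + (0.5 + 0.5 * np.abs(x)) * rng.normal(size=500)
print(infer_direction(x, y))  # expected on this toy example: 'x->y'
```

The rank transform matters here: raw pinball losses are scale-dependent, so without putting both variables on a common scale the two directions would not be directly comparable.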

Supplementary Material

Additional material (3524938.3525801_supp.pdf): supplemental material.



Published In

ICML'20: Proceedings of the 37th International Conference on Machine Learning
July 2020
11702 pages

Publisher

JMLR.org
