DOI: 10.1145/1553374.1553470 (ICML Conference Proceedings)
research-article

Regression by dependence minimization and its application to causal inference in additive noise models

Published: 14 June 2009

Abstract

Motivated by causal inference problems, we propose a novel method for regression that minimizes the statistical dependence between regressors and residuals. The key advantage of this approach to regression is that it does not assume a particular distribution of the noise, i.e., it is non-parametric with respect to the noise distribution. We argue that the proposed regression method is well suited to the task of causal inference in additive noise models. A practical disadvantage is that the resulting optimization problem is generally non-convex and can be difficult to solve. Nevertheless, we report good results on one of the tasks of the NIPS 2008 Causality Challenge, where the goal is to distinguish causes from effects in pairs of statistically dependent variables. In addition, we propose an algorithm for efficiently inferring causal models from observational data for more than two variables. The required number of regressions and independence tests is quadratic in the number of variables, which is a significant improvement over the simple method that tests all possible DAGs, whose number grows super-exponentially with the number of variables.
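To make the idea of regression by dependence minimization concrete, here is a minimal sketch under several assumptions that the abstract does not fix. It measures dependence with the Hilbert-Schmidt Independence Criterion (HSIC) using Gaussian kernels and a median-distance bandwidth, restricts the regression function to a line y = a*x, and replaces the generally non-convex optimization with a coarse grid search over the slope. The direction rule (prefer the model whose residuals are more independent of the input) follows the additive-noise reasoning in the abstract; the function names `hsic`, `hsic_linear_fit` and `infer_direction` are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def gaussian_gram(v):
    """Gaussian-kernel Gram matrix of a 1-D sample."""
    d2 = (v[:, None] - v[None, :]) ** 2
    # Median heuristic for the bandwidth: an assumption, not taken from the paper.
    sigma = np.median(np.sqrt(d2[d2 > 0]))
    return np.exp(-d2 / (2.0 * sigma ** 2))


def hsic(a, b):
    """Biased empirical HSIC, trace(K H L H) / n^2, between two 1-D samples."""
    n = len(a)
    K = gaussian_gram(a)
    L = gaussian_gram(b)
    # Double-center K, i.e. H K H with H = I - 11^T / n, without forming H.
    Kc = K - K.mean(axis=0, keepdims=True) - K.mean(axis=1, keepdims=True) + K.mean()
    return float(np.sum(Kc * L)) / n ** 2


def hsic_linear_fit(x, y, slopes=np.linspace(-3.0, 3.0, 61)):
    """Fit y = a*x + noise by picking the slope a that minimizes HSIC(x, residual).

    No intercept is fitted: a Gaussian kernel depends only on differences, so
    shifting the residuals by a constant leaves HSIC unchanged.
    """
    scores = np.array([hsic(x, y - a * x) for a in slopes])
    best = int(np.argmin(scores))
    return float(slopes[best]), float(scores[best])


def infer_direction(x, y):
    """Prefer the causal direction whose best-fit residuals look more independent."""
    _, score_xy = hsic_linear_fit(x, y)  # hypothesis: y = a*x + noise
    _, score_yx = hsic_linear_fit(y, x)  # hypothesis: x = b*y + noise
    return "x->y" if score_xy < score_yx else "y->x"


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, 500)
    y = x + rng.uniform(-1.0, 1.0, 500)  # x causes y, uniform (non-Gaussian) noise
    slope, _ = hsic_linear_fit(x, y)
    print(slope, infer_direction(x, y))
```

With uniform noise, the forward fit near a = 1 leaves residuals that are independent of x, while no backward slope can make the residuals independent of y, so the comparison should favour x->y. With Gaussian input and Gaussian noise a linear model is not identifiable in this way, and the two scores would be close. A real implementation would use a flexible regression function and a gradient-based optimizer instead of the grid search.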
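The scaling claim in the abstract can be illustrated by counting. The sketch below compares the number of labelled DAGs on n variables (Robinson's recurrence), which is what the simple method would have to test, with a quadratic regression count. The quadratic count assumes an ordering phase that, with m variables left, regresses each variable on the other m - 1, tests the residuals for independence, and removes the variable with the most independent residuals as a sink; this is an assumption about the procedure, and the helper names are hypothetical. A follow-up pass that tries dropping each candidate parent once would add at most n(n-1)/2 further regressions, so the total would stay quadratic.

```python
from math import comb


def count_dags(n):
    """Number of labelled DAGs on n nodes (Robinson's recurrence, OEIS A003024)."""
    a = [1]  # a[0] = 1: the empty graph
    for m in range(1, n + 1):
        a.append(sum((-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
                     for k in range(1, m + 1)))
    return a[n]


def sink_search_regressions(n):
    """Regressions in a hypothetical sink-finding ordering phase.

    With m variables left, each is regressed on the other m - 1 (m regressions);
    summing over m = n, ..., 2 gives n(n + 1)/2 - 1.
    """
    return sum(range(2, n + 1))


for n in range(2, 8):
    print(n, count_dags(n), sink_search_regressions(n))
```

For n = 5, for example, there are 29,281 candidate DAGs but only 14 regressions in the assumed ordering phase, and the gap widens rapidly as n grows.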

Information

Published In

ACM Other conferences
ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
June 2009
1331 pages
ISBN: 9781605585161
DOI: 10.1145/1553374

Sponsors

  • NSF
  • Microsoft Research
  • MITACS

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ICML '09

Acceptance Rates

Overall Acceptance Rate 140 of 548 submissions, 26%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 64
  • Downloads (last 6 weeks): 7
Reflects downloads up to 23 Nov 2024

Cited By

  • (2024) Perturbation graphs, invariant causal prediction and causal relations in psychology. British Journal of Mathematical and Statistical Psychology. DOI: 10.1111/bmsp.12361. Online publication date: 21-Oct-2024.
  • (2024) Causal Discovery on Discrete Data via Weighted Normalized Wasserstein Distance. IEEE Transactions on Neural Networks and Learning Systems, pp. 1-13. DOI: 10.1109/TNNLS.2022.3213641. Online publication date: 2024.
  • (2024) Knowledge Verification From Data. IEEE Transactions on Neural Networks and Learning Systems, 35(3), pp. 4324-4338. DOI: 10.1109/TNNLS.2022.3202244. Online publication date: Mar-2024.
  • (2024) Local machine learning model-based multi-objective optimization for managing system interdependencies in production. Engineering Applications of Artificial Intelligence, 133(PB). DOI: 10.1016/j.engappai.2024.108099. Online publication date: 1-Jul-2024.
  • (2024) Criterion Optimization-Based Unsupervised Domain Adaptation. In Unsupervised Domain Adaptation, pp. 19-67. DOI: 10.1007/978-981-97-1025-6_3. Online publication date: 16-Feb-2024.
  • (2023) Uncovering meanings of embeddings via partial orthogonality. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 31988-32005. DOI: 10.5555/3666122.3667510. Online publication date: 10-Dec-2023.
  • (2023) A scale-invariant sorting criterion to find a causal order in additive noise models. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 785-807. DOI: 10.5555/3666122.3666158. Online publication date: 10-Dec-2023.
  • (2023) Learning nonlinear causal effects via kernel anchor regression. Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, pp. 1942-1952. DOI: 10.5555/3625834.3626016. Online publication date: 31-Jul-2023.
  • (2023) Estimation beyond data reweighting. Proceedings of the 40th International Conference on Machine Learning, pp. 17745-17783. DOI: 10.5555/3618408.3619138. Online publication date: 23-Jul-2023.
  • (2023) Generalization Performance of Pure Accuracy and its Application in Selective Ensemble Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(2), pp. 1798-1816. DOI: 10.1109/TPAMI.2022.3171436. Online publication date: 1-Feb-2023.
