XInsight: eXplainable Data Analysis Through The Lens of Causality

Published: 20 June 2023

Abstract

Exploratory Data Analysis (EDA) is increasingly popular, which makes it crucial to understand the underlying causes of the knowledge it surfaces, yet this problem remains under-researched. This study promotes a transparent and explicable perspective on data analysis, called eXplainable Data Analysis (XDA). To this end, we present XInsight, a general framework for XDA. XInsight augments data analysis with qualitative and quantitative explanations that carry causal and non-causal semantics. These explanations are intended to improve human understanding of, and confidence in, the outcomes of data analysis, facilitating accurate data interpretation and decision making in the real world. XInsight is a three-module, end-to-end pipeline designed to extract causal graphs, translate causal primitives into XDA semantics, and quantify the contribution of each explanation to a data fact. XInsight uses a set of design concepts and optimizations to address the inherent difficulties of integrating causality into XDA. Experiments on synthetic and real-world datasets, as well as a user study, demonstrate the promising capabilities of XInsight.

Supplemental Material

MP4 File
Presentation video for SIGMOD 2023

Published In

Proceedings of the ACM on Management of Data, Volume 1, Issue 2
PACMMOD
June 2023
2310 pages
EISSN: 2836-6573
DOI: 10.1145/3605748
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. bayesian network
  2. data analysis
  3. data management

Qualifiers

  • Research-article

Article Metrics

  • Downloads (Last 12 months): 212
  • Downloads (Last 6 weeks): 24
Reflects downloads up to 16 Nov 2024

Cited By

  • (2024) Press ECCS to Doubt (Your Causal Graph). Proceedings of the Conference on Governance, Understanding and Integration of Data for Effective and Responsible AI, pp. 6-15. DOI: 10.1145/3665601.3669842. Online publication date: 9-Jun-2024
  • (2024) Summarized Causal Explanations For Aggregate Views. Proceedings of the ACM on Management of Data 2:1, pp. 1-27. DOI: 10.1145/3639328. Online publication date: 26-Mar-2024
  • (2024) Scalable Differentiable Causal Discovery in the Presence of Latent Confounders with Skeleton Posterior. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 2141-2152. DOI: 10.1145/3637528.3672031. Online publication date: 25-Aug-2024
  • (2024) SIERRA: A Counterfactual Thinking-based Visual Interface for Property Graph Query Construction. Companion of the 2024 International Conference on Management of Data, pp. 440-443. DOI: 10.1145/3626246.3654729. Online publication date: 9-Jun-2024
  • (2024) Enabling Runtime Verification of Causal Discovery Algorithms with Automated Conditional Independence Reasoning. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, pp. 1-13. DOI: 10.1145/3597503.3623348. Online publication date: 20-May-2024
  • (2024) Chat2Query: A Zero-Shot Automatic Exploratory Data Analysis System with Large Language Models. 2024 IEEE 40th International Conference on Data Engineering (ICDE), pp. 5429-5432. DOI: 10.1109/ICDE60146.2024.00420. Online publication date: 13-May-2024
  • (2024) Interestingness Measures for Exploratory Data Analysis: a Survey. New Trends in Database and Information Systems, pp. 14-24. DOI: 10.1007/978-3-031-70421-5_2. Online publication date: 14-Nov-2024
  • (2023) Explain any concept. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 21826-21840. DOI: 10.5555/3666122.3667076. Online publication date: 10-Dec-2023
  • (2023) Towards Practical Federated Causal Structure Learning. Machine Learning and Knowledge Discovery in Databases: Research Track, pp. 351-367. DOI: 10.1007/978-3-031-43415-0_21. Online publication date: 18-Sep-2023
