Abstract
Automatic insight discovery prunes the insight search space with heuristic rules, which reduces the manual effort and the unnecessary trial and error of exploratory data analysis (EDA). However, rule-based methods cannot guarantee the quality of the discovered insights, because the rules cannot cover all possible relationships between the data distribution and insights. To address this problem, this paper proposes SmartInsight, a learning-based automatic insight discovery method. SmartInsight first uses a subspace-aware encoding approach to encode the data scopes that may contain insights, and trains a machine learning model to capture the relationship between these data scopes and insights. Then, when discovering insights, SmartInsight predicts the scores of all possible data scopes and uses them to prune the insight search space. Experimental results on real-world datasets, together with user studies, show that SmartInsight outperforms existing automatic insight discovery methods.
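The score-guided pruning described above can be illustrated with a minimal Python sketch. It is not the SmartInsight implementation: the scope definition (an optional single-column filter plus a breakdown column), the three-feature `encode_scope` encoding, the `toy_score` function standing in for the trained model, and all column names are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]
# Assumed scope shape: (optional (column, value) filter, breakdown column).
Scope = Tuple[Optional[Tuple[str, object]], str]


def enumerate_scopes(rows: List[Row], dims: List[str]) -> List[Scope]:
    """List every scope that filters on at most one column."""
    scopes: List[Scope] = []
    for breakdown in dims:
        scopes.append((None, breakdown))  # unfiltered: the whole table
        for dim in dims:
            if dim == breakdown:
                continue
            for value in sorted({r[dim] for r in rows}, key=str):
                scopes.append(((dim, value), breakdown))
    return scopes


def encode_scope(rows: List[Row], scope: Scope) -> List[float]:
    """Hypothetical encoding: [is_filtered, row coverage, breakdown cardinality]."""
    subspace, breakdown = scope
    selected = [
        r for r in rows if subspace is None or r[subspace[0]] == subspace[1]
    ]
    return [
        0.0 if subspace is None else 1.0,
        len(selected) / len(rows),
        float(len({r[breakdown] for r in selected})),
    ]


def prune_scopes(
    rows: List[Row],
    scopes: List[Scope],
    score_fn: Callable[[List[float]], float],
    keep: int,
) -> List[Scope]:
    """Keep the `keep` scopes with the highest predicted score."""
    ranked = sorted(
        scopes, key=lambda s: score_fn(encode_scope(rows, s)), reverse=True
    )
    return ranked[:keep]


def toy_score(features: List[float]) -> float:
    """Stand-in for a trained model: coverage times breakdown cardinality."""
    return features[1] * features[2]


# Made-up example table.
rows = [
    {"region": "East", "year": 2021, "sales": 10},
    {"region": "East", "year": 2022, "sales": 30},
    {"region": "West", "year": 2021, "sales": 12},
    {"region": "West", "year": 2022, "sales": 11},
]
candidates = enumerate_scopes(rows, ["region", "year"])
# Only the retained scopes would be passed on to the costly insight evaluation.
retained = prune_scopes(rows, candidates, toy_score, keep=2)
```

In this sketch the expensive insight evaluation would run only on `retained`, so the search cost depends on `keep` rather than on the number of candidate scopes. A learned model would replace `toy_score`.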
W. Zhang and S. Shi—This work was done at Fudan University.
Acknowledgement
This work was supported by the National Natural Science Foundation of China No.62072113.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhang, H. et al. (2024). SmartInsight: Learning-Based Automatic Insight Discovery for Exploratory Data Analysis. In: Onizuka, M., et al. Database Systems for Advanced Applications. DASFAA 2024. Lecture Notes in Computer Science, vol 14853. Springer, Singapore. https://doi.org/10.1007/978-981-97-5562-2_31
DOI: https://doi.org/10.1007/978-981-97-5562-2_31
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-5561-5
Online ISBN: 978-981-97-5562-2
eBook Packages: Computer Science, Computer Science (R0)