SmartInsight: Learning-Based Automatic Insight Discovery for Exploratory Data Analysis

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2024)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14853))

Abstract

By pruning the insight search space with heuristic rules, automatic insight discovery can ease the burden of exploratory data analysis (EDA) and spare analysts unnecessary exploratory trials. However, these rule-based methods cannot guarantee the quality of the discovered insights, since the rules cannot cover all possible relationships between the data distribution and insights. To address this problem, this paper proposes SmartInsight, a learning-based automatic insight discovery method. SmartInsight first uses a subspace-aware encoding approach to encode the data scopes that may contain insights, and trains a machine learning model to capture the relationship between these data scopes and insights. Then, when discovering insights, SmartInsight predicts the scores of all possible data scopes and uses them to prune the insight search space. Experimental results on both real-world datasets and user studies show that SmartInsight outperforms existing automatic insight discovery methods.
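The encode, train, predict, and prune loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the subspace-aware encoding or the model, so the sketch substitutes a simple binary encoding and a linear scorer trained by gradient descent. All names (`encode_scope`, `fit_linear`, `prune_scopes`), the toy dimensions, and the training scores are hypothetical.

```python
# Illustrative stand-in only: a data scope is assumed to be a pair
# (subspace filter, breakdown dimension); the paper's real encoding
# and model are not given in the abstract.

def encode_scope(subspace, breakdown, dims):
    """Binary encoding: which dimensions are filtered, then a one-hot
    vector marking the breakdown dimension."""
    filtered = [1.0 if d in subspace else 0.0 for d in dims]
    broken = [1.0 if d == breakdown else 0.0 for d in dims]
    return filtered + broken

def predict(w, x):
    """Linear score of an encoded scope."""
    return sum(wi * xi for wi, xi in zip(w, x))

def fit_linear(X, y, lr=0.1, epochs=500):
    """Fit linear weights by stochastic gradient descent on squared error."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, target in zip(X, y):
            err = predict(w, x) - target
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

def prune_scopes(scopes, w, dims, keep):
    """Score every candidate scope and keep only the top `keep`."""
    scored = [(predict(w, encode_scope(s, b, dims)), s, b) for s, b in scopes]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:keep]

# Hypothetical dimensions and labelled scopes (score = how insightful
# the scope turned out to be in past explorations).
DIMS = ["city", "product", "month"]
TRAIN = [
    ({}, "month", 1.0),
    ({"city": "A"}, "month", 0.9),
    ({}, "city", 0.1),
    ({"month": "Jan"}, "product", 0.2),
]
CANDIDATES = [
    ({"product": "X"}, "month"),
    ({"product": "X"}, "city"),
    ({"city": "B"}, "product"),
    ({}, "product"),
]

weights = fit_linear(
    [encode_scope(s, b, DIMS) for s, b, _ in TRAIN],
    [score for _, _, score in TRAIN],
)
for score, subspace, breakdown in prune_scopes(CANDIDATES, weights, DIMS, keep=2):
    print(round(score, 2), subspace, breakdown)
```

Only the scopes that survive pruning would then be passed to the (usually expensive) insight evaluation step, which is where the savings over exhaustive search come from in this kind of design.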

W. Zhang and S. Shi—This work was done at Fudan University.

Notes

  1. https://superset.apache.org/

  2. https://www.kaggle.com/datasets/jessemostipak/hotel-booking-demand?select=hotel_bookings.csv

  3. https://www.kaggle.com/datasets/georgealice/carsale

  4. https://www.kaggle.com/datasets/carlolepelaars/toy-dataset

  5. https://www.kaggle.com/datasets/vivek468/superstore-dataset-final

  6. https://www.kaggle.com/datasets/georgealice/patientcharge

  7. https://www.kaggle.com/datasets/andrewmvd/udemy-courses

Acknowledgement

This work was supported by the National Natural Science Foundation of China (No. 62072113).

Author information

Corresponding author

Correspondence to Yinan Jing.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Zhang, H. et al. (2024). SmartInsight: Learning-Based Automatic Insight Discovery for Exploratory Data Analysis. In: Onizuka, M., et al. Database Systems for Advanced Applications. DASFAA 2024. Lecture Notes in Computer Science, vol 14853. Springer, Singapore. https://doi.org/10.1007/978-981-97-5562-2_31

  • DOI: https://doi.org/10.1007/978-981-97-5562-2_31

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-5561-5

  • Online ISBN: 978-981-97-5562-2

  • eBook Packages: Computer Science (R0)
