research-article

Pipe(line) Dreams: Fully Automated End-to-End Analysis and Visualization

Authors:

Azza Abouzied

HILDA 24: Proceedings of the 2024 Workshop on Human-In-the-Loop Data Analytics

Pages 1 - 7

https://doi.org/10.1145/3665939.3665962

Published: 18 June 2024

Abstract

We exploit large language models (LLMs) to automate the end-to-end process of descriptive analytics and visualization. A user simply declares who they are and provides their data set. Our tool LLM4Vis sets analysis goals or metrics, generates code to process and analyze the data, visualizes the results and interprets the visualization to summarize key takeaways for our user. We examine the power of LLMs in democratizing data science for the non-technical user and in handling rich, multimodal data sets. We also explore LLM4Vis's limitations, opportunities for human-in-the-loop interventions, and challenges to measuring and improving the robustness and the utility of LLM-generated end-to-end data analysis pipelines.
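
The staged flow described in the abstract (set goals, generate analysis code, visualize, interpret) can be pictured as a chain of model calls in which each stage sees the output of the previous ones. The following is only a minimal sketch under stated assumptions, not LLM4Vis itself: the `LLM` callable, the stage names, the prompt wording, and the stub model are all hypothetical, and the sketch never executes the generated code.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: any function that maps a prompt string to a text reply.
LLM = Callable[[str], str]

# Stage names and instructions paraphrase the abstract; the wording is an assumption.
STAGES: List[Tuple[str, str]] = [
    ("goals", "Propose analysis goals or metrics for this user and data set."),
    ("code", "Write code to process and analyze the data for these goals."),
    ("chart", "Describe a visualization of the analysis results."),
    ("takeaways", "Summarize the key takeaways of the visualization for this user."),
]


def run_pipeline(llm: LLM, persona: str, data_description: str) -> Dict[str, str]:
    """Call the model once per stage, feeding each reply into the next prompt."""
    context = f"user: {persona}\ndata: {data_description}"
    outputs: Dict[str, str] = {}
    for name, instruction in STAGES:
        reply = llm(f"{context}\n\n{instruction}")
        outputs[name] = reply
        # Later stages see everything produced so far.
        context += f"\n{name}: {reply}"
    return outputs


def make_stub_llm() -> Tuple[LLM, List[str]]:
    """Return a stand-in model plus the list of prompts it has received."""
    prompts: List[str] = []

    def stub(prompt: str) -> str:
        prompts.append(prompt)
        return f"stub reply {len(prompts)}"

    return stub, prompts


if __name__ == "__main__":
    stub, prompts = make_stub_llm()
    result = run_pipeline(stub, "a non-technical user", "a tabular data set")
    for stage, reply in result.items():
        print(stage, "->", reply)
```

Swapping the stub for a real model client would only require a function with the same string-in, string-out shape. A human-in-the-loop checkpoint could sit between any two stages, for example by letting the user edit `outputs["goals"]` before the code stage runs.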

Published In

HILDA 24: Proceedings of the 2024 Workshop on Human-In-the-Loop Data Analytics

June 2024

91 pages

ISBN:9798400706936

DOI:10.1145/3665939

Program Chairs: Jean-Daniel Fekete (Inria & Université Paris-Saclay), Behrooz Omidvar-Tehrani (AWS AI Labs), Kexin Rong (Georgia Institute of Technology), Roee Shraga (Worcester Polytechnic Institute)

Copyright © 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

HILDA 24: 2024 Workshop on Human-In-the-Loop Data Analytics

June 14, 2024

AA, Santiago, Chile

Sponsor: SIGMOD

Acceptance Rates

Overall Acceptance Rate 28 of 56 submissions, 50%
