DOI:10.1145/3665939.3665962
research-article

Pipe(line) Dreams: Fully Automated End-to-End Analysis and Visualization

Published: 18 June 2024

Abstract

We exploit large language models (LLMs) to automate the end-to-end process of descriptive analytics and visualization. A user simply declares who they are and provides their data set. Our tool LLM4Vis sets analysis goals or metrics, generates code to process and analyze the data, visualizes the results and interprets the visualization to summarize key takeaways for our user. We examine the power of LLMs in democratizing data science for the non-technical user and in handling rich, multimodal data sets. We also explore LLM4Vis's limitations, opportunities for human-in-the-loop interventions, and challenges to measuring and improving the robustness and the utility of LLM-generated end-to-end data analysis pipelines.
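
The four stages named in the abstract (set goals, generate code, visualize, interpret) can be pictured as a simple chain in which each stage's output feeds the next prompt. The sketch below is only an illustration under stated assumptions, not the authors' implementation: `run_pipeline`, `PipelineResult`, the `LLM` callable type, the prompt wording, and the `execute` runner are all hypothetical names introduced here, and the model call is replaced by a stub so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: any function mapping a prompt string to a reply string.
LLM = Callable[[str], str]


@dataclass
class PipelineResult:
    goals: str      # analysis goals or metrics proposed for this user
    code: str       # generated processing and plotting code
    chart: str      # reference to the rendered visualization
    takeaways: str  # interpretation of the chart for the user


def run_pipeline(
    llm: LLM,
    persona: str,
    data_summary: str,
    execute: Callable[[str], str],
) -> PipelineResult:
    """Chain the four stages from the abstract; each stage feeds the next."""
    goals = llm(
        f"User: {persona}\nData: {data_summary}\n"
        "Propose analysis goals or metrics."
    )
    code = llm(
        f"Goals: {goals}\nData: {data_summary}\n"
        "Write code to analyze and plot the data."
    )
    # Assumed helper: runs the generated code somewhere safe and
    # returns a reference to the resulting chart.
    chart = execute(code)
    takeaways = llm(
        f"User: {persona}\nChart: {chart}\n"
        "Summarize the key takeaways."
    )
    return PipelineResult(goals, code, chart, takeaways)


def echo_llm(prompt: str) -> str:
    # Stand-in for a real model call: echoes the instruction it received.
    return f"<reply to: {prompt.splitlines()[-1]}>"


if __name__ == "__main__":
    result = run_pipeline(
        echo_llm,
        persona="a city planner",
        data_summary="listings.csv with price and neighbourhood columns",
        execute=lambda code: "chart.png",
    )
    print(result.takeaways)
```

Keeping each stage as a separate call means the intermediate values (`goals`, `code`, `chart`) are available for inspection, which is one place a human-in-the-loop check could plausibly sit.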

Published In

HILDA 24: Proceedings of the 2024 Workshop on Human-In-the-Loop Data Analytics
June 2024
91 pages
ISBN:9798400706936
DOI:10.1145/3665939

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate 28 of 56 submissions, 50%


Article Metrics

  • Total Citations: 0
  • Total Downloads: 142
  • Downloads (Last 12 months): 142
  • Downloads (Last 6 weeks): 21

Reflects downloads up to 18 Nov 2024
