arXiv:2410.07331 (cs)

[Submitted on 9 Oct 2024 (v1), last revised 11 Oct 2024 (this version, v2)]

Title: DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models

Authors: Yiming Huang, Jianwen Luo, Yan Yu, Yitong Zhang, Fangyu Lei, Yifan Wei, Shizhu He, Lifu Huang, Xiao Liu, Jun Zhao, Kang Liu

Abstract: We introduce DA-Code, a code generation benchmark specifically designed to assess LLMs on agent-based data science tasks. The benchmark features three core elements. First, the tasks in DA-Code are inherently challenging, which sets them apart from traditional code generation tasks and demands advanced coding skills in grounding and planning. Second, all examples in DA-Code are based on real and diverse data, covering a wide range of complex data wrangling and analytics tasks. Third, to solve the tasks, models must use complex data science programming languages to perform intricate data processing and derive the answers. We set up the benchmark in a controllable, executable, and scalable environment that aligns with real-world data analysis scenarios. The annotators meticulously designed the evaluation suite to ensure that the evaluation is accurate and robust. We also develop the DA-Agent baseline. Experiments show that although this baseline performs better than other existing frameworks, it achieves only 30.5% accuracy even with the current best LLMs, leaving ample room for improvement. We release our benchmark at this https URL.

Comments:	EMNLP 2024
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2410.07331 [cs.CL]
	(or arXiv:2410.07331v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2410.07331

Submission history

From: Jianwen Luo
[v1] Wed, 9 Oct 2024 18:00:05 UTC (3,858 KB)
[v2] Fri, 11 Oct 2024 00:53:05 UTC (3,868 KB)
