When Do Program-of-Thought Works for Reasoning?

Authors

  • Zhen Bi, Zhejiang University; Zhejiang University - Ant Group Joint Laboratory of Knowledge Graph
  • Ningyu Zhang, Zhejiang University; Zhejiang University - Ant Group Joint Laboratory of Knowledge Graph
  • Yinuo Jiang, Zhejiang University; Zhejiang University - Ant Group Joint Laboratory of Knowledge Graph
  • Shumin Deng, NUS-NCS Joint Lab, National University of Singapore
  • Guozhou Zheng, Zhejiang University; Zhejiang University - Ant Group Joint Laboratory of Knowledge Graph; Donghai Laboratory
  • Huajun Chen, Zhejiang University; Zhejiang University - Ant Group Joint Laboratory of Knowledge Graph; Donghai Laboratory

DOI:

https://doi.org/10.1609/aaai.v38i16.29721

Keywords:

NLP: Interpretability, Analysis, and Evaluation of NLP Models, NLP: (Large) Language Models

Abstract

In the realm of embodied artificial intelligence, the reasoning capabilities of Large Language Models (LLMs) play a pivotal role. Although effective methods such as program-of-thought prompting enable LLMs to tackle complex reasoning tasks with programming languages, the specific impact of code data on the improvement of reasoning capabilities remains under-explored. To address this gap, we propose the Complexity-Impacted Reasoning Score (CIRS), which combines structural and logical attributes to measure the correlation between code and reasoning abilities. Specifically, we use the abstract syntax tree (AST) to encode structural information and calculate logical complexity by considering the difficulty and the cyclomatic complexity. Through an empirical analysis, we find that code data is not equally learnable or understandable by LLMs at every complexity level; an optimal level of complexity is critical to improving reasoning abilities through program-aided prompting. We then design an auto-synthesizing and stratifying algorithm and apply it to instruction generation for mathematical reasoning and to code data filtering for code generation tasks. Extensive results demonstrate the effectiveness of our proposed approach.
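To make the abstract's ingredients concrete, here is a minimal Python sketch of a CIRS-style score: parse code into an AST for structural attributes, count branching constructs for a McCabe-style cyclomatic complexity, and approximate difficulty from operator/operand counts in the Halstead style. The abstract does not give the paper's formula, so the feature choices, the Halstead-style approximation, and the way the pieces are combined (structural_complexity, difficulty, cirs_like_score) are illustrative assumptions, not the authors' implementation.

    import ast

    # Branching constructs counted toward McCabe-style cyclomatic complexity.
    BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.BoolOp,
                    ast.IfExp, ast.comprehension)

    def structural_complexity(tree: ast.AST) -> float:
        """Stand-in for the AST-based structural encoding: node count scaled
        by tree depth (assumption: the paper likely uses richer AST features)."""
        def depth(node, d=1):
            children = list(ast.iter_child_nodes(node))
            return d if not children else max(depth(c, d + 1) for c in children)
        return sum(1 for _ in ast.walk(tree)) * depth(tree) ** 0.5

    def cyclomatic_complexity(tree: ast.AST) -> int:
        """McCabe-style count: 1 plus the number of branching constructs."""
        return 1 + sum(isinstance(n, BRANCH_NODES) for n in ast.walk(tree))

    def difficulty(tree: ast.AST) -> float:
        """Rough Halstead-style difficulty, (n1 / 2) * (N2 / n2), approximating
        operators by AST operator nodes and operands by names and constants."""
        ops, operands = [], []
        for n in ast.walk(tree):
            if isinstance(n, (ast.operator, ast.cmpop, ast.boolop, ast.unaryop)):
                ops.append(type(n).__name__)
            elif isinstance(n, (ast.Name, ast.Constant)):
                operands.append(getattr(n, "id", None) or repr(getattr(n, "value", None)))
        n1, n2 = len(set(ops)), max(len(set(operands)), 1)
        return (n1 / 2) * (len(operands) / n2)

    def cirs_like_score(code: str) -> float:
        """Combine structural and logical attributes (the product is an assumption)."""
        tree = ast.parse(code)
        return structural_complexity(tree) * difficulty(tree) * cyclomatic_complexity(tree)

    if __name__ == "__main__":
        snippet = (
            "def pos_sum(xs):\n"
            "    total = 0\n"
            "    for x in xs:\n"
            "        if x > 0:\n"
            "            total += x\n"
            "    return total\n"
        )
        print(f"CIRS-like score: {cirs_like_score(snippet):.2f}")

A scorer of this shape is enough to reproduce the workflow the abstract describes: compute the score for each code sample, then stratify the data by score and keep the band at the optimal complexity level for instruction generation or code-data filtering.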

Published

2024-03-24

How to Cite

Bi, Z., Zhang, N., Jiang, Y., Deng, S., Zheng, G., & Chen, H. (2024). When Do Program-of-Thought Works for Reasoning?. Proceedings of the AAAI Conference on Artificial Intelligence, 38(16), 17691-17699. https://doi.org/10.1609/aaai.v38i16.29721

Issue

Vol. 38 No. 16 (2024)

Section

AAAI Technical Track on Natural Language Processing I