Revisiting Parallel Context Windows: A Frustratingly Simple Alternative and Chain-of-Thought Deterioration

Kejuan Yang, Xiao Liu, Kaiwen Men, Aohan Zeng, Yuxiao Dong, Jie Tang

Abstract

We identify two crucial limitations in the evaluation of the recently proposed parallel-integrated method Parallel Context Windows (PCW), which extends language models beyond their maximum context length (e.g., 2048 tokens for LLaMA) by means of window-wise attention and positional embedding techniques. First, we show that a simple yet strong baseline, the weighted sum ensemble, is missing from its evaluation on in-context few-shot classification. Second, on more challenging Chain-of-Thought (CoT) reasoning tasks (e.g., HotpotQA), PCW exhibits unexpected deterioration in the form of question miscomprehension and false inference. Based on these findings, we suggest that the existing PCW design may not guarantee sufficient improvement or practicality for handling lengthy documents in real-world applications, and that more community effort should be devoted to enabling language models' long-context understanding.
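To make the contrast above concrete, the following plain-Python sketch (no model calls) shows where each approach merges the context windows. PCW merges them inside the model: every window restarts its position ids, context tokens attend only within their own window, and the task tokens attend to all windows. The weighted sum ensemble instead runs each window separately and merges only the output label probabilities. All function names are hypothetical. The PCW details (positions restarting per window, task positions continuing after the longest window) reflect a reading of the original PCW description rather than this paper's code, and the uniform default ensemble weights are an assumption, not necessarily the weighting used in the paper.

```python
from typing import List, Optional, Sequence


def pcw_position_ids(window_lens: Sequence[int], task_len: int) -> List[int]:
    """PCW-style position ids: every context window restarts at 0, and the
    task tokens continue after the longest window (assumed offset)."""
    ids: List[int] = []
    for length in window_lens:
        ids.extend(range(length))  # windows re-use the same positions
    offset = max(window_lens, default=0)
    ids.extend(range(offset, offset + task_len))
    return ids


def pcw_attention_mask(window_lens: Sequence[int], task_len: int) -> List[List[bool]]:
    """mask[i][j] is True if token i may attend to token j.

    Context tokens attend causally within their own window only; task tokens
    attend to every context token and causally to earlier task tokens.
    """
    total = sum(window_lens) + task_len
    mask = [[False] * total for _ in range(total)]
    start = 0
    for length in window_lens:
        for i in range(start, start + length):
            for j in range(start, i + 1):
                mask[i][j] = True
        start += length
    for i in range(start, total):  # start now equals the total context length
        for j in range(i + 1):
            mask[i][j] = True
    return mask


def weighted_sum_ensemble(
    window_label_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> int:
    """Return the label index with the highest weighted sum of per-window probabilities.

    window_label_probs[w][c] is assumed to be P(label c | demonstrations in
    window w, test input) from a separate forward pass per window, each of
    which fits in the model's native context length. Uniform weights are the
    default here; that choice is an assumption.
    """
    if not window_label_probs:
        raise ValueError("need at least one window")
    n_windows = len(window_label_probs)
    n_labels = len(window_label_probs[0])
    if weights is None:
        weights = [1.0 / n_windows] * n_windows
    if len(weights) != n_windows:
        raise ValueError("need exactly one weight per window")
    scores = [0.0] * n_labels
    for weight, probs in zip(weights, window_label_probs):
        if len(probs) != n_labels:
            raise ValueError("every window must score the same label set")
        for c, p in enumerate(probs):
            scores[c] += weight * p
    return max(range(n_labels), key=lambda c: scores[c])
```

For instance, two windows of two tokens each plus one task token give position ids [0, 1, 0, 1, 2]; with per-window probabilities [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]] and uniform weights, the ensemble picks label 1. The ensemble needs no change to attention or positions, which is what makes it a natural baseline for PCW on classification.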

Anthology ID: 2024.findings-acl.523
Volume: Findings of the Association for Computational Linguistics: ACL 2024
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 8841–8852
URL: https://aclanthology.org/2024.findings-acl.523
DOI: 10.18653/v1/2024.findings-acl.523
Cite (ACL): Kejuan Yang, Xiao Liu, Kaiwen Men, Aohan Zeng, Yuxiao Dong, and Jie Tang. 2024. Revisiting Parallel Context Windows: A Frustratingly Simple Alternative and Chain-of-Thought Deterioration. In Findings of the Association for Computational Linguistics: ACL 2024, pages 8841–8852, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): Revisiting Parallel Context Windows: A Frustratingly Simple Alternative and Chain-of-Thought Deterioration (Yang et al., Findings 2024)
PDF: https://aclanthology.org/2024.findings-acl.523.pdf