arXiv:2406.04267 (cs)

[Submitted on 6 Jun 2024 (v1), last revised 24 Oct 2024 (this version, v2)]

Title: Transformers need glasses! Information over-squashing in language tasks

Authors: Federico Barbero, Andrea Banino, Steven Kapturowski, Dharshan Kumaran, João G.M. Araújo, Alex Vitvitskyi, Razvan Pascanu, Petar Veličković

Abstract: We study how information propagates in decoder-only Transformers, which are the architectural backbone of most existing frontier large language models (LLMs). We rely on a theoretical signal propagation analysis -- specifically, we analyse the representations of the last token in the final layer of the Transformer, as this is the representation used for next-token prediction. Our analysis reveals a representational collapse phenomenon: we prove that certain distinct sequences of inputs to the Transformer can yield arbitrarily close representations in the final token. This effect is exacerbated by the low-precision floating-point formats frequently used in modern LLMs. As a result, the model is provably unable to respond to these sequences in different ways -- leading to errors in, e.g., tasks involving counting or copying. Further, we show that decoder-only Transformer language models can lose sensitivity to specific tokens in the input, which relates to the well-known phenomenon of over-squashing in graph neural networks. We provide empirical evidence supporting our claims on contemporary LLMs. Our theory also points to simple solutions towards ameliorating these issues.
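
As a rough illustration of how low precision can make distinct inputs indistinguishable, the toy sketch below replaces attention with a plain uniform average over scalar token values. This is an assumption made for illustration only and is not the construction analysed in the paper; the helper names (`to_float16`, `mean_of_ones_then_zero`, `collapses_in_float16`) are hypothetical. It compares a sequence of n ones followed by a zero with the same sequence one token longer, and checks whether the two averages still differ after rounding to half precision.

```python
import struct


def to_float16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def mean_of_ones_then_zero(n: int) -> float:
    """Uniform average over n tokens of value 1.0 followed by one token of value 0.0.

    Toy stand-in for an attention-weighted sum (assumption: uniform weights).
    """
    return n / (n + 1)


def collapses_in_float16(n: int) -> bool:
    """True if sequences of length n+1 and n+2 differ in float64 but agree in float16."""
    a = mean_of_ones_then_zero(n)
    b = mean_of_ones_then_zero(n + 1)
    return a != b and to_float16(a) == to_float16(b)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, collapses_in_float16(n))
```

In this toy setting the gap between the two averages shrinks as 1/((n+1)(n+2)), so for long enough sequences it falls below the spacing of half-precision values near 1 and the two inputs round to the same number.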

Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2406.04267 [cs.CL]
	(or arXiv:2406.04267v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2406.04267

Submission history

From: Federico Barbero
[v1] Thu, 6 Jun 2024 17:14:44 UTC (3,096 KB)
[v2] Thu, 24 Oct 2024 23:12:55 UTC (3,563 KB)
