Computer Science > Computation and Language

arXiv:2305.13048 (cs)

[Submitted on 22 May 2023 (v1), last revised 11 Dec 2023 (this version, v2)]

Title: RWKV: Reinventing RNNs for the Transformer Era

Abstract: Transformers have revolutionized almost all natural language processing (NLP) tasks but suffer from memory and computational complexity that scales quadratically with sequence length. In contrast, recurrent neural networks (RNNs) exhibit linear scaling in memory and computational requirements but struggle to match the performance of Transformers due to limitations in parallelization and scalability. We propose a novel model architecture, Receptance Weighted Key Value (RWKV), that combines the efficient parallelizable training of Transformers with the efficient inference of RNNs. Our approach leverages a linear attention mechanism and allows us to formulate the model as either a Transformer or an RNN, thereby parallelizing computations during training and maintaining constant computational and memory complexity during inference. We scale our models up to 14 billion parameters, by far the largest dense RNN ever trained, and find that RWKV performs on par with similarly sized Transformers, suggesting that future work can leverage this architecture to create more efficient models. This work presents a significant step towards reconciling the trade-offs between computational efficiency and model performance in sequence processing tasks.
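The dual formulation mentioned in the abstract can be illustrated with a toy example. The sketch below is not the paper's operator: it is a simplified scalar, single-channel weighted average with exponential decay, and the function names and the `decay` parameter are hypothetical, chosen only for illustration. It shows that a quantity written in an attention-like form, where each position re-reads the whole prefix, can also be computed with a constant-size running state, which is the general idea behind formulating one model as either a Transformer or an RNN.

```python
import math


def decayed_average_direct(keys, values, decay):
    """Attention-like form: every output re-reads the whole prefix.

    Each position is independent of the others, so all positions could be
    evaluated in parallel, at a total cost of O(T^2) for T positions.
    """
    outputs = []
    for t in range(len(keys)):
        # Older positions are down-weighted by exp(-(t - i) * decay).
        weights = [math.exp(-(t - i) * decay + keys[i]) for i in range(t + 1)]
        total = sum(weights)
        # zip stops at the shorter sequence, so only values[0..t] are used.
        outputs.append(sum(w * v for w, v in zip(weights, values)) / total)
    return outputs


def decayed_average_recurrent(keys, values, decay):
    """RNN-like form: the same outputs from a constant-size state (num, den)."""
    num, den = 0.0, 0.0
    shrink = math.exp(-decay)  # per-step decay applied to the running state
    outputs = []
    for k, v in zip(keys, values):
        num = shrink * num + math.exp(k) * v
        den = shrink * den + math.exp(k)
        outputs.append(num / den)
    return outputs


# Illustrative inputs only; these are not data from the paper.
keys = [0.1, -0.3, 0.7, 0.2]
values = [1.0, 2.0, -1.0, 0.5]
direct = decayed_average_direct(keys, values, decay=0.5)
recurrent = decayed_average_recurrent(keys, values, decay=0.5)
```

Both functions return the same sequence up to floating-point rounding. The recurrent version keeps only two numbers between steps, so its per-step cost and memory do not grow with the sequence length.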

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2305.13048 [cs.CL]
	(or arXiv:2305.13048v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2305.13048

Submission history

From: Quentin Anthony
[v1] Mon, 22 May 2023 13:57:41 UTC (5,484 KB)
[v2] Mon, 11 Dec 2023 03:58:56 UTC (4,261 KB)
