Computer Science > Computation and Language

arXiv:2205.10770 (cs)

[Submitted on 22 May 2022 (v1), last revised 2 Nov 2022 (this version, v2)]

Title: Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models

Authors: Kushal Tirumala, Aram H. Markosyan, Luke Zettlemoyer, Armen Aghajanyan

Abstract: Despite their wide adoption, the underlying training and memorization dynamics of very large language models are not well understood. We empirically study exact memorization in causal and masked language modeling, across model sizes and throughout the training process. We measure the effects of dataset size, learning rate, and model size on memorization, finding that larger language models memorize training data faster across all settings. Surprisingly, we show that larger models can memorize a larger portion of the data before over-fitting and tend to forget less throughout the training process. We also analyze the memorization dynamics of different parts of speech and find that models memorize nouns and numbers first; we hypothesize and provide empirical evidence that nouns and numbers act as a unique identifier for memorizing individual training examples. Together, these findings present another piece of the broader puzzle of trying to understand what actually improves as models get bigger.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2205.10770 [cs.CL]
	(or arXiv:2205.10770v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2205.10770

Submission history

From: Kushal Tirumala
[v1] Sun, 22 May 2022 07:43:50 UTC (405 KB)
[v2] Wed, 2 Nov 2022 19:53:44 UTC (412 KB)
