Computer Science > Computation and Language

arXiv:2305.15096 (cs)

[Submitted on 24 May 2023 (v1), last revised 10 Feb 2024 (this version, v3)]

Title: Dynamic Masking Rate Schedules for MLM Pretraining

Authors: Zachary Ankner, Naomi Saphra, Davis Blalock, Jonathan Frankle, Matthew L. Leavitt

Abstract: Most works on transformers trained with the Masked Language Modeling (MLM) objective use the original BERT model's fixed masking rate of 15%. We propose to instead dynamically schedule the masking rate throughout training. We find that linearly decreasing the masking rate over the course of pretraining improves average GLUE accuracy by up to 0.46% and 0.25% in BERT-base and BERT-large, respectively, compared to fixed-rate baselines. These gains come from exposure to both high and low masking rate regimes, providing benefits from both settings. Our results demonstrate that masking rate scheduling is a simple way to improve the quality of masked language models, achieving up to a 1.89x speedup in pretraining for BERT-base as well as a Pareto improvement for BERT-large.
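The linearly decreasing schedule described in the abstract can be illustrated with a minimal sketch. The abstract does not state the schedule's endpoints, so `start_rate` and `end_rate` below are placeholder assumptions, and the function names `linear_masking_rate` and `mask_tokens` are hypothetical. The masking step is also simplified: it always substitutes a `[MASK]` string and omits BERT's 80/10/10 replacement rule. This is a sketch of the general idea, not the authors' implementation.

```python
import random


def linear_masking_rate(step, total_steps, start_rate=0.30, end_rate=0.15):
    """Linearly interpolate the masking rate from start_rate to end_rate.

    start_rate and end_rate are illustrative defaults (assumptions),
    not values taken from the abstract.
    """
    if total_steps <= 1:
        return end_rate
    # Fraction of training completed, clamped to [0, 1].
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return start_rate + frac * (end_rate - start_rate)


def mask_tokens(tokens, rate, mask_token="[MASK]", rng=None):
    """Mask each token independently with probability `rate`.

    Returns (masked_tokens, labels); labels hold the original token at
    masked positions and None elsewhere. Simplified: no 80/10/10 rule.
    """
    rng = rng or random.Random()
    masked = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < rate:
            masked[i] = mask_token
            labels[i] = tok
    return masked, labels


if __name__ == "__main__":
    sentence = "the quick brown fox jumps over the lazy dog".split()
    total_steps = 5
    rng = random.Random(0)
    for step in range(total_steps):
        rate = linear_masking_rate(step, total_steps)
        masked, _ = mask_tokens(sentence, rate, rng=rng)
        print(f"step {step}: rate={rate:.4f} -> {' '.join(masked)}")
```

In a real pretraining loop the rate would be recomputed at each optimizer step and passed to whatever component performs masking, so early batches see heavier corruption than later ones.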

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as: arXiv:2305.15096 [cs.CL] (or arXiv:2305.15096v3 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2305.15096

Submission history

From: Zachary Ankner
[v1] Wed, 24 May 2023 12:24:12 UTC (64 KB)
[v2] Thu, 14 Sep 2023 21:26:14 UTC (159 KB)
[v3] Sat, 10 Feb 2024 20:56:20 UTC (160 KB)
