Computer Science > Machine Learning

arXiv:2306.01244 (cs)

[Submitted on 2 Jun 2023]

Title: Towards Sustainable Learning: Coresets for Data-efficient Deep Learning

Authors: Yu Yang, Hao Kang, Baharan Mirzasoleiman

Abstract: To improve the efficiency and sustainability of learning deep models, we propose CREST, the first scalable framework with rigorous theoretical guarantees to identify the most valuable examples for training non-convex models, particularly deep networks. To guarantee convergence to a stationary point of a non-convex function, CREST models the non-convex loss as a series of quadratic functions and extracts a coreset for each quadratic sub-region. In addition, to ensure faster convergence of stochastic gradient methods such as (mini-batch) SGD, CREST iteratively extracts multiple mini-batch coresets from larger random subsets of the training data, which yields nearly unbiased gradients with small variance. Finally, to further improve scalability and efficiency, CREST identifies examples that have already been learned and excludes them from the coreset selection pipeline. Our extensive experiments on several deep networks trained on vision and NLP datasets, including CIFAR-10, CIFAR-100, TinyImageNet, and SNLI, confirm that CREST speeds up the training of deep networks on very large datasets by 1.7x to 2.5x with minimal loss in performance. By analyzing the learning difficulty of the subsets selected by CREST, we show that deep models benefit the most from learning on subsets of increasing difficulty.
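The two selection ideas in the abstract (drawing a mini-batch coreset from a larger random subset, and dropping already-learned examples from the selection pool) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the selection rule, so a greedy facility-location rule over gradient similarity is swapped in as an assumption, and the names `greedy_coreset` and `drop_learned`, the loss threshold, and the random stand-in gradients and losses are all hypothetical.

```python
import numpy as np

def greedy_coreset(grads, m):
    """Greedily pick m rows of `grads` that best cover the rest (facility location)."""
    # Pairwise distances between per-example gradient proxies.
    dists = np.linalg.norm(grads[:, None, :] - grads[None, :, :], axis=2)
    sim = dists.max() - dists            # larger value = more similar
    n = grads.shape[0]
    covered = np.zeros(n)                # best similarity of each example to the selected set
    taken = np.zeros(n, dtype=bool)
    selected = []
    for _ in range(min(m, n)):
        # Gain of candidate j: total coverage after adding j, minus current coverage.
        gains = np.maximum(sim, covered[:, None]).sum(axis=0) - covered.sum()
        gains[taken] = -np.inf           # never pick the same example twice
        j = int(np.argmax(gains))
        selected.append(j)
        taken[j] = True
        covered = np.maximum(covered, sim[:, j])
    return selected

def drop_learned(pool, losses, threshold):
    """Keep only pool indices whose current loss is still above `threshold` (assumed criterion)."""
    return pool[losses[pool] > threshold]

# Stand-in data: random "gradients" and losses take the place of a real model here.
rng = np.random.default_rng(0)
grads_all = rng.normal(size=(1000, 8))
losses_all = rng.uniform(size=1000)

pool = drop_learned(np.arange(1000), losses_all, threshold=0.05)
subset = rng.choice(pool, size=64, replace=False)         # larger random subset
batch = subset[greedy_coreset(grads_all[subset], m=16)]   # mini-batch coreset indices
```

In a training loop, `batch` would index the examples used for one SGD step, and the subset would be redrawn for each new mini-batch; how often the pool is pruned is left open here.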

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2306.01244 [cs.LG]
	(or arXiv:2306.01244v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2306.01244

Submission history

From: Yu Yang
[v1] Fri, 2 Jun 2023 02:51:08 UTC (670 KB)
