Computer Science > Machine Learning

arXiv:2311.18557 (cs)

[Submitted on 30 Nov 2023]

Title: Can semi-supervised learning use all the data effectively? A lower bound perspective

Authors: Alexandru Ţifrea, Gizem Yüce, Amartya Sanyal, Fanny Yang


Abstract: Prior work has shown that semi-supervised learning (SSL) algorithms can leverage unlabeled data to improve over the labeled sample complexity of supervised learning (SL) algorithms. However, existing theoretical analyses focus on regimes where the unlabeled data is sufficient to learn a good decision boundary using unsupervised learning (UL) alone. This raises the question: Can SSL algorithms simultaneously improve upon both UL and SL? To this end, we derive a tight lower bound for 2-Gaussian mixture models that explicitly depends on the labeled and the unlabeled dataset size as well as the signal-to-noise ratio of the mixture distribution. Surprisingly, our result implies that no SSL algorithm can improve upon the minimax-optimal statistical error rates of SL or UL algorithms for these distributions. Nevertheless, we show empirically on real-world data that SSL algorithms can still outperform UL and SL methods. Therefore, our work suggests that, while proving performance gains for SSL algorithms is possible, it requires careful tracking of constants.
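
The setting in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code or experimental setup. It assumes a symmetric, isotropic 2-Gaussian mixture with means ±θ, and every name in it (`sample_gmm`, `sl_direction`, `ul_direction`, `fix_sign`), as well as the sample sizes and the choice of θ, is hypothetical. The supervised estimate uses only the labeled points, while the unsupervised estimate uses only the unlabeled points and borrows the labels solely to resolve the sign ambiguity.

```python
import math
import random


def sample_gmm(n, theta, rng):
    """Draw n points from 0.5*N(theta, I) + 0.5*N(-theta, I), labels in {-1, +1}."""
    xs, ys = [], []
    for _ in range(n):
        y = rng.choice((-1, 1))
        xs.append([y * t + rng.gauss(0.0, 1.0) for t in theta])
        ys.append(y)
    return xs, ys


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [c / norm for c in v]


def sl_direction(xs, ys):
    """Supervised estimate: direction of the sum of y_i * x_i (labels required)."""
    d = len(xs[0])
    return normalize([sum(y * x[j] for x, y in zip(xs, ys)) for j in range(d)])


def ul_direction(xs, iters=30):
    """Unsupervised estimate: top eigenvector of sum_i x_i x_i^T via power iteration.
    Without labels, the sign of the returned direction is arbitrary."""
    d = len(xs[0])
    v = normalize([1.0] * d)
    for _ in range(iters):
        w = [0.0] * d
        for x in xs:
            proj = dot(x, v)
            for j in range(d):
                w[j] += proj * x[j]
        v = normalize(w)
    return v


def fix_sign(v, xs, ys):
    """Use a few labeled points only to choose between v and -v."""
    score = sum(y * dot(x, v) for x, y in zip(xs, ys))
    return v if score >= 0 else [-c for c in v]


# Assumed toy configuration: 5 dimensions, 20 labeled and 2000 unlabeled points.
rng = random.Random(0)
theta = [1.5, 0.0, 0.0, 0.0, 0.0]
lab_x, lab_y = sample_gmm(20, theta, rng)
unl_x, _ = sample_gmm(2000, theta, rng)  # labels discarded

truth = normalize(theta)
theta_sl = sl_direction(lab_x, lab_y)
theta_ul = fix_sign(ul_direction(unl_x), lab_x, lab_y)

# Cosine similarity to the true direction (all vectors are unit norm).
print("SL cosine:", dot(theta_sl, truth))
print("UL cosine:", dot(theta_ul, truth))
```

Varying the labeled size, the unlabeled size, and the norm of `theta` (the signal-to-noise ratio under the identity covariance assumed here) shows how the two baselines trade off, which is the comparison the lower bound is stated against.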

Comments:	Published in Advances in Neural Information Processing Systems 2023
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2311.18557 [cs.LG]
	(or arXiv:2311.18557v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2311.18557

Submission history

From: Alexandru Ţifrea
[v1] Thu, 30 Nov 2023 13:48:50 UTC (1,138 KB)
