Computer Science > Machine Learning

arXiv:2312.13978 (cs)

[Submitted on 21 Dec 2023 (v1), last revised 1 Apr 2024 (this version, v2)]

Title: Metalearning with Very Few Samples Per Task

Authors: Maryam Aliakbarpour, Konstantina Bairaktari, Gavin Brown, Adam Smith, Nathan Srebro, Jonathan Ullman

Abstract: Metalearning and multitask learning are two frameworks for solving a group of related learning tasks more efficiently than we could hope to solve each of the individual tasks on their own. In multitask learning, we are given a fixed set of related learning tasks and need to output one accurate model per task, whereas in metalearning we are given tasks that are drawn i.i.d. from a metadistribution and need to output some common information that can be easily specialized to new tasks from the metadistribution.
We consider a binary classification setting where tasks are related by a shared representation, that is, every task $P$ can be solved by a classifier of the form $f_{P} \circ h$ where $h \in H$ is a map from features to a representation space that is shared across tasks, and $f_{P} \in F$ is a task-specific classifier from the representation space to labels. The main question we ask is: how much data do we need to metalearn a good representation? Here, the amount of data is measured in terms of the number of tasks $t$ that we need to see and the number of samples $n$ per task. We focus on the regime where $n$ is extremely small. Our main result shows that, in a distribution-free setting where the feature vectors are in $\mathbb{R}^d$, the representation is a linear map from $\mathbb{R}^d \to \mathbb{R}^k$, and the task-specific classifiers are halfspaces in $\mathbb{R}^k$, we can metalearn a representation with error $\varepsilon$ using $n = k+2$ samples per task, and $d \cdot (1/\varepsilon)^{O(k)}$ tasks. Learning with so few samples per task is remarkable because metalearning would be impossible with $k+1$ samples per task, and because we cannot even hope to learn an accurate task-specific classifier with $k+2$ samples per task. Our work also yields a characterization of distribution-free multitask learning and reductions between meta and multitask learning.
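To make the data model concrete, the following is a minimal sketch that only simulates the setting described above: $t$ tasks, each labeled by $f_P \circ h$ with a shared linear map $h: \mathbb{R}^d \to \mathbb{R}^k$ and a task-specific halfspace $f_P$ on $\mathbb{R}^k$, with $n$ examples per task. It is not the authors' metalearning algorithm. The function names, the Gaussian feature and parameter distributions, and the use of affine halfspaces $\mathrm{sign}(\langle w, z\rangle + b)$ are all hypothetical illustration choices; the paper's result is stated for a distribution-free setting.

```python
import random

def sample_shared_representation(d, k, rng):
    """Hypothetical choice: a random linear map h: R^d -> R^k stored as a k x d matrix."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]

def apply_linear(A, x):
    """Compute h(x) = A x for a k x d matrix A and a length-d vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def sample_task(k, rng):
    """Hypothetical task P: an affine halfspace f_P(z) = sign(<w, z> + b) on R^k."""
    w = [rng.gauss(0.0, 1.0) for _ in range(k)]
    b = rng.gauss(0.0, 1.0)
    return w, b

def classify(task, z):
    """Evaluate the halfspace f_P on a representation z, returning a label in {-1, +1}."""
    w, b = task
    return 1 if sum(wi * zi for wi, zi in zip(w, z)) + b >= 0 else -1

def sample_metalearning_data(d, k, t, n, seed=0):
    """Draw t tasks with n labeled examples each; every task shares the same h."""
    rng = random.Random(seed)
    A = sample_shared_representation(d, k, rng)
    datasets = []
    for _ in range(t):
        task = sample_task(k, rng)
        examples = []
        for _ in range(n):
            x = [rng.gauss(0.0, 1.0) for _ in range(d)]  # Gaussian features: an assumption
            y = classify(task, apply_linear(A, x))        # label y = f_P(h(x))
            examples.append((x, y))
        datasets.append(examples)
    return A, datasets

# Usage: k = 2, so n = k + 2 = 4 examples per task, as in the main result.
d, k = 10, 2
A, data = sample_metalearning_data(d, k, t=5, n=k + 2)
print(len(data), [len(task_examples) for task_examples in data])
```

With $n = k + 2$ each task contributes only a handful of labeled points, which illustrates why the abstract stresses that an accurate per-task classifier cannot be learned from a single task's data; a metalearner would instead have to pool information about $h$ across many tasks.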

Subjects:	Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS)
Cite as:	arXiv:2312.13978 [cs.LG]
	(or arXiv:2312.13978v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2312.13978

Submission history

From: Konstantina Bairaktari
[v1] Thu, 21 Dec 2023 16:06:44 UTC (65 KB)
[v2] Mon, 1 Apr 2024 14:13:22 UTC (165 KB)