Computer Science > Computation and Language

arXiv:1909.00161 (cs)

[Submitted on 31 Aug 2019]

Title:Benchmarking Zero-shot Text Classification: Datasets, Evaluation and Entailment Approach

Authors:Wenpeng Yin, Jamaal Hay, Dan Roth

View PDF

Abstract:Zero-shot text classification (0Shot-TC) is a challenging NLU problem to which little attention has been paid by the research community. 0Shot-TC aims to associate an appropriate label with a piece of text, irrespective of the text domain and the aspect (e.g., topic, emotion, event, etc.) described by the label. And there are only a few articles studying 0Shot-TC, all focusing only on topical categorization which, we argue, is just the tip of the iceberg in 0Shot-TC. In addition, the chaotic experiments in literature make no uniform comparison, which blurs the progress.
This work benchmarks the 0Shot-TC problem by providing unified datasets, standardized evaluations, and state-of-the-art baselines. Our contributions include: i) The datasets we provide facilitate studying 0Shot-TC relative to conceptually different and diverse aspects: the ``topic'' aspect includes ``sports'' and ``politics'' as labels; the ``emotion'' aspect includes ``joy'' and ``anger''; the ``situation'' aspect includes ``medical assistance'' and ``water shortage''. ii) We extend the existing evaluation setup (label-partially-unseen) -- given a dataset, train on some labels, test on all labels -- to include a more challenging yet realistic evaluation label-fully-unseen 0Shot-TC (Chang et al., 2008), aiming at classifying text snippets without seeing task specific training data at all. iii) We unify the 0Shot-TC of diverse aspects within a textual entailment formulation and study it this way.
Code & Data: this https URL

Comments:	EMNLP2019 camera-ready, 10 pages
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1909.00161 [cs.CL]
	(or arXiv:1909.00161v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1909.00161

Submission history

From: Wenpeng Yin [view email]
[v1] Sat, 31 Aug 2019 07:42:11 UTC (112 KB)

Computer Science > Computation and Language

Title:Benchmarking Zero-shot Text Classification: Datasets, Evaluation and Entailment Approach

Submission history

Access Paper:

References & Citations

DBLP - CS Bibliography

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computation and Language

Title:Benchmarking Zero-shot Text Classification: Datasets, Evaluation and Entailment Approach

Submission history

Access Paper:

References & Citations

DBLP - CS Bibliography

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators