arXiv:2305.13723 (cs)

[Submitted on 23 May 2023 (v1), last revised 20 Oct 2023 (this version, v2)]

Title: PIEClass: Weakly-Supervised Text Classification with Prompting and Noise-Robust Iterative Ensemble Training

Authors: Yunyi Zhang, Minhao Jiang, Yu Meng, Yu Zhang, Jiawei Han

Abstract: Weakly-supervised text classification trains a classifier using the label name of each target class as the only supervision, which largely reduces human annotation efforts. Most existing methods first use the label names as static keyword-based features to generate pseudo labels, which are then used for final classifier training. While reasonable, such a commonly adopted framework suffers from two limitations: (1) keywords can have different meanings in different contexts and some text may not have any keyword, so keyword matching can induce noisy and inadequate pseudo labels; (2) the errors made in the pseudo label generation stage will directly propagate to the classifier training stage without a chance of being corrected. In this paper, we propose a new method, PIEClass, consisting of two modules: (1) a pseudo label acquisition module that uses zero-shot prompting of pre-trained language models (PLMs) to get pseudo labels based on contextualized text understanding beyond static keyword matching, and (2) a noise-robust iterative ensemble training module that iteratively trains classifiers and updates pseudo labels by utilizing two PLM fine-tuning methods that regularize each other. Extensive experiments show that PIEClass achieves overall better performance than existing strong baselines on seven benchmark datasets and even achieves similar performance to fully-supervised classifiers on sentiment classification tasks.
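The two ideas in the abstract can be illustrated with a toy sketch in plain Python. It is not the authors' implementation. The first function shows static keyword matching and the two failure modes the abstract names (a keyword used in the wrong sense, and text with no keyword at all). The second function shows one possible way two classifiers could regularize each other: keep a pseudo label only when both agree and both are confident. The abstract does not specify the selection rule, so the agreement rule, the threshold, the keyword lists, and all function names here are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical label names and keywords; not taken from the paper.
KEYWORDS: Dict[str, List[str]] = {
    "sports": ["sports", "game"],
    "technology": ["technology", "windows"],
}


def keyword_pseudo_label(text: str) -> Optional[str]:
    """Static keyword matching: pick the class with the most keyword hits.

    Returns None when no keyword of any class occurs in the text.
    """
    tokens = text.lower().split()
    counts = {
        label: sum(tokens.count(keyword) for keyword in keywords)
        for label, keywords in KEYWORDS.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


def select_agreed(
    preds_a: Sequence[Tuple[str, float]],
    preds_b: Sequence[Tuple[str, float]],
    threshold: float = 0.9,
) -> Dict[int, str]:
    """Assumed selection rule: keep sample i only if both classifiers
    predict the same label and both confidences reach the threshold.

    Each prediction is a (label, confidence) pair.
    """
    selected: Dict[int, str] = {}
    for i, ((label_a, conf_a), (label_b, conf_b)) in enumerate(zip(preds_a, preds_b)):
        if label_a == label_b and min(conf_a, conf_b) >= threshold:
            selected[i] = label_a
    return selected


if __name__ == "__main__":
    # Noisy label: "windows" is a keyword but is used in a non-technical sense.
    print(keyword_pseudo_label("She cleaned the windows of her house"))
    # Inadequate coverage: a sports sentence that contains no keyword.
    print(keyword_pseudo_label("The striker scored twice in the final"))

    # Made-up predictions from two hypothetical classifiers.
    preds_a = [("pos", 0.95), ("neg", 0.97), ("pos", 0.60)]
    preds_b = [("pos", 0.92), ("pos", 0.99), ("pos", 0.95)]
    print(select_agreed(preds_a, preds_b))
```

In an iterative scheme, the samples returned by a filter like `select_agreed` would serve as the training set for the next round, so that an error made by one classifier is dropped unless the other classifier repeats it.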

Comments:	Accepted to EMNLP 2023 Main Conference
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2305.13723 [cs.CL]
	(or arXiv:2305.13723v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2305.13723

Submission history

From: Yunyi Zhang
[v1] Tue, 23 May 2023 06:19:14 UTC (1,565 KB)
[v2] Fri, 20 Oct 2023 15:14:34 UTC (2,321 KB)