Abstract
To address the scarcity of labeled data in document classification tasks, semi-supervised learning, in which unlabeled samples are also utilized for training, has been widely studied. Self-training is one of the classic strategies for semi-supervised learning, in which a classifier iteratively trains itself on its own predictions. However, self-training has mostly been applied to multi-class classification and rarely to the multi-label scenario. In this paper, we propose a self-training-based approach for semi-supervised multi-label document classification that introduces semantic-space finetuning and integrates it into the self-training process. Newly discovered credible predictions are used not only for classifier finetuning but also for semantic-space finetuning, which in turn benefits label propagation in exploring further credible predictions. Experimental results confirm the effectiveness of the proposed approach and show a satisfactory improvement over the baseline methods.
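To make the loop described above concrete, the following is a minimal, self-contained sketch of self-training combined with one-step label propagation for multi-label classification. It is not the authors' implementation: TF-IDF re-fitting stands in for the paper's semantic-space finetuning, a cosine-similarity weighted average stands in for the paper's label-propagation scheme, and the confidence threshold and round count are illustrative assumptions.

```python
# Sketch only: TF-IDF substitutes for a finetuned semantic space, and a
# one-step cosine-similarity average substitutes for full label propagation.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.multiclass import OneVsRestClassifier

def self_train(X_texts, Y, U_texts, rounds=3, threshold=0.8):
    X_texts, U_texts = list(X_texts), list(U_texts)
    Y = np.asarray(Y, dtype=int)  # (n_labeled, n_labels) binary indicator
    clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
    for _ in range(rounds):
        # Re-fit the representation on the enlarged pool each round: the
        # stand-in for re-finetuning the semantic space on new pseudo-labels.
        vec = TfidfVectorizer().fit(X_texts + U_texts)
        Xl = vec.transform(X_texts)

        # Finetune the multi-label classifier on the current labeled pool.
        clf.fit(Xl, Y)
        if not U_texts:
            break

        # One-step label propagation: each unlabeled document inherits a
        # similarity-weighted average of the labeled label vectors.
        Xu = vec.transform(U_texts)
        S = cosine_similarity(Xu, Xl)
        P = S @ Y / np.maximum(S.sum(axis=1, keepdims=True), 1e-9)

        # Keep only credible predictions and move them to the labeled pool,
        # so the next round's representation and propagation improve.
        keep = np.where(P.max(axis=1) >= threshold)[0]
        if keep.size == 0:
            break
        X_texts += [U_texts[i] for i in keep]
        Y = np.vstack([Y, (P[keep] >= threshold).astype(int)])
        U_texts = [t for i, t in enumerate(U_texts) if i not in set(keep)]
    return clf, vec
```

After training, new documents would be classified with clf.predict(vec.transform(new_docs)); a faithful implementation would replace the TF-IDF step with Sentence-BERT embeddings finetuned on the pseudo-labeled pool.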
Cite this paper
Xu, Z., Iwaihara, M. (2021). Integrating Semantic-Space Finetuning and Self-Training for Semi-Supervised Multi-label Text Classification. In: Ke, H.R., Lee, C.S., Sugiyama, K. (eds.) Towards Open and Trustworthy Digital Societies. ICADL 2021. Lecture Notes in Computer Science, vol. 13133. Springer, Cham. https://doi.org/10.1007/978-3-030-91669-5_20