DOI:10.1145/1557019.1557119

Effective multi-label active learning for text classification

Published: 28 June 2009

Abstract

Labeling text data is time-consuming but essential for automatic text classification. In particular, manually assigning multiple labels to each document may become impractical when a very large amount of data is needed to train multi-label text classifiers. To minimize the human labeling effort, we propose a novel multi-label active learning approach that reduces the amount of labeled data required without sacrificing classification accuracy. Traditional active learning algorithms can only handle single-label problems, in which each data point is restricted to one label. Our approach takes the multi-label information into account and selects the unlabeled data that lead to the largest reduction of the expected model loss. Specifically, the model loss is approximated by the size of the version space, and the reduction rate of the version space size is optimized with Support Vector Machines (SVM). An effective label prediction method is designed to predict the possible labels for each unlabeled data point, and the expected loss for multi-label data is approximated by summing the losses over all labels according to the most confident result of label prediction. Experiments on several real-world data sets (all publicly available) demonstrate that our approach obtains promising classification results with far fewer labeled data than state-of-the-art methods.
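The selection step described in the abstract can be illustrated with a small sketch. It is not the paper's implementation: it assumes that per-label SVM decision values are already available for each unlabeled document, that a separate predictor has estimated how many labels each document has, and that the per-label loss term takes the hinge-style form (1 - y_i * f_i) / 2. All function names and the toy numbers are hypothetical.

```python
# Illustrative sketch only. Assumptions (not taken from the abstract):
# - decision_values[i] is the output f_i(x) of a binary SVM for label i
# - n_positive is an externally predicted number of labels for x
# - the per-label loss term is (1 - y_i * f_i(x)) / 2


def predict_label_vector(decision_values, n_positive):
    """Mark the n_positive highest-scoring labels as +1 and the rest as -1."""
    ranked = sorted(range(len(decision_values)),
                    key=lambda i: decision_values[i], reverse=True)
    positive = set(ranked[:n_positive])
    return [1 if i in positive else -1 for i in range(len(decision_values))]


def summed_loss_score(decision_values, label_vector):
    """Sum the assumed per-label term (1 - y_i * f_i) / 2 over all labels."""
    return sum((1 - y * f) / 2 for f, y in zip(decision_values, label_vector))


def select_query(pool_decision_values, pool_label_counts):
    """Return (index, score) of the pool item with the largest summed score."""
    best_index, best_score = None, float("-inf")
    for idx, (values, count) in enumerate(
            zip(pool_decision_values, pool_label_counts)):
        labels = predict_label_vector(values, count)
        score = summed_loss_score(values, labels)
        if score > best_score:
            best_index, best_score = idx, score
    return best_index, best_score


# Toy pool: three unlabeled documents, three candidate labels each.
pool = [[2.0, -1.5, -2.0], [0.2, 0.1, -0.3], [1.0, 0.9, -1.2]]
counts = [1, 1, 2]  # hypothetical predicted label counts
chosen, score = select_query(pool, counts)
print(chosen, round(score, 2))
```

In this toy pool the second document is chosen because its decision values all lie close to zero, so the classifiers are least certain about it under the predicted label assignment.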

Published In

KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
June 2009
1426 pages
ISBN:9781605584959
DOI:10.1145/1557019
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. active learning
  2. multi-label classification
  3. support vector machines
  4. text classification

Qualifiers

  • Research-article

Conference

KDD09

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
