research-article

On the Thresholding Strategy for Infrequent Labels in Multi-label Classification

Authors:

Chih-Jen Lin

CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management

Pages 1441 - 1450

https://doi.org/10.1145/3583780.3614996

Published: 21 October 2023

Abstract

In multi-label classification, imbalance between labels is often a concern. For a label that seldom occurs, the default threshold used to binarize predictions for that label is usually sub-optimal. However, directly tuning the threshold to optimize the F-measure has been observed to overfit easily. In this work, we explain why this overfitting occurs. We then analyze the FBR heuristic, a previously proposed technique for addressing the overfitting issue, explaining its success but also pointing out some previously unobserved problems. We then propose two remedies. First, we propose a variant of the FBR heuristic that not only fixes these problems but is also better justified. Second, we propose a new technique based on smoothing the F-measure when tuning the threshold. We theoretically prove that, with proper parameters, smoothing yields desirable properties of the tuned threshold. Building on the idea of smoothing, we further propose jointly optimizing micro-F and macro-F as a lightweight alternative that requires no extra hyperparameters. Our methods are evaluated empirically on text and node classification datasets, and the results show that they consistently outperform the FBR heuristic.
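To make the thresholding setting concrete, here is a minimal Python sketch of per-label threshold tuning on held-out validation scores. It is an illustration under stated assumptions, not the paper's implementation. It picks the threshold that maximizes F1 = 2TP / (2TP + FP + FN) for one label. The optional `alpha` and `beta` parameters are a hypothetical additive smoothing of the F-measure, chosen only to show where smoothing would enter. The optional `fbr` fallback follows our reading of the SCutFBR.1 heuristic described by Yang (2001): if the best validation F1 is below `fbr`, the threshold is raised to the highest validation score. The names `f_measure` and `tune_threshold` are hypothetical.

```python
from typing import Optional, Sequence


def f_measure(tp: int, fp: int, fn: int,
              alpha: float = 0.0, beta: float = 0.0) -> float:
    """F1 = 2TP / (2TP + FP + FN).

    alpha and beta are a hypothetical additive smoothing (assumption, not
    necessarily the form used in the paper); alpha = beta = 0 gives plain F1.
    """
    denom = 2 * tp + fp + fn + beta
    if denom == 0:
        return 0.0  # no positives and nothing predicted: define F1 as 0
    return (2 * tp + alpha) / denom


def tune_threshold(scores: Sequence[float], labels: Sequence[int],
                   alpha: float = 0.0, beta: float = 0.0,
                   fbr: Optional[float] = None) -> float:
    """Return a threshold t for one label; predict positive when score >= t."""
    # Rank validation instances by decreasing decision value.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    total_pos = sum(1 for y in labels if y == 1)

    # Start from "predict nothing": a threshold above every score.
    best_t = float("inf")
    best_f = f_measure(0, 0, total_pos, alpha, beta)

    tp = fp = 0
    for i, (s, y) in enumerate(pairs):
        if y == 1:
            tp += 1
        else:
            fp += 1
        # Tied scores must share one decision, so only evaluate at the
        # last instance of each block of tied scores.
        if i + 1 < len(pairs) and pairs[i + 1][0] == s:
            continue
        f = f_measure(tp, fp, total_pos - tp, alpha, beta)
        if f > best_f:
            best_f, best_t = f, s

    # FBR-style fallback (our reading of SCutFBR.1, an assumption): if even
    # the best validation F1 is below fbr, raise the threshold to the top score.
    if fbr is not None and best_f < fbr and pairs:
        best_t = pairs[0][0]
    return best_t
```

For example, `tune_threshold([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])` returns `0.8`, the lowest score at which both positives and no negatives are predicted positive. For a rare label the validation set may hold only one or two positives, so the unsmoothed argmax can hinge on a single example; the `alpha`/`beta` and `fbr` knobs in this sketch are two ways of tempering that choice.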

Cited By

Schultheis E, Wydmuch M, Kotłowski W, Babbar R, Dembczyński K (2023). Generalized test utilities for long-tail performance in extreme multi-label classification. In Proceedings of the 37th International Conference on Neural Information Processing Systems, eds. Oh A, Naumann T, Globerson A, Saenko K, Hardt M, Levine S, 22269-22303. DOI: 10.5555/3666122.3667100. Online publication date: 10-Dec-2023.
https://dl.acm.org/doi/10.5555/3666122.3667100

Index Terms

1. Computing methodologies
  1. Machine learning




Published In


CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management

October 2023

5508 pages

ISBN: 9798400701245

DOI: 10.1145/3583780

General Chairs:
Ingo Frommholz (University of Wolverhampton, UK)
Frank Hopfgartner (University of Koblenz, Germany)
Mark Lee (University of Birmingham, UK)
Michael Oakes (University of Birmingham, UK)

Program Chairs:
Mounia Lalmas (Spotify, UK)
Min Zhang (Tsinghua University, China)
Rodrygo Santos (Federal University of Minas Gerais, Brazil)

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


Publisher

Association for Computing Machinery

New York, NY, United States


Funding Sources

National Science and Technology Council

Conference

CIKM '23: The 32nd ACM International Conference on Information and Knowledge Management

October 21 - 25, 2023

Birmingham, United Kingdom

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%


Bibliometrics

Total Citations: 1
Total Downloads: 113
Downloads (Last 12 months): 93
Downloads (Last 6 weeks): 11

Reflects downloads up to 22 Nov 2024
