DOI: 10.1145/3416028.3416042
research-article

A Comparative Study on Three Multi-Label Classification Tools

Published: 21 September 2020

Abstract

Many science, technology and innovation (STI) resources carry several different labels at once, such as IPC and CPC codes for patents and PACS (Physics and Astronomy Classification Scheme) numbers for scientific publications. Assigning such labels is known as multi-label classification. Although the literature offers a number of approaches and open-source tools that work well on benchmark datasets, real-world data are more complex in both the number and the hierarchy of labels. This work comprehensively compares the performance of three state-of-the-art tools, Dependency LDA, Scikit-Multilearn and Neural Classifier, on Scigraph academic resource data. Neural Classifier is found to outperform the other two tools in terms of Micro F1, Macro F1 and Hamming Loss on a dataset with an unbalanced label distribution, a more complex hierarchical structure and a larger number of labels. On the basis of these comparisons, several directions for future work are suggested.
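
Micro-averaged F1, macro-averaged F1 and Hamming loss are standard measures for evaluating multi-label classifiers. The sketch below shows how they are commonly computed from binary label-indicator matrices. It is an illustration under stated assumptions, not the evaluation code used in the paper: the function names and the toy data are hypothetical, and the definitions are the usual textbook ones.

```python
from typing import List, Tuple

Matrix = List[List[int]]  # rows = documents, columns = labels, entries 0/1


def label_counts(y_true: Matrix, y_pred: Matrix, label: int) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) for one label."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        t, p = t_row[label], p_row[label]
        if t and p:
            tp += 1
        elif p and not t:
            fp += 1
        elif t and not p:
            fn += 1
    return tp, fp, fn


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Pool the counts over all labels, then compute a single F1."""
    tp = fp = fn = 0
    for label in range(len(y_true[0])):
        a, b, c = label_counts(y_true, y_pred, label)
        tp, fp, fn = tp + a, fp + b, fn + c
    return f1_from_counts(tp, fp, fn)


def macro_f1(y_true: Matrix, y_pred: Matrix) -> float:
    """Compute F1 per label, then take the unweighted mean."""
    n_labels = len(y_true[0])
    scores = [f1_from_counts(*label_counts(y_true, y_pred, j)) for j in range(n_labels)]
    return sum(scores) / n_labels


def hamming_loss(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of (document, label) entries that are predicted wrongly."""
    wrong = sum(
        t != p
        for t_row, p_row in zip(y_true, y_pred)
        for t, p in zip(t_row, p_row)
    )
    return wrong / (len(y_true) * len(y_true[0]))


# Toy data (hypothetical): 4 documents, 3 labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 1]]

print("Micro F1:    ", micro_f1(y_true, y_pred))
print("Macro F1:    ", macro_f1(y_true, y_pred))
print("Hamming loss:", hamming_loss(y_true, y_pred))
```

Micro averaging weights every label assignment equally, so frequent labels dominate the score, whereas macro averaging weights every label equally, so rare labels count as much as common ones. This difference matters most on datasets with an unbalanced label distribution. For Hamming loss, lower values are better.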

Published In

IMMS '20: Proceedings of the 3rd International Conference on Information Management and Management Science
August 2020
120 pages
ISBN:9781450375467
DOI:10.1145/3416028
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

  • Southwest Jiaotong University

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Dependency LDA
  2. Multi-label classification
  3. Neural Classifier
  4. Scikit-Multilearn

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Social Science Foundation of Beijing Municipality
  • Fundamental Research Funds for the Central Universities
  • Natural Science Foundation of Guangdong Province

Conference

IMMS 2020
