mldr.resampling: Efficient reference implementations of multilabel resampling algorithms

Published: 28 November 2023

Abstract

Resampling algorithms are a useful way to address imbalanced learning in multilabel scenarios. These methods have to cope with the peculiarities of multilabel data, such as frequent and infrequent labels occurring in the same instance. For some of these methods, the only available implementation is the pseudocode that the authors provided in their paper. This Original Software Publication presents mldr.resampling, a software package that provides reference implementations of eleven multilabel resampling methods, with an emphasis on efficiency, since these algorithms are usually time-consuming.
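
As a rough illustration of why frequent and infrequent labels sharing an instance complicate resampling, the sketch below computes the per-label imbalance ratio (IRLbl) and its mean (MeanIR), two measures commonly used in the multilabel imbalance literature, on a toy label matrix. It is written in Python for brevity and does not use or reproduce the mldr.resampling API; the function names and the data are assumptions made for this example only.

```python
from typing import List

def label_counts(Y: List[List[int]]) -> List[int]:
    """Number of instances in which each label is active (column sums)."""
    return [sum(col) for col in zip(*Y)]

def irlbl(Y: List[List[int]]) -> List[float]:
    """Per-label imbalance ratio: count of the most frequent label
    divided by the count of each label (1.0 for the most frequent)."""
    counts = label_counts(Y)
    top = max(counts)
    return [top / c if c > 0 else float("inf") for c in counts]

def mean_ir(Y: List[List[int]]) -> float:
    """Average IRLbl over the labels that appear at least once."""
    ratios = [r for r in irlbl(Y) if r != float("inf")]
    return sum(ratios) / len(ratios)

# Toy label matrix (hypothetical): 6 instances, 3 labels.
Y = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 1],  # the rare label 2 appears together with the frequent label 0
    [0, 1, 0],
]

print(label_counts(Y), irlbl(Y), mean_ir(Y))

# Naive oversampling: clone the only instance that carries label 2.
Y_cloned = Y + [Y[4]]
print(label_counts(Y_cloned), irlbl(Y_cloned), mean_ir(Y_cloned))
```

Cloning the instance that holds the rare label also adds one more occurrence of the most frequent label, so the imbalance ratio of the untouched label 1 gets worse even though MeanIR goes down. This side effect of label co-occurrence is what multilabel resampling methods have to account for.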

Published In

Neurocomputing  Volume 559, Issue C
Nov 2023
442 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands

Author Tags

  1. Multilabel learning
  2. Imbalanced learning
  3. Resampling algorithms
  4. R software package

Qualifiers

  • Article
