Abstract
This paper proposes \(\text {Auto-MEKA}_{\text {GGP}}\), an Automated Machine Learning (Auto-ML) method for Multi-Label Classification (MLC) based on the MEKA tool, which offers a number of MLC algorithms. In MLC, each example can be associated with one or more class labels, making MLC problems harder than conventional (single-label) classification problems. Hence, it is essential to select an MLC algorithm and its configuration tailored (optimized) for the input dataset. \(\text {Auto-MEKA}_{\text {GGP}}\) addresses this problem with two key ideas. First, a large number of choices of MLC algorithms and configurations from MEKA are represented into a grammar. Second, our proposed Grammar-based Genetic Programming (GGP) method uses that grammar to search for the best MLC algorithm and configuration for the input dataset. \(\text {Auto-MEKA}_{\text {GGP}}\) was tested in 10 datasets and compared to two well-known MLC methods, namely Binary Relevance and Classifier Chain, and also compared to GA-Auto-MLC, a genetic algorithm we recently proposed for the same task. Two versions of \(\text {Auto-MEKA}_{\text {GGP}}\) were tested: a full version with the proposed grammar, and a simplified version where the grammar includes only the algorithmic components used by GA-Auto-MLC. Overall, the full version of \(\text {Auto-MEKA}_{\text {GGP}}\) achieved the best predictive accuracy among all five evaluated methods, being the winner in six out of the 10 datasets.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
Code and documentation are available at: https://github.com/laic-ufmg/automlc/.
- 2.
The implementation of the grammar(s) for EpochX is available at: https://github.com/laic-ufmg/automlc/tree/master/PPSN.
- 3.
The datasets are available at: http://www.uco.es/kdis/mllresources/.
- 4.
- 5.
References
de Sá, A.G.C., Freitas, A.A., Pappa, G.L.: Multi-label classification search space in the MEKA software. Technical report, UFMG (2018). https://github.com/laic-ufmg/automlc/tree/master/PPSN/MLC-SearchSpace.pdf
de Sá, A.G.C., Pappa, G.L., Freitas, A.A.: Towards a method for automatically selecting and configuring multi-label classification algorithms. In: Proceedings of GECCO Companion, pp. 1125–1132 (2017)
de Sá, A.G.C., Pinto, W.J.G.S., Oliveira, L.O.V.B., Pappa, G.L.: RECIPE: a grammar-based framework for automatically evolving classification pipelines. In: McDermott, J., Castelli, M., Sekanina, L., Haasdijk, E., García-Sánchez, P. (eds.) EuroGP 2017. LNCS, vol. 10196, pp. 246–261. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-55696-3_16
Demšar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)
Feurer, M., Klein, A., Eggensperger, K., et al.: Efficient and robust automated machine learning. In: Proceedings of the International Conference on Neural Information Processing Systems, pp. 2755–2763 (2015)
Křen, T., Pilát, M., Neruda, R.: Automatic creation of machine learning workflows with strongly typed genetic programming. Int. J. Artif. Intell. Tools 26(5), 1–24 (2017)
Mckay, R., Hoai, N., Whigham, P., Shan, Y., O’Neill, M.: Grammar-based genetic programming: a survey. Genet. Program Evolvable Mach. 11(3), 365–396 (2010)
Olson, R., Bartley, N., Urbanowicz, R., Moore, J.: Evaluation of a tree-based pipeline optimization tool for automating data science. In: Proceedings of GECCO, pp. 485–492 (2016)
Otero, F., Castle, T., Johnson, C.: EpochX: genetic programming in Java with statistics and event monitoring. In: Proceedings of GECCO Companion, pp. 93–100 (2012)
Pedregosa, F., Varoquaux, G., Gramfort, A., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Read, J., Pfahringer, B., Holmes, G., Frank, E.: Classifier chains for multi-label classification. Mach. Learn. 85(3), 333–359 (2011)
Read, J., Reutemann, P., Pfahringer, B., Holmes, G.: MEKA: a multi-label/multi-target extension to WEKA. J. Mach. Learn. Res. 17(21), 1–5 (2016)
Sechidis, K., Tsoumakas, G., Vlahavas, I.: On the stratification of multi-label data. In: Gunopulos, D., Hofmann, T., Malerba, D., Vazirgiannis, M. (eds.) ECML PKDD 2011. LNCS (LNAI), vol. 6913, pp. 145–158. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23808-6_10
Thornton, C., Hutter, F., Hoos, H., Leyton-Brown, K.: Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms. In: Proceedings of the ACM SIGKDD Conference, pp. 847–855 (2013)
Tsoumakas, G., Katakis, I., Vlahavas, I.: Mining multi-label data. In: Maimon, O., Rokach, L. (eds.) Data Mining and Knowledge Discovery Handbook, pp. 667–685. Springer, Boston (2010). https://doi.org/10.1007/978-0-387-09823-4_34
Witten, I., Frank, E., Hall, M.A., Pal, C.: Data Mining: Practical Machine Learning Tools and Techniques, 4th edn. Morgan Kaufmann, Burlington (2016)
Acknowledgments
This work has been partially funded by the EUBra-BIGSEA project by the European Commission under the Cooperation Programme (MCTI/RNP 3rd Coordinated Call), Horizon 2020 grant agreement 690116. In addition, this work has been partially supported by the following Brazilian Research Support Agencies: CNPq, CAPES and FAPEMIG.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
de Sá, A.G.C., Freitas, A.A., Pappa, G.L. (2018). Automated Selection and Configuration of Multi-Label Classification Algorithms with Grammar-Based Genetic Programming. In: Auger, A., Fonseca, C., Lourenço, N., Machado, P., Paquete, L., Whitley, D. (eds) Parallel Problem Solving from Nature – PPSN XV. PPSN 2018. Lecture Notes in Computer Science(), vol 11102. Springer, Cham. https://doi.org/10.1007/978-3-319-99259-4_25
Download citation
DOI: https://doi.org/10.1007/978-3-319-99259-4_25
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99258-7
Online ISBN: 978-3-319-99259-4
eBook Packages: Computer ScienceComputer Science (R0)