DOI: 10.1609/aaai.v38i10.28984
research-article

MEPSI: an MDL-based ensemble pruning approach with structural information

Published: 20 February 2024

Abstract

Ensemble pruning, which combines a subset of the individual learners generated in parallel to make predictions, is an important topic in ensemble learning. Over the past decades, many pruning algorithms have been developed that focus on the external behavior of learners on samples, which may lead to over-fitting. In this paper, we conjecture that the generalization performance of an ensemble depends not only on its external behavior on samples but also on the internal structure of its individual learners. We propose the general MEPSI approach, based on Kolmogorov complexity and the Minimum Description Length (MDL) principle, which formulates ensemble pruning as a two-objective optimization problem comprising the empirical error and the structural information among individual learners. We also provide a concrete implementation of MEPSI for decision trees. Theoretical results give generalization bounds for both the general MEPSI approach and the tree-based implementation, and comparative experiments on multiple real-world data sets demonstrate the effectiveness of the proposed method.
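To make the two-objective formulation above concrete, here is a minimal Python sketch (using scikit-learn) of the general idea: choose a subset of trees by trading off validation error against a structural-information term. Everything specific in it is an assumption made for illustration, not the authors' MEPSI algorithm: the node-count proxy for description length, the trade-off weight lam, the greedy search, and the helper names (structure_cost, ensemble_error, greedy_prune) are all hypothetical, whereas MEPSI defines structural information via Kolmogorov complexity and MDL and comes with its own optimization procedure and theory.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def structure_cost(trees):
    # Crude stand-in for a description-length / structural-information term:
    # total number of nodes in the selected trees. Illustration only.
    return sum(t.tree_.node_count for t in trees)

def ensemble_error(trees, X, y):
    # Empirical error of the majority vote of the selected trees (binary case).
    votes = np.mean([t.predict(X) for t in trees], axis=0)
    return np.mean((votes >= 0.5).astype(int) != y)

def greedy_prune(trees, X_val, y_val, k, lam=1e-4):
    # Greedily pick k trees minimizing error + lam * structural cost.
    # A simple illustrative search, not the optimization used in the paper.
    selected, remaining = [], list(trees)
    while len(selected) < k and remaining:
        best, best_obj = None, np.inf
        for t in remaining:
            cand = selected + [t]
            obj = ensemble_error(cand, X_val, y_val) + lam * structure_cost(cand)
            if obj < best_obj:
                best, best_obj = t, obj
        selected.append(best)
        remaining.remove(best)
    return selected

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
pruned = greedy_prune(forest.estimators_, X_val, y_val, k=10)
print("pruned ensemble validation error:", ensemble_error(pruned, X_val, y_val))

In the paper itself, the structural term is an MDL-based measure computed over the individual trees rather than a node count, and the pruning objective is optimized according to the procedure described there.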

Published In

AAAI'24/IAAI'24/EAAI'24: Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence
February 2024
23861 pages
ISBN: 978-1-57735-887-9

Sponsors

• Association for the Advancement of Artificial Intelligence

Publisher

AAAI Press

