Abstract
Probabilistic Classifier Chains are a multi-label classification method that has gained the attention of researchers in recent years, owing to its ability to estimate the joint conditional probability of an entire label combination through the product rule of probability. Their main drawback is that obtaining Bayes optimal predictions requires an exhaustive search, that is, computing this probability for every possible label combination and then selecting the combination with the highest probability. For this reason, several works have been published in recent years that avoid exploring all combinations. Approaches such as greedy search, beam search and Monte Carlo search reduce the computational cost, but at the price of no longer ensuring Bayes optimal predictions (although, in general, they provide close-to-optimal solutions). Methods based on heuristic search do provide optimal predictions, but their computational time has not been as good as expected. In this respect, the \(\epsilon \)-approximate algorithm has been found to be the best inference approach among those that provide Bayes optimal predictions, thanks to its computational time. However, this paper shows, both theoretically and experimentally, that it sometimes backtracks during the search for optimal predictions, which may prolong the prediction time. The aim of this paper is thus to improve this algorithm by achieving a more direct search. Specifically, the criterion under which the next node to expand is chosen is enhanced with heuristic information, although the enhancement is only applicable to linear-based models. The experiments carried out confirm that the improved \(\epsilon \)-approximate algorithm explores fewer nodes and reduces the computational time of the original version.
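To make the inference problem concrete, the following Python sketch (ours, not the paper's) contrasts exhaustive inference, which is Bayes optimal for subset 0/1 loss, with greedy search. It assumes each link of the chain is available as a callable f(x, prefix) returning \(P(y_j = 1 \mid x, y_1, \ldots, y_{j-1})\); the names (conditionals, joint_probability, and so on) are illustrative.

from itertools import product

def joint_probability(conditionals, x, y):
    # Product rule: P(y | x) = prod_j P(y_j | x, y_1, ..., y_{j-1}).
    p = 1.0
    for j, f in enumerate(conditionals):
        p_one = f(x, y[:j])  # estimated P(y_j = 1 | x, prefix)
        p *= p_one if y[j] == 1 else 1.0 - p_one
    return p

def exhaustive_inference(conditionals, x):
    # Bayes optimal for subset 0/1 loss: argmax over all 2^m label combinations.
    m = len(conditionals)
    return max(product((0, 1), repeat=m),
               key=lambda y: joint_probability(conditionals, x, y))

def greedy_inference(conditionals, x):
    # Follows a single path down the probability tree: m evaluations instead of
    # 2^m, but the result is not guaranteed to be the most probable combination.
    y = []
    for f in conditionals:
        y.append(1 if f(x, tuple(y)) >= 0.5 else 0)
    return tuple(y)

Inference methods such as the \(\epsilon \)-approximate algorithm sit between these two extremes: they prune the search over the \(2^m\) combinations while still returning the same prediction as the exhaustive search.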
Notes
A heuristic is admissible if it never underestimates the gain of reaching the goal, i.e., the gain it estimates for reaching the goal is never lower than the highest gain actually attainable. Under this property, the A* algorithm is guaranteed to return an optimal solution.
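In symbols (the notation is ours, not the paper's): writing \(h(v)\) for the gain the heuristic estimates from node \(v\) to a goal and \(g^*(v)\) for the highest gain actually attainable from \(v\), admissibility in this maximization setting reads \[ h(v) \ge g^*(v) \quad \text{for every node } v, \] the mirror image of the usual cost-minimization condition \(h(v) \le c^*(v)\): the heuristic must be optimistic.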
A heuristic dominates another heuristic if its estimate of the gain of reaching the goal is closer to the actual value (while remaining admissible). A consequence of this property is that A* guided by the dominant heuristic explores no more nodes than A* guided by the other.
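With the same (assumed) notation as in the previous note, \(h_1\) dominates \(h_2\) when \[ g^*(v) \le h_1(v) \le h_2(v) \quad \text{for every node } v, \] in which case A* guided by \(h_1\) expands no more nodes than A* guided by \(h_2\).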
Additional information
This research has been funded by MINECO (the Spanish Ministerio de Economía y Competitividad) and ERDF (European Regional Development Fund) under Grant TIN2015-65069-C2-2-R (MINECO/FEDER).
Cite this article
Fdez-Díaz, M., Fdez-Díaz, L., Mena, D. et al. Improving the \(\epsilon \)-approximate algorithm for Probabilistic Classifier Chains. Knowl Inf Syst 62, 2709–2738 (2020). https://doi.org/10.1007/s10115-020-01436-5