Improving the \(\epsilon \)-approximate algorithm for Probabilistic Classifier Chains

  • Regular Paper
  • Published in Knowledge and Information Systems

Abstract

Probabilistic Classifier Chains are a multi-label classification method that has gained the attention of researchers in recent years. This is due to their ability to optimally estimate the joint conditional probability of an entire label combination through the product rule of probability. Their main drawback is that obtaining Bayes optimal predictions requires an exhaustive search: this probability must be computed for all possible label combinations before selecting the combination with the highest value. For this reason, several inference methods that avoid exploring all combinations have been published in recent years. Approaches such as greedy search, beam search and Monte Carlo sampling reduce the computational cost, but at the price of not guaranteeing Bayes optimal predictions (although, in general, they provide close-to-optimal solutions). Methods based on heuristic search do provide optimal predictions, but their computational time has not been as good as expected. In this respect, the \(\epsilon \)-approximate algorithm has been found to be the best inference approach among those that provide Bayes optimal predictions, both for its optimality and for its computational time. However, this paper shows, both theoretically and experimentally, that it sometimes backtracks during the search for optimal predictions, which may prolong the prediction time. The aim of this paper is thus to improve the algorithm by achieving a more direct search. Specifically, it enhances the criterion by which the next node to be expanded is chosen, adding heuristic information, although the enhancement is only applicable to linear-based models. The experiments carried out confirm that the improved \(\epsilon \)-approximate algorithm explores fewer nodes and reduces the computational time of the original version.
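To make the inference problem concrete, the following is a minimal sketch of exhaustive Bayes-optimal inference in a probabilistic classifier chain, contrasted with greedy search. It is not the paper's implementation: the function names, the `cond_probs` interface and the toy conditional models are assumptions made here purely for illustration.

```python
from itertools import product

def pcc_exhaustive(x, cond_probs):
    """Bayes-optimal inference (subset 0/1 loss) for a probabilistic
    classifier chain: enumerate all 2^m label combinations and keep
    the one with the highest joint conditional probability.

    cond_probs[j](x, prefix) must return P(y_j = 1 | x, y_1..y_{j-1})."""
    m = len(cond_probs)
    best_y, best_p = None, -1.0
    for y in product((0, 1), repeat=m):       # all 2^m combinations
        p = 1.0
        for j in range(m):
            p1 = cond_probs[j](x, y[:j])      # P(y_j = 1 | x, prefix)
            p *= p1 if y[j] else 1.0 - p1     # product rule of probability
        if p > best_p:
            best_y, best_p = y, p
    return best_y, best_p

def pcc_greedy(x, cond_probs):
    """Greedy inference: fix each label to its locally most probable
    value; linear in m, but not guaranteed to be Bayes optimal."""
    y, p = (), 1.0
    for model in cond_probs:
        p1 = model(x, y)
        y += (1,) if p1 >= 0.5 else (0,)
        p *= p1 if y[-1] else 1.0 - p1
    return y, p

# Toy usage: two labels whose conditionals make greedy suboptimal.
models = [
    lambda x, prefix: 0.45,                       # P(y1 = 1 | x)
    lambda x, prefix: 0.9 if prefix[0] else 0.5,  # P(y2 = 1 | x, y1)
]
print(pcc_exhaustive(None, models))  # ((1, 1), 0.405), the true mode
print(pcc_greedy(None, models))      # ((0, 1), 0.275), misses the optimum
```

With \(m\) labels, the exhaustive variant evaluates \(2^m\) combinations; this exponential cost is exactly what greedy search, beam search, Monte Carlo sampling and the \(\epsilon \)-approximate algorithm try to avoid.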


Notes

  1. A heuristic is admissible if it never underestimates the gain of reaching the goal, that is, the gain it estimates for reaching the goal is never lower than the highest gain actually achievable. Under this property, the A* algorithm is guaranteed to return an optimal solution.

  2. A heuristic dominates another if its estimate of the gain of reaching the goal is closer to the actual value. As a consequence, A* equipped with the dominant heuristic explores no more nodes than A* equipped with the other. Both properties are formalized in the sketch after these notes.
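Stated compactly, and writing \(h^{*}(v)\) for the highest gain actually achievable from a node \(v\) to a goal (the symbols \(h\) and \(h^{*}\) are notation introduced here, not the paper's):

\[
h(v) \;\ge\; h^{*}(v) \quad \text{(admissibility)}, \qquad
h^{*}(v) \;\le\; h_{1}(v) \;\le\; h_{2}(v) \quad (h_{1} \text{ dominates } h_{2}),
\]

for every node \(v\); in the second case, A* with \(h_{1}\) expands no more nodes than A* with \(h_{2}\).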


Author information

Correspondence to Elena Montañés.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This research has been funded by MINECO (the Spanish Ministerio de Economía y Competitividad) and ERDF (European Regional Development Fund) under Grant TIN2015-65069-C2-2-R (MINECO/FEDER).


Cite this article

Fdez-Díaz, M., Fdez-Díaz, L., Mena, D. et al. Improving the \(\epsilon \)-approximate algorithm for Probabilistic Classifier Chains. Knowl Inf Syst 62, 2709–2738 (2020). https://doi.org/10.1007/s10115-020-01436-5
