Abstract
Given a binary classification task, a ranker sorts a set of instances from highest to lowest expectation that the instance is positive. We propose a lexicographic ranker, LexRank, whose rankings are derived not from scores, but from a simple ranking of attribute values obtained from the training data. When the odds ratio is used to rank the attribute values, we obtain a restricted version of the naive Bayes ranker. We systematically develop the relationships and differences between classification, ranking, and probability estimation, which leads to a novel connection between the Brier score and ROC curves. Combining LexRank with isotonic regression, which derives probability estimates from the ROC convex hull, results in the lexicographic probability estimator LexProb. Both LexRank and LexProb are empirically evaluated on a range of data sets and shown to be highly effective.
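To make the pipeline described in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It makes several assumptions that the abstract does not state: attribute values are ranked by a Laplace-smoothed odds of the positive class, attributes are compared in the column order supplied by the caller, unseen values are ranked last, and isotonic regression is carried out with the standard pool-adjacent-violators procedure over groups of tied instances. All function names and the toy data are hypothetical.

```python
from collections import defaultdict

def value_ranks(X, y):
    """Per attribute, rank values by smoothed positive-class odds (rank 0 = best).
    The smoothing and the tie-break on str(v) are assumptions of this sketch."""
    ranks = []
    for j in range(len(X[0])):
        pos, neg = defaultdict(int), defaultdict(int)
        for row, label in zip(X, y):
            (pos if label == 1 else neg)[row[j]] += 1
        values = set(pos) | set(neg)
        ordered = sorted(values, key=lambda v: (-(pos[v] + 1) / (neg[v] + 1), str(v)))
        ranks.append({v: r for r, v in enumerate(ordered)})
    return ranks

def lex_key(row, ranks):
    """Lexicographic sort key: tuple of value ranks, earlier attributes dominate.
    Values not seen in training are ranked last (an assumption)."""
    return tuple(r.get(v, len(r)) for v, r in zip(row, ranks))

def pav(segments):
    """Pool adjacent violators over [positives, count] segments given in ranked
    order. Returns one estimate per segment, non-increasing down the ranking."""
    blocks = []  # each block: [positives, instances, segments merged]
    for p, n in segments:
        blocks.append([p, n, 1])
        # Merge while the earlier block has a lower positive rate than the later one.
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] < blocks[-1][0] * blocks[-2][1]:
            p2, n2, k2 = blocks.pop()
            blocks[-1][0] += p2
            blocks[-1][1] += n2
            blocks[-1][2] += k2
    estimates = []
    for p, n, k in blocks:
        estimates.extend([p / n] * k)
    return estimates

def fit_lexprob_sketch(X, y):
    """Rank training instances lexicographically, then calibrate tied groups."""
    ranks = value_ranks(X, y)
    counts = defaultdict(lambda: [0, 0])  # key -> [positives, instances]
    for row, label in zip(X, y):
        seg = counts[lex_key(row, ranks)]
        seg[0] += label
        seg[1] += 1
    keys = sorted(counts)
    return ranks, dict(zip(keys, pav([counts[k] for k in keys])))

def brier(probs, labels):
    """Mean squared difference between estimated probabilities and 0/1 labels."""
    return sum((p - t) ** 2 for p, t in zip(probs, labels)) / len(labels)

# Hypothetical toy data: two nominal attributes, binary labels.
X = [("sunny", "high"), ("sunny", "low"), ("rainy", "high"), ("rainy", "low"),
     ("sunny", "low"), ("rainy", "low"), ("rainy", "high")]
y = [1, 1, 1, 0, 0, 0, 1]

ranks, table = fit_lexprob_sketch(X, y)
ranking = sorted(X, key=lambda row: lex_key(row, ranks))
probs = [table[lex_key(row, ranks)] for row in X]
score = brier(probs, y)
print(ranking)
print(probs, score)
```

Note that the ranking itself needs no numeric score: instances are ordered purely by comparing tuples of value ranks. Probabilities only appear in the second stage, where adjacent groups whose empirical positive rates are out of order are pooled.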
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Flach, P., Matsubara, E.T. (2007). A Simple Lexicographic Ranker and Probability Estimator. In: Kok, J.N., Koronacki, J., Mantaras, R.L.d., Matwin, S., Mladenič, D., Skowron, A. (eds) Machine Learning: ECML 2007. ECML 2007. Lecture Notes in Computer Science, vol 4701. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74958-5_55
DOI: https://doi.org/10.1007/978-3-540-74958-5_55
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74957-8
Online ISBN: 978-3-540-74958-5
eBook Packages: Computer Science (R0)