Beyond Accuracy, F-Score and ROC: A Family of Discriminant Measures for Performance Evaluation

Marina Sokolova,
Nathalie Japkowicz &
Stan Szpakowicz

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4304)

Included in the following conference series:

Australasian Joint Conference on Artificial Intelligence

Abstract

Different evaluation measures assess different characteristics of machine learning algorithms. The empirical evaluation of algorithms and classifiers is a matter of ongoing debate among researchers. Most measures in use today focus on a classifier’s ability to identify classes correctly. We note other useful properties, such as failure avoidance or class discrimination, and we suggest measures to evaluate such properties. These measures – Youden’s index, likelihood, discriminant power – are used in medical diagnosis. We show that they are interrelated, and we apply them to a case study from the field of electronic negotiations. We also list other learning problems which may benefit from the application of these measures.
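
The abstract names the three measures without giving formulas. As a minimal sketch, the Python below computes them from a binary confusion matrix using the definitions common in the diagnostic-testing literature; it is an assumption that the chapter uses exactly these forms. The function name `discriminant_measures` and its argument names are hypothetical and chosen only for this illustration.

```python
import math


def discriminant_measures(tp, fn, fp, tn):
    """Compute Youden's index, likelihood ratios and discriminant power.

    Inputs are the four cells of a binary confusion matrix. The formulas are
    the usual diagnostic-test definitions (assumed, not quoted from the chapter).
    Sensitivity or specificity equal to 0 or 1 makes some ratios undefined,
    so this sketch raises ValueError in those cases.
    """
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must lie strictly in (0, 1)")

    # Youden's index: J = sensitivity + specificity - 1
    youden = sensitivity + specificity - 1

    # Positive and negative likelihood ratios
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity

    # Discriminant power: scaled sum of the log-odds of sensitivity and specificity
    dp = (math.sqrt(3) / math.pi) * (
        math.log(sensitivity / (1 - sensitivity))
        + math.log(specificity / (1 - specificity))
    )

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden": youden,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "dp": dp,
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    for name, value in discriminant_measures(tp=80, fn=20, fp=10, tn=90).items():
        print(f"{name}: {value:.4f}")
```

Under these definitions the measures are linked algebraically: discriminant power equals (√3/π) times the logarithm of the ratio of the positive to the negative likelihood ratio. That is one way to read the abstract's remark that the measures are interrelated.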

We did this work while the first author was at the University of Ottawa. Partial support came from the Natural Sciences and Engineering Research Council of Canada.

Author information

Authors and Affiliations

DIRO, Université de Montréal, Montreal, Canada
Marina Sokolova
SITE, University of Ottawa, Ottawa, Canada
Nathalie Japkowicz
SITE, University of Ottawa, Ottawa, Canada; ICS, Polish Academy of Sciences, Warsaw, Poland
Stan Szpakowicz

Editor information

Editors and Affiliations

DisPRR, National ICT Australia Ltd, QLD, Australia
Abdul Sattar
School of Computing, University of Tasmania, Sandy Bay, 7005, Tasmania, Australia
Byeong-ho Kang

About this paper

Cite this paper

Sokolova, M., Japkowicz, N., Szpakowicz, S. (2006). Beyond Accuracy, F-Score and ROC: A Family of Discriminant Measures for Performance Evaluation. In: Sattar, A., Kang, Bh. (eds) AI 2006: Advances in Artificial Intelligence. AI 2006. Lecture Notes in Computer Science, vol 4304. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11941439_114

DOI: https://doi.org/10.1007/11941439_114
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49787-5
Online ISBN: 978-3-540-49788-2
eBook Packages: Computer Science, Computer Science (R0)
