DOI: 10.1145/3539813.3545121
short-paper

BCubed Revisited: Elements Like Me

Published: 25 August 2022

Abstract

BCubed is a mathematically clean, elegant, and intuitively well-behaved external performance metric for clustering tasks. It compares a predicted clustering to a known ground truth through elementwise precision and recall scores: for each element, the predicted and ground-truth clusters containing that element are compared, and the mean over all elements is taken. We argue that BCubed overestimates performance, for the intuitive reason that the clustering gets credit for putting an element in its own cluster. We repair this and investigate the repaired version, called "Elements Like Me" (ELM). We extensively evaluate ELM and conclude that it retains all positive properties of BCubed and gives a minimum score of zero when it should.
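
The elementwise computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The function name `bcubed`, the `exclude_self` flag, and the handling of singleton clusters are assumptions made for this sketch. In particular, the abstract does not spell out the repaired formula, so the variant below simply removes the element itself from both clusters before comparing them. It scores an element 1 when both remaining sets are empty and 0 when only one of them is. The paper's exact definition may differ.

```python
from collections import defaultdict


def _clusters(labels):
    """Map each element to the set of elements that share its label."""
    members = defaultdict(set)
    for element, label in labels.items():
        members[label].add(element)
    return {element: members[label] for element, label in labels.items()}


def bcubed(pred, truth, exclude_self=False):
    """Return (mean precision, mean recall) over all elements.

    pred, truth: dicts mapping element -> cluster label, over the same keys.
    exclude_self=False gives standard BCubed.
    exclude_self=True is an ASSUMED reading of the repaired metric: the
    element itself is dropped from both clusters before they are compared.
    """
    if not pred or pred.keys() != truth.keys():
        raise ValueError("pred and truth must cover the same non-empty element set")
    pred_of, true_of = _clusters(pred), _clusters(truth)
    precisions, recalls = [], []
    for e in pred:
        c, t = pred_of[e], true_of[e]
        if exclude_self:
            c, t = c - {e}, t - {e}
        overlap = len(c & t)
        if not c and not t:
            # ASSUMPTION: both sides agree that no other element is like e.
            p = r = 1.0
        else:
            # ASSUMPTION: an empty denominator on one side only scores 0.
            p = overlap / len(c) if c else 0.0
            r = overlap / len(t) if t else 0.0
        precisions.append(p)
        recalls.append(r)
    n = len(pred)
    return sum(precisions) / n, sum(recalls) / n


# Ground truth: one cluster of four. Prediction: four singletons.
truth = {1: "a", 2: "a", 3: "a", 4: "a"}
pred = {1: "w", 2: "x", 3: "y", 4: "z"}
print(bcubed(pred, truth))                     # (1.0, 0.25)
print(bcubed(pred, truth, exclude_self=True))  # (0.0, 0.0)
```

In this toy case standard BCubed awards full precision and non-zero recall even though no two elements were grouped together, because every element matches itself. Under the assumed self-excluding variant both scores drop to zero.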



    Published In

    ICTIR '22: Proceedings of the 2022 ACM SIGIR International Conference on Theory of Information Retrieval
    August 2022
    289 pages
    ISBN:9781450394123
    DOI:10.1145/3539813

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. bcubed
    2. clustering
    3. information retrieval
    4. metrics

    Qualifiers

    • Short-paper

    Funding Sources

    • Nederlandse Organisatie voor Wetenschappelijk Onderzoek

    Conference

    ICTIR '22

    Acceptance Rates

    ICTIR '22 Paper Acceptance Rate 32 of 80 submissions, 40%;
    Overall Acceptance Rate 235 of 527 submissions, 45%
