Article · DOI: 10.1145/1102351.1102481

A new Mallows distance based metric for comparing clusterings

Published: 07 August 2005

Abstract

Despite the large number of algorithms developed for clustering, research on comparing clustering results is limited. In this paper, we propose a measure for comparing clustering results that tackles two issues insufficiently addressed or even overlooked by existing methods: (a) taking into account the distance between cluster representatives when assessing the similarity of clustering results; (b) constructing a unified framework for defining a distance based on either hard or soft clustering while ensuring that the definition satisfies the triangle inequality. Our measure is derived from a complete and globally optimal matching between the clusters in two clustering results. We show that the distance is an instance of the Mallows distance, a metric between probability distributions in statistics. As a result, the defined distance inherits desirable properties from the Mallows distance. Experiments show that our clustering distance measure successfully handles cases that are difficult for other measures.
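
The idea of an optimal matching between clusters can be illustrated with a small transport computation. The following is a minimal sketch, not the paper's implementation: it assumes each clustering is summarized by a list of cluster representatives (centroids) and a list of cluster weights summing to 1, uses the Euclidean distance between representatives as the ground distance, and solves the resulting transportation problem with a simple successive-shortest-path routine intended only for small inputs. The function name `mallows_distance` and its parameters are hypothetical.

```python
import math


def mallows_distance(centers_a, weights_a, centers_b, weights_b, p=2):
    """Transport-based distance between two weighted sets of cluster
    representatives. Assumption: weights on each side sum to 1."""
    n, m = len(centers_a), len(centers_b)
    eps = 1e-12
    # Ground cost: p-th power of the Euclidean distance between representatives.
    cost = [[math.dist(a, b) ** p for b in centers_b] for a in centers_a]
    supply, demand = list(weights_a), list(weights_b)
    flow = [[0.0] * m for _ in range(n)]

    while True:
        # Bellman-Ford on the residual graph. Nodes 0..n-1 are clusters of A,
        # nodes n..n+m-1 are clusters of B. Clusters of A with mass left are sources.
        dist = [0.0 if s > eps else math.inf for s in supply] + [math.inf] * m
        prev = [None] * (n + m)
        for _ in range(n + m):
            changed = False
            for i in range(n):
                for j in range(m):
                    # Forward edge A_i -> B_j (unlimited capacity).
                    if dist[i] + cost[i][j] < dist[n + j] - eps:
                        dist[n + j], prev[n + j] = dist[i] + cost[i][j], i
                        changed = True
                    # Backward edge B_j -> A_i, usable only if flow can be cancelled.
                    if flow[i][j] > eps and dist[n + j] - cost[i][j] < dist[i] - eps:
                        dist[i], prev[i] = dist[n + j] - cost[i][j], n + j
                        changed = True
            if not changed:
                break

        # Cheapest reachable cluster of B that still needs mass.
        sinks = [j for j in range(m) if demand[j] > eps and dist[n + j] < math.inf]
        if not sinks:
            break
        t = min(sinks, key=lambda j: dist[n + j])

        # Walk back from the sink to the source that starts the path.
        path, node = [], n + t
        while prev[node] is not None:
            path.append((prev[node], node))
            node = prev[node]

        # Amount to push: limited by source mass, sink need, and cancellable flow.
        amount = min(supply[node], demand[t])
        for u, v in path:
            if u >= n:
                amount = min(amount, flow[v][u - n])
        for u, v in path:
            if u < n:
                flow[u][v - n] += amount
            else:
                flow[v][u - n] -= amount
        supply[node] -= amount
        demand[t] -= amount

    total = sum(flow[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return total ** (1.0 / p)


# Usage: one clustering with a single cluster, another that splits it in two.
d = mallows_distance([(0.0,)], [1.0], [(1.0,), (-1.0,)], [0.5, 0.5], p=1)
```

Because weights may be fractional, the same routine covers both hard clusterings (weights as cluster proportions) and soft ones, which is the kind of unified treatment the abstract describes; how the paper itself defines the weights and ground distance should be checked against the full text.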


    Published In

    ICML '05: Proceedings of the 22nd international conference on Machine learning
    August 2005
    1113 pages
    ISBN: 1595931805
    DOI: 10.1145/1102351
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Article

    Acceptance Rates

    Overall Acceptance Rate 140 of 548 submissions, 26%

    Article Metrics

    • Downloads (last 12 months): 13
    • Downloads (last 6 weeks): 2
    Reflects downloads up to 20 Nov 2024

    Cited By

    • (2023) A Distributional Framework for Evaluation, Comparison and Uncertainty Quantification in Soft Clustering. International Journal of Approximate Reasoning. DOI: 10.1016/j.ijar.2023.109008 (109008). Online publication date: Aug-2023
    • (2022) A Distributional Approach for Soft Clustering Comparison and Evaluation. Belief Functions: Theory and Applications. DOI: 10.1007/978-3-031-17801-6_1 (3-12). Online publication date: 30-Sep-2022
    • (2021) Unsupervised and Semisupervised Learning. Wiley StatsRef: Statistics Reference Online. DOI: 10.1002/9781118445112.stat08320 (1-18). Online publication date: 18-Aug-2021
    • (2020) A Resampling Based Grid Search Method to Improve Reliability and Robustness of Mixture-Item Response Theory Models of Multimorbid High-Risk Patients. IEEE Journal of Biomedical and Health Informatics 24:6. DOI: 10.1109/JBHI.2019.2948734 (1780-1787). Online publication date: Jun-2020
    • (2020) A stimulus-response based EEG biometric using mallows distance. CCF Transactions on Networking. DOI: 10.1007/s42045-020-00033-y. Online publication date: 30-Jul-2020
    • (2019) MCC: a Multiple Consensus Clustering Framework. Journal of Classification. DOI: 10.1007/s00357-019-09318-4. Online publication date: 9-Aug-2019
    • (2019) Optimal transport, mean partition, and uncertainty assessment in cluster analysis. Statistical Analysis and Data Mining 12:5. DOI: 10.1002/sam.11418 (359-377). Online publication date: 26-Sep-2019
    • (2017) Wasserstein learning of deep generative point process models. Proceedings of the 31st International Conference on Neural Information Processing Systems. DOI: 10.5555/3294996.3295084 (3250-3259). Online publication date: 4-Dec-2017
    • (2017) A visual tool for ticket monitoring and management. 2017 12th International Conference on Intelligent Systems and Knowledge Engineering (ISKE). DOI: 10.1109/ISKE.2017.8258805 (1-7). Online publication date: Nov-2017
    • (2017) UB-CQA: A user attribute based community question answering system. 2017 12th International Conference on Intelligent Systems and Knowledge Engineering (ISKE). DOI: 10.1109/ISKE.2017.8258750 (1-9). Online publication date: Nov-2017
