Assessing Clustering Reliability and Features Informativeness by Random Permutations

Michele Ceccarelli¹ &
Antonio Maratea¹

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4694)

Included in the following conference series:

International Conference on Knowledge-Based and Intelligent Information and Engineering Systems

Abstract

Assessing the quality of a clustering outcome is a challenging task that can be cast in a number of different frameworks, depending on the specific subtask, such as estimating the right number of clusters or quantifying how strongly the data support the partition produced by the algorithm. In this paper we propose a computationally intensive procedure to evaluate: (i) the consistency of a clustering solution, (ii) the informativeness of each feature, and (iii) the most suitable value for a parameter. The proposed approach does not depend on the specific clustering algorithm chosen; it is based on random permutations and produces an ensemble of empirical probability distributions of a quality index. From this ensemble it is possible to extract hints on how single features affect the clustering outcome, how consistent the clustering result is, and what the most suitable value for a parameter is (e.g. the correct number of clusters). Results on simulated and real data highlight a surprisingly effective discriminative power.
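
The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea and not the authors' implementation. It assumes that each feature is shuffled across samples in turn, that the data are re-clustered after every shuffle, and that a quality index is recorded to build one empirical distribution per feature. The k-means routine, the index (1 - WSS/TSS), the synthetic data and all function names are illustrative assumptions; any clustering algorithm and any quality index could be substituted.

```python
import random
from statistics import mean


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(mean(col) for col in zip(*points))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    k = len(centers)
    return [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]


def kmeans(points, k, rng, iters=25):
    """Plain Lloyd's k-means, an assumed stand-in for any clustering algorithm."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        labels = assign(points, centers)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = centroid(members)
    return assign(points, centers)


def quality_index(points, labels):
    """Assumed quality index: share of total scatter explained, 1 - WSS/TSS."""
    grand = centroid(points)
    tss = sum(sq_dist(p, grand) for p in points)
    wss = 0.0
    for c in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == c]
        center = centroid(members)
        wss += sum(sq_dist(p, center) for p in members)
    return 1.0 - wss / tss


def permute_feature(points, j, rng):
    """Return a copy of the data with feature j shuffled across samples."""
    column = [p[j] for p in points]
    rng.shuffle(column)
    return [p[:j] + (v,) + p[j + 1:] for p, v in zip(points, column)]


def permutation_profiles(points, k, n_perm=30, seed=0):
    """Baseline index plus one empirical distribution of the index per feature."""
    rng = random.Random(seed)
    baseline = quality_index(points, kmeans(points, k, rng))
    profiles = {}
    for j in range(len(points[0])):
        scores = []
        for _ in range(n_perm):
            shuffled = permute_feature(points, j, rng)
            scores.append(quality_index(shuffled, kmeans(shuffled, k, rng)))
        profiles[j] = scores
    return baseline, profiles


def make_demo_data(seed=1, n_per_cluster=30):
    """Synthetic data: features 0 and 1 carry two clusters, feature 2 is noise."""
    rng = random.Random(seed)
    data = []
    for center in (0.0, 10.0):
        for _ in range(n_per_cluster):
            data.append((rng.gauss(center, 0.5),
                         rng.gauss(center, 0.5),
                         rng.gauss(0.0, 1.0)))
    return data


baseline, profiles = permutation_profiles(make_demo_data(), k=2)
print("baseline index:", round(baseline, 3))
for j, scores in profiles.items():
    print("feature", j, "mean index after shuffling:", round(mean(scores), 3))
```

Under these assumptions, shuffling either of the two structured features should lower the index markedly, while shuffling the noise feature should leave it close to the baseline. Comparing the per-feature distributions in this way is one plausible reading of how an ensemble of distributions could hint at feature informativeness; repeating the run for several values of `k` would be the analogous step for choosing a parameter.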

Author information

Authors and Affiliations

Research Centre On Software Technology, University of Sannio, via Traiano 11, Benevento, Italy
Michele Ceccarelli & Antonio Maratea

Editor information

Bruno Apolloni, Robert J. Howlett, Lakhmi Jain

About this paper

Cite this paper

Ceccarelli, M., Maratea, A. (2007). Assessing Clustering Reliability and Features Informativeness by Random Permutations. In: Apolloni, B., Howlett, R.J., Jain, L. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2007. Lecture Notes in Computer Science, vol 4694. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74829-8_107

DOI: https://doi.org/10.1007/978-3-540-74829-8_107
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74828-1
Online ISBN: 978-3-540-74829-8
eBook Packages: Computer Science, Computer Science (R0)
