Abstract
We introduce a measure of ultrametricity for dissimilarity spaces and examine transformations of dissimilarities that impact this measure. Then, we study the influence of ultrametricity on the behavior of two classes of data mining algorithms (kNN classification and PAM clustering) applied on dissimilarity spaces. We show that there is an inverse variation between ultrametricity and performance of classifiers. For clustering, increased ultrametricity generate clusterings with better separation. Lowering ultrametricity produces more compact clusters.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Amice, Y. (1975). Les nombres p-adiques. Paris: Presses Universitaires de France.
Barthélemy, J.-P., & Brucker, F. (2008). Binary clustering. Discrete Applied Mathematics, 156(8), 1237–1250.
Barthélemy, J.-P., Brucker, F., & Osswald, C. (2004). Combinatorial optimization and hierarchical classifications. 4OR, 2(3), 179–219.
Bertrand, P., & Janowitz, M. F. (2002). Pyramids and weak hierarchies in the ordinal model for clustering. Discrete Applied Mathematics, 122(1–3), 55–81.
Brucker, F. (2006). Sub-dominant theory in numerical taxonomy. Discrete Applied Mathematics, 154(7), 1085–1099.
Contreras, P., & Murtagh, F. (2012). Fast, linear time hierarchical clustering using the baire metric. Journal of Classification, 29(2), 118–143.
Deza, M. M., & Laurent, M. (1997). Geometry of cuts and metrics. Heidelberg: Springer.
Di Summa, M., Pritchard, D., & Sanità, L. (2015). Finding the closest ultrametric. Discrete Applied Mathematics, 180, 70–80.
Diatta, J., & Fichet, B. (1998). Quasi-ultrametrics and their 2-ball hypergraphs. Discrete Mathematics, 192(1–3), 87–102.
Gordon, A. D. (1981). Classification. London: Chapman and Hall.
Gordon, A. D. (1987). A review of hierarchical classification. Journal of the Royal Statistical Society, Series (A), 150(2), 119–137.
Jardine, N., & Sibson, R. (1971). Mathematical taxonomy. New York: Wiley.
Kaufman, L., & Rousseeuw, P. J. (1990). Finding groups in data: An introduction to cluster analysis. New York: Wiley.
Kimura, M. (1983). The neutral theory of molecular evolution. Cambridge: Cambridge University Press.
Leclerc, B. (1985). La comparaison des hiérarchies: indices et métriques. Mathématiques et sciences humaines, 92, 5–40.
Lerman, I. C. (1981). Classification et Analyse Ordinale des Données. Paris: Dunod.
Liu, Y., Li, Z., Xiong, H., Gao, X., & Wu, J. (2010). Understanding of internal clustering validation measures. In 2010 IEEE 10th international conference on data mining (pp. 911–916). IEEE.
Manning, C. D., Raghwan, P., & Schütze, H. (2008). Introduction to information retrieval. Cambridge: Cambridge University Press.
Maulik, U., & Bandyopadhyay, S. (2002). Performance evaluation of some clustering algorithms and validity indices. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(12), 1650–1654.
Murtagh, F., Downs, G., & Contreras, P. (2008). Hierarchical clustering of massive, high dimensional data sets by exploiting ultrametric embedding. SIAM Journal on Scientific Computing, 30(2), 707–730.
Ninio, J. (1983). Molecular approaches to evolution. Princeton: Princeton University.
Rammal, R., Angles d’Auriac, C., & Doucot, D. (1985). On the degree of ultrametricity. Le Journal de Physique - Letteres, 45, 945–952.
Rammal, R., Toulouse, G., & Virasoro, M. A. (1986). Ultrametricity for physicists. Reviews of Modern Physics, 58, 765–788.
Schikhof, W. H. (1984). Ultrametric calculus. Cambridge: Cambridge University Press.
Simovici, D. A., & Djeraba, C. (2014). Mathematical tools for data mining (2nd ed.). London: Springer.
Tang, P. N., Steinbach, M., & Kumar, V. (2005). Introduction to data mining. Reading: Addison-Wesley.
Xiong, H., Wu, J., & Chen, J. (2009). K-means clustering versus validation measures: a data-distribution perspective. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 39(2), 318–331.
Zhao, Y., & Karypis, G. (2002). Evaluation of hierarchical clustering algorithms for document datasets. In Proceedings of the Eleventh International Conference on Information and Knowledge Management (pp. 515–524). ACM.
Acknowledgments
The authors wish to thank the referees for the careful reading of the paper and for their suggestions and observations.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Simovici, D.A., Vetro, R., Hua, K. (2017). Ultrametricity of Dissimilarity Spaces and Its Significance for Data Mining. In: Guillet, F., Pinaud, B., Venturini, G. (eds) Advances in Knowledge Discovery and Management. Studies in Computational Intelligence, vol 665. Springer, Cham. https://doi.org/10.1007/978-3-319-45763-5_8
Download citation
DOI: https://doi.org/10.1007/978-3-319-45763-5_8
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45762-8
Online ISBN: 978-3-319-45763-5
eBook Packages: EngineeringEngineering (R0)