Nothing Special   »   [go: up one dir, main page]

Skip to main content

Ultrametricity of Dissimilarity Spaces and Its Significance for Data Mining

  • Chapter
  • First Online:
Advances in Knowledge Discovery and Management

Part of the book series: Studies in Computational Intelligence ((SCI,volume 665))

Abstract

We introduce a measure of ultrametricity for dissimilarity spaces and examine transformations of dissimilarities that impact this measure. Then, we study the influence of ultrametricity on the behavior of two classes of data mining algorithms (kNN classification and PAM clustering) applied on dissimilarity spaces. We show that there is an inverse variation between ultrametricity and performance of classifiers. For clustering, increased ultrametricity generate clusterings with better separation. Lowering ultrametricity produces more compact clusters.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD 109.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

References

  • Amice, Y. (1975). Les nombres p-adiques. Paris: Presses Universitaires de France.

    MATH  Google Scholar 

  • Barthélemy, J.-P., & Brucker, F. (2008). Binary clustering. Discrete Applied Mathematics, 156(8), 1237–1250.

    Article  MathSciNet  MATH  Google Scholar 

  • Barthélemy, J.-P., Brucker, F., & Osswald, C. (2004). Combinatorial optimization and hierarchical classifications. 4OR, 2(3), 179–219.

    Article  MathSciNet  MATH  Google Scholar 

  • Bertrand, P., & Janowitz, M. F. (2002). Pyramids and weak hierarchies in the ordinal model for clustering. Discrete Applied Mathematics, 122(1–3), 55–81.

    Article  MathSciNet  MATH  Google Scholar 

  • Brucker, F. (2006). Sub-dominant theory in numerical taxonomy. Discrete Applied Mathematics, 154(7), 1085–1099.

    Article  MathSciNet  MATH  Google Scholar 

  • Contreras, P., & Murtagh, F. (2012). Fast, linear time hierarchical clustering using the baire metric. Journal of Classification, 29(2), 118–143.

    Article  MathSciNet  MATH  Google Scholar 

  • Deza, M. M., & Laurent, M. (1997). Geometry of cuts and metrics. Heidelberg: Springer.

    Book  MATH  Google Scholar 

  • Di Summa, M., Pritchard, D., & Sanità, L. (2015). Finding the closest ultrametric. Discrete Applied Mathematics, 180, 70–80.

    Article  MathSciNet  MATH  Google Scholar 

  • Diatta, J., & Fichet, B. (1998). Quasi-ultrametrics and their 2-ball hypergraphs. Discrete Mathematics, 192(1–3), 87–102.

    Article  MathSciNet  MATH  Google Scholar 

  • Gordon, A. D. (1981). Classification. London: Chapman and Hall.

    MATH  Google Scholar 

  • Gordon, A. D. (1987). A review of hierarchical classification. Journal of the Royal Statistical Society, Series (A), 150(2), 119–137.

    Article  MathSciNet  MATH  Google Scholar 

  • Jardine, N., & Sibson, R. (1971). Mathematical taxonomy. New York: Wiley.

    MATH  Google Scholar 

  • Kaufman, L., & Rousseeuw, P. J. (1990). Finding groups in data: An introduction to cluster analysis. New York: Wiley.

    Book  MATH  Google Scholar 

  • Kimura, M. (1983). The neutral theory of molecular evolution. Cambridge: Cambridge University Press.

    Book  Google Scholar 

  • Leclerc, B. (1985). La comparaison des hiérarchies: indices et métriques. Mathématiques et sciences humaines, 92, 5–40.

    MATH  Google Scholar 

  • Lerman, I. C. (1981). Classification et Analyse Ordinale des Données. Paris: Dunod.

    MATH  Google Scholar 

  • Liu, Y., Li, Z., Xiong, H., Gao, X., & Wu, J. (2010). Understanding of internal clustering validation measures. In 2010 IEEE 10th international conference on data mining (pp. 911–916). IEEE.

    Google Scholar 

  • Manning, C. D., Raghwan, P., & Schütze, H. (2008). Introduction to information retrieval. Cambridge: Cambridge University Press.

    Book  MATH  Google Scholar 

  • Maulik, U., & Bandyopadhyay, S. (2002). Performance evaluation of some clustering algorithms and validity indices. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(12), 1650–1654.

    Article  Google Scholar 

  • Murtagh, F., Downs, G., & Contreras, P. (2008). Hierarchical clustering of massive, high dimensional data sets by exploiting ultrametric embedding. SIAM Journal on Scientific Computing, 30(2), 707–730.

    Article  MathSciNet  MATH  Google Scholar 

  • Ninio, J. (1983). Molecular approaches to evolution. Princeton: Princeton University.

    Book  Google Scholar 

  • Rammal, R., Angles d’Auriac, C., & Doucot, D. (1985). On the degree of ultrametricity. Le Journal de Physique - Letteres, 45, 945–952.

    Article  Google Scholar 

  • Rammal, R., Toulouse, G., & Virasoro, M. A. (1986). Ultrametricity for physicists. Reviews of Modern Physics, 58, 765–788.

    Article  MathSciNet  Google Scholar 

  • Schikhof, W. H. (1984). Ultrametric calculus. Cambridge: Cambridge University Press.

    MATH  Google Scholar 

  • Simovici, D. A., & Djeraba, C. (2014). Mathematical tools for data mining (2nd ed.). London: Springer.

    Google Scholar 

  • Tang, P. N., Steinbach, M., & Kumar, V. (2005). Introduction to data mining. Reading: Addison-Wesley.

    Google Scholar 

  • Xiong, H., Wu, J., & Chen, J. (2009). K-means clustering versus validation measures: a data-distribution perspective. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 39(2), 318–331.

    Article  Google Scholar 

  • Zhao, Y., & Karypis, G. (2002). Evaluation of hierarchical clustering algorithms for document datasets. In Proceedings of the Eleventh International Conference on Information and Knowledge Management (pp. 515–524). ACM.

    Google Scholar 

Download references

Acknowledgments

The authors wish to thank the referees for the careful reading of the paper and for their suggestions and observations.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Dan A. Simovici .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2017 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Simovici, D.A., Vetro, R., Hua, K. (2017). Ultrametricity of Dissimilarity Spaces and Its Significance for Data Mining. In: Guillet, F., Pinaud, B., Venturini, G. (eds) Advances in Knowledge Discovery and Management. Studies in Computational Intelligence, vol 665. Springer, Cham. https://doi.org/10.1007/978-3-319-45763-5_8

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-45763-5_8

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45762-8

  • Online ISBN: 978-3-319-45763-5

  • eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics