Abstract
The notion of distance is the most important basis for classification. This is especially true for unsupervised learning, i.e. clustering, where no objects with known group membership are available for validation. Even in supervised learning, however, standard distances often fail to give appropriate results. An adequate distance has to be chosen for each individual problem. This is demonstrated by three practical examples from very different application areas: social science, music science, and production economics. In social science, clustering is applied to spatial regions with very irregular borders, so adequate spatial distances may have to be taken into account. In statistical musicology, the main problem is often to find a transformation of the input time series that provides an adequate basis for defining a distance. Local modelling is also proposed in order to account for different subpopulations, e.g. instruments. In production economics, many quality criteria with very different scales often have to be taken into account. To find a compromise optimum classification, the criteria are first transformed onto a common scale, called desirability.
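The pre-transformation onto a common desirability scale can be illustrated with a minimal sketch. It assumes a Harrington-type one-sided desirability function, d = exp(-exp(-(b0 + b1*y))), and a geometric mean to combine criteria; the chapter itself may use a different variant. The function names, the anchor desirabilities 0.2 and 0.8, and the example criteria are hypothetical and chosen only for illustration.

```python
import math

def one_sided_desirability(y, y_low, y_high, d_low=0.2, d_high=0.8):
    """Map a raw quality value y onto (0, 1) with d = exp(-exp(-(b0 + b1*y))).

    b0 and b1 are chosen so that d(y_low) = d_low and d(y_high) = d_high.
    If y_high < y_low, smaller raw values are treated as more desirable.
    The anchor values 0.2 and 0.8 are illustrative assumptions.
    """
    # Inverting d = exp(-exp(-z)) gives z = -ln(-ln(d)).
    z_low = -math.log(-math.log(d_low))
    z_high = -math.log(-math.log(d_high))
    b1 = (z_high - z_low) / (y_high - y_low)
    b0 = z_low - b1 * y_low
    return math.exp(-math.exp(-(b0 + b1 * y)))

def overall_desirability(desirabilities):
    """Combine individual desirabilities by their geometric mean."""
    if any(d <= 0.0 for d in desirabilities):
        return 0.0
    log_sum = sum(math.log(d) for d in desirabilities)
    return math.exp(log_sum / len(desirabilities))

# Two hypothetical criteria on very different scales:
# a strength in MPa (larger is better) and a defect rate in % (smaller is better).
d_strength = one_sided_desirability(450.0, y_low=300.0, y_high=500.0)
d_defects = one_sided_desirability(2.0, y_low=5.0, y_high=1.0)
print(overall_desirability([d_strength, d_defects]))
```

Once every criterion lies on the same (0, 1) scale, distances between objects can be computed on the desirabilities rather than on the raw, differently scaled measurements.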
© 2009 Springer-Verlag Berlin Heidelberg
Cite this paper
Weihs, C., Szepannek, G. (2009). Distances in Classification. In: Perner, P. (ed.) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2009. Lecture Notes in Computer Science, vol 5633. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03067-3_1
DOI: https://doi.org/10.1007/978-3-642-03067-3_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03066-6
Online ISBN: 978-3-642-03067-3
eBook Packages: Computer Science (R0)