Abstract
Selecting appropriate parameter values is a critical problem for clustering analysis techniques. Moreover, clustering algorithms lack an effective mechanism for detecting outliers and simply treat them as “noise”. Regarding outliers as valuable information, this paper proposes a novel hierarchical clustering algorithm that integrates a new outlier-mining method. The algorithm stops clustering according to the dissimilarity reflected by the detected outliers and needs only one parameter, whose appropriate value can be determined during the outlier-mining process. After discussing some related topics, the paper uses five real-life datasets to evaluate the algorithm's performance in outlier mining and clustering, and compares it with other algorithms.
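The abstract does not specify the outlier-mining method or the exact stopping rule, so the following is only a hypothetical Python sketch of the general idea, not the authors' algorithm. It assumes that (1) outliers are the `n_outliers` points with the largest nearest-neighbor distance, a simple distance-based ranking swapped in for illustration; (2) the clustering is plain agglomerative single linkage; and (3) merging stops once the closest pair of clusters is at least as far apart as the least isolated detected outlier. Under these assumptions `n_outliers` is the only parameter, and every detected outlier ends up as a singleton cluster. All function names are hypothetical.

```python
import math
from itertools import combinations


def nn_distances(points):
    """Distance from each point to its nearest other point (1-NN distance)."""
    result = []
    for i, p in enumerate(points):
        best = math.inf
        for j, q in enumerate(points):
            if i != j:
                best = min(best, math.dist(p, q))
        result.append(best)
    return result


def detect_outliers(points, n_outliers):
    """Hypothetical outlier miner: the n_outliers points with the largest 1-NN distance."""
    if not 1 <= n_outliers <= len(points):
        raise ValueError("n_outliers must be between 1 and len(points)")
    scores = nn_distances(points)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_outliers], scores


def auto_stopped_single_linkage(points, n_outliers):
    """Single-linkage agglomerative clustering with an outlier-derived stopping rule.

    Assumed rule (not from the paper): stop when the closest pair of clusters is
    at least as far apart as the smallest 1-NN distance among detected outliers.
    """
    outliers, scores = detect_outliers(points, n_outliers)
    threshold = min(scores[i] for i in outliers)

    clusters = [[i] for i in range(len(points))]  # start with singletons
    while len(clusters) > 1:
        # Find the closest pair of clusters under single linkage.
        best_pair, best_dist = None, math.inf
        for a, b in combinations(range(len(clusters)), 2):
            d = min(math.dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if d < best_dist:
                best_pair, best_dist = (a, b), d
        if best_dist >= threshold:
            break  # auto-stop: the next merge would be as loose as an outlier
        a, b = best_pair  # a < b, so deleting b leaves index a valid
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters, sorted(outliers)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, -5)]
    clusters, outliers = auto_stopped_single_linkage(pts, n_outliers=1)
    print(clusters)  # [[0, 1, 2], [3, 4, 5], [6]]
    print(outliers)  # [6]
```

On this toy data the point `(5, -5)` is flagged as the single outlier, its nearest-neighbor distance (about 6.40) becomes the stopping threshold, and the two tight groups remain separate clusters because they lie about 13.45 apart. The naive closest-pair search costs O(n³) overall, which is acceptable for a sketch but not for large datasets.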
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lv, Ty., Su, Tx., Wang, Zx., Zuo, Wl. (2005). An Auto-stopped Hierarchical Clustering Algorithm Integrating Outlier Detection Algorithm. In: Fan, W., Wu, Z., Yang, J. (eds) Advances in Web-Age Information Management. WAIM 2005. Lecture Notes in Computer Science, vol 3739. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11563952_41
DOI: https://doi.org/10.1007/11563952_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29227-2
Online ISBN: 978-3-540-32087-6
eBook Packages: Computer Science, Computer Science (R0)