Abstract
We present new geometric approximation and exact algorithms for the density-based data clustering problem in d-dimensional space ℓd (for any constant integer d ≥ 2). Previously known algorithms for this problem are efficient only for uniformly-distributed points. However, these algorithms all run in θ(n 2) time in the worst case, where n is the number of input points. Our approximation algorithm based on the ε-fuzzy distance function takes O(n log n) time for any given fixed value ε> 0, and our exact algorithms take sub-quadratic time. The running times and output quality of our algorithms do not depend on any particular data distribution. We believe that our fast approximation algorithm is of considerable practical importance, while our sub-quadratic exact algorithms are more of theoretical interest. We implemented our approximation algorithm and the experimental results show that our approximation algorithm is efficient on arbitrary input point sets.
The work of these authors was supported in part by Lockheed Martin Corporation and by the National Science Foundation under Grant CCR-9988468.
The work of this author was supported in part by NSERC.
Corresponding author.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
P.K. Agarwal, D. Eppstein, and J. Matoušek, Dynamic half-space reporting, geometric optimization, and minimum spanning trees, Proc. 33rd Annual IEEE Symp. Found. Comput. Sci., 1992, pp. 80–89.
P.K. Agarwal and J. Erickson, Geometric range searching and its relatives. In: B. Chazelle, J.E. Goodman and R. Pollack (Eds.), Advances in Discrete and Computational Geometry, American Mathematical Society, Providence, RI, 1999, pp. 1–56.
S. Arya and D. M. Mount, Approximate range searching, Comput. Geom. Theory Appl., 17 (2000), pp. 135–152.
S. Arya and D.M. Mount, ANN: A library for approximate nearest neighbor searching, 2nd CGC Workshop on Computational Geometry, 1997. lso, see http://www.cs.umd.edu/~mount/.
S. Arya, D.M. Mount, N. S. Netanyahu, R. Silverman, and A. Wu, An optimal algorithm for approximate nearest neighbor searching in fixed dimensions, J. ACM, 45 (1998), pp. 891–923.
M. Ester, H.-P. Kriegel, J. Sander, M. Wimmer, and X. Xu, Incremental clustering for mining in a data warehousing environment, Proc. 24th Int. Conf. on Very Large Databases, 1998, pp. 323–333.
M. Ester, H.P. Kriegel, J. Sander, and X. Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining, 1996, pp. 226–231.
A.K. Jain and R. C. Dubes, Algorithms for Clustering Data, Prentice Hall, 1988.
R. Klein, O. Nurmi, T. Ottmann, and D. Wood, A dynamic fixed windowing problem, Algorithmica, 4 (1989), pp. 535–550.
J. Sander, M. Ester, H.-P. Kriegel, and X. Xu, Density-based clustering in spatial databases: The algorithm GDBSCAN and its application, Data Mining and Knowledge Discovery, 2 (2) (1998), pp. 169–194.
B. Zhou, D.W. Cheung, and B. Kao, A fast algorithm for density-based clustering in large database, Proc. 3rd Pacific-Asia Conf. on Methodologies for Knowledge Discovery and Data Mining, 1999, pp. 338–349.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chen, D.Z., Smid, M., Xu, B. (2002). Geometric Algorithms for Density-Based Data Clustering. In: Möhring, R., Raman, R. (eds) Algorithms — ESA 2002. ESA 2002. Lecture Notes in Computer Science, vol 2461. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45749-6_28
Download citation
DOI: https://doi.org/10.1007/3-540-45749-6_28
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44180-9
Online ISBN: 978-3-540-45749-7
eBook Packages: Springer Book Archive