Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3035918.3064050acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Dynamic Density Based Clustering

Published: 09 May 2017 Publication History

Abstract

Dynamic clustering---how to efficiently maintain data clusters along with updates in the underlying dataset---is a difficult topic. This is especially true for density-based clustering, where objects are aggregated based on transitivity of proximity, under which deciding the cluster(s) of an object may require the inspection of numerous other objects. The phenomenon is unfortunate, given the popular usage of this clustering approach in many applications demanding data updates.
Motivated by the above, we investigate the algorithmic principles for dynamic clustering by DBSCAN, a successful representative of density-based clustering, and ρ-approximate DBSCAN, proposed to bring down the computational hardness of the former on static data. Surprisingly, we prove that the ρ-approximate version suffers from the very same hardness when the dataset is fully dynamic, namely, when both insertions and deletions are allowed. We also show that this issue goes away as soon as tiny further relaxation is applied, yet still ensuring the same quality---known as the ``sandwich guarantee''---of ρ-approximate DBSCAN. Our algorithms guarantee near-constant update processing, and outperform existing approaches by a factor over two orders of magnitude.

References

[1]
M. Ankerst, M. M. Breunig, H. Kriegel, and J. Sander. OPTICS: ordering points to identify the clustering structure. In SIGMOD, pages 49--60, 1999.
[2]
S. Arya, D. M. Mount, N. S. Netanyahu, R. Silverman, and A. Y. Wu. An optimal algorithm for approximate nearest neighbor searching fixed dimensions. JACM, 45(6):891--923, 1998.
[3]
N. Beckmann, H. Kriegel, R. Schneider, and B. Seeger. The R*-tree: An efficient and robust access method for points and rectangles. In SIGMOD, pages 322--331, 1990.
[4]
F. Cao, M. Ester, W. Qian, and A. Zhou. Density-based clustering over an evolving data stream with noise. In ICDM, pages 328--339, 2006.
[5]
T. M. Chan. A dynamic data structure for 3-d convex hulls and 2-d nearest neighbor queries. JACM, 57(3), 2010.
[6]
J. Erickson. On the relative complexities of some geometric problems. In CCCG, pages 85--90, 1995.
[7]
J. Erickson. New lower bounds for Hopcroft's problem. Disc. & Comp. Geo., 16(4):389--418, 1996.
[8]
M. Ester, H. Kriegel, J. Sander, M. Wimmer, and X. Xu. Incremental clustering for mining in a data warehousing environment. In VLDB, pages 323--333, 1998.
[9]
M. Ester, H. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In SIGKDD, pages 226--231, 1996.
[10]
J. Gan and Y. Tao. DBSCAN revisited: Mis-claim, un-fixability, and approximation. In SIGMOD, pages 519--530, 2015.
[11]
A. Gunawan. A faster algorithm for DBSCAN. Master's thesis, Technische University Eindhoven, March 2013.
[12]
A. Guttman. R-trees: a dynamic index structure for spatial searching. In SIGMOD, pages 47--57, 1984.
[13]
J. Han, M. Kamber, and J. Pei. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2012.
[14]
J. Holm, K. de Lichtenberg, and M. Thorup. Poly-logarithmic deterministic fully-dynamic algorithms for connectivity, minimum spanning tree, 2-edge, and biconnectivity. JACM, 48(4):723--760, 2001.
[15]
S. Lühr and M. Lazarescu. Incremental clustering of dynamic data streams using connectivity based representative points. 68(1):1--27, 2009.
[16]
D. M. Mount and E. Park. A dynamic data structure for approximate range searching. In SoCG, pages 247--256, 2010.
[17]
S. Nassar, J. Sander, and C. Cheng. Incremental and effective data summarization for dynamic hierarchical clustering. In SIGMOD, pages 467--478, 2004.
[18]
S. Nittel, K. T. Leung, and A. Braverman. Scaling clustering algorithms for massive data sets using data streams. In ICDE, page 830, 2004.
[19]
I. Ntoutsi, A. Zimek, T. Palpanas, P. Kröger, and H. Kriegel. Density-based projected clustering over high dimensional data streams. In ICDM, pages 987--998, 2012.
[20]
R. G. Pensa, D. Ienco, and R. Meo. Hierarchical co-clustering: off-line and incremental approaches. Data Min. Knowl. Discov., 28(1):31--64, 2014.
[21]
S. Singh and A. Awekar. Incremental shared nearest neighbor density-based clustering. In CIKM, pages 1533--1536, 2013.
[22]
P.-N. Tan, M. Steinbach, and V. Kumar. Introduction to Data Mining. Pearson, 2006.
[23]
R. E. Tarjan. Efficiency of a good but not linear set union algorithm. JACM, 22(2):215--225, 1975.

Cited By

View all
  • (2024)An Effective, Efficient, and Stable Framework for Query Clustering2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00402(5334-5340)Online publication date: 13-May-2024
  • (2024)Independent Range Sampling on Interval Data2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00041(449-461)Online publication date: 13-May-2024
  • (2024)A Complete Linkage Algorithm for Clustering Dynamic DatasetsProceedings of the National Academy of Sciences, India Section A: Physical Sciences10.1007/s40010-024-00894-894:5(471-486)Online publication date: 25-Sep-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data
May 2017
1810 pages
ISBN:9781450341974
DOI:10.1145/3035918
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 May 2017

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. algorithms
  2. approximate dbscan
  3. dynamic clustering

Qualifiers

  • Research-article

Conference

SIGMOD/PODS'17
Sponsor:

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)40
  • Downloads (Last 6 weeks)5
Reflects downloads up to 22 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)An Effective, Efficient, and Stable Framework for Query Clustering2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00402(5334-5340)Online publication date: 13-May-2024
  • (2024)Independent Range Sampling on Interval Data2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00041(449-461)Online publication date: 13-May-2024
  • (2024)A Complete Linkage Algorithm for Clustering Dynamic DatasetsProceedings of the National Academy of Sciences, India Section A: Physical Sciences10.1007/s40010-024-00894-894:5(471-486)Online publication date: 25-Sep-2024
  • (2023)Tracking the Evolution of Clusters in Social Media StreamsIEEE Transactions on Big Data10.1109/TBDATA.2022.32042079:2(701-715)Online publication date: 1-Apr-2023
  • (2023)Learn to Explore: on Bootstrapping Interactive Data Exploration with Meta-learning2023 IEEE 39th International Conference on Data Engineering (ICDE)10.1109/ICDE55515.2023.00135(1720-1733)Online publication date: Apr-2023
  • (2023)Online Active Learning Framework for Data Stream Classification With Density-Peaks RecognitionIEEE Access10.1109/ACCESS.2023.325785711(27853-27864)Online publication date: 2023
  • (2022)DenForest: Enabling Fast Deletion in Incremental Density-Based Clustering over Sliding WindowsProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3517833(296-309)Online publication date: 10-Jun-2022
  • (2022)Incremental Density-Based Clustering on Multicore ProcessorsIEEE Transactions on Pattern Analysis and Machine Intelligence10.1109/TPAMI.2020.302312544:3(1338-1356)Online publication date: 1-Mar-2022
  • (2022)A Fast and More Accurate Seed-and-Extension Density-Based Clustering AlgorithmIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2022.316111735:6(5458-5471)Online publication date: 22-Mar-2022
  • (2022)A Flexible and Explainable Vehicle Motion Prediction and Inference Framework Combining Semi-Supervised AOG and ST-LSTMIEEE Transactions on Intelligent Transportation Systems10.1109/TITS.2020.301630423:2(840-860)Online publication date: Feb-2022
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media