Anytime density-based clustering of complex data

Son T. Mai¹,
Xiao He¹,
Jing Feng¹,
Claudia Plant² &
…
Christian Böhm¹

754 Accesses
22 Citations
Explore all metrics

Abstract

Many clustering algorithms suffer from scalability problems on massive datasets and do not support any user interaction during runtime. To tackle these problems, anytime clustering algorithms are proposed. They produce a fast approximate result which is continuously refined during the further run. Also, they can be stopped or suspended anytime to provide an intermediate answer. In this paper, we propose a novel anytime clustering algorithm modeled on the density-based clustering paradigm. Our algorithm called A-DBSCAN is applicable to many complex data such as trajectory and medical data. The general idea of our algorithm is to use a sequence of lower bounding functions (LBs) of the true distance function to produce multiple approximate results of the true density-based clusters. A-DBSCAN operates in multiple levels w.r.t. the LBs and is mainly based on two algorithmic schemes: (1) an efficient distance upgrade scheme which restricts distance calculations to core objects at each level of the LBs and (2) a local reclustering scheme which restricts update operations to the relevant objects only. To further improve the performance, we propose a significant extension version of A-DBSCAN called A-DBSCAN-XS which is built upon the anytime scheme of A-DBSCAN and the $\mu $-range query scheme of a data structure called extended Xseedlist. A-DBSCAN-XS requires less distance calculations at each level than A-DBSCAN and thus is more efficient. Extensive experiments demonstrate that A-DBSCAN and A-DBSCAN-XS acquire very good clustering results at very early stages of execution and thus save a large amount of computational time. Even if they run to the end, A-DBSCAN and A-DBSCAN-XS are still orders of magnitude faster than the original algorithm DBSCAN and its variants. We also introduce a novel application for our algorithms for the segmentation of the white matter fiber tracts in human brain which is an important tool for studying the brain structure and various diseases such as Alzheimer.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Anytime parallel density-based clustering

Article 10 April 2018

Anytime OPTICS: An Efficient Approach for Hierarchical Density-Based Clustering

On temporal-constrained sub-trajectory cluster analysis

Article 04 April 2017

Notes

References

Ueno K, Xi X, Keogh E, Lee D (2006) Anytime classification using the nearest neighbor algorithm with applications to stream mining. In: ICDM, pp 623–632
Zhu Q, Batista G, Rakthanmanon T, Keogh E (2012) A novel approximation to dynamic time warping allows anytime clustering of massive time series datasets. In: SDM, pp 999–1010
Zilberstein S, Russell SJ (1995) Approximate reasoning using anytime algorithms. In: Natarajan S (ed) Imprecise and approximate computation. Kluwer Academic Publishers. http://rbr.cs.umass.edu/shlomo/papers/ZRchapter95.html
Seidl T, Assent I, Kranen P, Krieger R, Herrmann J (2009) Indexing density models for incremental learning and anytime classification on data streams. In: EDBT, pp 311–322
Seidl T, Assent I, Kranen P, Krieger R, Herrmann J (2009) Indexing density models for incremental learning and anytime classification on data streams. In: EDBT, pp 311–322
Assent I, Kranen P, Baldauf C, Seidl T (2012) AnyOut: anytime outlier detection on streaming data. In: DASFAA (1), pp 228–242
Assent I, Kranen P, Baldauf C, Seidl T (2010) Detecting outliers on arbitrary data streams using anytime approaches. In: StreamKDD. ACM, New York, NY, USA, pp 10–16
Lin J, Vlachos M, Keogh E, Gunopulos D (2004) Iterative incremental clustering of time series. In: EDBT, pp 106–122
Lin J, Vlachos M, Keogh E, Gunopulos D, Liu J, Yu S, Le J (2005) A MPAA-based iterative clustering algorithm augmented by nearest neighbors search for time-series data streams. In: PAKDD, pp 333–342
Ester M, Kriegel H, Sander J, Wimmer M, Xu X (1998) Incremental clustering for mining in a data warehousing environment. In: VLDB, pp 323–333
Ester M, Kriegel H, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD, pp 226–231
Ankerst M, Breunig MM, Kriegel H, Sander J (1999) OPTICS: ordering points to identify the clustering structure. In: SIGMOD, pp 49–60
Chan K, Fu AW (1999) Efficient time series matching by wavelets. In: ICDE, pp 126–133
Keogh EJ (2002) Exact indexing of dynamic time warping. In: VLDB, pp 406–417
Sakurai Y, Yoshikawa M, Faloutsos C (2005) FTW: fast similarity search under the time warping distance. In: PODS. ACM, New York, NY, USA, pp 326–337
Ding H, Trajcevski G, Scheuermann P, Wang X, Keogh E (2008) Querying and mining of time series data: experimental comparison of representations and distance measures. Proc VLDB Endow 1:1542–1552
Article Google Scholar
Brecheisen S, Kriegel H, Pfeif1e M (2004) Efficient density-based clustering of complex objects. In: ICDM, pp 43–50
Brecheisen S, Kriegel H, Pfeifle M (2006) Parallel density-based clustering of complex objects. In: PAKDD, pp 179–188
Kröger P, Kriegel H, Kailing K (2004) Density-connected subspace clustering for high-dimensional data. In: SDM, pp 246–256
Vinh NX, Epps J, Bailey J (2009) Information theoretic measures for clusterings comparison: is a correction for chance necessary? In: ICML, pp 1073–1080
Mai ST, He X, Feng J, Böhm C (2013) Efficient anytime density-based clustering. In: SDM, pp 112–120
Morris BT, Trivedi MM (2009) Learning trajectory patterns by clustering: experimental studies and comparative evaluation. In: CVPR, pp 312–319
Dom BE (2001) An information-theoretic external cluster-validity measure. Technical Report RJ 10219, IBM
Lee J, Han J (2007) Trajectory clustering: a partition-and-group framework. In: SIGMOD, pp 593–604
Zhang K, Kwok JT (2009) Density-weighted Nyström method for computing large kernel eigensystems. Neural Comput 21(1):121–146
Article MATH MathSciNet Google Scholar
Shang F, Jiao LC, Shi J, Gong M, Shang R (2011) Fast density-weighted low-rank approximation spectral clustering. Data Min Knowl Discov 23(2):345–378
Article MATH MathSciNet Google Scholar
Dhillon IS, Guan Y, Kulis B (2007) Weighted graph cuts without eigenvectors a multilevel approach. IEEE Trans Pattern Anal Mach Intell 29(11):1944–1957
Article Google Scholar
Mai ST (2014) Density-based algorithms for active and anytime clustering. PhD thesis, University of Munich
Kranen P, Assent I, Baldauf C, Seidl T (2009) Self-adaptive anytime stream clustering. In: ICDM, pp 249–258
Mai ST, He X, Hubig N, Plant C, Böhm C (2013) Active density-based clustering. In: ICDM, pp 508–517
Mai ST, Goebl S, Plant C (2012) A similarity model and segmentation algorithm for white matter fiber tracts. In: ICDM, pp 1014–1019
Mai ST (2013) Density-based clustering: a comprehensive survey. University of Munich, Technical report
Kriegel H, Kröger P, Sander J, Zimek A (2011) Density-based clustering. Data Mining Knowl Discov 1(3):231–240
Article Google Scholar
Sander J, Ester M, Kriegel H, Xu X (1998) Density-based clustering in spatial databases: the algorithm GDBSCAN and its applications. Data Min Knowl Discov 2(2):169–194
Article Google Scholar
Mori S (2007) Introduction to diffusion tensor imaging. Elsevier, Amsterdam
Google Scholar
Basser PJ, Pajevic S, Pierpaoli C, Duda J, Aldroubi A (2000) In vivo fiber tractography using DT-MRI data. Magn Reson Med 44:625–632
Article Google Scholar
Catani M, Howard RJ, Pajevic S, Jones DK (2002) Virtual in vivo interactive dissection of white matter fasciculi in the human brain. Neuroimage 17(1):77–94
Article Google Scholar
Brun A, Park HJ, Knutsson H, Westin CF (2003) Coloring of DT-MRI fiber traces using Laplacian eigenmaps. In: EUROCAST, pp 564–572
Ding Z, Gore JC, Anderson AW (2003) Classification and quantification of neuronal fiber pathways using diffusion tensor MRI. Mag Res Med 49:716–721
Article Google Scholar
Corouge I, Gerig G, Gouttard S (2004) Towards a shape model of white matter fiber bundles using diffusion tensor MRI. In: ISBI, pp 344–347
Tsai A, Westin CF, Hero AO, Willsky AS (2007) Fiber tract clustering on manifolds with dual rooted-graphs. In: CVPR
Chen W, Ding Z, Zhang S, MacKay-Brandt A, Correia S, Qu H, Crow JA, Tate DF, Yan Z, Peng Q (2009) A novel interface for interactive exploration of DTI fibers. IEEE Trans Vis. Comput Graph 15(6):1433–1440
Article Google Scholar
Maddah M, Eric W, Grimson L, Warfield SK (2006) Statistical modeling and EM clustering of white matter fiber tracts. In: ISBI, pp 53–56
Böhm C, Feng J, He X, Mai ST, Plant C, Shao J (2011) A novel similarity measure for fiber clustering using longest common subsequence. In: KDD-DMMH, pp 1–9
Sherbondy A, Akers D, Mackenzie R, Dougherty R, Wandell B (2005) Exploring connectivity of the brain’s white matter with dynamic queries. IEEE Trans Vis Comput Graph 11(4):419–430
Article Google Scholar
Wang Q, Yap PT, Jia H, Wu G, Shen D (2010) Hierarchical fiber clustering based on multi-scale neuroanatomical features. In: MIAR, pp 448–456
Zhu Y, Shasha D (2003) Warping indexes with envelope transforms for query by humming. In: SIGMOD, pp 181–192
Yi B, Faloutsos C (2000) Fast time sequence indexing for arbitrary Lp norms. In: VLDB, pp 385–394
Kim S, Park S, Chu WW (2001) An index-based approach for similarity search supporting time warping in large sequence databases. In: ICDE, pp 607–614

Download references

Acknowledgments

We thank Diep M. T. Phan, Ha H. T. Mai, Hanh M. T. Vo, Nhan M. T. Luong, Quan A. Tran, Ninh A. Nguyen, Anh X. Nghiem, Sebastian Goebl, Nina Hubig, and Franz Krojer for their helps and supports. Our special thanks to Prof. Kai Zhang and Prof. Brian Kulis for kindly providing us the source codes of their papers. We special thank anonymous reviewers for their invaluable comments which help to significantly improve the quality of this paper.

Author information

Authors and Affiliations

Institute for Informatics, University of Munich, Munich, Germany
Son T. Mai, Xiao He, Jing Feng & Christian Böhm
Helmholtz Zentrum München, Technische Universität München, Munich, Germany
Claudia Plant

Authors

Son T. Mai
View author publications
You can also search for this author in PubMed Google Scholar
Xiao He
View author publications
You can also search for this author in PubMed Google Scholar
Jing Feng
View author publications
You can also search for this author in PubMed Google Scholar
Claudia Plant
View author publications
You can also search for this author in PubMed Google Scholar
Christian Böhm
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Son T. Mai.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Mai, S.T., He, X., Feng, J. et al. Anytime density-based clustering of complex data. Knowl Inf Syst 45, 319–355 (2015). https://doi.org/10.1007/s10115-014-0797-0

Download citation

Received: 09 October 2013
Revised: 24 July 2014
Accepted: 10 October 2014
Published: 22 October 2014
Issue Date: November 2015
DOI: https://doi.org/10.1007/s10115-014-0797-0

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Anytime parallel density-based clustering

Anytime OPTICS: An Efficient Approach for Hierarchical Density-Based Clustering

On temporal-constrained sub-trajectory cluster analysis

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

Anytime density-based clustering of complex data

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Anytime parallel density-based clustering

Anytime OPTICS: An Efficient Approach for Hierarchical Density-Based Clustering

On temporal-constrained sub-trajectory cluster analysis

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation