A Comprehensive Survey of Clustering Algorithms

Dongkuan Xu^1,2 and Yingjie Tian^2,3


Abstract

Data analysis is a common method in modern scientific research, spanning communication science, computer science, and biological science. Clustering, as a basic component of data analysis, plays a significant role. On the one hand, many tools for cluster analysis have been developed as the amount of information grows and disciplines intersect. On the other hand, because of the complexity of information, each clustering algorithm has its own strengths and weaknesses. In this review, we begin with the definition of clustering, consider the basic elements involved in the clustering process, such as distance or similarity measures and evaluation indicators, and analyze clustering algorithms from two perspectives: traditional and modern. All the discussed clustering algorithms are compared in detail and summarized in Appendix Table 22.


1 Introduction

Clustering, considered the most important problem in unsupervised learning, partitions data whose structure is unknown and serves as the basis for further learning. There is, however, no agreed-upon complete definition of clustering; a classic one is as follows [1]:

(1) Instances in the same cluster must be as similar as possible;
(2) Instances in different clusters must be as different as possible;
(3) Measures of similarity and dissimilarity must be clear and have practical meaning.
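Criteria (1) and (2) can be made concrete by comparing the average distance between points that share a cluster with the average distance between points in different clusters. The following is a minimal sketch, assuming Euclidean distance and a hypothetical helper `intra_inter_distances`; it is one simple way to quantify the two criteria, not a definition taken from [1].

```python
from itertools import combinations
from math import dist  # Euclidean distance between two points (Python 3.8+)


def _mean(values):
    """Arithmetic mean, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def intra_inter_distances(points, labels):
    """Return (mean within-cluster distance, mean between-cluster distance).

    A good partition in the sense of criteria (1) and (2) has a small first
    value and a large second value. The helper name is hypothetical.
    """
    intra, inter = [], []
    for i, j in combinations(range(len(points)), 2):
        d = dist(points[i], points[j])
        if labels[i] == labels[j]:
            intra.append(d)  # same cluster: contributes to cohesion
        else:
            inter.append(d)  # different clusters: contributes to separation
    return _mean(intra), _mean(inter)


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (5, 5), (5, 6)]
    print(intra_inter_distances(points, [0, 0, 1, 1]))  # groups nearby points
    print(intra_inter_distances(points, [0, 1, 0, 1]))  # mixes the two groups
```

With the first labeling the within-cluster mean is 1.0 and the between-cluster mean is about 7.09; the second labeling reverses the picture (about 7.07 within, about 4.05 between), so it violates both criteria.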

The standard clustering process can be divided into the following steps [2]:

(1) Feature extraction and selection: extract and select the most representative features from the original data set;
(2) Clustering algorithm design: design the clustering algorithm according to the characteristics of the problem;
(3) Result evaluation: evaluate the clustering result and judge the validity of the algorithm;
(4) Result explanation: give a practical interpretation of the clustering result.
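The four steps above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the procedure of [2]: feature selection is reduced to dropping constant columns and z-scoring the rest, the clustering algorithm is plain k-means (Lloyd's iterations, seeded with the first k points for determinism), the evaluation indicator is the within-cluster sum of squared errors, and explanation means reporting per-cluster feature means in the original units. All function names are hypothetical.

```python
from math import dist
from statistics import mean, pstdev


def select_features(rows):
    """Step (1): drop constant columns and z-score the remaining ones."""
    cols = list(zip(*rows))
    kept = [j for j, col in enumerate(cols) if pstdev(col) > 0]
    stats = {j: (mean(cols[j]), pstdev(cols[j])) for j in kept}
    scaled = [tuple((r[j] - stats[j][0]) / stats[j][1] for j in kept) for r in rows]
    return scaled, kept


def kmeans(points, k, iters=100):
    """Step (2): Lloyd's k-means, seeded with the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(mean(col) for col in zip(*members))
    return labels, centroids


def sse(points, labels, centroids):
    """Step (3): within-cluster sum of squared distances (lower is more compact)."""
    return sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


def describe(rows, labels, k):
    """Step (4): per-cluster mean of each original feature, for interpretation."""
    summary = {}
    for c in range(k):
        members = [r for r, lab in zip(rows, labels) if lab == c]
        summary[c] = tuple(mean(col) for col in zip(*members))
    return summary


if __name__ == "__main__":
    # Toy data: the middle column is constant and is dropped in step (1).
    rows = [(1.0, 7, 0.0), (1.2, 7, 0.1), (8.0, 7, 5.0), (8.3, 7, 5.2)]
    scaled, kept = select_features(rows)
    labels, centroids = kmeans(scaled, k=2)
    print(kept, labels, round(sse(scaled, labels, centroids), 4))
    print(describe(rows, labels, k=2))
```

Each stage is a separate function so that any one of them (for example, the clustering algorithm in step (2)) can be swapped without touching the others, which mirrors how the steps are listed.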

In the rest of this paper, common distance and similarity measures are introduced in Sect. 2, evaluation indicators for clustering results are listed in Sect. 3, traditional and modern clustering algorithms are analyzed systematically in Sects. 4 and 5, respectively, and conclusions are drawn in Sect. 6.

2 Distance and Similarity

Distance (dissimilarity) and similarity are the basis for constructing clustering algorithms. For quantitative features, distance is generally preferred for capturing relationships among data points, whereas similarity is preferred for qualitative features [2].
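To illustrate the distinction, the sketch below pairs a distance for quantitative features (Euclidean) with a similarity for qualitative features (the simple matching coefficient, i.e. the fraction of attributes on which two records agree). The choice of these two particular measures is an assumption for illustration; subtracting the similarity from 1 gives a corresponding dissimilarity.

```python
from math import sqrt


def euclidean(x, y):
    """Distance for quantitative features: 0 for identical records, larger = less alike."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def simple_matching(x, y):
    """Similarity for qualitative features: share of attributes with equal values (1 = identical)."""
    if not x or len(x) != len(y):
        raise ValueError("records must be non-empty and have the same number of attributes")
    return sum(a == b for a, b in zip(x, y)) / len(x)


if __name__ == "__main__":
    print(euclidean((1.0, 2.0), (4.0, 6.0)))
    a = ("red", "small", "round")
    b = ("red", "large", "round")
    print(simple_matching(a, b), 1 - simple_matching(a, b))  # similarity, dissimilarity
```

For example, `euclidean((1.0, 2.0), (4.0, 6.0))` is 5.0, while the records ("red", "small", "round") and ("red", "large", "round") agree on two of three attributes and so have a simple matching similarity of 2/3.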

Commonly used distance functions for quantitative features are summarized in Table 1.

Table 1 Distance functions
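Several widely used distance functions for quantitative features belong to the Minkowski family, d_p(x, y) = (sum_i |x_i - y_i|^p)^(1/p): p = 1 gives the Manhattan distance, p = 2 the Euclidean distance, and the limit p → ∞ the Chebyshev distance. The sketch below illustrates that family only; it is not a reproduction of the entries of Table 1, and the function names are hypothetical.

```python
def minkowski(x, y, p):
    """Minkowski distance (sum_i |x_i - y_i|**p) ** (1/p); a metric for p >= 1."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


def manhattan(x, y):
    """City-block distance: Minkowski with p = 1."""
    return minkowski(x, y, 1)


def euclidean(x, y):
    """Straight-line distance: Minkowski with p = 2."""
    return minkowski(x, y, 2)


def chebyshev(x, y):
    """Limit of the Minkowski distance as p grows: the largest coordinate gap."""
    return max(abs(a - b) for a, b in zip(x, y))


if __name__ == "__main__":
    x, y = (0, 0), (3, 4)
    print(manhattan(x, y), euclidean(x, y), minkowski(x, y, 3), chebyshev(x, y))
```

For x = (0, 0) and y = (3, 4), the Manhattan, Euclidean and Chebyshev distances are 7, 5 and 4, respectively, and p = 3 falls between the last two: the distance shrinks as p grows.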
