IM-c-means: a new clustering algorithm for clusters with skewed distributions

Yun Liu¹,
Tao Hou¹,
Yan Miao¹,
Meihe Liu¹ &
Fu Liu¹

Abstract

In this paper, a new clustering algorithm, IM-c-means, is proposed for clusters with skewed distributions. The c-means algorithm is a well-known and widely used strategy for data clustering, but it is prone to poor performance when the data set is not uniformly distributed, a phenomenon called the “uniform effect” in the literature. We first analyze the cause of this effect and find that it occurs only when cluster sizes vary, whereas differences in object density between clusters have no effect on the c-means algorithm. Based on this finding, we propose a new objective function that takes into account the volumes and object densities of all clusters. The resulting algorithm is effective for clusters with varied sizes or densities, while inheriting the good performance of the traditional c-means algorithm on balanced data sets. Experiments on both synthetic and real data sets give promising results for the proposed algorithm. In addition, a nonparametric test shows that the proposed algorithm can offer a significant improvement over other clustering methods on imbalanced data sets.
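The uniform effect described in the abstract can be reproduced on a toy example. The sketch below is not the IM-c-means algorithm, whose objective function is not given in this preview; it runs plain hard c-means (Lloyd's iteration) on hypothetical one-dimensional data. Two assumptions are made for illustration: cluster "size" is read as spatial extent, and the centers are initialized at the true cluster means so that any misassignment comes from the objective rather than from a poor start. The helper names `c_means` and `misassigned` are invented for this example.

```python
import numpy as np


def c_means(x, centers, n_iter=100):
    """Plain hard c-means (Lloyd's iteration) on 1-D data."""
    centers = np.asarray(centers, dtype=float).copy()
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        # Recompute each center as the mean of its current members.
        new_centers = np.array(
            [x[labels == k].mean() for k in range(len(centers))]
        )
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def misassigned(first, second):
    """Count points whose c-means label differs from the true cluster."""
    x = np.concatenate([first, second])
    truth = np.concatenate(
        [np.zeros(len(first), dtype=int), np.ones(len(second), dtype=int)]
    )
    # Best-case start: centers placed at the true cluster means.
    labels, _ = c_means(x, centers=[first.mean(), second.mean()])
    return int(np.sum(labels != truth))


# Case A (assumed data): equal density, different extents.
wide = np.linspace(0.0, 10.0, 1000)   # 1000 points spread over [0, 10]
tight = np.linspace(11.0, 12.0, 100)  # 100 points spread over [11, 12]

# Case B (assumed data): equal extents, different densities.
dense = np.linspace(0.0, 1.0, 1000)   # 1000 points on [0, 1]
sparse = np.linspace(2.0, 3.0, 50)    # 50 points on [2, 3]

errors_size = misassigned(wide, tight)
errors_density = misassigned(dense, sparse)
print("misassigned, varied extent :", errors_size)
print("misassigned, varied density:", errors_density)
```

In this toy setting the decision boundary in case A drifts into the wide cluster, so a large share of its points is absorbed by the small cluster, while case B keeps the true partition. This is consistent with the abstract's finding, but it is only an illustration on synthetic data, not evidence from the paper.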

Acknowledgements

This work has been supported by the National Natural Science Foundation of China (61503151), the Natural Science Foundation of Jilin Province (20160520100JH) and the Project funded by China Postdoctoral Science Foundation (2019M651204).

Author information

Authors and Affiliations

College of Communication Engineering, Jilin University, Changchun, China
Yun Liu, Tao Hou, Yan Miao, Meihe Liu & Fu Liu

Corresponding author

Correspondence to Fu Liu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Liu, Y., Hou, T., Miao, Y. et al. IM-c-means: a new clustering algorithm for clusters with skewed distributions. Pattern Anal Applic 24, 611–623 (2021). https://doi.org/10.1007/s10044-020-00932-2

Received: 23 May 2019
Accepted: 27 October 2020
Published: 06 November 2020
Issue Date: May 2021
DOI: https://doi.org/10.1007/s10044-020-00932-2
