Abstract
This article presents research in the field of artificial intelligence and demonstrates that conventional clustering methods achieve higher efficiency when combined with unconventional methods. It introduces a new hybrid approach based on the SOM (Self-Organizing Maps) method. We focused on the possibility of combining SOM with other clustering methods: CLARA, CURE and K-means. The SOM method is primarily useful in the first phases of the process, where knowledge of the data is still vague. It is followed by a selected clustering algorithm, which then works with the preprocessed data. The algorithm performs more efficiently on the preprocessed data than on unprocessed data, as shown by the experimental study on the benchmark data set Fundamental Clustering Problems Suite (FCPS). The experimental verification also included a comparison of the achieved outputs with other approaches applied to this data set, based on a standard metric, the Rand index.
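The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the "preprocessed data" are the SOM prototype vectors, uses K-means as the second stage, and scores the result with the Rand index. The function names (`train_som`, `kmeans`, `som_then_kmeans`, `rand_index`), the grid size, and the linear decay schedules for the learning rate and neighbourhood radius are all hypothetical choices made for illustration.

```python
import numpy as np

def sq_dists(a, b):
    """Squared Euclidean distances between every row of a and every row of b."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)

def train_som(data, rows=4, cols=4, epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM; returns prototypes of shape (rows*cols, dim)."""
    rng = np.random.default_rng(seed)
    n_units = rows * cols
    # Initialise prototypes from data points (assumed choice, not from the paper).
    idx = rng.choice(len(data), n_units, replace=len(data) < n_units)
    weights = data[idx].astype(float)
    # Grid coordinates of each neuron, used by the neighbourhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # decaying learning parameter
            sigma = sigma0 * (1.0 - frac) + 1e-2     # shrinking neighbourhood radius
            winner = np.argmin(((weights - x) ** 2).sum(axis=1))
            h = np.exp(-((grid - grid[winner]) ** 2).sum(axis=1) / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's K-means; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)].copy()
    for _ in range(iters):
        labels = np.argmin(sq_dists(points, centers), axis=1)
        # Keep the old center if a cluster happens to be empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return np.argmin(sq_dists(points, centers), axis=1), centers

def som_then_kmeans(data, k, **som_kwargs):
    """Cluster the SOM prototypes, then give each sample its best-matching unit's label."""
    prototypes = train_som(data, **som_kwargs)
    proto_labels, _ = kmeans(prototypes, k)
    bmu = np.argmin(sq_dists(data, prototypes), axis=1)
    return proto_labels[bmu]

def rand_index(a, b):
    """Fraction of sample pairs on which two labelings agree (Rand 1971)."""
    a, b = np.asarray(a), np.asarray(b)
    upper = np.triu_indices(len(a), k=1)
    same_a = (a[:, None] == a[None, :])[upper]
    same_b = (b[:, None] == b[None, :])[upper]
    return float(np.mean(same_a == same_b))
```

A typical call would be `labels = som_then_kmeans(X, k=2)` followed by `rand_index(true_labels, labels)`, where `X` is an array of shape (samples, features). Running K-means on a few dozen prototypes instead of all samples is what makes the second stage cheaper in this sketch; whether this matches the preprocessing used in the paper is an assumption.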
Abbreviations
- αs: Shrinking factor (CURE), learning parameter (SOM)
- ε: Radius
- μ: Learning parameter
- ρ: Surroundings of the winning neuron (SOM)
- D(j): Euclidean distance
- BIRCH: Balanced Iterative Reducing and Clustering using Hierarchies
- CURE: Clustering Using REpresentatives
- CLARA: Clustering LARge Applications
- CLARANS: Clustering Large Applications based on RANdomized Search
- CLIQUE: CLustering In QUEst
- DBSCAN: Density-Based Spatial Clustering of Applications with Noise
- DENCLUE: DENsity-based CLUstEring
- EFCM: Extended Fuzzy C-Means
- FCPS: Fundamental Clustering Problems Suite
- MinPts: Minimum number of other objects
- MLP: Multilayer perceptron
- OPTICS: Ordering points to identify the clustering structure
- PAM: Partitioning Around Medoids
- SEEFC: Self-organizing-map based extended fuzzy c-means
- SOM: Self-Organizing Maps
- STING: STatistical INformation Grid
Funding
This work was supported by TACR, project no. TL02000313, and also by the University of Ostrava grant SGS05/PrF/2020.
Ethics declarations
Conflict of interest
There have been no involvements that might raise the question of bias in the work reported or in the conclusions, implications, or opinions stated. The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Cite this article
Kotyrba, M., Volna, E., Jarusek, R. et al. The use of conventional clustering methods combined with SOM to increase the efficiency. Neural Comput & Applic 33, 16519–16531 (2021). https://doi.org/10.1007/s00521-021-06251-9
DOI: https://doi.org/10.1007/s00521-021-06251-9