
The use of conventional clustering methods combined with SOM to increase the efficiency

  • Original Article
Neural Computing and Applications

Abstract

This article reports research in the field of artificial intelligence and demonstrates that conventional clustering methods achieve higher efficiency when combined with unconventional methods. It presents a new hybrid approach based on the SOM (Self-Organizing Maps) method. We focused on the possibility of combining SOM with other clustering methods: CLARA, CURE, and K-means. The SOM method is primarily useful in the first phases of the process, when knowledge of the data is still too vague. It is therefore followed by a selected clustering algorithm, which then works with the preprocessed data. The algorithm performs more efficiently on the preprocessed data than on unprocessed data, as demonstrated by an experimental study on the benchmark data set Fundamental Clustering Problems Suite (FCPS). The experimental verification also included a comparison of the achieved outputs with other approaches using this data set, based on a standard metric, the Rand index.
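The two-phase idea described above (SOM first, then a conventional clustering algorithm on the SOM output, scored with the Rand index) can be illustrated with a minimal sketch. This is not the authors' implementation: the one-dimensional neuron chain, the Gaussian neighbourhood, the linear decay schedules, the choice of K-means as the second stage, and all function names and default parameter values are assumptions made for illustration only.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))

def train_som(data, n_neurons=10, epochs=30, lr0=0.5, radius0=3.0, seed=0):
    """Train a 1-D chain of neurons (assumed topology); return prototypes."""
    rng = random.Random(seed)
    weights = [list(rng.choice(data)) for _ in range(n_neurons)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # decaying learning parameter
        radius = max(radius0 * (1.0 - frac), 0.5)  # shrinking neighbourhood
        for x in rng.sample(data, len(data)):      # shuffled pass over the data
            win = nearest(x, weights)              # winning neuron
            for j, w in enumerate(weights):
                # Gaussian neighbourhood around the winner on the chain
                h = math.exp(-((j - win) ** 2) / (2.0 * radius ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
    return weights

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd-style K-means; returns one label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return [nearest(p, centers) for p in points]

def som_then_kmeans(data, k, n_neurons=10, seed=0):
    """Phase 1: SOM prototypes. Phase 2: K-means on the prototypes."""
    prototypes = train_som(data, n_neurons=n_neurons, seed=seed)
    proto_labels = kmeans(prototypes, k, seed=seed)
    # Each data point inherits the cluster of its winning neuron.
    return [proto_labels[nearest(x, prototypes)] for x in data]

def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree."""
    n = len(a)
    agree = sum(
        (a[i] == a[j]) == (b[i] == b[j])
        for i in range(n) for j in range(i + 1, n)
    )
    return agree / (n * (n - 1) / 2)
```

Calling `som_then_kmeans(data, k=2)` on a list of coordinate tuples returns one cluster label per point, and `rand_index(true_labels, predicted_labels)` gives a score between 0 and 1 that does not depend on how the cluster labels are numbered. Running the second stage on a handful of prototypes rather than on every data point is one plausible reading of "works with preprocessed data"; the article itself should be consulted for the actual procedure.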

Abbreviations

αs: Shrinking factor (CURE), learning parameter (SOM)
ε: Radius
μ: Learning parameter
ρ: Surroundings of the winning neuron (SOM)
D(j): Euclidean distance
BIRCH: Balanced Iterative Reducing and Clustering using Hierarchies
CURE: Clustering Using REpresentatives
CLARA: Clustering LARge Applications
CLARANS: Clustering Large Applications based on RANdomized Search
CLIQUE: CLustering In QUEst
DBSCAN: Density-Based Spatial Clustering of Applications with Noise
DENCLUE: DENsity-based CLUstEring
EFCM: Extended Fuzzy C-Means
FCPS: Fundamental Clustering Problems Suite
MinPts: Minimum number of other objects
MLP: Multilayer perceptron
OPTICS: Ordering points to identify the clustering structure
PAM: Partitioning Around Medoids
SEEFC: Self-organizing-map based extended fuzzy c-means
SOM: Self-Organizing Maps
STING: STatistical INformation Grid


Funding

This work was supported by TACR, project no. TL02000313, and by University of Ostrava grant SGS05/PrF/2020.

Author information

Corresponding author

Correspondence to Eva Volna.

Ethics declarations

Conflict of interest

There have been no involvements that might raise the question of bias in the work reported or in the conclusions, implications, or opinions stated. The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kotyrba, M., Volna, E., Jarusek, R. et al. The use of conventional clustering methods combined with SOM to increase the efficiency. Neural Comput & Applic 33, 16519–16531 (2021). https://doi.org/10.1007/s00521-021-06251-9

  • DOI: https://doi.org/10.1007/s00521-021-06251-9
