DOI: 10.1145/3503047.3503104
research-article

Application of the automatic selection and configuration of clustering algorithms method for the Apache Spark framework

Published: 19 January 2022

Abstract

This article presents an implementation of the MASSCAH method for selecting and configuring Apache Spark clustering algorithms. Each algorithm is configured by optimizing one of the clustering quality measures. Because, at the time of writing, the framework provides only one such measure, the silhouette criterion, this study also implemented additional clustering quality measures that Apache Spark does not include.
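Spark's `ClusteringEvaluator` exposes only the silhouette metric, so any other internal quality measure has to be computed outside it. The abstract does not say which measures were added, so the sketch below uses the Davies-Bouldin index purely as an example of such a measure. It assumes Euclidean distance and small in-memory point lists; `davies_bouldin` is a hypothetical helper, not a Spark API. Lower values indicate compact, well-separated clusters, so a configuration search built on it would minimize the score.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

Point = Sequence[float]


def _centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def davies_bouldin(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Davies-Bouldin index with Euclidean distance; lower is better (illustrative sketch)."""
    clusters: Dict[int, List[Point]] = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    if len(clusters) < 2:
        raise ValueError("the Davies-Bouldin index needs at least two clusters")

    ids = sorted(clusters)
    centroids = {k: _centroid(clusters[k]) for k in ids}
    # Intra-cluster scatter: mean distance from each point to its cluster centroid.
    scatter = {
        k: sum(math.dist(p, centroids[k]) for p in clusters[k]) / len(clusters[k])
        for k in ids
    }

    total = 0.0
    for i in ids:
        worst = 0.0  # largest similarity ratio between cluster i and any other cluster
        for j in ids:
            if i == j:
                continue
            separation = math.dist(centroids[i], centroids[j])
            # Coinciding centroids make the ratio unbounded.
            ratio = math.inf if separation == 0 else (scatter[i] + scatter[j]) / separation
            worst = max(worst, ratio)
        total += worst
    return total / len(ids)


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(davies_bouldin(points, [0, 0, 1, 1]))  # compact, well-separated labelling
print(davies_bouldin(points, [0, 1, 0, 1]))  # overlapping labelling
```

On these four points the labelling `[0, 0, 1, 1]` scores about 0.071 while `[0, 1, 0, 1]` scores about 14.1, so an optimizer comparing two configurations that produced these labellings would keep the first.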

Cited By

  • (2024) Features of the mutation operator for the problems of automatic grouping of objects. PROCEEDINGS OF THE IV INTERNATIONAL CONFERENCE ON MODERNIZATION, INNOVATIONS, PROGRESS: Advanced Technologies in Material Science, Mechanical and Automation Engineering: MIP: Engineering-IV-2022. DOI: 10.1063/5.0192962 (060033). Online publication date: 2024.

    Published In

    ACM Other conferences
    AISS '21: Proceedings of the 3rd International Conference on Advanced Information Science and System
    November 2021
    526 pages
    ISBN: 9781450385862
    DOI: 10.1145/3503047
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. algorithm selection
    2. apache spark
    3. clustering
    4. hyperparameter optimization
    5. machine learning
    6. multi-armed bandit
    7. reinforcement learning
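The author tags frame algorithm selection as a multi-armed bandit, reinforcement-learning problem. The sketch below illustrates that framing under stated assumptions: each arm is one clustering algorithm, a pull evaluates one randomly sampled configuration and returns a quality score normalized to [0, 1], and the arm is chosen with the standard UCB1 rule. This page does not give the method's actual reward definition or hyperparameter optimizer, so `select_algorithm`, the arm names, and the simulated scores are all hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical arm: one clustering algorithm. Pulling it stands in for running the
# algorithm with one sampled configuration and returning a quality score in [0, 1].
Arm = Callable[[random.Random], float]


def ucb1_choice(counts: List[int], totals: List[float], t: int) -> int:
    """UCB1 (Auer et al., 2002): try every arm once, then maximize mean + exploration bonus."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(
        range(len(counts)),
        key=lambda a: totals[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
    )


def select_algorithm(
    arms: Dict[str, Arm], budget: int, seed: int = 0
) -> Tuple[Optional[str], float, Dict[str, int]]:
    """Spend `budget` evaluations across algorithms; return the best one found and pull counts."""
    rng = random.Random(seed)
    names = list(arms)
    counts = [0] * len(names)
    totals = [0.0] * len(names)
    best_name, best_score = None, -math.inf
    for t in range(1, budget + 1):
        a = ucb1_choice(counts, totals, t)
        score = arms[names[a]](rng)
        counts[a] += 1
        totals[a] += score
        if score > best_score:
            best_name, best_score = names[a], score
    return best_name, best_score, dict(zip(names, counts))


# Simulated scores only; real arms would cluster data and evaluate a quality measure.
arms = {
    "kmeans": lambda rng: rng.uniform(0.6, 0.9),
    "gmm": lambda rng: rng.uniform(0.2, 0.5),
}
print(select_algorithm(arms, budget=50))
```

UCB1 is used here only because it is a simple, well-known bandit policy; a softmax (Boltzmann) policy could replace `ucb1_choice` without changing the loop. With these simulated scores, most of the 50 evaluations should go to the stronger arm.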

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    AISS 2021

    Acceptance Rates

    Overall Acceptance Rate 41 of 95 submissions, 43%

    Article Metrics

    • Downloads (last 12 months): 5
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 09 Nov 2024
