
AutoML classifier clustering procedure

Published: 26 May 2022

Abstract

Recommendation systems are among the main applications of machine learning (ML) across different industries. This paper presents a new automated machine learning (AutoML) method that generates recommendations by processing data sets with ML algorithms, targets and offers cluster recommendations for new observations, and serves as a new decision-support method. The AutoML procedure is complete in itself: it analyzes the data and divides it into an efficient number of clusters. We apply k-means, using the elbow method to calculate the cost per cluster, and then analyze how the data are allocated among the clusters, yielding a method for predicting and allocating new observations to the relevant clusters via k-nearest neighbors (kNN). The study includes two experiments that run the complete AutoML procedure on a data set of more than two million records and dozens of attributes, demonstrating that the method can be implemented and run successfully as a high-capacity analysis procedure. The motivation was to analyze, examine, assign, and integrate new observations into previously defined clusters. The results show that the AutoML method provided efficient recommendations for new observations with an accuracy rate of 99.99%. Hence, the AutoML procedure can offer any organization a full system for efficiently splitting existing data into clusters, assigning records to those clusters, and predicting the cluster allocation of new observations. The significant contribution of this study is a simple method that achieves fast, highly accurate clustering of the ongoing (new) classified data an organization acquires.
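The pipeline the abstract describes — k-means with an elbow-style cost scan to choose the number of clusters, followed by a kNN classifier that allocates new observations to the chosen clusters — can be sketched as below. This is a minimal illustration, not the authors' implementation: the synthetic data, the candidate range of k, the chosen elbow point, and the kNN neighborhood size are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))  # stand-in for an organization's data set

# Elbow scan: record the k-means cost (inertia) for each candidate k.
costs = {}
for k in range(1, 10):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    costs[k] = km.inertia_

# In the full procedure the "elbow" of the cost curve picks k;
# here we simply assume the elbow lands at k = 3.
best_k = 3
km = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(X)

# Train kNN on the cluster labels so that new observations
# can be allocated to the existing clusters.
knn = KNeighborsClassifier(n_neighbors=5).fit(X, km.labels_)

new_obs = rng.normal(size=(2, 4))
assigned = knn.predict(new_obs)  # cluster index for each new observation
```

After this step, each incoming record can be assigned to a cluster without re-running k-means, which is what makes the procedure suitable for ongoing (new) data.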



Published In

International Journal of Intelligent Systems, Volume 37, Issue 7
July 2022, 585 pages
ISSN: 0884-8173
DOI: 10.1002/int.v37.7

Publisher

John Wiley and Sons Ltd.

United Kingdom

Author Tags

  1. AutoML
  2. classification
  3. clustering
  4. k‐means
  5. targeting

Qualifiers

  • Research-article
