
AutoML classifier clustering procedure

Published: 26 May 2022

Abstract

Recommendation systems are among the main applications of machine learning (ML) across different industries. This paper presents a new automated machine learning (AutoML) method that generates recommendations by processing data sets with ML algorithms, targets and offers cluster recommendations for new observations, and serves as a new decision-support method. The AutoML procedure is complete in itself: it analyzes the data and divides it into an efficient number of clusters. We apply k-means, using the elbow method to calculate the cost per cluster, and then analyze how the data are allocated among the clusters, yielding a method for predicting and allocating new observations to the relevant clusters via k-nearest neighbors (kNN). The study includes two experiments that run the complete AutoML procedure on a data set of more than two million records and dozens of attributes, demonstrating that the method can be implemented and run successfully as a high-capacity analysis procedure. The motivation was to analyze, examine, assign, and integrate new observations into previously defined clusters. The results show that the AutoML method provided efficient recommendations for new observations with an accuracy rate of 99.99%. Hence, the AutoML procedure can offer any organization a full system for efficiently splitting existing data into clusters, assigning records to those clusters, and predicting the cluster allocation of new observations. The significant contribution of this study is a simple method that achieves fast, highly accurate clustering of the ongoing (new) classified data an organization acquires.
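The pipeline the abstract describes — k-means with an elbow-style cost scan to choose the number of clusters, followed by a kNN classifier that allocates new observations to the chosen clusters — can be sketched as below. This is a minimal illustration, not the authors' implementation: the synthetic data, the candidate range of k, the chosen elbow point, and the kNN neighborhood size are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))  # stand-in for an organization's data set

# Elbow scan: record the k-means cost (inertia) for each candidate k.
costs = {}
for k in range(1, 10):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    costs[k] = km.inertia_

# In the full procedure the "elbow" of the cost curve picks k;
# here we simply assume the elbow lands at k = 3.
best_k = 3
km = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(X)

# Train kNN on the cluster labels so that new observations
# can be allocated to the existing clusters.
knn = KNeighborsClassifier(n_neighbors=5).fit(X, km.labels_)

new_obs = rng.normal(size=(2, 4))
assigned = knn.predict(new_obs)  # cluster index for each new observation
```

After this step, each incoming record can be assigned to a cluster without re-running k-means, which is what makes the procedure suitable for ongoing (new) data.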



Published In

International Journal of Intelligent Systems, Volume 37, Issue 7
July 2022, 585 pages
ISSN: 0884-8173
DOI: 10.1002/int.v37.7

Publisher

John Wiley and Sons Ltd.

United Kingdom

Author Tags

  1. AutoML
  2. classification
  3. clustering
  4. k‐means
  5. targeting

Qualifiers

  • Research-article
