research-article

Literature review and analysis on big data stream classification techniques

Published: 01 January 2020

Abstract

Rapid growth in technology and information has led to a marked increase in the volume, velocity, and variety of data. The data held by business organizations reflects the development of big data applications. As demand for these applications grows, the analysis of complex streaming big data has become a significant area of data mining. One significant aspect of this research is the use of deep learning approaches to extract complex data representations effectively. Accordingly, this survey provides a detailed review of big data classification methodologies, including deep learning based techniques, Convolutional Neural Network (CNN) based techniques, K-Nearest Neighbor (KNN) based techniques, Neural Network (NN) based techniques, fuzzy based techniques, and support vector based techniques. Moreover, a detailed study is made with respect to parameters such as evaluation metrics, implementation tool, employed framework, datasets utilized, adopted classification methods, and the accuracy range obtained by the various techniques. Finally, the research gaps and issues of the various big data classification schemes are presented.


Published In

International Journal of Knowledge-based and Intelligent Engineering Systems, Volume 24, Issue 3 (2020), 92 pages

Publisher

IOS Press, Netherlands

Author Tags

1. Big data streaming
2. classification
3. CNN
4. accuracy
5. Map Reduce framework
