Classification by Clustering (CbC): An Approach of Classifying Big Data Based on Similarities

Sakib Shahriar Khan, Shakim Ahamed, Miftahul Jannat, Swakkhar Shatabda & Dewan Md. Farid

Part of the book series: Algorithms for Intelligent Systems (AIS)

Abstract

Data classification in supervised learning is a data mining task that assigns instances to categories to support analysis and decision-making. The objective of a classification model is to correctly predict the categorical class labels of known and unknown instances. In machine learning for data mining applications, classification models are trained on labelled training datasets. In this paper, we investigate whether a classification model can be built from the similarities among instances instead of their class labels. Data labelling is a costly and time-consuming process, and it becomes very difficult when the data is big data. The proposed approach clusters the big data and builds the classifier from the clusters without considering the class labels, which improves the performance of the classifier. The clusters can nevertheless be related to the class labels. For experimental analysis, we collected 10 big datasets from the UC Irvine Machine Learning Repository and applied three popular decision tree induction algorithms for classifier construction: ID3 (Iterative Dichotomiser 3), C4.5 (an extension of ID3), and CART (Classification and Regression Trees).
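
The idea described in the abstract (cluster first, train a tree on the cluster assignments, then relate clusters to class labels) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the clustering algorithm, so plain k-means is swapped in; the one-split Gini "stump" is a toy stand-in for a CART-style tree, not the authors' ID3, C4.5, or CART setup; and the data, the names `kmeans`, `fit_stump`, `cluster_to_class`, and the majority-vote mapping are all hypothetical.

```python
from collections import Counter

def kmeans(points, k, iters=20):
    """Plain k-means (assumed here; the abstract names no clustering algorithm).
    The first k points seed the centroids, an arbitrary choice."""
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        assign = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def gini(labels):
    """Gini impurity of a non-empty list of labels."""
    n = len(labels)
    return 1.0 - sum((cnt / n) ** 2 for cnt in Counter(labels).values())

def fit_stump(points, targets):
    """One-level CART-style tree: choose the (feature, threshold) split
    with the lowest weighted Gini impurity and predict the majority target per side."""
    best = None
    for f in range(len(points[0])):
        values = sorted(set(p[f] for p in points))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for p, y in zip(points, targets) if p[f] <= t]
            right = [y for p, y in zip(points, targets) if p[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(targets)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    _, f, t, left_label, right_label = best
    return lambda p: left_label if p[f] <= t else right_label

# Hypothetical toy data: two well-separated groups, interleaved.
points = [(1.0, 1.0), (5.0, 5.2), (1.2, 0.8), (5.1, 4.9), (0.9, 1.1), (4.8, 5.0)]
known_labels = ["a", "b", "a", "b", "a", "b"]  # used only to name clusters afterwards

# Step 1: cluster without looking at class labels.
clusters = kmeans(points, 2)
# Step 2: train the tree on cluster ids instead of class labels.
predict_cluster = fit_stump(points, clusters)
# Step 3: relate each cluster to a class label by majority vote.
cluster_to_class = {
    c: Counter(l for l, a in zip(known_labels, clusters) if a == c).most_common(1)[0][0]
    for c in set(clusters)
}

print(cluster_to_class[predict_cluster((1.1, 0.9))])
print(cluster_to_class[predict_cluster((4.9, 5.1))])
```

In this sketch the class labels never enter training; they are consulted only in the last step to give each cluster a name, which is one plausible reading of how clusters might be related to class labels.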

Author information

Authors and Affiliations

Department of Computer Science & Engineering, United International University, United City, Madani Avenue, Badda, Dhaka, 1212, Bangladesh
Sakib Shahriar Khan, Shakim Ahamed, Miftahul Jannat, Swakkhar Shatabda & Dewan Md. Farid

Corresponding author

Correspondence to Dewan Md. Farid.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Jahangirnagar University, Dhaka, Bangladesh
Mohammad Shorif Uddin
Department of Mathematics, South Asian University, New Delhi, Delhi, India
Jagdish Chand Bansal

About this paper

Cite this paper

Khan, S.S., Ahamed, S., Jannat, M., Shatabda, S., Farid, D.M. (2020). Classification by Clustering (CbC): An Approach of Classifying Big Data Based on Similarities. In: Uddin, M.S., Bansal, J.C. (eds) Proceedings of International Joint Conference on Computational Intelligence. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-13-7564-4_50

DOI: https://doi.org/10.1007/978-981-13-7564-4_50
Published: 04 July 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-7563-7
Online ISBN: 978-981-13-7564-4
eBook Packages: Intelligent Technologies and Robotics (R0)
