Abstract
Classification of uncertain data has become one of the more tedious tasks in data mining. In uncertain data, each tuple carries its own probability distribution, which helps to identify tuples of a similar class. A feature of an uncertain tuple is therefore not a single value but a function. In this paper, we propose using fuzzy entropy and a similarity measure to classify uncertain data with a binary decision tree algorithm. Fuzzy entropy is used to find the best split point of the decision tree, and the similarity measure is used to improve the accuracy of the decisions made on uncertain data. First, the fuzzy entropy of each feature vector is calculated to select the best feature vector. The best split point is then chosen from the selected feature vector, and the binary tree is grown from the uncertain training data. Once the tree is constructed, it is evaluated in a testing phase: the uncertain test data are passed through the trained decision tree to obtain their classes. Experiments evaluate the performance of the proposed fuzzy uncertain decision tree (FUDT) approach, comparing it with the existing UDT classification algorithm in terms of accuracy and running time. The experimental analysis indicates that FUDT outperforms UDT.
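The split-selection step described above can be illustrated with a minimal sketch. The abstract does not give the fuzzy entropy formula, so the code below substitutes a membership-weighted Shannon entropy: each uncertain value is a small discrete distribution, and the probability mass falling at or below a candidate split point acts as the tuple's degree of membership in the left branch. This is an assumption for illustration, not the FUDT definition, and all function names (`weighted_entropy`, `left_membership`, `split_entropy`, `best_split`) are hypothetical.

```python
import math
from collections import defaultdict


def weighted_entropy(class_mass):
    """Shannon entropy (in bits) of a dict mapping class label -> fractional mass."""
    total = sum(class_mass.values())
    if total <= 0:
        return 0.0
    h = 0.0
    for mass in class_mass.values():
        if mass > 0:
            p = mass / total
            h -= p * math.log2(p)
    return h


def left_membership(samples, z):
    """Probability mass of one uncertain value lying at or below split point z.

    samples is a list of (value, probability) pairs whose probabilities sum to 1.
    """
    return sum(p for v, p in samples if v <= z)


def split_entropy(tuples, z):
    """Mass-weighted average entropy of the two branches produced by split point z.

    tuples is a list of (samples, label) pairs for a single feature.
    Assumption: a stand-in for the paper's fuzzy entropy, not its actual formula.
    """
    left, right = defaultdict(float), defaultdict(float)
    for samples, label in tuples:
        mu = left_membership(samples, z)  # degree of membership in the left branch
        left[label] += mu
        right[label] += 1.0 - mu
    n_left, n_right = sum(left.values()), sum(right.values())
    n = n_left + n_right
    return (n_left / n) * weighted_entropy(left) + (n_right / n) * weighted_entropy(right)


def best_split(tuples):
    """Return the candidate split point with the lowest split entropy.

    Candidates are the distinct sample values that occur in the data.
    """
    candidates = sorted({v for samples, _ in tuples for v, _ in samples})
    return min(candidates, key=lambda z: split_entropy(tuples, z))


if __name__ == "__main__":
    # Toy data: one feature, two classes, two sample points per uncertain value.
    data = [
        ([(1.0, 0.5), (2.0, 0.5)], "a"),
        ([(1.5, 0.5), (2.5, 0.5)], "a"),
        ([(5.0, 0.5), (6.0, 0.5)], "b"),
        ([(5.5, 0.5), (6.5, 0.5)], "b"),
    ]
    print(best_split(data))
```

Repeating `best_split` for every feature and keeping the feature with the lowest entropy would mirror the feature-selection step in the abstract. The similarity measure is not sketched here because the abstract does not describe how it is computed.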
Cite this article
Meenakshi, S., Venkatachalam, V. FUDT: A Fuzzy Uncertain Decision Tree Algorithm for Classification of Uncertain Data. Arab J Sci Eng 40, 3187–3196 (2015). https://doi.org/10.1007/s13369-015-1800-0
DOI: https://doi.org/10.1007/s13369-015-1800-0