New multivariate kernel density estimator for uncertain data classification

Abstract

Uncertainty in data occurs in diverse applications due to measurement errors, data incompleteness, and multiple repeated measurements. Several classifiers for uncertain data have been developed to tackle this uncertainty. However, the existing classifiers do not consider the dependencies among uncertain features, even though this dependency has a critical effect on classification accuracy. Therefore, we propose a new Bayesian classification model that considers the correlation among uncertain features. To handle the uncertainty of data, new multivariate kernel density estimators are developed to estimate the class conditional probability density function of categorical, continuous, and mixed uncertain data. Experimental results with simulated data and real-life data sets show that the proposed approach is better than the existing approaches for classification of uncertain data in terms of classification accuracy.

Acknowledgements

Part of this work was supported by the Korea Institute for Advancement of Technology grant funded by the Korea Government (Grant No.: P0008691, HRD Program for Industrial Innovation) and by the research fund of the National Research Foundation of Korea (Grant No.: NRF-2019R1F1A1042307). We thank the anonymous reviewers whose comments and suggestions helped improve and clarify this manuscript.

Author information

Authors and Affiliations

Department of Industrial and Management Engineering, Hanyang University, Ansan, Korea
Byunghoon Kim
Department of Industrial Engineering, Chonnam National University, Gwangju, Korea
Young-Seon Jeong
Department of Industrial and Systems Engineering, Rutgers University, New Brunswick, NJ, USA
Byunghoon Kim, Young-Seon Jeong & Myong K. Jeong
Rutgers Center for Operations Research, Rutgers University, New Brunswick, NJ, USA
Myong K. Jeong

Corresponding author

Correspondence to Young-Seon Jeong.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Derivation of $ E\left[ {K_{\varvec{H}} \left( {\varvec{x} - \varvec{U}_{\varvec{i}} } \right)} \right] $ in Eq. (5)

$ E\left[ {K_{\varvec{H}} \left( {\varvec{x} - \varvec{U}_{\varvec{i}} } \right)} \right] $ can be obtained by the following convolution integral:

$$ \begin{aligned} & \int_{\varvec{u}} \left( 2\pi \right)^{ - \frac{s}{2}} \left| \varvec{H} \right|^{ - \frac{1}{2}} e^{ - \frac{1}{2} \left( \varvec{x} - \varvec{u} \right)^{T} \varvec{H}^{ - 1} \left( \varvec{x} - \varvec{u} \right) } \left( 2\pi \right)^{ - \frac{s}{2}} \left| \varvec{\varSigma}_{i} \right|^{ - \frac{1}{2}} e^{ - \frac{1}{2} \left( \varvec{u} - \varvec{\mu}_{\varvec{i}} \right)^{T} \varvec{\varSigma}_{i}^{ - 1} \left( \varvec{u} - \varvec{\mu}_{\varvec{i}} \right) } d\varvec{u} \\ & \quad = \int_{\varvec{u}} \frac{1}{\left( 2\pi \right)^{\frac{s}{2}} \left| \varvec{H} \right|^{\frac{1}{2}}} \frac{1}{\left( 2\pi \right)^{\frac{s}{2}} \left| \varvec{\varSigma}_{i} \right|^{\frac{1}{2}}} e^{ - \frac{1}{2} \left( \left( \varvec{x} - \varvec{u} \right)^{T} \varvec{H}^{ - 1} \left( \varvec{x} - \varvec{u} \right) + \left( \varvec{u} - \varvec{\mu}_{\varvec{i}} \right)^{T} \varvec{\varSigma}_{i}^{ - 1} \left( \varvec{u} - \varvec{\mu}_{\varvec{i}} \right) \right) } d\varvec{u} \\ \end{aligned} $$

(A.1)

Using the factorization of quadratic forms, Equation (A.1) can be represented as follows:

$$ \begin{aligned} & \int_{\varvec{u}} \frac{1}{\left( 2\pi \right)^{\frac{s}{2}} \left| \varvec{H} \right|^{\frac{1}{2}}} \frac{1}{\left( 2\pi \right)^{\frac{s}{2}} \left| \varvec{\varSigma}_{i} \right|^{\frac{1}{2}}} e^{ - \frac{1}{2} \left( \left( \varvec{u} - \varvec{x} \right)^{T} \varvec{H}^{ - 1} \left( \varvec{u} - \varvec{x} \right) + \left( \varvec{u} - \varvec{\mu}_{\varvec{i}} \right)^{T} \varvec{\varSigma}_{i}^{ - 1} \left( \varvec{u} - \varvec{\mu}_{\varvec{i}} \right) \right) } d\varvec{u} \\ & \quad = \int_{\varvec{u}} \frac{1}{\left( 2\pi \right)^{\frac{s}{2}} \left| \varvec{H} \right|^{\frac{1}{2}}} \frac{1}{\left( 2\pi \right)^{\frac{s}{2}} \left| \varvec{\varSigma}_{i} \right|^{\frac{1}{2}}} e^{ - \frac{1}{2} \left( \left( \varvec{u} - \varvec{c} \right)^{T} \left( \varvec{H}^{ - 1} + \varvec{\varSigma}_{i}^{ - 1} \right) \left( \varvec{u} - \varvec{c} \right) + \left( \varvec{x} - \varvec{\mu}_{\varvec{i}} \right)^{T} \varvec{C} \left( \varvec{x} - \varvec{\mu}_{\varvec{i}} \right) \right) } d\varvec{u} \\ & \quad = \frac{\left| \left( \varvec{H}^{ - 1} + \varvec{\varSigma}_{i}^{ - 1} \right)^{ - 1} \right|^{\frac{1}{2}}}{\left( 2\pi \right)^{\frac{s}{2}} \left| \varvec{H} \right|^{\frac{1}{2}} \left| \varvec{\varSigma}_{i} \right|^{\frac{1}{2}}} e^{ - \frac{1}{2} \left( \varvec{x} - \varvec{\mu}_{\varvec{i}} \right)^{T} \varvec{C} \left( \varvec{x} - \varvec{\mu}_{\varvec{i}} \right) } \\ & \quad \quad \times \int_{\varvec{u}} \frac{1}{\left( 2\pi \right)^{\frac{s}{2}} \left| \left( \varvec{H}^{ - 1} + \varvec{\varSigma}_{i}^{ - 1} \right)^{ - 1} \right|^{\frac{1}{2}}} e^{ - \frac{1}{2} \left( \varvec{u} - \varvec{c} \right)^{T} \left( \varvec{H}^{ - 1} + \varvec{\varSigma}_{i}^{ - 1} \right) \left( \varvec{u} - \varvec{c} \right) } d\varvec{u} \\ & \quad = \frac{\left| \left( \varvec{H}^{ - 1} + \varvec{\varSigma}_{i}^{ - 1} \right)^{ - 1} \right|^{\frac{1}{2}}}{\left( 2\pi \right)^{\frac{s}{2}} \left| \varvec{H} \right|^{\frac{1}{2}} \left| \varvec{\varSigma}_{i} \right|^{\frac{1}{2}}} e^{ - \frac{1}{2} \left( \varvec{x} - \varvec{\mu}_{\varvec{i}} \right)^{T} \varvec{C} \left( \varvec{x} - \varvec{\mu}_{\varvec{i}} \right) } \\ \end{aligned} $$

(A.2)

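where $ \varvec{c} = \left( {\varvec{H}^{ - 1} +\varvec{\varSigma}_{i}^{ - 1} } \right)^{ - 1} \left( {\varvec{H}^{ - 1} \varvec{x} + \varvec{\varSigma}_{i}^{ - 1}\varvec{\mu}_{\varvec{i}} } \right) $ and $ \varvec{C} = \varvec{H}^{ - 1} \left( {\varvec{H}^{ - 1} +\varvec{\varSigma}_{i}^{ - 1} } \right)^{ - 1} \varvec{\varSigma}_{i}^{ - 1} = \left( {\varvec{H} +\varvec{\varSigma}_{i} } \right)^{ - 1} $. The remaining integral over $ \varvec{u} $ equals 1 because its integrand is the density of a Gaussian distribution with mean $ \varvec{c} $ and covariance $ \left( {\varvec{H}^{ - 1} +\varvec{\varSigma}_{i}^{ - 1} } \right)^{ - 1} $.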
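The factorization of the quadratic forms can be checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes $ \varvec{c} = \left( \varvec{H}^{-1} + \varvec{\varSigma}_{i}^{-1} \right)^{-1} \left( \varvec{H}^{-1} \varvec{x} + \varvec{\varSigma}_{i}^{-1} \varvec{\mu}_{\varvec{i}} \right) $ and $ \varvec{C} = \varvec{H}^{-1} \left( \varvec{H}^{-1} + \varvec{\varSigma}_{i}^{-1} \right)^{-1} \varvec{\varSigma}_{i}^{-1} $, uses randomly generated symmetric positive definite matrices for $ \varvec{H} $ and $ \varvec{\varSigma}_{i} $, and the helper names `completed_square_terms` and `random_spd` are hypothetical.

```python
import numpy as np

def completed_square_terms(x, mu, H, Sigma):
    """Return (c, A, C) of the completed square; H and Sigma are assumed SPD."""
    H_inv = np.linalg.inv(H)
    S_inv = np.linalg.inv(Sigma)
    A = H_inv + S_inv                                # precision matrix of the Gaussian in u
    c = np.linalg.solve(A, H_inv @ x + S_inv @ mu)   # c = A^{-1} (H^{-1} x + Sigma^{-1} mu)
    C = H_inv @ np.linalg.solve(A, S_inv)            # C = H^{-1} A^{-1} Sigma^{-1}
    return c, A, C

def random_spd(rng, s):
    """Illustrative helper: a well-conditioned random SPD matrix of size s."""
    M = rng.normal(size=(s, s))
    return M @ M.T + s * np.eye(s)

rng = np.random.default_rng(0)
s = 3
H, Sigma = random_spd(rng, s), random_spd(rng, s)
x, mu, u = rng.normal(size=s), rng.normal(size=s), rng.normal(size=s)

c, A, C = completed_square_terms(x, mu, H, Sigma)

# Exponent before the factorization (without the factor -1/2).
lhs = (u - x) @ np.linalg.solve(H, u - x) + (u - mu) @ np.linalg.solve(Sigma, u - mu)
# Exponent after the factorization.
rhs = (u - c) @ A @ (u - c) + (x - mu) @ C @ (x - mu)

identity_gap = abs(lhs - rhs)
C_gap = np.max(np.abs(C - np.linalg.inv(H + Sigma)))
print(identity_gap, C_gap)
```

Both gaps should be at the level of floating-point round-off, which is consistent with $ \varvec{C} = \left( \varvec{H} + \varvec{\varSigma}_{i} \right)^{-1} $.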

Because $ \left| {\left( {\varvec{H}^{ - 1} +\varvec{\varSigma}_{\varvec{i}}^{ - 1} } \right)^{ - 1} } \right|^{{\frac{1}{2}}} = \frac{1}{{\left| {\varvec{H}^{ - 1} +\varvec{\varSigma}_{\varvec{i}}^{ - 1} } \right|^{{\frac{1}{2}}} }} $, Eq. (A.2) can be rewritten as

$$ \begin{aligned} & \frac{1}{{\left( {2\pi } \right)^{{\frac{s}{2}}} \left| \varvec{H} \right|^{{\frac{1}{2}}} \left| {\varvec{\varSigma}_{i} } \right|^{{\frac{1}{2}}} \left| {\varvec{H}^{ - 1} +\varvec{\varSigma}_{\varvec{i}}^{ - 1} } \right|^{{\frac{1}{2}}} }}e^{{ - \frac{1}{2} \left( {\varvec{x} -\varvec{\mu}_{\varvec{i}} } \right)^{T} \left( {\varvec{H} +\varvec{\varSigma}_{i} } \right)^{ - 1} \left( {\varvec{x} -\varvec{\mu}_{\varvec{i}} } \right) }} \\ & \quad = \frac{1}{{\left( {2\pi } \right)^{{\frac{s}{2}}} \left( {\left| \varvec{H} \right|\left| {\varvec{\varSigma}_{\varvec{i}} } \right|\left| {\varvec{H}^{ - 1} +\varvec{\varSigma}_{\varvec{i}}^{ - 1} } \right|} \right)^{{\frac{1}{2}}} }}e^{{ - \frac{1}{2} \left( {\varvec{x} -\varvec{\mu}_{\varvec{i}} } \right)^{T} \left( {\varvec{H} +\varvec{\varSigma}_{i} } \right)^{ - 1} \left( {\varvec{x} -\varvec{\mu}_{\varvec{i}} } \right) }} . \\ \end{aligned} $$

Finally, $ \left| \varvec{H} \right|\left| {\varvec{\varSigma}_{\varvec{i}} } \right|\left| {\varvec{H}^{ - 1} +\varvec{\varSigma}_{\varvec{i}}^{ - 1} } \right| $ can be simplified as $ \left| {\varvec{H} +\varvec{\varSigma}_{i} } \right| $ because

$$ \begin{aligned} \left| \varvec{H} \right|\left| {\varvec{\varSigma}_{\varvec{i}} } \right|\left| {\varvec{H}^{ - 1} +\varvec{\varSigma}_{\varvec{i}}^{ - 1} } \right| & = \left| {\varvec{H\varSigma }_{\varvec{i}} } \right|\left| {\varvec{H}^{ - 1} +\varvec{\varSigma}_{\varvec{i}}^{ - 1} } \right| \\ & = \left| {\varvec{H\varSigma }_{\varvec{i}} \varvec{H}^{ - 1} + \varvec{H}} \right| = \left| {\varvec{H\varSigma }_{\varvec{i}} \varvec{H}^{ - 1} + \varvec{HHH}^{ - 1} } \right| \\ & = \left| {\varvec{H}\left( {\varvec{\varSigma}_{\varvec{i}} + \varvec{H}} \right)\varvec{H}^{ - 1} } \right| = \left| \varvec{H} \right|\left| {\varvec{\varSigma}_{\varvec{i}} + \varvec{H}} \right|\left| {\varvec{H}^{ - 1} } \right| \\ & = \left| {{\varvec{\Sigma}}_{\varvec{i}} + \varvec{H}} \right|. \\ \end{aligned} $$
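Taken together, these steps give $ E\left[ K_{\varvec{H}} \left( \varvec{x} - \varvec{U}_{\varvec{i}} \right) \right] $ as the density of a Gaussian distribution with mean $ \varvec{\mu}_{\varvec{i}} $ and covariance $ \varvec{H} + \varvec{\varSigma}_{i} $ evaluated at $ \varvec{x} $. The sketch below compares this closed form with a Monte Carlo average of the Gaussian kernel over samples of $ \varvec{U}_{\varvec{i}} $, and checks the determinant identity above. The numerical values of $ \varvec{H} $, $ \varvec{\varSigma}_{i} $, $ \varvec{\mu}_{\varvec{i}} $, and $ \varvec{x} $, as well as the function names, are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def expected_kernel(x, mu, Sigma, H):
    """Closed form of E[K_H(x - U)] for U ~ N(mu, Sigma): N(mu, H + Sigma) density at x."""
    s = x.shape[0]
    S = H + Sigma
    diff = x - mu
    _, logdet = np.linalg.slogdet(S)
    quad = diff @ np.linalg.solve(S, diff)
    return np.exp(-0.5 * (s * np.log(2 * np.pi) + logdet + quad))

def gaussian_kernel(x, U, H):
    """Gaussian kernel K_H(x - u) evaluated for every row u of U."""
    s = x.shape[0]
    diff = x - U                                   # shape (n, s) by broadcasting
    _, logdet = np.linalg.slogdet(H)
    quad = np.einsum("ij,ij->i", diff, np.linalg.solve(H, diff.T).T)
    return np.exp(-0.5 * (s * np.log(2 * np.pi) + logdet + quad))

# Illustrative values (assumptions, not taken from the paper).
rng = np.random.default_rng(1)
H = np.array([[0.5, 0.1], [0.1, 0.4]])
Sigma = np.array([[1.0, 0.6], [0.6, 2.0]])
mu = np.array([0.3, -0.2])
x = np.array([0.8, 0.5])

# Monte Carlo approximation of the convolution integral in Eq. (A.1).
samples = rng.multivariate_normal(mu, Sigma, size=200_000)
mc_estimate = gaussian_kernel(x, samples, H).mean()
closed_form = expected_kernel(x, mu, Sigma, H)

# Determinant identity |H||Sigma||H^{-1} + Sigma^{-1}| = |H + Sigma|.
det_lhs = (np.linalg.det(H) * np.linalg.det(Sigma)
           * np.linalg.det(np.linalg.inv(H) + np.linalg.inv(Sigma)))
det_rhs = np.linalg.det(H + Sigma)
print(closed_form, mc_estimate, det_lhs, det_rhs)
```

The closed form avoids sampling altogether, so evaluating it for each uncertain object costs one linear solve with $ \varvec{H} + \varvec{\varSigma}_{i} $.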

About this article

Cite this article

Kim, B., Jeong, YS. & Jeong, M.K. New multivariate kernel density estimator for uncertain data classification. Ann Oper Res 303, 413–431 (2021). https://doi.org/10.1007/s10479-020-03715-4

Published: 25 August 2020
Issue Date: August 2021
DOI: https://doi.org/10.1007/s10479-020-03715-4
