Abstract
In many practical applications of machine learning, only part of the data is labeled because obtaining class labels is relatively expensive. This paper concentrates on measures of uncertainty for a partially labeled categorical decision information system (p-CDIS) and considers an application to semi-supervised attribute reduction. First, a p-CDIS \((U, C, d)\) induces two decision information systems (DISs): one for the labeled categorical data, \((U^l,C,d)\), and one for the unlabeled categorical data, \((U^u,C,d)\); the missing rate of labels in \((U, C, d)\) is also introduced. Existing research on partially labeled data does not take the missing rate of labels into account and considers only a single importance for each attribute subset. Then, four importance measures of an attribute subset \(P\subseteq C\) in \((U, C, d)\) are defined based on an indiscernibility relation; each is a weighted sum of the importance of P in \((U^l,C,d)\) and in \((U^u,C,d)\), with weights determined by the missing rate of labels. These four importance measures can be regarded as four uncertainty measurements (UMs) for \((U, P, d)\). Next, numerical experiments and statistical tests are carried out on 15 UCI datasets to demonstrate the advantages and disadvantages of the four UMs. Finally, as an application of UMs in a p-CDIS, the two better-performing UMs are used for semi-supervised attribute reduction, and two corresponding algorithms are designed that automatically adapt to different missing rates of labels. The experimental results show the feasibility and superiority of the designed algorithms.
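To make the weighted-sum construction concrete, the following minimal Python sketch combines an importance score computed on the labeled part \((U^l,C,d)\) with one computed on the unlabeled part \((U^u,C,d)\), weighted by the missing rate of labels. The specific scores used here (a positive-region dependency degree for labeled data and a knowledge-granulation-style score for unlabeled data) and all function names are illustrative assumptions, not the paper's exact definitions.

from collections import defaultdict

def indiscernibility_classes(rows, attrs):
    """Partition object indices by equality of values on the attribute subset `attrs`."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def labeled_importance(rows, labels, attrs):
    """Rough-set dependency degree: fraction of labeled objects whose
    indiscernibility class is label-consistent (the positive region)."""
    pos = 0
    for block in indiscernibility_classes(rows, attrs):
        if len({labels[i] for i in block}) == 1:
            pos += len(block)
    return pos / len(rows)

def unlabeled_importance(rows, attrs):
    """Knowledge-granulation-style score: finer partitions score higher."""
    n = len(rows)
    return 1 - sum(len(b) ** 2 for b in indiscernibility_classes(rows, attrs)) / n ** 2

def combined_importance(labeled_rows, labels, unlabeled_rows, attrs):
    """Weighted sum of the two scores; the weight is the missing rate of labels
    (an assumed weighting scheme, used here only for illustration)."""
    miss_rate = len(unlabeled_rows) / (len(labeled_rows) + len(unlabeled_rows))
    return ((1 - miss_rate) * labeled_importance(labeled_rows, labels, attrs)
            + miss_rate * unlabeled_importance(unlabeled_rows, attrs))

if __name__ == "__main__":
    labeled = [("a", "x"), ("a", "y"), ("b", "x")]
    labels = [0, 1, 1]
    unlabeled = [("b", "y"), ("a", "x")]
    print(combined_importance(labeled, labels, unlabeled, attrs=[0, 1]))

A greedy semi-supervised attribute reduction algorithm of the kind described in the abstract could then repeatedly add to a candidate reduct the attribute whose inclusion most increases such a combined score.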
Data availability and access
The data used or analyzed during the current study are available from the corresponding author after the paper is accepted for publication.
Acknowledgements
The authors would like to thank the editors and the anonymous reviewers for their valuable comments and suggestions, which have helped immensely in improving the quality of the paper. This work was supported by the Natural Science Foundation of Guangxi Province (2021GXNSFAA220114, 2020GXNSFAA159155) and the Guangxi First-class Discipline Statistics Construction Project Fund.
Author information
Contributions
Jiali He: Methodology, Writing-Original draft; Gangqiang Zhang: Software, Editing, Investigation; Dan Huang: Data curation; Pei Wang: Validation; Guangji Yu: Software, Investigation.
Ethics declarations
Competing Interests
The authors declare that they have no conflict of interest.
Ethical and informed consent for data used
The data used or analyzed during the current study are available from the corresponding author after the paper is accepted for publication.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
He, J., Zhang, G., Huang, D. et al. Measures of uncertainty for partially labeled categorical data based on an indiscernibility relation: an application in semi-supervised attribute reduction. Appl Intell 53, 29486–29513 (2023). https://doi.org/10.1007/s10489-023-05078-2