Abstract
As high-dimensional multi-label data become increasingly common in modern applications, multi-label feature selection has become an important problem. Traditional multi-label feature selection algorithms focus on evaluating the relevance between candidate features and the label set, and neglect the impact of the already selected feature set. Few studies analyze the redundancy of the selected features, so the most discriminative features may be ignored. To solve this problem, we propose a novel multi-label feature selection algorithm based on fuzzy rough sets. First, we define a redundancy weight for each selected feature via fuzzy interaction information to evaluate the correlation between features in the selected feature set, and we design an instance equivalence matrix based on the redundancy weight. Second, we define fuzzy conditional mutual information to evaluate the relevance between candidate features and the label set given the selected feature set. Finally, we combine the redundancy analysis with feature relevance to design the multi-label feature selection algorithm. To verify its performance, we compare the proposed algorithm with nine representative feature selection algorithms on synthetic and real-world datasets. The experimental results and statistical tests show that the proposed algorithm outperforms the compared algorithms.
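The relevance step described above can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the definitions of the redundancy weight or the instance equivalence matrix, so both are omitted here. The sketch assumes a common fuzzy-entropy formulation over fuzzy similarity relations, H(R) = -(1/n) Σ log2(|[x_i]_R| / n), with relations intersected by the element-wise minimum, and it greedily picks the candidate feature with the largest fuzzy conditional mutual information with the label set given the selected features. The function names (`similarity`, `joint`, `fuzzy_entropy`, `cond_mutual_info`, `greedy_select`) and the similarity measure 1 - |x_i - x_j| are illustrative assumptions.

```python
import numpy as np

def similarity(col):
    """Assumed fuzzy similarity: R[i, j] = 1 - |x_i - x_j| on a column scaled to [0, 1]."""
    col = np.asarray(col, dtype=float)
    span = col.max() - col.min()
    scaled = (col - col.min()) / span if span > 0 else np.zeros_like(col)
    return 1.0 - np.abs(scaled[:, None] - scaled[None, :])

def joint(*relations):
    """Intersect fuzzy relations with the element-wise minimum."""
    out = relations[0]
    for rel in relations[1:]:
        out = np.minimum(out, rel)
    return out

def fuzzy_entropy(rel):
    """H(R) = -(1/n) * sum_i log2(|[x_i]_R| / n), where |[x_i]_R| is the i-th row sum."""
    n = rel.shape[0]
    return float(-np.mean(np.log2(rel.sum(axis=1) / n)))

def cond_mutual_info(rel_f, rel_l, rel_s):
    """I(f; L | S) = H(f, S) + H(L, S) - H(f, L, S) - H(S)."""
    return (fuzzy_entropy(joint(rel_f, rel_s))
            + fuzzy_entropy(joint(rel_l, rel_s))
            - fuzzy_entropy(joint(rel_f, rel_l, rel_s))
            - fuzzy_entropy(rel_s))

def greedy_select(X, Y, k):
    """Forward selection by conditional relevance only (no redundancy weighting)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, d = X.shape
    rel_feats = [similarity(X[:, j]) for j in range(d)]
    rel_label = joint(*[similarity(Y[:, j]) for j in range(Y.shape[1])])
    # Empty selected set: every pair of instances is fully similar, so H(S) = 0.
    rel_sel = np.ones((n, n))
    selected = []
    for _ in range(min(k, d)):
        scores = {j: cond_mutual_info(rel_feats[j], rel_label, rel_sel)
                  for j in range(d) if j not in selected}
        best = max(scores, key=scores.get)
        selected.append(best)
        rel_sel = joint(rel_sel, rel_feats[best])
    return selected
```

With an empty selected set the conditional score reduces to ordinary fuzzy mutual information between a feature and the label set, so a feature that copies a label is chosen before a constant or unrelated one. Conditioning on the selected set is what lets later rounds discount candidates whose information is already covered.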
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. , 61772226 and 61862056), the Natural Science Foundation of Jilin Province (Grant No. 20200201159JC), and the Key Laboratory for Symbol Computation and Knowledge Engineering of the National Education Ministry of China, Jilin University.
Cite this article
Zhong, H., Zhang, P. & Liu, G. Multi-label feature selection via redundancy of the selected feature set. Appl Intell 53, 11073–11091 (2023). https://doi.org/10.1007/s10489-022-03365-y
DOI: https://doi.org/10.1007/s10489-022-03365-y