Abstract
Recent technological advances have led to the accumulation of large volumes of data in digital repositories. Mining such repositories for information retrieval is challenging in terms of both dimensionality and sample size. Mining tasks such as text mining, in particular, are confronted with high-dimensional data, so reducing the dimensionality of the data becomes necessary. Fuzzy rough set feature selection techniques have proved highly efficient in dimension reduction: they can handle data dependencies and reduce data dimensionality without compromising the performance of classification and clustering. This paper reviews major developments in fuzzy rough set-based feature selection over a period of 20 years and discusses the potential of fuzzy rough set-based feature selection in the domain of text categorization. A hybrid feature selection technique is proposed, based on large-scale spectral clustering with landmark-based representation and fuzzy rough feature selection, and it is found to work efficiently in memory-constrained environments. Moreover, the proposed technique substantially reduces data dimensionality on the considered datasets while retaining an acceptable degree of clustering accuracy.
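To make the idea of fuzzy rough feature selection concrete, the following is a minimal sketch of a greedy, dependency-driven selection in the style of fuzzy-rough QuickReduct. It is not the hybrid technique proposed in the chapter, and it does not include the landmark-based spectral clustering step. The choices made here are assumptions for illustration: a range-normalised similarity per feature, the minimum t-norm to combine features, the Łukasiewicz implicator for the lower approximation, and crisp class labels. The function names (`fuzzy_similarity`, `dependency`, `quick_reduct`) are hypothetical.

```python
import numpy as np

def fuzzy_similarity(X):
    """Per-feature fuzzy similarity R_a(x, y) = 1 - |a(x) - a(y)| / range(a).

    Assumed similarity measure; returns an array of shape (n, n, m).
    """
    X = np.asarray(X, dtype=float)
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0  # constant feature: every pair is fully similar
    return 1.0 - np.abs(X[:, None, :] - X[None, :, :]) / span

def dependency(sim, y, subset):
    """Fuzzy-rough dependency degree of the crisp decision y on a feature subset."""
    if not subset:
        return 0.0
    # Combine the per-feature relations with the minimum t-norm.
    rel = sim[:, :, subset].min(axis=2)
    # Lukasiewicz implicator I(a, b) = min(1, 1 - a + b) with crisp classes:
    # pairs in the same class contribute 1, pairs in different classes 1 - R(x, y).
    different = y[:, None] != y[None, :]
    positive = np.where(different, 1.0 - rel, 1.0).min(axis=1)
    return float(positive.mean())

def quick_reduct(X, y, tol=1e-12):
    """Greedily add the feature that most increases the dependency degree."""
    sim = fuzzy_similarity(X)
    y = np.asarray(y)
    n_features = sim.shape[2]
    full = dependency(sim, y, list(range(n_features)))
    selected, best = [], 0.0
    while best < full - tol:
        scores = {a: dependency(sim, y, selected + [a])
                  for a in range(n_features) if a not in selected}
        candidate = max(scores, key=scores.get)
        if scores[candidate] <= best + tol:
            break  # no remaining feature improves the dependency
        selected.append(candidate)
        best = scores[candidate]
    return selected, best

# Toy data (made up): feature 0 separates the classes, feature 1 is constant.
X = [[0.0, 0.5], [0.1, 0.5], [0.9, 0.5], [1.0, 0.5]]
y = [0, 0, 1, 1]
selected, gamma = quick_reduct(X, y)
print(selected, round(gamma, 2))
```

On this toy input only the first feature is retained, since the constant feature contributes nothing to the dependency degree. The sketch stores an n × n × m similarity array, so it is suitable only for small data; it makes no attempt at the memory efficiency the abstract attributes to the proposed hybrid technique.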
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Gupta, A., Begum, S.A. (2023). Fuzzy Rough Set-Based Feature Selection for Text Categorization. In: Som, T., Castillo, O., Tiwari, A.K., Shreevastava, S. (eds) Fuzzy, Rough and Intuitionistic Fuzzy Set Approaches for Data Handling. Forum for Interdisciplinary Mathematics. Springer, Singapore. https://doi.org/10.1007/978-981-19-8566-9_4
DOI: https://doi.org/10.1007/978-981-19-8566-9_4
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-8565-2
Online ISBN: 978-981-19-8566-9
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)