Topic-Specific Text Filtering Based on Multiple Reducts

Qiang Li & Jianhua Li

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3505)

Included in the following conference series:

International Workshop on Autonomous Intelligent Systems: Agents and Data Mining

Abstract

Feature selection is an important step in text preprocessing: a well-chosen feature subset can match the performance obtained with the full feature set while reducing learning time. To make our system fit for practical use and to embed the model in a gateway for real-time text filtering, we need to select more accurate features. In this paper, we propose a new feature selection method based on rough set theory. It generates several reducts that share no common attributes, so the selected attributes are better able to classify new objects, especially on real application data. We evaluate the method on two data sets: a benchmark data set from the UCI machine learning archive and a data set captured from the Web. We classify the objects with statistical classification methods. On the benchmark test set, a single reduct yields good precision; on the real data set, which is used in our system for topic-specific text filtering, several reducts yield good precision. We conclude that the method is effective in practice. We also find that, with the same selected features on an imbalanced data set, the SVM and VSM methods perform better than the Naïve Bayes method, which performs poorly.
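
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea of reducts with no common attributes. It assumes a QuickReduct-style greedy search driven by the rough-set dependency degree, which may differ from the authors' method. The function names (`dependency`, `greedy_reduct`, `disjoint_reducts`), the `max_reducts` parameter, and the toy decision table are all hypothetical.

```python
from collections import defaultdict


def dependency(rows, labels, attrs):
    """Rough-set dependency degree: the fraction of objects whose values on
    `attrs` determine the decision label without ambiguity."""
    classes = defaultdict(list)
    for row, label in zip(rows, labels):
        classes[tuple(row[a] for a in attrs)].append(label)
    consistent = sum(len(ys) for ys in classes.values() if len(set(ys)) == 1)
    return consistent / len(rows)


def greedy_reduct(rows, labels, candidates, target):
    """Greedy forward selection over `candidates` (an assumed heuristic).
    Returns a list of attribute indices reaching `target`, or None."""
    selected, pool = [], list(candidates)
    while pool and dependency(rows, labels, selected) < target:
        best = max(pool, key=lambda a: dependency(rows, labels, selected + [a]))
        selected.append(best)
        pool.remove(best)
    if dependency(rows, labels, selected) < target:
        return None
    # Backward pass: drop any attribute that turned out to be redundant.
    for a in list(selected):
        rest = [b for b in selected if b != a]
        if dependency(rows, labels, rest) >= target:
            selected = rest
    return selected


def disjoint_reducts(rows, labels, max_reducts=3):
    """Collect reducts one by one, removing used attributes from the pool
    so that no two reducts share an attribute."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    pool, reducts = all_attrs, []
    while pool and len(reducts) < max_reducts:
        reduct = greedy_reduct(rows, labels, pool, target)
        if not reduct:  # pool can no longer discern the classes
            break
        reducts.append(reduct)
        pool = [a for a in pool if a not in reduct]
    return reducts


# Toy decision table: attributes 0 and 1 each determine the label,
# attribute 2 is noise.
rows = [(1, 1, 0), (1, 1, 1), (0, 0, 0), (0, 0, 1)]
labels = [1, 1, 0, 0]
print(disjoint_reducts(rows, labels))
```

In this sketch each reduct could feed its own classifier, with the predictions combined afterwards; how the paper combines several reducts is not stated in the abstract.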

Author information

Authors and Affiliations

Modern Communication Institute, Shanghai Jiaotong Univ., Shanghai, 200030, China
Qiang Li & Jianhua Li

Editor information

Editors and Affiliations

St. Petersburg Institute for Informatics and Automation, 39, 14-th Liniya, 199178, St. Petersburg, Russia
Vladimir Gorodetsky
Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
Jiming Liu
US Air Force, Binghamton University (SUNYI), Binghamton, 13902, NY, USA
Victor A. Skormin

About this paper

Cite this paper

Li, Q., Li, J. (2005). Topic-Specific Text Filtering Based on Multiple Reducts. In: Gorodetsky, V., Liu, J., Skormin, V.A. (eds) Autonomous Intelligent Systems: Agents and Data Mining. AIS-ADM 2005. Lecture Notes in Computer Science, vol 3505. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11492870_14

DOI: https://doi.org/10.1007/11492870_14
Published: 20 May 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26164-3
Online ISBN: 978-3-540-31932-0
eBook Packages: Computer Science, Computer Science (R0)
