Topic-Specific Text Filtering Based on Multiple Reducts

  • Conference paper
  • First Online:
Autonomous Intelligent Systems: Agents and Data Mining (AIS-ADM 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3505)


Abstract

Feature selection is an important step in text preprocessing: a well-chosen feature subset can match the performance obtained with the full feature set while reducing learning time. To make our system suitable for practical use and to embed the model in a gateway for real-time text filtering, we need to select still more discriminative features. In this paper we propose a new feature selection method based on rough set theory. It generates several reducts, and its distinguishing property is that the reducts share no common attributes, which gives the selected attributes greater power to classify new objects, especially on the real data sets encountered in applications. We evaluate the method on two data sets: a benchmark data set from the UCI machine learning archive and a data set captured from the Web. Using statistical classification methods, we obtain good precision with a single reduct on the benchmark test set, whereas on the real data set good precision requires several reducts; this second data set is used in our system for topic-specific text filtering. We therefore conclude that the method is effective in practice. In addition, we find that SVM and VSM classifiers perform better than Naïve Bayes with the same selected features on this imbalanced data set.
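The core idea sketched in the abstract is to build a rough set decision table over term-occurrence features and then extract several reducts that share no attributes. The Python sketch below is only a minimal, hypothetical illustration of that idea, assuming the standard dependency-degree measure and a greedy QuickReduct-style search; the toy term-occurrence table, the function names, and the strategy of removing already-used attributes before searching again are illustrative assumptions, not the algorithm described in the paper.

    # Hypothetical sketch: attribute-disjoint reducts via a greedy
    # QuickReduct-style search over a rough set decision table.
    # Not the paper's exact algorithm; all details are assumptions.
    from collections import defaultdict

    def dependency(rows, attrs, decision_idx):
        """gamma_B(D): fraction of objects whose B-indiscernibility class
        is consistent with the decision attribute."""
        classes = defaultdict(set)   # B-signature -> decisions observed
        counts = defaultdict(int)    # B-signature -> number of objects
        for row in rows:
            sig = tuple(row[a] for a in attrs)
            classes[sig].add(row[decision_idx])
            counts[sig] += 1
        positive = sum(n for sig, n in counts.items() if len(classes[sig]) == 1)
        return positive / len(rows)

    def quick_reduct(rows, candidate_attrs, decision_idx):
        """Greedy forward selection until the dependency of the full
        candidate set is matched."""
        target = dependency(rows, candidate_attrs, decision_idx)
        reduct, current = [], 0.0
        remaining = list(candidate_attrs)
        while current < target and remaining:
            best = max(remaining,
                       key=lambda a: dependency(rows, reduct + [a], decision_idx))
            reduct.append(best)
            remaining.remove(best)
            current = dependency(rows, reduct, decision_idx)
        return reduct, current

    def disjoint_reducts(rows, attrs, decision_idx, max_reducts=3):
        """Repeatedly extract reducts, removing used attributes so the
        returned reducts share no attributes (the disjointness idea)."""
        pool = list(attrs)
        full = dependency(rows, attrs, decision_idx)
        reducts = []
        while pool and len(reducts) < max_reducts:
            red, gamma = quick_reduct(rows, pool, decision_idx)
            if not red or gamma < full:   # remaining pool no longer discerns fully
                break
            reducts.append(red)
            pool = [a for a in pool if a not in red]
        return reducts

    if __name__ == "__main__":
        # Toy decision table: columns 0..3 are binary term-occurrence
        # features, column 4 is the topic label.
        table = [
            (1, 0, 1, 0, "relevant"),
            (1, 1, 1, 0, "relevant"),
            (0, 0, 0, 1, "irrelevant"),
            (0, 1, 0, 1, "irrelevant"),
            (1, 0, 0, 0, "relevant"),
            (0, 1, 1, 1, "irrelevant"),
        ]
        print(disjoint_reducts(table, attrs=[0, 1, 2, 3], decision_idx=4))

On this toy table the sketch returns two attribute-disjoint reducts, [0] and [3]; in a filtering pipeline each such reduct would define one reduced term set passed to the downstream classifiers.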





Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, Q., Li, J. (2005). Topic-Specific Text Filtering Based on Multiple Reducts. In: Gorodetsky, V., Liu, J., Skormin, V.A. (eds) Autonomous Intelligent Systems: Agents and Data Mining. AIS-ADM 2005. Lecture Notes in Computer Science, vol 3505. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11492870_14


  • DOI: https://doi.org/10.1007/11492870_14

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26164-3

  • Online ISBN: 978-3-540-31932-0

  • eBook Packages: Computer Science, Computer Science (R0)
