Abstract
A significant obstacle to accurate text classification for information filtering and retrieval (IF/IR) is the high dimensionality of the data. This paper proposes a technique based on Rough Set Theory to alleviate this problem. Given document corpora and a training set of pre-classified example documents, the technique locates a minimal set of co-ordinate keywords that distinguishes between the document classes, thereby reducing the dimensionality of the keyword vectors. This simplifies the creation of knowledge-based IF/IR systems, speeds up their operation, and makes the rule bases they employ easier to edit. The paper describes the proposed technique, discusses the integration of a keyword acquisition algorithm with a rough set-based dimensionality reduction algorithm, and provides experimental results from a proof-of-concept implementation.
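The core idea, keeping only the keywords needed to tell document classes apart, can be illustrated with a small sketch. It assumes documents are reduced to boolean keyword-presence vectors and approximates a rough set reduct with a greedy, dependency-driven search in the style of QuickReduct. The function names, the toy corpus and the spam/ham labels are hypothetical; the greedy search is not guaranteed to return the smallest possible keyword set, and it is not necessarily the exact algorithm used in the paper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Row = Tuple[int, ...]


def keyword_vectors(docs: Sequence[str], keywords: Sequence[str]) -> List[Row]:
    """Encode each document as a 0/1 keyword-presence vector.
    (Hypothetical, deliberately simple stand-in for keyword acquisition.)"""
    vectors = []
    for doc in docs:
        tokens = set(doc.lower().split())
        vectors.append(tuple(int(k in tokens) for k in keywords))
    return vectors


def partition(rows: Sequence[Row], attrs: Sequence[int]) -> List[Set[int]]:
    """Equivalence classes of the indiscernibility relation IND(attrs):
    documents fall in the same block when they agree on every attribute in attrs."""
    blocks: Dict[Row, Set[int]] = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())


def positive_region_size(rows: Sequence[Row], labels: Sequence[str],
                         attrs: Sequence[int]) -> int:
    """|POS_attrs(D)|: number of documents whose class is fully determined by attrs.
    The dependency degree is gamma = |POS_attrs(D)| / |U|; integer counts are
    compared instead of ratios to avoid floating-point equality tests."""
    return sum(len(block) for block in partition(rows, attrs)
               if len({labels[i] for i in block}) == 1)


def greedy_reduct(rows: Sequence[Row], labels: Sequence[str]) -> List[int]:
    """Greedily add the keyword that most increases dependency until the subset
    discerns the classes as well as the full keyword set does."""
    n_attrs = len(rows[0])
    target = positive_region_size(rows, labels, range(n_attrs))
    reduct: List[int] = []
    while positive_region_size(rows, labels, reduct) < target:
        remaining = [a for a in range(n_attrs) if a not in reduct]
        # max() returns the first maximal element, so ties go to the lowest index.
        best = max(remaining,
                   key=lambda a: positive_region_size(rows, labels, reduct + [a]))
        reduct.append(best)
    return reduct


# Hypothetical toy training set.
KEYWORDS = ["cheap", "offer", "meeting", "agenda"]
DOCS = [
    "cheap offer today",
    "limited cheap deal",
    "meeting agenda attached",
    "agenda for monday",
    "special offer inside",
    "agenda cheap offer",
]
LABELS = ["spam", "spam", "ham", "ham", "spam", "spam"]

if __name__ == "__main__":
    rows = keyword_vectors(DOCS, KEYWORDS)
    print([KEYWORDS[a] for a in greedy_reduct(rows, LABELS)])  # ['cheap', 'offer']
```

On this toy corpus the four candidate keywords shrink to two, because "cheap" and "offer" alone already separate the classes as well as all four do. The greedy strategy keeps each step inexpensive (one partition per remaining candidate keyword) at the cost of guaranteed minimality; an exhaustive search over keyword subsets would find a smallest reduct but grows exponentially with vocabulary size.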
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chouchoulas, A., Shen, Q. (1999). A Rough Set-Based Approach to Text Classification. In: Zhong, N., Skowron, A., Ohsuga, S. (eds) New Directions in Rough Sets, Data Mining, and Granular-Soft Computing. RSFDGrC 1999. Lecture Notes in Computer Science, vol 1711. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-48061-7_16
DOI: https://doi.org/10.1007/978-3-540-48061-7_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66645-5
Online ISBN: 978-3-540-48061-7
eBook Packages: Springer Book Archive