Abstract
We have developed a research support system, called Papits, that shares research information, such as PDF files of research papers, among computers on a network and classifies that information by research field. Users of Papits can share various kinds of research information and survey the document collections of their own research fields. To realize Papits, we need a mechanism for identifying which words are best suited to classifying documents into predefined classes. We must also handle cases in which documents have to be classified into a multivalued category and cases in which there is insufficient training data for classification. In this paper, we present an implementation of automatic classification for Papits based on a text classification technique. We also propose a new feature selection method for classifying documents represented as bags of words into a multivalued category. Our method transforms the multivalued category into a binary category so that the characteristic words for each category can be identified easily, even from a small amount of training data. Our experimental results indicate that our method can classify documents in Papits effectively.
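The abstract does not specify the scoring criterion, so the sketch below only illustrates the general idea under stated assumptions. Each value of the multivalued category becomes a binary "this category vs. all others" problem, and words are then ranked per category by information gain over binary bag-of-words vectors. Information gain, the word-presence representation, the top-k cutoff, the toy data, and the helper names `binarize`, `information_gain`, and `select_features` are all hypothetical choices, not the method described in the paper.

```python
import math
from typing import Dict, List, Set

# Assumption: a document is reduced to the set of distinct words it contains
# (binary bag-of-words, i.e. presence/absence only).
Document = Set[str]


def entropy(pos: int, neg: int) -> float:
    """Entropy in bits of a two-class split with `pos` and `neg` documents."""
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def binarize(labels: List[str], target: str) -> List[bool]:
    """One-vs-rest (hypothetical helper): True for `target`, False otherwise."""
    return [label == target for label in labels]


def information_gain(docs: List[Document], is_pos: List[bool], word: str) -> float:
    """Information gain of the binary class given presence/absence of `word`."""
    n = len(docs)
    pos_total = sum(is_pos)
    neg_total = n - pos_total
    # Contingency counts for the documents that contain the word.
    with_pos = sum(1 for d, y in zip(docs, is_pos) if y and word in d)
    with_neg = sum(1 for d, y in zip(docs, is_pos) if not y and word in d)
    n_with = with_pos + with_neg
    n_without = n - n_with
    conditional = (n_with / n) * entropy(with_pos, with_neg) + (
        n_without / n
    ) * entropy(pos_total - with_pos, neg_total - with_neg)
    return entropy(pos_total, neg_total) - conditional


def select_features(
    docs: List[Document], labels: List[str], k: int
) -> Dict[str, List[str]]:
    """For each category, binarize the labels and keep the k highest-IG words."""
    vocab = sorted(set().union(*docs))
    selected: Dict[str, List[str]] = {}
    for category in sorted(set(labels)):
        is_pos = binarize(labels, category)
        # Descending information gain; ties broken alphabetically.
        ranked = sorted(vocab, key=lambda w: (-information_gain(docs, is_pos, w), w))
        selected[category] = ranked[:k]
    return selected


# Toy data with hypothetical category names (not taken from the paper).
docs = [
    {"svm", "kernel", "margin"},
    {"svm", "kernel", "text"},
    {"agent", "auction", "negotiation"},
    {"agent", "negotiation", "text"},
    {"parser", "grammar", "text"},
    {"parser", "grammar", "corpus"},
]
labels = ["ml", "ml", "mas", "mas", "nlp", "nlp"]

print(select_features(docs, labels, k=2))
```

On this toy data, each category's top two words are the ones that occur in all of that category's documents and in no others (for example `kernel` and `svm` for `ml`), while `text`, which is spread evenly across categories, scores zero. Information gain is symmetric, so a word that reliably signals the absence of a category can also rank highly; a real system might additionally keep only words positively associated with the category.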
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ozono, T., Shintani, T., Ito, T., Hasegawa, T. (2004). A Feature Selection for Text Categorization on Research Support System Papits. In: Zhang, C., Guesgen, H.W., Yeap, W.K. (eds) PRICAI 2004: Trends in Artificial Intelligence. PRICAI 2004. Lecture Notes in Computer Science, vol 3157. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28633-2_56
DOI: https://doi.org/10.1007/978-3-540-28633-2_56
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22817-2
Online ISBN: 978-3-540-28633-2
eBook Packages: Springer Book Archive