Nothing Special   »   [go: up one dir, main page]

Skip to main content

A Feature Selection for Text Categorization on Research Support System Papits

  • Conference paper
PRICAI 2004: Trends in Artificial Intelligence (PRICAI 2004)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 3157))

Included in the following conference series:

Abstract

We have developed a research support system, called Papits, that shares research information, such as PDF files of research papers, in computers on the network and classifies the information into types of research fields. Users of Papits can share various research information and survey the corpora of their particular fields of research. In order to realize Papits, we need to design a mechanism for identifying what words are best suited to classify documents in predefined classes. Further we have to consider classification in cases where we must classify documents into multivalued fields and where there is insufficient data for classification. In this paper, we present an implementation method of automatic classification based on a text classification technique for Papits. We also propose a new method for using feature selection to classify documents that are represented by a bag-of-words into a multivalued category. Our method transforms the multivalued category into a binary category to easily identify the characteristic words to classify category in a few training data. Our experimental result indicates that our method can effectively classify documents in Papits.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Burges, C.J.C.: A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery 2(2), 121–167 (1998)

    Article  Google Scholar 

  2. Fujimaki, N., Ozono, T., Shintani, T.: Flexible Query Modifier for Research Support System Papits. In: Proceedings of the IASTED International Conference on Artificial and Computational Intelligence(ACI 2002), pp. 142–147 (2002)

    Google Scholar 

  3. Joachims, T.: Text Categorization with Support Vector Machines: Learning with Many Relevant Features. In: Proceedings of the European Conference on Machine Learning (1998)

    Google Scholar 

  4. John, G.H., Kohavi, R., Pfleger, K.: Irrelevant Features and the Subset Selection Problem. In: Proceedings of the Eleventh International Conference on Machine Learning, pp. 121–129 (1994)

    Google Scholar 

  5. Kudo, T.: TinySVM: Support Vector Machines (2001), http://cl-aistnara.ac.jp/taku-ku/software/TinySVM

  6. Lewis, D., Ringuette, M.: A comparison of two learning algorithms for text categorization. In: Third Annual Symposium on Document Analysis and Information Retrieval, pp. 81–93 (1994)

    Google Scholar 

  7. Nigam, K., Lafferty, J., McCallum, A.: Using Maximum Entropy for Text Classification. In: IJCAI 1999 Workshop on Machine Learning for Information Filtering (1999)

    Google Scholar 

  8. Ozono, T., Goto, S., Fujimaki, N., Shintani, T.: P2P based Knowledge Source Discovery on Research Support System Papits. In: The First International Joint Conference on Autonomous Agents & Multiagent Systems(AAMAS 2002) (2002)

    Google Scholar 

  9. Quinlan, J.R.: Induction of decision trees. Machine Learning 1(1), 81–106 (1986)

    Google Scholar 

  10. Sahami, M., Dumais, S., Heckerman, D., Horvitz, E.: A Bayesian approach to filtering junk e-mail. In: AAAI/ICML Workshop on Learning for Text Categorization (1998)

    Google Scholar 

  11. Soucy, P., Mineau, G.W.: A Simple Feature Selection Method for Text Classification. In: Proceedings of International joint Conference on Artificial Intelligence( IJCAI 2001), pp. 897–902 (2001)

    Google Scholar 

  12. Yang, Y., Liu, X.: A re-examination of text categorization methods. In: 22nd Annual International SIGIR, pp. 42–49 (1999)

    Google Scholar 

  13. Yang, Y., Perdersen, J.O.: A Comparative Study on Feature Selection in Text Categorization. In: Proceedings of the Fourteenth International Conference on Machine Learning, ICML 1997 (1997)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ozono, T., Shintani, T., Ito, T., Hasegawa, T. (2004). A Feature Selection for Text Categorization on Research Support System Papits. In: Zhang, C., W. Guesgen, H., Yeap, WK. (eds) PRICAI 2004: Trends in Artificial Intelligence. PRICAI 2004. Lecture Notes in Computer Science(), vol 3157. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28633-2_56

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-28633-2_56

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22817-2

  • Online ISBN: 978-3-540-28633-2

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics