Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 236)

Abstract

Text classification organizes documents into a predefined set of classes. It is useful in Web content management, search engines, email filtering, and similar applications. The task is difficult because the feature vector is high-dimensional and contains noisy and irrelevant features. Various feature reduction methods have been proposed to eliminate irrelevant features and to reduce the dimension of the feature vector. A machine learning model then uses the relevant, reduced feature vector to obtain better classification results. This paper presents various text classification approaches using machine learning techniques, along with feature selection techniques for reducing the high-dimensional feature vector.
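As a rough illustration of the pipeline the abstract describes (feature selection followed by a machine learning classifier), the sketch below scores terms with the chi-square statistic, keeps the top k, and trains a multinomial Naive Bayes model on the reduced vocabulary. It is not taken from the paper: the toy documents, the function names (`chi_square_scores`, `select_features`, `train_nb`, `predict_nb`), and the choice of chi-square and Naive Bayes are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical minimal tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def chi_square_scores(docs, labels):
    """Score each term by its maximum chi-square statistic over all classes."""
    n = len(docs)
    doc_terms = [set(tokenize(d)) for d in docs]
    vocab = set().union(*doc_terms)
    scores = {}
    for term in vocab:
        best = 0.0
        for c in set(labels):
            pairs = list(zip(doc_terms, labels))
            a = sum(1 for t, y in pairs if term in t and y == c)      # term, class c
            b = sum(1 for t, y in pairs if term in t and y != c)      # term, other class
            e = sum(1 for t, y in pairs if term not in t and y == c)  # no term, class c
            d = n - a - b - e                                         # no term, other class
            denom = (a + e) * (b + d) * (a + b) * (e + d)
            chi = n * (a * d - e * b) ** 2 / denom if denom else 0.0
            best = max(best, chi)
        scores[term] = best
    return scores


def select_features(docs, labels, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    scores = chi_square_scores(docs, labels)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:k])


def train_nb(docs, labels, features):
    """Count class frequencies and per-class term frequencies over selected features."""
    class_docs = Counter(labels)
    term_counts = defaultdict(Counter)
    for d, y in zip(docs, labels):
        term_counts[y].update(t for t in tokenize(d) if t in features)
    return class_docs, term_counts


def predict_nb(model, features, text):
    """Return the class with the highest Laplace-smoothed log posterior."""
    class_docs, term_counts = model
    n = sum(class_docs.values())
    tokens = [t for t in tokenize(text) if t in features]
    best_class, best_lp = None, -math.inf
    for c in sorted(class_docs):
        total = sum(term_counts[c].values())
        lp = math.log(class_docs[c] / n)
        for t in tokens:
            lp += math.log((term_counts[c][t] + 1) / (total + len(features)))
        if lp > best_lp:
            best_class, best_lp = c, lp
    return best_class


# Toy corpus, invented for this example only.
docs = [
    "cheap pills buy now",
    "buy cheap watches now",
    "meeting agenda for monday",
    "project meeting notes attached",
]
labels = ["spam", "spam", "ham", "ham"]

features = select_features(docs, labels, k=4)
model = train_nb(docs, labels, features)
print(sorted(features))
print(predict_nb(model, features, "buy cheap pills"))
print(predict_nb(model, features, "monday meeting"))
```

On this toy corpus the selection step discards terms that appear in only one document, so the classifier works with four features instead of the full vocabulary. Other selection metrics or classifiers could be substituted without changing the overall structure.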

Corresponding author

Correspondence to Basant Agarwal.

Copyright information

© 2014 Springer India

About this paper

Cite this paper

Agarwal, B., Mittal, N. (2014). Text Classification Using Machine Learning Methods-A Survey. In: Babu, B., et al. Proceedings of the Second International Conference on Soft Computing for Problem Solving (SocProS 2012), December 28-30, 2012. Advances in Intelligent Systems and Computing, vol 236. Springer, New Delhi. https://doi.org/10.1007/978-81-322-1602-5_75

  • DOI: https://doi.org/10.1007/978-81-322-1602-5_75

  • Publisher Name: Springer, New Delhi

  • Print ISBN: 978-81-322-1601-8

  • Online ISBN: 978-81-322-1602-5

  • eBook Packages: Engineering, Engineering (R0)
