Abstract
Text classification assigns documents to a predefined set of classes. It is widely used in Web content management, search engines, email filtering, and similar applications. The task is difficult because documents are represented by high-dimensional feature vectors that contain many noisy and irrelevant features. Various feature reduction methods have been proposed to eliminate irrelevant features and to reduce the dimension of the feature vector. The relevant, reduced feature vector is then used by a machine learning model to obtain better classification results. This paper surveys text classification approaches based on machine learning techniques, together with feature selection techniques for reducing the high-dimensional feature vector.
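The pipeline outlined in the abstract (reduce the feature vector, then train a classifier on the reduced features) can be illustrated with a minimal sketch. The abstract does not name a specific method, so the choices here are assumptions made for illustration only: chi-square scoring as the feature selection metric, multinomial Naive Bayes with add-one smoothing as the classifier, whitespace tokenization, and a four-document toy corpus. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; real pipelines usually
    # add stop-word removal and stemming.
    return text.lower().split()


def chi_square_scores(docs, labels):
    """Score each term by its largest chi-square statistic over all classes."""
    n = len(docs)
    doc_terms = [set(tokenize(d)) for d in docs]
    vocab = set().union(*doc_terms)
    scores = {}
    for term in vocab:
        best = 0.0
        for cls in set(labels):
            # 2x2 table over documents: term present/absent vs. class/other.
            n11 = sum(term in t and y == cls for t, y in zip(doc_terms, labels))
            n10 = sum(term in t and y != cls for t, y in zip(doc_terms, labels))
            n01 = sum(term not in t and y == cls for t, y in zip(doc_terms, labels))
            n00 = n - n11 - n10 - n01
            denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
            if denom:
                best = max(best, n * (n11 * n00 - n10 * n01) ** 2 / denom)
        scores[term] = best
    return scores


def select_features(docs, labels, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    scores = chi_square_scores(docs, labels)
    return set(sorted(scores, key=lambda t: (-scores[t], t))[:k])


def train_nb(docs, labels, features):
    """Collect class priors and per-class term counts over selected features."""
    class_docs = Counter(labels)
    term_counts = defaultdict(Counter)
    for doc, y in zip(docs, labels):
        term_counts[y].update(t for t in tokenize(doc) if t in features)
    return class_docs, term_counts


def predict_nb(model, features, text):
    """Return the class with the highest smoothed log-posterior."""
    class_docs, term_counts = model
    n = sum(class_docs.values())
    tokens = [t for t in tokenize(text) if t in features]
    best_label, best_lp = None, -math.inf
    for y in sorted(class_docs):
        total = sum(term_counts[y].values())
        lp = math.log(class_docs[y] / n)
        for t in tokens:
            # Add-one (Laplace) smoothing over the reduced vocabulary.
            lp += math.log((term_counts[y][t] + 1) / (total + len(features)))
        if lp > best_lp:
            best_label, best_lp = y, lp
    return best_label


# Toy corpus (invented for illustration, not taken from the paper).
docs = [
    "cheap pills buy now",
    "buy cheap watches now",
    "meeting agenda for the project",
    "project status meeting at noon",
]
labels = ["spam", "spam", "ham", "ham"]

features = select_features(docs, labels, k=4)
model = train_nb(docs, labels, features)
print(sorted(features))
print(predict_nb(model, features, "buy cheap now"))
print(predict_nb(model, features, "project meeting tomorrow"))
```

Selecting features before training shrinks the vocabulary the classifier has to model. Terms that occur in only one document, such as "pills" or "agenda", score lower than terms that separate the two classes cleanly and are dropped.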
© 2014 Springer India
About this paper
Cite this paper
Agarwal, B., Mittal, N. (2014). Text Classification Using Machine Learning Methods-A Survey. In: Babu, B., et al. Proceedings of the Second International Conference on Soft Computing for Problem Solving (SocProS 2012), December 28-30, 2012. Advances in Intelligent Systems and Computing, vol 236. Springer, New Delhi. https://doi.org/10.1007/978-81-322-1602-5_75
DOI: https://doi.org/10.1007/978-81-322-1602-5_75
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-1601-8
Online ISBN: 978-81-322-1602-5
eBook Packages: Engineering (R0)