Abstract
The rapid growth of digital documents on the Web has renewed interest in text classification, whose aim is to assign text documents to a set of pre-defined categories. However, poor feature selection, extremely high-dimensional feature spaces, and the complexity of natural language remain roadblocks for this classification process. To address these issues, we propose a k-means clustering based feature selection method for text classification. Bi-Normal Separation (BNS), combined with WordNet and cosine similarity, helps to form a high-quality, reduced feature vector used to train Extreme Learning Machine (ELM) and Multi-layer Extreme Learning Machine (ML-ELM) classifiers. For the experiments, the 20-Newsgroups and DMOZ datasets have been used. The empirical results on these two benchmark datasets demonstrate the applicability, efficiency, and effectiveness of our approach with ELM and ML-ELM as classifiers compared to state-of-the-art classifiers.
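The BNS metric mentioned in the abstract scores a term by the gap between its true-positive and false-positive rates after mapping both through the inverse standard normal CDF. The sketch below is an illustrative implementation, not the authors' code; the function name, the clamping constant `eps`, and the example counts are assumptions.

```python
from statistics import NormalDist

def bns_score(tp, fp, pos, neg, eps=0.0005):
    """Bi-Normal Separation score for one term.

    tp  -- documents of the positive class containing the term
    fp  -- documents of the negative class containing the term
    pos -- total positive documents, neg -- total negative documents
    BNS(t) = |F^-1(tpr) - F^-1(fpr)|, F^-1 = inverse standard normal CDF.
    Rates are clamped away from 0 and 1 so the inverse CDF stays finite.
    """
    inv = NormalDist().inv_cdf
    tpr = min(max(tp / pos, eps), 1 - eps)
    fpr = min(max(fp / neg, eps), 1 - eps)
    return abs(inv(tpr) - inv(fpr))

# A term concentrated in the positive class scores higher than one
# spread evenly across both classes.
print(bns_score(80, 5, 100, 900) > bns_score(10, 90, 100, 900))
```

Terms would then be ranked by this score and the top-scoring ones retained for the reduced feature vector.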
Notes
- 3. Iteratively run the script over a range of values of m, and finally select the value of m for which the result is best.
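The note above describes a simple grid search over m. A minimal sketch of such a loop, assuming m is a tunable parameter (e.g. a cluster count) and `evaluate` is a hypothetical stand-in for running the script and measuring the result:

```python
def tune_m(candidates, evaluate):
    """Return the value of m from candidates with the best (highest) score.

    candidates -- iterable of m values to try
    evaluate   -- callable m -> score (a stand-in for one run of the script)
    """
    best_m, best_score = None, float("-inf")
    for m in candidates:
        score = evaluate(m)
        if score > best_score:
            best_m, best_score = m, score
    return best_m

# Example with a toy scoring function peaking at m = 20:
print(tune_m(range(5, 50, 5), lambda m: -abs(m - 20)))  # -> 20
```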
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Roul, R.K., Sahay, S.K. (2016). K-means and Wordnet Based Feature Selection Combined with Extreme Learning Machines for Text Classification. In: Bjørner, N., Prasad, S., Parida, L. (eds) Distributed Computing and Internet Technology. ICDCIT 2016. Lecture Notes in Computer Science(), vol 9581. Springer, Cham. https://doi.org/10.1007/978-3-319-28034-9_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28033-2
Online ISBN: 978-3-319-28034-9
eBook Packages: Computer Science, Computer Science (R0)