Abstract
Classification of text data is a well-known and active research topic in computer science and knowledge engineering. This article addresses the classification of text files using the LZW text compression algorithm. LZW is a lossless compression technique that, in this work, makes two passes over the input data. These two passes are treated separately as the training stage and the testing stage for classifying text data. The proposed compression-based classification technique is tested on publicly available datasets. The results of the experiments show the effectiveness of the proposed algorithm.
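The abstract does not spell out how the two passes produce a class label, so the following is only a minimal sketch under stated assumptions: each class gets its own LZW phrase dictionary built from its training text (first pass), and a test document is parsed against each frozen dictionary (second pass) and assigned to the class that encodes it with the fewest codes. The function names `build_dictionary`, `code_count`, and `classify`, and the fewest-codes decision rule, are hypothetical illustrations and are not taken from the paper.

```python
from typing import Dict


def build_dictionary(text: str) -> Dict[str, int]:
    """Training pass (assumed): grow an LZW phrase dictionary from one class's text."""
    table: Dict[str, int] = {}
    # Seed the table with every distinct character, as LZW does with its alphabet.
    for ch in text:
        if ch not in table:
            table[ch] = len(table)
    phrase = ""
    for ch in text:
        candidate = phrase + ch
        if candidate in table:
            phrase = candidate              # keep extending the current match
        else:
            table[candidate] = len(table)   # learn the new phrase
            phrase = ch                     # restart matching from this character
    return table


def code_count(text: str, table: Dict[str, int]) -> int:
    """Testing pass (assumed): parse text greedily with a frozen dictionary.

    Returns the number of codes emitted; a character missing from the
    dictionary is charged as one literal code.
    """
    count = 0
    phrase = ""
    for ch in text:
        candidate = phrase + ch
        if candidate in table:
            phrase = candidate
            continue
        if phrase:
            count += 1                      # emit the longest match found so far
        if ch in table:
            phrase = ch
        else:
            count += 1                      # unseen character: one literal code
            phrase = ""
    if phrase:
        count += 1                          # flush the final match
    return count


def classify(document: str, class_tables: Dict[str, Dict[str, int]]) -> str:
    """Pick the class whose dictionary compresses the document best (assumed rule)."""
    return min(class_tables, key=lambda label: code_count(document, class_tables[label]))


if __name__ == "__main__":
    # Toy training texts, invented purely for illustration.
    training = {
        "a": "abababababababab",
        "b": "xyzxyzxyzxyzxyz",
    }
    tables = {label: build_dictionary(text) for label, text in training.items()}
    print(classify("ababab", tables))
```

Freezing the dictionary during the testing pass keeps the score a pure function of the training data, which is one simple way to separate the two stages; an adaptive dictionary that keeps learning during testing would be a different design.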
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Bhushan, S.N.B., Danti, A. (2019). Compression Based Modeling for Classification of Text Documents. In: Santosh, K., Hegadi, R. (eds) Recent Trends in Image Processing and Pattern Recognition. RTIP2R 2018. Communications in Computer and Information Science, vol 1037. Springer, Singapore. https://doi.org/10.1007/978-981-13-9187-3_63
DOI: https://doi.org/10.1007/978-981-13-9187-3_63
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-9186-6
Online ISBN: 978-981-13-9187-3
eBook Packages: Computer Science (R0)