Compression Based Modeling for Classification of Text Documents

S. N. Bharath Bhushan &
Ajit Danti

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1037)

Included in the following conference series:

International Conference on Recent Trends in Image Processing and Pattern Recognition

Abstract

Classification of text data is one of the well-known and interesting research topics in computer science and knowledge engineering. This research article addresses the classification of text files using the LZW text compression algorithm. LZW is a lossless compression technique which requires two passes over the input data. These two passes are treated separately as the training stage and the testing stage for classification of text data. The proposed compression-based classification technique is tested on publicly available datasets. The experimental results show the effectiveness of the proposed algorithm.
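The abstract does not give the details of the method, so the following is only a minimal sketch of one way an LZW-style classifier could be arranged. It assumes that a phrase dictionary is grown from each class's training text in the first pass, and that a test document is then parsed against each frozen dictionary and assigned to the class that needs the fewest codes. The function names, the `max_size` limit, the scoring rule, and the toy training strings are all illustrative assumptions, not the authors' implementation.

```python
def build_lzw_dictionary(text, max_size=4096):
    """Training pass (assumed): grow an LZW phrase dictionary from class text."""
    # Seed with every distinct character of the training text.
    dictionary = {ch: i for i, ch in enumerate(sorted(set(text)))}
    phrase = ""
    for ch in text:
        candidate = phrase + ch
        if candidate in dictionary:
            phrase = candidate  # keep extending the current phrase
        else:
            if len(dictionary) < max_size:
                dictionary[candidate] = len(dictionary)  # learn the new phrase
            phrase = ch  # restart from the character that broke the match
    return dictionary


def count_codes(text, dictionary):
    """Testing pass (assumed): greedy longest-match parse, dictionary frozen."""
    codes = 0
    i = 0
    while i < len(text):
        j = i + 1
        # LZW dictionaries are prefix-closed, so extend one character at a time.
        while j < len(text) and text[i:j + 1] in dictionary:
            j += 1
        codes += 1  # one code per matched phrase or per unseen character
        i = j
    return codes


def classify(document, class_dictionaries):
    """Pick the class whose dictionary parses the document into fewest codes."""
    scores = {label: count_codes(document, d)
              for label, d in class_dictionaries.items()}
    return min(scores, key=scores.get)


# Hypothetical toy data, for illustration only.
training = {
    "sports": "the team won the match and the team scored a goal " * 20,
    "finance": "the bank raised the rate and the bank sold the bond " * 20,
}
models = {label: build_lzw_dictionary(text) for label, text in training.items()}
print(classify("the team scored a goal", models))
```

Freezing the dictionary during the testing pass keeps the score a function of the training text alone; a variant that keeps updating the dictionary while parsing the test document would also be a reasonable design.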

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Sahyadri College of Engineering & Management, Mangaluru, India
S. N. Bharath Bhushan
Faculty of Engineering-CSE, Christ (Deemed to be University), Kengeri Campus, Bangalore, 560074, India
Ajit Danti

Corresponding author

Correspondence to S. N. Bharath Bhushan.

Editor information

Editors and Affiliations

Department of Computer Science, University of South Dakota, Vermillion, SD, USA
K. C. Santosh
Solapur University, Solapur, India
Ravindra S. Hegadi

About this paper

Cite this paper

Bhushan, S.N.B., Danti, A. (2019). Compression Based Modeling for Classification of Text Documents. In: Santosh, K., Hegadi, R. (eds) Recent Trends in Image Processing and Pattern Recognition. RTIP2R 2018. Communications in Computer and Information Science, vol 1037. Springer, Singapore. https://doi.org/10.1007/978-981-13-9187-3_63

DOI: https://doi.org/10.1007/978-981-13-9187-3_63
Published: 17 July 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-9186-6
Online ISBN: 978-981-13-9187-3
eBook Packages: Computer Science, Computer Science (R0)
