research-article

An empirical study on various text classifiers

Authors:

Ramya M. Hegde,

M. Meghana

CCSEIT '12: Proceedings of the Second International Conference on Computational Science, Engineering and Information Technology

Pages 587 - 593

https://doi.org/10.1145/2393216.2393314

Published: 26 October 2012

Abstract

Text classification has become more important than ever because of the huge volume of data generated by modern technology. Numerous well-established classification techniques are available, but it is difficult to declare any single algorithm universally efficient across the wide variety of datasets created in real time. In this paper, existing methods are compared and contrasted on the basis of experimental results. The experiment involves testing a document against a previously created training set. The results report quantitative values for the compared parameters and are therefore helpful when choosing a classification algorithm.
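The abstract does not name the classifiers that were compared. As a hedged illustration of the kind of experiment it describes (build a training set, then test new documents against it and compare methods quantitatively), the Python sketch below trains two textbook classifiers on a tiny corpus and reports their accuracy on held-out documents. The two classifiers are a multinomial Naive Bayes with add-one smoothing and a simplified Rocchio (nearest-centroid) classifier over TF-IDF vectors that omits Rocchio's negative-example term. The corpus, the labels "sport" and "finance", and the helpers `tokenize` and `accuracy` are hypothetical assumptions. They are not the authors' data, method or implementation.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical minimal tokenizer: lowercase and split on whitespace.
    # A real pipeline would also strip punctuation, drop stop words and stem.
    return text.lower().split()


class MultinomialNB:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        docs_per_class = Counter(labels)
        self.log_prior = {c: math.log(docs_per_class[c] / len(docs)) for c in self.classes}
        self.total = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            score = self.log_prior[c]
            for t in tokenize(doc):
                if t in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[c][t] + 1) / (self.total[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


class NearestCentroidTFIDF:
    """Simplified Rocchio: one TF-IDF centroid per class, ranked by cosine similarity."""

    def fit(self, docs, labels):
        tokenized = [tokenize(d) for d in docs]
        n = len(tokenized)
        df = Counter(t for toks in tokenized for t in set(toks))
        # Smoothed IDF keeps a non-zero weight for terms that occur in many documents.
        self.idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
        sums = defaultdict(Counter)
        for toks, label in zip(tokenized, labels):
            for t, w in self._vectorize(toks).items():
                sums[label][t] += w
        sizes = Counter(labels)
        self.centroids = {c: {t: w / sizes[c] for t, w in vec.items()} for c, vec in sums.items()}
        return self

    def _vectorize(self, tokens):
        # L2-normalised TF-IDF vector; out-of-vocabulary terms are dropped.
        tf = Counter(t for t in tokens if t in self.idf)
        vec = {t: count * self.idf[t] for t, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm > 0 else {}

    def predict(self, doc):
        q = self._vectorize(tokenize(doc))

        def cosine(centroid):
            dot = sum(w * centroid.get(t, 0.0) for t, w in q.items())
            norm = math.sqrt(sum(w * w for w in centroid.values()))
            return dot / norm if norm > 0 else 0.0

        return max(self.centroids, key=lambda c: cosine(self.centroids[c]))


def accuracy(clf, docs, labels):
    # Fraction of test documents whose predicted label matches the true label.
    return sum(clf.predict(d) == y for d, y in zip(docs, labels)) / len(docs)


# Hypothetical toy training set and held-out test documents.
train_docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "fans cheered the winning team",
    "the central bank raised interest rates",
    "stock markets fell as rates rose",
    "investors sold shares in the bank",
]
train_labels = ["sport"] * 3 + ["finance"] * 3
test_docs = ["the team scored a goal", "the bank cut interest rates"]
test_labels = ["sport", "finance"]

nb = MultinomialNB().fit(train_docs, train_labels)
rocchio = NearestCentroidTFIDF().fit(train_docs, train_labels)
for name, clf in [("Naive Bayes", nb), ("Rocchio (TF-IDF)", rocchio)]:
    print(name, accuracy(clf, test_docs, test_labels))
```

On a realistic benchmark the same harness would be run with a proper train/test split and further measures such as precision, recall and training time. Such quantitative comparisons are what the abstract argues should guide the choice of algorithm.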


Index Terms

1. Computing methodologies
  1. Machine learning


Published In


October 2012

800 pages

ISBN: 9781450313100

DOI: 10.1145/2393216

General Chairs:
Natarajan Meghanathan, Jackson State University
Michal Wozniak, Wroclaw University of Technology, Poland

Copyright © 2012 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

Avinashilingam University

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Research-article

Conference

CCSEIT '12


CCSEIT '12: The Second International Conference on Computational Science, Engineering and Information Technology

October 26-28, 2012

Coimbatore, India

Bibliometrics

Total Citations: 0
Total Downloads: 109
Downloads (last 12 months): 2
Downloads (last 6 weeks): 0

Reflects downloads up to 21 Nov 2024
