DOI: 10.1145/2393216.2393314
research-article

An empirical study on various text classifiers

Published: 26 October 2012

Abstract

Text classification matters more than ever today because of the huge amount of data generated with the advent of technology. Numerous well-established techniques are available for classification. It is difficult to declare any one algorithm universally efficient across the wide variety of datasets created in real time. In this paper, the existing methods are compared and contrasted on the basis of experimental results. The experiment involves testing a document against a previously created training set. The results give quantitative values for the comparable parameters and are hence helpful in choosing a classification algorithm.
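
The abstract does not name the classifiers or the parameters that were compared, so the following is only a minimal sketch of the general experimental shape it describes: build a training set, test unseen documents against it, and report a quantitative score per classifier. The two classifiers used here (1-nearest-neighbour and nearest centroid over bag-of-words counts with cosine similarity), the function names, and the toy documents are all assumptions made for illustration, not the paper's method or data.

```python
import math
from collections import Counter, defaultdict

def vectorize(text):
    """Bag-of-words term counts for a lowercased, whitespace-split text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def nearest_neighbour(train, doc):
    """Label of the single most similar training document (1-NN)."""
    vec = vectorize(doc)
    return max(train, key=lambda pair: cosine(vectorize(pair[0]), vec))[1]

def nearest_centroid(train, doc):
    """Label whose summed term-count vector is most similar to the document."""
    centroids = defaultdict(Counter)
    for text, label in train:
        centroids[label].update(vectorize(text))
    vec = vectorize(doc)
    return max(centroids, key=lambda label: cosine(centroids[label], vec))

def accuracy(classifier, train, test):
    """Fraction of test documents assigned their true label."""
    hits = sum(classifier(train, text) == label for text, label in test)
    return hits / len(test)

# Toy data, invented for this sketch only.
train = [
    ("the striker scored a late goal", "sport"),
    ("the team won the league match", "sport"),
    ("the bank raised interest rates", "finance"),
    ("shares fell as the market closed", "finance"),
]
test = [
    ("a goal decided the match", "sport"),
    ("interest rates moved the market", "finance"),
]

# One quantitative score per classifier, computed on the same held-out documents.
results = {name: accuracy(fn, train, test)
           for name, fn in [("1-NN", nearest_neighbour),
                            ("centroid", nearest_centroid)]}
print(results)
```

Scoring every classifier on the same held-out documents keeps the comparison like-for-like; a real study would use a much larger corpus and could report further measures such as precision, recall, or running time.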

    Published In

    CCSEIT '12: Proceedings of the Second International Conference on Computational Science, Engineering and Information Technology
    October 2012
    800 pages
    ISBN:9781450313100
    DOI:10.1145/2393216
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    • Avinashilingam University

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. classifiers
    2. dimensionality reduction
    3. documents
    4. text classification

    Qualifiers

    • Research-article
