A re-examination of text categorization methods

SIGIR '99: Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval

Pages 42 - 49

https://doi.org/10.1145/312624.312647

Published: 01 August 1999

Index Terms

A re-examination of text categorization methods
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
    2. Machine learning approaches
      1. Classification and regression trees
2. Mathematics of computing
  1. Probability and statistics

Information & Contributors

Information

Published In

SIGIR '99: Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval

August 1999

339 pages

ISBN:1581130961

DOI:10.1145/312624

Chairmen: Fredric Gey (Univ. of California), Marti Hearst (Univ. of California, Berkeley), Richard Tong (Tarragon Consulting Corp.)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

SIGIR99

Sponsor:

SIGIR

SIGIR99: 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

August 15 - 19, 1999

Berkeley, California, USA

Acceptance Rates

SIGIR '99 Paper Acceptance Rate 33 of 135 submissions, 24%;

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics & Citations

Article Metrics

Total Citations: 1,673
Total Downloads: 9,468
Downloads (Last 12 months): 625
Downloads (Last 6 weeks): 79

Reflects downloads up to 19 Nov 2024
