DOI: 10.3115/1118935.1118940

Poisson naive Bayes for text classification with feature weighting

Published: 07 July 2003

Abstract

In this paper, we investigate the use of a multivariate Poisson model and feature weighting to learn a naive Bayes text classifier. Our new naive Bayes text classification model assumes that a document is generated by a multivariate Poisson model, whereas previous work treats a document as a vector of binary term features based on the presence or absence of each term. We also explore feature weighting for naive Bayes text classification in place of feature selection, which is quite costly when small numbers of new training documents are continuously provided. Experimental results on two test collections indicate that our new model, with the proposed parameter estimation and feature weighting technique, leads to substantial improvements over the unigram language model classifiers that are known to outperform the original pure naive Bayes text classifiers.
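The abstract does not give the paper's parameter estimation or weighting formulas, so the following is only a minimal sketch of the general idea: each term count in a document is modeled as an independent Poisson variable with a class-specific rate, and an optional per-term weight scales that term's contribution to the class score. The function names (`train_poisson_nb`, `classify`), the additive `smoothing` constant, and the way weights enter the score are illustrative assumptions, not the authors' method. Document length normalization is ignored for brevity.

```python
import math
from collections import Counter


def train_poisson_nb(docs, labels, smoothing=0.5):
    """Estimate class priors and per-class Poisson rates from token lists.

    The smoothing scheme is an assumption made for this sketch; it only
    guarantees that every rate is strictly positive.
    """
    vocab = sorted({t for d in docs for t in d})
    priors, rates = {}, {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        priors[c] = len(class_docs) / len(docs)
        totals = Counter(t for d in class_docs for t in d)
        # Smoothed mean count of each term per document of class c.
        rates[c] = {
            t: (totals[t] + smoothing) / (len(class_docs) + smoothing)
            for t in vocab
        }
    return vocab, priors, rates


def classify(doc, vocab, priors, rates, weights=None):
    """Return (best_class, scores) using weighted Poisson log-likelihoods.

    weights maps a term to a non-negative weight; terms without an entry
    get weight 1.0. This weighting form is hypothetical.
    """
    counts = Counter(doc)
    scores = {}
    for c in priors:
        score = math.log(priors[c])
        for t in vocab:
            k = counts[t]
            lam = rates[c][t]
            w = 1.0 if weights is None else weights.get(t, 1.0)
            # Poisson log-pmf: k*log(lam) - lam - log(k!)
            score += w * (k * math.log(lam) - lam - math.lgamma(k + 1))
        scores[c] = score
    return max(scores, key=scores.get), scores


docs = [
    ["goal", "goal", "match"],
    ["match", "team", "goal"],
    ["stock", "market", "stock"],
    ["market", "price", "stock"],
]
labels = ["sport", "sport", "finance", "finance"]
vocab, priors, rates = train_poisson_nb(docs, labels)
predicted, scores = classify(["goal", "goal", "team"], vocab, priors, rates)
print(predicted)
```

Unlike a binary presence/absence model, the score here depends on how many times each term occurs, and absent terms also contribute through the `-lam` part of the Poisson log-probability. Adjusting `weights` changes how much each term influences the decision without removing any term from the vocabulary, which is the general contrast with feature selection that the abstract draws.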


Cited By

  • (2014) The Significance Of Low Frequent Terms in Text Classification. International Journal of Intelligent Systems 29:5 (389-406). DOI: 10.1002/int.21643. Online publication date: 1-May-2014
  • (2013) What's the deal? Proceedings of the First Australasian Web Conference - Volume 144 (69-73). DOI: 10.5555/2527208.2527217. Online publication date: 29-Jan-2013

Published In

AsianIR '03: Proceedings of the sixth international workshop on Information retrieval with Asian languages - Volume 11
July 2003
175 pages
  • Program Chair: Jun Adachi

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Article


