Abstract
The naive Bayes classifier is widely used in text classification and often performs surprisingly well, so it is commonly regarded as a baseline. However, previous research shows that a skewed distribution of the training collection can cause poor classification results. This paper presents a new method to deal with this situation. We introduce a conditional probability that takes into account information from both the whole corpus and each category. The proposed method performs well on standard benchmark collections and is competitive with state-of-the-art text classifiers, especially on skewed data.
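The abstract does not give the form of the conditional probability, so the following is only an illustrative sketch of the general idea, not the authors' formula. It assumes a multinomial naive Bayes model in which the per-category word estimate is linearly mixed with a corpus-wide estimate. The mixing weight `lam`, the add-one smoothing of the corpus term, and all function names are hypothetical choices made for this example.

```python
import math
from collections import Counter


def train(docs, labels):
    """Collect word counts per category and for the whole corpus."""
    cat_counts = {}            # category -> Counter of word counts
    corpus_counts = Counter()  # word counts over all documents
    cat_docs = Counter(labels) # number of training documents per category
    for words, label in zip(docs, labels):
        cat_counts.setdefault(label, Counter()).update(words)
        corpus_counts.update(words)
    return cat_counts, corpus_counts, cat_docs


def word_prob(word, cat, cat_counts, corpus_counts, lam=0.7):
    """Assumed mixture: lam * P(word | category) + (1 - lam) * P(word | corpus)."""
    cat_total = sum(cat_counts[cat].values())
    p_cat = cat_counts[cat][word] / cat_total if cat_total else 0.0
    # Add-one smoothing on the corpus term, with one extra slot for unseen
    # words, keeps the mixture strictly positive so the logarithm is defined.
    vocab = len(corpus_counts) + 1
    p_corpus = (corpus_counts[word] + 1) / (sum(corpus_counts.values()) + vocab)
    return lam * p_cat + (1 - lam) * p_corpus


def classify(words, model, lam=0.7):
    """Return the category with the highest log posterior score."""
    cat_counts, corpus_counts, cat_docs = model
    n_docs = sum(cat_docs.values())

    def score(cat):
        log_prior = math.log(cat_docs[cat] / n_docs)
        log_likelihood = sum(
            math.log(word_prob(w, cat, cat_counts, corpus_counts, lam))
            for w in words
        )
        return log_prior + log_likelihood

    return max(cat_counts, key=score)


# Toy skewed training set: three "sport" documents, one "politics" document.
docs = [
    ["ball", "goal", "team"],
    ["team", "win", "goal"],
    ["match", "ball", "win"],
    ["vote", "election", "party"],
]
labels = ["sport", "sport", "sport", "politics"]
model = train(docs, labels)
print(classify(["election", "vote"], model))
print(classify(["goal", "team"], model))
```

In this sketch the corpus-wide term acts as a shared back-off, so a category with little training data never assigns zero probability to a word it has not seen. Whether this matches the paper's actual estimator cannot be determined from the abstract alone.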
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jiang, Y., Lin, H., Wang, X., Lu, D. (2011). A Technique for Improving the Performance of Naive Bayes Text Classification. In: Gong, Z., Luo, X., Chen, J., Lei, J., Wang, F.L. (eds) Web Information Systems and Mining. WISM 2011. Lecture Notes in Computer Science, vol 6988. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23982-3_25
DOI: https://doi.org/10.1007/978-3-642-23982-3_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23981-6
Online ISBN: 978-3-642-23982-3
eBook Packages: Computer Science, Computer Science (R0)