Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4285)

Abstract

Email categorization has become very popular in personal information management. However, most n-way classification methods suffer from the feature unevenness problem: features learned from training samples are distributed unevenly across folders. We argue that binarization approaches can handle this problem effectively. In this paper, three binarization techniques are implemented, namely one-against-rest, one-against-one and some-against-rest, using two assembling techniques, namely round robin and elimination. Experiments on email categorization show that these binarization approaches achieve significant improvement over an n-way baseline classifier.
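To make the terminology concrete, the following is a minimal sketch of one-against-one binarization combined by round robin voting and by elimination. It is not the authors' implementation: the nearest-centroid base learner, the toy folders, and all function names (`centroid`, `duel`, `round_robin`, `elimination`) are assumptions made for illustration, and the elimination order shown is only one plausible reading of the technique.

```python
from collections import Counter
from itertools import combinations

def centroid(docs):
    """Average bag-of-words vector over a list of tokenized emails."""
    total = Counter()
    for doc in docs:
        total.update(doc)
    return {word: count / len(docs) for word, count in total.items()}

def score(doc, cen):
    """Dot product between an email's word counts and a centroid."""
    return sum(cen.get(word, 0.0) * n for word, n in Counter(doc).items())

def train_one_against_one(data):
    """Build one binary model per folder pair (hypothetical toy learner)."""
    cents = {folder: centroid(docs) for folder, docs in data.items()}
    return {(a, b): (cents[a], cents[b])
            for a, b in combinations(sorted(data), 2)}

def duel(models, a, b, doc):
    """Ask the binary model for the pair (a, b) which folder it prefers."""
    a, b = sorted((a, b))
    cen_a, cen_b = models[(a, b)]
    return a if score(doc, cen_a) >= score(doc, cen_b) else b

def round_robin(models, doc):
    """Every pairwise model casts one vote; the most-voted folder wins."""
    votes = Counter(duel(models, a, b, doc) for a, b in models)
    return max(sorted(votes), key=votes.get)  # ties broken alphabetically

def elimination(models, folders, doc):
    """Pit two candidates against each other and drop the loser each round."""
    remaining = sorted(folders)
    while len(remaining) > 1:
        winner = duel(models, remaining[0], remaining[1], doc)
        remaining = [winner] + remaining[2:]
    return remaining[0]

# Toy training data (invented for this example).
data = {
    "work": [["meeting", "budget", "report"], ["budget", "deadline"]],
    "family": [["dinner", "mom", "birthday"], ["birthday", "party"]],
    "spam": [["win", "prize", "free"], ["free", "offer"]],
}
models = train_one_against_one(data)
email = ["free", "prize", "offer"]
print(round_robin(models, email))        # spam
print(elimination(models, data, email))  # spam
```

With n folders, round robin consults all n(n-1)/2 pairwise models, whereas this form of elimination consults only n-1 of them, at the cost of letting an early wrong decision remove the correct folder.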

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Xia, Y., Wong, KF. (2006). Binarization Approaches to Email Categorization. In: Matsumoto, Y., Sproat, R.W., Wong, KF., Zhang, M. (eds) Computer Processing of Oriental Languages. Beyond the Orient: The Research Challenges Ahead. ICCPOL 2006. Lecture Notes in Computer Science, vol 4285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11940098_50

  • DOI: https://doi.org/10.1007/11940098_50

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49667-0

  • Online ISBN: 978-3-540-49668-7

  • eBook Packages: Computer Science, Computer Science (R0)
