Kernel-Based Text Classification on Statistical Manifold

  • Conference paper
Advances in Neural Networks - ISNN 2008 (ISNN 2008)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5263)

In the text categorization literature, researchers have developed a variety of useful kernel methods. However, common kernel-based text categorization methods characteristically embed text data in Euclidean space. In this paper, we instead represent text vectors as points on a Riemannian manifold and use kernels to integrate discriminative and generative models. We then present a diffusion kernel on the Dirichlet Compound Multinomial manifold (DCM manifold), the space of Dirichlet Compound Multinomial models, combined with inverse document frequency and information gain. Our experimental results on various real-world text datasets show that the kernel on this DCM manifold is better suited to text categorization than Euclidean space, and that our kernel method achieves much better accuracy than some current state-of-the-art methods.
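The abstract does not spell out the DCM-manifold kernel itself. As a rough point of reference only, the Python sketch below implements the simpler, well-known diffusion kernel on the multinomial manifold (Lafferty and Lebanon), applied to IDF-weighted term frequencies. It is not the authors' DCM kernel: DCM parameter estimation and the information-gain weighting are omitted, and the function names (`idf_weights`, `embed`, `diffusion_kernel`), the diffusion time `t=0.5`, and the IDF-scaled embedding are illustrative assumptions.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Return idf(w) = log(N / df(w)) for every word in a tokenized corpus."""
    n_docs = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    return {w: math.log(n_docs / c) for w, c in df.items()}


def embed(doc, idf):
    """Map a tokenized document to a point on the multinomial simplex.

    Assumption: term counts are scaled by IDF, then normalized to sum to 1.
    """
    counts = Counter(doc)
    weights = {w: c * idf.get(w, 0.0) for w, c in counts.items()}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("document has no terms with nonzero IDF")
    return {w: v / total for w, v in weights.items() if v > 0}


def diffusion_kernel(theta, phi, t=0.5):
    """Leading-order heat kernel on the multinomial simplex.

    K_t(theta, phi) = exp(-arccos(sum_i sqrt(theta_i * phi_i))**2 / t);
    the (4*pi*t)**(-n/2) prefactor is dropped since it is constant for fixed t.
    """
    # Bhattacharyya coefficient; its arccos is half the Fisher geodesic distance.
    bc = sum(math.sqrt(p * phi[w]) for w, p in theta.items() if w in phi)
    bc = min(1.0, bc)  # guard against rounding slightly above 1
    return math.exp(-math.acos(bc) ** 2 / t)


if __name__ == "__main__":
    docs = [
        ["kernel", "svm", "text", "text"],
        ["kernel", "manifold", "text"],
        ["football", "match", "goal"],
    ]
    idf = idf_weights(docs)
    points = [embed(d, idf) for d in docs]
    # Print the 3 x 3 Gram matrix, e.g. for an SVM with a precomputed kernel.
    for i in range(len(docs)):
        print([round(diffusion_kernel(points[i], points[j]), 4) for j in range(len(docs))])
```

Diagonal entries are 1 up to rounding, and two documents with no shared terms get the minimum value exp(-π²/2) ≈ 0.0072 for `t=0.5`. A Gram matrix like this can be passed to an SVM that accepts precomputed kernels.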

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhou, S., Feng, S., Liu, Y. (2008). Kernel-Based Text Classification on Statistical Manifold. In: Sun, F., Zhang, J., Tan, Y., Cao, J., Yu, W. (eds) Advances in Neural Networks - ISNN 2008. ISNN 2008. Lecture Notes in Computer Science, vol 5263. Springer, Berlin, Heidelberg.

  • DOI:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-87731-8

  • Online ISBN: 978-3-540-87732-5

  • eBook Packages: Computer Science, Computer Science (R0)
