Robust Discriminant Analysis of Latent Semantic Feature for Text Categorization

Jiani Hu, Weihong Deng & Jun Guo

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4223)

Included in the following conference series:

International Conference on Fuzzy Systems and Knowledge Discovery


Abstract

This paper proposes a Discriminative Semantic Feature (DSF) method for text categorization based on the vector space model. The DSF method involves two stages: it first reduces the dimension of the document vector space by Latent Semantic Indexing (LSI), and then applies a Robust linear Discriminant analysis Model (RDM), which improves on classical LDA through an energy-adaptive regularization criterion, to extract discriminative semantic features with enhanced discrimination power. As a result, the DSF method can not only uncover the latent semantic structure but also capture discriminative features. Comparative experiments are also performed on various state-of-the-art dimension reduction schemes, including our DSF, LSI, orthogonal centroid, two-stage LSI+LDA, LDA/QR and LDA/GSVD. Experiments on the Reuters-21578 text collection show that the proposed method performs better than the other algorithms.
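
The two-stage structure described above (LSI followed by a regularized LDA) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not spell out the energy-adaptive regularization criterion used by RDM, so the sketch substitutes a generic ridge-style shrinkage of the within-class scatter matrix. The function names `lsi_project`, `regularized_lda` and `dsf_transform` and the `alpha` parameter are hypothetical, and the input is assumed to be a term-by-document matrix with documents as columns.

```python
import numpy as np


def lsi_project(X, k):
    """Stage 1 (LSI): truncated SVD of the term-by-document matrix X.

    Returns the top-k left singular vectors Uk (terms x k) and the
    documents projected into the latent semantic space (k x docs).
    """
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    Uk = U[:, :k]
    return Uk, Uk.T @ X


def regularized_lda(Z, labels, alpha=0.01, n_components=None):
    """Stage 2: LDA on LSI features with a regularized within-class scatter.

    ASSUMPTION: the shrinkage Sw + alpha * (trace(Sw) / d) * I is a generic
    stand-in, NOT the paper's energy-adaptive RDM criterion.
    """
    d = Z.shape[0]
    classes = np.unique(labels)
    mu = Z.mean(axis=1, keepdims=True)             # global mean
    Sw = np.zeros((d, d))                           # within-class scatter
    Sb = np.zeros((d, d))                           # between-class scatter
    for c in classes:
        Zc = Z[:, labels == c]
        mc = Zc.mean(axis=1, keepdims=True)
        Dc = Zc - mc
        Sw += Dc @ Dc.T
        Sb += Zc.shape[1] * (mc - mu) @ (mc - mu).T
    scale = np.trace(Sw) / d if np.trace(Sw) > 0 else 1.0
    Sw_reg = Sw + alpha * scale * np.eye(d)         # positive definite for alpha > 0

    # Solve Sb w = lambda * Sw_reg w by whitening with the Cholesky factor of Sw_reg.
    L = np.linalg.cholesky(Sw_reg)
    Linv = np.linalg.inv(L)
    evals, evecs = np.linalg.eigh(Linv @ Sb @ Linv.T)
    order = np.argsort(evals)[::-1]                 # largest Fisher ratios first
    r = n_components if n_components is not None else len(classes) - 1
    return Linv.T @ evecs[:, order[:r]]             # d x r projection matrix


def dsf_transform(X, Uk, W):
    """Map documents (columns of X) to the final discriminative features."""
    return W.T @ (Uk.T @ X)
```

For example, `Uk, Z = lsi_project(X_train, k)` followed by `W = regularized_lda(Z, y_train)` fits both stages, and `dsf_transform(X_test, Uk, W)` maps unseen documents into the same reduced space. Some regularization is needed because the within-class scatter in the LSI space can still be singular or ill-conditioned; the shrinkage term here is only one common choice.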


Similar content being viewed by others

A two-stage feature selection method for text categorization by using category correlation degree and latent semantic indexing (Article, 29 January 2015)

Comparison of Support Vector Machines With and Without Latent Semantic Analysis for Document Classification

A Similarity Function for Feature Pattern Clustering and High Dimensional Text Document Classification (Article, 09 March 2019)


Author information

Authors and Affiliations

Beijing University of Posts and Telecommunications, 100876, Beijing, China
Jiani Hu, Weihong Deng & Jun Guo


Editor information

Editors and Affiliations

School of Electrical and Electronic Engineering, Nanyang Technological University, Block S1, Nanyang Avenue, 639798, Singapore
Lipo Wang
Life Science Research Center, School of Electronic Engineering, Xidian University, 710071, Xi’an, Shaanxi, China
Licheng Jiao
School of Electrical and Electronic Engineering, Xidian University, 710071, Xi’an, China
Guanming Shi
School of Information Technology and Electrical Engineering, The University of Queensland, 4072, Brisbane, Queensland, Australia
Xue Li
College of Mathematics and Information Science, Hebei Normal University, 050016, Shijiazhuang, Hebei, P.R. China
Jing Liu


About this paper

Cite this paper

Hu, J., Deng, W., Guo, J. (2006). Robust Discriminant Analysis of Latent Semantic Feature for Text Categorization. In: Wang, L., Jiao, L., Shi, G., Li, X., Liu, J. (eds) Fuzzy Systems and Knowledge Discovery. FSKD 2006. Lecture Notes in Computer Science, vol 4223. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11881599_46


DOI: https://doi.org/10.1007/11881599_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45916-3
Online ISBN: 978-3-540-45917-0
eBook Packages: Computer Science, Computer Science (R0)
