Abstract
This paper proposes a Discriminative Semantic Feature (DSF) method for text categorization based on the vector space model. The DSF method has two stages. It first reduces the dimension of the document vector space by Latent Semantic Indexing (LSI). It then applies a Robust linear Discriminant analysis Model (RDM), which improves on classical LDA with an energy-adaptive regularization criterion, to extract discriminative semantic features with enhanced discrimination power. As a result, the DSF method not only uncovers latent semantic structure but also captures discriminative features. Comparative experiments are performed on several state-of-the-art dimension reduction schemes, namely our DSF, LSI, orthogonal centroid, two-stage LSI+LDA, LDA/QR and LDA/GSVD. Experiments on the Reuters-21578 text collection show that the proposed method outperforms the other algorithms.
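The two-stage idea (LSI, then a regularized LDA on the LSI coordinates) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the regularizer simply floors the eigenvalues of the within-class scatter at their mean, which is a stand-in and not the paper's energy-adaptive RDM criterion. The function names, the toy term-document matrix, the choice k = 2 and the nearest-centroid classifier are all hypothetical.

```python
import numpy as np


def lsi_project(A, k):
    """Stage 1 (LSI): rank-k truncated SVD of the term-document matrix A (terms x docs).

    Returns the basis U_k (terms x k) and the k x n document coordinates U_k^T A.
    """
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]
    return U_k, U_k.T @ A


def scatter_matrices(X, labels):
    """Within-class (S_w) and between-class (S_b) scatter of the columns of X."""
    labels = np.asarray(labels)
    mean_all = X.mean(axis=1, keepdims=True)
    d = X.shape[0]
    S_w, S_b = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        mean_c = Xc.mean(axis=1, keepdims=True)
        S_w += (Xc - mean_c) @ (Xc - mean_c).T
        S_b += Xc.shape[1] * (mean_c - mean_all) @ (mean_c - mean_all).T
    return S_w, S_b


def regularized_lda(X, labels):
    """Stage 2: LDA with a regularized within-class scatter.

    ASSUMPTION: eigenvalues of S_w are floored at their mean. This is an
    illustrative stand-in, not the paper's energy-adaptive RDM criterion.
    """
    S_w, S_b = scatter_matrices(X, labels)
    evals, evecs = np.linalg.eigh(S_w)
    floor = max(evals.mean(), 1e-12)          # guard against an all-zero S_w
    evals = np.maximum(evals, floor)          # lift small or zero eigenvalues
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T   # S_w^{-1/2}
    m_evals, m_evecs = np.linalg.eigh(inv_sqrt @ S_b @ inv_sqrt)
    n_dirs = len(np.unique(labels)) - 1       # at most C - 1 useful directions
    top = np.argsort(m_evals)[::-1][:n_dirs]
    return inv_sqrt @ m_evecs[:, top]         # columns: discriminant directions W


def dsf_transform(A, U_k, W):
    """Map documents (columns of A) to the discriminative semantic space."""
    return W.T @ (U_k.T @ A)


def nearest_centroid_predict(F_train, labels, F_test):
    """Assign each test column to the class with the nearest training centroid."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centroids = np.stack([F_train[:, labels == c].mean(axis=1) for c in classes], axis=1)
    dists = ((F_test[:, :, None] - centroids[:, None, :]) ** 2).sum(axis=0)
    return classes[np.argmin(dists, axis=1)]


# Toy example (hypothetical data): 6 terms x 6 documents, two topics.
A = np.array([[3, 2, 3, 0, 0, 1],
              [2, 3, 2, 0, 1, 0],
              [1, 2, 2, 0, 0, 0],
              [0, 0, 1, 3, 2, 3],
              [0, 1, 0, 2, 3, 2],
              [0, 0, 0, 2, 2, 3]], dtype=float)
labels = [0, 0, 0, 1, 1, 1]

U_k, X = lsi_project(A, k=2)
W = regularized_lda(X, labels)
F_train = dsf_transform(A, U_k, W)
train_pred = nearest_centroid_predict(F_train, labels, F_train)

# Two unseen documents (columns), one per topic.
A_new = np.array([[3, 0], [2, 0], [2, 0], [0, 2], [0, 3], [0, 3]], dtype=float)
new_pred = nearest_centroid_predict(F_train, labels, dsf_transform(A_new, U_k, W))
print(train_pred, new_pred)
```

With two classes, LDA yields at most C - 1 = 1 discriminant direction, so W has a single column here. Flooring the small eigenvalues keeps S_w invertible even when the LSI dimension is large relative to the number of training documents, which is the small-sample problem a regularized LDA is meant to address. On a real collection such as Reuters-21578, k and the regularizer would need tuning; the values above are arbitrary.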
References
Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. Addison-Wesley, Reading (1999)
Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K.: Indexing by latent semantic analysis. Journal of the American Society for Information Science 41, 391–407 (1990)
Duda, R.O., Hart, P.E., Stork, D.: Pattern Classification. Wiley, Chichester (2000)
Fukunaga, K.: Introduction to Statistical Pattern Recognition, 2nd edn. Academic Press, London (1990)
Howland, P., Park, H.: Generalizing Discriminant Analysis Using the Generalized Singular Value Decomposition. IEEE Trans. Pattern Anal. Machine Intell. 26, 995–1006 (2004)
Joachims, T.: Learning to Classify Text Using Support Vector Machines. Kluwer, Dordrecht (2002)
Landauer, T.K., Foltz, P.W., Laham, D.: An introduction to latent semantic analysis. Discourse Processes 25, 259–284 (1998)
Lewis, D.D.: Reuters-21578 Text Categorization Test Collection. http://www.daviddlewis.com/resources/testcollections/reuters21578/
Porter, M.F.: An Algorithm for Suffix Stripping. Program 14, 130–137 (1980)
Salton, G., Wong, A., Yang, C.S.: A vector space model for automatic indexing. Communications of the ACM 18, 613–620 (1975)
Salton, G., Buckley, C.: Term-Weighting Approaches in Automatic Text Retrieval. Information Processing and Management 24, 513–523 (1988)
Torkkola, K.: Linear discriminant analysis in document classification. In: IEEE International Conference on Data Mining (ICDM) Workshop on Text Mining (2001)
Thomaz, C.E., Gillies, D.F., Feitosa, R.Q.: A New Covariance Estimate for Bayesian Classifier in Biometric Recognition. IEEE Trans. Circuits Syst. Video Technol. 14, 214–223 (2004)
Ye, J., Li, Q.: A Two-Stage Linear Discriminant Analysis via QR-Decomposition. IEEE Trans. Pattern Anal. Machine Intell. 27, 929–941 (2005)
Zhao, Y., Karypis, G.: Empirical and Theoretical Comparisons of Selected Criterion Functions for Document Clustering. Machine Learning 55, 311–331 (2004)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hu, J., Deng, W., Guo, J. (2006). Robust Discriminant Analysis of Latent Semantic Feature for Text Categorization. In: Wang, L., Jiao, L., Shi, G., Li, X., Liu, J. (eds) Fuzzy Systems and Knowledge Discovery. FSKD 2006. Lecture Notes in Computer Science, vol 4223. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11881599_46
DOI: https://doi.org/10.1007/11881599_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45916-3
Online ISBN: 978-3-540-45917-0
eBook Packages: Computer Science, Computer Science (R0)