Abstract
Many text categorization tasks involve imbalanced training examples. We tackle this problem with an improved local Latent Semantic Analysis (LSA). LSA has been shown to be extremely useful, but it is not an optimal representation for text categorization: as an unsupervised method, it concentrates on representation and ignores class discrimination. Several local LSA methods have been proposed to improve classification by exploiting class discrimination information. In this paper, we use a support vector machine (SVM) to generate the imbalanced datasets that serve as the local regions for local LSA. Experimental results show that our method classifies better than global LSA and traditional local LSA methods while using a much smaller LSA dimension.
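The general idea of SVM-guided local LSA can be illustrated with a minimal sketch. The abstract does not say how the SVM output defines a local region, so the rule used here (keep the documents with the highest SVM decision scores) is an assumption, as are the use of scikit-learn, the function name `local_lsa`, its parameters, and the toy documents. It is not the authors' implementation.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC


def local_lsa(X, y, region_size, n_components, seed=0):
    """Fit LSA on the documents a linear SVM scores highest for one category.

    X: (n_docs, n_terms) TF-IDF matrix; y: 0/1 labels for the category.
    Returns the fitted SVD model and the row indices of the local region.
    """
    # Step 1: a linear SVM gives every document a signed relevance score.
    svm = LinearSVC(C=1.0, random_state=seed).fit(X, y)
    scores = svm.decision_function(X)

    # Step 2 (assumed rule): the top-scoring documents form the local region.
    region = np.argsort(-scores)[:region_size]

    # Step 3: truncated SVD on the region only, so the latent space is
    # shaped by documents near the category rather than the whole corpus.
    svd = TruncatedSVD(n_components=n_components, random_state=seed)
    svd.fit(X[region])
    return svd, region


# Hypothetical toy corpus: 2 positive documents out of 8 (imbalanced).
docs = [
    "stock market shares fall",
    "market shares rise on earnings",
    "football team wins the match",
    "the team lost the final match",
    "new phone released this week",
    "rain expected over the weekend",
    "recipe for tomato soup",
    "museum opens a new exhibit",
]
labels = np.array([1, 1, 0, 0, 0, 0, 0, 0])

X = TfidfVectorizer().fit_transform(docs)
svd, region = local_lsa(X, labels, region_size=4, n_components=2)
Z = svd.transform(X)  # every document projected into the local LSA space
```

A classifier for the category would then be trained on `Z` instead of `X`. Global LSA corresponds to fitting the same `TruncatedSVD` on all rows of `X`.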
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wan, Y., Tong, H., Deng, Y. (2011). Local Latent Semantic Analysis Based on Support Vector Machine for Imbalanced Text Categorization. In: Zhang, J. (eds) Applied Informatics and Communication. ICAIC 2011. Communications in Computer and Information Science, vol 226. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23235-0_42
DOI: https://doi.org/10.1007/978-3-642-23235-0_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23234-3
Online ISBN: 978-3-642-23235-0
eBook Packages: Computer Science, Computer Science (R0)