Abstract
This paper proposes a method that groups similar words for document classification. Word grouping is shown to achieve classification accuracy equivalent to that of Latent Semantic Analysis (LSA). A combined method for document classification is also proposed, consisting of grouping and LSA followed by k-Nearest Neighbor (kNN) classification. The combined method achieves higher classification accuracy than the conventional methods, namely kNN alone and LSA followed by kNN. Grouping is therefore effective as a preprocessing step before the conventional methods.
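As a rough illustration of grouping used as preprocessing before kNN, the sketch below merges words that share a group label and then classifies by cosine similarity. It is not the authors' implementation: the abstract does not say how similar words are identified, so the `WORD_GROUPS` table, the toy `TRAINING` set, and the function names are all hypothetical, and the LSA step of the combined method is omitted for brevity.

```python
import math
from collections import Counter

# Hypothetical word-to-group table. How similar words are actually found
# is not stated in the abstract, so this mapping is an assumption.
WORD_GROUPS = {
    "car": "vehicle", "truck": "vehicle", "auto": "vehicle",
    "wheat": "grain", "corn": "grain", "rice": "grain",
}


def group_words(tokens, groups=WORD_GROUPS):
    """Replace each token by its group label (if any) and count the results."""
    return Counter(groups.get(token, token) for token in tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query_tokens, training, k=3):
    """Majority vote among the k training documents most similar to the query."""
    query = group_words(query_tokens)
    scored = sorted(
        ((cosine(query, group_words(tokens)), label) for tokens, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Toy training set, invented for illustration only.
TRAINING = [
    ("car sales rise".split(), "autos"),
    ("truck exports fall".split(), "autos"),
    ("wheat harvest grows".split(), "crops"),
    ("corn prices drop".split(), "crops"),
]

print(knn_classify("rice harvest".split(), TRAINING, k=3))
```

In this toy setup the query word "rice" never appears in the training documents, yet grouping maps it to the same label as "wheat" and "corn", so the query still overlaps with both crop documents.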
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ishii, N., Murai, T., Yamada, T., Bao, Y. (2006). Classification by Weighting, Similarity and kNN. In: Corchado, E., Yin, H., Botti, V., Fyfe, C. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2006. IDEAL 2006. Lecture Notes in Computer Science, vol 4224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11875581_7
DOI: https://doi.org/10.1007/11875581_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45485-4
Online ISBN: 978-3-540-45487-8
eBook Packages: Computer Science, Computer Science (R0)