Abstract
Based on a study of TF-IDF, information gain, and information entropy, this paper proposes an improved weight calculation method that combines normalized TF-IDF with information gain to extract keywords. Indexing words are then abstracted from the keywords by computing their semantic similarity, which completes the automatic indexing process. A comparative experiment shows that the comprehensive assessment value of the indexing words obtained with the modified weight calculation method is higher than that of the indexing words obtained with the traditional TF-IDF method.
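The abstract does not give the exact formula used to combine the two measures, so the following is only a minimal sketch of the general idea. It assumes the normalized TF-IDF weight of a term is multiplied by that term's information gain over labeled documents. The smoothed logarithm, the L2 normalization, the product rule, and all function names are illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter


def normalized_tfidf(docs):
    """Return one {term: weight} dict per document, L2-normalized.

    Assumption: idf = log(N / df + 0.01), a common smoothed variant.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        raw = {t: tf[t] * math.log(n / df[t] + 0.01) for t in tf}
        norm = math.sqrt(sum(w * w for w in raw.values())) or 1.0
        weights.append({t: w / norm for t, w in raw.items()})
    return weights


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def information_gain(term, docs, labels):
    """Entropy of the labels minus their entropy conditioned on term presence."""
    classes = sorted(set(labels))
    with_t = [l for d, l in zip(docs, labels) if term in d]
    without_t = [l for d, l in zip(docs, labels) if term not in d]
    h = entropy([labels.count(c) for c in classes])
    cond = 0.0
    for part in (with_t, without_t):
        if part:  # skip an empty split to avoid dividing by zero
            share = len(part) / len(labels)
            cond += share * entropy([part.count(c) for c in classes])
    return h - cond


def combined_weights(docs, labels):
    """Hypothetical combination: normalized TF-IDF times information gain."""
    tfidf = normalized_tfidf(docs)
    vocab = {t for d in docs for t in d}
    ig = {t: information_gain(t, docs, labels) for t in vocab}
    return [{t: w * ig[t] for t, w in doc_w.items()} for doc_w in tfidf]


# Toy corpus: tokenized documents with hypothetical class labels.
docs = [
    ["neural", "network", "training"],
    ["neural", "network", "indexing"],
    ["tax", "law", "indexing"],
    ["tax", "law", "court"],
]
labels = ["cs", "cs", "law", "law"]
weights = combined_weights(docs, labels)
```

In this toy corpus, "indexing" appears once in each class, so its information gain is zero and the combined weight suppresses it, while "neural" separates the classes perfectly and keeps its full TF-IDF weight. Plain TF-IDF alone would not make that distinction, which is the kind of effect a combined weight is meant to capture.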
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, L., Shi, Sc., Lv, Xq., Li, Yq. (2010). Research and Application to Automatic Indexing. In: Zhang, L., Lu, BL., Kwok, J. (eds) Advances in Neural Networks - ISNN 2010. ISNN 2010. Lecture Notes in Computer Science, vol 6064. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13318-3_41
DOI: https://doi.org/10.1007/978-3-642-13318-3_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13317-6
Online ISBN: 978-3-642-13318-3
eBook Packages: Computer Science (R0)