Abstract
Computing the similarity between Chinese words is a fundamental task in natural language processing. This paper presents a method for calculating Chinese word similarity based on a combination strategy. We train a Word2Vector model on Baidubaike text and then integrate three methods to calculate the semantic similarity between Chinese words: a semantic dictionary-based method, a Word2Vector-based method, and a Chinese FrameNet (CFN)-based method. The semantic dictionary-based method uses dictionaries such as HowNet, DaCilin, Tongyici Cilin (Extended) and an antonym dictionary. The experiments are performed on 500 word pairs, and the Spearman correlation coefficient on the test data is 0.524, which indicates that the proposed method is feasible and effective.
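The abstract does not say how the three scores are combined, so the following is only a minimal sketch under stated assumptions: each method returns a score in [0, 1] or no score when a word is not covered, the scores are merged by a weighted average over the methods that do cover the pair, and the result is evaluated with the Spearman correlation against human ratings. The function names, the weights, and the `None` convention are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (e.g. word embeddings)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def combine(scores, weights):
    """Weighted average over the methods that returned a score.

    Assumption: a method reports None when it cannot score the pair
    (for example, a word missing from the dictionary).
    """
    used = [(scores[m], weights[m]) for m in scores if scores[m] is not None]
    total = sum(w for _, w in used)
    return sum(s * w for s, w in used) / total if total else 0.0


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0


# Hypothetical weights and toy scores, for illustration only.
weights = {"dictionary": 0.5, "word2vec": 0.3, "cfn": 0.2}
toy_pairs = [
    {"dictionary": 0.9, "word2vec": 0.8, "cfn": None},
    {"dictionary": None, "word2vec": 0.4, "cfn": 0.5},
    {"dictionary": 0.1, "word2vec": 0.2, "cfn": 0.0},
]
predicted = [combine(p, weights) for p in toy_pairs]
human = [9.0, 5.5, 1.0]  # made-up gold ratings
print(spearman(predicted, human))
```

Skipping uncovered methods and renormalising the weights is one simple way to handle words that a dictionary or CFN does not contain; other fallback orders are equally plausible.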
Acknowledgements
This work is supported by the National 863 Project of China (2015AA015407), National Natural Science Foundation of China (61373082, 61673248, 61502287), Shanxi Platform Project (2014091004-0103), Scholarship Council (2013-015), Open Project Foundation of Information Security Evaluation Center of Civil Aviation, Civil Aviation University of China (CAAC-ISECCA-201402) and Shanxi Higher School Science and Technology Innovation Project (2015104, 201505).
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Guo, S., Guan, Y., Li, R., Zhang, Q. (2016). Chinese Word Similarity Computing Based on Combination Strategy. In: Lin, CY., Xue, N., Zhao, D., Huang, X., Feng, Y. (eds) Natural Language Understanding and Intelligent Applications. ICCPOL NLPCC 2016. Lecture Notes in Computer Science, vol 10102. Springer, Cham. https://doi.org/10.1007/978-3-319-50496-4_67
DOI: https://doi.org/10.1007/978-3-319-50496-4_67
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50495-7
Online ISBN: 978-3-319-50496-4
eBook Packages: Computer Science, Computer Science (R0)