Overview of the NLPCC-ICCPOL 2016 Shared Task: Chinese Word Similarity Measurement

Yunfang Wu & Wei Li

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10102)

Abstract

Word similarity computation is a fundamental task in natural language processing. We organized a semantic evaluation campaign on Chinese word similarity measurement at NLPCC-ICCPOL 2016. The task provides a benchmark dataset for Chinese word similarity (the PKU-500 dataset), comprising 500 word pairs with their similarity scores. In total, 21 teams submitted 24 systems. In this paper, we describe the data preparation and the word similarity annotation, analyze the evaluation results in depth, and briefly introduce the participating systems.

Acknowledgement

This work is supported by the National High Technology Research and Development Program of China (2015AA015403), the National Natural Science Foundation of China (61371129, 61572245), and the Key Program of the Social Science Foundation of China (12&ZD227).

Author information

Authors and Affiliations

Key Laboratory of Computational Linguistics, Peking University, Beijing, 100871, China
Yunfang Wu & Wei Li

Corresponding author

Correspondence to Yunfang Wu.

Editor information

Editors and Affiliations

Microsoft Research Asia, Beijing, China
Chin-Yew Lin
Brandeis University, Waltham, Massachusetts, USA
Nianwen Xue
Peking University, Beijing, China
Dongyan Zhao
Fudan University, Shanghai, China
Xuanjing Huang
Peking University, Beijing, China
Yansong Feng

Appendix A: 91 Word Pairs with Standard Deviation Greater Than 2

[没戏 没辙] [只管 尽管] [GDP 生产力] [包袱 段子] [日期 时间] [由此 通过]

[爱面子 好高骛远] [一方面 一边] [托福 GRE] [严厉 严谨] [抄袭 克隆]

[悲喜 大悲大喜] [亏 幸亏] [老气 土气] [蹩脚 差强人意] [容易 顺利]

[狭隘 狭窄] [害臊 腼腆] [理解 理会] [的哥 司机] [娇艳 幽美] [幻境 红楼梦]

[自然 环境] [权限 权利] [几乎 差点儿] [酣睡 打鼾] [振兴 建设] [节日 假日]

[依稀 清晰] [伟大 壮烈] [典型 代表] [出神 发楞] [冷僻 晦涩] [面首]

[发票 账单] [物品 物质] [回收站 垃圾篓] [必须 必需] [路子 后门]

[牛脾气 我行我素] [免费 便宜] [江湖 红尘] [塞车 拥挤] [要面子 虚荣心]

[琢磨 镂刻] [大小 多少] [候选人 备胎] [旅客 驴友] [多角度 多元化]

[信物 物件] [豆蔻年华 黄金时代] [血液 红细胞] [酷 爽] [质量 重量]

[牺牲 粉身碎骨] [隆重 重要] [天赋 技能] [身姿 身手] [事变 后院起火]

[鸣谢 酬答] [硅谷 中关村] [平凡 平庸] [了不得 好] [许可证 执照]

[线路 行程] [与 以及] [和谐 平安] [怯懦 胆小鬼] [是非 方圆] [大 高]

[手续 过程] [高峰 山巅] [崛起 凸起] [辛勤 夜以继日] [环境 生态]

[渣 废品] [杂事 闲事] [商标 符号] [右翼 左派] [实践 进行] [借口 理由]

[收费 缴纳] [享受 大快朵颐] [吸引力 地磁力] [工作日 开放日]

[合理 合理性] [违纪 贪污] [言语 语言] [买卖 营销] [光盘 硬盘]
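
The appendix title implies a per-pair standard deviation over the annotators' similarity scores. The following is a minimal sketch of such a filter, not the authors' procedure: the function name, the data layout, the placeholder word pairs, the example scores, and the choice of population (rather than sample) standard deviation are all assumptions made for illustration.

```python
from statistics import pstdev


def high_disagreement_pairs(ratings, threshold=2.0):
    """Return the word pairs whose score standard deviation exceeds threshold.

    ratings maps a (word1, word2) tuple to the list of scores given by the
    individual annotators. Population standard deviation is an assumption;
    the page does not say which variant was used.
    """
    flagged = []
    for pair, scores in ratings.items():
        if pstdev(scores) > threshold:
            flagged.append(pair)
    return flagged


# Hypothetical placeholder pairs and scores, not taken from PKU-500.
example = {
    ("word_a", "word_b"): [1, 9, 2, 8, 5],  # annotators disagree strongly
    ("word_c", "word_d"): [7, 8, 7, 8, 7],  # annotators largely agree
}

print(high_disagreement_pairs(example))
```

With these made-up scores only the first pair is flagged, since its scores spread widely around the mean while the second pair's scores cluster tightly.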

About this paper

Cite this paper

Wu, Y., Li, W. (2016). Overview of the NLPCC-ICCPOL 2016 Shared Task: Chinese Word Similarity Measurement. In: Lin, CY., Xue, N., Zhao, D., Huang, X., Feng, Y. (eds) Natural Language Understanding and Intelligent Applications. ICCPOL 2016, NLPCC 2016. Lecture Notes in Computer Science, vol 10102. Springer, Cham. https://doi.org/10.1007/978-3-319-50496-4_75

DOI: https://doi.org/10.1007/978-3-319-50496-4_75
Published: 02 December 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50495-7
Online ISBN: 978-3-319-50496-4
eBook Packages: Computer Science, Computer Science (R0)
