Abstract
This paper describes a study of the unsupervised identification of Chinese VO idioms, based on Verb-Object (VO) pairs derived from the dependency structure of sentences. We test several statistical measures: Point-wise Mutual Information (PMI), P(o|v), P(v|o), Salience, and Selectional Association. The experiments show that PMI performs best at automatically identifying real VO idioms, which is consistent with previous studies on other languages. On the other hand, PMI tends to rank low-frequency items (very often noise) high. PMI obtained an F1 score of 36% in identifying real VO idioms among the top 100 ranked VO pairs. We thus suggest that syntactic features are not enough to identify VO idioms in an unsupervised framework, and that more sophisticated methods that take more semantic information into account are required.
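To make the count-based measures concrete, the following is a minimal sketch of how PMI, P(o|v), and P(v|o) could be computed from a list of extracted VO pairs, and how an F1 score over the top-k ranked pairs could be derived. It is not the paper's implementation: the function names, the maximum-likelihood estimates, the base-2 logarithm, and the toy pinyin counts are all assumptions made for illustration. Salience and Selectional Association are left out because the abstract does not give their formulas.

```python
import math
from collections import Counter

def score_vo_pairs(pairs):
    """Score each distinct (verb, object) pair with PMI, P(o|v) and P(v|o).

    `pairs` is a list of (verb, object) tokens, one per occurrence.
    Probabilities are plain relative frequencies (an assumption of this sketch).
    """
    pair_counts = Counter(pairs)
    verb_counts = Counter(v for v, _ in pairs)
    obj_counts = Counter(o for _, o in pairs)
    total = len(pairs)
    scores = {}
    for (v, o), n_vo in pair_counts.items():
        p_vo = n_vo / total
        p_v = verb_counts[v] / total
        p_o = obj_counts[o] / total
        scores[(v, o)] = {
            "pmi": math.log2(p_vo / (p_v * p_o)),
            "p_o_given_v": n_vo / verb_counts[v],
            "p_v_given_o": n_vo / obj_counts[o],
        }
    return scores

def top_k(scores, measure, k):
    """Return the k pairs with the highest value of the given measure."""
    return sorted(scores, key=lambda p: scores[p][measure], reverse=True)[:k]

def f1_at_k(ranked, gold):
    """F1 of a ranked candidate list against a gold set of idioms."""
    hits = sum(1 for p in ranked if p in gold)
    if hits == 0:
        return 0.0
    precision = hits / len(ranked)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)

# Invented toy counts in pinyin; "chi cu" (literally 'eat vinegar', i.e.
# 'be jealous') is the only pair treated as a real idiom here.
pairs = (
    [("chi", "fan")] * 6
    + [("chi", "cu")] * 2
    + [("he", "shui")] * 5
    + [("he", "cu")] * 1
    + [("mo", "fangzhu")] * 1  # a one-off pair standing in for parser noise
)
gold = {("chi", "cu")}

scores = score_vo_pairs(pairs)
ranked_by_pmi = top_k(scores, "pmi", len(scores))
print(ranked_by_pmi[0])             # the singleton pair ranks first under PMI
print(f1_at_k(ranked_by_pmi, gold))
```

In this toy data the pair that occurs only once receives the highest PMI, because both its verb and its object occur nowhere else. This mirrors the tendency noted above for PMI to rank low-frequency items high; a minimum-frequency cutoff is one common mitigation, though the abstract does not say whether one was applied.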
Notes
- 1.
- 2. This is an example of a parsing error. The correct segmentation is mofang zhu ‘mill owner’, but the parser wrongly analyzes the string as the verb mo ‘grind’ plus the object fangzhu ‘mill owner’.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wen, X., Li, Y., Zhao, Y., Xu, H. (2024). A Study of Identification of Chinese VO Idioms with Statistical Measures. In: Dong, M., Hong, JF., Lin, J., Jin, P. (eds) Chinese Lexical Semantics. CLSW 2023. Lecture Notes in Computer Science, vol 14515. Springer, Singapore. https://doi.org/10.1007/978-981-97-0586-3_4
DOI: https://doi.org/10.1007/978-981-97-0586-3_4
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-0585-6
Online ISBN: 978-981-97-0586-3
eBook Packages: Computer Science, Computer Science (R0)