Abstract
Terminology extraction is an important task for automatically updating domain-specific knowledge. Contextual information helps to decide whether newly extracted terms are genuine terminology. Because extraction based on fixed patterns is of very limited use for natural language text, both syntactic and semantic information from the context of a term is needed to determine its termhood. In this paper, we investigate two window-based context word extraction methods that take syntactic and semantic information into account. Based on the performance of each method individually, we propose a hybrid method that combines both syntactic and semantic information. Experiments show that the hybrid method achieves a significant improvement.
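The abstract does not spell out how the context windows are built, so the following is only a minimal sketch of the general idea of window-based context word extraction, not the authors' method. It assumes the text has already been segmented into tokens, that a candidate term is a single token, and that the window is a fixed number of tokens on each side; the function name `context_words` and the `window` parameter are hypothetical.

```python
from collections import Counter


def context_words(tokens, term, window=2):
    """Count the words found within `window` tokens of each occurrence of `term`.

    Assumption: `tokens` is an already-segmented sentence or document and
    `term` matches exactly one token (multi-token terms are not handled).
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != term:
            continue
        # Words to the left and right of this occurrence, clipped at the edges.
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        counts.update(left + right)
    return counts


# Illustrative English tokens; Chinese text would need word segmentation first.
tokens = ["the", "new", "parser", "handles", "the", "parser", "errors"]
result = context_words(tokens, "parser", window=1)
```

Context counts of this kind could then be filtered or weighted with syntactic or semantic information, which is the part the paper itself addresses.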
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ji, L., Sum, M., Lu, Q., Li, W., Chen, Y. (2007). Chinese Terminology Extraction Using Window-Based Contextual Information. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2007. Lecture Notes in Computer Science, vol 4394. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70939-8_6
DOI: https://doi.org/10.1007/978-3-540-70939-8_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70938-1
Online ISBN: 978-3-540-70939-8
eBook Packages: Computer Science (R0)