SIGNUM: A Graph Algorithm for Terminology Extraction

Axel-Cyrille Ngonga Ngomo¹

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4919)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

Terminology extraction is an essential step in several fields of natural language processing, such as dictionary and ontology extraction. In this paper, we present a novel graph-based approach to terminology extraction. We use SIGNUM, a general-purpose graph-based algorithm for binary clustering on directed weighted graphs, which are generated using a metric for multi-word extraction. Our approach is entirely knowledge-free and can thus be used on corpora written in any language. Furthermore, it is unsupervised, making it suitable for use by non-experts. We evaluate our approach on the TREC-9 corpus for filtering against the MESH and UMLS vocabularies.

Author information

Authors and Affiliations

University of Leipzig, Johannisgasse 26, Leipzig, D-04103, Germany
Axel-Cyrille Ngonga Ngomo

Editor information

Alexander Gelbukh

Cite this paper

Ngonga Ngomo, A.-C. (2008). SIGNUM: A Graph Algorithm for Terminology Extraction. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2008. Lecture Notes in Computer Science, vol 4919. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78135-6_8

DOI: https://doi.org/10.1007/978-3-540-78135-6_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78134-9
Online ISBN: 978-3-540-78135-6
eBook Packages: Computer Science (R0)
