An empirically validated, onomasiologically structured, and linguistically motivated online terminology

Karolina Suchowolec¹, Christian Lang² & Roman Schneider²

Abstract

Terminological resources play a central role in the organization and retrieval of scientific texts. Both simple keyword lists and advanced models of the relationships between terminological concepts can make a very valuable contribution to the analysis, classification, and retrieval of relevant digital documents, either on the web or within local repositories. This seems especially true for long-established scientific fields with elusive theoretical and historical branches, where terminology is often used far from consistently across documents of different origins. In this paper, we report on the progress of a linguistically motivated project on the onomasiological re-modeling of the terminological resources for the grammatical information system grammis. We present the design principles and the results of their application. In particular, we focus on new features for the authoring backend and discuss how these innovations help to evaluate existing, loosely structured terminological content and to deal efficiently with automatic term extraction. Furthermore, we introduce a transformation to a future SKOS representation. We conclude by positioning our resources within the Knowledge Organization discourse and discuss how a highly complex information environment like grammis benefits from the re-designed terminological KOS.

Notes

See [36]; the information system can be found online at http://www.ids-mannheim.de/grammis/.
http://www.ids-mannheim.de/progr@mm/.
https://www.w3.org/TR/2009/REC-skos-reference-20090818/.
https://www.w3.org/community/ontolex/wiki/Final_Model_Specification.
We have already discussed some of the results in greater detail in other publications; we refer to those in the corresponding sections.
This distinction is inspired by lexicography, where the macrostructure is defined as the (usually alphabetical) order of the entries and the microstructure as the internal structure of a single lexicon entry (cf. [18, p. 372]). Note that our macrostructure is ordered by concept relations rather than alphabetically.
Note, however, that visualization techniques are not limited to the traditional approach, as illustrated by EcoLexicon [28], a resource on environmental terminology based on the paradigm of frame-based terminology (cf. [14]).
Note that both modes, hierarchical and non-hierarchical, are available for all types of relations; i.e., the relations commonly known as hierarchical (BT/NT) can also be displayed as a non-hierarchical network in our tool. Including non-hierarchical relations (RT) in a hierarchical tree, however, makes the resulting graph overly complex and hard to interpret.
This feature was recently added as a result of a usability study by Fu et al. [17], who found that, depending on the area of application, either indented trees or node-link diagrams (graphs) can be the more beneficial visualization technique; hence, Fu et al. conclude that multiple visualization techniques should be combined when designing a visualization tool. They also point out the importance of customization, which enables users to adapt the visualization to their needs.
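To make the two display modes concrete, here is a minimal Python sketch, assuming the BT/NT relations are available as (narrower, broader) pairs. The helper names `indented_tree` and `node_link` and the placeholder concept labels are hypothetical and not taken from the grammis backend; the same relation set is rendered once as an indented tree and once as the labelled edge list that a node-link (network) view would draw.

```python
from collections import defaultdict

# Hypothetical BT/NT relations as (narrower, broader) pairs; labels are placeholders.
LINKS = [("B", "A"), ("C", "A"), ("D", "B")]


def indented_tree(links):
    """Indented-tree view: each concept is listed under its broader concept.

    Assumes the BT/NT hierarchy is acyclic; a concept with several broader
    concepts (polyhierarchy) simply appears once under each of them.
    """
    children = defaultdict(list)
    for narrower, broader in links:
        children[broader].append(narrower)
    narrower_set = {n for n, _ in links}
    # Roots are concepts that occur only as broader, never as narrower.
    roots = sorted({b for _, b in links} - narrower_set)
    lines = []

    def walk(concept, depth):
        lines.append("  " * depth + concept)
        for child in sorted(children[concept]):
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines


def node_link(links, relation="BT"):
    """Node-link view: a flat list of labelled edges, usable for any relation type."""
    return [(narrower, relation, broader) for narrower, broader in links]


if __name__ == "__main__":
    print("\n".join(indented_tree(LINKS)))
    print(node_link(LINKS))
```

The edge-list form accepts any relation label (e.g. "RT"), which mirrors the point that associative links can be shown as a network even though they do not fit a strict tree.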
If the current concept is labeled Supplement (‘supplement’), this method will propose Adverbialsupplement (‘adverbial supplement’) as a candidate because the former is a substring of the latter.
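A minimal Python sketch of this substring-based candidate proposal, assuming candidates are drawn from a flat list of existing concept labels; the function name `substring_candidates` is hypothetical. Matching is case-insensitive here because Supplement occurs in Adverbialsupplement only in lower case; this is an assumption, not a documented detail of the tool.

```python
def substring_candidates(current_label, labels):
    """Return labels that contain current_label as a substring.

    Case-insensitive (assumption): German compounds such as
    'Adverbialsupplement' embed 'supplement' in lower case.
    The current label itself is excluded from the result.
    """
    needle = current_label.casefold()
    return sorted(
        label
        for label in labels
        if label != current_label and needle in label.casefold()
    )


if __name__ == "__main__":
    labels = ["Supplement", "Adverbialsupplement", "Komplement"]
    print(substring_candidates("Supplement", labels))  # ['Adverbialsupplement']
```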
https://www.coreon.com/.
http://www.dog-gmbh.de/de/produkte/lookup/.
https://www.kaleidoscope.at/de/terminologie/quickterm.
http://www.interverbumtech.de/.
As the reference corpus, we used a sample of the DeReKo corpus (cf. [26]), which covers various text types and genres.
Ahmad et al. [1, p. 720]:
$$\frac{w_\mathrm{s}/t_\mathrm{s}}{w_\mathrm{g}/t_\mathrm{g}}$$
Note that Ahmad et al. use slightly different labels for the corpora: $w_\mathrm{s}$ refers to the frequency of a word in the specialist language corpus (the domain-specific target corpus), $w_\mathrm{g}$ to the frequency of a word in the general language corpus (the non-specialized language reference corpus), $t_\mathrm{s}$ to the total count of words in the specialist language corpus, and $t_\mathrm{g}$ refers to the total count of words in the general language corpus.
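The ratio can be computed directly from token counts. Below is a minimal Python sketch, assuming both corpora are already tokenized; the function name `weirdness` follows the title of [1], and returning infinity for words absent from the reference corpus is an assumption of this sketch (no smoothing), not necessarily what Ahmad et al. or the grammis pipeline do.

```python
import math
from collections import Counter


def weirdness(word, special, general):
    """Ratio of relative frequencies (w_s / t_s) / (w_g / t_g).

    special: Counter of tokens in the specialist (domain-specific) corpus
    general: Counter of tokens in the general-language reference corpus
    Returns math.inf when w_g == 0 (assumption; no smoothing applied).
    """
    t_s = sum(special.values())  # total tokens, specialist corpus
    t_g = sum(general.values())  # total tokens, reference corpus
    if t_s == 0 or t_g == 0:
        raise ValueError("both corpora must be non-empty")
    w_s = special[word]
    w_g = general[word]
    if w_g == 0:
        return math.inf
    return (w_s / t_s) / (w_g / t_g)


if __name__ == "__main__":
    special = Counter("das supplement und das komplement".split())
    general = Counter("das haus und das auto und das supplement".split())
    print(weirdness("supplement", special, general))
```

With these toy counts, "supplement" gets (1/5)/(1/8) = 1.6; values well above 1 mark words that are over-represented in the specialist corpus and are therefore term candidates.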
For a detailed description of our approach to these theory tags, see [41, pp. 207–211].
http://d2rq.org/.
https://www.w3.org/TR/r2rml/.
https://github.com/d2rq/d2rq/issues/45.
For instance, clashes between associative and hierarchical links; see example 27 in the SKOS Reference, https://www.w3.org/TR/2009/REC-skos-reference-20090818.
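A check for this kind of clash can be sketched in Python. The SKOS Reference declares skos:related disjoint with skos:broaderTransitive, so two concepts should not be linked both associatively and, directly or indirectly, hierarchically. The sketch assumes the relations have already been loaded into plain dictionaries and pairs rather than an RDF store; the function names are hypothetical.

```python
def ancestors(concept, broader):
    """All concepts reachable from concept via skos:broader (transitive closure).

    broader maps each concept to the set of its direct broader concepts.
    The visited set makes the walk terminate even on (invalid) cycles.
    """
    seen = set()
    stack = list(broader.get(concept, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(broader.get(parent, ()))
    return seen


def related_broader_clashes(broader, related):
    """Concept pairs linked by skos:related that are also hierarchically linked.

    related is an iterable of (a, b) pairs. Because skos:related is symmetric,
    the hierarchy is checked in both directions.
    """
    clashes = set()
    for a, b in related:
        if b in ancestors(a, broader) or a in ancestors(b, broader):
            clashes.add(tuple(sorted((a, b))))
    return sorted(clashes)


if __name__ == "__main__":
    broader = {"C": {"B"}, "B": {"A"}}  # hypothetical: C is narrower than B, B narrower than A
    related = [("C", "A"), ("B", "D")]
    print(related_broader_clashes(broader, related))  # [('A', 'C')]
```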
Prospectively, the end users of grammis might also benefit from visualization techniques similar to those described in Sect. 3.2.1.
https://grammis.ids-mannheim.de/fragen/3185.
https://grammis.ids-mannheim.de/fragen/4550.

References

[1] Ahmad, K., Gillam, L., Tostevin, L.: University of Surrey participation in TREC8: weirdness indexing for logical document extrapolation and retrieval (WILDER). In: Voorhees, E., Harman, D. (eds.) NIST Special Publication 500-246: The Eighth Text Retrieval Conference (TREC-8), Gaithersburg, MD, pp. 717–724 (1999)
[2] Almende B.V., Thieurmel, B.: visNetwork: network visualization using ‘vis.js’ library. R package version 1.0.3. https://CRAN.R-project.org/package=visNetwork (2016). Accessed 1 June 2017
[3] Augustinus, L., Vandeghinste, V., Van Eynde, F.: Example-based treebank querying. In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2012/pdf/756_Paper.pdf (2012). Accessed 8 Nov 2017
[4] Augustinus, L., Vandeghinste, V., Vanallemeersch, T.: Poly-GrETEL: cross-lingual example-based querying of syntactic constructions. In: Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2016/pdf/486_Paper.pdf (2016). Accessed 8 Nov 2017
[5] Baker, T., Bechhofer, S., Isaac, A., Miles, A., Schreiber, G., Summers, E.: Key choices in the design of Simple Knowledge Organization System (SKOS). Web Semant. Sci. Serv. Agents World Wide Web 20, 35–49 (2013)
[6] Brin, S., Page, L.: The anatomy of a large-scale hypertextual search engine. Comput. Netw. ISDN Syst. 30(1–7), 107–117 (1998)
[7] Bubenhofer, N., Schneider, R.: Using a domain ontology for the semantic-statistical classification of specialist hypertexts. In: Papers from the Annual International Conference on Computational Linguistics ‘Dialogue’, Moscow, 26–30 May 2010, pp. 622–628 (2010)
[8] Chang, W., Cheng, J., Allaire, J., Xie, Y., McPherson, J.: Shiny: Web Application Framework for R. R package version 1.0.0. https://CRAN.R-project.org/package=shiny (2017). Accessed 1 June 2017
[9] Deutscher Terminologie-Tag e.V.: Terminologiearbeit – Best Practices, 2nd edn (2014)
[10] DIN 2331: Begriffssysteme und ihre Darstellung (1980)
[11] DIN 2342:2011-08: Begriffe der Terminologielehre (2011)
[12] Drewer, P., Massion, F., Pulitano, D.: Was haben Wissensmodellierung, Wissensstruktur, künstliche Intelligenz und Terminologie miteinander zu tun? Technical Report, Deutscher Terminologie-Tag e.V. (2017)
[13] Dunning, T.: Accurate methods for the statistics of surprise and coincidence. J. Comput. Linguist. Spec. Issue Using Large Corpora 19(1), 61–74 (1993)
[14] Faber, P.: Frames as a framework for terminology. In: Kockaert, H.J., Steurs, F. (eds.) Handbook of Terminology, vol. 1. John Benjamins Publishing Company, Amsterdam (2015)
[15] Frantzi, K., Ananiadou, S., Mima, H.: Automatic recognition of multi-word terms: the C-value/NC-value method. Int. J. Digit. Libr. 3(2), 115–130 (2000)
[16] Früh, B., Deubzer, F.: Von der Terminologieverwaltung zur Wissensorganisation. Edition 16(1), 27–32 (2016)
[17] Fu, B., Noy, N.F., Storey, M.A.: Indented tree or graph? A usability study of ontology visualization techniques in the context of class mapping evaluation. In: Alani, H., Kagal, L., Fokoue, A., Groth, P., Biemann, C., Parreira, J.X., Aroyo, L., Noy, N., Welty, C., Janowicz, K. (eds.) The Semantic Web – ISWC 2013: 12th International Semantic Web Conference, Sydney, NSW, Australia, 21–25 Oct 2013, Proceedings, Part I, vol. 8218, pp. 117–134. Springer, Berlin (2013). https://doi.org/10.1007/978-3-642-41335-3_8
[18] Hausmann, F.J.: Lexikographie. In: Schwarze, C., Wunderlich, D. (eds.) Handbuch der Lexikologie, pp. 367–398. Athenäum, Königstein/Ts (1985)
[19] Hellmann, S., Unbehauen, J., Chiarcos, C., Ngonga Ngomo, A.C.: The TIGER Corpus Navigator. In: Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories (TLT-9), Northern European Association for Language Technology (NEALT), pp. 91–102 (2010)
[20] Hjørland, B.: Semantics and knowledge organization. Annu. Rev. Inf. Sci. Technol. 41(1), 367–405 (2007)
[21] Hjørland, B.: What is Knowledge Organization (KO)? Knowl. Organ. 35(2/3), 86–102 (2008)
[22] Hjørland, B. (ed.): ISKO Encyclopedia of Knowledge Organization (IEKO), online edn. http://www.isko.org/cyclo/ (2016). Accessed 30 Sept 2017
[23] ISO 25964-1:2011: Information and documentation – Thesauri and interoperability with other vocabularies – Part 1: Thesauri for information retrieval (2011)
[24] ISO 30042: Systems to manage terminology, knowledge and content – TermBase eXchange (TBX), 1st edn (2008)
[25] Justeson, J.S., Katz, S.M.: Technical terminology: some linguistic properties and an algorithm for identification in text. Nat. Lang. Eng. 1(1), 9–27 (1995)
[26] Kupietz, M., Keibel, H.: The Mannheim German Reference Corpus (DeReKo) as a basis for empirical linguistic research. In: Minegishi, M., Kawaguchi, Y. (eds.) Working Papers in Corpus-based Linguistics and Language Education, 3, Tokyo University of Foreign Studies, Tokyo, pp. 53–59 (2009)
[27] Lang, C., Suchowolec, K., Schneider, R.: Extracting technical terminology from linguistic corpora. In: Proceedings of Grammar and Corpora 2016, Mannheim, Heidelberg University Publishing (heiUP), Heidelberg (2018)
[28] León Araúz, P., Magaña Redondo, P.J.: EcoLexicon: contextualizing an environmental ontology. In: Proceedings of the Terminology and Knowledge Engineering (TKE) Conference, pp. 341–355 (2010)
[29] Mazzocchi, F.: Knowledge organization system (KOS). In: [22], version 1.1. http://www.isko.org/cyclo/kos (2017). Accessed 30 Sept 2017
[30] Michel, F., Montagnat, J., Faron-Zucker, C.: A survey of RDB to RDF translation approaches and tools. Technical Report, Laboratoire d’Informatique, Signaux et Systèmes de Sophia-Antipolis (I3S) (2014)
[31] Mueller, T., Schmid, H., Schütze, H.: Efficient higher-order CRFs for morphological tagging. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Seattle, Washington, USA, pp. 322–332. http://www.aclweb.org/anthology/D13-1032 (2013). Accessed 8 Nov 2017
[32] R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/ (2016). Accessed 8 Nov 2017
[33] Resnik, P., Elkiss, A.: The linguist’s search engine: an overview. In: Proceedings of the ACL 2005 Interactive Poster and Demonstration Session, Association for Computational Linguistics (ACL), pp. 33–36 (2005). https://doi.org/10.3115/1225753.1225762
[34] Schmid, H.: Improvements in part-of-speech tagging with an application to German. In: Proceedings of the ACL SIGDAT-Workshop, Dublin, Ireland, pp. 1–9 (1995)
[35] Schneider, R., Gottron, T.: A hybrid approach to statistical and semantical analysis of Web documents. In: Proceedings of the IASTED International Conference Internet and Multimedia Systems and Applications (EuroImsa), pp. 115–120 (2009)
[36] Schneider, R., Schwinn, H.: Hypertext, Wissensnetz und Datenbank: Die Web-Informationssysteme grammis und Progr@mm. In: Berens, F.J., Steinle, M. (eds.) Ansichten und Einsichten. 50 Jahre Institut für Deutsche Sprache, IDS Eigenverlag, Mannheim, pp. 337–346. http://nbn-resolving.de/urn:nbn:de:bsz:mh39-24719 (2014). Accessed 7 Nov 2017
[37] Sejane, I.: Wissensrepräsentation Linguistik. Modellierung, Potenzial und Grenzen am Beispiel der Ontologie zur deutschen Grammatik im GRAMMIS-Informationssystem des IDS, Mannheim. Ph.D. Thesis, Ruprecht-Karls-Universität Heidelberg (2010)
[38] Souza, R.R., Tudhope, D., Almeida, M.B.: Towards a taxonomy of KOS: dimensions for classifying knowledge organisation systems. Knowl. Organ. 39(3), 172–179 (2012)
[39] Spärck Jones, K.: A statistical interpretation of term specificity and its application in retrieval. J. Doc. 28(1), 11–21 (1972)
[40] Suchowolec, K.: Sprachlenkung – Aspekte einer übergreifenden Theorie. Frank & Timme, Berlin, dissertation, Stiftung Universität Hildesheim (2018)
[41] Suchowolec, K., Lang, C., Schneider, R., Schwinn, H.: Shifting complexity from text to data model. In: Gracia, J., Bond, F., McCrae, J.P., Buitelaar, P., Chiarcos, C., Hellmann, S. (eds.) Language, Data, and Knowledge. Proceedings of the First International Conference, LDK 2017, 19–20 June 2017, Galway, Ireland, Springer, Cham, no. 10318 in Lecture Notes in Artificial Intelligence, pp. 203–212 (2017)
[42] Suchowolec, K., Lang, C., Schneider, R.: Grammar and its terminology. Re-designing terminology management system according to best practices (forthcoming)
[43] Winkler, W.: String comparator metrics and enhanced decision rules in the Fellegi-Sunter model of record linkage. In: Proceedings of the Section on Survey Research Methods (American Statistical Association), pp. 354–359 (1990)
[44] Zifonun, G., Hoffmann, L., Strecker, B.: Grammatik der deutschen Sprache: Bd. 1–3. de Gruyter, Berlin (1997)

Author information

Authors and Affiliations

Technische Hochschule Köln, Cologne, Germany
Karolina Suchowolec
Institut für Deutsche Sprache (IDS), Mannheim, Germany
Christian Lang & Roman Schneider

Corresponding author

Correspondence to Karolina Suchowolec.

About this article

Cite this article

Suchowolec, K., Lang, C. & Schneider, R. An empirically validated, onomasiologically structured, and linguistically motivated online terminology. Int J Digit Libr 20, 253–268 (2019). https://doi.org/10.1007/s00799-018-0254-x

Received: 30 September 2017
Revised: 27 July 2018
Accepted: 05 October 2018
Published: 17 November 2018
Issue Date: 01 September 2019
DOI: https://doi.org/10.1007/s00799-018-0254-x