Abstract
Terminological resources play a central role in the organization and retrieval of scientific texts. Both simple keyword lists and sophisticated models of the relationships between terminological concepts can contribute substantially to the analysis, classification, and retrieval of relevant digital documents, either on the web or within local repositories. This seems especially true for long-established scientific fields with elusive theoretical and historical branches, where the use of terminology in documents of different origins is often far from consistent. In this paper, we report on the progress of a linguistically motivated project on the onomasiological re-modeling of the terminological resources for the grammatical information system grammis. We present the design principles and the results of their application. In particular, we focus on new features of the authoring backend and discuss how these innovations help to evaluate existing, loosely structured terminological content and to deal efficiently with automatic term extraction. Furthermore, we introduce a transformation to a future SKOS representation. We conclude by positioning our resources within the Knowledge Organization discourse and discuss how a highly complex information environment like grammis benefits from the re-designed terminological KOS.
Notes
See [36]; the information system can be found online at http://www.ids-mannheim.de/grammis/.
We have already discussed some of the results in greater detail in other publications; we refer to these in the corresponding sections.
This distinction is inspired by lexicography where macrostructure is defined as the usually alphabetical order of entries and microstructure as the actual structure of one entry of a lexicon (cf. [18, p. 372]). Note that our macrostructure is ordered by concept relations instead of alphabetically.
Note that both modes, hierarchical and non-hierarchical, are available for all types of relations, i.e., the relations commonly known as hierarchical (BT/NT) can also be displayed as a non-hierarchical network in our tool. Including non-hierarchical relations (RT) in a hierarchical tree, however, renders the resulting graph overly complex and hardly interpretable.
This feature was recently added as a result of a usability study by Fu et al. [17]. They find that, depending on the area of application, either indented trees or node-link diagrams (graphs) can be the more beneficial visualization technique; hence, Fu et al. conclude that a visualization tool should combine multiple visualization techniques. Furthermore, they stress the importance of customization, which enables users to adapt the visualization to their needs.
If the current concept is labeled Supplement (‘supplement’), this method will propose Adverbialsupplement (‘adverbial supplement’) as a candidate because the former is a substring of the latter.
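The substring heuristic described in this note can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the grammis backend: the function name `propose_candidates`, the case-insensitive comparison, and the distractor label `Komplement` are assumptions made for the example.

```python
def propose_candidates(current_label, all_labels):
    """Return every label that contains current_label as a proper substring.

    Assumption: matching is case-insensitive, since German compounds
    lowercase the embedded noun (Supplement -> Adverbialsupplement).
    """
    needle = current_label.casefold()
    return [
        label
        for label in all_labels
        if needle in label.casefold() and label.casefold() != needle
    ]


# Illustrative label inventory (not taken from the actual terminology).
labels = ["Supplement", "Adverbialsupplement", "Komplement"]
print(propose_candidates("Supplement", labels))  # ['Adverbialsupplement']
```

The current label itself is excluded so that a concept is never proposed as its own candidate.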
As a reference corpus, we used a sample from the DeReKo corpus (cf. [26]), which covers various text types and genres.
Ahmad et al. [1, p. 720]:
$$\begin{aligned} \frac{w_\mathrm{s}/t_\mathrm{s}}{w_\mathrm{g}/t_\mathrm{g}} \end{aligned}$$
Note that Ahmad et al. use slightly different labels for the corpora: \(w_\mathrm{s}\) refers to the frequency of a word in the specialist language corpus (the domain-specific target corpus), \(w_\mathrm{g}\) to the frequency of a word in the general language corpus (the non-specialized reference corpus), \(t_\mathrm{s}\) to the total count of words in the specialist language corpus, and \(t_\mathrm{g}\) to the total count of words in the general language corpus.
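The ratio above can be computed directly from four counts. The following Python sketch uses the symbols from the note as parameter names; the function name `weirdness`, the handling of words that never occur in the general corpus, and the example counts are assumptions for illustration and are not taken from Ahmad et al. or from our extraction pipeline.

```python
def weirdness(w_s, t_s, w_g, t_g):
    """Relative frequency in the specialist corpus divided by
    relative frequency in the general corpus.

    w_s, w_g: frequency of the word in the specialist / general corpus
    t_s, t_g: total word count of the specialist / general corpus
    """
    if t_s <= 0 or t_g <= 0:
        raise ValueError("corpus sizes must be positive")
    if w_g == 0:
        # Assumption: a word unseen in the general corpus is treated
        # as maximally "weird" instead of raising a division error.
        return float("inf")
    return (w_s / t_s) / (w_g / t_g)


# Invented counts: 8 hits in 1,024 specialist tokens versus
# 1 hit in 4,096 general-language tokens.
print(weirdness(8, 1024, 1, 4096))  # 32.0
```

Values well above 1 indicate that a word is proportionally more frequent in the specialist corpus and is therefore a plausible term candidate.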
For a detailed description of our approach to these theory tags, see [41, pp. 207–211].
For instance, clashes of associative and hierarchical links; see Example 27 in the SKOS Reference, https://www.w3.org/TR/2009/REC-skos-reference-20090818.
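Such a clash can be detected mechanically: SKOS requires skos:related to be disjoint with skos:broaderTransitive, so no two concepts may be linked both associatively and, directly or indirectly, hierarchically. The following Python sketch checks this on plain dictionaries; the data layout, the function names, and the concept identifiers A to D are hypothetical and do not reflect our database schema.

```python
def ancestors(concept, broader):
    """Collect all direct and indirect broader concepts of `concept`."""
    seen = set()
    stack = list(broader.get(concept, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(broader.get(node, ()))
    return seen


def find_clashes(broader, related):
    """Return related pairs that are also connected hierarchically."""
    clashes = []
    for a, b in related:
        if b in ancestors(a, broader) or a in ancestors(b, broader):
            clashes.append((a, b))
    return clashes


# Hypothetical hierarchy: A has broader B, B has broader C.
broader = {"A": {"B"}, "B": {"C"}}
# A-C clashes (C is an indirect broader concept of A); B-D does not.
related = [("A", "C"), ("B", "D")]
print(find_clashes(broader, related))  # [('A', 'C')]
```

The traversal keeps a set of visited nodes, so it also terminates on hierarchies that accidentally contain cycles.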
In the future, the end users of grammis might also benefit from visualization techniques similar to those described in Section 3.2.1.
References
Ahmad, K., Gillam, L., Tostevin, L.: University of Surrey participation in TREC8: weirdness indexing for logical document extrapolation, retrieval (WILDER). In: Voorhees, E., Harman, D. (eds.) NIST Special Publication 500–246: The Eighth Text Retrieval Conference (TREC-8), Gaithersburg, MD, pp. 717–724 (1999)
Almende, B.V., Thieurmel, B.: visNetwork: network visualization using ‘vis.js’ library. R package version 1.0.3. https://CRAN.R-project.org/package=visNetwork (2016). Accessed 1 June 2017
Augustinus, L., Vandeghinste, V., Van Eynde, F.: Example-based treebank querying. In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2012/pdf/756_Paper.pdf (2012). Accessed 8 Nov 2017
Augustinus, L., Vandeghinste, V., Vanallemeersch, T.: Poly-gretel: cross-lingual example-based querying of syntactic constructions. In: Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2016/pdf/486_Paper.pdf (2016). Accessed 8 Nov 2017
Baker, T., Bechhofer, S., Isaac, A., Miles, A., Schreiber, G., Summers, E.: Key choices in the design of Simple Knowledge Organization System (SKOS). Web Semant. Sci. Serv. Agents World Wide Web 20, 35–49 (2013)
Brin, S., Page, L.: The anatomy of a large-scale hypertextual search engine. Comput. Netw. ISDN Syst. 30(1–7), 107–117 (1998)
Bubenhofer, N., Schneider, R.: Using a domain ontology for the semantic-statistical classification of specialist hypertexts. In: Papers from the Annual International Conference on Computational Linguistics ‘Dialogue’. Moscow, 26 May 2010/30 May 2010, pp. 622–628 (2010)
Chang, W., Cheng, J., Allaire, J., Xie, Y., McPherson, J.: Shiny: Web Application Framework for R. R package version 1.0.0. https://CRAN.R-project.org/package=shiny (2017). Accessed 1 June 2017
Deutscher Terminologie-Tag eV: Terminologiearbeit—Best Practices, 2nd edn (2014)
DIN 2331: Begriffssysteme und ihre Darstellung (1980)
DIN 2342:2011-08: Begriffe der Terminologielehre (2011)
Drewer, P., Massion, F., Pulitano, D.: Was haben Wissensmodellierung, Wissensstruktur, künstliche Intelligenz und Terminologie miteinander zu tun? Technical Report, Deutscher Terminologie-Tag e.V (2017)
Dunning, T.: Accurate methods for the statistics of surprise and coincidence. J. Comput. Linguist. Spec. Issue Using Large Corpora 19(1), 61–74 (1993)
Faber, P.: Frames as a framework for terminology. In: Kockaert, H.J., Steurs, F. (eds.) Handbook of Terminology, vol. 1. John Benjamins Publishing Company, Amsterdam (2015)
Frantzi, K., Ananiadou, S., Mima, H.: Automatic recognition of multi-word terms: the C-value/NC-value method. Int. J. Digit. Libr. 3(2), 115–130 (2000)
Früh, B., Deubzer, F.: Von der Terminologieverwaltung zur Wissensorganisation. Edition 16(1), 27–32 (2016)
Fu, B., Noy, N.F., Storey, M.A.: Indented tree or graph? A usability study of ontology visualization techniques in the context of class mapping evaluation. In: Alani, H., Kagal, L., Fokoue, A., Groth, P., Biemann, C., Parreira, J.X., Aroyo, L., Noy, N., Welty, C., Janowicz, K. (eds.) The Semantic Web—ISWC 2013: 12th International Semantic Web Conference, Sydney, NSW, Australia, 21–25 Oct 2013, Proceedings, Part I, vol. 8218. Springer, Berlin, pp. 117–134 (2013). https://doi.org/10.1007/978-3-642-41335-3_8
Hausmann, F.J.: Lexikographie. In: Schwarze, C., Wunderlich, D. (eds.) Handbuch der Lexikologie, pp. 367–398. Athenäum, Königstein/Ts (1985)
Hellmann, S., Unbehauen, J., Chiarcos, C., Ngonga Ngomo, A.C.: The TIGER corpus navigator. In: Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories (TLT-9), Northern European Association for Language Technology (NEALT), pp. 91–102 (2010)
Hjørland, B.: Semantics and knowledge organization. Annu. Rev. Inf. Sci. Technol. 41(1), 367–405 (2007)
Hjørland, B.: What is Knowledge Organization (KO)? Knowl. Organ. 35(2/3), 86–102 (2008)
Hjørland, B. (ed.): ISKO Encyclopedia of Knowledge Organization (IEKO), online edn. http://www.isko.org/cyclo/ (2016). Accessed 30 Sept 2017
ISO 25964-1:2011: Information and documentation—thesauri and interoperability with other vocabularies—Part 1: thesauri for information retrieval (2011)
ISO 30042: Systems to manage terminology, knowledge and content—TermBase eXchange TBX, 1st edn (2008)
Justeson, J.S., Katz, S.M.: Technical terminology: some linguistic properties and an algorithm for identification in text. Nat. Lang. Eng. 1(1), 9–27 (1995)
Kupietz, M., Keibel, H.: The Mannheim German Reference Corpus (DeReKo) as a basis for empirical linguistic research. In: Minegishi, M., Kawaguchi, Y. (eds.) Working Papers in Corpus-based Linguistics and Language Education, 3, Tokyo University of Foreign Studies, Tokyo, pp. 53–59 (2009)
Lang, C., Suchowolec, K., Schneider, R.: Extracting technical terminology from linguistic corpora. In: Proceedings of Grammar and Corpora 2016, Mannheim, Heidelberg University Publishing (heiUP), Heidelberg (2018)
León Araúz, P., Magaña Redondo, P.J.: EcoLexicon: contextualizing an environmental ontology. In: Proceedings of the Terminology and Knowledge Engineering (TKE) Conference, pp. 341–355 (2010)
Mazzocchi, F.: Knowledge organization system (KOS). In: [22], version 1.1. http://www.isko.org/cyclo/kos (2017). Accessed 30 Sept 2017
Michel, F., Montagnat, J., Faron-Zucker, C.: A survey of RDB to RDF translation approaches and tools. Technical Report, Laboratoire d’Informatique, Signaux et Systèmes de Sophia-Antipolis (I3S) (2014)
Mueller, T., Schmid, H., Schütze, H.: Efficient higher-order CRFs for morphological tagging. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Seattle, Washington, USA, pp. 322–332. http://www.aclweb.org/anthology/D13-1032 (2013). Accessed 8 Nov 2017
R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/ (2016). Accessed 8 Nov 2017
Resnik, P., Elkiss, A.: The linguist’s search engine: An overview. In: Proceedings of the ACL 2005 Interactive Poster and Demonstration Session, Association for Computational Linguistics (ACL), pp. 33–36 (2005). https://doi.org/10.3115/1225753.1225762
Schmid, H.: Improvements in part-of-speech tagging with an application to German. In: Proceedings of the ACL SIGDAT-Workshop, Dublin, Ireland, pp. 1–9 (1995)
Schneider, R., Gottron, T.: A hybrid approach to statistical and semantical analysis of Web documents. In: Proceedings of the IASTED International Conference Internet and Multimedia Systems and Applications (EuroImsa), pp. 115–120 (2009)
Schneider, R., Schwinn, H.: Hypertext, Wissensnetz und Datenbank: Die Web-Informationssysteme grammis und Progr@mm. In: Berens, F.J., Steinle, M. (eds.) Ansichten und Einsichten. 50 Jahre Institut für Deutsche Sprache, IDS Eigenverlag, Mannheim, pp. 337–346. http://nbn-resolving.de/urn:nbn:de:bsz:mh39-24719 (2014). Accessed 7 Nov 2017
Sejane, I.: Wissensrepräsentation Linguistik. Modellierung, Potenzial und Grenzen am Beispiel der Ontologie zur deutschen Grammatik im GRAMMIS-Informationssystem des IDS, Mannheim. Ph.D. Thesis, Ruprecht-Karls-Universität Heidelberg (2010)
Souza, R.R., Tudhope, D., Almeida, M.B.: Towards a taxonomy of KOS: dimensions for classifying knowledge organisation systems. Knowl. Organ. 39(3), 172–179 (2012)
Spärck Jones, K.: A statistical interpretation of term specificity and its application in retrieval. J. Doc. 28(1), 11–21 (1972)
Suchowolec, K.: Sprachlenkung—Aspekte einer übergreifenden Theorie. Frank & Timme, Berlin, dissertation, Stiftung Universität Hildesheim (2018)
Suchowolec, K., Lang, C., Schneider, R., Schwinn, H.: Shifting complexity from text to data model. In: Gracia, J., Bond, F., McCrae, J.P., Buitelaar, P., Chiarcos, C., Hellmann, S. (eds.) Language, Data, and Knowledge. Proceedings of the First International Conference, LDK 2017, 19 June 2017/20 June 2017, Galway, Ireland, Springer, Cham, no. 10318 in Lecture Notes in Artificial Intelligence, pp. 203–212 (2017)
Suchowolec, K., Lang, C., Schneider, R.: Grammar and its terminology. Re-designing terminology management system according to best practices (forthcoming)
Winkler, W.: String comparator metrics and enhanced decision rules in the Fellegi-Sunter model of record linkage. In: Proceedings of the Section on Survey Research Methods (American Statistical Association), pp. 354–359 (1990)
Zifonun, G., Hoffmann, L., Strecker, B.: Grammatik der deutschen Sprache: Bd. 1–3. de Gruyter, Berlin (1997)
Cite this article
Suchowolec, K., Lang, C. & Schneider, R. An empirically validated, onomasiologically structured, and linguistically motivated online terminology. Int J Digit Libr 20, 253–268 (2019). https://doi.org/10.1007/s00799-018-0254-x
DOI: https://doi.org/10.1007/s00799-018-0254-x