An empirically validated, onomasiologically structured, and linguistically motivated online terminology

Re-designing scientific resources on German grammar

International Journal on Digital Libraries

Abstract

Terminological resources play a central role in the organization and retrieval of scientific texts. Both simple keyword lists and advanced modelings of relationships between terminological concepts can make a most valuable contribution to the analysis, classification, and finding of appropriate digital documents, either on the web or within local repositories. This seems especially true for long-established scientific fields with elusive theoretical and historical branches, where the use of terminology within documents from different origins is often far from being consistent. In this paper, we report on the progress of a linguistically motivated project on the onomasiological re-modeling of the terminological resources for the grammatical information system grammis. We present the design principles and the results of their application. In particular, we focus on new features for the authoring backend and discuss how these innovations help to evaluate existing, loosely structured terminological content, as well as to efficiently deal with automatic term extraction. Furthermore, we introduce a transformation to a future SKOS representation. We conclude with a positioning of our resources with regard to the Knowledge Organization discourse and discuss how a highly complex information environment like grammis benefits from the re-designed terminological KOS.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

Notes

  1. See [36]; the information system can be found online at http://www.ids-mannheim.de/grammis/.

  2. http://www.ids-mannheim.de/progr@mm/.

  3. https://www.w3.org/TR/2009/REC-skos-reference-20090818/.

  4. https://www.w3.org/community/ontolex/wiki/Final_Model_Specification.

  5. We have already discussed some of the results in greater detail in other publications; we refer to these in the corresponding sections.

  6. This distinction is inspired by lexicography, where the macrostructure is defined as the (usually alphabetical) order of the entries and the microstructure as the internal structure of a single entry of a lexicon (cf. [18, p. 372]). Note that our macrostructure is ordered by concept relations rather than alphabetically.

  7. Note, however, that the application of visualization techniques is not limited to the traditional approach, as illustrated by EcoLexicon [28], a resource dealing with environmental terminology that is based on the paradigm of frame-based terminology (cf. [14]).

  8. Note that both modes (hierarchical and non-hierarchical) are available for all types of relations, i.e., the relations commonly known as hierarchical (BT/NT) can also be displayed as a non-hierarchical network in our tool. Including non-hierarchical relations (RT) in a hierarchical tree, however, renders the resulting graph overly complex and hard to interpret.

  9. This feature was recently added as a result of a usability study by Fu et al. [17], who found that, depending on the area of application, either indented trees or node-link diagrams (graphs) may be the more beneficial visualization technique; hence, Fu et al. conclude that a visualization tool should combine multiple visualization techniques. Furthermore, they point out the importance of customization, which enables users to adapt the visualization to their needs.

  10. If the current concept is labeled Supplement (‘supplement’), this method will propose Adverbialsupplement (‘adverbial supplement’) as a candidate because the former is a substring of the latter.

  11. https://www.coreon.com/.

  12. http://www.dog-gmbh.de/de/produkte/lookup/.

  13. https://www.kaleidoscope.at/de/terminologie/quickterm.

  14. http://www.interverbumtech.de/.

  15. As a reference corpus, we used a sample from the DeReKo corpus (cf. [26]), which covers various text types and genres.

  16. Weirdness measure according to Ahmad et al. [1, p. 720]:

    $$\begin{aligned} \frac{w_\mathrm{s}/t_\mathrm{s}}{w_\mathrm{g}/t_\mathrm{g}} \end{aligned}$$

    Note that Ahmad et al. use slightly different labels for the corpora: \(w_\mathrm{s}\) refers to the frequency of a word in the specialist language corpus (the domain-specific target corpus), \(w_\mathrm{g}\) to the frequency of a word in the general language corpus (the non-specialized language reference corpus), \(t_\mathrm{s}\) to the total count of words in the specialist language corpus, and \(t_\mathrm{g}\) refers to the total count of words in the general language corpus.

  17. For a detailed description of our approach to these theory tags, see [41, pp. 207–211].

  18. http://d2rq.org/.

  19. https://www.w3.org/TR/r2rml/.

  20. https://github.com/d2rq/d2rq/issues/45.

  21. For instance, clashes between associative and hierarchical links; see example 27 in the SKOS reference, https://www.w3.org/TR/2009/REC-skos-reference-20090818.

  22. Prospectively, the end-users of grammis might also benefit from visualization techniques similar to those described in Sect. 3.2.1.

  23. https://grammis.ids-mannheim.de/fragen/3185.

  24. https://grammis.ids-mannheim.de/fragen/4550.
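
Note 16 gives the weirdness ratio used to compare a word's relative frequency in the domain-specific target corpus with its relative frequency in the general reference corpus. The following Python sketch computes it with the symbols defined there (`w_s`, `t_s`, `w_g`, `t_g`) and ranks the words of a toy target corpus. The helper names and the toy counts are hypothetical, and returning infinity for words absent from the reference corpus is an assumption of this sketch, not something the note specifies (smoothing would be a common alternative).

```python
import math
from collections import Counter


def weirdness(w_s: int, t_s: int, w_g: int, t_g: int) -> float:
    """(w_s / t_s) / (w_g / t_g), with the symbols as defined in note 16."""
    if w_g == 0:
        # Undefined for words absent from the reference corpus; returning
        # infinity here is an assumption (smoothing is a common alternative).
        return math.inf
    return (w_s / t_s) / (w_g / t_g)


def rank_by_weirdness(target: Counter, reference: Counter) -> list[tuple[str, float]]:
    """Rank all target-corpus words by descending weirdness."""
    t_s = sum(target.values())     # total word count, specialist corpus
    t_g = sum(reference.values())  # total word count, general corpus
    # Counter returns 0 for missing keys, so absent words yield w_g == 0.
    scores = {w: weirdness(n, t_s, reference[w], t_g) for w, n in target.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy frequency counts.
target = Counter({"Verb": 2, "Satz": 1, "der": 1})
reference = Counter({"der": 6, "Verb": 1, "Haus": 1})
print(rank_by_weirdness(target, reference))
```

Here "Verb" scores (2/4)/(1/8) = 4.0, so it is over-represented in the target corpus and is a term candidate, whereas the function word "der" scores well below 1.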

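Note 10 describes proposing candidate concepts whose labels contain the current concept's label. Because "Supplement" (capital S) occurs in "Adverbialsupplement" only in lowercase, the match in that example must be case-insensitive; the following Python sketch assumes this and uses `str.casefold`. The function name `propose_candidates` and the label list are hypothetical (the list is invented for illustration), and the sketch is not the authoring backend's actual implementation.

```python
def propose_candidates(current_label: str, labels: list[str]) -> list[str]:
    """Return labels containing current_label as a case-insensitive substring."""
    needle = current_label.casefold()
    return [
        label for label in labels
        if label.casefold() != needle   # skip the concept's own label
        and needle in label.casefold()  # e.g. "supplement" in "adverbialsupplement"
    ]


# Invented label list for illustration.
labels = ["Supplement", "Adverbialsupplement", "Komplement", "Supplementsatz"]
print(propose_candidates("Supplement", labels))
```

Plain substring matching is cheap but noisy for short labels, so such a method would only propose candidates for an editor to confirm, not create relations automatically.
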
References

  1. Ahmad, K., Gillam, L., Tostevin, L.: University of Surrey participation in TREC8: weirdness indexing for logical document extrapolation and retrieval (WILDER). In: Voorhees, E., Harman, D. (eds.) NIST Special Publication 500–246: The Eighth Text Retrieval Conference (TREC-8), Gaithersburg, MD, pp. 717–724 (1999)

  2. Almende B.V., Thieurmel, B.: visNetwork: network visualization using ‘vis.js’ library. R package version 1.0.3. https://CRAN.R-project.org/package=visNetwork (2016). Accessed 1 June 2017

  3. Augustinus, L., Vandeghinste, V., Van Eynde, F.: Example-based treebank querying. In: Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2012/pdf/756_Paper.pdf (2012). Accessed 8 Nov 2017

  4. Augustinus, L., Vandeghinste, V., Vanallemeersch, T.: Poly-GrETEL: cross-lingual example-based querying of syntactic constructions. In: Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016), European Language Resources Association (ELRA). http://www.lrec-conf.org/proceedings/lrec2016/pdf/486_Paper.pdf (2016). Accessed 8 Nov 2017

  5. Baker, T., Bechhofer, S., Isaac, A., Miles, A., Schreiber, G., Summers, E.: Key choices in the design of Simple Knowledge Organization System (SKOS). Web Semant. Sci. Serv. Agents World Wide Web 20, 35–49 (2013)

  6. Brin, S., Page, L.: The anatomy of a large-scale hypertextual search engine. Comput. Netw. ISDN Syst. 30(1–7), 107–117 (1998)

  7. Bubenhofer, N., Schneider, R.: Using a domain ontology for the semantic-statistical classification of specialist hypertexts. In: Papers from the Annual International Conference on Computational Linguistics ‘Dialogue’, Moscow, 26–30 May 2010, pp. 622–628 (2010)

  8. Chang, W., Cheng, J., Allaire, J., Xie, Y., McPherson, J.: Shiny: Web Application Framework for R. R package version 1.0.0. https://CRAN.R-project.org/package=shiny (2017). Accessed 1 June 2017

  9. Deutscher Terminologie-Tag eV: Terminologiearbeit—Best Practices, 2nd edn (2014)

  10. DIN 2331: Begriffssysteme und ihre Darstellung (1980)

  11. DIN 2342:2011-08: Begriffe der Terminologielehre (2011)

  12. Drewer, P., Massion, F., Pulitano, D.: Was haben Wissensmodellierung, Wissensstruktur, künstliche Intelligenz und Terminologie miteinander zu tun? Technical Report, Deutscher Terminologie-Tag e.V (2017)

  13. Dunning, T.: Accurate methods for the statistics of surprise and coincidence. J. Comput. Linguist. Spec. Issue Using Large Corpora 19(1), 61–74 (1993)

  14. Faber, P.: Frames as a framework for terminology. In: Kockaert, H.J., Steurs, F. (eds.) Handbook of Terminology, vol. 1. John Benjamins Publishing Company, Amsterdam (2015)

  15. Frantzi, K., Ananiadou, S., Mima, H.: Automatic recognition of multi-word terms: the C-value/NC-value method. Int. J. Digit. Libr. 3(2), 115–130 (2000)

  16. Früh, B., Deubzer, F.: Von der Terminologieverwaltung zur Wissensorganisation. Edition 16(1), 27–32 (2016)

  17. Fu, B., Noy, N.F., Storey, M.A.: Indented tree or graph? A usability study of ontology visualization techniques in the context of class mapping evaluation. In: Alani, H., Kagal, L., Fokoue, A., Groth, P., Biemann, C., Parreira, J.X., Aroyo, L., Noy, N., Welty, C., Janowicz, K. (eds.) The Semantic Web—ISWC 2013: 12th International Semantic Web Conference, Sydney, NSW, Australia, 21–25 Oct 2013, Proceedings, Part I, vol. 8218. Springer, Berlin, pp. 117–134 (2013). https://doi.org/10.1007/978-3-642-41335-3_8

  18. Hausmann, F.J.: Lexikographie. In: Schwarze, C., Wunderlich, D. (eds.) Handbuch der Lexikologie, pp. 367–398. Athenäum, Königstein/Ts (1985)

  19. Hellmann, S., Unbehauen, J., Chiarcos, C., Ngonga Ngomo, A.C.: The TIGER Corpus Navigator. In: Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories (TLT-9), Northern European Association for Language Technology (NEALT), pp. 91–102 (2010)

  20. Hjørland, B.: Semantics and knowledge organization. Annu. Rev. Inf. Sci. Technol. 41(1), 367–405 (2007)

  21. Hjørland, B.: What is Knowledge Organization (KO)? Knowl. Organ. 35(2/3), 86–102 (2008)

  22. Hjørland, B. (ed.): ISKO Encyclopedia of Knowledge Organization (IEKO), online edn. http://www.isko.org/cyclo/ (2016). Accessed 30 Sept 2017

  23. ISO 25964-1:2011: Information and documentation—thesauri and interoperability with other vocabularies—Part 1: thesauri for information retrieval (2011)

  24. ISO 30042: Systems to manage terminology, knowledge and content—TermBase eXchange TBX, 1st edn (2008)

  25. Justeson, J.S., Katz, S.M.: Technical terminology: some linguistic properties and an algorithm for identification in text. Nat. Lang. Eng. 1(1), 9–27 (1995)

  26. Kupietz, M., Keibel, H.: The Mannheim German Reference Corpus (DeReKo) as a basis for empirical linguistic research. In: Minegishi, M., Kawaguchi, Y. (eds.) Working Papers in Corpus-based Linguistics and Language Education, 3, Tokyo University of Foreign Studies, Tokyo, pp. 53–59 (2009)

  27. Lang, C., Suchowolec, K., Schneider, R.: Extracting technical terminology from linguistic corpora. In: Proceedings of Grammar and Corpora 2016, Mannheim, Heidelberg University Publishing (heiUP), Heidelberg (2018)

  28. León Araúz, P., Magaña Redondo, P.J.: EcoLexicon: contextualizing an environmental ontology. In: Proceedings of the Terminology and Knowledge Engineering (TKE) Conference, pp. 341–355 (2010)

  29. Mazzocchi, F.: Knowledge organization system (KOS). In: [22], version 1.1. http://www.isko.org/cyclo/kos (2017). Accessed 30 Sept 2017

  30. Michel, F., Montagnat, J., Faron-Zucker, C.: A survey of RDB to RDF translation approaches and tools. Technical Report, Laboratoire d’Informatique, Signaux et Systèmes de Sophia-Antipolis (I3S) (2014)

  31. Mueller, T., Schmid, H., Schütze, H.: Efficient higher-order CRFs for morphological tagging. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Seattle, Washington, USA, pp. 322–332. http://www.aclweb.org/anthology/D13-1032 (2013). Accessed 8 Nov 2017

  32. R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/ (2016). Accessed 8 Nov 2017

  33. Resnik, P., Elkiss, A.: The linguist’s search engine: An overview. In: Proceedings of the ACL 2005 Interactive Poster and Demonstration Session, Association for Computational Linguistics (ACL), pp. 33–36 (2005). https://doi.org/10.3115/1225753.1225762

  34. Schmid, H.: Improvements in part-of-speech tagging with an application to German. In: Proceedings of the ACL SIGDAT-Workshop, Dublin, Ireland, pp. 1–9 (1995)

  35. Schneider, R., Gottron, T.: A hybrid approach to statistical and semantical analysis of Web documents. In: Proceedings of the IASTED International Conference Internet and Multimedia Systems and Applications (EuroImsa), pp. 115–120 (2009)

  36. Schneider, R., Schwinn, H.: Hypertext, Wissensnetz und Datenbank: Die Web-Informationssysteme grammis und Progr@mm. In: Berens, F.J., Steinle, M. (eds.) Ansichten und Einsichten. 50 Jahre Institut für Deutsche Sprache, IDS Eigenverlag, Mannheim, pp. 337–346. http://nbn-resolving.de/urn:nbn:de:bsz:mh39-24719 (2014). Accessed 7 Nov 2017

  37. Sejane, I.: Wissensrepräsentation Linguistik. Modellierung, Potenzial und Grenzen am Beispiel der Ontologie zur deutschen Grammatik im GRAMMIS-Informationssystem des IDS, Mannheim. Ph.D. Thesis, Ruprecht-Karls-Universität Heidelberg (2010)

  38. Souza, R.R., Tudhope, D., Almeida, M.B.: Towards a taxonomy of KOS: dimensions for classifying knowledge organisation systems. Knowl. Organ. 39(3), 172–179 (2012)

  39. Spärck Jones, K.: A statistical interpretation of term specificity and its application in retrieval. J. Doc. 28(1), 11–21 (1972)

  40. Suchowolec, K.: Sprachlenkung—Aspekte einer übergreifenden Theorie. Frank & Timme, Berlin, dissertation, Stiftung Universität Hildesheim (2018)

  41. Suchowolec, K., Lang, C., Schneider, R., Schwinn, H.: Shifting complexity from text to data model. In: Gracia, J., Bond, F., McCrae, J.P., Buitelaar, P., Chiarcos, C., Hellmann, S. (eds.) Language, Data, and Knowledge. Proceedings of the First International Conference, LDK 2017, Galway, Ireland, 19–20 June 2017. Lecture Notes in Artificial Intelligence, vol. 10318, pp. 203–212. Springer, Cham (2017)

  42. Suchowolec, K., Lang, C., Schneider, R.: Grammar and its terminology. Re-designing terminology management system according to best practices (forthcoming)

  43. Winkler, W.: String comparator metrics and enhanced decision rules in the Fellegi-Sunter model of record linkage. In: Proceedings of the Section on Survey Research Methods (American Statistical Association), pp. 354–359 (1990)

  44. Zifonun, G., Hoffmann, L., Strecker, B.: Grammatik der deutschen Sprache: Bd. 1–3. de Gruyter, Berlin (1997)

Author information

Correspondence to Karolina Suchowolec.

About this article

Cite this article

Suchowolec, K., Lang, C. & Schneider, R. An empirically validated, onomasiologically structured, and linguistically motivated online terminology. Int J Digit Libr 20, 253–268 (2019). https://doi.org/10.1007/s00799-018-0254-x

  • DOI: https://doi.org/10.1007/s00799-018-0254-x