Modelling Linguistic Annotations

Philipp Cimiano,
Christian Chiarcos,
John P. McCrae (ORCID: orcid.org/0000-0002-7227-1331) &
Jorge Gracia

Abstract

This chapter describes how linguistic annotations can be represented in RDF. Web Annotation and NIF provide the means to reference text segments on the web. Yet, representing linguistic annotations requires appropriate vocabularies. We discuss relevant vocabularies and illustrate how they can be applied to support annotation at different levels.
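
As a rough illustration of what referencing a text segment can look like, the sketch below builds a Web Annotation that targets a character span and a NIF-style string URI for the same span. The document URL, the sample text, the helper names, and the plain "NOUN" tag are hypothetical placeholders; in practice the tag would be a term from a suitable linguistic vocabulary.

```python
import json

# Hypothetical document URL and text, used only for illustration.
DOC_URL = "http://example.org/doc.txt"
TEXT = "Dogs bark."


def web_annotation(source, start, end, value):
    """Build a Web Annotation (as a JSON-LD dict) targeting a character span.

    The TextPositionSelector uses zero-based offsets; `end` is exclusive.
    """
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        # Placeholder body: a plain string standing in for a vocabulary term.
        "body": {"type": "TextualBody", "value": value},
        "target": {
            "source": source,
            "selector": {
                "type": "TextPositionSelector",
                "start": start,
                "end": end,
            },
        },
    }


def nif_string_uri(source, start, end):
    """Build a NIF 2.0 style string URI with an offset-based fragment."""
    return f"{source}#char={start},{end}"


start, end = 0, 4  # character offsets of the first token in TEXT
annotation = web_annotation(DOC_URL, start, end, "NOUN")
string_uri = nif_string_uri(DOC_URL, start, end)

print(json.dumps(annotation, indent=2))
print(string_uri, "->", TEXT[start:end])
```

Both representations only identify the span; what the annotation says about it (here a bare tag) is where the vocabularies discussed in the chapter come in.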

Author information

Authors and Affiliations

Semantic Computing Group, Bielefeld University, Bielefeld, Germany
Philipp Cimiano
Angewandte Computerlinguistik, Goethe-University, Frankfurt am Main, Germany
Christian Chiarcos
Insight Centre for Data Analytics, National University of Ireland, Galway, Ireland
John P. McCrae
Aragon Institute of Engineering Research (I3A), University of Zaragoza, Zaragoza, Spain
Jorge Gracia

About this chapter

Cite this chapter

Cimiano, P., Chiarcos, C., McCrae, J.P., Gracia, J. (2020). Modelling Linguistic Annotations. In: Linguistic Linked Data. Springer, Cham. https://doi.org/10.1007/978-3-030-30225-2_6

DOI: https://doi.org/10.1007/978-3-030-30225-2_6
Published: 14 January 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30224-5
Online ISBN: 978-3-030-30225-2
eBook Packages: Computer Science; Computer Science (R0)
