Abstract
This chapter describes how linguistic annotations can be represented in RDF. Web Annotation and NIF provide the means to reference text segments on the web. Yet, representing linguistic annotations requires appropriate vocabularies. We discuss relevant vocabularies and illustrate how they can be applied to support annotation at different levels.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Cimiano, P., Chiarcos, C., McCrae, J.P., Gracia, J. (2020). Modelling Linguistic Annotations. In: Linguistic Linked Data. Springer, Cham. https://doi.org/10.1007/978-3-030-30225-2_6
DOI: https://doi.org/10.1007/978-3-030-30225-2_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30224-5
Online ISBN: 978-3-030-30225-2
eBook Packages: Computer Science (R0)