Abstract
Modern NLP systems rely either on unsupervised methods or on data created as part of governmental initiatives such as MUC, ACE, or GALE. The data created in these efforts tend to be annotated according to task-specific schemes. The Anaphoric Bank is an attempt to create large quantities of data annotated with anaphoric information according to a general-purpose, linguistically motivated scheme. We do this by pooling smaller amounts of data annotated according to rich schemes that are by and large compatible, and by taking advantage of Web collaboration. In this chapter we discuss the markup infrastructure that underpins the two modalities of Web collaboration in the project: expert annotation and game-based annotation.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Poesio, M. et al. (2011). Markup Infrastructure for the Anaphoric Bank: Supporting Web Collaboration. In: Mehler, A., Kühnberger, KU., Lobin, H., Lüngen, H., Storrer, A., Witt, A. (eds) Modeling, Learning, and Processing of Text Technological Data Structures. Studies in Computational Intelligence, vol 370. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22613-7_10
DOI: https://doi.org/10.1007/978-3-642-22613-7_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22612-0
Online ISBN: 978-3-642-22613-7
eBook Packages: Engineering (R0)