Abstract
While the Web of Data, the Web of Documents and Natural Language Processing are well-researched individual fields, approaches that combine all three are fragmented and not yet well aligned. This chapter analyzes current efforts in collaborative knowledge extraction to uncover connection points between the three fields. The special focus is on three prominent RDF data sets (DBpedia, LinkedGeoData and Wiktionary2RDF), which allow users to influence the knowledge extraction process by adding another crowd-sourced layer on top. The recently published NLP Interchange Format (NIF) provides a way to annotate textual resources on the Web by assigning them URIs with fragment identifiers. We show how this formalism can easily be extended to encompass new annotation layers and vocabularies.
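To make the idea of addressing text through fragment identifiers concrete, the following is a minimal sketch, not NIF's own URI scheme. It assumes the `#char=begin,end` syntax from RFC 5147 (URI fragment identifiers for text/plain), in which offsets are positions between characters and therefore match Python slicing. The document URI `http://example.org/doc.txt` and the helper names are hypothetical.

```python
"""Sketch: mint and resolve RFC 5147-style fragment URIs for substrings of a text.

Assumption: the ``#char=begin,end`` syntax of RFC 5147 is used for illustration
only; NIF defines its own URI schemes, which may differ in detail.
"""


def char_fragment_uri(base_uri: str, text: str, begin: int, end: int) -> str:
    """Return a URI identifying text[begin:end] inside the document at base_uri."""
    if not 0 <= begin <= end <= len(text):
        raise ValueError(f"invalid span ({begin}, {end}) for text of length {len(text)}")
    return f"{base_uri}#char={begin},{end}"


def resolve_char_fragment(uri: str, text: str) -> str:
    """Recover the substring that a char= fragment URI points to."""
    _, _, fragment = uri.partition("#")
    if not fragment.startswith("char="):
        raise ValueError(f"not a char= fragment: {uri}")
    begin, end = (int(x) for x in fragment[len("char="):].split(","))
    return text[begin:end]


def annotate_mentions(base_uri: str, text: str, surface_form: str) -> list[str]:
    """Mint one fragment URI for every occurrence of surface_form in text."""
    uris = []
    start = text.find(surface_form)
    while start != -1:
        uris.append(char_fragment_uri(base_uri, text, start, start + len(surface_form)))
        start = text.find(surface_form, start + 1)
    return uris


if __name__ == "__main__":
    doc_uri = "http://example.org/doc.txt"  # hypothetical document URI
    doc_text = "Berlin is the capital of Germany."
    for uri in annotate_mentions(doc_uri, doc_text, "Germany"):
        print(uri, "->", resolve_char_fragment(uri, doc_text))
```

Running the script prints `http://example.org/doc.txt#char=25,32 -> Germany`. Because each annotated span has its own URI, further annotation layers (for example, a link to a DBpedia resource) can be attached to that URI as ordinary RDF statements.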
Notes
- 13. More data sets can be explored here: http://thedatahub.org/tag/published-by-third-party
- 18. http://factforge.net and http://lod.openlinksw.com provide SPARQL interfaces for querying billions of aggregated facts.
- 23. For DBpedia Live, see http://live.dbpedia.org/
- 27. See http://en.wiktionary.org/wiki/semantic for a simple example page.
- 38. For English, see http://en.wiktionary.org/wiki/Wiktionary:ELE
- 40. For example, http://wiktionary.dbpedia.org/resource/dog
- 43. Note that with ‘/’ the identifier is sent to the server as part of the request (as in Linked Data), whereas everything after ‘#’ is processed only by the client.
- 46. For the resolution of prefixes, we refer the reader to http://prefix.cc
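Note 43 distinguishes ‘/’ URIs from ‘#’ URIs. The sketch below illustrates that distinction with Python's standard `urllib.parse.urldefrag`, which separates the part of a URI that an HTTP client actually requests from the fragment that stays on the client. The `#char=` URIs and the helper names are illustrative assumptions. The sketch shows that all fragment URIs of one document reduce to a single retrievable resource, whereas every ‘/’ URI is requested on its own.

```python
from urllib.parse import urldefrag


def split_request_and_fragment(uri: str) -> tuple[str, str]:
    """Split a URI into the part sent to the server and the client-side fragment."""
    result = urldefrag(uri)
    return result.url, result.fragment


def distinct_requests(uris: list[str]) -> list[str]:
    """Return the distinct URLs a client would have to dereference for these URIs."""
    return sorted({urldefrag(uri).url for uri in uris})


if __name__ == "__main__":
    slash_uri = "http://wiktionary.dbpedia.org/resource/dog"  # '/' style, from note 40
    hash_uris = [
        "http://example.org/doc.txt#char=0,6",    # hypothetical fragment URIs
        "http://example.org/doc.txt#char=25,32",
    ]
    print(split_request_and_fragment(slash_uri))
    print(split_request_and_fragment(hash_uris[0]))
    print(distinct_requests(hash_uris))
```

The ‘#’ URIs both reduce to a request for `http://example.org/doc.txt`. The fragments are then resolved locally against the retrieved text, which is why fragment-based annotation does not require the server to know about individual text spans.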
Acknowledgements
We would like to thank our colleagues from the AKSW research group and the LOD2 project for their helpful comments during the development of NIF. In particular, we thank Christian Chiarcos for his support while using OLiA and Jonas Brekle for his work on Wiktionary2RDF. This work was partially supported by a grant from the European Union’s 7th Framework Programme provided for the project LOD2 (GA no. 257943).
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Hellmann, S., Auer, S. (2013). Towards Web-Scale Collaborative Knowledge Extraction. In: Gurevych, I., Kim, J. (eds) The People’s Web Meets NLP. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35085-6_11
DOI: https://doi.org/10.1007/978-3-642-35085-6_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35084-9
Online ISBN: 978-3-642-35085-6
eBook Packages: Computer Science, Computer Science (R0)