DOI: 10.5555/2283396.2283398
Article

Open information extraction: the second generation

Published: 16 July 2011

Abstract

How do we scale information extraction to the massive size and unprecedented heterogeneity of the Web corpus? Beginning in 2003, our KnowItAll project has sought to extract high-quality knowledge from the Web.
In 2007, we introduced the Open Information Extraction (Open IE) paradigm, which eschews hand-labeled training examples and avoids domain-specific verbs and nouns in order to develop unlexicalized, domain-independent extractors that scale to the Web corpus. Open IE systems have extracted billions of assertions as the basis for both common-sense knowledge and novel question-answering systems.
This paper describes the second generation of Open IE systems, which rely on a novel model of how relations and their arguments are expressed in English sentences and thereby double precision/recall compared with previous systems such as TEXTRUNNER and WOE.
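One published formulation of such a model, used by the second-generation extractor ReVerb (Fader, Soderland, and Etzioni, 2011), requires a relation phrase to match the part-of-speech pattern V | V P | V W* P, where V is a verb optionally followed by a particle and an adverb, W is a noun, adjective, adverb, pronoun, or determiner, and P is a preposition, particle, or infinitive marker; the arguments are the nearest noun phrases to the left and right of the relation phrase. The Python sketch below is an illustrative approximation, not the authors' implementation. It assumes the sentence has already been tagged with Penn Treebank tags, it uses a crude tag-run chunker in place of a real noun-phrase chunker, and the names `TAG_CLASS`, `REL_PATTERN`, `np_chunks`, and `extract` are hypothetical. Because it consults only tag classes and never the words themselves, it also shows concretely what an unlexicalized, domain-independent extractor means.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (word, Penn Treebank POS tag); tagging is assumed done upstream

# Hypothetical mapping from Penn Treebank tags to one-letter classes, so the
# relation-phrase pattern can be a regular expression over the tag sequence.
# Only tags are consulted, never the words: the extractor is unlexicalized.
TAG_CLASS = {
    "VB": "V", "VBD": "V", "VBG": "V", "VBN": "V", "VBP": "V", "VBZ": "V",
    "NN": "N", "NNS": "N", "NNP": "N", "NNPS": "N",
    "JJ": "J", "JJR": "J", "JJS": "J",
    "RB": "A", "RBR": "A", "RBS": "A",
    "PRP": "O", "PRP$": "O",
    "DT": "D",
    "IN": "P",  # preposition
    "RP": "R",  # particle
    "TO": "T",  # infinitive marker
}

# V = verb particle? adverb?           -> VR?A?
# W = noun | adj | adv | pron | det    -> [NJAOD]
# P = prep | particle | inf. marker    -> [PRT]
# Relation phrase = V | V P | V W* P; the outer "+" merges adjacent matches.
REL_PATTERN = re.compile(r"(?:VR?A?(?:[NJAOD]*[PRT])?)+")

# Crude stand-in for a noun-phrase chunker: maximal runs of these tags.
# Relative pronouns (WDT) and existential "there" (EX) are excluded.
NP_TAGS = {"DT", "CD", "JJ", "JJR", "JJS", "NN", "NNS", "NNP", "NNPS", "PRP", "PRP$"}


def np_chunks(tagged: List[Token]) -> List[Tuple[int, int]]:
    """Return [start, end) spans of maximal runs of NP_TAGS tokens."""
    chunks: List[Tuple[int, int]] = []
    start = None
    for i, (_, tag) in enumerate(tagged):
        if tag in NP_TAGS:
            if start is None:
                start = i
        elif start is not None:
            chunks.append((start, i))
            start = None
    if start is not None:
        chunks.append((start, len(tagged)))
    return chunks


def extract(tagged: List[Token]) -> List[Tuple[str, str, str]]:
    """Return (arg1, relation phrase, arg2) triples for one tagged sentence."""
    codes = "".join(TAG_CLASS.get(tag, "x") for _, tag in tagged)  # "x" = other
    words = [w for w, _ in tagged]
    chunks = np_chunks(tagged)
    triples = []
    for m in REL_PATTERN.finditer(codes):
        s, e = m.span()
        left = [c for c in chunks if c[1] <= s]   # NPs ending before the relation
        right = [c for c in chunks if c[0] >= e]  # NPs starting after it
        if not left or not right:
            continue  # an argument is required on both sides
        a1, a2 = left[-1], right[0]               # nearest NP on each side
        triples.append((" ".join(words[a1[0]:a1[1]]),
                        " ".join(words[s:e]),
                        " ".join(words[a2[0]:a2[1]])))
    return triples


if __name__ == "__main__":
    sentence = [("Hudson", "NNP"), ("was", "VBD"), ("born", "VBN"), ("in", "IN"),
                ("Hampstead", "NNP"), (",", ","), ("which", "WDT"), ("is", "VBZ"),
                ("a", "DT"), ("suburb", "NN"), ("of", "IN"), ("London", "NNP"),
                (".", ".")]
    for triple in extract(sentence):
        print(triple)
    # ('Hudson', 'was born in', 'Hampstead')
    # ('Hampstead', 'is a suburb of', 'London')
```

The usage example shows how merging adjacent matches yields "was born in" rather than "was" alone, and how skipping the relative pronoun "which" makes "Hampstead" the left argument of the second relation. As described by Fader et al., the actual system also applies a lexical constraint (a relation phrase must occur with many distinct argument pairs across a large corpus) and a learned confidence function; both are omitted from this sketch.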

Information

Published In

IJCAI'11: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Volume One
July 2011
704 pages
ISBN: 9781577355137

Sponsors

  • The International Joint Conferences on Artificial Intelligence, Inc. (IJCAI)

Publisher

AAAI Press

Qualifiers

  • Article

Article Metrics

  • Downloads (last 12 months): 0
  • Downloads (last 6 weeks): 0
Reflects downloads up to 04 Oct 2024

Cited By

  • (2020) Creating Hardware Component Knowledge Bases with Training Data Generation and Multi-task Learning. ACM Transactions on Embedded Computing Systems 19:6, pp. 1-26. DOI: 10.1145/3391906. Online publication date: 29-Sep-2020.
  • (2019) Approximate Definitional Constructs as Lightweight Evidence for Detecting Classes Among Wikipedia Articles. Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 2373-2376. DOI: 10.1145/3357384.3358167. Online publication date: 3-Nov-2019.
  • (2019) Automating the generation of hardware component knowledge bases. Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems, pp. 163-176. DOI: 10.1145/3316482.3326344. Online publication date: 23-Jun-2019.
  • (2019) A Novel Unsupervised Approach for Precise Temporal Slot Filling from Incomplete and Noisy Temporal Contexts. The World Wide Web Conference, pp. 3328-3334. DOI: 10.1145/3308558.3313435. Online publication date: 13-May-2019.
  • (2019) Lightweight Lexical and Semantic Evidence for Detecting Classes Among Wikipedia Articles. Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pp. 78-86. DOI: 10.1145/3289600.3291020. Online publication date: 30-Jan-2019.
  • (2019) Conceptual Representations for Computational Concept Creation. ACM Computing Surveys 52:1, pp. 1-33. DOI: 10.1145/3186729. Online publication date: 25-Feb-2019.
  • (2019) Utilizing structured knowledge bases in open IE based event template extraction. Applied Intelligence 49:1, pp. 206-219. DOI: 10.1007/s10489-018-1269-0. Online publication date: 1-Jan-2019.
  • (2019) Predicting hypernym-hyponym relations for Chinese taxonomy learning. Knowledge and Information Systems 58:3, pp. 585-610. DOI: 10.1007/s10115-018-1166-1. Online publication date: 1-Mar-2019.
  • (2018) Enriching a thesaurus as a better question-answering tool and information retrieval aid. Journal of Information Science 44:4, pp. 512-525. DOI: 10.1177/0165551517706219. Online publication date: 1-Aug-2018.
  • (2018) Relation Extraction Using Distant Supervision. ACM Computing Surveys 51:5, pp. 1-35. DOI: 10.1145/3241741. Online publication date: 19-Nov-2018.
