DOI: 10.1145/2505515.2505598
research-article

Assessing sparse information extraction using semantic contexts

Published: 27 October 2013

Abstract

A key assumption in information extraction is that extractions occurring more frequently are more likely to be correct. Sparse information extraction is challenging because, no matter how large a corpus is, some extractions are supported by only a small amount of evidence. REALM, a pioneering system, learns hidden Markov models (HMMs) to model the context of a semantic relationship and uses them to assess extractions. This is costly, and the semantics it reveals for the context are not explicit. In this work, we introduce a lightweight, explicit semantic approach to sparse information extraction. We use a large semantic network consisting of millions of concepts, entities, and attributes to explicitly model the context of semantic relationships. Experiments show that our approach improves the F-score of extraction by at least 11.2% over state-of-the-art HMM-based approaches while also being more efficient.
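The abstract does not spell out the scoring procedure, so the following is only a minimal sketch of the general idea of assessing a rarely seen extraction by the concepts found in its context rather than by its frequency. It is not the authors' method. The toy network, its weights, and the function names (`concept_vector`, `cosine`, `score_extraction`) are all hypothetical, and cosine similarity is swapped in as a simple stand-in for whatever measure the paper uses.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy semantic network: context term -> {concept: weight}.
# A real network would hold millions of concepts, entities, and attributes.
TOY_NETWORK = {
    "mayor": {"official": 1.0},
    "downtown": {"area": 1.0},
    "album": {"music work": 1.0},
    "singer": {"occupation": 1.0},
}


def concept_vector(terms, network):
    """Map context terms to an explicit bag of concepts."""
    vec = Counter()
    for term in terms:
        for concept, weight in network.get(term.lower(), {}).items():
            vec[concept] += weight
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def score_extraction(context_terms, seed_contexts, network):
    """Score a candidate by how closely its context concepts match
    the pooled context concepts of known-correct seed extractions."""
    seed_vec = Counter()
    for ctx in seed_contexts:
        seed_vec.update(concept_vector(ctx, network))
    return cosine(concept_vector(context_terms, network), seed_vec)


# Assumed seed contexts for a "City" relation (illustrative only).
city_seeds = [["mayor", "downtown"], ["mayor"]]
plausible = score_extraction(["downtown", "mayor"], city_seeds, TOY_NETWORK)
implausible = score_extraction(["album", "singer"], city_seeds, TOY_NETWORK)
```

In this sketch a candidate seen only once can still score highly when its context maps to the same concepts as the seed contexts, which is the intuition behind replacing frequency with explicit semantic evidence.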

Supplemental Material

ZIP File
All figures involved in the source file of CIKM342-Li.tex.


Cited By

  • (2021) A survey on automatically constructed universal knowledge bases. Journal of Information Science 47:5 (551-574). DOI: 10.1177/0165551520921342. Online publication date: 1-Oct-2021
  • (2018) Enriching a thesaurus as a better question-answering tool and information retrieval aid. Journal of Information Science 44:4 (512-525). DOI: 10.1177/0165551517706219. Online publication date: 1-Aug-2018


Published In

CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management
October 2013
2612 pages
ISBN: 9781450322638
DOI: 10.1145/2505515

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. semantic context
  2. semantic network
  3. semantic relationship
  4. sparse information extraction

Qualifiers

  • Research-article

Conference

CIKM'13: 22nd ACM International Conference on Information and Knowledge Management
October 27 - November 1, 2013
San Francisco, California, USA

Acceptance Rates

CIKM '13 Paper Acceptance Rate 143 of 848 submissions, 17%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%


