Assessing sparse information extraction using semantic contexts

Authors:

Xindong Wu

CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management

Pages 1709 - 1714

https://doi.org/10.1145/2505515.2505598

Published: 27 October 2013

Abstract

One important assumption in information extraction is that extractions occurring more frequently are more likely to be correct. Sparse information extraction is challenging because, no matter how large a corpus is, some extractions are supported by only a small amount of evidence in it. A pioneering work known as REALM learns HMMs to model the context of a semantic relationship in order to assess such extractions. This is costly, and the semantics it reveals about the context are not explicit. In this work, we introduce a lightweight, explicit semantic approach for sparse information extraction. We use a large semantic network consisting of millions of concepts, entities, and attributes to explicitly model the context of semantic relationships. Experiments show that our approach improves the F-score of extraction by at least 11.2% over state-of-the-art HMM-based approaches while also being more efficient.

Cited By

Hossain B, Salam A, Schwitter R (2021). A survey on automatically constructed universal knowledge bases. Journal of Information Science 47:5 (551-574). DOI: 10.1177/0165551520921342. Online publication date: 1-Oct-2021.
https://dl.acm.org/doi/10.1177/0165551520921342
Wu Y (2018). Enriching a thesaurus as a better question-answering tool and information retrieval aid. Journal of Information Science 44:4 (512-525). DOI: 10.1177/0165551517706219. Online publication date: 1-Aug-2018.
https://dl.acm.org/doi/10.1177/0165551517706219

Published In

CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management

October 2013

2612 pages

ISBN:9781450322638

DOI:10.1145/2505515

General Chairs: Qi He (LinkedIn, USA), Arun Iyengar (IBM T.J. Watson Research Center, USA)

Program Chairs: Wolfgang Nejdl (L3S Research Center, Germany), Jian Pei (Simon Fraser University, Canada), Rajeev Rastogi (Amazon, India)

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

CIKM'13

CIKM'13: 22nd ACM International Conference on Information and Knowledge Management

October 27 - November 1, 2013

San Francisco, California, USA

Acceptance Rates

CIKM '13 Paper Acceptance Rate 143 of 848 submissions, 17%;

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
