DOI: 10.1145/1526709.1526724
research-article

StatSnowball: a statistical approach to extracting entity relationships

Published: 20 April 2009

Abstract

Traditional relation extraction methods require pre-specified relations and relation-specific human-tagged examples. Bootstrapping systems significantly reduce the number of training examples needed, but they usually apply heuristic-based methods to combine a set of strict hard rules, which limits their ability to generalize and thus yields low recall. Furthermore, existing bootstrapping methods do not perform open information extraction (Open IE), which can identify various types of relations without requiring pre-specification. In this paper, we propose a statistical extraction framework called Statistical Snowball (StatSnowball), a bootstrapping system that can perform both traditional relation extraction and Open IE.
StatSnowball uses discriminative Markov logic networks (MLNs) and softens hard rules by learning their weights in a maximum likelihood estimate sense. An MLN is a general model and can be configured to perform different levels of relation extraction. In StatSnowball, pattern selection is performed by solving an l1-norm penalized maximum likelihood estimation problem, which enjoys well-founded theory and efficient solvers. We extensively evaluate the performance of StatSnowball in different configurations on both a small but fully labeled data set and large-scale Web data. Empirical results show that, with a small number of seeds, StatSnowball achieves significantly higher recall without sacrificing high precision across iterations, and that the joint inference of the MLN can further improve performance. Finally, StatSnowball is efficient, and we have developed a working entity relation search engine called Renlifang based on it.
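
To illustrate how an l1-norm penalty can act as a pattern selector, the following is a minimal sketch on a toy problem. It is not the paper's MLN model or solver: it substitutes a plain logistic regression trained by proximal gradient descent (soft-thresholding), and the function names, the toy data, and the values of `lam`, `lr`, and `iters` are all assumptions made for illustration. Each feature stands for a hypothetical extraction pattern; the penalty drives the weight of an uninformative pattern to exactly zero, which is the selection effect.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(value, threshold):
    """Proximal operator of the l1 norm: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(X, y, lam, lr=0.5, iters=2000):
    """Minimize mean logistic loss + lam * ||w||_1 by proximal gradient.

    X: list of binary feature rows (one column per hypothetical pattern).
    y: list of 0/1 labels. The bias term b is left unpenalized.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(iters):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err / n
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj / n
        # Gradient step on the smooth loss, then soft-threshold each weight.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Toy data (illustrative only): pattern 0 fires exactly on the positive
# examples, pattern 1 fires equally often on positives and negatives.
X = [[1, 1], [1, 0], [1, 1], [1, 0], [0, 1], [0, 0], [0, 1], [0, 0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = fit_l1_logistic(X, y, lam=0.1)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print("weights:", weights, "selected patterns:", selected)
```

In this sketch a larger `lam` prunes more aggressively; with a sufficiently large penalty every weight stays at zero and no pattern is kept.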

    Information

    Published In

    WWW '09: Proceedings of the 18th international conference on World wide web
    April 2009
    1280 pages
    ISBN:9781605584874
    DOI:10.1145/1526709

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Markov logic networks
    2. relationship extraction
    3. statistical models

    Qualifiers

    • Research-article

    Conference

    WWW '09

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Cited By

    • (2024) RSRNeT: a novel multi-modal network framework for named entity recognition and relation extraction. PeerJ Computer Science, 10(e1856). DOI: 10.7717/peerj-cs.1856. Online publication date: 9-Feb-2024
    • (2024) Verifiable Strong Privacy-Preserving Any-Hop Reachability Query on Blockchain-Assisted Cloud. IEEE Internet of Things Journal, 11:24(39637-39650). DOI: 10.1109/JIOT.2024.3445431. Online publication date: 15-Dec-2024
    • (2024) Extraction of object-action and object-state associations from Knowledge Graphs. Journal of Web Semantics, 81(100816). DOI: 10.1016/j.websem.2024.100816. Online publication date: Jul-2024
    • (2024) Cnosso, a novel method for business document automation based on open information extraction. Expert Systems with Applications: An International Journal, 245:C. DOI: 10.1016/j.eswa.2023.123038. Online publication date: 2-Jul-2024
    • (2024) OIE4PA: open information extraction for the public administration. Journal of Intelligent Information Systems, 62:1(273-294). DOI: 10.1007/s10844-023-00814-z. Online publication date: 1-Feb-2024
    • (2023) Entity Relationship Extraction Based on a Multi-Neural Network Cooperation Model. Applied Sciences, 13:11(6812). DOI: 10.3390/app13116812. Online publication date: 3-Jun-2023
    • (2023) A Comprehensive Survey on Automatic Knowledge Graph Construction. ACM Computing Surveys, 56:4(1-62). DOI: 10.1145/3618295. Online publication date: 5-Sep-2023
    • (2023) Application of DA-Bi-SRU and Improved RoBERTa Model in Entity Relationship Extraction for High-Speed Train Bogie. 2023 6th International Conference on Data Science and Information Technology (DSIT), (89-96). DOI: 10.1109/DSIT60026.2023.00023. Online publication date: 28-Jul-2023
    • (2023) Answering reachability queries with ordered label constraints over labeled graphs. Frontiers of Computer Science, 18:1. DOI: 10.1007/s11704-022-2368-y. Online publication date: 12-Aug-2023
    • (2023) A system review on bootstrapping information extraction. Multimedia Tools and Applications, 83:13(38329-38353). DOI: 10.1007/s11042-023-17005-1. Online publication date: 5-Oct-2023
