research-article

Coupled semi-supervised learning for information extraction

Authors: Andrew Carlson, Justin Betteridge, Richard C. Wang, Estevam R. Hruschka, Jr., Tom M. Mitchell

WSDM '10: Proceedings of the third ACM international conference on Web search and data mining

Pages 101 - 110

https://doi.org/10.1145/1718487.1718501

Published: 04 February 2010

Abstract

We consider the problem of semi-supervised learning to extract categories (e.g., academic fields, athletes) and relations (e.g., PlaysSport(athlete, sport)) from web pages, starting with a handful of labeled training examples of each category or relation, plus hundreds of millions of unlabeled web documents. Semi-supervised training using only a few labeled examples is typically unreliable because the learning task is underconstrained. This paper pursues the thesis that much greater accuracy can be achieved by further constraining the learning task, by coupling the semi-supervised training of many extractors for different categories and relations. We characterize several ways in which the training of category and relation extractors can be coupled, and present experimental results demonstrating significantly improved accuracy as a result.
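To make the idea of coupling concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not spell out the coupling constraints, so the sketch assumes two illustrative ones: mutual exclusion between categories (an instance may not be promoted to two categories declared exclusive) and argument type-checking for relations (a PlaysSport candidate is kept only if its arguments are already believed to be an athlete and a sport). The function names `promote_coupled` and `type_checked`, the scores, and the threshold are all hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple


def promote_coupled(
    candidates: Dict[str, Dict[str, float]],
    known: Dict[str, Set[str]],
    mutex: Set[FrozenSet[str]],
    threshold: float = 0.5,
) -> Dict[str, Set[str]]:
    """Run one promotion round under an assumed mutual-exclusion constraint.

    A candidate joins a category only if its score reaches the threshold and
    no mutually exclusive category already contains it or scores it at least
    as high.
    """
    promoted = {cat: set(items) for cat, items in known.items()}
    for cat, scored in candidates.items():
        # Categories declared mutually exclusive with `cat`.
        rivals = [
            other
            for pair in mutex
            if cat in pair
            for other in pair
            if other != cat
        ]
        for item, score in scored.items():
            if score < threshold:
                continue
            conflict = any(
                item in known.get(rival, set())
                or candidates.get(rival, {}).get(item, 0.0) >= score
                for rival in rivals
            )
            if not conflict:
                promoted.setdefault(cat, set()).add(item)
    return promoted


def type_checked(
    pairs: Set[Tuple[str, str]],
    known: Dict[str, Set[str]],
    arg_types: Tuple[str, str],
) -> Set[Tuple[str, str]]:
    """Keep relation candidates whose arguments belong to the required categories."""
    first_type, second_type = arg_types
    return {
        (a, b)
        for a, b in pairs
        if a in known.get(first_type, set()) and b in known.get(second_type, set())
    }


# Toy data (hypothetical): seed instances plus scored candidates per category.
known = {"athlete": {"Tiger Woods"}, "sport": {"golf"}}
candidates = {
    "athlete": {"Kobe Bryant": 0.9, "basketball": 0.6},
    "sport": {"basketball": 0.8, "Kobe Bryant": 0.2},
}
mutex = {frozenset({"athlete", "sport"})}

beliefs = promote_coupled(candidates, known, mutex)
plays_sport = type_checked(
    {("Kobe Bryant", "basketball"), ("basketball", "golf")},
    beliefs,
    ("athlete", "sport"),
)
print({cat: sorted(items) for cat, items in beliefs.items()})
print(sorted(plays_sport))
```

In this toy run, "basketball" clears the threshold as an athlete candidate but is rejected because the sport extractor scores it higher. An uncoupled extractor would have accepted it, which is the kind of error the additional constraints are meant to prevent.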

Index Terms

Coupled semi-supervised learning for information extraction
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
  2. Machine learning

Published In

WSDM '10: Proceedings of the third ACM international conference on Web search and data mining

February 2010

468 pages

ISBN: 9781605588896

DOI: 10.1145/1718487

General Chairs: Brian D. Davison (Lehigh University, USA), Torsten Suel (Polytechnic Institute of NYU, USA)

Program Chairs: Nick Craswell (Microsoft, USA), Bing Liu (University of Illinois, Chicago, USA)

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

WSDM'10

WSDM'10: Third ACM International Conference on Web Search and Data Mining

February 4 - 6, 2010

New York, New York, USA

Acceptance Rates

Overall Acceptance Rate 498 of 2,863 submissions, 17%

Article Metrics

Total Citations: 237

Total Downloads: 2,010

Downloads (Last 12 months): 33

Downloads (Last 6 weeks): 7

Reflects downloads up to 04 Oct 2024
