StuffIE: Semantic Tagging of Unlabeled Facets Using Fine-Grained Information Extraction

Authors: Radityo Eko Prasojo, Werner Nutt

CIKM '18: Proceedings of the 27th ACM International Conference on Information and Knowledge Management

Pages 467 - 476

https://doi.org/10.1145/3269206.3271812

Published: 17 October 2018

Abstract

Recent knowledge extraction methods are moving towards ternary and higher-arity relations to capture more information about binary facts. For example, such a relation can include the time, the location, and the duration of a specific fact. These relations can be even more complex to extract in advanced domains such as news, where events typically come with different facets, including reasons, consequences, purposes, involved parties, and related events. The main challenge is twofold: first, finding the set of facets related to each fact, and second, tagging each facet with the relevant category.
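To make the notion of a higher-arity relation concrete, the following is a minimal sketch of a binary fact extended with tagged facets. The class names `Fact` and `Facet`, the `arity` helper, and the example sentence are illustrative assumptions, not part of the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Facet:
    """One extra argument of a fact, e.g. its time, location, or reason."""
    category: str  # hypothetical tag name such as "time" or "reason"
    value: str


@dataclass
class Fact:
    """A binary fact (subject, predicate, object) plus optional facets."""
    subject: str
    predicate: str
    obj: str
    facets: List[Facet] = field(default_factory=list)

    def arity(self) -> int:
        # Subject and object count as two arguments; each facet adds one.
        return 2 + len(self.facets)


# Invented example sentence:
# "The committee approved the budget on Monday in Rome
#  because revenue had recovered."
fact = Fact("the committee", "approved", "the budget")
fact.facets.append(Facet("time", "on Monday"))
fact.facets.append(Facet("location", "in Rome"))
fact.facets.append(Facet("reason", "because revenue had recovered"))
```

Here the plain binary fact has arity 2, and each facet raises the arity by one, so the example above is a 5-ary relation.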

In this paper, we tackle the above problems by proposing StuffIE, a facet-centric approach to fine-grained information extraction. We use Stanford dependency parsing, enhanced by lexical databases such as WordNet, to extract nested triple relations. We then exploit the syntactic dependencies to tag facets semantically, using distant learning based on the Oxford dictionary. We have evaluated the accuracy of the extracted facets and their semantic tags on the DUC'04 dataset. The results show that our approach achieves high accuracy and coverage compared with ClausIE, OLLIE, SEMAFOR SRL, and Illinois SRL.
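The two ideas in this paragraph, nested triples and semantic tags on facets, can be illustrated with a small sketch. This is not the StuffIE implementation: the `Triple` class, the `depth` helper, and the `CONNECTOR_TAGS` table are hypothetical, and the hand-written table merely stands in for the dictionary-based distant learning described above.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Triple:
    """A triple whose subject or object may itself be a triple (nesting)."""
    subject: Union[str, "Triple"]
    predicate: str
    obj: Union[str, "Triple"]


def depth(node: Union[str, Triple]) -> int:
    """Nesting depth: 0 for a plain string, 1 for a flat triple, and so on."""
    if isinstance(node, str):
        return 0
    return 1 + max(depth(node.subject), depth(node.obj))


# Hypothetical connector-to-tag table (an assumption for illustration only).
CONNECTOR_TAGS = {
    "because": "reason",
    "in order to": "purpose",
    "during": "time",
}


def tag_facet(connector: str) -> str:
    """Return the tag for a facet connector, or "unlabeled" if unknown."""
    return CONNECTOR_TAGS.get(connector.lower(), "unlabeled")


# Invented example: "Analysts said the bank raised interest rates."
inner = Triple("the bank", "raised", "interest rates")
outer = Triple("analysts", "said", inner)
```

A fixed table like this cannot disambiguate connectors that take several meanings (for instance a preposition that may signal either a time or a place), which is the kind of case a learned tagger is meant to handle.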

Index Terms

StuffIE: Semantic Tagging of Unlabeled Facets Using Fine-Grained Information Extraction
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Information extraction
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning

Published In

CIKM '18: Proceedings of the 27th ACM International Conference on Information and Knowledge Management

October 2018

2362 pages

ISBN: 9781450360142

DOI: 10.1145/3269206

General Chair:
Alfredo Cuzzocrea, University of Trieste, Italy

Program Chairs:
James Allan, University of Massachusetts, USA
Norman Paton, University of Manchester, United Kingdom
Divesh Srivastava, AT&T Labs Research, USA
Rakesh Agrawal, Data Insights Lab, USA
Andrei Broder, Google Research, USA
Mohammed Zaki, Rensselaer Polytechnic Institute, USA
Selcuk Candan, Arizona State University, USA
Alexandros Labrinidis, University of Pittsburgh, USA
Assaf Schuster, Technion, Israel
Haixun Wang, Google Research, USA

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Free University of Bozen-Bolzano

Conference

CIKM '18

CIKM '18: The 27th ACM International Conference on Information and Knowledge Management

October 22 - 26, 2018

Torino, Italy

Acceptance Rates

CIKM '18 Paper Acceptance Rate 147 of 826 submissions, 18%;

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics & Citations

Total Citations: 4
Total Downloads: 365
Downloads (Last 12 months): 7
Downloads (Last 6 weeks): 2

Reflects downloads up to 17 Nov 2024

Cited By

Nguyen T, Razniewski S, Romero J, Weikum G (2022). Refined Commonsense Knowledge from Large-Scale Web Contents. IEEE Transactions on Knowledge and Data Engineering, 1-16. Online publication date: 2022.
https://doi.org/10.1109/TKDE.2022.3206505

Oral B, Eryiğit G (2022). Fusion of visual representations for multimodal information extraction from unstructured transactional documents. International Journal on Document Analysis and Recognition (IJDAR) 25:3, 187-205. Online publication date: 22-Apr-2022.
https://doi.org/10.1007/s10032-022-00399-3

Wang J, Zheng X, Yang Q, Qu J, Xu J, Chen Z, Li Z (2021). Towards Nested and Fine-Grained Open Information Extraction. Knowledge Graph and Semantic Computing: Knowledge Graph Empowers New Infrastructure Construction, 185-197. Online publication date: 28-Oct-2021.
https://doi.org/10.1007/978-981-16-6471-7_14

Gottschalk S, Demidova E, Kejriwal M, Lopez V, Sequeda J (2019). EventKG – the hub of event knowledge on the web – and biographical timeline generation. Semantic Web 10:6, 1039-1070. Online publication date: 1-Jan-2019.
https://dl.acm.org/doi/10.3233/SW-190355
