DOI: 10.1145/3269206.3271812
research-article

StuffIE: Semantic Tagging of Unlabeled Facets Using Fine-Grained Information Extraction

Published: 17 October 2018

Abstract

Recent knowledge extraction methods are moving towards ternary and higher-arity relations in order to capture more information about binary facts, for example the time, location, and duration of a specific fact. Such relations are even harder to extract in complex domains such as news, where events typically come with several facets, including reasons, consequences, purposes, involved parties, and related events. The main challenge is twofold: first, finding the set of facets related to each fact; second, tagging each facet with the relevant category.
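To make the distinction between a binary fact and its facets concrete, the following minimal Python sketch shows one possible representation: a (subject, predicate, object) triple extended with a list of facets, each of which may still be unlabeled. The names `Fact` and `Facet`, the category strings, and the example sentence are hypothetical illustrations, not StuffIE's actual data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


# Hypothetical representation for illustration only; not StuffIE's data model.
@dataclass
class Facet:
    """A phrase attached to a fact, e.g. a time, a location, or a reason."""
    text: str
    category: Optional[str] = None  # None means the facet is still unlabeled


@dataclass
class Fact:
    """A binary (subject, predicate, object) fact plus any number of facets."""
    subject: str
    predicate: str
    obj: str
    facets: List[Facet] = field(default_factory=list)

    def arity(self) -> int:
        # Subject and object are two arguments; each facet adds one more.
        return 2 + len(self.facets)

    def as_tuple(self) -> Tuple[str, ...]:
        # Flatten into one higher-arity tuple: predicate first, then arguments.
        return (self.predicate, self.subject, self.obj) + tuple(f.text for f in self.facets)


# "The company closed its plant in Ohio in 2009 because of falling demand."
fact = Fact(
    "the company", "closed", "its plant",
    [Facet("in Ohio", "LOCATION"), Facet("in 2009", "TIME"), Facet("because of falling demand")],
)
print(fact.arity())  # 5
print(fact.as_tuple())
```

Flattening the facets yields a single higher-arity relation; the facet whose `category` is still `None` corresponds to the second challenge named above, tagging it with the relevant category.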
In this paper, we tackle these problems by proposing StuffIE, a facet-centric, fine-grained information extraction approach. We exploit Stanford dependency parsing, enhanced by lexical databases such as WordNet, to extract nested triple relations. We then exploit the syntactic dependencies to semantically tag facets using distant learning based on the Oxford dictionary. We evaluated the accuracy of the extracted facets and their semantic tags on the DUC'04 dataset. The results show that our approach achieves high accuracy and coverage compared with ClausIE, OLLIE, SEMAFOR SRL, and Illinois SRL.
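The idea of tagging a facet from its syntactic attachment can be illustrated with a deliberately simplified Python sketch. It assumes the facets have already been read off a dependency parse as hypothetical `(marker, head)` pairs (the preposition or conjunction linking the facet to the verb, plus the facet's head word), and it replaces the paper's distant learning from the Oxford dictionary with a small hand-written cue lexicon, `CUE_LEXICON`, which is also hypothetical. The categories and the year heuristic for the ambiguous preposition "in" are illustrative assumptions, not StuffIE's rules.

```python
from typing import Dict, List, Tuple

# Hypothetical cue lexicon (marker word -> facet category). This is a
# hand-written stand-in for dictionary-derived knowledge learned via distant
# supervision; it is NOT the resource used in the paper.
CUE_LEXICON: Dict[str, str] = {
    "because": "REASON",
    "to": "PURPOSE",
    "after": "TIME",
    "before": "TIME",
    "during": "TIME",
    "with": "PARTY",
}


def tag_facet(marker: str, head: str) -> str:
    """Assign a category to one facet from its marker word and head word."""
    marker = marker.lower()
    if marker == "in":
        # "in" is ambiguous: treat a four-digit number as a year, else a place.
        return "TIME" if head.isdigit() and len(head) == 4 else "LOCATION"
    # Markers missing from the lexicon stay untagged.
    return CUE_LEXICON.get(marker, "UNKNOWN")


def tag_facets(facets: List[Tuple[str, str]]) -> List[str]:
    """Tag every (marker, head) pair of a fact, preserving order."""
    return [tag_facet(marker, head) for marker, head in facets]


# Pairs as they might be read off a parse of
# "The union called a strike in Berlin in 2019 because of pay cuts."
print(tag_facets([("in", "Berlin"), ("in", "2019"), ("because", "cuts")]))
# ['LOCATION', 'TIME', 'REASON']
```

A fixed lexicon like this cannot cope with unseen markers or with ambiguity beyond the single special case shown, which is presumably why the paper learns the mapping from a dictionary instead of hard-coding it.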

References

[1] Gabor Angeli, Melvin Jose Johnson Premkumar, and Christopher D. Manning. 2015. Leveraging Linguistic Structure For Open Domain Information Extraction. In Proceedings of the 53rd ACL and the 7th IJCNLP (Volume 1: Long Papers). ACL, Beijing, China, 344--354.
[2] Collin F. Baker, Charles J. Fillmore, and John B. Lowe. 1998. The Berkeley FrameNet Project. In Proceedings of the 36th ACL and 17th ICCL - Volume 1. Association for Computational Linguistics, 86--90.
[3] Soumia Lilia Berrahou, Patrice Buche, Juliette Dibie, and Mathieu Roche. 2016. Xart System: Discovering and Extracting Correlated Arguments of N-ary Relations from Text. In Proceedings of the 6th WIMS (WIMS '16). ACM, New York.
[4] Nikita Bhutani, H. V. Jagadish, and Dragomir R. Radev. 2016. Nested Propositions in Open Information Extraction. In EMNLP. The Association for Computational Linguistics, 55--64.
[5] Janara Christensen, Mausam, Stephen Soderland, and Oren Etzioni. 2010. Semantic Role Labeling for Open Information Extraction. In Proceedings of the NAACL HLT 2010 FAM-LbR (FAM-LbR '10). Association for Computational Linguistics, Stroudsburg, PA, USA, 52--60.
[6] James Clarke, Vivek Srikumar, Mark Sammons, and Dan Roth. 2012. An NLP Curator (or: How I Learned to Stop Worrying and Love NLP Pipelines). In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12). Istanbul, Turkey.
[7] Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural Language Processing (Almost) from Scratch. Journal of Machine Learning Research, Vol. 12, Aug (2011), 2493--2537.
[8] S. P. Corder. 1968. Double-Object Verbs in English.
[9] Luciano Del Corro and Rainer Gemulla. 2013. ClausIE: Clause-Based Open Information Extraction. In Proceedings of the 22nd International Conference on World Wide Web. ACM, 355--366.
[10] George Doddington, Alexis Mitchell, Mark Przybocki, Lance Ramshaw, Stephanie Strassel, and Ralph Weischedel. 2004. The Automatic Content Extraction (ACE) Program Tasks, Data, and Evaluation. In Proceedings of LREC-2004. ELRA, Lisbon, Portugal.
[11] Patrick Ernst, Amy Siu, and Gerhard Weikum. 2018. HighLife: Higher-arity Fact Harvesting. In Proceedings of the 2018 WWW (WWW '18). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, 1013--1022.
[12] Kiril Gashteovski, Rainer Gemulla, and Luciano Del Corro. 2017. MinIE: Minimizing Facts in Open Information Extraction. In EMNLP. Association for Computational Linguistics, 2630--2640.
[13] Daniel Gildea and Daniel Jurafsky. 2002. Automatic Labeling of Semantic Roles. Comput. Linguist., Vol. 28, 3 (Sept. 2002), 245--288.
[14] Paul Kingsbury and Martha Palmer. 2002. From TreeBank to PropBank. In LREC. 1989--1993.
[15] Sebastian Krause, Hong Li, Hans Uszkoreit, and Feiyu Xu. 2012. Large-Scale Learning of Relation-extraction Rules with Distant Supervision from the Web. In Proceedings of the 11th ISWC (ISWC'12). Springer-Verlag, Berlin, 263--278.
[16] Meghana Kshirsagar, Sam Thomson, Nathan Schneider, Jaime Carbonell, Noah A. Smith, and Chris Dyer. 2015. Frame-Semantic Role Labeling with Heterogeneous Annotations.
[17] Erdal Kuzey, Jilles Vreeken, and Gerhard Weikum. 2014. A Fresh Look on Knowledge Bases: Distilling Named Events from News. In Proceedings of the 23rd CIKM. 1689--1698.
[18] Hong Li, Sebastian Krause, Feiyu Xu, Andrea Moro, Hans Uszkoreit, and Roberto Navigli. 2015. Improvement of n-ary Relation Extraction by Adding Lexical Semantics to Distant-Supervision Rule Learning. In ICAART 2015 - Proceedings of the International Conference on Agents and Artificial Intelligence.
[19] Filipe Mesquita, Jordan Schmidek, and Denilson Barbosa. 2013. Effectiveness and Efficiency of Open Relation Extraction. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 447--457.
[20] Martha Palmer, Daniel Gildea, and Nianwen Xue. 2010. Semantic Role Labeling. Morgan & Claypool Publishers.
[21] Nanyun Peng, Hoifung Poon, Chris Quirk, Kristina Toutanova, and Wen-tau Yih. 2017. Cross-Sentence N-ary Relation Extraction with Graph LSTMs. TACL, Vol. 5 (2017), 101--115.
[22] Vasin Punyakanok, Dan Roth, and Wen-tau Yih. 2008. The Importance of Syntactic Parsing and Inference in Semantic Role Labeling. Comput. Linguist., Vol. 34, 2 (June 2008), 257--287.
[23] Sanjeev Arora, Yingyu Liang, and Tengyu Ma. 2017. A Simple but Tough-to-Beat Baseline for Sentence Embeddings. In ICLR.
[24] Michael Schmitz, Robert Bart, Stephen Soderland, Oren Etzioni, et al. 2012. Open Language Learning for Information Extraction. In Proceedings of the 2012 EMNLP. Association for Computational Linguistics, 523--534.
[25] Dafna Shahaf and Carlos Guestrin. 2012. Connecting Two (or Less) Dots: Discovering Structure in News Articles. TKDD, Vol. 5, 4 (2012), 24:1--24:31.
[26] Vivek Srikumar and Dan Roth. 2013. Modeling Semantic Relations Expressed by Prepositions. Transactions of the Association for Computational Linguistics, Vol. 1 (2013), 231--242.
[27] Zhibiao Wu and Martha Palmer. 1994. Verbs Semantics and Lexical Selection. In Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics (ACL '94). Association for Computational Linguistics, Stroudsburg, PA, USA, 133--138.

Cited By

  • (2022) Refined Commonsense Knowledge from Large-Scale Web Contents. IEEE Transactions on Knowledge and Data Engineering, 1-16. DOI: 10.1109/TKDE.2022.3206505. Online publication date: 2022.
  • (2022) Fusion of visual representations for multimodal information extraction from unstructured transactional documents. International Journal on Document Analysis and Recognition (IJDAR) 25, 3, 187-205. DOI: 10.1007/s10032-022-00399-3. Online publication date: 22-Apr-2022.
  • (2021) Towards Nested and Fine-Grained Open Information Extraction. Knowledge Graph and Semantic Computing: Knowledge Graph Empowers New Infrastructure Construction, 185-197. DOI: 10.1007/978-981-16-6471-7_14. Online publication date: 28-Oct-2021.
  • (2019) EventKG – the hub of event knowledge on the web – and biographical timeline generation. Semantic Web 10, 6, 1039-1070. DOI: 10.3233/SW-190355. Online publication date: 1-Jan-2019.


      Published In

      CIKM '18: Proceedings of the 27th ACM International Conference on Information and Knowledge Management
      October 2018
      2362 pages
      ISBN:9781450360142
      DOI:10.1145/3269206
      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. distant learning
      2. facet extraction
      3. semantic labeling

      Funding Sources

      • Free University of Bozen-Bolzano

      Conference

      CIKM '18
      Acceptance Rates

      CIKM '18 paper acceptance rate: 147 of 826 submissions, 18%
      Overall acceptance rate: 1,861 of 8,427 submissions, 22%

      Article Metrics

      • Downloads (last 12 months): 7
      • Downloads (last 6 weeks): 2
      Reflects downloads up to 17 Nov 2024
