DOI: 10.1145/3549737.3549749
research-article

FarFetched: Entity-centric Reasoning and Claim Validation for the Greek Language based on Textually Represented Environments

Published: 09 September 2022

Abstract

The abundance of online information narrows our collective attention span. We address the need for automated claim validation, based on the incorporated evidence from dynamic, textually represented environments created by online news sources. We present FarFetched, an entity-centric reasoning framework based on news, where latent connections between events, actions or statements are discovered via their identified entity mentions and are represented in a structured form with the help of a graph database. We propose an evidence construction approach that combines relevant extracts of the stored information from various online sources in order to support or refute a given claim in free text, relying on entity linking and semantic similarity. We then leverage textual entailment recognition to provide a measurable way for assessing whether this claim is plausible based on the constructed evidence. Our approach tries to fill the gap in automated claim validation for less-resourced languages and is showcased on the Greek language, complemented by the training of relevant semantic textual similarity (STS) and natural language inference (NLI) models that are evaluated on translated versions of common benchmarks.
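
The evidence construction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract's STS model is replaced here by a plain bag-of-words cosine similarity, the graph database by an in-memory list, and entity linking by hand-assigned entity sets. All names (`build_evidence`, `STORE`, `top_k`) and the example sentences are hypothetical. The textual entailment step is omitted because it requires a trained NLI model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_evidence(claim, claim_entities, sentences, top_k=2):
    """Select stored sentences that share an entity with the claim,
    then keep the top_k most similar to the claim.

    sentences: list of (text, set_of_entity_names) pairs. This stands in
    for the stored, entity-linked news extracts (an assumption).
    """
    claim_vec = Counter(tokenize(claim))
    # Entity filter: only sentences mentioning at least one claim entity.
    candidates = [text for text, ents in sentences if ents & claim_entities]
    # Similarity ranking: a toy substitute for a trained STS model.
    ranked = sorted(
        candidates,
        key=lambda s: cosine(claim_vec, Counter(tokenize(s))),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical stored sentences with hand-assigned entity mentions.
STORE = [
    ("The minister announced new funding for public hospitals.",
     {"minister", "hospitals"}),
    ("The minister visited a school in Athens.", {"minister", "Athens"}),
    ("Heavy rain caused flooding in Thessaloniki.", {"Thessaloniki"}),
]

evidence = build_evidence(
    "The minister announced funding for hospitals.",
    {"minister"},
    STORE,
    top_k=1,
)
print(evidence)
```

In a fuller pipeline, the selected sentences would be concatenated into a premise and passed together with the claim to an NLI model, whose entailment score would serve as the plausibility measure.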

Cited By

  • (2024) Social Media Topic Classification on Greek Reddit. Information 15:9 (521). DOI: 10.3390/info15090521. Online publication date: 26-Aug-2024
  • (2024) GRAAL: Graph-Based Retrieval for Collecting Related Passages across Multiple Documents. Information 15:6 (318). DOI: 10.3390/info15060318. Online publication date: 29-May-2024

    Published In

    SETN '22: Proceedings of the 12th Hellenic Conference on Artificial Intelligence
    September 2022
    450 pages
    ISBN:9781450395977
    DOI:10.1145/3549737
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. claim validation
    2. entity linking
    3. low-resource languages
    4. natural language inference
    5. semantic textual similarity

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • Hellenic Foundation for Research and Innovation (HFRI)

    Conference

    SETN 2022

    Article Metrics

    • Downloads (Last 12 months): 22
    • Downloads (Last 6 weeks): 1
    Reflects downloads up to 16 Nov 2024
