research-article

Combining Similarity Features and Deep Representation Learning for Stance Detection in the Context of Checking Fake News

Published: 26 June 2019

Abstract

Fake news is currently an issue of pressing concern, given its recent rise as a potential threat to high-quality journalism and well-informed public discourse. The Fake News Challenge (FNC-1) was organized in early 2017 to encourage the development of machine-learning-based classification systems for stance detection (i.e., for identifying whether a particular news article agrees with, disagrees with, discusses, or is unrelated to a particular news headline), thus helping in the detection and analysis of possible instances of fake news. This article presents a novel approach to this stance detection problem, based on the combination of string similarity features with a deep neural network architecture that leverages ideas previously advanced in the context of learning efficient text representations, document classification, and natural language inference. Specifically, we use bi-directional Recurrent Neural Networks (RNNs), together with max-pooling over the temporal (sequential) dimension and neural attention, to represent (i) the headline, (ii) the first two sentences of the news article, and (iii) the entire news article. These representations are then combined and compared, complemented with similarity features inspired by other FNC-1 approaches, and passed to a final layer that predicts the stance of the article toward the headline. We also explore the use of external sources of information, specifically, large datasets of sentence pairs originally proposed for training and evaluating natural language inference methods, to pre-train specific components of the neural network architecture (e.g., the RNNs used for encoding sentences). The obtained results attest to the effectiveness of the proposed ideas and show that our model, particularly when pre-training is used and neural representations are combined with similarity features, slightly outperforms the previous state of the art.
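The abstract mentions string similarity features without naming them. As one illustration, the sketch below implements the commonly used Jaro-Winkler similarity, which builds on Winkler's string comparator [51] from the reference list, with the usual prefix scale p = 0.1 and a maximum prefix length of 4. Which similarity measures the article actually computes, and between which strings (for example, the headline and the article's opening sentences), is an assumption here; the function names are illustrative only.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1] between two strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Two characters "match" if they are equal and no farther apart than this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order (counted, then halved).
    transpositions = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                transpositions += 1
            k += 1
    t = transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix of up to `max_prefix` characters."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1.0 - sim)
```

For example, `jaro_winkler("MARTHA", "MARHTA")` returns about 0.961. Scores like this, computed for several headline/body string pairs, could be appended to the classifier's input as extra features.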

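The representation and comparison steps described in the abstract can be sketched in plain Python. This is a minimal sketch under several assumptions: the per-token hidden states of a bi-directional RNN are taken as given rather than computed; attention uses a single learned context vector scored by a dot product (a simplified form of the attention in Yang et al. [52]); two sequence vectors are compared with the [u; v; |u − v|; u ⊙ v] heuristic of Conneau et al. [12]; and the final layer is a linear softmax over the four FNC-1 stances. The parameters `context`, `weights`, and `bias` are hypothetical, not the article's, and a real model would learn them with a deep learning framework.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]
STANCES = ("agree", "disagree", "discuss", "unrelated")


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def max_pool(states: Sequence[Vector]) -> Vector:
    """Element-wise maximum over the time (sequence) dimension."""
    return [max(column) for column in zip(*states)]


def attention_pool(states: Sequence[Vector], context: Vector) -> Vector:
    """Weighted average of the states; weights = softmax of state/context dot products."""
    scores = [sum(h_d * c_d for h_d, c_d in zip(h, context)) for h in states]
    weights = softmax(scores)
    return [sum(w * h[d] for w, h in zip(weights, states)) for d in range(len(states[0]))]


def encode(states: Sequence[Vector], context: Vector) -> Vector:
    """Sequence vector = [max-pooled states ; attention-pooled states]."""
    return max_pool(states) + attention_pool(states, context)


def match_features(u: Vector, v: Vector) -> Vector:
    """Compare two sequence vectors as [u ; v ; |u - v| ; u * v]."""
    return u + v + [abs(a - b) for a, b in zip(u, v)] + [a * b for a, b in zip(u, v)]


def predict_stance(features: Vector, weights: Sequence[Vector], bias: Vector) -> Dict[str, float]:
    """Linear layer followed by a softmax over the four stance labels."""
    logits = [sum(w * x for w, x in zip(row, features)) + b for row, b in zip(weights, bias)]
    return dict(zip(STANCES, softmax(logits)))


if __name__ == "__main__":
    # Toy hidden states (time steps x hidden size), standing in for BiRNN outputs.
    headline_states = [[0.1, 0.4], [0.3, -0.2]]
    body_states = [[0.2, 0.1], [-0.5, 0.6], [0.0, 0.3]]
    context = [0.5, -0.5]  # hypothetical learned attention vector
    u = encode(headline_states, context)
    v = encode(body_states, context)
    similarity = [0.8]  # e.g., a string-similarity score computed separately
    features = match_features(u, v) + similarity
    # Untrained all-zero weights, so every stance gets probability 0.25.
    weights = [[0.0] * len(features) for _ in STANCES]
    print(predict_stance(features, weights, [0.0] * len(STANCES)))
```

With an all-zero `context` the attention weights are uniform, so `attention_pool` reduces to mean pooling; concatenating it with max-pooling keeps both the strongest and a weighted-average activation of each hidden unit.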
References

[1]
Ali K. Chaudhry, Darren Baker, and Philipp Thun-Hohenstein. 2017. Stance detection for the fake news challenge: Identifying textual relationships with deep neural nets. CS224n: Natural Language Processing with Deep Learning (2017).
[2]
Gaurav Bhatt, Aman Sharma, Shivam Sharma, Ankush Nagpal, Balasubramanian Raman, and Ankush Mittal. 2018. Combining neural, statistical and external features for fake news stance identification. In Proceedings of the Web Conference.
[3]
Peter Bourgonje, Julian Moreno Schneider, and Georg Rehm. 2017. From clickbait to fake news detection: An approach based on detecting the stance of headlines to articles. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
[4]
Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. 2015. A large annotated corpus for learning natural language inference. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
[5]
Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, et al. 2018. Universal sentence encoder. Arxiv Preprint Arxiv:1803.11175 (2018).
[6]
Delphine Charlet and Geraldine Damnati. 2017. SimBow at SemEval-2017 Task 3: Soft-cosine semantic similarity between questions for community question answering. In Proceedings of the International Workshop on Semantic Evaluation.
[7]
Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Si Wei, Hui Jiang, and Diana Inkpen. 2017. Enhanced LSTM for natural language inference. In Proceedings of the Annual Meeting of the Association for Computational Linguistics.
[8]
Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Si Wei, Hui Jiang, and Diana Inkpen. 2017. Recurrent neural network-based sentence encoder with gated attention for natural language inference. In Proceedings of the Workshop on Evaluating Vector Space Representations for NLP.
[9]
Jihun Choi, Taeuk Kim, and Sang-goo Lee. 2018. Cell-aware stacked LSTMs for modeling sentences. Arxiv Preprint Arxiv:1809.02279 (2018).
[10]
J. Choi, K. M. Yoo, and S.-g. Lee. 2017. Learning to compose task-specific tree structures. In Proceedings of the Conference of the Association for the Advancement of Artificial Intelligence.
[11]
Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. In Proceedings of the NIPS Workshop on Deep Learning.
[12]
Alexis Conneau, Douwe Kiela, Holger Schwenk, Loïc Barrault, and Antoine Bordes. 2017. Supervised learning of universal sentence representations from natural language inference data. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
[13]
Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, and Łukasz Kaiser. 2019. Universal transformers. In Proceedings of the International Conference on Learning Representations.
[14]
Francisco Duarte, Bruno Martins, Cátia Sousa Pinto, and Mário J. Silva. 2018. A deep learning method for ICD-10 coding of free-text death certificates. In Proceedings of the EPIA Conference on Artificial Intelligence.
[15]
Yoav Goldberg. 2016. A primer on neural network models for natural language processing. J. Artif. Intell. Res. 57, 1 (2016), 345--420.
[16]
Yichen Gong, Heng Luo, and Jian Zhang. 2018. Natural language inference over interaction space. In Proceedings of the International Conference on Learning Representations.
[17]
Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neur. Comput. 9, 8 (1997).
[18]
Jinbae Im and Sungzoon Cho. 2017. Distance-based self-attention network for natural language inference. Arxiv Preprint Arxiv:1712.02047 (2017).
[19]
Krzysztof Janowicz and Grant McKenzie. 2017. How “alternative” are alternative facts? Measuring statement coherence via spatial analysis. In Proceedings of the ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems.
[20]
Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
[21]
John Wieting, Mohit Bansal, Kevin Gimpel, and Karen Livescu. 2016. Towards universal paraphrastic sentence embeddings. In Proceedings of the International Conference on Learning Representations.
[22]
Diederik Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations.
[23]
Ryan Kiros, Yukun Zhu, Ruslan R. Salakhutdinov, Richard Zemel, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Skip-thought vectors. In Proceedings of the Neural Information Processing Systems Conference.
[24]
Lev Konstantinovskiy, Oliver Price, Mevan Babakar, and Arkaitz Zubiaga. 2018. Towards automated factchecking: Developing an annotation schema and benchmark for consistent automated claim detection. In Proceedings of the EMNLP Workshop on Fact Extraction and Verification.
[25]
Matt Kusner, Yu Sun, Nicholas Kolkin, and Kilian Weinberger. 2015. From word embeddings to document distances. In Proceedings of the International Conference on Machine Learning.
[26]
David M. J. Lazer, Matthew A. Baum, Yochai Benkler, Adam J. Berinsky, Kelly M. Greenhill, Filippo Menczer, Miriam J. Metzger, Brendan Nyhan, Gordon Pennycook, David Rothschild, et al. 2018. The science of fake news. Science 359, 6380 (2018).
[27]
Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Proceedings of the ACL Workshop on Text Summarization Branches Out.
[28]
Andre Martins and Ramon Astudillo. 2016. From softmax to sparsemax: A sparse model of attention and multi-label classification. In Proceedings of the International Conference on Machine Learning.
[29]
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. In Proceedings of the International Conference on Learning Representations.
[30]
Mitra Mohtarami, Ramy Baly, James Glass, Preslav Nakov, Lluis Marquez, and Alessandro Moschitti. 2018. Automatic stance detection using end-to-end memory networks. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics.
[31]
Yixin Nie and Mohit Bansal. 2017. Shortcut-stacked sentence encoders for multi-domain inference. In Proceedings of the Workshop on Evaluating Vector Space Representations for NLP.
[32]
Boyuan Pan, Yazheng Yang, Zhou Zhao, Yueting Zhuang, Deng Cai, and Xiaofei He. 2018. Discourse marker augmented network with reinforcement learning for natural language inference. In Proceedings of the Annual Meeting of the Association for Computational Linguistics.
[33]
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the Annual Meeting on Association for Computational Linguistics.
[34]
Verónica Pérez-Rosas, Bennett Kleinberg, Alexandra Lefevre, and Rada Mihalcea. 2018. Automatic detection of fake news. In Proceedings of the International Conference on Computational Linguistics.
[35]
Pfohl, Oskar Triebe, and Ferdinand Legros. 2017. Stance detection for the fake news challenge with attention and conditional encoding. CS224n: Natural Language Processing with Deep Learning (2017).
[36]
Kashyap Popat, Subhabrata Mukherjee, Andrew Yates, and Gerhard Weikum. 2018. DeClarE: Debunking fake news and false claims using evidence-aware deep learning. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
[37]
Benjamin Riedel, Isabelle Augenstein, Georgios P. Spithourakis, and Sebastian Riedel. 2017. A simple but tough-to-beat baseline for the fake news challenge stance detection task. Arxiv Preprint Arxiv:1707.03264 (2017).
[38]
T. Shen, T. Zhou, G. Long, J. Jiang, S. Pan, and C. Zhang. 2018. DiSAN: Directional self-attention network for RNN/CNN-Free language understanding. In Proceedings of the Conference of the Association for the Advancement of Artificial Intelligence.
[39]
Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, Sen Wang, and Chengqi Zhang. 2018. Reinforced self-attention network: A hybrid of hard and soft attention for sequence modeling. In Proceedings of the International Joint Conference on Artificial Intelligence.
[40]
Kai Shu, Deepak Mahudeswaran, Suhang Wang, Dongwon Lee, and Huan Liu. 2018. FakeNewsNet: A data repository with news content, social context and dynamic information for studying fake news on social media. Arxiv Preprint Arxiv:1809.01286 (2018).
[41]
Kai Shu, Suhang Wang, and Huan Liu. 2017. Exploiting tri-relationship for fake news detection. Arxiv Preprint Arxiv:1712.07709 (2017).
[42]
Yi Tay, Luu Anh Tuan, and Siu Cheung Hui. 2018. A compare-propagate architecture with alignment factorization for natural language inference. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
[43]
Yi Tay, Luu Anh Tuan, and Siu Cheung Hui. 2018. Co-stack residual affinity networks with multi-level attention refinement for matching text sequences. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
[44]
Yi Tay, Luu Anh Tuan, and Siu Cheung Hui. 2018. A compare-propagate architecture with alignment factorization for natural language inference. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
[45]
M. Tosik, A. Mallia, and K. Gangopadhyay. 2018. Debunking fake news one feature at a time. Arxiv Preprint Arxiv:1808.02831 (2018).
[46]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the Neural Information Processing Systems Conference.
[47]
Ramakrishna Vedantam, C. Lawrence Zitnick, and Devi Parikh. 2015. CIDEr: Consensus-based image description evaluation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[48]
Soroush Vosoughi, Deb Roy, and Sinan Aral. 2018. The spread of true and false news online. Science 359, 6380 (2018).
[49]
Adina Williams, Nikita Nangia, and Samuel R. Bowman. 2018. A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics.
[50]
Theresa Wilson, Janyce Wiebe, and Paul Hoffmann. 2018. DR-BiLSTM: Dependent reading bidirectional LSTM for natural language inference. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics.
[51]
William E. Winkler. 1990. String comparator metrics and enhanced decision rules in the Fellegi-Sunter model of record linkage. In Proceedings of the Section on Survey Research Methods of the American Statistical Association (1990).
[52]
Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alexander J. Smola, and Eduard H. Hovy. 2016. Hierarchical attention networks for document classification. In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics.
[53]
Wenpeng Yin, Katharina Kann, Mo Yu, and Hinrich Schütze. 2017. Comparative study of CNN and RNN for natural language processing. Arxiv Preprint Arxiv:1702.01923 (2017).
[54]
Qi Zeng, Quan Zhou, and Shanshan Xu. 2017. Neural stance detectors for fake news challenge. CS224n: Natural Language Processing with Deep Learning (2017).


Information

Published In

Journal of Data and Information Quality, Volume 11, Issue 3
Special Issue on Combating Digital Misinformation and Disinformation and On the Horizon
September 2019
160 pages
ISSN:1936-1955
EISSN:1936-1963
DOI:10.1145/3331015
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 June 2019
Accepted: 01 October 2018
Revised: 01 September 2018
Received: 01 May 2018
Published in JDIQ Volume 11, Issue 3

Author Tags

  1. Fake news
  2. deep learning
  3. fact checking
  4. natural language processing
  5. recurrent neural networks
  6. stance detection

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Fundação para a Ciência e Tecnologia (FCT)
  • INESC-ID multi-annual funding from the PIDDAC programme

Article Metrics

  • Downloads (last 12 months): 91
  • Downloads (last 6 weeks): 11
Reflects downloads up to 25 Nov 2024.

Cited By

  • (2024) Stance Detection in the Context of Fake News—A New Approach. Future Internet 16:10 (364). DOI: 10.3390/fi16100364. Online publication date: 6-Oct-2024.
  • (2024) An Investigation of Deepfake Voice Detection using Speech Pause Patterns: Pilot Study (Preprint). JMIR Biomedical Engineering. DOI: 10.2196/56245. Online publication date: 16-Jan-2024.
  • (2024) A Headline-Centric Graph-Based Dual Context Matching Approach for Incongruent News Detection. IEEE Transactions on Computational Social Systems 11:5 (5913-5924). DOI: 10.1109/TCSS.2024.3384698. Online publication date: Oct-2024.
  • (2024) Gated Recursive and Sequential Deep Hierarchical Encoding for Detecting Incongruent News Articles. IEEE Transactions on Computational Social Systems 11:1 (1023-1034). DOI: 10.1109/TCSS.2023.3247445. Online publication date: Feb-2024.
  • (2024) A Novel Ensemble Machine Learning and Deep Learning Techniques with Feature Extraction and Selection for Fake News Stance Detection: A Comparative Analysis. 2024 Ninth International Conference on Science Technology Engineering and Mathematics (ICONSTEM), 1-14. DOI: 10.1109/ICONSTEM60960.2024.10568693. Online publication date: 4-Apr-2024.
  • (2024) Enhanced Fake News Detection With Natural Language Processing. 2024 International Conference on Communication, Computer Sciences and Engineering (IC3SE), 1065-1069. DOI: 10.1109/IC3SE62002.2024.10593218. Online publication date: 9-May-2024.
  • (2024) SIFG: an ensemble model for sieving fake news from genuine without metadata by combining syntactic and semantic features. Information Security Journal: A Global Perspective, 1-17. DOI: 10.1080/19393555.2024.2378747. Online publication date: 17-Jul-2024.
  • (2023) Fake news stance detection using selective features and FakeNET. PLOS ONE 18:7 (e0287298). DOI: 10.1371/journal.pone.0287298. Online publication date: 31-Jul-2023.
  • (2023) XAI in Automated Fact-Checking? The Benefits Are Modest and There's No One-Explanation-Fits-All. Proceedings of the 35th Australian Computer-Human Interaction Conference, 624-638. DOI: 10.1145/3638380.3638388. Online publication date: 2-Dec-2023.
  • (2023) Misinformation detection based on news dispersion. 2023 24th International Conference on Digital Signal Processing (DSP), 1-5. DOI: 10.1109/DSP58604.2023.10167997. Online publication date: 11-Jun-2023.