research-article

Deep contextualized text representation and learning for fake news detection

Published: 01 November 2021

Abstract

The widespread use of social media and broadcasting agencies around the world has left people highly exposed to false information and fake news, which harm both collective thinking and governments' policies. Meanwhile, the success of pre-trained models at embedding contextual information from text has motivated researchers to use these embeddings in a range of natural language processing tasks. For a complex task like fake news detection, however, it is unclear which contextualized embedding gives the classifier the most valuable features. Because no comparative study has examined different contextualized pre-trained models in combination with distinct neural classifiers, we carry out such a comparison. In this paper, we propose three classifiers, each paired with different pre-trained models for embedding the input news articles. We connect a Single-Layer Perceptron (SLP), a Multi-Layer Perceptron (MLP), and a Convolutional Neural Network (CNN) after an embedding layer built from recent pre-trained models such as BERT, RoBERTa, GPT2, and Funnel Transformer, so as to benefit from both the deep contextualized representations these models provide and deep neural classification. We evaluate the proposed models on three well-known fake news datasets: LIAR (Wang, 2017), ISOT (Ahmed et al., 2017), and COVID-19 (Patwa et al., 2020). The results on all three datasets show that the proposed models outperform state-of-the-art models for fake news detection. Classification accuracy improves by 7% and 0.1% over the model proposed by Goldani et al. (2021) on LIAR and ISOT, respectively, and by 1% over the model proposed by Shifath et al. (2021) on the COVID-19 dataset.
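The pipeline described in the abstract (contextual token embeddings followed by an SLP, MLP, or CNN classifier) can be illustrated with a minimal sketch. It is not the authors' implementation: the random vectors stand in for the output of a pre-trained encoder such as BERT, and mean pooling, the layer sizes, the filter width, and every function and parameter name (`mean_pool`, `slp_head`, `mlp_head`, `cnn_head`, `init`) are assumptions made for illustration. Plain Python is used so the sketch runs without dependencies; a practical version would use a deep learning framework.

```python
import math
import random

def mean_pool(token_vectors):
    """Average token embeddings into one text-level vector (an assumed pooling choice)."""
    n = len(token_vectors)
    return [sum(tok[d] for tok in token_vectors) / n for d in range(len(token_vectors[0]))]

def dense(x, weights, bias):
    """Fully connected layer; `weights` holds one row of input weights per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def slp_head(token_vectors, p):
    """Single-layer perceptron: pooled text vector -> class probabilities."""
    return softmax(dense(mean_pool(token_vectors), p["w_out"], p["b_out"]))

def mlp_head(token_vectors, p):
    """Multi-layer perceptron: one hidden ReLU layer before the output layer."""
    hidden = relu(dense(mean_pool(token_vectors), p["w_hid"], p["b_hid"]))
    return softmax(dense(hidden, p["w_out"], p["b_out"]))

def cnn_head(token_vectors, p, width=3):
    """1D convolution over windows of `width` tokens, then max-pooling over positions."""
    windows = [sum(token_vectors[i:i + width], [])  # concatenate the tokens in each window
               for i in range(len(token_vectors) - width + 1)]
    feature_maps = [relu(dense(w, p["filters"], p["b_filt"])) for w in windows]
    pooled = [max(column) for column in zip(*feature_maps)]  # one value per filter
    return softmax(dense(pooled, p["w_out"], p["b_out"]))

def init(rows, cols, rng):
    """Small random weights; a trained model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
DIM, HIDDEN, FILTERS, CLASSES, WIDTH = 8, 4, 5, 2, 3  # hypothetical sizes

# Stand-in for encoder output: 6 tokens, each a DIM-dimensional contextual vector.
tokens = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(6)]

slp_params = {"w_out": init(CLASSES, DIM, rng), "b_out": [0.0] * CLASSES}
mlp_params = {"w_hid": init(HIDDEN, DIM, rng), "b_hid": [0.0] * HIDDEN,
              "w_out": init(CLASSES, HIDDEN, rng), "b_out": [0.0] * CLASSES}
cnn_params = {"filters": init(FILTERS, WIDTH * DIM, rng), "b_filt": [0.0] * FILTERS,
              "w_out": init(CLASSES, FILTERS, rng), "b_out": [0.0] * CLASSES}

probs_slp = slp_head(tokens, slp_params)
probs_mlp = mlp_head(tokens, mlp_params)
probs_cnn = cnn_head(tokens, cnn_params, width=WIDTH)
```

Each head returns a probability per class (here two, e.g. fake and real). The SLP and MLP consume a single pooled text-level vector, while the CNN operates on the token sequence directly, which mirrors the distinction between text-level and word-level representation mentioned in the highlights.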

Highlights

Using different deep contextualized text representation models for fake news detection.
Providing a comprehensive comparative study on text representation for fake news detection.
Proposing different neural classifiers for word and text level representation.
Using Gaussian noise to overcome the overfitting problem.
Outperforming state-of-the-art methods in the field.
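
The Gaussian-noise highlight can be sketched as follows. This is a minimal illustration that assumes the noise is zero-mean, is added to the embedding vectors, and is applied only during training; the page does not state where the authors inject the noise or which standard deviation they use, so `add_gaussian_noise`, `stddev=0.1`, and the example vectors are hypothetical.

```python
import random

def add_gaussian_noise(token_vectors, stddev=0.1, training=True, rng=None):
    """Return a copy of the embeddings with zero-mean Gaussian noise added.

    Noise is applied only when `training` is True, so evaluation sees
    the unperturbed embeddings.
    """
    if not training or stddev == 0:
        return [list(tok) for tok in token_vectors]
    rng = rng or random.Random()
    return [[v + rng.gauss(0.0, stddev) for v in tok] for tok in token_vectors]

# Hypothetical embeddings: 2 tokens, 3 dimensions each.
embeddings = [[0.5, -1.0, 0.25], [1.5, 0.0, -0.75]]

noisy = add_gaussian_noise(embeddings, stddev=0.1, rng=random.Random(42))
clean = add_gaussian_noise(embeddings, training=False)
```

Because each training pass sees a slightly different version of the same input, the classifier is discouraged from memorizing exact embedding values. Keras, which the reference list cites, provides a `GaussianNoise` layer with the same train-only behavior.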

References

[1]
Ahmed H., Traore I., Saad S., Detection of online fake news using n-gram analysis and machine learning techniques, in: International conference on intelligent, secure, and dependable systems in distributed and cloud environments, Springer, 2017, pp. 127–138.
[2]
Amjad M., Sidorov G., Zhila A., Gómez-Adorno H., Voronkov I., Gelbukh A., “Bend the truth”: Benchmark dataset for fake news detection in Urdu language and its evaluation, Journal of Intelligent & Fuzzy Systems (2020) 1–13. Preprint.
[3]
Antoun W., Baly F., Achour R., Hussein A., Hajj H., State of the art models for fake news detection tasks, in: 2020 IEEE international conference on informatics, IoT, and enabling technologies, IEEE, 2020, pp. 519–524.
[4]
Bojanowski P., Grave E., Joulin A., Mikolov T., Enriching word vectors with subword information, Transactions of the Association for Computational Linguistics 5 (2017).
[5]
Chollet F., et al., Keras, 2015.
[6]
Clark, K., Luong, M.-T., Le, Q. V., & Manning, C. D. (2020). ELECTRA: Pre-training text encoders as discriminators rather than generators. In International conference on learning representations.
[7]
Dai Z., Lai G., Yang Y., Le Q.V., Funnel-transformer: Filtering out sequential redundancy for efficient language processing, 2020, CoRR, abs/2006.03236.
[8]
Devlin J., Chang M.-W., Lee K., Toutanova K., BERT: Pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the 2019 conference of the North American Chapter of the Association for Computational Linguistics: Human language technologies, Volume 1 (Long and short papers), Association for Computational Linguistics, Minneapolis, Minnesota, 2019, pp. 4171–4186.
[9]
Goldani M., Momtazi S., Safabakhsh R., Detecting fake news with capsule neural networks, Applied Soft Computing 101 (2021).
[10]
Goldani M., Safabakhsh R., Momtazi S., Convolutional neural network with margin loss for fake news detection, Information Processing and Management 58 (2021).
[11]
Granik M., Mesyura V., Fake news detection using naïve Bayes classifier, in: 2017 IEEE first Ukraine conference on electrical and computer engineering, IEEE, 2017, pp. 900–903.
[12]
Hakak S., Alazab M., Khan S., Gadekallu T.R., Maddikunta P.K.R., Khan W.Z., An ensemble machine learning approach through effective feature extraction to classify fake news, Future Generation Computer Systems 117 (2021) 47–58.
[13]
Horne, B., & Adali, S. (2017). This just in: Fake news packs a lot in title, uses simpler, repetitive content in text body, more similar to satire than real news. In Proceedings of the international AAAI conference on web and social media, Vol. 11, No. 1.
[14]
Huang Y.-F., Chen P.-H., Fake news detection using an ensemble learning model based on self-adaptive harmony search algorithms, Expert Systems with Applications 159 (2020).
[15]
Kaliyar R.K., Goswami A., Narang P., Sinha S., FNDNet–A deep convolutional neural network for fake news detection, Cognitive Systems Research 61 (2020) 32–44.
[16]
Khattar, D., Goud, J. S., Gupta, M., & Varma, V. (2019). Mvae: Multimodal variational autoencoder for fake news detection. In The world wide web conference (pp. 2915–2921).
[17]
Krizhevsky A., Sutskever I., Hinton G.E., ImageNet classification with deep convolutional neural networks, in: Proceedings of the 25th international conference on neural information processing systems - Volume 1, Curran Associates Inc., Red Hook, NY, USA, 2012, pp. 1097–1105.
[18]
Lazer D.M., Baum M.A., Benkler Y., Berinsky A.J., Greenhill K.M., Menczer F., et al., The science of fake news, Science 359 (6380) (2018) 1094–1096.
[19]
Liu Y., Ott M., Goyal N., Du J., Joshi M., Chen D., et al., RoBERTa: A robustly optimized BERT pretraining approach, 2019, CoRR, abs/1907.11692.
[20]
Liu C., Wu X., Yu M., Li G., Jiang J., Huang W., et al., A two-stage model based on BERT for short fake news detection, in: International conference on knowledge science, engineering and management, Springer, 2019, pp. 172–183.
[21]
Long Y., Lu Q., Xiang R., Li M., Huang C.-R., Fake news detection through multi-perspective speaker profiles, in: Proceedings of the Eighth international joint conference on natural language processing (Volume 2: Short papers), Asian Federation of Natural Language Processing, Taipei, Taiwan, 2017, pp. 252–256.
[22]
Ma J., Gao W., Wong K.-F., Detect rumors in microblog posts using propagation structure via kernel learning, in: Proceedings of the 55th Annual meeting of the association for computational linguistics (Volume 1: Long papers), Association for Computational Linguistics, Vancouver, Canada, 2017, pp. 708–717.
[23]
Mikolov T., Sutskever I., Chen K., Corrado G.S., Dean J., Distributed representations of words and phrases and their compositionality, in: Advances in neural information processing systems, 2013, pp. 3111–3119.
[25]
Ozbay F.A., Alatas B., Fake news detection within online social media using supervised artificial intelligence algorithms, Physica A: Statistical Mechanics and its Applications 540 (2020).
[26]
Patwa P., Sharma S., PYKL S., Guptha V., Kumari G., Akhtar M.S., et al., Fighting an infodemic: COVID-19 fake news dataset, in: Combating Online Hostile Posts in Regional Languages during Emergency Situation, Springer International Publishing, 2021, pp. 21–29.
[27]
Pennington, J., Socher, R., & Manning, C. D. (2014). Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (pp. 1532–1543).
[28]
Posadas-Durán J.-P., Gomez-Adorno H., Sidorov G., Escobar J.J.M., Detection of fake news in a new corpus for the Spanish language, Journal of Intelligent & Fuzzy Systems 36 (5) (2019) 4869–4876.
[29]
Radford A., Narasimhan K., Salimans T., Sutskever I., Improving language understanding by generative pre-training, 2018.
[30]
Radford A., Wu J., Child R., Luan D., Amodei D., Sutskever I., Language models are unsupervised multitask learners, OpenAI 1 (8) (2019) 9.
[31]
Shifath S., Khan M.F., Islam M., et al., A transformer based approach for fighting COVID-19 fake news, CoRR (2021) abs/2101.12027.
[32]
Silverman C., Strapagiel L., Shaban H., Hall E., Singer-Vine J., Hyperpartisan Facebook pages are publishing false and misleading information at an alarming rate, Buzzfeed News 20 (2016).
[33]
Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In Y. Bengio, & Y. LeCun (Eds.), 3rd international conference on learning representations, Conference track proceedings.
[34]
Tacchini, E., Ballarin, G., Vedova, M. L. D., Moret, S., & de Alfaro, L. (2017). Some like it hoax: Automated fake news detection in social networks. In Proceedings of the SoGood conference - second workshop on data science for social good.
[35]
Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A.N., et al., Attention is all you need, in: Advances in neural information processing systems, 2017, pp. 5998–6008.
[36]
Wang W.Y., “Liar, liar pants on fire”: A new benchmark dataset for fake news detection, in: Proceedings of the 55th annual meeting of the Association for Computational Linguistics (Volume 2: Short papers), Association for Computational Linguistics, Vancouver, Canada, 2017, pp. 422–426.
[37]
Wani A., Joshi I., Khandve S., Wagh V., Joshi R., Evaluating deep learning approaches for Covid19 fake news detection, in: Combating Online Hostile Posts in Regional Languages during Emergency Situation, Springer International Publishing, 2021, pp. 153–163.
[38]
Yang Z., Dai Z., Yang Y., Carbonell J., Salakhutdinov R.R., Le Q.V., XLNet: Generalized autoregressive pretraining for language understanding, in: Advances in neural information processing systems 32, Curran Associates, Inc., 2019, pp. 5754–5764.
[39]
Zhang J., Dong B., Philip S.Y., Fakedetector: Effective fake news detection with deep diffusive neural network, in: 2020 IEEE 36th international conference on data engineering, IEEE, 2020, pp. 1826–1829.
[40]
Zhang T., Wang D., Chen H., Zeng Z., Guo W., Miao C., et al., BDANN: BERT-based domain adaptation neural network for multi-modal fake news detection, in: 2020 international joint conference on neural networks, IEEE, 2020, pp. 1–8.
[41]
Zhang D., Yang Z., Word embedding perturbation for sentence classification, CoRR (2018) abs/1804.08166.


            Information & Contributors

            Information

            Published In

            Information Processing and Management: an International Journal  Volume 58, Issue 6
            Nov 2021
            668 pages

            Publisher

            Pergamon Press, Inc.

            United States

            Author Tags

            1. Fake news detection
            2. Deep neural network
            3. Contextualized text representation

            Qualifiers

            • Research-article

            Cited By

            • (2024) Knowledge Graph-Based Hierarchical Text Semantic Representation. International Journal of Intelligent Systems, 2024. DOI: 10.1155/2024/5583270. Online publication date: 1-Jan-2024
            • (2024) Filter-based Stance Network for Rumor Verification. ACM Transactions on Information Systems, 42:4 (1-28). DOI: 10.1145/3649462. Online publication date: 26-Feb-2024
            • (2024) Novel approaches for fake news detection based on attention-based deep multiple-instance learning using contextualized neural language models. Neurocomputing, 602:C. DOI: 10.1016/j.neucom.2024.128263. Online publication date: 14-Oct-2024
            • (2024) Boosting generalization of fine-tuning BERT for fake news detection. Information Processing and Management: an International Journal, 61:4. DOI: 10.1016/j.ipm.2024.103745. Online publication date: 18-Jul-2024
            • (2024) NSEP. Information Processing and Management: an International Journal, 61:2. DOI: 10.1016/j.ipm.2023.103594. Online publication date: 1-Mar-2024
            • (2024) DSMM. Information Processing and Management: an International Journal, 61:1. DOI: 10.1016/j.ipm.2023.103528. Online publication date: 1-Jan-2024
            • (2024) End-to-End Deep Networks with Hierarchical Attention and Capsule Capabilities for Misinformation Detection on Microblogging Platforms. SN Computer Science, 5:2. DOI: 10.1007/s42979-023-02594-3. Online publication date: 9-Feb-2024
            • (2024) Early detection of fake news on emerging topics through weak supervision. Journal of Intelligent Information Systems, 62:5 (1263-1284). DOI: 10.1007/s10844-024-00852-1. Online publication date: 1-Oct-2024
            • (2024) Context-Based Persuasion Analysis of Sentiment Polarity Disambiguation in Social Media Text Streams. New Generation Computing, 42:4 (497-531). DOI: 10.1007/s00354-023-00238-x. Online publication date: 1-Nov-2024
            • (2024) Bias Detection and Mitigation in Textual Data: A Study on Fake News and Hate Speech Detection. Advances in Information Retrieval (374-383). DOI: 10.1007/978-3-031-56063-7_29. Online publication date: 24-Mar-2024
