Text-mining-based Fake News Detection Using Ensemble Methods

Abstract

Social media is a platform where people can express their views and opinions freely, and it has made communication easier than ever before. It also gives people an opportunity to spread fake news intentionally. Easy access to a wide variety of news sources on the web further exposes people to fake news, which they may come to believe. This makes it important to detect and flag such content on social media. Given the rate at which news is now generated on social media, it is difficult to distinguish genuine news from hoaxes without knowing the source of the news. This paper discusses approaches to detecting fake news that use only features of the news text, without any other related metadata. We observe that combining stylometric features with text-based word vector representations through ensemble methods can predict fake news with an accuracy of up to 95.49%.
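The core idea of the abstract can be illustrated with a small sketch: score a text both on how it is written (stylometric cues) and on which words it uses, then let an ensemble decide. This is not the authors' pipeline. It makes several substitutions, all assumptions:

- Word-vector representations are replaced by two multinomial naive Bayes models, one over word unigrams and one over word bigrams.
- A nearest-centroid classifier works on four illustrative stylometric features.
- The three members are combined by hard majority voting.

The feature choices, the class names (`NaiveBayesText`, `NearestCentroid`, `HardVotingEnsemble`) and the toy sentences are hypothetical. The code uses only the Python standard library (3.8+ for `math.dist`).

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercased word tokens; a deliberately simple tokenizer (assumption).
    return re.findall(r"[a-z']+", text.lower())

def bigrams(text):
    # Adjacent word pairs, a crude stand-in for richer text representations.
    toks = tokenize(text)
    return [a + " " + b for a, b in zip(toks, toks[1:])]

def stylometric_features(text):
    # Hypothetical feature set: average word length, type-token ratio,
    # average sentence length in words, and exclamation marks per word.
    words = tokenize(text)
    n_words = len(words) or 1
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [
        sum(len(w) for w in words) / n_words,
        len(set(words)) / n_words,
        n_words / (len(sentences) or 1),
        text.count("!") / n_words,
    ]

class NaiveBayesText:
    """Multinomial naive Bayes with Laplace smoothing over a token featurizer."""
    def __init__(self, featurize):
        self.featurize = featurize

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        self.log_priors = {}
        for c in self.classes:
            docs = [t for t, y in zip(texts, labels) if y == c]
            self.log_priors[c] = math.log(len(docs) / len(texts))
            for t in docs:
                self.counts[c].update(self.featurize(t))
        self.vocab_size = len(set().union(*self.counts.values()))
        return self

    def predict(self, text):
        feats = self.featurize(text)
        scores = {}
        for c in self.classes:
            denom = sum(self.counts[c].values()) + self.vocab_size
            scores[c] = self.log_priors[c] + sum(
                math.log((self.counts[c][f] + 1) / denom) for f in feats)
        return max(scores, key=scores.get)

class NearestCentroid:
    """Predicts the class whose mean stylometric vector is closest (Euclidean)."""
    def fit(self, texts, labels):
        self.centroids = {}
        for c in set(labels):
            vecs = [stylometric_features(t) for t, y in zip(texts, labels) if y == c]
            self.centroids[c] = [sum(col) / len(vecs) for col in zip(*vecs)]
        return self

    def predict(self, text):
        x = stylometric_features(text)
        return min(self.centroids, key=lambda c: math.dist(x, self.centroids[c]))

class HardVotingEnsemble:
    """Majority vote over an odd number of independently trained members."""
    def __init__(self, members):
        self.members = members

    def fit(self, texts, labels):
        for m in self.members:
            m.fit(texts, labels)
        return self

    def predict(self, text):
        votes = Counter(m.predict(text) for m in self.members)
        return votes.most_common(1)[0][0]

# Toy, made-up training data purely for illustration.
train_texts = [
    "SHOCKING! You won't believe this miracle cure doctors hate!!!",
    "Unbelievable! Secret miracle plot exposed, share before deleted!!!",
    "The central bank raised interest rates by a quarter point on Tuesday.",
    "Officials said the committee will publish its annual report next month.",
]
train_labels = ["fake", "fake", "real", "real"]

ensemble = HardVotingEnsemble(
    [NaiveBayesText(tokenize), NaiveBayesText(bigrams), NearestCentroid()]
).fit(train_texts, train_labels)

print(ensemble.predict("Shocking miracle cure exposed!!!"))  # fake
print(ensemble.predict("The committee said interest rates will rise next month."))  # real
```

With three members and two labels, hard voting always produces a strict majority, so no tie-breaking rule is needed. A realistic setup would replace these members with stronger learners and genuine word-vector features, and would measure accuracy on held-out data rather than on hand-picked sentences.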

Author information

Authors and Affiliations

Department of Computer Science and Engineering, National Institute of Technology Karnataka, Mangalore, 575025, India
Harita Reddy, Namratha Raj, Manali Gala & Annappa Basava

Corresponding author

Correspondence to Harita Reddy.

Additional information

Recommended by Associate Editor Matjaz Gams

Harita Reddy received the B. Tech. degree in computer science and engineering from National Institute of Technology Karnataka, India in 2019. She is currently working as a software engineer at Uber, India.

Her research interests include data mining, machine learning and social network analysis.

Namratha Raj received the B. Tech. degree in computer science and engineering from National Institute of Technology Karnataka, India in 2019.

Her research interests include data science, machine learning, natural language processing and bioinformatics.

Manali Gala received the B. Tech. degree in computer science and engineering from National Institute of Technology Karnataka, India in 2019. She is currently an analyst at Goldman Sachs, India.

Her research interests include machine learning and data analysis.

Annappa Basava received the B. Eng. degree in computer science and engineering from the Govt. B.D.T. College of Engineering, Davangere, affiliated to Mysore University, India in 1991, and the M. Tech. and Ph. D. degrees in computer science and engineering from National Institute of Technology Karnataka, India in 2003 and 2012, respectively. He is currently a professor in the Department of Computer Science and Engineering, National Institute of Technology Karnataka, India. He has published more than 100 research papers in international conferences and journals and has more than 20 years of experience in teaching and research. He was the Organizing Chair of the International Conference on Advanced Computing 2013, serves on the Technical Program Committees of many international conferences, and is a reviewer for journals. He is currently the Chair of the India Council of the IEEE Computer Society, and he was the Chair of the IEEE Mangalore Subsection during 2018. He was the Secretary of the IEI Mangaluru Local Centre. He is a Fellow of the Institution of Engineers (India) and a senior member of IEEE and ACM. Four research scholars have completed their Ph. D. degrees under his supervision, and seven scholars are currently pursuing research under his supervision.

His research interests include cloud computing, big data analytics, distributed computing, software engineering and process mining.

About this article

Cite this article

Reddy, H., Raj, N., Gala, M. et al. Text-mining-based Fake News Detection Using Ensemble Methods. Int. J. Autom. Comput. 17, 210–221 (2020). https://doi.org/10.1007/s11633-019-1216-5

Received: 13 June 2019
Accepted: 11 December 2019
Published: 18 February 2020
Issue Date: April 2020
DOI: https://doi.org/10.1007/s11633-019-1216-5