Abstract
Social media is a platform on which people can express their views and opinions freely, and it has made communication easier than ever before. It also gives people an opportunity to spread fake news intentionally. Easy access to a wide variety of news sources on the web means that people are exposed to fake news and may come to believe it. This makes it important to detect and flag such content on social media. At the current rate at which news is generated on social media, it is difficult to distinguish genuine news from hoaxes without knowing the source of the news. This paper discusses approaches to fake news detection that use only features of the news text, without any other related metadata. We observe that combining stylometric features with text-based word vector representations through ensemble methods can predict fake news with an accuracy of up to 95.49%.
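The idea of combining stylometric features with a text representation and feeding the result to an ensemble can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline. The particular features (average word and sentence length, type-token ratio, and a Gunning fog index computed with a crude vowel-group syllable counter), the raw term counts standing in for word vector representations, and the majority-vote combiner are all choices made for this example. The names `stylometric_features`, `feature_vector` and `majority_vote` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical feature set chosen for illustration; not the paper's exact features.
FEATURE_NAMES = ["n_words", "avg_word_len", "avg_sentence_len",
                 "type_token_ratio", "gunning_fog"]

def tokenize(text):
    """Lowercased alphabetic tokens, allowing one internal apostrophe (don't, it's)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def split_sentences(text):
    """Naive sentence split on runs of '.', '!' or '?', dropping empty fragments."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]

def count_syllables(word):
    """Crude syllable estimate: number of vowel groups, at least 1 (an assumption)."""
    return max(1, len(re.findall(r"[aeiouy]+", word)))

def stylometric_features(text):
    words = tokenize(text)
    n_words = len(words)
    if n_words == 0:
        return {name: 0.0 for name in FEATURE_NAMES}
    n_sents = max(1, len(split_sentences(text)))
    avg_sentence_len = n_words / n_sents
    # "Complex" words are approximated as those with three or more estimated syllables.
    complex_ratio = sum(count_syllables(w) >= 3 for w in words) / n_words
    return {
        "n_words": n_words,
        "avg_word_len": sum(len(w) for w in words) / n_words,
        "avg_sentence_len": avg_sentence_len,
        "type_token_ratio": len(set(words)) / n_words,
        # Gunning fog index: 0.4 * (words per sentence + 100 * share of complex words)
        "gunning_fog": 0.4 * (avg_sentence_len + 100 * complex_ratio),
    }

def feature_vector(text, vocabulary):
    """Stylometric features followed by term counts over a fixed vocabulary
    (term counts are a simple stand-in for word vector representations)."""
    style = stylometric_features(text)
    counts = Counter(tokenize(text))
    return [style[name] for name in FEATURE_NAMES] + [counts[w] for w in vocabulary]

def majority_vote(predictions):
    """Hard-voting ensemble: the most common label among base-classifier outputs."""
    return Counter(predictions).most_common(1)[0][0]

if __name__ == "__main__":
    print(feature_vector("The cat sat. The dog ran quickly away!", ["the", "fake"]))
    print(majority_vote(["fake", "real", "fake"]))
```

In a real system the feature vectors would be used to train several classifiers, and `majority_vote` would combine their per-article labels; scikit-learn's `VotingClassifier` with `voting="hard"` applies the same idea to fitted estimators.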
Additional information
Recommended by Associate Editor Matjaz Gams
Harita Reddy received the B. Tech. degree in computer science and engineering from National Institute of Technology Karnataka, India in 2019. She is currently working as a software engineer at Uber, India.
Her research interests include data mining, machine learning and social network analysis.
Namratha Raj received the B. Tech. degree in computer science and engineering from National Institute of Technology Karnataka, India in 2019.
Her research interests include data science, machine learning, natural language processing and bioinformatics.
Manali Gala received the B. Tech. degree in computer science and engineering from National Institute of Technology Karnataka, India in 2019. She is currently an analyst at Goldman Sachs, India.
Her research interests include machine learning and data analysis.
Annappa Basava received the B. Eng. degree in computer science and engineering from the Govt. B.D.T. College of Engineering, Davangere, affiliated with Mysore University, India in 1991, and the M. Tech. and Ph. D. degrees in computer science and engineering from National Institute of Technology Karnataka, India in 2003 and 2012, respectively. He is currently a professor in the Department of Computer Science and Engineering, National Institute of Technology Karnataka, India. He has published more than 100 research papers in international conferences and journals and has more than 20 years of experience in teaching and research. He was the Organizing Chair of the International Conference on Advanced Computing 2013, serves on the technical program committees of many international conferences, and reviews for journals. He is currently the Chair of the India Council of the IEEE Computer Society, and he was the Chair of the IEEE Mangalore Subsection in 2018. He was the Secretary of the IEI Mangaluru Local Centre. He is a Fellow of the Institution of Engineers (India) and a senior member of IEEE and ACM. Four research scholars have completed their Ph. D. degrees under his supervision, and seven scholars are currently enrolled for research under his supervision.
His research interests include cloud computing, big data analytics, distributed computing, software engineering and process mining.
About this article
Cite this article
Reddy, H., Raj, N., Gala, M. et al. Text-mining-based Fake News Detection Using Ensemble Methods. Int. J. Autom. Comput. 17, 210–221 (2020). https://doi.org/10.1007/s11633-019-1216-5
DOI: https://doi.org/10.1007/s11633-019-1216-5