Abstract
Fake review detection is a considerable challenge across e-commerce and online business settings. The task aims to develop systems that can verify the veracity of reviews. The research community has made a range of attempts to address this issue. Unfortunately, these attempts have targeted only a small set of languages, such as English, Arabic, and a few others. Roman Urdu is used on the web in the subcontinent, yet it has not been explored thoroughly for this task. Meanwhile, deep learning methods have proved very successful over the last few years for diverse Natural Language Processing tasks, but they have not been explored for Roman Urdu fake review detection. To address this gap, this study makes a two-fold contribution: (1) construction of a novel Roman Urdu Fake Reviews Detection Corpus (RU-FRDC), which comprises 5150 annotated reviews, and (2) a comparison of various deep learning architectures, including Simple RNN, LSTM, GRU, Bi-LSTM, and Bi-GRU. The evaluation has been carried out using widely used measures, i.e., Precision, Recall, \(F_1\), and ACC-ROC. The highest results were achieved using the stacked LSTM model (ACC-ROC = 0.943 and \(F_1 = 0.88\)).
Notes
https://www.yelp.com/dataset. Last visited: 03-Nov-2020.
Data Availability Statement: The complete code and dataset developed and/or evaluated during the current study are available at https://comsatsnlpgroup.wordpress.com/.
These experiments used Keras (https://keras.io/) to implement the deep learning models. According to the Keras specification, the Embedding layer can only be used as the first layer of a model, as is done in this study. The Keras Embedding layer has been used to represent the sequences as dense embeddings [40]. It takes the one-hot encoding of each review (sentence) [41] and maps the word vectors into a low-dimensional space. The embeddings produced by the Embedding layer are used as features for the stacked LSTM model described earlier. On top of the Embedding layer, the model includes two stacked LSTM layers that use Tanh as the activation function. Each hidden layer consists of a fixed 100 LSTM units.
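The architecture described in this note can be sketched in Keras roughly as follows. This is a minimal illustration, not the authors' released code: the vocabulary size, embedding dimension, maximum review length, sigmoid output layer, optimizer, and loss are all assumptions, since the note only fixes the Embedding layer followed by two stacked LSTM layers of 100 Tanh units each. The sketch also assumes reviews have already been converted to padded sequences of integer word indices, which is the input form the Keras Embedding layer expects.

```python
# Minimal sketch of the stacked LSTM described above (not the authors' code).
from tensorflow import keras
from tensorflow.keras import layers


def build_stacked_lstm(vocab_size=5000, embed_dim=64, max_len=50):
    """Embedding -> LSTM(100) -> LSTM(100) -> sigmoid classifier.

    vocab_size, embed_dim, max_len and the output layer are assumptions;
    the source only fixes two stacked LSTM layers of 100 tanh units each.
    """
    model = keras.Sequential([
        # Each review arrives as a padded sequence of integer word indices.
        keras.Input(shape=(max_len,), dtype="int32"),
        # Embedding is placed first, as in the study: indices -> dense vectors.
        layers.Embedding(input_dim=vocab_size, output_dim=embed_dim),
        # First LSTM returns the full sequence so the second LSTM can consume it.
        layers.LSTM(100, activation="tanh", return_sequences=True),
        # Second LSTM returns only its final hidden state.
        layers.LSTM(100, activation="tanh"),
        # Assumed binary output: probability that a review is fake.
        layers.Dense(1, activation="sigmoid"),
    ])
    # Assumed training setup; the note does not state the optimizer or loss.
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model


model = build_stacked_lstm()
model.summary()
```

Setting `return_sequences=True` on the first LSTM is what makes the stacking work: the second LSTM needs one input vector per time step, not just the final state of the first.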
References
Shashank K. Research on product review analysis and spam review detection. In: 4th International Conference on Signal Processing and Integrated Networks (SPIN), IEEE, 2017;390–393.
Zhu Y, Woo SS. Adversarial product review generation with word replacements. In: Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, 2018;2324–2326.
Zhang Z, Varadarajan B. Utility scoring of product reviews. In: Proceedings of the 15th ACM international conference on Information and knowledge management, 2006.
Jiménez FR, Mendoza NA. Too popular to ignore: the influence of online reviews on purchase intentions of search and experience products. J Interact Market. 2013;27(3):226–35.
Hossain F. Fake review detection using data mining, 2019.
Sun C, Du Q, Tian G. Exploiting product related review features for fake review detection. Math Prob Eng. 2016;2016:1–7.
Daud M, Mohibullah KRA. Roman Urdu opinion mining system (RUOMiS). 2015.
Mehmood K, Essam D, Shafi K. Sentiment analysis system for Roman Urdu. In: Proceedings of the 2018 Computing Conference, 2018.
Daud A, Khan W, Che D. Urdu language processing: a survey. Artif Intell Rev. 2016;279–311.
Kunchukuttan A, Mehta P, Bhattacharyya P. The IIT Bombay English-Hindi Parallel Corpus. In: Proceedings of the eleventh international conference on language resources and evaluation (LREC 2018), Miyazaki, 2018.
Dematis I, Karapistoli E, Vakali A. Fake review detection via exploitation of spam indicators and reviewer behavior characteristics. In: SOFSEM 2018: theory and practice of computer science, 2017.
Elmurngi E, Gherbi A. Fake reviews detection on movie reviews through sentiment analysis using. Int J Adv Syst Measure. 2018;11:196–207.
Alsubari S, Shelke M, Deshmukh S. Fake reviews identification based on deep computational linguistic. Int J Adv Sci Technol. 2020;29:3846–56.
El-Haless A, Hammad A. An approach for detecting spam in Arabic. Int Arab J Inf Technol. 2015;12:9–16.
Li H, Zhiyuan C, Bing L, Xiaokai W, Jidong S. Spotting fake reviews via collective. In: IEEE international conference on data mining, ICDM 2015, 2014.
Ott M, Claire C, Hancock JT. Negative deceptive opinion spam. 2013;497–501.
Jindal N, Liu B. Review spam detection. In: Proceedings of the 16th international conference on World Wide Web, 2007.
Rastogi A, Mehrotra M. Impact of behavioral and textual features on opinion spam detection. In: IEEE International Conference on Intelligent Computing and Control Systems, Madurai, 2018.
Wu X, Dong Y, Tao J, Huang C, Chawla NV. Reliable fake review detection via modeling temporal and behavioral patterns. In: IEEE International Conference on Big Data, Boston, 2017.
Phayung M. Thai fake news detection based on information retrieval, natural language processing and machine learning. SN Comput Sci. 2021;2(6):1–17.
Vasantharajan TC, Uthayasanker. Towards offensive language identification for Tamil code-mixed youtube comments and posts. SN Comput Sci. 2022;3(1):1–13.
Surana S, Dembla S, Bihani P. Identifying contradictions in the legal proceedings using natural language models. SN Comput Sci. 2022;3(3):1–14.
Baishya D, Deka JJ, Dey G, Singh PK. SAFER: sentiment analysis-based fake review detection in e-commerce using deep learning. SN Comput Sci. 2021;2(6):1–12.
Dadhich A, Thankachan B. Social and juristic challenges of AI for opinion mining approaches on Amazon and Flipkart product reviews using machine learning algorithms. SN Comput Sci. 2021;2(3):1–21.
Shojaee S, Azman A, Sharef N, Sulaiman N. A framework for fake review annotation. In: 17th UKSIM-AMSS IEEE International Conference on Modelling and Simulation, 2015.
Jindal N, Liu B. Opinion spam and analysis. In: Proceedings of the International Conference on Web Search and Web Data Mining, ACM, New York, NY, USA, 2008.
Ong T, Mannino M, Gregg D. Linguistic characteristics of shill reviews. Electron Commer Res Appl. 2013;13:69–78.
Viera AJ, Garrett JM. Understanding interobserver agreement: the kappa statistic. J Fam Med. 2005;37:360–363.
Alberto TC, Lochter JV, Almeida TA. Tubespam: comment spam filtering on youtube. In: 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), IEEE, 2015;138–143.
Khan L, Amjad A, Ashraf N, Chang H-T, Gelbukh A. Urdu sentiment analysis with deep learning methods. IEEE Access. 2021;9:97803–12.
Xu J-M, Kwang-Sung J, Xiaojin Z, Amy B. Learning from bullying traces in social media. In: Proceedings of the 2012 conference of the North American chapter of the association for computational linguistics: Human language technologies, 2012;656–666.
Sharif O, Hoque MM, Hossain E. Sentiment analysis of Bengali texts on online restaurant reviews using multinomial Naïve Bayes. In: 2019 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT), IEEE, 2019;1–6.
Monteiro RA, Santos RL, Pardo TA, Almeida TAD, Ruiz EE, Vale OA. Contributions to the study of fake news in portuguese: new corpus and automatic detection results. In: International Conference on Computational Processing of the Portuguese Language, Springer, Cham, 2018;324–334.
Saumya S, Singh JP. Spam review detection using LSTM autoencoder: an unsupervised approach. Electron Commer Res. 2020;22:1–21.
Ko M-C, Huang H-H, Chen H-H. Paid review and paid writer detection. In: Proceedings of the International Conference on Web Intelligence, 2017.
Fornaciari T, Cagnina L, Rosso P, Poesio M. Fake opinion detection: how similar are crowdsourced datasets to real data? Lang Resour Eval. 2020;54(4):1019–58.
Alsubari SN, Deshmukh SN, Al-Adhaileh MH, Alsaade FW, Aldhyani TH. Development of integrated neural network model for identification of fake reviews in e-commerce using multidomain datasets. Appl Bionics Biomech. 2021;2021:1–11.
Fang W, Zhang J, Wang D, Chen Z, Li M. Entity disambiguation by knowledge and text jointly embedding. In: Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, 2016;260–269.
Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–80.
Keras. Embedding layer. Keras.io. Available: https://keras.io/api/layers/core_layers/embedding/. [Accessed 2 Oct 2020]
Chen Z, Zhou Y. Research on automatic essay scoring of composition based on CNN and OR. In: 2019 2nd International Conference on Artificial Intelligence and Big Data (ICAIBD), IEEE, 2019;13–18.
Nisha K. Sentiment analysis of regional languages written in roman script on social media. Data Sci Intell Appl. 2021;52:113–9.
Sun C, Du Q, Tian G. Exploiting product related review features for fake review detection. Math Prob Eng. 2016;1:1–7.
Mukherjee A, Venkataraman V, Liu B, Glance N. Fake review detection: classification and analysis of real and pseudo reviews. 2013.
Wang Z, Zhang Y, Qian T. Fake review detection on Yelp. 2017.
Centor RM, Schwartz JS. An evaluation of methods for estimating the area under the receiver operating characteristic (ROC) curve. Med Decis Making. 1985;5:149–56.
Author information
Contributions
Mr. Hayat and Mr. Vardag collected the data and ran the experiments. Dr. Saeed and Mr. Ullah wrote the manuscript. Dr. Saeed also supervised the entire research project. Dr. Iqbal proofread the entire manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Hayat, U., Saeed, A., Vardag, M.H.K. et al. Roman Urdu Fake Reviews Detection Using Stacked LSTM Architecture. SN COMPUT. SCI. 3, 470 (2022). https://doi.org/10.1007/s42979-022-01385-6
DOI: https://doi.org/10.1007/s42979-022-01385-6