Roman Urdu Fake Reviews Detection Using Stacked LSTM Architecture

Umer Hayat¹,
Ali Saeed²,
Muhammad Humayon Khan Vardag¹,
Muhammad Farhat Ullah¹ &
…
Nadeem Iqbal³

Abstract

Fake review detection is a considerable challenge for e-commerce and other online business settings. The task aims to develop systems that can ensure the veracity of reviews. The research community has made a range of attempts to deal with this issue. Unfortunately, these attempts have targeted only a small set of languages, such as English, Arabic, and a few others. Roman Urdu is used on the web in the subcontinent, yet it has not been explored thoroughly for this task. Meanwhile, over the last few years, deep learning methods have proved very successful for diverse Natural Language Processing tasks, but they have not been explored for Roman Urdu fake review detection. To address this gap, this study makes a two-fold contribution: (1) construction of a novel Roman Urdu Fake Reviews Detection Corpus (RU-FRDC), which comprises 5150 annotated reviews, and (2) a comparison of various deep learning architectures, including Simple RNN, LSTM, GRU, Bi-LSTM, and Bi-GRU. The evaluation has been carried out using widely used evaluation measures, i.e., Precision, Recall, $F_1$, and AUC-ROC. The highest results were achieved using the stacked LSTM model (AUC-ROC = 0.943 and $F_1 = 0.88$).

Notes

https://www.yelp.com/dataset. Last visited: 03-Nov-2020.
Data Availability Statement: The complete code and dataset developed and/or evaluated during the current study are available at https://comsatsnlpgroup.wordpress.com/.
These experiments used Keras (https://keras.io/) to implement the deep learning models. According to the Keras specification, the Embedding layer can only be used as the first layer of a model, as is done in this study. The Keras Embedding layer was used to represent the input sequences as dense embeddings [40]. It takes the one-hot encoding of each review (sentence) [41] and maps the word vectors into a low-dimensional space. The resulting embeddings serve as the features for the stacked LSTM model described earlier. On top of the Embedding layer, the model has two stacked LSTM layers that use tanh as the activation function. Each of these hidden layers consists of a fixed 100 LSTM units.
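
The architecture described in this note can be illustrated with a minimal Keras sketch. Only the Embedding-first layout, the two stacked LSTM layers, the tanh activation, and the 100 units per layer come from the text. The vocabulary size, embedding dimension, sigmoid output layer, optimizer, and loss are assumptions made for illustration and are not the authors' reported settings. The helper `lstm_param_count` is likewise a hypothetical addition that applies the standard LSTM parameter formula to show how large each stacked layer would be.

```python
# Assumed hyperparameters (placeholders, not taken from the paper).
VOCAB_SIZE = 10_000  # assumed vocabulary size
EMBED_DIM = 64       # assumed embedding dimension
# Taken from the text: each LSTM layer has 100 units.
LSTM_UNITS = 100


def lstm_param_count(input_dim, units):
    """Trainable parameters of one LSTM layer.

    Four gates, each with input weights, recurrent weights, and a bias.
    """
    return 4 * (units * (input_dim + units) + units)


def build_stacked_lstm(vocab_size=VOCAB_SIZE, embed_dim=EMBED_DIM,
                       units=LSTM_UNITS):
    """Embedding -> LSTM -> LSTM -> sigmoid, for binary fake/genuine labels."""
    # Imported here so the helper above can be used without TensorFlow.
    from tensorflow import keras

    model = keras.Sequential([
        # The Embedding layer must be the first layer of the model.
        keras.layers.Embedding(input_dim=vocab_size, output_dim=embed_dim),
        # First LSTM returns the full sequence so the second can consume it.
        keras.layers.LSTM(units, activation="tanh", return_sequences=True),
        keras.layers.LSTM(units, activation="tanh"),
        # Assumed output layer: one sigmoid unit for the fake/genuine decision.
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    # Assumed training setup; the text does not state optimizer or loss.
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=[keras.metrics.AUC(name="auc")])
    return model
```

Under the assumed 64-dimensional embeddings, `lstm_param_count(64, 100)` gives 66,000 parameters for the first LSTM layer and `lstm_param_count(100, 100)` gives 80,400 for the second.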

References

Shashank K. Research on product review analysis and spam review detection. In: 4th International Conference on Signal Processing and Integrated Networks (SPIN), IEEE, 2017;390–393.
Zhu Y, Woo SS. Adversarial product review generation with word replacements. In: Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, 2018;2324–2326.
Zhang Z, Varadarajan B. Utility scoring of product reviews. In: Proceedings of the 15th ACM international conference on Information and knowledge management, 2006.
Jiménez FR, Mendoza NA. Too popular to ignore: the influence of online reviews on purchase intentions of search and experience products. J Interact Market. 2013;27(3):226–35.
Hossain F. Fake review detection using data mining, 2019.
Sun C, Du Q, Tian G. Exploiting product related review features for fake review detection. Math Prob Eng. 2016;2016:1–7.
Daud M, Mohibullah KRA. Roman Urdu opinion mining system (RUOMiS): Daud; 2015.
Mehmood K, Essam D, Shafi K. Sentiment analysis system for Roman Urdu. In: Proceedings of the 2018 Computing Conference, 2018.
Daud A, Khan W, Che D. Urdu language processing: a survey. Artif Intell Rev. 2016;279–311.
Kunchukuttan A, Mehta P, Bhattacharyya P. The IIT Bombay English-Hindi Parallel Corpus. In: Proceedings of the eleventh international conference on language resources and evaluation (LREC 2018), Miyazaki, 2018.
Dematis I, Karapistoli E, Vakali A. Fake review detection via exploitation of spam indicators and reviewer behavior characteristics. In: SOFSEM 2018: theory and practice of computer science, 2017.
Elmurngi E, Gherbi A. Fake reviews detection on movie reviews through sentiment analysis using. Int J Adv Syst Measure. 2018;11:196–207.
Alsubari S, Shelke M, Deshmukh S. Fake reviews identification based on deep computational linguistic. Int J Adv Sci Technol. 2020;29:3846–56.
El-Haless A, Hammad A. An approach for detecting spam in Arabic. Int Arab J Inf Technol. 2015;12:9–16.
Li H, Zhiyuan C, Bing L, Xiaokai W, Jidong S. Spotting fake reviews via collective. In: IEEE international conference on data mining, ICDM 2015., 2014.
Ott M, Cardie C, Hancock JT. Negative deceptive opinion spam. 2013;497–501.
Jindal N, Liu B. Review spam detection. In: Proceedings of the 16th international conference on World Wide Web, 2007.
Rastogi A, Mehrotra M. Impact of behavioral and textual features on opinion spam detection. In: IEEE International Conference on Intelligent Computing and Control Systems, Madurai, 2018.
Wu X, Dong Y, Tao J, Huang C, Chawla NV. Reliable fake review detection via modeling temporal and behavioral patterns. In: IEEE International Conference on Big Data, Boston, 2017.
Phayung M. Thai fake news detection based on information retrieval, natural language processing and machine learning. SN Comput Sci. 2021;2(6):1–17.
Vasantharajan TC, Uthayasanker. Towards offensive language identification for Tamil code-mixed youtube comments and posts. SN Comput Sci. 2022;3(1):1–13.
Surana S, Dembla S, Bihani P. Identifying contradictions in the legal proceedings using natural language models. SN Comput Sci. 2022;3(3):1–14.
Baishya D, Deka JJ, Dey G, Singh PK. SAFER: sentiment analysis-based fake review detection in e-commerce using deep learning. SN Comput Sci. 2021;2(6):1–12.
Dadhich A, Thankachan B. Social and juristic challenges of AI for opinion mining approaches on Amazon and Flipkart product reviews using machine learning algorithms. SN Comput Sci. 2021;2(3):1–21.
Shojaee S, Azman A, Sharef N, Sulaiman N. A framework for fake review annotation. In: 17th UKSIM-AMSS IEEE International Conference on Modelling and Simulation, 2015.
Jindal N, Liu B. Opinion spam and analysis. In: Proceedings of the International Conference on Web Search and Web Data Mining, ACM, New York, NY, USA, 2008.
Ong T, Mannino M, Gregg D. Linguistic characteristics of shill reviews. Electron Commer Res Appl. 2013;13:69–78.
Viera AJ, Garrett JM. Understanding interobserver agreement: the kappa statistic. Fam Med. 2005;37:360–363.
Alberto TC, Lochter JV, Almeida TA. Tubespam: comment spam filtering on youtube. In: 2015 IEEE 14th international conference on machine learning and applications (ICMLA) 2015, December;138–143. IEEE.
Khan L, Amjad A, Ashraf N, Chang H-T, Gelbukh A. Urdu sentiment analysis with deep learning methods. IEEE Access. 2021;9:97803–12.
Xu J-M, Kwang-Sung J, Xiaojin Z, Amy B. Learning from bullying traces in social media. In: Proceedings of the 2012 conference of the North American chapter of the association for computational linguistics: Human language technologies, 2012;656–666.
Sharif O, Hoque MM, Hossain E. Sentiment analysis of Bengali texts on online restaurant reviews using multinomial Naïve Bayes. In: 2019 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT) 2019 May;1–6. IEEE.
Monteiro RA, Santos RL, Pardo TA, Almeida TAD, Ruiz EE, Vale OA. Contributions to the study of fake news in portuguese: new corpus and automatic detection results. In: International Conference on Computational Processing of the Portuguese Language 2018, September;324–334. Springer, Cham.
Saumya S, Singh JP. Spam review detection using LSTM autoencoder: an unsupervised approach. Electron Commer Res. 2020;22:1–21.
Ko M-C, Huang H-H, Chen H-H. Paid review and paid writer detection. In: Proceedings of the International Conference on Web Intelligence, 2017.
Fornaciari T, Cagnina L, Rosso P, Poesio M. Fake opinion detection: how similar are crowdsourced datasets to real data? Lang Resour Eval. 2020;54(4):1019–58.
Alsubari SN, Deshmukh SN, Al-Adhaileh MH, Alsaade FW, Aldhyani TH. Development of integrated neural network model for identification of fake reviews in e-commerce using multidomain datasets. Appl Bionics Biomech. 2021;2021:1–11.
Fang W, Zhang J, Wang D, Chen Z, Li M. Entity disambiguation by knowledge and text jointly embedding. In: Proceedings of the 20th SIGNLL conference on computational natural language learning 2016, August;260–269.
Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–80.
Keras. Embedding layer. Keras.io. Available: https://keras.io/api/layers/core_layers/embedding/. [Accessed 2 Oct 2020]
Chen Z, Zhou Y. Research on automatic essay scoring of composition based on CNN and OR. In: 2019 2nd International Conference on Artificial Intelligence and Big Data (ICAIBD) 2019, May;13–18. IEEE.
Nisha K. Sentiment analysis of regional languages written in roman script on social media. Data Sci Intell Appl. 2021;52:113–9.
Sun C, Du Q, Tian G. Exploiting product related review features for fake review detection. Math Prob Eng. 2016;1:1–7.
Mukherjee A, Venkataraman V, Liu B, Glance N. Fake review detection: classification and analysis of real and pseudo reviews. 2013.
Wang Z, Zhang Y, Qian T. Fake review detection on Yelp. 2017.
Centor RM, Schwartz JS. An evaluation of methods for estimating the area under the receiver operating characteristic (ROC) curve. Med Decis Making. 1985;5:149–56.

Author information

Authors and Affiliations

Department of Software Engineering, University of Lahore, Lahore, Pakistan
Umer Hayat, Muhammad Humayon Khan Vardag & Muhammad Farhat Ullah
Department of Software Engineering, University of Central Punjab, Lahore, Pakistan
Ali Saeed
Department of Computer Science and IT, University of Lahore, Lahore, Pakistan
Nadeem Iqbal

Contributions

Mr. Hayat and Mr. Vardag collected the data and executed the experiments. Dr. Saeed and Mr. Ullah wrote the manuscript. Further, Dr. Saeed supervised the entire research project. Dr. Iqbal proofread the entire manuscript.

Corresponding author

Correspondence to Muhammad Farhat Ullah.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hayat, U., Saeed, A., Vardag, M.H.K. et al. Roman Urdu Fake Reviews Detection Using Stacked LSTM Architecture. SN COMPUT. SCI. 3, 470 (2022). https://doi.org/10.1007/s42979-022-01385-6

Received: 19 December 2021
Accepted: 21 August 2022
Published: 09 September 2022
DOI: https://doi.org/10.1007/s42979-022-01385-6
