Roman Urdu Fake Reviews Detection Using Stacked LSTM Architecture

  • Original Research
SN Computer Science

Abstract

Fake review detection is a considerable challenge for e-commerce and other online business settings. The task aims to develop systems that can ensure the veracity of reviews. The research community has made a range of attempts to deal with this issue, but these attempts have been geared to only a small set of languages, such as English and Arabic. In the subcontinent, Roman Urdu is used on the web, yet it has not been explored thoroughly for this task. Over the last few years, deep learning methods have proved very successful for diverse Natural Language Processing tasks, but they have not been explored for Roman Urdu fake review detection. To address this gap, this study makes a two-fold contribution: (1) construction of a novel Roman Urdu Fake Reviews Detection Corpus (RU-FRDC), which comprises 5150 annotated reviews, and (2) a comparison of various deep learning architectures, including Simple RNN, LSTM, GRU, Bi-LSTM, and Bi-GRU. The evaluation has been carried out using widely used measures, i.e., Precision, Recall, \(F_1\), and AUC-ROC. The highest results were achieved using the stacked LSTM model (AUC-ROC = 0.943 and \(F_1 = 0.88\)).
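The evaluation measures named in the abstract can be illustrated with a minimal sketch. The function names, the label convention (1 = fake, 0 = genuine), the 0.5 decision threshold, and the toy values below are assumptions made for illustration; they are not taken from the study or from RU-FRDC. The area under the ROC curve is computed here as the fraction of (fake, genuine) pairs in which the fake review receives the higher score, with ties counted as one half.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive class (assumed: 1 = fake)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def auc_roc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # A positive scored above a negative counts 1, a tie counts 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example with made-up labels and model scores.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = auc_roc(y_true, scores)
print(f"precision={precision:.3f} recall={recall:.3f} "
      f"f1={f1:.3f} auc={auc:.3f}")
```

On this toy input the script prints `precision=0.667 recall=0.667 f1=0.667 auc=0.889`. The pairwise form is quadratic in the number of reviews, which is acceptable for a sketch; a rank-based computation would scale better.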

Notes

  1. https://www.yelp.com/dataset. Last visited: 03-Nov-2020.

  2. Data Availability Statement: The complete code and dataset developed and/or evaluated during the current study are available at https://comsatsnlpgroup.wordpress.com/.

  3. These experiments used Keras (https://keras.io/) to implement the deep learning models. According to the Keras specification, the Embedding layer can only be used as the first layer of a model, which is how it is used in this study. The Keras Embedding layer represents the sequences as dense embeddings [40]: it takes the one-hot encoding of each review (sentence) [41] and maps the word vectors into a low-dimensional space. The embeddings produced by this layer serve as the features for the stacked LSTM model described earlier. On top of the Embedding layer, the model includes two stacked LSTM layers that use tanh as the activation function. Each hidden layer consists of a fixed 100 LSTM units.
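The architecture described in Note 3 can be sketched in Keras as follows. This is a minimal illustration, not the authors' released code: the function name `build_stacked_lstm`, the vocabulary size, the embedding dimension, the maximum review length, the sigmoid output layer, the Adam optimizer, and the binary cross-entropy loss are all assumptions, since the note specifies only the Embedding layer, the two LSTM layers, the tanh activation, and the 100 units per layer.

```python
def build_stacked_lstm(vocab_size=5000, embed_dim=64, max_len=50):
    """Embedding -> LSTM(100) -> LSTM(100) -> sigmoid output.

    vocab_size, embed_dim and max_len are hypothetical defaults;
    the note does not state the values used in the study.
    """
    # Imported inside the function so the module loads without TensorFlow.
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        # Each review is a padded sequence of integer word indices.
        keras.Input(shape=(max_len,), dtype="int32"),
        # The Embedding layer comes first and yields dense word vectors.
        layers.Embedding(input_dim=vocab_size, output_dim=embed_dim),
        # First LSTM layer returns the full sequence so that a second
        # LSTM layer can be stacked on top of it.
        layers.LSTM(100, activation="tanh", return_sequences=True),
        # Second LSTM layer returns only its final hidden state.
        layers.LSTM(100, activation="tanh"),
        # Assumed output layer: probability that the review is fake.
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        optimizer="adam",                       # assumed optimizer
        loss="binary_crossentropy",             # assumed loss
        metrics=[keras.metrics.AUC(name="auc")],
    )
    return model
```

Calling `build_stacked_lstm().summary()` prints the layer stack. Setting `return_sequences=True` on the first LSTM layer is what makes the stacking possible, because the second LSTM layer expects one input vector per time step rather than a single final state.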

References

  1. Shashank K. Research on product review analysis and spam review detection. In: 4th International Conference on Signal Processing and Integrated Networks (SPIN), IEEE, 2017;390–393.

  2. Zhu Y, Woo SS. Adversarial product review generation with word replacements. In: Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, 2018;2324–2326.

  3. Zhang Z, Varadarajan B. Utility scoring of product reviews. In: Proceedings of the 15th ACM international conference on Information and knowledge management, 2006.

  4. Jiménez FR, Mendoza NA. Too popular to ignore: the influence of online reviews on purchase intentions of search and experience products. J Interact Market. 2013;27(3):226–35.

  5. Hossain F. Fake review detection using data mining, 2019.

  6. Sun C, Du Q, Tian G. Exploiting product related review features for fake review detection. Math Prob Eng. 2016;2016:1–7.

  7. Daud M, Mohibullah KRA. Roman Urdu opinion mining system (RUOMiS). 2015.

  8. Mehmood K, Essam D, Shafi K. Sentiment analysis system for Roman Urdu. In: Proceedings of the 2018 Computing Conference, 2018.

  9. Daud A, Khan W, Che D. Urdu language processing: a survey. Artif Intell Rev. 2016;279–311.

  10. Kunchukuttan A, Mehta P, Bhattacharyya P. The IIT Bombay English-Hindi Parallel Corpus. In: Proceedings of the eleventh international conference on language resources and evaluation (LREC 2018), Miyazaki, 2018.

  11. Dematis I, Karapistoli E, Vakali A. Fake review detection via exploitation of spam indicators and reviewer behavior characteristics. In: SOFSEM 2018: theory and practice of computer science, 2017.

  12. Elmurngi E, Gherbi A. Fake reviews detection on movie reviews through sentiment analysis using. Int J Adv Syst Measure. 2018;11:196–207.

  13. Alsubari S, Shelke M, Deshmukh S. Fake reviews identification based on deep computational linguistic. Int J Adv Sci Technol. 2020;29:3846–56.

  14. El-Haless A, Hammad A. An approach for detecting spam in Arabic. Int Arab J Inf Technol. 2015;12:9–16.

  15. Li H, Chen Z, Liu B, Wei X, Shao J. Spotting fake reviews via collective. In: IEEE international conference on data mining, ICDM 2015, 2014.

  16. Ott M, Cardie C, Hancock JT. Negative deceptive opinion spam. 2013;497–501.

  17. Jindal N, Liu B. Review spam detection. In: Proceedings of the 16th international conference on World Wide Web, 2007.

  18. Rastogi A, Mehrotra M. Impact of behavioral and textual features on opinion spam detection. In: IEEE International Conference on Intelligent Computing and Control Systems, Madurai, 2018.

  19. Wu X, Dong Y, Tao J, Huang C, Chawla NV. Reliable fake review detection via modeling temporal and behavioral patterns. In: IEEE International Conference on Big Data, Boston, 2017.

  20. Phayung M. Thai fake news detection based on information retrieval, natural language processing and machine learning. SN Comput Sci. 2021;2(6):1–17.

  21. Vasantharajan TC, Uthayasanker. Towards offensive language identification for Tamil code-mixed youtube comments and posts. SN Comput Sci. 2022;3(1):1–13.

  22. Surana S, Dembla S, Bihani P. Identifying contradictions in the legal proceedings using natural language models. SN Comput Sci. 2022;3(3):1–14.

  23. Baishya D, Deka JJ, Dey G, Singh PK. SAFER: sentiment analysis-based fake review detection in e-commerce using deep learning. SN Comput Sci. 2021;2(6):1–12.

  24. Dadhich A, Thankachan B. Social and juristic challenges of AI for opinion mining approaches on Amazon and Flipkart product reviews using machine learning algorithms. SN Comput Sci. 2021;2(3):1–21.

  25. Shojaee S, Azman A, Sharef N, Sulaiman N. A framework for fake review annotation. In: 17th UKSIM-AMSS IEEE International Conference on Modelling and Simulation, 2015.

  26. Jindal N, Liu B. Opinion spam and analysis. In: Proceedings of the International Conference on Web Search and Web Data Mining, ACM, New York, NY, USA, 2008.

  27. Ong T, Mannino M, Gregg D. Linguistic characteristics of shill reviews. Electron Commer Res Appl. 2013;13:69–78.

  28. Viera AJ, Garrett JM. Understanding interobserver agreement: the kappa statistic. Fam Med. 2005;37:360–3.

  29. Alberto TC, Lochter JV, Almeida TA. Tubespam: comment spam filtering on youtube. In: 2015 IEEE 14th international conference on machine learning and applications (ICMLA) 2015, December;138–143. IEEE.

  30. Khan L, Amjad A, Ashraf N, Chang H-T, Gelbukh A. Urdu sentiment analysis with deep learning methods. IEEE Access. 2021;9:97803–12.

  31. Xu J-M, Jun K-S, Zhu X, Bellmore A. Learning from bullying traces in social media. In: Proceedings of the 2012 conference of the North American chapter of the association for computational linguistics: Human language technologies, 2012;656–666.

  32. Sharif O, Hoque MM, Hossain E. Sentiment analysis of Bengali texts on online restaurant reviews using multinomial Naïve Bayes. In: 2019 1st International Conference on Advances in Science, Engineering and Robotics Technology (ICASERT) 2019 May;1–6. IEEE.

  33. Monteiro RA, Santos RL, Pardo TA, Almeida TAD, Ruiz EE, Vale OA. Contributions to the study of fake news in portuguese: new corpus and automatic detection results. In: International Conference on Computational Processing of the Portuguese Language 2018, September;324–334. Springer, Cham.

  34. Saumya S, Singh JP. Spam review detection using LSTM autoencoder: an unsupervised approach. Electron Commer Res. 2020;22:1–21.

  35. Ko M-C, Huang H-H, Chen H-H. Paid review and paid writer detection. In: Proceedings of the International Conference on Web Intelligence, 2017.

  36. Fornaciari T, Cagnina L, Rosso P, Poesio M. Fake opinion detection: how similar are crowdsourced datasets to real data? Lang Resour Eval. 2020;54(4):1019–58.

  37. Alsubari SN, Deshmukh SN, Al-Adhaileh MH, Alsaade FW, Aldhyani TH. Development of integrated neural network model for identification of fake reviews in e-commerce using multidomain datasets. Appl Bionics Biomech. 2021;2021:1–11.

  38. Fang W, Zhang J, Wang D, Chen Z, Li M. Entity disambiguation by knowledge and text jointly embedding. In: Proceedings of the 20th SIGNLL conference on computational natural language learning 2016, August;260–269.

  39. Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. 1997;9(8):1735–80.

  40. Keras. Embedding layer. Keras.io. Available: https://keras.io/api/layers/core_layers/embedding/. [Accessed 2 Oct 2020]

  41. Chen Z, Zhou Y. Research on automatic essay scoring of composition based on CNN and OR. In: 2019 2nd International Conference on Artificial Intelligence and Big Data (ICAIBD) 2019, May;13–18. IEEE.

  42. Nisha K. Sentiment analysis of regional languages written in roman script on social media. Data Sci Intell Appl. 2021;52:113–9.

  43. Sun C, Du Q, Tian G. Exploiting product related review features for fake review detection. Math Prob Eng. 2016;2016:1–7.

  44. Mukherjee A, Venkataraman V, Liu B, Glance N. Fake review detection: classification and analysis of real and pseudo reviews. 2013.

  45. Wang Z, Zhang Y, Qian T. Fake review detection on Yelp. 2017.

  46. Centor RM, Schwartz JS. An evaluation of methods for estimating the area under the receiver operating characteristic (ROC) curve. Med Decis Making. 1985;5:149–56.

Author information

Contributions

Mr. Hayat and Mr. Vardag collected the data and executed the experiments. Dr. Saeed and Mr. Ullah wrote the manuscript. Dr. Saeed also supervised the entire research project. Dr. Iqbal proofread the entire manuscript.

Corresponding author

Correspondence to Muhammad Farhat Ullah.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hayat, U., Saeed, A., Vardag, M.H.K. et al. Roman Urdu Fake Reviews Detection Using Stacked LSTM Architecture. SN COMPUT. SCI. 3, 470 (2022). https://doi.org/10.1007/s42979-022-01385-6

  • DOI: https://doi.org/10.1007/s42979-022-01385-6
