From Customer’s Voice to Decision-Maker Insights: Textual Analysis Framework for Arabic Reviews of Saudi Arabia’s Super App
Figure 1. The Arabic Super App User Experience Framework (ASA-UXF).
Figure 2. Sentiment distribution of the sample dataset.
Figure 3. Sentiment distribution of the Mrsool dataset.
Figure 4. The most used words in the Mrsool dataset: (a) word frequencies and (b) word cloud.
Figure 5. Monthly number of reviews combined with users’ sentiments for Mrsool.
Figure 6. The most used words for (a) positive, (b) negative, and (c) neutral sentiment in Mrsool.
Figure 7. The extracted (a) topic 0, (b) topic 1, (c) topic 2, and (d) topic 3 using BTM.
Figure 8. Topic distribution in the Mrsool dataset.
Figure 9. Topic-based sentiment analysis of the Mrsool dataset.
Figure 10. The elbow method for (a) topic 0, (b) topic 1, (c) topic 2, and (d) topic 3.
Figure 11. Clustering topics using (a) Euclidean distance and (b) cosine similarity.
Figure 12. Clustering topics using (a) POS with Euclidean distance and (b) POS with cosine similarity.
Figure 13. Topic distribution in the Careem dataset.
Figure 14. Topic-based sentiment analysis of the Careem dataset.
Abstract
1. Introduction
- We propose a comprehensive framework that uses users’ reviews to give decision-makers a clear, visual overview of users’ opinions about their super app that is quick and easy to understand. This is achieved through two machine learning approaches: topic modeling combined with sentiment analysis, and topic modeling combined with clustering.
- To the best of our knowledge, this is the first study to provide a framework for analyzing Arabic user reviews of a business-sector super app in Saudi Arabia using topic modeling, sentiment analysis, and clustering.
- We create an Arabic dataset of user reviews collected from the Apple App Store and the Google Play Store for several of the best-known super apps in the KSA.
2. Related Work
2.1. Sentiment Analysis
2.2. Topic Modeling
2.3. Clustering
3. Materials and Methods
3.1. Data Processing
3.1.1. Data Collection
3.1.2. Data Annotation
3.1.3. Data Preprocessing
- We removed unnecessary data such as non-Arabic characters, numbers, punctuation, emojis, symbols, extra whitespace, empty rows, and duplicate reviews. Furthermore, reviews shorter than two words were removed from the dataset, since a single word does not form a useful sentence and may confuse the model.
- We tokenized the reviews, splitting each one into words, to support their subsequent analysis.
- We removed stop words, i.e., very frequent words such as “in”, “on”, and “from” that add no semantic information useful to the model, using the NLTK package (https://www.nltk.org/search.html?q=stopwords) (accessed on 3 June 2023). In addition, following [41,42], we removed proper nouns such as the application name and the name of God (Allah), because they are irrelevant to the concepts underlying the topics and may instead confuse the model; it is therefore advisable to eliminate them before further analysis.
- We normalized the reviews to transform the text into a standard format. Additionally, we removed Tatweel (the character used to elongate Arabic letters) as well as words and characters repeated to emphasize an opinion, and we removed Tashkeel (the diacritical marks used in Arabic script). These steps were applied with the help of the pyArabic tool (https://github.com/linuxscout/pyarabic/tree/master (accessed on 3 June 2023)).
- We applied lemmatization to reduce words to their stems or roots. Several stemmers and lemmatizers were compared, as shown in Table 3, and the Qalsadi Lemmatizer was chosen for the framework because it produced accurate output in a shorter time.
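The preprocessing steps above can be approximated with Python's standard `re` module. The sketch below is illustrative rather than the authors' implementation: hand-written regular expressions stand in for pyArabic and NLTK, the stop-word and proper-noun sets are tiny hypothetical samples, tokenization is assumed to be whitespace splitting, the alef unification in `normalize` is an assumed normalization rule (the rules actually applied are not listed here), and the Qalsadi lemmatization step is omitted.

```python
import re

# Hypothetical word lists for illustration only; the framework uses NLTK's
# stop-word list and its own list of proper nouns.
STOP_WORDS = {"في", "على", "من"}   # "in", "on", "from"
PROPER_NOUNS = {"مرسول", "الله"}    # the app name (Mrsool) and "Allah"

TASHKEEL = re.compile(r"[\u064B-\u0652]")          # diacritics (fathatan .. sukun)
TATWEEL = "\u0640"                                  # elongation character (kashida)
REPEATS = re.compile(r"(.)\1{2,}")                  # a character repeated 3+ times
ALEF_FORMS = re.compile(r"[\u0622\u0623\u0625]")    # alef with madda or hamza
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")      # digits, Latin, emojis, punctuation


def normalize(text: str) -> str:
    """Strip Tashkeel and Tatweel, collapse emphatic repeats, unify alef forms."""
    text = TASHKEEL.sub("", text).replace(TATWEEL, "")
    text = REPEATS.sub(r"\1", text)
    return ALEF_FORMS.sub("\u0627", text)  # replace with bare alef


def preprocess(reviews: list[str], min_words: int = 2) -> list[list[str]]:
    """Clean, deduplicate, tokenize, and filter raw reviews (no lemmatization)."""
    removable = {normalize(w) for w in STOP_WORDS | PROPER_NOUNS}
    seen: set[str] = set()
    result = []
    for review in reviews:
        tokens = NON_ARABIC.sub(" ", normalize(review)).split()  # whitespace tokens
        key = " ".join(tokens)
        if len(tokens) < min_words or key in seen:
            continue  # drop one-word reviews and duplicates
        seen.add(key)
        result.append([t for t in tokens if t not in removable])
    return result


reviews = [
    "التوصيل سريع جداً 👍",          # "Delivery is very fast"
    "ممتاز",                          # single word: dropped
    "التوصيل سريع جداً 👍",          # duplicate: dropped
    "الأسعار غاليييييية في مرسول",    # "Prices are expensive in Mrsool"
]
print(preprocess(reviews))
# [['التوصيل', 'سريع', 'جدا'], ['الاسعار', 'غالية']]
```

Duplicates are detected after cleaning in this sketch, so two reviews that differ only in emojis or diacritics are treated as the same review.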
3.1.4. Document Embedding
3.2. Topic Modeling
3.3. Clustering
3.4. Visualization
3.5. Evaluation Method
3.6. Finding the Optimal Parameters
- Grid search is a systematic technique for searching the hyperparameter space: it generates and evaluates every possible combination of candidate values, regardless of each parameter’s influence on the optimization process. Because every parameter is given an equal chance of influencing the result, the method offers some assurance that no combination within the grid is overlooked. However, it can require substantial computation and time when several parameters, each with multiple values, are involved [58,59].
- The elbow method is important for unsupervised methods such as k-means clustering, where the number of clusters must be fixed in advance because the algorithm cannot determine it on its own. The elbow method is one way to choose this number. It runs the clustering for a range of k values and, for each k, computes the WCSS (within-cluster sum of squares), i.e., the sum of the squared distances between each sample point and the centroid of its cluster. Plotting WCSS against k gives a curve that decreases as k grows; the optimal k is taken at the inflection point (the “elbow”) after which the decrease slows sharply [60].
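The elbow procedure can be sketched in pure Python, where WCSS(k) is the sum, over all clusters, of the squared Euclidean distances between each point and its cluster centroid. This is a minimal illustration under stated assumptions, not the framework's code: k-means is a small hand-written Lloyd's implementation with deterministic farthest-point seeding (a library implementation would normally be used), and the elbow is located with a largest-second-difference heuristic, which is only one of several ways to formalize what is usually read visually from the k-WCSS plot.

```python
Point = tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: list[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def kmeans(points: list[Point], k: int, max_iter: int = 100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:  # next seed: the point farthest from existing seeds
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(max_iter):
        # assign every point to its nearest centroid
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(centroid(members) if members else centroids[j])
        if updated == centroids:
            break  # converged
        centroids = updated
    return centroids, labels


def wcss(points, centroids, labels) -> float:
    """Within-cluster sum of squares: sum_j sum_{x in C_j} ||x - mu_j||^2."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


def elbow(points: list[Point], k_max: int) -> tuple[int, list[float]]:
    """Return (chosen k, WCSS curve for k = 1..k_max); needs k_max >= 3."""
    curve = [wcss(points, *kmeans(points, k)) for k in range(1, k_max + 1)]

    def bend(k: int) -> float:  # second difference at k; curve[k - 1] is WCSS(k)
        return (curve[k - 2] - curve[k - 1]) - (curve[k - 1] - curve[k])

    return max(range(2, k_max), key=bend), curve


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
k, curve = elbow(pts, k_max=5)
print(k)                               # 3
print([round(w, 2) for w in curve])    # [804.0, 304.0, 4.0, 3.17, 2.33]
```

On three well-separated groups of three points, WCSS drops sharply up to k = 3 and then flattens, so the heuristic selects k = 3.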
4. Results
4.1. General Analysis
4.2. Topic Modeling and Sentiment Analysis
4.3. Topic Modeling and Clustering
Optimization
5. Testing
6. Discussion
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Statista. Number of Smartphone Users Worldwide from 2014 to 2029. 2024. Available online: https://www.statista.com/forecasts/1143723/smartphone-users-in-the-world (accessed on 5 March 2024).
- Data.ai. The State of Mobile 2023: How to Navigate This Uncertain Year. 2023. Available online: https://www.data.ai/en/go/state-of-mobile-2023/ (accessed on 6 March 2024).
- Statista. App—Saudi Arabia. 2024. Available online: https://www.statista.com/outlook/amo/app/saudi-arabia (accessed on 6 March 2024).
- Muhammed, R. 50+ Eye-Opening UX Statistics That Prove UX Matters! 2023. Available online: https://www.wowmakers.com/blog/ux-statistics/ (accessed on 7 March 2024).
- Berni, A.; Borgianni, Y. From the definition of user experience to a framework to classify its applications in design. Proc. Des. Soc. 2021, 1, 1627–1636.
- Perri, L. What’s New in the 2022 Gartner Hype Cycle for Emerging Technologies. Gartner Insights. 2022. Available online: https://www.gartner.com/en/articles/what-s-new-in-the-2022-gartner-hype-cycle-for-emerging-technologies (accessed on 17 May 2023).
- Statista. Super Apps—Statistics & Facts. 2024. Available online: https://www.statista.com/topics/10296/super-apps/ (accessed on 9 March 2024).
- Diaz Baquero, A.P. Super Apps: Opportunities and Challenges. Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 2021.
- Ota, F.K.C.; Meira, J.A.; Frank, R.; State, R. Towards Privacy Preserving Data Centric Super App. In Proceedings of the 2020 Mediterranean Communication and Computer Networking Conference (MedComNet), Arona, Italy, 17–19 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–4.
- Roa, L.; Correa-Bahnsen, A.; Suarez, G.; Cortés-Tejada, F.; Luque, M.A.; Bravo, C. Super-app behavioral patterns in credit risk models: Financial, statistical and regulatory implications. Expert Syst. Appl. 2021, 169, 114486.
- Roa, L.; Rodríguez-Rey, A.; Correa-Bahnsen, A.; Arboleda, C.V. Supporting financial inclusion with graph machine learning and super-app alternative data. In Intelligent Systems and Applications, Proceedings of the 2021 Intelligent Systems Conference (IntelliSys), Amsterdam, The Netherlands, 2–3 September 2021; Springer: Cham, Switzerland, 2022; Volume 2, pp. 216–230.
- Airlangga, M.C.; Sulasikin, A.; Nugraha, Y.; Husna, N.L.R.; Aminanto, M.E.; Kurniawan, F.; Kanggrawan, J.I. Understanding Citizen Feedback of Jakarta Government Super App: Leveraging Deep Learning Models. In Proceedings of the 2023 IEEE International Smart Cities Conference (ISC2), Bucharest, Romania, 24–27 September 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–6.
- Nakamura, W.T.; de Oliveira, E.C.; de Oliveira, E.H.; Redmiles, D.; Conte, T. What factors affect the UX in mobile apps? A systematic mapping study on the analysis of app store reviews. J. Syst. Softw. 2022, 193, 111462.
- Jain, P.K.; Pamula, R.; Srivastava, G. A systematic literature review on machine learning applications for consumer sentiment analysis using online reviews. Comput. Sci. Rev. 2021, 41, 100413.
- Kwon, H.J.; Ban, H.J.; Jun, J.K.; Kim, H.S. Topic modeling and sentiment analysis of online review for airlines. Information 2021, 12, 78.
- Rafea, A.; GabAllah, N.A. Topic detection approaches in identifying topics and events from Arabic corpora. Procedia Comput. Sci. 2018, 142, 270–277.
- Ahmed, M.; Seraj, R.; Islam, S.M.S. The k-means algorithm: A comprehensive survey and performance evaluation. Electronics 2020, 9, 1295.
- Ethnologue. What Are the Top 200 Most Spoken Languages? 2023. Available online: https://www.ethnologue.com/insights/ethnologue200/ (accessed on 12 March 2024).
- UNESCO. World Arabic Language Day. 2023. Available online: https://www.unesco.org/en/world-arabic-language-day (accessed on 12 March 2024).
- Ramzy, M.; Ibrahim, B. User satisfaction with Arabic COVID-19 apps: Sentiment analysis of users’ reviews using machine learning techniques. Inf. Process. Manag. 2024, 61, 103644.
- Badaro, G.; Baly, R.; Hajj, H.; El-Hajj, W.; Shaban, K.B.; Habash, N.; Al-Sallab, A.; Hamdi, A. A survey of opinion mining in Arabic: A comprehensive system perspective covering challenges and advances in tools, resources, models, applications, and visualizations. ACM Trans. Asian Low-Resour. Lang. Inf. Process. (TALLIP) 2019, 18, 1–52.
- Nassif, A.B.; Elnagar, A.; Shahin, I.; Henno, S. Deep learning for Arabic subjective sentiment analysis: Challenges and research opportunities. Appl. Soft Comput. 2021, 98, 106836.
- Sarker, I.H. Machine learning: Algorithms, real-world applications and research directions. SN Comput. Sci. 2021, 2, 160.
- Pilliang, M.; Akbar, H.; Firmansyah, G. Sentiment analysis for super applications in Indonesia: A case study of Gov2Go App. In Proceedings of the 2022 3rd International Conference on Electrical Engineering and Informatics (ICon EEI), Virtual Conference, 19–20 August 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 80–85.
- Aji, S.; Hidayatun, N.; Faqih, H. The sentiment analysis of Fintech users using support vector machine and particle swarm optimization method. In Proceedings of the 2019 7th International Conference on Cyber and IT Service Management (CITSM), Jakarta, Indonesia, 6–8 November 2019; IEEE: Piscataway, NJ, USA, 2019; Volume 7, pp. 1–5.
- Al-Hagree, S.; Al-Gaphari, G. Arabic Sentiment Analysis Based Machine Learning for Measuring User Satisfaction with Banking Services’ Mobile Applications: Comparative Study. In Proceedings of the 2022 2nd International Conference on Emerging Smart Technologies and Applications (eSmarTA), Ibb, Yemen, 25–26 October 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–4.
- Hadwan, M.; Al-Hagery, M.; Al-Sarem, M.; Saeed, F. Arabic sentiment analysis of users’ opinions of governmental mobile applications. Comput. Mater. Contin. 2022, 72, 4675–4689.
- Hadwan, M.; Al-Sarem, M.; Saeed, F.; Al-Hagery, M.A. An improved sentiment classification approach for measuring user satisfaction toward governmental services’ mobile apps using machine learning methods with feature engineering and SMOTE technique. Appl. Sci. 2022, 12, 5547.
- Banjabi, D.; Almezeini, N. Customer Satisfaction Toward Commercial E-Services in Saudi Arabia: A Sentiment Analysis. In Proceedings of the 2023 International Symposium on Networks, Computers and Communications (ISNCC), Doha, Qatar, 23–26 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–6.
- Al-Smadi, F.; Al-Shboul, B.; Al-Darras, D.; Al-Qudah, D. Aspect-Based Sentiment Analysis of Arabic Restaurants Customers’ Reviews Using a Hybrid Approach. In Proceedings of the 14th International Conference on Management of Digital EcoSystems, Venice, Italy, 19–21 October 2022; pp. 123–128.
- Vayansky, I.; Kumar, S.A. A review of topic modeling methods. Inf. Syst. 2020, 94, 101582.
- Abdelrazek, A.; Eid, Y.; Gawish, E.; Medhat, W.; Hassan, A. Topic modeling algorithms and applications: A survey. Inf. Syst. 2023, 112, 102131.
- Kang, H.J.; Kim, C.; Kang, K. Analysis of the trends in biochemical research using latent Dirichlet allocation (LDA). Processes 2019, 7, 379.
- Sarker, M.R.I.; Matin, A. A hybrid collaborative recommendation system based on matrix factorization and deep neural network. In Proceedings of the 2021 International Conference on Information and Communication Technology for Sustainable Development (ICICT4SD), Dhaka, Bangladesh, 27–28 February 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 371–374.
- Rajendran, D.P.D.; Sundarraj, R.P. Using topic models with browsing history in hybrid collaborative filtering recommender system: Experiments with user ratings. Int. J. Inf. Manag. Data Insights 2021, 1, 100027.
- Tushev, M.; Ebrahimi, F.; Mahmoud, A. Domain-specific analysis of mobile app reviews using keyword-assisted topic models. In Proceedings of the 44th International Conference on Software Engineering, Pittsburgh, PA, USA, 25–27 May 2022; pp. 762–773.
- Abuzayed, A.; Al-Khalifa, H. BERT for Arabic topic modeling: An experimental study on BERTopic technique. Procedia Comput. Sci. 2021, 189, 191–194.
- Singh, S.; Chauhan, T.; Wahi, V.; Meel, P. Mining tourists’ opinions on popular Indian tourism hotspots using sentiment analysis and topic modeling. In Proceedings of the 2021 5th International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, 8–10 April 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1306–1313.
- Sutherland, I.; Sim, Y.; Lee, S.K.; Byun, J.; Kiatkawsin, K. Topic modeling of online accommodation reviews via latent Dirichlet allocation. Sustainability 2020, 12, 1821.
- Hu, N.; Zhang, T.; Gao, B.; Bose, I. What do hotel customers complain about? Text analysis using structural topic model. Tour. Manag. 2019, 72, 417–426.
- Kiatkawsin, K.; Sutherland, I.; Kim, J.Y. A comparative automated text analysis of Airbnb reviews in Hong Kong and Singapore using latent Dirichlet allocation. Sustainability 2020, 12, 6673.
- Ali, T.; Omar, B.; Soulaimane, K. Analyzing tourism reviews using an LDA topic-based sentiment analysis approach. MethodsX 2022, 9, 101894.
- Lobo, E.H.; Abdelrazek, M.; Frølich, A.; Rasmussen, L.J.; Islam, S.M.S.; Kensing, F.; Grundy, J. Detecting user experience issues from mHealth apps that support stroke caregiver needs: An analysis of user reviews. Front. Public Health 2023, 11, 1027667.
- Marzijarani, S.B.; Sajedi, H. Opinion mining with reviews summarization based on clustering. Int. J. Inf. Technol. 2020, 12, 1299–1310.
- Booth, F.; Potts, C.; Bond, R.; Mulvenna, M.D.; Ennis, E.; Mctear, M.F. Review mining to discover user experience issues in mental health and wellbeing chatbots. In Proceedings of the 33rd European Conference on Cognitive Ergonomics, New York, NY, USA, 4 October 2022; pp. 1–5.
- Permana, M.E.; Ramadhan, H.; Budi, I.; Santoso, A.B.; Putra, P.K. Sentiment analysis and topic detection of mobile banking application review. In Proceedings of the 2020 Fifth International Conference on Informatics and Computing (ICIC), Gorontalo, Indonesia, 3–4 November 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–6.
- Moreno, A.; Iglesias, C.A. Understanding Customers’ Transport Services with Topic Clustering and Sentiment Analysis. Appl. Sci. 2021, 11, 10169.
- Obeid, O.; Zalmout, N.; Khalifa, S.; Taji, D.; Oudah, M.; Alhafni, B.; Inoue, G.; Eryani, F.; Erdmann, A.; Habash, N. CAMeL tools: An open source python toolkit for Arabic natural language processing. In Proceedings of the Twelfth Language Resources and Evaluation Conference, Marseille, France, 11–16 May 2020; pp. 7022–7032.
- Le, Q.; Mikolov, T. Distributed representations of sentences and documents. In Proceedings of the International Conference on Machine Learning, PMLR, Beijing, China, 22–24 June 2014; pp. 1188–1196.
- Yan, X.; Guo, J.; Lan, Y.; Cheng, X. A biterm topic model for short texts. In Proceedings of the 22nd International Conference on World Wide Web, Rio de Janeiro, Brazil, 13–17 May 2013; pp. 1445–1456.
- MacQueen, J. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Oakland, CA, USA, 21 June–18 July 1967; Volume 1, pp. 281–297.
- Mimno, D.; Wallach, H.; Talley, E.; Leenders, M.; McCallum, A. Optimizing semantic coherence in topic models. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, Edinburgh, UK, 27–31 July 2011; pp. 262–272.
- Heinrich, G. Parameter Estimation for Text Analysis; Technical Report; Citeseer: Princeton, NJ, USA, 2005.
- Shahapure, K.R.; Nicholas, C. Cluster quality analysis using silhouette score. In Proceedings of the 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA), Sydney, Australia, 6–9 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 747–748.
- Ghazal, T.M. Performances of k-means clustering algorithm with different distance metrics. Intell. Autom. Soft Comput. 2021, 30, 735–742.
- Mandal, A.; Chaki, R.; Saha, S.; Ghosh, K.; Pal, A.; Ghosh, S. Measuring similarity among legal court case documents. In Proceedings of the 10th Annual ACM India Compute Conference, New York, NY, USA, 16–18 November 2017; pp. 1–9.
- Hanifi, M.; Chibane, H.; Houssin, R.; Cavallucci, D. Problem formulation in inventive design using Doc2vec and Cosine Similarity as Artificial Intelligence methods and Scientific Papers. Eng. Appl. Artif. Intell. 2022, 109, 104661.
- Alibrahim, H.; Ludwig, S.A. Hyperparameter optimization: Comparing genetic algorithm against grid search and Bayesian optimization. In Proceedings of the 2021 IEEE Congress on Evolutionary Computation (CEC), Krakow, Poland, 28 June–1 July 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1551–1559.
- Marinov, D.; Karapetyan, D. Hyperparameter optimisation with early termination of poor performers. In Proceedings of the 2019 11th Computer Science and Electronic Engineering (CEEC), Colchester, UK, 18–20 September 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 160–163.
- Yuan, C.; Yang, H. Research on K-value selection method of K-means clustering algorithm. J 2019, 2, 226–235.
- Salloum, S.A.; AlHamad, A.Q.; Al-Emran, M.; Shaalan, K. A survey of Arabic text mining. In Intelligent Natural Language Processing: Trends and Applications; Springer: Cham, Switzerland, 2018; pp. 417–431.
- Dialani, P. The Future of Data Revolution will be Unstructured Data. Analytics Insight. 2020. Available online: https://www.analyticsinsight.net/insights/the-future-of-data-revolution-will-be-unstructured-data (accessed on 28 February 2023).
- Chiche, A.; Yitagesu, B. Part of speech tagging: A systematic review of deep learning and machine learning approaches. J. Big Data 2022, 9, 10.
- Tseng, S.C.; Lu, Y.C.; Chakraborty, G.; Chen, L.S. Comparison of sentiment analysis of review comments by unsupervised clustering of features using LSA and LDA. In Proceedings of the 2019 IEEE 10th International Conference on Awareness Science and Technology (iCAST), Morioka, Japan, 23–25 October 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–6.
- Kumar, A.V.; Meera, K. Sentiment Analysis Using K Means Clustering on Microblogging Data Focused on Only the Important Sentiments. In Proceedings of the 2022 10th International Conference on Emerging Trends in Engineering and Technology-Signal and Information Processing (ICETET-SIP-22), Nagpur, India, 29–30 April 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–5.
- Heriswan, D.D.; Sari, Y.A.; Furqon, M. Clustering Public Opinions Related to Quarantine during COVID-19 on Twitter Using K-DENCLUE Algorithms. In Proceedings of the 6th International Conference on Sustainable Information Engineering and Technology, Malang, Indonesia, 13–14 September 2021; pp. 252–257.
- Jacob, S.S.; Vijayakumar, R. Sentimental analysis over Twitter data using clustering based machine learning algorithm. J. Ambient Intell. Humaniz. Comput. 2021, 1–12.
- Alhawarat, M.O.; Abdeljaber, H.; Hilal, A. Effect of stemming on text similarity for Arabic language at sentence level. PeerJ Comput. Sci. 2021, 7, e530.
- Atwan, J.; Wedyan, M.; Bsoul, Q.; Hammadeen, A.; Alturki, R. The use of stemming in the Arabic text and its impact on the accuracy of classification. Sci. Program. 2021, 2021, 1367210.
Ref | Year | Methodology | Environment | Super App | Country | Language | Results | Limitations |
---|---|---|---|---|---|---|---|---|
[24] | 2022 | Sentiment analysis | Gov2Go app | Yes | Indonesia | English | 97% for LSTM, 94.22% for NB | Data only collected from the Google Play Store |
[25] | 2019 | Sentiment analysis | Ovo app | Yes | Indonesia | Indonesian | 82.89% for SVM | Small dataset |
[26] | 2022 | Sentiment analysis | Eight banking services’ applications | No | Yemen | Arabic | 89.65% for NB | Data only collected from the Google Play Store; unbalanced dataset |
[27] | 2022 | Sentiment analysis | Tawakkalna, Tetaman, Tabaud, Sehhaty, Mawid, and Sehhah apps | Yes | Saudi Arabia | Arabic | 78.46% | Unbalanced dataset |
[28] | 2022 | Sentiment analysis | Tawakkalna, Tetaman, Tabaud, Sehhaty, Mawid, and Sehhah apps | Yes | Saudi Arabia | Arabic converted to English | 94.38% for SVM | Using Google Translate to translate Arabic reviews |
[29] | 2023 | Sentiment analysis | Saudi Ministry of Commerce’s X account and Commercial report app | No | Saudi Arabia | Arabic | 94% for SVM | Unbalanced dataset |
[30] | 2022 | Aspect-based sentiment analysis | Jeeran restaurant’s website | No | Jordan | Arabic | 84.47% for SVM | Unbalanced dataset |
[38] | 2021 | Lexicon-based sentiment analysis and topic modeling | TripAdvisor website | No | India | English | Different sentiment scores were obtained depending on star rating and location; different numbers of topics were identified for ten Indian locations | Uses the AFINN lexicon, which relies on simple text classification and has an insufficient vocabulary of emotional words, making negative reviews harder to detect |
[39] | 2020 | Topic modeling | Online Travel Agencies | No | South Korea | English | 14 topics for LDA | Includes specific accommodation types in certain locations; the cultural differences in the country’s accommodation types prevent the generalization of the research results to other countries |
[40] | 2019 | Topic modeling | TripAdvisor website | No | New York | English | 10 topics for STM | Using an old dataset for one city |
[41] | 2020 | Topic modeling | Airbnb website | No | Singapore and Hong Kong | English | 5 topics for Singapore, 12 topics for Hong Kong using LDA | Using topic modeling alone cannot capture users’ sentiments |
[42] | 2022 | Lexicon-based sentiment analysis and topic modeling | TripAdvisor website | No | Morocco | English | 77.3%, 4 topics for LDA | Using 12-year-old data |
[44] | 2020 | Clustering | TripAdvisor website | No | Seattle and Manhattan | English | 5 clusters for GMM | Using clustering alone cannot capture users’ sentiments |
[46] | 2020 | Topic modeling, sentiment analysis, and clustering | Mobile banking app | No | Indonesia | Indonesian | 5 topics for LDA, 86.76% for NB, 20 clusters for k-means | The visualization showed overlap between the extracted topics, indicating that similar topics were extracted |
[47] | 2021 | Topic modeling and clustering | Uber customer service’s X account | No | Not mentioned | English | 7 topics for LDA, 7 clusters for k-means | The study does not provide a geographical analysis, even though users’ opinions differ from one region to another |
Review | Translated Review | Rating | Sentiment |
---|---|---|---|
 | We thank you all for the excellent, fast, and clean service | 3 | positive |
 | Thanks to him | 1 | positive |
 | My order was for 99 riyals, and the delivery offer was for 1 riyal. After the order was completed, my account became 115. I did not like the manipulation, and this was the last time I ordered from Mrsool | 5 | negative |
 | The delivery price is very high | 5 | negative |
 | The worst app, their prices are high and the food is delivered cold | 5 | negative |
Reviews | Reviews Translated into English | Tokenize Reviews into Words | Arabic Light Stemmer | Snowball Stemmer | ISRI Stemmer | Farasa Lemmatization | Qalsadi Lemmatizer |
---|---|---|---|---|---|---|---|
 | I liked the delivery, it was very fast | | | | | | |
 | The food always arrives cold | | | | | | |
 | Drivers are slow in delivering orders | | | | | | |
Original Reviews | Reviews Translated into English | Preprocessed Reviews |
---|---|---|
 | The new update is bad. Please reconsider and restore the application to how it was before | |
 | They changed the order submission system as they liked and made it automatic | |
 | What is this new update? Everything is pending | |
Number of Topics | Alpha | Beta | Iteration | Coherence Score |
---|---|---|---|---|
3 | 0.1 | 0.01 | 1000 | −104.60 |
4 | 0.3 | 0.007 | 500 | −103.24 |
5 | 10 | 0.007 | 500 | −105.68 |
6 | 1 | 0.01 | 1000 | −105.93 |
7 | 0.3 | 0.01 | 1000 | −109.06 |
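A grid search like the one summarized in the table above can be sketched with `itertools.product`. In this illustration the scoring function is a hypothetical stand-in that simply replays the coherence scores reported in the table; in the framework, each combination would instead require training a BTM and computing its coherence, which is where the cost of grid search comes from. The candidate values are assumed from those that appear in the table, since the full grid is not listed here, and a higher (less negative) coherence is assumed to be better, consistent with the four topics that were retained.

```python
from itertools import product
from typing import Any, Callable


def grid_search(grid: dict[str, list[Any]],
                score: Callable[..., float]) -> tuple[dict[str, Any], float]:
    """Score every combination of candidate values and keep the best one."""
    names = list(grid)
    best_params: dict[str, Any] = {}
    best_score = float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(**params)
        if current > best_score:  # higher coherence is assumed to be better
            best_params, best_score = params, current
    return best_params, best_score


# Hypothetical scorer: replays the coherence scores reported in the table
# instead of training a BTM for each combination.
REPORTED = {
    (3, 0.1, 0.01, 1000): -104.60,
    (4, 0.3, 0.007, 500): -103.24,
    (5, 10, 0.007, 500): -105.68,
    (6, 1, 0.01, 1000): -105.93,
    (7, 0.3, 0.01, 1000): -109.06,
}


def replay_score(num_topics: int, alpha: float, beta: float, iterations: int) -> float:
    # combinations absent from the table are treated as unscored
    return REPORTED.get((num_topics, alpha, beta, iterations), float("-inf"))


grid = {  # candidate values assumed from those appearing in the table
    "num_topics": [3, 4, 5, 6, 7],
    "alpha": [0.1, 0.3, 1, 10],
    "beta": [0.007, 0.01],
    "iterations": [500, 1000],
}
best, coherence = grid_search(grid, replay_score)
print(best, coherence)
# {'num_topics': 4, 'alpha': 0.3, 'beta': 0.007, 'iterations': 500} -103.24
```

Even this small assumed grid contains 5 × 4 × 2 × 2 = 80 combinations, each of which would be a full training run; this multiplicative growth is the computational cost of grid search noted in Section 3.6.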
Topic Number | Topic Name | Word Cloud | Topic Description |
---|---|---|---|
0 | Delivery and Payment | | Users expressed delivery and payment issues. |
1 | Customer support and updates | | Users reported problems with updates and difficulties in contacting technical and customer support. |
2 | Prices | | Users expressed their views on delivery prices in relation to location. |
3 | Application | | Users expressed their general opinions about the Mrsool app. |
Topic Number | Topic Name | Evaluation Metric | Silhouette Score of Reviews Using Doc2Vec and K-Means | Cluster Result Using PCA Visualization |
---|---|---|---|---|
0 | Delivery and Payment | Euclidean distance | 0.68 | |
0 | Delivery and Payment | Cosine similarity | 0.63 | |
1 | Customer support and updates | Euclidean distance | 0.72 | |
1 | Customer support and updates | Cosine similarity | 0.67 | |
2 | Prices | Euclidean distance | 0.69 | |
2 | Prices | Cosine similarity | 0.38 | |
3 | Application | Euclidean distance | 0.76 | |
3 | Application | Cosine similarity | 0.63 | |
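The silhouette scores in the table above compare cluster quality under two distance measures. The following pure-Python sketch shows how the silhouette coefficient, s(i) = (b(i) - a(i)) / max(a(i), b(i)), can be computed with either Euclidean distance or cosine distance (1 - cosine similarity), where a(i) is the mean distance from point i to the other members of its cluster and b(i) is its smallest mean distance to any other cluster. It assumes that "cosine similarity" in the table refers to cosine distance inside the silhouette computation and that Doc2Vec vectors are already available as plain sequences of floats; it is an illustration, not the authors' code. In practice, scikit-learn's `silhouette_score` with `metric="euclidean"` or `metric="cosine"` computes the same quantity.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    """Straight-line distance between two vectors."""
    return math.dist(u, v)


def cosine_distance(u: Vector, v: Vector) -> float:
    """1 - cosine similarity; 0 for vectors pointing in the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.hypot(*u) * math.hypot(*v))


def silhouette(points: list[Vector], labels: list[int],
               distance: Callable[[Vector, Vector], float]) -> float:
    """Mean silhouette coefficient over all points (needs at least 2 clusters)."""
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # a: mean distance to the other members of p's own cluster
        same = [distance(p, q) for j, (q, lab) in enumerate(zip(points, labels))
                if lab == own and j != i]
        if not same:
            scores.append(0.0)  # singleton cluster: s(i) is defined as 0
            continue
        a = sum(same) / len(same)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(distance(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in set(labels) if other != own
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy 2-D stand-ins for Doc2Vec vectors: same directions, different lengths.
docs = [(1.0, 0.0), (2.0, 0.0), (0.0, 1.0), (0.0, 3.0)]
labels = [0, 0, 1, 1]
print(round(silhouette(docs, labels, euclidean), 2))   # 0.39
print(silhouette(docs, labels, cosine_distance))       # 1.0
```

In this toy example, cosine distance ignores vector length, so vectors that point the same way form perfectly tight clusters, whereas Euclidean distance penalizes their different magnitudes. This shows why the same clustering can receive different silhouette scores under the two measures; it does not by itself explain the specific values in the table.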
Topic Number | Topic Name | Evaluation Metric | The Result before the Optimization | The Result after the Optimization | The Optimized Cluster Result Using PCA Visualization |
---|---|---|---|---|---|
0 | Delivery and Payment | Euclidean distance | 0.68 | 0.68 | |
0 | Delivery and Payment | Cosine similarity | 0.63 | 0.73 | |
1 | Customer support and updates | Euclidean distance | 0.72 | 0.72 | |
1 | Customer support and updates | Cosine similarity | 0.67 | 0.74 | |
2 | Prices | Euclidean distance | 0.69 | 0.70 | |
2 | Prices | Cosine similarity | 0.38 | 0.58 | |
3 | Application | Euclidean distance | 0.76 | 0.74 | |
3 | Application | Cosine similarity | 0.63 | 0.46 | |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Alrayani, B.; Kalkatawi, M.; Abulkhair, M.; Abukhodair, F. From Customer’s Voice to Decision-Maker Insights: Textual Analysis Framework for Arabic Reviews of Saudi Arabia’s Super App. Appl. Sci. 2024, 14, 6952. https://doi.org/10.3390/app14166952