Bidirectional encoder representations from transformers and deep learning model for analyzing smartphone-related tweets
- Academic Editor
- Bilal Alatas
- Subject Areas
- Artificial Intelligence, Data Mining and Machine Learning, Network Science and Online Social Networks, Text Mining, Sentiment Analysis
- Keywords
- Sentiment classification, BERT, TextBlob, Twitter
- Copyright
- © 2023 R et al.
- Licence
- This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
- Cite this article
- 2023. Bidirectional encoder representations from transformers and deep learning model for analyzing smartphone-related tweets. PeerJ Computer Science 9:e1432 https://doi.org/10.7717/peerj-cs.1432
Abstract
Nearly six billion people globally use smartphones, and reviews about smartphones provide useful feedback concerning important functions, unique characteristics, etc. Social media platforms like Twitter contain a large number of such reviews with customer feedback. Conventional methods of analyzing consumer feedback, such as business surveys, questionnaires, and focus groups, demand a tremendous amount of time and resources; Twitter reviews, meanwhile, are unstructured, and manual analysis is laborious and time-consuming. Machine learning and deep learning approaches have been applied for sentiment analysis, but their classification accuracy is low. This study utilizes a transformer-based BERT model with an appropriate preprocessing pipeline to obtain higher classification accuracy. Tweets extracted using the Tweepy SNS scraper are used for experiments, while fine-tuned machine and deep learning models are also employed. Experimental results demonstrate that the proposed approach can obtain a 99% classification accuracy for three sentiments.
Introduction
Social media enables businesses to interact with prospective customers in a timely, informative, and cost-effective manner, and a large number of executives now take an interest in social communication networks. Users of social media can participate in discussions, share information, and publish content related to products and services (Kaplan & Haenlein, 2010). Social media encompasses blogs, microblogs, social networking websites, image and video hosting services, instant messaging, and user-published content from a large number of users. Through social media, we may communicate with the rest of society and express our thoughts, ideas, and opinions. Due to its widespread acceptance as an efficient communication resource, social media, especially Twitter with nearly 200 million users, is becoming more popular. Twitter is a fast-growing instant messaging network where users “tweet” to communicate information. These tweets offer people’s opinions and knowledge about social and business problems (Pak & Paroubek, 2010).
Nowadays, most companies use a variety of techniques to improve their products. Companies mostly use customer exposure as a strategy to understand customers’ opinions. Feedback surveys, systematic forms, ratings, and remote monitoring are typical ways to get feedback from customers. Using the information acquired from these comments, business companies can improve the quality of their products and services (Liu, 2012). The smartphone industry has been expanding dramatically, not just through traditional sales but also through internet sales. Customers search for the best phone and its features through online platforms and share information on social media (Silver, Huang & Taylor, 2019). Smartphones are becoming ever more essential in everyday routines, and they offer a huge variety of platforms for information, communication, education, and entertainment (Barkhuus & Polichar, 2011). Smartphones typically feature touch screens, mobile Internet connectivity through WiFi or mobile networks, and the ability to install applications. Twitter is valuable for companies as users express their opinions and sentiments in the form of short and long texts about any product (Jansen et al., 2009). Businesses and organizations struggle to gather tweets and analyze the sentiments they contain. Tweets may be quickly assessed and classified into positive or negative emotions using automated sentiment analysis.
Several works have been presented that utilize machine and deep learning approaches for automatic sentiment classification. Parts of speech (PoS), lexicon-based approaches, and TextBlob are privileged techniques in this regard. For example, sentiments regarding smartphone reviews are classified using a support vector machine (SVM) in Kumari, Sharma & Soni (2017), which reported a 91% accuracy. Similarly, Krishnan, Sudheep & Santhanakrishnan (2017) used only a novel lexicon-based method to perform sentiment analysis on tweets about different mobile phones including the iPhone, Lenovo, Motorola, Nexus, and Samsung. Gurumoorthy & Suresh (2022) evaluated the sentiments related to popular smartphone products using the natural language processing (NLP) toolkit and TextBlob. Such approaches have several restrictions. Machine learning and deep learning approaches are not extensively studied. A few studies focused on labeling tweets with sentiments while others deployed machine and deep learning models for sentiment classification. In addition, preprocessing techniques are not utilized effectively, their impact on classification accuracy is not extensively studied, and models are not evaluated in terms of computation cost. This study focuses on the above-mentioned limitations and makes the following contributions:
- A large number of tweets about smartphones are extracted from Twitter for brands like Apple, Samsung, and Xiaomi. Since extracted tweets are unstructured, preprocessing is performed to remove unnecessary and redundant data. Preprocessing being important to enhance the efficacy of models, its impact on models’ performance and time complexity is analyzed.
- TextBlob is utilized to extract the polarity and subjectivity from tweets related to smartphone brands and label them as positive, negative, or neutral. The bag of words (BoW), term frequency-inverse document frequency (TF-IDF), and Word2Vec feature engineering techniques are used to extract relevant features.
- A transformer-based bidirectional encoder representations from transformers (BERT) model is proposed to accurately classify the sentiments. The effectiveness and reliability of the proposed model are checked against other models. Furthermore, the proposed model’s robustness is evaluated on an additional dataset.
- Different machine learning models like logistic regression (LR), random forest (RF), K nearest neighbor (KNN), SVM, stochastic gradient descent (SGD), decision tree (DT), extra tree classifier (ETC), and gradient boosting machine (GBM) are fine-tuned to obtain the best results, and their performance is compared with the proposed BERT model.
This article is further divided into four sections. Section 2 presents the details of the literature review relevant to the current problem. Section 3 contains the materials and proposed approach in which we briefly discuss dataset information, preprocessing, machine and deep learning models, and their architecture. The results and discussion are summarized in Section 4. The conclusion is given in Section 5.
Related Work
Sentiment classification is an important research area in NLP and has been investigated widely during the past few years. For example, Naramula & Kalaivania (2021) collected tweets regarding the iPhone and Samsung using the natural language toolkit (NLTK) and utilized machine learning models including RF, KNN, and SVM to classify tweets. Similarly, Jagdale, Shirsat & Deshmukh (2019) used a product review dataset for sentiment analysis with a lexicon-based approach and machine learning models. However, results from machine learning are not compared with other models, and the dataset was also limited. Combining a lexicon-based approach with SVM, Chamlertwat et al. (2012) utilized microblog sentiment analysis to assist smartphone manufacturers. The study determined that some Apple customers tweet about defects of mobile devices.
Along the same direction, Ray & Chakrabarti (2017) used lexicon-based, aspect-level, and document-level sentiment analysis on product reviews collected from Twitter. Over 3,000 tweets are collected and preprocessed for experiments. The classification is done using a lexicon dictionary-based method and emotions are detected. Fang & Zhan (2015) performed sentence-level and review-level sentiment analysis on Amazon product reviews using both manually and automatically labeled tweets. For sentiment classification, three machine learning models, naive Bayes (NB), RF, and SVM, are utilized. To conduct an accurate analysis, punctuation, misspellings, and slang words are eliminated from the tweets. Following that, a feature vector is developed using pertinent features, and tweets are classified into positive and negative classes using various classifiers (Neethu & Rajasree, 2013).
Twitter tweets relating to mobile phones are collected in Driyani & Walter Jeyakumar (2021) for sentiment analysis. The SVM model is used for sentiment classification on three different variations of the dataset. SVM is evaluated using different kernels, and four different cross-validation approaches are also utilized. Results show that the radial basis function (RBF) kernel performs better than any other kernel. In addition, the performance of SVM is reported to decrease as the dataset size increases. In Singla, Randhawa & Jain (2017), positive and negative moods in mobile phone reviews collected from Amazon.com are analyzed. Reviews are categorized using a variety of models including NB, SVM, and DT. Results report an 81.87% accuracy with SVM. Chawla, Dubey & Rana (2017) carried out a sentiment analysis on text from smartphone reviews; of the NB and SVM models, SVM is reported to have good results.
Amazon mobile phone reviews are utilized for sentiment analysis in Dhabekar & Patil (2021). The tweets are processed and labeled using the VADER analyzer. A long short-term memory (LSTM) model with one embedding layer including 256,800 parameters, an LSTM layer containing 256,800 parameters, and a dense classification layer containing 394 parameters is used for experiments. The proposed LSTM model achieved a 93% accuracy. Similarly, Onan (2019) analyzed the sentiment of product reviews using a convolutional neural network (CNN) model along with other deep learning models. Deep learning and the word embedding techniques fastText, GloVe, and Word2Vec are employed to classify sentiments. Additionally, the authors compare the proposed deep learning model to conventional machine learning models and report promising results.
Iqbal et al. (2022) used an LSTM model with various combinations of layers for sentiment categorization on five different product review datasets obtained from Twitter and Amazon. Another study used a combination of LSTM and CNN for sentiment analysis of tweets (Umer et al., 2021). Moreover, the efficacy of Word2Vec and TF-IDF techniques is evaluated. The study demonstrates that deep learning models outperform machine learning models. Table 1 presents a comparative analysis of the discussed research works.
Reference | Model | Datasets | Results |
---|---|---|---|
Jagdale, Shirsat & Deshmukh (2019) | SVM | Amazon product reviews | 92.85% accuracy, 91.64% precision, 95.64% F1 score |
Chamlertwat et al. (2012) | Lexicon based + machine learning approach | Smartphone brand tweets | The authors only extract positive or negative sentiments. |
Fang & Zhan (2015) | SVM, NB, RF | Amazon product reviews | Naive Bayes and SVM achieved the same F1 score on review-level sentiment classification. |
Driyani & Walter Jeyakumar (2021) | SVM with RBF kernel | Apple iPhone reviews | SVM with the RBF kernel achieved 91.87% accuracy on only 18,000 reviews. |
Chawla, Dubey & Rana (2017) | SVM and NB | Smartphone-related reviews | The naive Bayes model achieved 40% accuracy and the SVM achieved 90% accuracy on smartphone reviews. This study does not extract important features from the data, and deep learning experiments are also missing. |
Dhabekar & Patil (2021) | LSTM | Amazon products | 93% accuracy, 93% precision, 93% recall, 92% F1 score |
Iqbal et al. (2022) | LSTM | Amazon products | LSTM with different layers achieved better results on Amazon food reviews, smartphone accessories, Yelp, Amazon products, and IMDB tweets datasets. |
This study | BERT | Smartphone tweets | The proposed BERT model achieved 99.3% accuracy by utilizing preprocessing techniques, and 98.4% accuracy without preprocessing, on the smartphone-related tweets dataset. |
The authors of Supriyadi & Sibaroni (2023) used Twitter tweets related to the Xiaomi smartphone for sentiment analysis with different aspects like camera, random access memory (RAM), and screen size. The authors first preprocessed the tweets and then divided the cleaned dataset into training and testing sets. To analyze public opinion towards different smartphone aspects, they adopted BERT and IndoBERT models. The proposed IndoBERT model achieved 90% accuracy, with 78% positive sentiments on battery aspects, 76% on RAM aspects, and 68% on camera aspects; on battery quality aspects, IndoBERT found 18% negative sentiments. Sally (2023) extracted reviews for the Samsung Galaxy S21, iPhone 13, and Google Pixel 6. The downloaded data contains text, numbers, and emojis. The reviews are in different languages, but only English reviews are utilized for sentiment analysis. The numbers and emojis are removed because they contain no meaningful data, and only textual data are considered. The textual data were labeled using the VADER technique into positive, negative, and neutral tweets, and feature extraction is performed using BoW. To classify the sentiments, the authors employed SVM, NB, and DT classifiers; of these, SVM attained 78% accuracy. The authors did not adopt deep transformer models that learn complex representations of data and achieved poor results. Also, the study does not compare the classifiers with other methods to validate the results.
Yuhan & Huiping (2023) used aspect-level sentiment analysis of smartphone-related reviews using the context window self-attention (CWSA) model. On a Chinese dataset, the authors achieved an F1 score of 89.6%, though they used a limited dataset for aspect-level analysis. Similarly, in Baydogan & Alatas (2022), the authors used NLP techniques to gather two tweet datasets related to hate speech detection. They employed BoW and TF-IDF, two important techniques, to extract features from the datasets, and used ten ML and DL models for sentiment classification. Results proved that the recurrent neural network model performs best on both datasets, with accuracies of 78% and 90%, respectively. In another study (Baydogan & Alatas, 2021a), the authors utilized a hate speech dataset for sentiment classification. They used three feature extraction techniques, BoW, TF-IDF, and Word2Vec, for extracting features. Ant lion optimization (ALO) and moth flame optimization (MFO) methods were also utilized. Results indicate that the ALO and MFO methods perform with 92% and 90% accuracy, respectively, compared to machine learning. Baydogan & Alatas (2021b) collected 12,000 unlabeled tweets and performed sentiment analysis using NLP for preprocessing and labeling the tweets, with a machine learning approach employed for classification. A spider optimization algorithm was developed that performed best compared to machine learning models and obtained an 86% accuracy.
Proposed Methodology
The proposed methodology for performing sentiment analysis on smartphone brands is described in this section. Figure 1 shows the proposed workflow diagram for sentiment classification. First, the tweets dataset is extracted from Twitter for the top three smartphone brands.
It is followed by the cleaning process where a range of preprocessing steps is used. Tweets are then classified as positive, negative, and neutral using the TextBlob technique. After this, important features are extracted from the cleaned text using the count vectorizer approach. Finally, data is split into train and test sets. In addition to the proposed BERT model, a number of machine learning and deep learning models are used for sentiment classification. The details of the steps involved in the workflow are briefly described in subsequent sections.
Dataset information
The dataset is extracted from Twitter for the period from October 2022 to November 2022 using the Tweepy SNS scraper with the query “Apple phone, smartphone, Samsung smartphone, Xiaomi smartphone”. Using the search scraper, 33,383 unstructured tweets containing punctuation, stopwords, uniform resource locators (URLs), tags, usernames, and emoticons are collected. After removing punctuation, null, and duplicate values from the tweets, a total of 32,420 tweets are used for experiments. The Twitter dataset includes the date, user name, location, and tweet text; a few sample tweets are given in Table 2.
Date | User | Location | Tweets |
---|---|---|---|
2022-10-30 | StrayTurtle | California, USA | Think Apple would every bring back the Blackberry design? I bet people would go for it. Physical keyboard on the bottom half of an iPhone for a flip-smartphone or the screen slides over the keyboard. Had to Google it, #iPhone #Apple @Apple |
2022-10-28 | pickles769899 | Columbus, GA | @SamsungMobile This I the 4th time my phone has shut off to update. I just postponed it 1 hr ago! Really making me think of switching back to the IPhone. I turn my phone off every night and it could do it then. Not sure if it’s just me or happens to othe |
2022-11-01 | Abhi_Banarasi | India | Reading about the power of expected 200 MP on @SamsungMobile S23 Ultra and it’s quite impressive. Pretty excited for that. #s23ultraleaks |
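The paper names a “Tweepy SNS scraper” without further implementation detail; the minimal sketch below assumes the snscrape library’s Twitter search module, and the query string, language filter, and exact date bounds are illustrative assumptions rather than the authors’ exact code.

```python
import snscrape.modules.twitter as sntwitter
import pandas as pd

# Query terms from the study; date range October-November 2022 (assumed syntax)
query = ("Apple phone OR smartphone OR Samsung smartphone OR Xiaomi smartphone "
         "since:2022-10-01 until:2022-11-30 lang:en")

rows = []
for i, tweet in enumerate(sntwitter.TwitterSearchScraper(query).get_items()):
    if i >= 33383:  # number of raw tweets collected in the study
        break
    rows.append([tweet.date, tweet.user.username, tweet.user.location, tweet.content])

df = pd.DataFrame(rows, columns=["Date", "User", "Location", "Tweets"])
df = df.drop_duplicates(subset="Tweets").dropna()  # null/duplicate removal
```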
Besides the collected dataset, an additional dataset is used for validation of the proposed approach. The second dataset is obtained from Kaggle and contains a total of 10K cryptocurrency-related tweets. The dataset contains tweets that were collected from January 1, 2021, to November 2022.
Text preprocessing
Preprocessing is a technique by which unstructured data are transformed into a comprehensible and logical format (Vijayarani, Ilamathi & Nithya, 2015). It is primarily used to improve the quality of text data by reducing its quantity so that the machine can identify important patterns.
A machine learning model also achieves higher accuracy since it can learn from the data more accurately. Initial processing of the data is performed in preparation for further analysis. Several procedures are followed to prepare data for model training; the following steps are performed in preprocessing, as shown in Table 3. A code sketch of the full pipeline is given after the steps.
Steps |
Step 1 —Lowercase conversion |
Due to the case-sensitivity of models, conversion to lowercase is essential. The model treats “SMARTPHONE” and “smartphone” as two separate words if conversion is not performed. Sample tweets before and after conversion are given here; |
Before Lowercase Conversion |
Sentence 1: Apple will remove physical Buttons from iPhone 15 Pro Smartphone |
Sentence 2: Samsung Mobile, the new update was unnecessary. The notifications look bad. https://t.co/L6nzYpB8Oy |
After Lowercase Conversion |
Sentence 1: apple will remove physical buttons from iphone 15 pro smartphone |
Sentence 2: samsung mobile, the new update was unnecessary. the notifications look bad. https://t.co/L6nzYpB8Oy |
Step 2 —Removal of numbers |
Numerical values are removed to improve the model’s training and reduce computational complexity. Text data typically contains quantitative values such as digits, which are of little relevance to decision-making processes; as a consequence, numerical values pose a significant challenge to the algorithm when it attempts to extract features from the texts. In the majority of instances, the classification of data does not rely on numeric values (Anandarajan, Hill & Nolan, 2019). When dealing with textual data or reviews that are not concerned with numbers, they need to be removed. Sample data before and after number removal are given here; |
Before number removal |
Sentence 1: apple will remove physical buttons from iphone 15 pro smartphone |
Sentence 2: samsung mobile, the new update was unnecessary. the notifications look bad. https://t.co/L6nzYpB8Oy |
After number removal |
Sentence 1: apple will remove physical buttons from iphone pro smartphone |
Sentence 2: samsung mobile, the new update was unnecessary. the notifications look bad. https://t.co/L6nzYpB8Oy |
Step 3 —Punctuation removal |
The third step of data preprocessing is punctuation removal which aims to remove the punctuation from the data. Punctuations are removed from data because they do not contribute to the learning of a machine learning model. Also, it reduces the machine’s ability to differentiate between other characters and punctuation. The following samples illustrate the punctuation removal process; |
Before Punctuation removal |
Sentence 1: apple will remove physical buttons from iphone pro smartphone |
Sentence 2: samsung mobile, the new update was unnecessary. the notifications look bad. https://t.co/L6nzYpB8Oy |
After Punctuation removal |
Sentence 1: apple will remove physical buttons from iphone pro smartphone |
Sentence 2: samsung mobile the new update was unnecessary the notifications look bad https://t.co/L6nzYpB8Oy |
Step 4 —Stopwords removal |
Preprocessing involves deleting non-classifiable objects from the dataset. Stop words clarify the meaning for humans, but for machine learning models they do not add any value and are thus removed (Pradana & Hayaty, 2019). Stopwords include "is", "am", "I", "the", "to", "are", "that", "they", etc. |
Before Stopwords removal |
Sentence 1: apple will remove physical buttons from iphone pro smartphone |
Sentence 2: samsung mobile the new update was unnecessary the notifications look bad https://t.co/L6nzYpB8Oy |
After Stopwords removal |
Sentence 1: apple remove physical buttons iphone pro smartphone |
Sentence 2: samsung mobile new update unnecessary notifications look bad https://t.co/L6nzYpB8Oy |
Step 5 —Removing emojis |
Tweets may feature emojis, which are pictorial presentations of different moods like anger, sadness, happiness, etc. Emojis are removed from the tweets to attain the best results. Sample tweets before and after emoji removal are given below; |
Before removing emoji |
Sentence 1: apple remove physical buttons iphone pro smartphone |
Sentence 2: samsung mobile new update unnecessary notifications look bad https://t.co/L6nzYpB8Oy |
After removing emoji |
Sentence 1: apple remove physical buttons iphone pro smartphone |
Sentence 2: samsung mobile new update unnecessary notifications look bad https://t.co/L6nzYpB8Oy |
Step 6 —URLs and HTML tags removal |
The URLs and HTML tags do not provide any useful information for the model’s training and are removed from the tweets to enhance the performance of models. Tweets with and without such tags are given here; |
Before URL removal |
Sentence 1: apple remove physical buttons iphone pro smartphone |
Sentence 2: samsung mobile new update unnecessary notifications look bad https://t.co/L6nzYpB8Oy |
After URL removal |
Sentence 1: apple remove physical buttons iphone pro smartphone |
Sentence 2: samsung mobile new update unnecessary notifications look bad |
Step 7 —Stemming |
During the stemming process, the words in the data are reduced to their root form. The effectiveness of machine learning models improves with stemming (Pradana & Hayaty, 2019). Use of variations of the same word like ’worry’, ’worried’, ’worrying’, etc. increases computational complexity and reduces the model’s performance. The stemming process is illustrated in the following examples; |
Before stemming |
Sentence 1: apple remove physical buttons iphone pro smartphone |
Sentence 2: samsung mobile new update unnecessary notifications look bad |
After stemming |
Sentence 1: appl remov physic button iphon pro smartphon |
Sentence 2: samsung mobile new updat unnecessari notifi look bad |
Step 8 —Lemmatization |
Lemmatization converts all words into their base form. Unlike stemming, which truncates the last characters of words, lemmatization maps each word to a valid base (dictionary) form. Examples are given as follows; |
Before lemmatization |
Sentence 1: appl remov physic button iphon pro smartphon |
Sentence 2: samsung mobile new updat unnecessari notifi look bad |
After lemmatization |
Sentence 1: appl remov physic button iphon pro smartphon |
Sentence 2: samsung mobile new updat unnecessari notif look bad |
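The eight steps above can be composed into a single function. The sketch below is a minimal NLTK-based assumption of the pipeline; the exact regexes are not given in the paper, and URLs are stripped before punctuation here (slightly earlier than Step 6) to keep the expressions simple.

```python
import re
import string

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

STOPWORDS = set(stopwords.words("english"))
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

def preprocess(tweet: str) -> str:
    text = tweet.lower()                                   # Step 1: lowercase
    text = re.sub(r"https?://\S+|<[^>]+>", " ", text)      # Step 6: URLs/HTML tags
    text = re.sub(r"\d+", " ", text)                       # Step 2: numbers
    text = text.translate(str.maketrans("", "", string.punctuation))  # Step 3
    text = text.encode("ascii", "ignore").decode()         # Step 5: emojis (non-ASCII)
    tokens = [t for t in text.split() if t not in STOPWORDS]  # Step 4: stopwords
    tokens = [stemmer.stem(t) for t in tokens]             # Step 7: stemming
    tokens = [lemmatizer.lemmatize(t) for t in tokens]     # Step 8: lemmatization
    return " ".join(tokens)

print(preprocess("Apple will remove physical Buttons from iPhone 15 Pro Smartphone"))
```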
TextBlob
TextBlob is an important Python library for processing textual data (Bonta & Janardhan, 2019) and is used for sentiment analysis, noun-phrase extraction, classification, and translation. Sentiment analysis can help readers detect the moods and views expressed in tweets, as well as other vital information. TextBlob provides the polarity and subjectivity of a sentence. Polarity lies in the range [−1, 1], where −1 represents a negative mood and 1 represents a positive mood; a sentence’s polarity is modified by the use of negative words. TextBlob’s semantic labels enable fine-grained analysis. Subjectivity falls in the range of 0 to 1 and refers to the amount of personal opinion versus factual information in a text: the higher the subjectivity, the more personal opinion the text contains. TextBlob has one extra context called intensity, which it uses to calculate subjectivity; a word’s intensity determines whether it modifies the subsequent word. TextBlob offers a wide range of features for exposing particular attributes of textual data. Table 4 shows the distribution of sentiments extracted using the TextBlob approach.
Brands | Positive | Negative | Neutral |
---|---|---|---|
Apple | 3126 (40%) | 1035 (13%) | 3619 (47%) |
Samsung | 6505 (32%) | 3206 (16%) | 10488 (52%) |
Xiaomi | 2042 (43%) | 535 (11%) | 2200 (46%) |
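Combining the polarity scores described above with the thresholding rule stated in the Results section (score > 0 positive, < 0 negative, = 0 neutral), the labeling step can be sketched as follows; the DataFrame and column names are assumptions carried over from the extraction sketch.

```python
from textblob import TextBlob

def label_tweet(text: str) -> str:
    # TextBlob.sentiment returns (polarity, subjectivity); polarity is in [-1, 1]
    polarity = TextBlob(text).sentiment.polarity
    if polarity > 0:
        return "positive"
    if polarity < 0:
        return "negative"
    return "neutral"

df["clean_text"] = df["Tweets"].apply(preprocess)      # pipeline from Table 3
df["sentiment"] = df["clean_text"].apply(label_tweet)  # TextBlob labels
```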
Feature extraction
The count vectorizer, or ’bag of words’, is a natural language processing approach for extracting features from textual data. It is a simple feature extraction approach, yet produces good results. Textual data contains unstructured and chaotic information, whereas models require structured, well-defined, fixed-length input. The count vectorizer turns textual input into numeric vectors, as machine learning requires numeric data. A bag of words from two preprocessed sentences is presented in Table 5.
Sentence 1 | Best smartphone offer highest custom satisfaction global | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Sentence 2 | Global smartphone market apple grow | ||||||||||
Count Vectorizer Features | |||||||||||
Sentence | best | smartphone | offer | highest | custom | satisfaction | global | market | apple | grow | Total length |
Sentence 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 7 |
Sentence 2 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 5 |
After feature extraction, the dataset is split into a train and test set with a ratio of 0.8 to 0.2 for model training and testing, respectively.
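A minimal scikit-learn sketch of this feature extraction and split; the random seed and the df columns are assumptions, while the 0.8/0.2 ratio is taken from the text.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

vectorizer = CountVectorizer()                 # bag-of-words features
X = vectorizer.fit_transform(df["clean_text"])
y = df["sentiment"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)      # random_state is an assumption
```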
Machine learning models
Machine learning enables automated sentiment analysis and has been widely used recently for sentiment analysis (Mujahid et al., 2021), image identification (Wang, Fan & Wang, 2021), object detection (Ramík et al., 2014), information retrieval, and so on. Nine of the most well-known machine learning models are used in this sentiment analysis study with hyperparameter tuning. The values of the fine-tuned hyperparameters are shown in Table 6.
Models | Parameters tuning |
---|---|
LR | random_state=150, solver=‘newton-cg’,multi_class=‘multinomial’, C = 2.0 |
RF | n_estimators=100, random_state=50, max_depth=150 |
DT | random_state=150, max_depth=300 |
ETC | n_estimators=100, random_state=150, max_depth=300 |
SVM | kernel=‘linear’, C = 3.0, random_state=100 |
SGD | loss=“hinge”, penalty=“l2”, max_iter=5 |
KNN | n_neighbors=3 |
GBM | n_estimators=50, random_state=150, max_depth=200 |
ADA | n_estimators=100, random_state=50 |
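As an illustration, the SVM row of Table 6 translates directly into scikit-learn; the sketch below assumes the BoW features and train/test split from the previous step.

```python
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Hyperparameters taken from Table 6 (SVM row)
svm = SVC(kernel="linear", C=3.0, random_state=100)
svm.fit(X_train, y_train)
print(classification_report(y_test, svm.predict(X_test)))
```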
Support vector machine
SVM is mostly used in NLP for classification and regression tasks (Ahmad, Aftab & Ali, 2017). SVM uses a hyperplane to distinguish between the classes, and its effectiveness increases as the number of dimensions increases. The model performs poorly on large datasets and requires a longer training time, but it creates support vectors, works well on small datasets, and demands limited resources. In situations where there is a significant margin of separation among the classes, it performs with high accuracy. The disadvantage of SVM is that it cannot work well for overlapping target labels. Different kernels are used to transform the data and create an optimal hyperplane; we used a linear kernel with a cost parameter of 3.0 and a random state of 100. Figure 2 shows the SVM model.
Logistic regression
LR is the most popular model for sentiment classification (Prabhat & Khullar, 2017). It is easy to implement on small or large datasets and provides results quickly and accurately. The output of this model is based on one or more independent variables. The model performs poorly when the dataset features are irrelevant and unclear. LR is easy to understand and efficient, and it provides clear information about feature engineering. However, LR needs careful consideration when applied to multiclass datasets and will not perform well without feature engineering. The number of features should not exceed the number of samples for successful results, and L1 or L2 regularization is used to avoid overfitting.
Decision tree
A DT is a type of supervised algorithm that is hierarchically structured and composed of a root node, branches, internal nodes, and leaf nodes. The model may be applied to both numerical and categorical data, which is one of its most attractive features (Quinlan, 1996). A complex decision tree may result in overfitting, and DTs cannot provide predictions as accurate as RF and SVM. Nevertheless, a DT is easy to understand and handles missing values, although it needs feature engineering for accurate predictions. The root of a decision tree is the initial point of decision-making, and the leaf nodes represent the final output.
Random forest
RF is a classification method that employs ensemble learning for prediction. It generates a forest of decision trees whose combined prediction is more accurate than that of individual trees, although generating a large number of decision trees increases the computation. The extra tree classifier similarly uses several decision trees, like RF, but trains them on the complete dataset, which may reduce bias (Pervan & Keleş, 2017). Figure 3 presents the random forest model, in which 80% of the data is used for training and 20% for testing. The different decision trees are processed through the training data, and the final output is obtained by combining the trees through a majority vote:

$$F = \operatorname{mode}\{n_1(x), n_2(x), \ldots, n_t(x)\} \tag{1}$$

$$F = \operatorname{mode}\left\{\sum_{i=1}^{t} n_i(x)\right\} \tag{2}$$

In Eqs. (1) and (2), F represents the final prediction obtained by majority vote, and n_1(x), n_2(x), …, n_t(x) denote the decision trees used in the decision process.
K nearest neighbor and stochastic gradient descent
SGD is a simple and efficient linear model for classification tasks (Moh et al., 2015). The model is trained using a maximum of 5 iterations with hinge loss and an L2 penalty to achieve the best results. SGD supports the hinge, modified Huber, and log losses, as well as the l1, l2, and elasticnet penalties, which can be fine-tuned for better results. The KNN algorithm (Hota & Pathak, 2018) uses previously obtained data to classify new data samples. It is a lazy learner, and its computational complexity is high as it uses all the data for training. KNN may be used for binary and multiclass prediction. The model searches the training data for the K nearest matches and assigns a label based on them; conventionally, a distance is calculated to find the nearest match. KNN can be expensive because it keeps all the training samples. It also has a limitation: a small value of K might result in overfitting, whereas a larger value of K results in underfitting.

$$H_t = \sqrt{\sum_{i=1}^{n} \left(K_{a_i} - U_{b_i}\right)^2} \tag{3}$$

Equation (3) applies the L2 norm to calculate distances, with K_a and U_b as the prediction values at unidentified points and H_t standing for the total distance.
Extra tree classifier and AdaBoost
Extra tree, or extremely randomized tree, is a supervised ensemble learning algorithm that creates multiple DTs using the complete dataset. There is a minor difference between ETC and RF: RF chooses the best split nodes, whereas ETC splits nodes in a randomized fashion. In terms of time and computational cost, ETC is fast and efficient. We achieved the best results by increasing the number of estimators to 100 and the maximum depth to 100. AdaBoost, also known as adaptive boosting, is a popular and commonly used algorithm in which weak learners are combined to create a strong learner. The algorithm focuses on samples from previous iterations that were misclassified, giving them larger weights in each iteration. Then, using this set of weighted samples, the next weak learner is trained. The process is repeated as many times as needed or until an acceptable degree of accuracy is achieved.
Deep learning models
The performance of a machine learner depends on feature extraction, and numerous studies have concentrated on developing effective feature extractors utilizing domain expertise. Deep learning algorithms have been shown to be superior at analyzing and modeling complicated linguistic structures: deep learning discovers complex information in the data without feature engineering, and such models have achieved state-of-the-art results for sentiment analysis. Deep learning models such as LSTM, CNN, RNN, BiLSTM, and GRU are also used for text classification. The trainable parameters of the deep learning models are shown in Table 7, and their architectures are shown in Fig. 4.
The CNN model has been broadly applied to numerous computer vision and NLP applications (Dahou et al., 2016; Severyn & Moschitti, 2015). For text classification, word embeddings of the sentence or phrase are used. CNN is recognized as more powerful and faster than RNN, while RNN has less feature consistency than CNN (Yin et al., 2017). The inputs and outputs of a CNN are both of fixed size, whereas RNN is flexible regarding input and output sizes. An RNN is a type of artificial neural network (ANN) that is primarily used in natural language processing and is capable of processing sequential data, recognizing patterns, and predicting the next output. It performs the same operation repeatedly over a sequence of inputs. RNN is effective for short-term dependencies (Can, Ezen-Can & Can, 2018), retaining information only for a short time period. Equations (4) and (5) describe the RNN model:

$$h_t = f(W_h h_{t-1} + W_x x_t + b) \tag{4}$$

where f represents the activation function, W_h and W_x are weight matrices, h_{t−1} denotes the hidden state at the preceding time step, and b indicates the bias vector.

$$y_t = g(W_y h_t + c) \tag{5}$$
Models | Trainable parameters |
---|---|
GRU | 592,547 |
RNN | 533,539 |
BiLSTM | 626,787 |
LSTM | 624,819 |
CNN | 567,779 |
BERT | 109,534,115 |
In Eq. (5), y_t denotes the output at time step t, g is the activation function, and c is the bias of the output layer. LSTM (Rao & Spasojevic, 2016) is a variant of RNN that solves the problem of short-term dependencies: the LSTM model is capable of learning long-term dependencies by retaining information for a longer duration. There are three gates in the LSTM: input, output, and forget. LSTM also solves the problem of vanishing gradients. A BiLSTM is composed of two LSTMs, one that takes input in the forward direction and another that takes input in the backward direction; it offers very accurate results for NLP tasks (Liu & Guo, 2019). This study used two BiLSTM layers with 64 and 32 units, one 16-unit dense layer, and a final dense layer for classification. The embedding layer used in this study has 5,000 × 100 units, followed by dropout layers to avoid overfitting. GRU is less complicated and more efficient because of its reset and update gates; the update gate is responsible for filtering and updating information. In addition, GRU merges the cell state and hidden state, and its output differs from LSTM’s. It overcomes the vanishing gradient problem as well (Zulqarnain et al., 2019). Only five layers are used in the GRU model: embedding, GRU, dropout, and two dense layers.
The architectures of deep learning models such as GRU, LSTM, BiLSTM, and CNN for smartphone tweet classification are presented in Fig. 4. The embedding layers for these models are used with 50,000 parameters; the purpose of the embedding layer is to convert the input text into numeric form, and during training each input word is mapped into vector form, called an embedding. The vectors capture the interpretation of the input text, and the LSTM layer captures the semantic representation of the context.
Dropout layers are used in neural networks for text classification to save the network from overfitting. This layer may be added at any stage in the network and fine-tuned according to the specific task. The dropout layer randomly sets neurons to 0 in the training phase so that the network learns the most effective features that generalize to unseen data. A dense layer with the ReLU activation function is used to provide high-level representations. The second dense layer serves as the final connected layer, also called the classification layer, to classify the tweets. The softmax function is used for multiclass classification with a categorical cross-entropy loss function that differentiates the true and predicted labels. The embedding layer for all the models is used with the same parameters, but the models differ in their other layers and parameters, which are fine-tuned to achieve the best results.
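A minimal Keras sketch of the BiLSTM variant described above (5,000 × 100 embedding, BiLSTM layers of 64 and 32 units, a 16-unit dense layer, and a softmax output); the dropout rate and its placement after the embedding layer are assumptions.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Embedding, Bidirectional, LSTM,
                                     Dropout, Dense)

model = Sequential([
    Embedding(input_dim=5000, output_dim=100),  # 5,000 x 100 embedding
    Dropout(0.2),                               # rate is an assumption
    Bidirectional(LSTM(64, return_sequences=True)),
    Bidirectional(LSTM(32)),
    Dense(16, activation="relu"),
    Dense(3, activation="softmax"),             # positive / negative / neutral
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```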
BERT (Singh, Jakhar & Pandey, 2021) is the most well-known open-source natural language processing model for a variety of applications. BERT is designed to comprehend ambiguous text in text corpora by associating contexts with textual content. BERT is derived from the Transformer deep learning model, in which each output element is coupled with each input element and the weightings between them are dynamically determined by their connection. BERT is a two-stage model that combines generic pretraining with task-specific fine-tuning.
RNN and CNN are commonly used in language models to solve NLP problems. Even though both RNNs and CNNs are competent models, the Transformer is more competent due to its lack of sequential-processing requirements for input data: training on more extensive datasets is easier with Transformers, which can process input in any order. This made it much simpler to create pre-trained models like BERT (Devlin et al., 2018), which was trained on a large amount of data before its release. Google introduced BERT in 2018 and made it available as open source. During development, the model achieved state-of-the-art results on natural language understanding tasks, including sentiment analysis, semantic role labeling, sentence categorization, and the disambiguation of polysemous words (words with more than one meaning). These accomplishments set BERT apart from previous language models like word2vec and GloVe, which lack BERT’s power to understand context and polysemous words.
Proposed BERT model
Figure 5 illustrates the architecture of the proposed BERT model. Three components comprise the BERT input: input word ids, input segments, and the input mask. Each input id corresponds to a single subword in the dictionary. The segment tokens identify the segment each word corresponds to, and the input mask distinguishes between actual tokens and padding tokens; to ensure that all inputs have the same length, padding tokens are appended. The BERT tokenizer transforms word tokens into input ids and generates the input segment and input mask tensors. In the proposed model, four dense layers are used, with 64, 32, and 16 units and a final dense layer with 3 units for classification. The softmax activation function is used to classify the text, and two 20% dropouts are used to prevent overfitting. To optimize the model’s performance, the Adam optimizer and a categorical cross-entropy loss function are used.
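A sketch of the described three-input architecture using the Hugging Face transformers library; the bert-base-uncased checkpoint, maximum sequence length, ReLU activations on the dense layers, and the learning rate are assumptions not stated in the paper.

```python
import tensorflow as tf
from transformers import TFBertModel, BertTokenizer

MAX_LEN = 128  # assumed maximum sequence length
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = TFBertModel.from_pretrained("bert-base-uncased")

# Three inputs: word ids, attention mask, and segment ids
input_ids = tf.keras.Input(shape=(MAX_LEN,), dtype=tf.int32, name="input_word_ids")
input_mask = tf.keras.Input(shape=(MAX_LEN,), dtype=tf.int32, name="input_mask")
segment_ids = tf.keras.Input(shape=(MAX_LEN,), dtype=tf.int32, name="input_segment")

pooled = bert(input_ids, attention_mask=input_mask,
              token_type_ids=segment_ids).pooler_output

x = tf.keras.layers.Dense(64, activation="relu")(pooled)
x = tf.keras.layers.Dropout(0.2)(x)          # first 20% dropout
x = tf.keras.layers.Dense(32, activation="relu")(x)
x = tf.keras.layers.Dropout(0.2)(x)          # second 20% dropout
x = tf.keras.layers.Dense(16, activation="relu")(x)
output = tf.keras.layers.Dense(3, activation="softmax")(x)

model = tf.keras.Model([input_ids, input_mask, segment_ids], output)
model.compile(optimizer=tf.keras.optimizers.Adam(2e-5),  # rate is an assumption
              loss="categorical_crossentropy", metrics=["accuracy"])
```

Tokenizing with `tokenizer(texts, padding="max_length", truncation=True, max_length=MAX_LEN, return_tensors="tf")` produces the three tensors this model expects.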
Performance metrics
Performance metrics are commonly used to evaluate the performance of models. The trained models are tested and evaluated with different metrics: accuracy, precision, recall, and F1 score. Accuracy is the fraction of correct predictions among all predictions; it is widely employed but can be misleading on imbalanced data. Precision divides true positive predictions by all positive predictions (true positives plus false positives). Recall, also known as the true positive rate, measures the capacity to identify all positive instances and is calculated by dividing true positives by the sum of true positives and false negatives. The F1 score combines precision and recall:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{6}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{7}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{8}$$

$$F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{9}$$
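These metrics can be computed with scikit-learn; the weighted averaging used below is an assumption, since the paper does not state which averaging scheme its multiclass scores use.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_pred = svm.predict(X_test)  # any trained classifier from the earlier sketches
accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="weighted")  # averaging scheme is an assumption
```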
Results and Discussion
This section describes the experimental details for the machine learning, deep learning, and transformer-based BERT models. With hyperparameter optimization and layer modification, 80% of the data is used to train the models. Using the most important preprocessing techniques, the unstructured data is cleaned and then labeled using the TextBlob methodology. TextBlob calculates subjectivity and polarity for the cleaned and uncleaned tweets, as depicted in Figs. 6A and 6B. A sentiment score greater than zero indicates a positive sentiment, a score less than zero indicates a negative sentiment, and a score equal to zero indicates a neutral sentiment. The positive, negative, and neutral sentiments extracted through TextBlob are depicted in Fig. 6C for the preprocessed (cleaned) dataset and in Fig. 6D for the raw (uncleaned) dataset.
Results of machine learning models
The results of machine learning are evaluated using four metrics including precision, recall, accuracy, and F1 score. In this experiment, nine models with tuned hyperparameters are utilized. Additionally, the effects of preprocessing techniques are explored. Without using any preprocessing techniques, the SVM attains the highest accuracy of 90%. The KNN model achieves the lowest accuracy of 59% on a dataset of smartphone-related tweets that have not been cleaned or processed. With the preprocessed dataset, the DT achieved 18% higher accuracy than with the raw dataset. Table 8 displays the results of machine learning models with and without preprocessing techniques. The GBM model takes 4890 s for raw data and 988 s for preprocessed data. The SGD model only needs 0.38 s to evaluate preprocessed data. Preprocessing may reduce computational time and effort, as shown in the results. LR obtains an accuracy of 97% with a minimal training time.
The performance of machine learning models is also examined using a dataset of tweets on cryptocurrencies. The models are utilized with the same hyperparameters, and the effects of preprocessing techniques are again investigated. SVM achieves the highest accuracy of 90% without any preprocessing technique, while the KNN model gets the lowest accuracy of 70.7% on the uncleaned dataset. The results of machine learning models with and without preprocessing approaches are presented in Table 9.
Without preprocessing | With preprocessing | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
Models | Accuracy | Precision | Recall | F1 score | Time (sec) | Accuracy | Precision | Recall | F1 score | Time (sec) |
LR | 90.052 | 89.981 | 90.052 | 89.958 | 11.146 | 97.252 | 97.239 | 97.259 | 97.230 | 2.897 |
RF | 78.500 | 79.430 | 78.500 | 76.514 | 308.817 | 91.625 | 91.919 | 91.625 | 91.246 | 85.024 |
DT | 78.130 | 77.959 | 78.130 | 77.974 | 22.137 | 96.51 | 96.492 | 96.494 | 96.501 | 6.927 |
ETC | 82.865 | 83.555 | 82.865 | 81.709 | 485.718 | 94.648 | 94.643 | 94.648 | 94.516 | 154.269 |
SVM | 90.083 | 90.154 | 90.083 | 90.105 | 486.559 | 97.362 | 97.356 | 97.357 | 97.356 | 92.699 |
SGD | 88.510 | 88.599 | 88.510 | 88.470 | 0.390 | 96.733 | 96.706 | 96.730 | 96.693 | 0.384 |
KNN | 59.130 | 65.918 | 59.130 | 57.440 | 8.128 | 68.430 | 76.852 | 68.430 | 65.073 | 5.1239 |
GBM | 86.505 | 86.437 | 86.505 | 86.165 | 4890.010 | 96.529 | 96.509 | 96.519 | 96.458 | 988.437 |
ADA | 81.832 | 83.310 | 81.832 | 81.663 | 13.8450 | 88.047 | 88.397 | 89.047 | 87.616 | 7.955 |
Without Preprocessing | With Preprocessing | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
Models | Accuracy | Precision | Recall | F1 score | Time (sec) | Accuracy | Precision | Recall | F1 score | Time (sec) |
LR | 89.050 | 88.735 | 89.050 | 88.657 | 1.613 | 93.500 | 93.973 | 93.950 | 93.737 | 0.6404 |
RF | 85.550 | 85.891 | 85.550 | 84.208 | 48.670 | 91.700 | 92.109 | 91.700 | 91.159 | 22.9119 |
DT | 84.650 | 84.038 | 84.651 | 84.220 | 7.032 | 97.150 | 97.122 | 97.150 | 97.129 | 3.340 |
ETC | 87.800 | 87.878 | 86.681 | 87.800 | 74.984 | 94.900 | 94.952 | 94.900 | 94.703 | 41.447 |
SVM | 90.550 | 90.367 | 90.551 | 90.361 | 16.220 | 95.900 | 95.885 | 95.900 | 95.823 | 5.336 |
SGD | 88.350 | 88.076 | 88.352 | 88.061 | 0.124 | 93.950 | 93.905 | 93.803 | 93.950 | 0.132 |
KNN | 70.700 | 75.383 | 70.700 | 65.381 | 0.790 | 77.900 | 81.307 | 77.900 | 73.859 | 0.515 |
GBM | 87.650 | 87.289 | 87.650 | 86.988 | 1197.941 | 97.300 | 97.305 | 97.300 | 97.235 | 529.626 |
ADA | 83.200 | 82.920 | 83.200 | 82.291 | 5.762 | 88.550 | 88.049 | 88.550 | 87.724 | 3.331 |
Figure 7 shows the accuracy of machine learning models on the smartphone-related tweets dataset with and without preprocessing. Results indicate that the models perform extremely well with the preprocessed dataset, whereas KNN performs poorly on both the preprocessed and raw datasets. Of the employed models, LR shows the best results with both raw and preprocessed datasets and obtains a 97% accuracy when the preprocessed dataset is used, followed by SVM and GBM.
The impact of preprocessing on computational cost is depicted in Fig. 8. Results demonstrate that time complexity is increased when raw datasets are utilized. When the data is not cleaned, machine learning models require a longer training time. Unclean data contains extraneous information that is not useful. ML models consume less time on a preprocessed dataset, whereas LR takes minimal time and provides the best results. Because the data is not preprocessed, the GBM is quite expensive, taking 4890 s to classify the sentiments.
Results of machine learning models using TF-IDF and Word2Vec features
The results of machine learning are also evaluated using TF-IDF and Word2Vec embedding features with evaluation metrics like accuracy, precision, recall, and F1 score. The results with and without preprocessing using TF-IDF features are shown in Table 10. The SVM attained the highest accuracy of 97.14% with preprocessed data and 90.28% without preprocessing; however, without preprocessing, the SVM model consumes the most time to process the data. The DT and GBM also perform well on the cleaned dataset, with 96.05% and 95.92% accuracy, respectively. Only SVM reached 90% accuracy on the raw dataset; the other models do not perform well and take too much time for training. KNN performs worst on both datasets, with and without preprocessing. SGD is very efficient concerning time complexity, but its performance is not satisfactory: it takes 0.3 s and achieves 93.7% accuracy.
Without Preprocessing | With Preprocessing | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
Models | Accuracy | Precision | Recall | F1 score | Time (sec) | Accuracy | Precision | Recall | F1 score | Time (sec) |
LR | 87.446 | 87.346 | 87.446 | 87.167 | 2.756 | 94.744 | 94.856 | 94.744 | 94.639 | 2.814 |
RF | 75.555 | 77.290 | 75.555 | 73.427 | 68.410 | 91.471 | 91.804 | 91.472 | 91.097 | 32.257 |
DT | 75.308 | 74.987 | 75.308 | 74.976 | 12.435 | 96.051 | 96.032 | 96.051 | 96.040 | 5.442 |
ETC | 79.225 | 80.628 | 79.225 | 77.618 | 141.226 | 92.978 | 93.176 | 92.997 | 92.694 | 48.245 |
SVM | 90.283 | 90.231 | 90.283 | 90.246 | 486.239 | 97.142 | 97.122 | 97.142 | 97.133 | 114.438 |
SGD | 86.042 | 86.331 | 86.042 | 85.492 | 0.671 | 93.723 | 93.931 | 93.723 | 93.564 | 0.393 |
KNN | 63.710 | 64.461 | 63.710 | 63.784 | 102.605 | 62.831 | 74.676 | 62.831 | 57.800 | 32.456 |
ADA | 80.151 | 81.797 | 80.151 | 79.994 | 15.185 | 90.222 | 90.731 | 90.221 | 89.996 | 8.972 |
GBM | 85.190 | 84.240 | 85.190 | 84.997 | 3210.866 | 95.922 | 95.889 | 95.922 | 95.829 | 791.662 |
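A minimal sketch of the TF-IDF variant, assuming the same DataFrame as before and scikit-learn defaults (the paper does not state its vectorizer settings).

```python
from sklearn.feature_extraction.text import TfidfVectorizer

tfidf = TfidfVectorizer()                         # default settings assumed
X_tfidf = tfidf.fit_transform(df["clean_text"])   # replaces the BoW matrix above
```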
The results with and without preprocessing using Word2Vec embedding features are shown in Table 11. The Word2Vec embeddings do not capture the textual information needed here and cannot reproduce context or tone in word meanings. Also, Word2Vec depends on pre-trained embeddings, while BoW counts the frequency of each word in the entire document and handles out-of-vocabulary words easily. Consequently, machine learning with Word2Vec embeddings does not perform well: RF and ETC attained only 71% accuracy with preprocessed data and 65% without preprocessing. The SGD model with preprocessing attained a very low accuracy of 48%, taking 0.48 s to classify the sentiments.
Without preprocessing | With preprocessing | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
Models | Accuracy | Precision | Recall | F1 score | Time (sec) | Accuracy | Precision | Recall | F1 score | Time (sec) |
LR | 60.965 | 59.596 | 60.965 | 58.870 | 12.811 | 64.373 | 64.128 | 64.373 | 61.587 | 8.253 |
RF | 65.993 | 65.571 | 65.993 | 64.322 | 43.069 | 71.067 | 70.741 | 71.062 | 69.541 | 41.527 |
DT | 55.798 | 56.182 | 55.709 | 55.964 | 6.758 | 60.749 | 61.177 | 60.749 | 60.944 | 5.568 |
ETC | 65.515 | 65.066 | 65.515 | 63.868 | 8.178 | 71.169 | 70.706 | 71.169 | 69.769 | 8.654 |
SVM | 61.181 | 60.908 | 61.181 | 57.364 | 134.504 | 64.065 | 64.594 | 64.065 | 59.687 | 136.955 |
SGD | 50.370 | 57.689 | 50.370 | 51.517 | 0.783 | 48.264 | 60.728 | 48.264 | 42.901 | 0.486 |
KNN | 57.248 | 59.273 | 57.248 | 57.875 | 2.432 | 63.485 | 65.990 | 63.448 | 64.405 | 1.991 |
ADA | 56.215 | 55.036 | 56.215 | 54.828 | 55.1555 | 61.690 | 60.304 | 61.690 | 59.852 | 75.121 |
GBM | 57.466 | 56.873 | 57.466 | 56.123 | 3031.797 | 62.028 | 61.987 | 62.028 | 63.121 | 720.112 |
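One common way to obtain fixed-length Word2Vec features is to average the word vectors of each tweet; the gensim-based sketch below makes that assumption, along with the vector size, since the paper does not state whether the embeddings were trained on the corpus or taken pretrained.

```python
import numpy as np
from gensim.models import Word2Vec

sentences = [t.split() for t in df["clean_text"]]          # tokenized tweets
w2v = Word2Vec(sentences, vector_size=100, min_count=1)    # sizes are assumptions

def tweet_vector(tokens):
    # Average the vectors of known words; zero vector if none are known
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(100)

X_w2v = np.vstack([tweet_vector(t) for t in sentences])
```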
Results of deep learning models
There are many advantages to classifying text using deep learning. It eliminates the need for manual feature engineering and automatically extracts features from unstructured text input. Deep learning algorithms may also be able to identify complex patterns and connections within the data in addition to managing large amounts of data. Unstructured tweets are tokenized and preprocessed, and the tokenized text is then fed to a deep network. The tokenized text must first be transformed into numerical vectors using an embedding layer, since deep neural networks work best with numerical data; an embedding layer is a detailed vector representation that captures a word or phrase’s meaning and context. The next stage is to train a model on the text data: the network’s number of layers and hidden unit sizes must be set before data is passed through the network to learn its weights and biases and to evaluate it. The performance of deep learning has advanced to the cutting edge in a variety of natural language processing applications, making it a practical method for addressing real-world problems involving text data.
Table 12 demonstrates that the LSTM deep model achieved 97% accuracy on the preprocessed dataset, whereas the BiLSTM model obtained an accuracy of 88% on the raw dataset. Due to gradient vanishing problems and the high computational cost of the RNN model, which make it challenging to learn long-term relationships in data, that model was only able to achieve 71% accuracy. The proposed BERT-based model achieves the best results with the highest accuracy of 98.57%, while the precision, recall, and F1 scores are 98.59%, 98.58%, and 98.58%, respectively. Regarding computational time, the RNN model takes 1050 s to process the results on uncleaned data and 540 s on preprocessed data. The CNN model takes very little time: 52 s for uncleaned data and 50 s for preprocessed data. It is observed that preprocessing may save computational time and effort; additionally, it leads to higher performance and makes it easier to process and analyze the data.
Without preprocessing | With preprocessing | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
Models | Accuracy | Precision | Recall | F1 score | Time (sec) | Accuracy | Precision | Recall | F1 score | Time (sec) |
GRU | 87.615 | 87.557 | 87.615 | 87.578 | 200 | 97.439 | 97.471 | 97.439 | 97.451 | 130 |
RNN | 71.206 | 70.134 | 71.206 | 70.486 | 1050 | 92.782 | 92.805 | 92.782 | 92.655 | 540 |
BiLSTM | 88.229 | 88.126 | 88.229 | 88.138 | 320 | 97.577 | 97.581 | 97.577 | 97.576 | 300 |
LSTM | 87.661 | 87.549 | 87.661 | 87.588 | 190 | 97.856 | 97.870 | 97.856 | 97.860 | 150 |
CNN | 82.0481 | 81.193 | 82.034 | 81.937 | 52 | 97.316 | 97.331 | 97.316 | 97.322 | 50 |
Proposed | 98.566 | 98.591 | 98.581 | 98.582 | 1216 | 99.349 | 99.352 | 99.350 | 99.351 | 1150 |
Notes:
Bold values indicate the highest values for accuracy, precision, recall, and F1 score, showing the superior performance of the proposed approach.
Table 13 shows the results of deep learning models on the cryptocurrency dataset used for performance validation. The BiLSTM model performed well on the second dataset, achieving 87% accuracy on the raw dataset, while the LSTM deep model achieved 96% accuracy on the preprocessed dataset. In contrast, the RNN model does not perform well on this dataset. The preprocessing steps applied to the unstructured tweets to obtain cleaned and more structured data help the models enhance their performance.
Models | Accuracy | Precision | Recall | F1 score | Time (sec) | Accuracy | Precision | Recall | F1 score | Time (sec) |
---|---|---|---|---|---|---|---|---|---|---|
GRU | 86.850 | 87.107 | 86.850 | 86.880 | 50 | 96.050 | 96.077 | 96.050 | 96.025 | 50 |
RNN | 80.650 | 80.092 | 80.650 | 80.354 | 200 | 92.500 | 92.086 | 92.500 | 92.130 | 100 |
BiLSTM | 87.40 | 87.45 | 87.24 | 87.49 | 100 | 95.640 | 96.603 | 95.640 | 95.613 | 70 |
LSTM | 86.500 | 86.651 | 78.500 | 86.542 | 50 | 96.250 | 96.231 | 96.250 | 96.235 | 50 |
CNN | 86.750 | 86.513 | 86.750 | 86.451 | 20 | 94.950 | 94.815 | 94.950 | 94.852 | 15 |
Proposed | 98.000 | 98.000 | 98.00 | 97.991 | 377 | 99.239 | 99.252 | 99.239 | 99.242 | 341 |
Notes:
Bold values indicate the highest values for accuracy, precision, recall, and F1 score, showing the superior performance of the proposed approach.
The confusion matrix shown in Fig. 9 is used to evaluate the results of classification models. The correct and incorrect predictions made by the model are visualized. The rows represent the true labels, and the columns represent the predicted labels. In the confusion matrix, true positive (TP) refers to observations that are positive and predicted as positive; true negative (TN) to observations that are correctly classified as negative; false positive (FP) is wrongly predicted as positive, and false negative (FN) are the samples incorrectly classified as negative.
The confusion matrices of various machine and deep learning models are shown in Fig. 9, where SVM makes 6,313 correct predictions out of a total of 6,484 predictions with only 171 incorrect predictions. The LR produces 6,306 accurate predictions and 178 inaccurate predictions, while the GBM makes 6,259 accurate predictions and 225 inaccurate predictions. Among the deep learning models, LSTM makes 6,345 accurate and 139 inaccurate predictions, the BiLSTM model makes 6,327 accurate predictions and 157 incorrect ones, and the GRU model makes 6,318 accurate predictions and 166 incorrect predictions. The proposed transformer-based BERT model makes just 42 incorrect predictions, whereas 6,417 are accurate.
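A matrix like those in Fig. 9 can be produced directly from a model's predictions; the sketch below assumes the fitted SVM from the earlier examples, and the display settings are defaults.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

# Rows are true labels, columns are predicted labels, as described above
ConfusionMatrixDisplay.from_predictions(y_test, svm.predict(X_test))
plt.show()
```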
Comparison of the proposed model with state-of-the-art studies
The results of the proposed model are compared with existing state-of-the-art approaches in Table 14. For this purpose, the models from the existing literature are implemented using the dataset collected in this study. The SVM model from Jagdale, Shirsat & Deshmukh (2019) achieves 92.85% accuracy. Similarly, the lexicon-based approach of Chamlertwat et al. (2012) is applied to the collected dataset for sentiment classification. The SVM with RBF kernel from Driyani & Walter Jeyakumar (2021) achieves 91.87% accuracy, while the approach of Fang & Zhan (2015) reaches an 81% F1 score. The LSTM model from Iqbal et al. (2022) reaches 88% accuracy. In addition, the SVM model of Sally (2023) achieves 78% accuracy, 76% precision, 59% recall, and a 60% F1 score on smartphone reviews; the authors do not utilize proper preprocessing approaches or deep learning methods to increase performance. Yuhan & Huiping (2023) employ a novel CWSA model on a Chinese smartphone dataset but evaluate it with only a single metric; other metrics are not reported, and the classification performance is not satisfactory. The performance comparison indicates that the proposed BERT model achieves 99% accuracy on the preprocessed dataset.
| Paper Ref. | Approach/Method | Datasets | Accuracy | Precision | Recall | F1 score |
|---|---|---|---|---|---|---|
| Jagdale, Shirsat & Deshmukh (2019) | SVM | Amazon product reviews | 92.85% | 91.64% | – | 95.64% |
| Chamlertwat et al. (2012) | Lexicon-based approach | Smartphone brand tweets | Only positive or negative sentiments extracted | – | – | – |
| Fang & Zhan (2015) | SVM, Naïve Bayes, RF | Amazon product reviews | – | – | – | 81% |
| Driyani & Walter Jeyakumar (2021) | SVM with RBF kernel | Apple iPhone reviews | 91.87% | – | – | – |
| Dhabekar & Patil (2021) | LSTM | Amazon products | 93% | 93% | 93% | 92% |
| Iqbal et al. (2022) | LSTM | Cell phone reviews | 88% | 98% | 64% | 70% |
| Sally (2023) | SVM | Smartphone reviews | 78% | 76% | 59% | 60% |
| Supriyadi & Sibaroni (2023) | Indo-BERT | Xiaomi smartphone tweets | 90% | – | – | – |
| Yuhan & Huiping (2023) | CWSA model | Chinese smartphone tweets | – | – | – | 89.6% |
| This study | BERT | Smartphone tweets | **99.34%** | **99.35%** | **99.35%** | **99.35%** |
Discussions
Sentiment analysis for the top three smartphone brands is performed in this study. Unstructured and unlabeled tweets are collected from Twitter for this purpose, and various preprocessing steps are applied to transform the raw text into cleaner, more structured text. The relevant features are extracted from the tweets using BoW. Identifying the most significant smartphone brand is challenging because it depends on individual opinions and requirements; sentiment analysis helps determine people's attitudes toward their favorite brands. The analysis shows that, between Apple and Samsung, most people prefer Apple smartphones: 40% of people favor Apple, while 32% prefer Samsung. Only 13% of people dislike Apple smartphones, while 16% dislike Samsung.
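For reference, the following is a minimal sketch of BoW feature extraction with scikit-learn's CountVectorizer; the toy tweets and default parameters are illustrative and not necessarily the configuration used in this study.

```python
# Minimal sketch of bag-of-words feature extraction with scikit-learn's
# CountVectorizer; the toy tweets and default parameters are illustrative,
# not necessarily the configuration used in this study.
from sklearn.feature_extraction.text import CountVectorizer

tweets = [
    "love the apple camera",
    "samsung battery life is great",
    "hate the samsung screen",
]
vectorizer = CountVectorizer()        # one column per vocabulary term
X = vectorizer.fit_transform(tweets)  # sparse document-term count matrix
print(vectorizer.get_feature_names_out())
print(X.toarray())  # each row holds the word counts for one tweet
```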
This study investigates the results of various machine and deep learning models with different layer configurations and also analyzes the effect of various preprocessing techniques. The experiments show that the SVM model classifies smartphone-related sentiments with an accuracy of 90% without any preprocessing, while DT achieves 18% higher accuracy with preprocessed data compared to raw data. Without preprocessing, models also take longer: GBM takes 4,849 s on raw data but only 988 s on preprocessed data, and among the deep learning models, RNN takes the longest computational time on unprocessed data compared to preprocessed data. The LSTM model achieves 97% accuracy with preprocessed data, and the proposed transformer-based BERT model achieves the highest accuracy of 99% (an illustrative fine-tuning sketch is given below). As seen in Tables 8 and 12, preprocessing textual data is essential for NLP tasks because it enables better model training and yields better results; models perform poorly on text data that has not been properly cleaned and standardized.
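The following hedged sketch shows how a BERT classifier for three sentiment classes can be fine-tuned with the Hugging Face transformers library; the model checkpoint, hyperparameters, and toy data are assumptions rather than the authors' exact setup.

```python
# Hedged sketch of fine-tuning a BERT classifier for three sentiment classes
# with the Hugging Face transformers library; the checkpoint, hyperparameters,
# and toy data are assumptions, not the authors' exact configuration.
import torch
from transformers import BertForSequenceClassification, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3  # negative / neutral / positive
)

texts = ["great phone, love the camera", "battery drains too fast"]
labels = torch.tensor([2, 0])  # toy labels: 2 = positive, 0 = negative

enc = tokenizer(texts, padding=True, truncation=True, max_length=64,
                return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few illustrative optimization steps
    out = model(**enc, labels=labels)  # out.loss is cross-entropy over 3 classes
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

model.eval()
with torch.no_grad():
    preds = model(**enc).logits.argmax(dim=-1)
print(preds)  # predicted class indices for the toy tweets
```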
Limitations
We performed sentiment classification on 32K smartphone-related tweets using nine fine-tuned, well-known machine learning and deep learning models. We also proposed a transformer-based deep model that classifies the tweets accurately with excellent results. However, deep and transformer-based models for text classification have some limitations. Deep learning requires large datasets for training; with limited data, deep models perform well on the training set but their performance on unseen tweets decreases, which indicates overfitting. Moreover, tweets are user-generated texts that may be noisy and lack contextual information. In the future, we intend to collect a large dataset from social platforms other than Twitter, such as Facebook and Instagram, covering unique smartphone brand aspects like price, camera quality, and operating system, and to perform text analysis with other well-known feature engineering techniques and transformers.
Conclusions
This study presents a BERT-based model for the classification of smartphone-related sentiments. The model is evaluated on a self-collected tweet dataset and validated on an additional dataset. In addition, the influence of preprocessing on time complexity and model performance is evaluated. Results indicate that the proposed approach obtains a 99% accuracy using the preprocessed data. Performance comparison with nine well-known machine learning and deep learning models, including GRU, RNN, BiLSTM, LSTM, and CNN, shows that the proposed approach outperforms them, and it also surpasses existing state-of-the-art approaches. Results demonstrate that preprocessing improves model performance and reduces computational time. This study utilizes BoW, TF-IDF, and Word2Vec embedding approaches for feature extraction, while other well-known features are left for future work. SVM achieved 97.4% accuracy with BoW features on preprocessed data, and LR achieved 97.3%; with TF-IDF and Word2Vec features, SVM achieved only 97.1% and 64.01% accuracy, respectively. These results indicate that BoW extracted more discriminative features from smartphone-related tweets than TF-IDF and Word2Vec, and that the performance of machine learning models with Word2Vec features is particularly low (an illustrative comparison sketch is given below). In addition, the proposed BERT transformer, a two-stage model that combines generic pretraining with task-specific fine-tuning, achieved excellent results on both the self-collected and Kaggle datasets. The proposed methodology can be applied to other social media sentiment analysis tasks and can help companies gain insights from sentiments to improve their brands. We also intend to apply other preprocessing techniques and analyze their impact on model performance.
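As a final illustration, the following sketch shows how BoW and TF-IDF features can be swapped within scikit-learn pipelines feeding a linear SVM, mirroring the feature-extraction comparison above; the toy data and default parameters are assumptions, not the study's experimental setup.

```python
# Illustrative sketch of comparing BoW and TF-IDF features with a linear SVM
# via scikit-learn pipelines; the toy data and parameters are assumptions and
# not the exact experimental setup of this study.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_texts = ["love this phone", "terrible battery", "screen is okay"]
train_labels = ["positive", "negative", "neutral"]

for name, vec in [("BoW", CountVectorizer()), ("TF-IDF", TfidfVectorizer())]:
    clf = make_pipeline(vec, LinearSVC())  # vectorizer output feeds the SVM
    clf.fit(train_texts, train_labels)
    print(name, clf.predict(["battery is terrible"]))
```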