Twitter Sentiment Analysis
All content following this page was uploaded by Aliza Sarlan on 03 November 2016.
Abstract—Social media have received increasing attention in recent years. Public and private opinions about a wide variety of subjects are expressed and spread continually via numerous social media. Twitter is one of the social media platforms that is gaining popularity. Twitter offers organizations a fast and effective way to analyze customers’ perspectives toward the factors critical to success in the marketplace. Developing a program for sentiment analysis is an approach to computationally measure customers’ perceptions. This paper reports on the design of a sentiment analysis program that extracts a vast amount of tweets. Prototyping is used in this development. The results classify customers’ perspectives expressed in tweets as positive or negative, and are presented in a pie chart and an HTML page. The program was planned to be developed as a web application system; however, due to the limitation that Django runs on a Linux server or LAMP, this part is left for further work.

Keywords: Twitter, sentiment, opinion mining, social media, natural language processing

I. INTRODUCTION

According to [1], millions of people use social network sites to express their emotions and opinions and to disclose details of their daily lives. People write about anything, such as social activities or comments on products. Online communities provide an interactive forum where consumers inform and influence others. Moreover, social media gives businesses a platform to connect with their customers, for example to advertise or to speak directly to customers and learn their perspective on products and services.

In contrast, consumers have all the power when it comes to what they want to see and how they respond. As a result, a company’s success or failure is publicly shared and ends up spreading by word of mouth. Moreover, social networks can change the behavior and decision making of consumers; for example, [2] mentions that 87% of internet users are influenced in their purchases and decisions by customer reviews. Therefore, if an organization can catch up faster on what its customers think, it is better placed to react in time and to come up with a good strategy to compete with its competitors.

A. Problem Statement

Despite the availability of software to extract data regarding a person’s sentiment on a specific product or service, organizations and other data workers still face issues regarding the data extraction.

• Sentiment Analysis of Web Based Applications Focus on Single Tweet Only.

With the rapid growth of the World Wide Web, people are using social media such as Twitter, which generates large volumes of opinion text in the form of tweets that are available for sentiment analysis [3]. From a human viewpoint this translates to a huge volume of information, which makes it difficult to extract sentences, read them, analyze them tweet by tweet, summarize them and organize them into an understandable format in a timely manner [3].

• Difficulty of Sentiment Analysis with Inappropriate English

Informal language refers to the use of colloquialisms and slang in communication, employing the conventions of spoken language [4], such as writing ‘wouldn’t’ for ‘would not’. Not all systems are able to detect sentiment from informal language, and this could hamper the analysis and decision-making process.

Emoticons are pictorial representations of human facial expressions [5], which in the absence of body language and prosody serve to draw a receiver's attention to the tenor or temper of a sender's nominal verbal communication, improving and changing its interpretation [6]. For example, ☺ indicates a happy state of mind. Systems currently in place do not have sufficient data to allow them to draw feelings out of emoticons. Humans often turn to emoticons to properly express what they cannot put into words [6], so not being able to analyze them puts the organization at a loss.

Short forms are widely used, even in the short message service (SMS). They are used even more frequently on Twitter to help minimize the number of characters, because Twitter limits a tweet to 140 characters [7]. For example, ‘Tba’ stands for ‘to be announced’.

B. Objective

The objectives of the study are, first, to study sentiment analysis in microblogging with a view to analyzing feedback from customers of an organization’s product; and second, to develop a program for customers’ reviews of a product that allows an organization or individual to perform sentiment analysis and organize a vast amount of tweets into a useful format.
2014 International Conference on Information Technology and Multimedia (ICIMU), November 18 – 20, 2014, Putrajaya, Malaysia
Sentiment analysis refers to the general method of extracting polarity and subjectivity from semantic orientation, which refers to the strength of words and the polarity of texts or phrases [19]. There are two main approaches for extracting sentiment automatically: the lexicon-based approach and the machine-learning-based approach [19-23].

1. Lexicon-based Approach

Lexicon-based methods make use of a predefined list of words in which each word is associated with a specific sentiment [21]. Lexicon methods vary according to the context in which they were created, and involve calculating the orientation of a document from the semantic orientation of the texts or phrases in the document [19]. In addition, [24] states that a lexicon-based method detects opinion-carrying words in the corpus and then predicts the opinion expressed in the text. [20] describes the basic paradigm of lexicon methods as follows:

i. Preprocess each tweet post by removing punctuation
ii. Initialize a total polarity score (s) to 0 -> s=0
iii. Check whether each token is present in a dictionary, then
     If the token is positive, s will be positive (+)
     If the token is negative, s will be negative (-)
iv. Look at the total polarity score of the tweet post
     If s > threshold, classify the tweet post as positive
     If s < threshold, classify the tweet post as negative

However, [21] highlighted that one advantage of the learning-based method is its ability to adapt and create trained models for specific purposes and contexts. In contrast, its drawback is that it depends on the availability of labeled data, and hence has low applicability to new data, because labeling data might be costly or even prohibitive for some tasks [21].

2. Machine-learning-based Approach

Machine learning methods often rely on supervised classification approaches in which sentiment detection is framed as a binary problem, positive or negative [24]. This approach requires labeled data to train classifiers [21]. With this approach, it becomes apparent that aspects of the local context of a word need to be taken into account, such as negation (e.g. not beautiful) and intensification (e.g. very beautiful) [19]. [20] showed a basic paradigm for creating a feature vector:

i. Apply a part-of-speech tagger to each tweet post
ii. Collect all the adjectives over the entire set of tweet posts
iii. Make a popular word set composed of the top N adjectives
iv. Navigate all of the tweets in the experimental set to create the following:
• Number of positive words
• Number of negative words
• Presence, absence or frequency of each word

[19] showed some examples of switch negation, in which negation simply reverses the polarity of the lexicon entry: changing beautiful (+3) into not beautiful (-3). More examples, in which the value is shifted toward the opposite polarity rather than reversed:

She is not terrific (6-5=1) but not terrible (-6+5=-1) either.

In this case, the negation of a strongly negative or positive value reflects a mixed perspective, which is correctly captured in the shifted value. However, [21] noted the limitations of the machine-learning-based approach while considering it more suitable for Twitter than the lexicon-based method. Furthermore, [20] stated that machine learning methods can generate a fixed number of the most frequently occurring popular words, each of which is assigned an integer value representing the frequency of the word in the tweets.

F. Techniques of Sentiment Analysis

The semantic concepts of entities extracted from tweets can be used to measure the overall correlation of a group of entities with a given sentiment polarity [12]. Polarity, in its most basic form, refers to whether a text or sentence is positive or negative [25]. Sentiment analysis has several techniques for assigning polarity, such as:

1. Natural Language Processing (NLP)

NLP techniques are based on machine learning, especially statistical learning, which uses a general learning algorithm combined with a large sample of data, a corpus, to learn the rules [26]. Sentiment analysis has been handled as a Natural Language Processing (NLP) task at many levels of granularity. Starting as a document-level classification task [27], it has been handled at the sentence level [28] and more recently at the phrase level [13]. NLP is a field of computer science that involves making computers derive meaning from human language and input as a way of interacting with the real world.

2. Case-Based Reasoning (CBR)

Case-Based Reasoning (CBR) is one of the techniques available to implement sentiment analysis. CBR works by recalling past problems that were solved successfully and using the same solutions to solve current, closely related problems [29]. [25] identified some advantages of using CBR: CBR does not require an explicit domain model, so elicitation becomes a task of gathering case histories, and a CBR system can learn by acquiring new knowledge as cases. This, together with the application of database techniques, makes the maintenance of large volumes of information easier [25].

3. Artificial Neural Network (ANN)

[13] mentioned that an Artificial Neural Network (ANN), also known as a neural network, is a mathematical technique that interconnects a group of artificial neurons. It processes information using a connectionist approach to computation. ANNs are used to find the relationship between input and output or to find patterns in data [25].
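The feature-vector paradigm of [20] (steps i to iv of the machine-learning-based approach) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of [20]: the names `ADJECTIVES`, `POSITIVE`, `NEGATIVE`, `tokenize`, `top_adjectives` and `feature_vector` are hypothetical, and a small hand-made adjective set stands in for a real part-of-speech tagger. Vectors of this kind are what a classifier such as an ANN or SVM would take as input.

```python
import re
from collections import Counter

# Hypothetical stand-ins (assumptions): a real system would use a
# part-of-speech tagger and a full sentiment lexicon instead of these sets.
ADJECTIVES = {"good", "great", "bad", "awful", "slow", "happy"}
POSITIVE = {"good", "great", "happy"}
NEGATIVE = {"bad", "awful", "slow"}


def tokenize(tweet):
    """Lowercase a tweet and split it into word tokens."""
    return re.findall(r"[a-z']+", tweet.lower())


def top_adjectives(tweets, n):
    """Steps i-iii: collect adjectives over all tweets and keep the top n."""
    counts = Counter(
        tok for tweet in tweets for tok in tokenize(tweet) if tok in ADJECTIVES
    )
    return [word for word, _ in counts.most_common(n)]


def feature_vector(tweet, popular_words):
    """Step iv: positive count, negative count, then per-word frequencies."""
    tokens = tokenize(tweet)
    freq = Counter(tokens)
    n_pos = sum(1 for tok in tokens if tok in POSITIVE)
    n_neg = sum(1 for tok in tokens if tok in NEGATIVE)
    return [n_pos, n_neg] + [freq[word] for word in popular_words]


tweets = [
    "Great phone, great battery, happy customer",
    "Bad service and slow delivery",
    "Good price but bad screen",
]
popular = top_adjectives(tweets, n=3)
vectors = [feature_vector(t, popular) for t in tweets]
```

Each vector starts with the number of positive and negative words, followed by the frequency of each popular adjective; replacing the frequency with a 0/1 flag would give the presence/absence variant mentioned in step iv.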
4. Support Vector Machine (SVM)
A Support Vector Machine (SVM) is used to detect the sentiments of tweets [23]. [10] together with [37] stated that SVM is able to extract and analyze tweets with an accuracy of up to 70%-81.3% on the test set. [29] collected training data from three different Twitter sentiment detection websites, which mainly use pre-built sentiment lexicons to label each tweet as positive or negative. Using an SVM trained on these noisy labeled data, they obtained 81.3% sentiment classification accuracy.

G. Application Programming Interface (API)

Alchemy API performs better than the others in terms of the quality and the quantity of the extracted entities [14]. Over time, the Python Twitter Application Programming Interface (API) was created to collect tweets [30]. Python can automatically calculate the frequency of messages being retweeted every 100 seconds, sort the top 200 messages based on the retweeting frequency, and store them in the designated database [12]. As the Python Twitter API only includes Twitter messages from the most recent six days, the collected data needs to be stored in a different database [14].

H. Python

Python was created by Guido van Rossum in the Netherlands in 1989 and was made public in 1991 [31]. Python is a readily available programming language that solves computing problems by providing a simple way to write out a solution [31]. [32] mentioned that Python can be called a scripting language. Moreover, [32] and [32] also noted that Python is really just a language description, because code can be written once and run on many platforms. In addition, [34] mentioned that Python is a great language for writing a prototype, because it is less time-consuming to produce a working prototype than with other programming languages.

Many researchers have said that Python is efficient, especially for complex projects; [33] mentioned that Python is suitable for starting up social network or media streaming projects, which are almost always web-based and driven by big data. [34] gave the reason that Python can handle and manage memory usage. In addition, Python provides generators, which allow iterative processing of items one at a time, so that a program can grab source data one item at a time and pass each item through the full processing chain.

language independent, but uses conventions that are familiar to programmers of the C-family of languages, including Python and many others. However, the output’s size depends on the time taken to retrieve tweets from Twitter.

The output is categorized into two forms, encoded and un-encoded. For security reasons concerning access to the data, some of the output is shown in an ID form, such as a string ID. Sentiment analysis: each word of a tweet is assigned a value and categorized as a positive or negative word according to the lexicon dictionary. The result is shown in .txt, .csv and HTML formats.

B. Sentiment Analysis

Each word of the tweets in the JSON file is assigned a value by matching it against the lexicon dictionary. Because the lexicon dictionary contains a limited number of words, it cannot assign a value to every single word in the tweets. However, Python, as a scientific language, is able to analyze the sense of each tweet and classify it as positive or negative to obtain a result.

C. Information Presented

The result is shown in a pie chart representing the percentages of positive, negative and null sentiment hashtags. A null hashtag is a hashtag that was assigned a value of zero. In addition, the program is able to list the top ten positive and negative hashtags.
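The lexicon matching and hashtag summary described under Sentiment Analysis and Information Presented can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the program reported in the paper: the `LEXICON` entries, the function names, and the assumption that each input line is a JSON object with a `text` field are all hypothetical. A generator feeds tweets through the processing chain one item at a time, and words missing from the lexicon contribute a value of zero.

```python
import json
import re
from collections import defaultdict

# Hypothetical lexicon (assumption): word -> sentiment value. The dictionary
# used by the program is not given in the text.
LEXICON = {"love": 3, "great": 3, "good": 2, "bad": -2, "hate": -3, "slow": -1}


def read_tweets(lines):
    """Yield the text of one tweet at a time from JSON-encoded lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)["text"]  # "text" field is an assumption


def score(text):
    """Sum the lexicon values of the words in a tweet; unknown words add 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(word, 0) for word in words)


def hashtag_scores(texts):
    """Add each tweet's score to every hashtag that the tweet contains."""
    totals = defaultdict(int)
    for text in texts:
        s = score(text)
        for tag in re.findall(r"#\w+", text.lower()):
            totals[tag] += s
    return dict(totals)


def summarize(totals, top=10):
    """Return percentage shares and the top positive and negative hashtags."""
    n = len(totals)
    groups = {
        "positive": [t for t, s in totals.items() if s > 0],
        "negative": [t for t, s in totals.items() if s < 0],
        "null": [t for t, s in totals.items() if s == 0],
    }
    shares = {k: (100.0 * len(v) / n if n else 0.0) for k, v in groups.items()}
    top_pos = sorted(groups["positive"], key=lambda t: -totals[t])[:top]
    top_neg = sorted(groups["negative"], key=lambda t: totals[t])[:top]
    return shares, top_pos, top_neg


sample = [
    '{"text": "I love my new #phone, great camera"}',
    '{"text": "#delivery was slow and the box looked bad"}',
    '{"text": "Waiting for the #update"}',
    '{"text": "Good battery on this #phone"}',
]
totals = hashtag_scores(read_tweets(sample))
shares, top_pos, top_neg = summarize(totals)
```

The `shares` dictionary holds the positive, negative and null percentages that a plotting library could draw as a pie chart, while `top_pos` and `top_neg` correspond to the top-ten lists.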
Twitter sentiment analysis is developed to analyze customers’ perspectives toward the factors critical to success in the marketplace. The program uses a machine-learning-based approach, which is more accurate for analyzing sentiment, together with natural language processing techniques.

As a result, the program categorizes sentiment into positive and negative, which is represented in a pie chart and an HTML page. Although the program was planned to be developed as a web application, this could not be realized due to the limitation of Django, which can only work on a Linux server or LAMP. Therefore, further enhancement of this element is recommended in future study.

REFERENCES

[1] M. Rambocas and J. Gama, “Marketing Research: The Role of Sentiment Analysis,” The 5th SNA-KDD Workshop’11. University of Porto, 2013.
[2] A. K. Jose, N. Bhatia, and S. Krishna, “Twitter Sentiment Analysis,” National Institute of Technology Calicut, 2010.
[3] P. Lai, “Extracting Strong Sentiment Trend from Twitter,” Stanford University, 2012.
[4] Y. Zhou and Y. Fan, “A Sociolinguistic Study of American Slang,” Theory and Practice in Language Studies, 3(12), 2209–2213, 2013. doi:10.4304/tpls.3.12.2209-2213
[5] M. Comesaña, A. P. Soares, M. Perea, A. P. Piñeiro, I. Fraga, and A. Pinheiro, “ERP correlates of masked affective priming with emoticons,” Computers in Human Behavior, 29, 588–595, 2013.
[6] A. H. Huang, D. C. Yen, and X. Zhang, “Exploring the effects of emoticons,” Information & Management, 45(7), 466–473, 2008.
[7] D. Boyd, S. Golder, and G. Lotan, “Tweet, tweet, retweet: Conversational aspects of retweeting on twitter,” System Sciences (HICSS), 2010 …. Retrieved from http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5428313
[8] T. Carpenter and T. Way, “Tracking Sentiment Analysis through Twitter,” ACM computer survey. Villanova: Villanova University, 2010.
[9] D. Osimo and F. Mureddu, “Research Challenge on Opinion Mining and Sentiment Analysis,” Proceeding of the 12th conference of Fruct association, 2010, United Kingdom.
[10] A. Pak and P. Paroubek, “Twitter as a Corpus for Sentiment Analysis and Opinion Mining,” Special Issue of International Journal of Computer Application, France: Universite de Paris-Sud, 2010.
[11] S. Lohmann, M. Burch, H. Schmauder, and D. Weiskopf, “Visual Analysis of Microblog Content Using Time-Varying Co-occurrence Highlighting in Tag Clouds,” Annual conference of VISVISUS. Germany: University of Stuttgart, 2012.
[12] H. Saif, Y. He, and H. Alani, “Semantic Sentiment Analysis of Twitter,” Proceeding of the Workshop on Information Extraction and Entity Analytics on Social Media Data. United Kingdom: Knowledge Media Institute, 2011.
[13] A. Agarwal, B. Xie, I. Vovsha, O. Rambow, and R. Passonneau, “Sentiment Analysis of Twitter Data,” Annual International Conferences. New York: Columbia University, 2012.
[14] J. Zhang, Y. Qu, J. Cody, and Y. Wu, “A case study of Microblogging in the Enterprise: Use, Value, and Related Issues,” Proceeding of the workshop on Web 2.0., 2010.
[15] G. Kalia, “A Research Paper on Social Media: An Innovative Educational Tool,” Vol. 1, pp. 43-50, Chitkara University, 2013.
[16] Internet World Stats, “Usage and Population Statistic,” Retrieved 10 15, 2013 from: http://www.internetworldstats.com/stats.htm
[17] A. M. Kaplan and M. Haenlein, “Users of the world, unite! The challenges and opportunities of Social Media,” France: Paris, 2010.
[18] Q. Tang, B. Gu, and A. B. Whinston, “Content Contribution in Social Media: The case of YouTube,” 2nd conference of social media. Hawaii: Maui, 2012.
[19] M. Taboada, J. Brooke, M. Tofiloski, K. Voll, and M. Stede, “Lexicon-Based Methods for Sentiment Analysis,” Association for Computational Linguistics, 2011.
[20] M. Annett and G. Kondrak, “A Comparison of Sentiment Analysis Techniques: Polarizing Movie Blogs,” Conference on web search and web data mining (WSDM). University of Alberta: Department of Computing Science, 2009.
[21] P. Goncalves, F. Benevenuto, M. Araujo, and M. Cha, “Comparing and Combining Sentiment Analysis Methods,” 2013.
[22] E. Kouloumpis, T. Wilson, and J. Moore, “Twitter Sentiment Analysis: The Good the Bad and the OMG!,” (Vol. 5). International AAAI, 2011.
[23] S. Sharma, “Application of Support Vector Machines for Damage detection in Structure,” Journal of Machine Learning Research, 2008.
[24] A. Sharma and S. Dey, “Performance Investigation of Feature Selection Methods and Sentiment Lexicons for Sentiment Analysis,” Association for the advancement of Artificial Intelligence, 2012.
[25] J. Spencer and G. Uchyigit, “Sentimentor: Sentiment Analysis of Twitter Data,” Second Joint Conference on Lexicon and Computational Semantics. Brighton: University of Brighton, 2008.
[26] A. Blom and S. Thorsen, “Automatic Twitter replies with Python,” International conference “Dialog 2012”.
[27] B. Pang and L. Lee, “Opinion mining and sentiment analysis,” 2nd workshop on making sense of Microposts. Ithaca: Cornell University. Vol. 2(1), 2008.
[28] M. Hu and B. Liu, “Mining and summarizing customer reviews,” 2004.
[29] P. Nakov, Z. Kozareva, A. Ritter, S. Rosenthal, V. Stoyanov, and T. Wilson, “SemEval-2013 Task 2: Sentiment Analysis in Twitter,” (Vol. 2, pp. 312-320), 2013.
[29] J. Wu, J. Wang, and L. Liu, “Kernel-Based Method for Automated Walking Patterns Recognition Using Kinematics Data,” 5th Workshop on Natural Language Processing. China: Xi’an Jiaotong University, 2006.
[30] T. D. Smedt and W. Daelemans, “Pattern for Python,” Proceeding of COLING. Belgium: University of Antwerp, 2012.
[31] A. Sweigart, “Invent your own computer games with Python,” 2nd edition, 2012. Retrieved from http://inventwithpython.com/
[32] C. Seberino, “Python. Faster and easier software development,” Annual Conference. California: San Diego, 2012.
[33] A. Lukaszewski, “MySQL for Python. Integrate the flexibility of Python and the power of MySQL to boost the productivity of your applications,” UK: Birmingham. Packt Publishing Ltd, 2010.
[34] V. Nareyko, “Why python is perfect for startups,” Retrieved 01 10, 2014 from: http://opensource.com/business/13/12/why-python-perfect-startups
[35] A. Hawkins, “There is more to becoming a thought leader than giving yourself the title,” Retrieved 10 18, 2013 from: http://www.thesocialmediashow.co.uk/author/admin/
[36] R. Prabowo and M. Thelwall, “Sentiment Analysis: A Combined Approach,” International World Wide Web Conference Committee (IW3C2), 2009. United Kingdom: University of Wolverhampton.
[37] H. Saif, Y. He, and H. Alani, “Alleviating Data Scarcity for Twitter Sentiment Analysis,” Association for Computational Linguistics, 2012.