A Study of Sentiment and Trend Analysis Techniques for Social Media Content

Article  in  International Journal of Modern Education and Computer Science · December 2014


DOI: 10.5815/ijmecs.2014.12.07

I.J. Modern Education and Computer Science, 2014, 12, 47-54
Published Online December 2014 in MECS (http://www.mecs-press.org/)
DOI: 10.5815/ijmecs.2014.12.07

A Study of Sentiment and Trend Analysis Techniques for Social Media Content
Asad Mehmood
Shaheed Zulfikar Ali Bhutto Institute of Science and Technology, Islamabad, Pakistan
Email: asadmahmood16@hotmail.com

Abdul S. Palli and M.N.A. Khan


Shaheed Zulfikar Ali Bhutto Institute of Science and Technology, Islamabad, Pakistan
Email: abdulsattarpalli@gmail.com, mnak2010@gmail.com

Abstract—Social media networks have evolved rapidly, and people frequently use these services to communicate with others and to express themselves by sharing opinions, views and ideas on different topics. Social media trend analysis is generally carried out by sifting the corresponding or interlinked events discussed on social media websites such as Twitter and Facebook. The fundamental objective behind such analyses is to determine the level of criticality with respect to the criticism or appreciation expressed in comments, tweets or blogs. Trend analysis techniques can also be systematically exploited for opinion making among the masses at large. The results of such analyses show how people think, assess, orate and opine about different issues. This paper primarily focuses on trend detection and sentiment analysis techniques and their efficacy on contextual information. We further discuss the techniques used to analyze the sentiments expressed within a particular sentence, paragraph or document. Sentiment-based analysis can pave the way for automatic trend analysis, topic recognition and opinion mining. Furthermore, the degree of positivity and negativity of opinions and sentiments can be fairly estimated from the content obtained from a particular social media platform.

Index Terms—Trend Analysis, Sentiment Analysis, Social Media Analysis, Semantic Web, Opinion Mining.

I. INTRODUCTION

Social media has evolved into a vibrant platform where people communicate freely with each other, share ideas and comment on various events and issues. Twitter is one of the evolving social media platforms and, on average, hosts around 200 million tweets per day. Tweets correspond to a vast range of products, services, social issues, news, incidents and reviews. Further, people also comment on and share views about tweets pertaining to various topics and issues. Twitter shares these tweets in a unique way, so they cannot be used outright to judge the essence of the tweets and topics being discussed on social media. Facebook supports five different types of post: status, video, link, image and music; this information is very useful for trend analysis.

There is a variety of events discussed on social media (such as Twitter and Facebook), which can be grouped into two categories: planned events and unplanned events. Planned events include general elections, music concerts, educational or employment workshops and sports tournaments, whereas unplanned events are unanticipated, sudden incidents such as earthquakes, hurricanes, bomb blasts and spot-fixing. Sakaki et al. [1] divide these events into two categories: social events, such as large parties, sports events, exhibitions, accidents and political campaigns, and natural events, such as storms, tornadoes and earthquakes.

Twitter acts as a communication source where real-time information from users becomes available instantly. People express their views and comments about any event of interest and share the latest information about a particular event or incident. This can be quite useful for creating awareness and finding solutions to a problem, as well as for ascertaining the general public's trend on that issue. Different organizations rely on such information to analyze and evaluate the views of customers and consumers about their products or services. For example, TV channels use tweets as a major source of feedback about their programs and talk shows. Although a single tweet consists of at most 140 characters, the volume of information becomes huge because thousands of users tweet on a single issue within a short span of time. Consequently, analyzing such a large amount of purely textual data becomes a colossal task. According to Pohl et al. [2], social media platforms can be used to manage crises by (i) sharing useful information about a particular event or disaster before it actually happens so that people become aware,

Copyright © 2014 MECS I.J. Modern Education and Computer Science, 2014, 12, 47-54

(ii) controlling the effects of the event once it has happened, and (iii) getting out of the disastrous situation.

Facebook is one of the most popular social media networks and started evolving in 2004. People use Facebook for several purposes, such as posting reviews about products and services on dedicated pages and voting in polls launched by users. Facebook offers users three types of actions ("like", "comment" and "share") against a post to express themselves. The number of "likes" for a post shows how many people viewed and liked it as a positive gesture, whereas some users comment to express likes and dislikes about the post. Some people post frequently to update their status and share their mood, thoughts, activities, criticism, likes and dislikes about any event or situation associated with them. This medium not only serves as a means of expression for a large number of users, but has also become a business for some people.

Microblogging is becoming popular among Internet users. Microblogging is a broadcast medium and an underlying form of blogging. A microblog, also known as a micropost, is smaller in size and quite different from a traditional blog. Microblogs are usually a miniature form of the actual content, such as short sentences, images or video links [3, 4]. Millions of people across the world use this medium to express their views on different events and daily-life matters. Although this trend has emerged only recently, a number of approaches to sentiment analysis of microblogs have been devised and explored.

Social media such as Facebook and Twitter are among the fastest evolving sources for sentiment analysis to learn how people think about a particular event. In the present-day technology-driven culture, opinions can be gathered from polls and advertisements placed on blogs and social media sources. In general, human beings have a natural instinct to share information or give feedback about the products or items they purchase in their daily lives, and this very trait has now moved on to social media sites such as Twitter, Facebook, LinkedIn and other microblogging sites. As a result, these sources are becoming very useful for identifying and analyzing diverse opinions on different topics and areas. Twitter is one of the important sources of opinion from microblogging data available in different languages. Such data can be obtained through Twitter's "Tweet Entities" using various applications.

Two types of analyses, trend analysis and sentiment analysis, can be highly beneficial in determining how people think and react emotionally to certain social, religious or political issues. One reason for trend analysis can be to detect emergent or suspicious behavior on a social media platform. For example, trend analysis can be used to see how certain groups of people use the platform to launch propaganda or forge facts about certain political or religious issues. The corporate sector can also use it to get feedback about products.

Social media trends can be broadly categorized into four types, namely positive, negative, neutral and uninterested. The level of a trend can further be classified into low, medium or high, which is mainly linked to the critical comments or tweets. Sentiments are evaluated and extracted from the social media content and can express either a positive or a negative attitude. A positive attitude can be conceived as being happy or pleased with the content expressed by someone on a certain issue. On the other hand, a negative attitude pertains to being unhappy or angry with the content posted by someone on a certain issue.

Trends in the traditional sense can be defined as the topics frequently mentioned throughout the stream of user activity [5]. Hence, to generate an effective trend from social media content, an automated classifier becomes necessary to reduce the time needed to analyze the large amount of data and to improve the efficiency of the analysis process.

Sentiment analysis tries to judge different aspects of natural language that help people find valuable information in large amounts of unstructured data [6]. It is an emerging concept in which human emotions are determined from textual content, enabling us to extract people's opinions and feelings. To learn people's opinion about a particular event and its future impact (commonly termed social media trends), there is a need for an automated system that can analyze such a huge amount of data and produce the desired results with a level of accuracy that makes them acceptable to the masses. On the Internet, people use blog posts and forums to promote products or services, discuss topics and express their views. Sentiment analysis on these platforms yields very important information for security analysts keeping an eye on the activities of miscreants and terrorists. However, performing such analysis on big data is a serious challenge. Numerous events take place regularly in daily life; therefore, it is not possible to manually analyze every event and predict its future impact. It is also hard for computing machines to automatically extract the meaning and tone of content, as people express things in many different ways and styles. Sentiment analysis can prove very useful for analyzing search engine results, blogs, social networks, web forums and reviews of books, movies, sports and products [7]. This can reduce the effort required to go through a large number of documents to form an opinion about the nature of their contents.

II. LITERATURE REVIEW

Topic identification on social media helps in understanding what is being discussed and helps users grasp the broader picture without reading all the available information. Using network topology in a trend detection method makes it possible to distinguish "viral topics" (topics that spread via peer influence) from shared-information topics (which spread via news media). In the context of social media, Twitter is ranked as the second most famous social network [8]. The tweets made on it can be used to


predict the future impact of different events and issues by applying data mining techniques. The authors designed a tool that performs trend analysis for social celebrities to find the most influential among them, as well as trend analysis of national and international issues and recent events. The system proposed by Lin et al. [8] has two layers: a data processing layer (for collecting data and applying data mining techniques to perform trend analysis) and an information display layer (for representing or visualizing the results). After collecting the required data, which mostly pertains to message properties, it uses Term Frequency-Inverse Document Frequency (TF-IDF) and fixed keywords to analyze the tweets. The mined results are presented to the user in four sections: top news, trending topics, active users and top sources.

Huberman et al. [9] used tweets to predict the revenue generated for a newly released movie. The proposed approach entails selecting and analyzing tweets, posted before and after the release of a movie, that match specific keywords taken from the title of the movie. Tumasjan et al. [10] suggest that Twitter can be used as a platform for political discussion and that tweets can be used to predict election results. In this regard, the party with the highest rate of tweets in its favor has a fair chance of winning the elections. Wegrzyn-Wolska et al. [11] designed a sentiment analysis system based on tweets for the French presidential election held in 2012. The aim of the study was to correlate what was being discussed in the Twitter-sphere. Using the REST API, the authors collected tweets matching user-supplied keywords. The system detects a trend by calculating both the frequency of a searched keyword in the dataset and its sentiment on the basis of positive, negative or neutral comments made against a post.

Asur et al. [12] argue that a topic that is discussed often on Twitter during a certain timeframe becomes a trend. Re-tweets on the same topic from multiple users, such as news from various media sources, can also cause a topic to trend. If the topics being discussed on Twitter among different user groups are divided by region, multiple trends result in tandem. The most frequently discussed topic becomes the principal trend in the trends list. Likewise, Asur et al. [13] compared Sina Weibo (a Chinese social media platform) with Twitter and found differences in trends. On Sina Weibo, users mostly share jokes, images and videos and re-tweet. On Twitter, by contrast, people mostly amplify news obtained from other media sources.

Sakaki et al. [1] propose an earthquake detection and reporting system that sends email alerts to registered users when it receives a tweet about an earthquake originating from Japan. In the proposed system, every Twitter user is considered a sensor because users send sensory information. Every second, the system searches for tweets that match the given keywords and applies semantic analysis to obtain accurate results. To determine the location where the event occurred, the system uses an event detection algorithm. The algorithm uses the search API to get the time and location of a tweet; the location is automatically attached to a tweet when it is posted via an iPhone or another phone with GPS. As an alternative for finding the event location, it uses the registered location of the user together with Kalman filtering or particle filtering algorithms.

Achrekar et al. [14] designed a system that uses tweets to predict the ratio of flu cases in the USA for a specific time period. For data collection, the authors developed a crawler using the Twitter real-time search API to retrieve tweets that match the keywords and to collect patient details such as name, age and location from user profiles, in order to identify the affected area and the expected number of patients in that area.

To evaluate large volumes of Twitter data, Hao et al. [15] introduced three techniques based mainly on visual sentiment analysis. The first is topic-based analysis, which uses natural language processing to determine the nature of the topic of discussion by extracting opinion-related attributes that measure the degree of sentiment. Then, stream analysis is performed on a large number of tweets to extract information of interest based on positive or negative attitude and on its influence within the larger density of tweets. Finally, pixel cell-based sentiment calendars and high-density geo maps are used to visualize a large number of tweets in a single view.

Because Twitter is one of the important microblogging platforms, Pak et al. [16] used a Twitter corpus for sentiment analysis and opinion mining. Tweets containing emoticons such as the happy smiley ":)" or the sad smiley ":(" are readily labeled as positive or negative, respectively. After collection, the corpus is analyzed to check how the data are distributed between the subjective corpus (the positive and negative sets) and the objective corpus (the neutral set). The authors used the presence of n-grams as binary features and keyword frequency to obtain the rest of the general information. The analysis reported by the authors showed that objective sets contained more common and proper nouns, whereas subjective sets more often used personal pronouns. Similarly, authors in the objective sets referred to themselves in the third person, while authors in the subjective sets described themselves in the first or second person. Lima et al. [17] propose a classifier for Twitter messages that comprises three modules: Support Counting, Database Selection and Classification. The Support Counting module computes the percentage of tweets that contain at least one word or emoticon. The Database Selection module divides the data into a training set and a testing set. The Classification module classifies the data using the Naïve Bayes algorithm.

Cvijikj and Michahelles [18] studied trend discovery on Facebook using features of shared posts. They categorized topics of interest into three groups: descriptive events, popular events and daily routines. For this purpose, the Graph API search feature is used every 10 minutes to find the posts matching a specific keyword.
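
The emoticon-based labelling attributed above to Pak et al. [16] and the Naïve Bayes classification step attributed to Lima et al. [17] can be illustrated together with a minimal sketch. It is not the implementation of either system: the emoticon lists, the tokenizer, the function names and the add-one smoothing are all assumptions made for illustration, and the sketch assumes that the training data contains at least one tweet of each class.

```python
import math
import re
from collections import Counter

# Hypothetical emoticon lists; the surveyed papers may use different ones.
POSITIVE_EMOTICONS = (":)", ":-)", ":D")
NEGATIVE_EMOTICONS = (":(", ":-(")


def emoticon_label(text):
    """Return 'pos' or 'neg' from emoticons, or None if absent or conflicting."""
    has_pos = any(e in text for e in POSITIVE_EMOTICONS)
    has_neg = any(e in text for e in NEGATIVE_EMOTICONS)
    if has_pos == has_neg:
        return None
    return "pos" if has_pos else "neg"


def tokenize(text):
    """Lower-case word tokens, with emoticons removed so they cannot leak the label."""
    for emoticon in POSITIVE_EMOTICONS + NEGATIVE_EMOTICONS:
        text = text.replace(emoticon, " ")
    return re.findall(r"[a-z']+", text.lower())


def train(tweets):
    """Count words per class, using emoticons as noisy training labels."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    doc_counts = Counter()
    for text in tweets:
        label = emoticon_label(text)
        if label is None:
            continue  # unlabeled tweets are skipped
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("pos", "neg"):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for token in tokenize(text):
            likelihood = (word_counts[label][token] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)
```

Calling `train` on a list of raw tweets and then `classify` on an unseen tweet returns either "pos" or "neg". Stripping the emoticons before counting words is a design choice of this sketch, so that the classifier learns from the surrounding words rather than from the label itself.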

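
Several of the systems reviewed above, such as that of Lin et al. [8], weight terms with TF-IDF. A minimal sketch of one common, unsmoothed variant follows. The surveyed papers do not state which exact weighting formula they use, so the formula and the function name `tf_idf` are assumptions made for illustration: term frequency is the share of a document's tokens taken up by the term, and inverse document frequency is the logarithm of the number of documents divided by the number of documents containing the term.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute TF-IDF weights for a corpus.

    documents: list of token lists, one per document (for example, one per tweet).
    Returns a list of {term: weight} dicts, one per document.
    """
    n_docs = len(documents)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        weights.append({
            # term frequency within the document times inverse document frequency
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

Under this variant, a term that occurs in every document receives a weight of zero, while a term that is frequent in one document but rare elsewhere receives a high weight, which is what makes the measure useful for picking out the characteristic terms of a post.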

Trend detection is a two-pronged strategy that involves topic identification and cluster detection. Only the 'status' attribute of Facebook posts was used to identify trends. The methodology uses TF-IDF to assign weights to terms on the basis of two quantities: the frequency of occurrence of a term within a single document and the number of documents in the corpus that contain the term. Finally, the trend analysis was done using the LingPipe API.

Li et al. [19] applied a clustering-based sentiment analysis approach to a dataset of movie reviews. The dataset was divided into two primary clusters corresponding to the positive and negative user comments about the movie. For the experiments, TF-IDF (Term Frequency-Inverse Document Frequency) weighting was applied. The stability of the results increased when a voting mechanism was used, and finally a symbolic technique was applied to enhance the clustering results. The symbolic technique, as described by Li et al. [19], assigns each term (or word) a value based on its negative or positive connotation as well as its intensity or criticality. Then, an aggregation function is used to obtain cumulative word scores and draw conclusions about the overall negativity or positivity of the comments.

Goorha et al. [20] used an entity extraction system to extract the most relevant and most frequently occurring phrases from tweets, blogs and newspaper articles to find out people's opinion about a product or company. The proposed system used streaming clustering to identify related terms and TF-IDF to calculate the term weights used to form clusters. Pohl et al. [2] proposed a system to detect sub-events related to a crisis situation based on YouTube and Flickr data such as pictures and videos. Each picture or video data item has two parts: the coordinates and the terms. For the coordinates, it uses a two-phase clustering approach based on longitude and latitude; for the terms, it uses the textual metadata fields of a data item and calculates TF-IDF for clustering. To present results visually, it uses OpenStreetMap, which can display cluster data and its location.

A proximity-based sentiment analysis is proposed by Hasan et al. [7] that uses features based on word proximities within a sentence. The authors used three proximity-based features: proximity distributions, mutual information between proximity types, and proximity patterns. The dataset was divided into a number of segments, each containing over 100 words, and the distance between each pair of positive and negative words was calculated. For proximity distributions, different numbers of bins were considered, which returned the distribution of pair-wise distances from the proximity models. For mutual information between proximity types, the relationship between the proximity types was used to determine the polarity of the document; the information-theoretic entropy of each sequence was used to obtain the mutual information between pairs of proximity types. Proximity patterns described the polarity of the words used in the document.

Corley et al. [21] proposed a technique for finding flu cases discussed in blogs and for relating the outcome of blog posts to data reported by the Centers for Disease Control and Prevention. A Python script combined with pyMPI was used for data extraction; pyMPI is software that integrates the Message Passing Interface into the Python interpreter. Suzumura et al. [22] developed a system for processing large amounts of data on the fly using web services including the Twitter Search Service and the Twitter Streaming API, and then displayed the tweets on Google Maps using the Google Maps API and Ajax components.

Karamibekr et al. [23] proposed an approach to classify sentiments using the verbs in a sentence. The approach treats verbs as the core opinion terms in social domains, on the premise that the verb is the main opinion term. The authors used lexical knowledge to extract the sentiment terms present in a sentence. With the help of a bootstrapping process, the authors collected a list of opinion verbs using an English dictionary and their synonyms in WordNet [24]. The subject of a sentence is the doer of the action or what the predicate describes, whereas the object is what or whom the verb acts upon. Cai et al. [25] proposed an approach to create an effective sentiment-based taxonomy that employs a statistical approach to sentiment analysis. The sentiments expressed by words are measured on a positive-negative scale. Lists of positive and negative words are then created using two external NLP resources. To score the relative sentiment between posts that contain positive and negative words, the authors characterize the degree of positive or negative sentiment that each word conveys. Koncz et al. [26] proposed a feature selection approach that uses the frequencies of terms in a particular document. The values are normalized by the total number of documents in the category.

Mizumoto et al. [27] proposed a system in which the authors created a polarity dictionary to determine sentiment polarities for the stock market. To construct the dictionary, the authors used a semi-supervised learning approach that starts from a small polarity dictionary. New words whose polarities are unknown are added to the dictionary based on their co-occurrence frequency with words already in the polarity dictionary. The polarity of an article is then estimated from the frequency of words in the polarity dictionary; hence, articles are classified as positive, negative or neutral. In [28], the authors introduced text sentiment classification based on contextual information. For this purpose, the flow of sentiments and the keywords in a paragraph were extracted as contextual information. Finally, by computing the contextual information degree (a linearly weighted sum of the contextual information), the overall sentiment of a sentence was classified.

Iqbal et al. [30, 31] proposed performance metrics for


software design and software project management. Process improvement methodologies are elaborated in [32, 43], and Khan et al. [33] carried out a quality assurance assessment. Amir et al. [34] discussed agile software development processes. Rehman et al. [35] and Khan et al. [44] analyzed issues pertaining to requirement engineering processes. Umar and Khan [36, 37] analyzed non-functional requirements for software maintainability. Khan et al. [38, 39] proposed machine learning approaches for post-event timeline reconstruction. Khan [40] suggests that Bayesian techniques are more promising than other conventional machine learning techniques for timeline reconstruction. Rafique and Khan [41] explored various methods, practices and tools used for static and live digital forensics. In [42], Bashir and Khan discuss triaging methodologies used for live digital forensic analysis.

III. CRITICAL ANALYSIS

Budak [5] used the Independent Trend Formation Model (ITFM) and a nearest neighbor model to identify structural trends and compared them with traditional trends by analyzing tweets. However, the work does not define structural trends in a generalized form, so one cannot identify the gap among the discussed trend types. Lin et al. [8] used natural language processing, semantic analysis and TF-IDF to analyze the tweets of celebrities to find trends on Twitter. The same approach can also be used to analyze the tweets of the general public to identify public trends on national and international issues and events. Cvijikj [18] used linguistic analysis, TF-IDF, clustering by distribution and clustering by co-occurrence to identify the topic of discussion by analyzing 'status' posts on Facebook. Nevertheless, selecting only 'status' posts does not provide a sufficient dataset for analyzing trends. Other types should also be collected, such as posted images or audio/video and user comments on a post. Asur et al. [9] applied linguistic analysis and sentiment analysis to tweets that match given keywords to predict the revenue of a movie. The authors select tweets by matching keywords that appear only in the title of a movie. Users may mention the director, producer, actors or character names in their tweets, and such tweets may be missed.

Tumasjan et al. [10] used linguistic analysis and sentiment analysis techniques to predict election results, using Twitter both as a platform for political discussion and as a data source for finding people's opinion about a particular political party. The dataset contains only tweets that mention the names of political parties and of some well-known politicians. However, tweets that omit a party's name can also be very useful for this analysis, since people may use polling symbols or party slogans instead. Furthermore, the selected tweets come only from the same group of users rather than from a variety of other social media content. Asur et al. [12] analyzed Twitter data through keyword matching to learn the factors that are important at different stages of a trend. There should be some mechanism for matching misspelled words, and sentiment analysis of tweets, for better results. Yu et al. [13] observed trends on Chinese social media and compared them with Twitter trends. The research is more focused on analyzing new tweets and retweets on Sina Weibo (a Chinese social media platform). Trends can also be found through location. Sakaki et al. [1] used Kalman filtering and particle filtering algorithms and performed semantic analysis of tweets from users in Japan to generate email alerts about real-time events such as earthquakes. There is little chance of obtaining GPS data (the location of a tweet) from a tweet, because not every user has an iPhone to tweet from. Moreover, a user may tweet from a location other than the registered location. Achrekar et al. [14] used an auto-regression with exogenous inputs (ARX) model to design a system that predicts the ratio of flu cases in the USA using tweets from a specific time period. Goorha et al. [20] used a streaming clustering algorithm and TF-IDF to design a system that reports public opinion about a product or company by extracting the most relevant and most frequently occurring phrases from tweets, blogs and newspaper articles, and displays the results in a user interface by plotting the clusters. It determines the popularity of an entity (product or company) from the number of tweets that discuss that entity. No mechanism for semantic analysis is used to decide whether tweets about the entity are positive or negative; the tweets might well be about the closing down of a company or product. Pohl et al. [2] proposed a system to detect sub-events in crisis situations from YouTube and Flickr data, such as pictures and videos, using two-phase clustering and TF-IDF. However, the approach lacks a mechanism for filtering duplicate data. Suzumura et al. [22] designed a system architecture for processing large amounts of data on the fly rather than storing and then processing it, using probabilistic models (such as a temporal model and a spatial model). The authors developed a system for predicting real-time events based on this architecture. However, the techniques used for finding location are unable to find the specific location of posted messages. Corley et al. [21] proposed a technique for finding flu cases discussed in blogs and for relating the outcome of blog posts to data reported by the Centers for Disease Control and Prevention. It filters only English-language words from blogs and compares them with the given keywords. No procedure was defined for matching misspelled or informal short words.

In [15], the authors focused primarily on visual opinion analysis and sentiment analysis. The technique visualizes a large number of tweets in a single view that depicts the overall sentiment. This approach does not help identify opinion association. The major focus of the research in [16] was sentiment analysis, in which the proposed technique automatically collected a corpus to train a sentiment classifier. Syntactic structures are useful in

describing emotions in the Twitter content. The approach needs a multilingual version of the
technique, or an auto-translator included with the proposed technique. Li and Liu [19] focused on
sentiment analysis and clustering. A cluster-based technique was applied, which gave more
accurate results than techniques that involve human intervention. The dataset used for evaluation
was small; a larger dataset might have produced more accurate results. Sentiment analysis was
the major focus of the research and technique proposed by Lima and de Castro [17]. The proposed
technique automatically trained a classifier based on the Naïve Bayes algorithm to categorize
datasets. However, combining the different proposed techniques with neural classification could
have produced accurate results.

The techniques proposed in [28] are based on sentiment analysis and security informatics, and
estimate the sentiments expressed in content. The approach does not require collecting and
labeling documents. It is a semi-supervised sentiment estimation technique which lacks
multilingual sentiment analysis as well as a criterion or procedure that indicates when to smooth
the polarity estimates. The technique proposed by Hasan and Adjeroh [7] focuses on sentiment
analysis and text mining. The technique used proximity-based sentiment analysis. The approach
depends on the polarity of a dictionary that was created from the corpus. Again, the major area of
focus in [26] was sentiment analysis. The approach compared its feature selection method with
Information Gain (IG) feature selection and showed slightly poorer results than IG.

Mizumoto et al. [27] used semi-supervised learning for sentiment analysis. The technique created
a polarity dictionary to estimate the polarity of stock market contextual information. This
technique does not deal with negation or adversative conjunctions. The major research focus in
[23] was on sentiment analysis and opinion structure. The technique used a verb-oriented
sentiment classification approach for social domains. The technique did not use main verbs of the
sentences to classify the sentiments related to a sentence. Cai et al. [25] carried out sentiment
analysis through topic detection. The technique for sentiment analysis used a sentiment
classification and sentiment detection scheme. The technique does not use part-of-speech and word
syntactic relationships when doing sentiment analysis and topic detection.

IV. CONCLUSION AND FUTURE WORK

In this study, we looked into different techniques used for trend analysis on social media such
as Facebook and Twitter. We also studied how data available on social media can be used in
different ways to analyze and predict future trends. We observed that there is no flexible system
that has a data dictionary with more appropriate keywords for predicting trends over Facebook. As
most of the work has been done using Twitter as a source, we plan to design and develop a system
to predict future trends over Facebook by collecting all types of public posts and related user
comments and applying sentiment analysis. We also intend to propose a model that is capable of
analyzing different events being discussed on social media, along with detecting trends, to make
future predictions about the outcome of such events and issues. Since most of the content
available on Facebook and Twitter is unstructured text, we plan to develop a system which can
automatically analyze sentiments from the available content and verify the opinions that are
expressed in the contextual information.

REFERENCES

[1] Sakaki, T., Okazaki, M., & Matsuo, Y. (2010). Earthquake shakes Twitter users: real-time event detection by social sensors. In Proceedings of the 19th international conference on World Wide Web (pp. 851-860). ACM.
[2] Pohl, D., Bouchachia, A., & Hellwagner, H. (2012). Automatic Identification of Crisis-Related Sub-Events using Clustering. In Machine Learning and Applications (ICMLA), 2012 11th International Conference on (Vol. 2, pp. 333-338). IEEE.
[3] Kaplan, A. M., & Haenlein, M. (2011). The early bird catches the news: Nine things you should know about micro-blogging. Business Horizons, 54(2), 105-113.
[4] Lohmann, S., Burch, M., Schmauder, H., & Weiskopf, D. (2012, May). Visual analysis of microblog content using time-varying co-occurrence highlighting in tag clouds. In Proceedings of the International Working Conference on Advanced Visual Interfaces (pp. 753-756). ACM.
[5] Budak, C., Agrawal, D., & El Abbadi, A. (2011). Structural trend analysis for online social networks. Proceedings of the VLDB Endowment, 4(10), 646-656.
[6] Bloom, K., Garg, N., & Argamon, S. (2007). Extracting appraisal expressions. HLT-NAACL 2007, 308-315.
[7] Hasan, S. S., & Adjeroh, D. A. (2011). Proximity-based sentiment analysis. In Applications of Digital Information and Web Technologies (ICADIWT), 2011 Fourth International Conference on the (pp. 106-111). IEEE.
[8] Lin, Y. C., Yang, P. C., Hsieh, W. T., & Seng-cho, T. C. Technology Trend Analysis Tool using Twitter as a Source.
[9] Asur, S., & Huberman, B. A. (2010). Predicting the future with social media. In Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on (Vol. 1, pp. 492-499). IEEE.
[10] Tumasjan, A., Sprenger, T. O., Sandner, P. G., & Welpe, I. M. (2010). Predicting elections with twitter: What 140 characters reveal about political sentiment. In Proceedings of the fourth international AAAI conference on weblogs and social media (pp. 178-185).
[11] Wegrzyn-Wolska, K., & Bougueroua, L. (2012). Tweets mining for French Presidential Election. In Computational Aspects of Social Networks (CASoN), 2012 Fourth International Conference on (pp. 138-143). IEEE.
[12] Asur, S., Huberman, B. A., Szabo, G., & Wang, C. (2011). Trends in social media: Persistence and decay. In 5th International AAAI Conference on Weblogs and Social Media.
[13] Yu, L., Asur, S., & Huberman, B. A. (2011). What trends in Chinese social media. arXiv preprint arXiv:1107.3522.
[14] Achrekar, H., Gandhe, A., Lazarus, R., Yu, S. H., & Liu, B. (2011). Predicting flu trends using twitter data. In Computer Communications Workshops (INFOCOM
WKSHPS), 2011 IEEE Conference on (pp. 702-707). IEEE.
[15] Hao, M., Rohrdantz, C., Janetzko, H., Dayal, U., Keim, D. A., Haug, L., & Hsu, M. C. (2011, October). Visual sentiment analysis on twitter data streams. In Visual Analytics Science and Technology (VAST), 2011 IEEE Conference on (pp. 277-278). IEEE.
[16] Pak, A., & Paroubek, P. (2010, May). Twitter as a corpus for sentiment analysis and opinion mining. In Proceedings of LREC (Vol. 2010).
[17] Lima, A. C., & de Castro, L. N. (2012, November). Automatic sentiment analysis of Twitter messages. In Computational Aspects of Social Networks (CASoN), 2012 Fourth International Conference on (pp. 52-57). IEEE.
[18] Cvijikj, I. P., & Michahelles, F. (2011). Monitoring trends on facebook. In Dependable, Autonomic and Secure Computing (DASC), 2011 IEEE Ninth International Conference on (pp. 895-902). IEEE.
[19] Li, G., & Liu, F. (2010, November). A clustering-based approach on sentiment analysis. In Intelligent Systems and Knowledge Engineering (ISKE), 2010 International Conference on (pp. 331-337). IEEE.
[20] Pohl, D., Bouchachia, A., & Hellwagner, H. (2012). Automatic Identification of Crisis-Related Sub-Events using Clustering. In Machine Learning and Applications (ICMLA), 2012 11th International Conference on (Vol. 2, pp. 333-338). IEEE.
[21] Corley, C. D., Mikler, A. R., Singh, K. P., & Cook, D. J. (2009). Monitoring influenza trends through mining social media. In International Conference on Bioinformatics & Computational Biology (pp. 340-346).
[22] Suzumura, T., & Oiki, T. (2011). StreamWeb: Real-Time Web Monitoring with Stream Computing. In Web Services (ICWS), 2011 IEEE International Conference on (pp. 620-627). IEEE.
[23] Karamibekr, M., & Ghorbani, A. A. (2012, December). Verb Oriented Sentiment Classification. In Web Intelligence and Intelligent Agent Technology (WI-IAT), 2012 IEEE/WIC/ACM International Conferences on (Vol. 1, pp. 327-331). IEEE.
[24] C. Fellbaum. Wordnet: An electronic lexical database.
[25] Cai, K., Spangler, S., Chen, Y., & Zhang, L. (2008, December). Leveraging sentiment analysis for topic detection. In Web Intelligence and Intelligent Agent Technology, 2008. WI-IAT'08. IEEE/WIC/ACM International Conference on (Vol. 1, pp. 265-271). IEEE.
[26] Koncz, P., & Paralic, J. (2011, June). An approach to feature selection for sentiment analysis. In Intelligent Engineering Systems (INES), 2011 15th IEEE International Conference on (pp. 357-362). IEEE.
[27] Mizumoto, K., Yanagimoto, H., & Yoshioka, M. (2012, May). Sentiment Analysis of Stock Market News with Semi-supervised Learning. In Computer and Information Science (ICIS), 2012 IEEE/ACIS 11th International Conference on (pp. 325-328). IEEE.
[28] Colbaugh, R., & Glass, K. (2011, September). Agile Sentiment Analysis of Social Media Content for Security Informatics Applications. In Intelligence and Security Informatics Conference (EISIC), 2011 European (pp. 327-331). IEEE.
[29] Toutanova, K., Klein, D., Manning, C. D., & Singer, Y. (2003, May). Feature-rich part-of-speech tagging with a cyclic dependency network. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology-Volume 1 (pp. 173-180). Association for Computational Linguistics.
[30] Iqbal S., Khalid M., Khan, M N A. A Distinctive Suite of Performance Metrics for Software Design. International Journal of Software Engineering & Its Applications, 7(5), (2013).
[31] Iqbal S., Khan M.N.A., Yet another Set of Requirement Metrics for Software Projects. International Journal of Software Engineering & Its Applications, 6(1), (2012).
[32] Faizan M., Ulhaq S., Khan M N A., Defect Prevention and Process Improvement Methodology for Outsourced Software Projects. Middle-East Journal of Scientific Research, 19(5), 674-682, (2014).
[33] Khan K., Khan A., Aamir M., Khan M N A., Quality Assurance Assessment in Global Software Development. World Applied Sciences Journal, 24(11), (2013).
[34] Amir M., Khan K., Khan A., Khan M N A., An Appraisal of Agile Software Development Process. International Journal of Advanced Science & Technology, 58, (2013).
[35] Rehman T U., Khan M N A., Riaz N., Analysis of Requirement Engineering Processes, Tools/Techniques and Methodologies. International Journal of Information Technology & Computer Science, 5(3), (2013).
[36] Umar M., Khan, M N A., A Framework to Separate Non-Functional Requirements for System Maintainability. Kuwait Journal of Science & Engineering, 39(1 B), 211-231, (2012).
[37] Umar M., Khan, M. N. A, Analyzing Non-Functional Requirements (NFRs) for software development. In IEEE 2nd International Conference on Software Engineering and Service Science (ICSESS), 2011 (pp. 675-678), (2011).
[38] Khan, M. N. A., Chatwin, C. R., & Young, R. C. (2007). A framework for post-event timeline reconstruction using neural networks. Digital Investigation, 4(3), 146-157.
[39] Khan, M. N. A., Chatwin, C. R., & Young, R. C. (2007). Extracting Evidence from Filesystem Activity using Bayesian Networks. International Journal of Forensic Computer Science, 1, 50-63.
[40] Khan, M. N. A. (2012). Performance analysis of Bayesian networks and neural networks in classification of file system activities. Computers & Security, 31(4), 391-401.
[41] Rafique, M., & Khan, M. N. A. (2013). Exploring Static and Live Digital Forensics: Methods, Practices and Tools. International Journal of Scientific & Engineering Research 4(10): 1048-1056.
[42] Bashir, M. S., & Khan, M. N. A. (2013). Triage in Live Digital Forensic Analysis. International Journal of Forensic Computer Science 1, 35-44.
[43] Faizan M., Khan M N A., Ulhaq S., Contemporary Trends in Defect Prevention: A Survey Report. International Journal of Modern Education & Computer Science, 4(3), (2012).
[44] Khan, M N A., Khalid M., ulHaq S., Review of Requirements Management Issues in Software Development. International Journal of Modern Education & Computer Science, 5(1), (2013).

Authors’ Profiles

Asad Mehmood has completed his MS in Computing from Shaheed Zulfikar Ali Bhutto Institute of
Science and Technology (SZABIST), Islamabad, Pakistan. He has over 7 years of industry experience
to his credit. His research interests include Business Intelligence, Big Data Analytics and
Sentiment Analysis.

Abdul Sattar Palli has completed his MS in Computing from Shaheed Zulfikar Ali Bhutto Institute
of Science and Technology (SZABIST), Islamabad, Pakistan. His research interests include Data
Mining and Software Engineering.

M. N. A. Khan obtained his D.Phil. degree in Computer System Engineering. His research interests
are in the fields of software engineering, data mining, cyber administration, digital forensic
analysis and machine learning techniques.

How to cite this paper: Asad Mehmood, Abdul S. Palli, M.N.A. Khan, "A Study of Sentiment and Trend Analysis
Techniques for Social Media Content", IJMECS, vol.6, no.12, pp.47-54, 2014. DOI: 10.5815/ijmecs.2014.12.07
