
Introduction


TWITTER SENTIMENT ANALYSIS

ABSTRACT
INTRODUCTION

 Twitter sentiment analysis is used to identify and classify the sentiments
expressed in a text source. Sentiment analysis is the process of detecting
positive or negative sentiment in text.
 Modern sentiment analysis research now spans different domains and languages.
When tweets are classified, the data is considered together with its environment.
It is estimated that 90% of the world’s data is unstructured, in other words
unorganized.
 Huge volumes of unstructured business data are created every day (emails, chats,
social media conversations, surveys, articles, documents), but it is hard to analyse
them for sentiment in a timely and efficient manner.
 Twitter sentiment analysis analyzes the sentiment or emotion of tweets. It uses
natural language processing and machine learning algorithms to classify tweets
automatically as positive, negative, or neutral based on their content. It can be done
for individual tweets or a larger dataset related to a particular event.
OBJECTIVES
 To help businesses quickly understand the overall opinions of their customers.
 To automatically sort the sentiment behind reviews, social media
conversations, and more, so that faster and more accurate decisions can be made.
 To accurately fetch and analyse tweets on demonetization and to
perform sentiment analysis on them.
EXISTING SYSTEM AND CHALLENGES

 The first challenge is storing and accessing information from the very large
data sets held in clusters. Because the data keeps growing and is stored in different
storage locations, a standard computing platform is needed to manage it in a
centralized system that scales the huge data down into a manageable size for
computing.
 The second challenge is retrieving data from large social media data sets. When
the data grows daily, it is difficult to access data from large networks in order to
perform a specific action on it.
 The third challenge concerns algorithm design for handling the
problems raised by the huge data volume and the dynamic data characteristics. The
main scope of the project is to accurately fetch and analyse tweets on
demonetization and to perform sentiment analysis, finding the most popular trending
hashtags and the average rating of each tweet on that topic.
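As a rough illustration of the hashtag-counting step mentioned above, the sketch below tallies hashtags across a small list of tweets. The function name `top_hashtags` and the sample tweets are made up for this example and are not part of the project code.

```python
import re
from collections import Counter

# Hypothetical helper: a hashtag is taken to be '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")

def top_hashtags(tweets, n=3):
    """Return the n most frequent hashtags as (tag, count) pairs."""
    counts = Counter()
    for text in tweets:
        # Lower-case so that #Demonetization and #demonetization count together.
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(text))
    return counts.most_common(n)

# Made-up sample tweets, for illustration only.
sample_tweets = [
    "Long queues at the ATM again #Demonetization #cash",
    "#demonetization will help in the long run",
    "No change for a 2000 note #cash #demonetization",
]
print(top_hashtags(sample_tweets, n=2))
```

On a real data set the same function could be applied to the tweet text column after loading it.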
PROPOSED SYSTEM AND ADVANTAGES

 Retrieval of Data: Public Twitter data is mined using the existing Twitter APIs for data
extraction. Tweets are selected based on a few chosen keywords pertaining to the
domain of our concern, i.e. product reviews. We have elected to use the Twitter API due to
its ease of data extraction.
 Pre-processing: In this stage, we remove identifying information such as Twitter
handles, message timestamps, and embedded links and videos. Such information is
largely irrelevant and may cause our system to give false results.
 Improve Tweet Correction and Accuracy: As tweets are written for human perusal, they
often contain slang, misspellings and other irrelevant data. Thus, we correct the
misspellings in the sentences and replace the slang with words from
Standard English that roughly correspond to the slang in question. As slang can be used
to convey a wide variety of sentiment, often with greater emotional impact, this process is
necessary so that slang words are considered as part of the emotion expressed.
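The slang-replacement step could be sketched as a simple dictionary lookup, as below. The mapping `SLANG_MAP` and the function `normalize_slang` are hypothetical and only illustrative; a real system would need a much larger slang dictionary and a separate spelling-correction step.

```python
import re

# Hypothetical, tiny slang dictionary (assumption: not the project's actual list).
SLANG_MAP = {"gr8": "great", "luv": "love", "u": "you", "thx": "thanks"}

def normalize_slang(text, slang_map=SLANG_MAP):
    """Replace known slang tokens with rough Standard English equivalents."""
    def replace(match):
        word = match.group(0)
        # Look the token up case-insensitively; unknown words are left unchanged.
        return slang_map.get(word.lower(), word)
    return re.sub(r"[A-Za-z0-9']+", replace, text)

print(normalize_slang("thx, I luv this phone, it is gr8"))
```

Replacing tokens before scoring lets a sentiment word list recognise "great" or "love" even when the tweet used a slang spelling.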
LITERATURE SURVEY

 [1] In 2019, Saad and Yang aimed to give a complete tweet sentiment
analysis on the basis of ordinal regression with machine learning algorithms. The
suggested model pre-processed the tweets as a first step and then generated
effective features with a feature extraction model.
 [2] In 2018, Fang et al. suggested multi-strategy sentiment analysis
models using semantic fuzziness to resolve the issues. The outcomes
demonstrated that the proposed model attained high efficiency.
 [3] In 2019, Afzaal et al. recommended a novel approach to aspect-based
sentiment classification, which recognized the features in a precise manner
and attained the best classification accuracy.
 [4] In 2019, Feizollah et al. concentrated on tweets related to two halal
products, halal cosmetics and halal tourism. Twitter information was extracted
using the Twitter search function, and a new model was employed for
data filtering.
 [5] In 2018, Mukhtar et al. performed sentiment analysis on Urdu
blogs gathered from several domains, using supervised machine learning and
lexicon-based models.
 [6] In 2020, Kumar et al. presented a hybrid deep learning approach named
ConVNet-SVMBoVW that dealt with real-time data for predicting fine-grained
sentiment. An aggregation model was developed to measure the hybrid polarity.
 [7] In 2018, Abdi et al. proposed a machine learning technique for summarizing the
opinions of users expressed in reviews. The suggested method merged multiple
kinds of features into a single feature set for building an accurate classification model.
 [8] In 2019, Zhao et al. offered a novel image-text consistency driven multi-modal
sentiment evaluation model, which explored the correlation between the text and the image.
Later, a multi-modal adaptive sentiment analysis model was implemented.
 [9] In 2019, Park et al. developed a semi-supervised sentiment-discriminative
objective that addresses the issue using the partial sentiment data in documents.
 [10] In 2019, Vashishtha and Susan calculated the sentiment of social
media posts using a new set of fuzzy rules involving many datasets and lexicons.
SOFTWARE AND HARDWARE REQUIREMENTS

Software Requirements
 Language : Python 3.7
 IDE : Anaconda
 Library : Machine Learning Libraries
 Operating System : Windows

Hardware Requirements
 Hard Disk/SSD : 512GB
 RAM : 8GB
 Processor : Intel Core i5
DIAGRAM
ALGORITHM

Vader Sentiment
 The first algorithm compares each word in a tweet to a database of words that are
labelled as having positive or negative sentiment. There are many such datasets. For
this analysis, a list of positive and negative sentiment words was downloaded from
Kaggle datasets. Tweets were split into words using the NLTK word tokenizer. NLTK is
one of the more popular natural language processing toolkits for the Python language.
 Convert to lower case
 Remove @ mentions in tweets
 Remove hyperlinks
 Expand contractions (e.g. convert “won’t” to “will” and “not”)
 Remove punctuation
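The cleaning steps and the word-list comparison above can be sketched as follows. This is a plain word-list lookup, not the VADER library itself. The tiny word sets, the contraction table, and the function names are assumptions made for the example, and `str.split` stands in for the NLTK tokenizer so that the sketch has no external dependencies.

```python
import re
import string

# Illustrative stand-ins (assumptions): the project used word lists from Kaggle.
CONTRACTIONS = {"won't": "will not", "can't": "can not",
                "don't": "do not", "isn't": "is not"}
POSITIVE_WORDS = {"good", "great", "happy", "love"}
NEGATIVE_WORDS = {"bad", "sad", "hate", "terrible"}

def clean_tweet(text):
    """Apply the listed cleaning steps in order."""
    text = text.lower()                                  # convert to lower case
    text = text.replace("’", "'")                        # unify apostrophes
    text = re.sub(r"@\w+", " ", text)                    # remove @ mentions
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # remove hyperlinks
    for short, full in CONTRACTIONS.items():             # expand contractions
        text = text.replace(short, full)
    text = text.translate(str.maketrans("", "", string.punctuation))  # remove punctuation
    return " ".join(text.split())                        # collapse whitespace

def lexicon_score(text):
    """Count positive words minus negative words in the cleaned tweet."""
    words = clean_tweet(text).split()
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    return positives - negatives

def label(text):
    """Map the score to positive, negative, or neutral."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(clean_tweet("@bob I won't go, it's BAD! http://t.co/x"))
print(label("I love this, great phone"))
```

A simple count like this ignores negation ("not good" still scores as positive), which is one reason trained classifiers are compared against it later in the slides.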
SAMPLE OUTPUT
IMPORT THE DATASET
VIEW THE TWEET LABEL AND MESSAGE
CHECKING DISTRIBUTION OF TWEETS
USING NLTK
MOST FREQUENT WORDS CHART
WORD CLOUD
DECISION TREE ACCURACY
LOGISTIC REGRESSION ACCURACY
RANDOM FOREST ACCURACY
SAMPLE CODE

Import Libraries
import numpy as np  # arrays
import pandas as pd  # data frames
import matplotlib.pyplot as plt  # charts
import seaborn as sns  # statistical visualisation
import warnings

# load the training and test sets
train = pd.read_csv('train_tweet.csv')
test = pd.read_csv('test_tweets.csv')
print(train.shape)
print(test.shape)

# preview the first rows of each set
train.head(10)
test.head(10)

# check each column for missing values
train.isnull().any()
test.isnull().any()

# checking out the negative comments from the train set
train[train['label'] == 0].head(10)

# checking the distribution of tweet lengths in the data
length_train = train['tweet'].str.len().plot.hist(color='pink', figsize=(6, 4))
length_test = test['tweet'].str.len().plot.hist(color='orange', figsize=(6, 4))

# checking out the rows with label 1 from the train set
train[train['label'] == 1].head(10)

# adding a column to represent the length of the tweet
train['len'] = train['tweet'].str.len()
test['len'] = test['tweet'].str.len()
train.head(10)
CONCLUSION

 In this project we have conducted sentiment analysis of Twitter data using
convolutional neural network algorithms and LSTM, instead of machine learning
approaches such as SVM and Naive Bayes, using the global vector
representation model, and have classified the emotion into five distinct types. The
focus of the project is to compare the accuracy of the various classification
algorithms and to understand the sentiments of people with the help of
sentiment analysis. In this project, the two algorithms are compared on the
sentiment classification of tweets. This can help an organization understand
its people and improve its business through an understanding of sentiment.
REFERENCES

 [1] Yi, S., & Liu, X. (2020). Machine learning based customer sentiment analysis for recommending shoppers, shops based on customers’ review. Complex & Intelligent Systems, 1(1). DOI: https://doi.org/10.1007/s40747-020-00155-2
 [2] Vohra, S., & Teraiya, J. (2013). A Comparative Study of Sentiment Analysis Techniques. International Journal of Information, Knowledge and Research in Computer Engineering, 2(2), 313–317.
 [3] Machine Learning & its Applications, Outsource to India. (2020). Twitter sentiment analysis using modern techniques. Retrieved on May 18, 2020, from https://www.outsource2india.com/software/articles/machine-learning-applications-how-it-works-who-uses-it.asp
 [4] Jain, A. P., & Dandannavar, P. (2016). Application of machine learning techniques to sentiment analysis. Second International Conference on Applied and Theoretical Computing and Communication Technology (ICATccT), 1(1), 628–632. DOI: https://doi.org/10.1109/ICATCCT.2016.7912076
 [5] Medhat, W., Hassan, A., & Korashy, H. (2014). Sentiment analysis algorithms and applications: A survey. Ain Shams Engineering Journal, 5(4), 1093–1113. DOI: https://doi.org/10.1016/j.asej.2014.04.011
 [6] Aydogan, E., & Akcayol, M. A. (2016). A comprehensive survey for sentiment analysis tasks using machine learning techniques. 2016 International Symposium on Innovations in Intelligent Systems and Applications (INISTA), 1(1), 1–7. DOI: https://doi.org/10.1109/INISTA.2016.7571856
 [7] Ahmad, M., Aftab, S., Muhammad, S. S., & Ahmad, S. (2017). Machine learning techniques for sentiment analysis: A review. International Journal of Multi-disciplinary Science and Engineering, 8(3), 27–35.
 [8] Mäntylä, M. V., Graziotin, D., & Kuutila, M. (2018). The evolution of sentiment analysis—A review of research topics, venues, and top cited papers. Computer Science Review, 27(1), 16–32. DOI: https://doi.org/10.1016/j.cosrev.2017.10.002
 [9] Kumar, A., & Sebastian, T. M. (2012). Sentiment Analysis: A Perspective on its Past, Present and Future. International Journal of Intelligent Systems and Applications, 4(10), 1–14. DOI: https://doi.org/10.5815/ijisa.2012.10.01
 [10] Swathi, R., & Seshadri, R. (2017). Systematic survey on evolution of machine learning for big data. International Conference on Intelligent Computing and Control Systems (ICICCS), 1(1), 204–209. DOI: https://doi.org/10.1109/ICCONS.2017.8250711
THANK YOU
