Introduction
ABSTRACT
INTRODUCTION
The first challenge is storing and accessing information in the very large data sets held across clusters. Because the data keeps growing and is stored in different locations, we need a standard computing platform that manages it as a centralized system and scales the huge volume down into pieces of a size suitable for computation.
The second challenge is retrieving data from large social media data sets. When the data grows daily, it becomes difficult to access the relevant data across large networks in order to perform a specific action.
The third challenge concerns algorithm design for handling the problems raised by the huge data volume and the dynamic characteristics of the data. The main scope of the project is to fetch and analyse tweets on demonetization accurately, to perform sentiment analysis, to find the most popular trending hashtags, and to compute the average sentiment rating of the tweets on that topic.
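The last two goals, finding the most popular hashtags and an average rating for tweets on the topic, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's code: the sample tweets are hypothetical, and the per-tweet scores are assumed to be numbers (for example in [-1, 1]) already produced by a sentiment scorer.

```python
import re
from collections import Counter

# Assumption: a hashtag is '#' followed by word characters
HASHTAG_RE = re.compile(r"#(\w+)")


def top_hashtags(tweets, n=3):
    """Return the n most frequent hashtags (case-insensitive) with their counts."""
    counts = Counter(
        tag.lower() for text in tweets for tag in HASHTAG_RE.findall(text)
    )
    return counts.most_common(n)


def average_rating(scores):
    """Mean of the per-tweet sentiment scores; 0.0 when there are no scores."""
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical sample tweets and scores, for illustration only
    tweets = [
        "Long queues at the ATM again #Demonetization #cashless",
        "Going #cashless was easier than expected #demonetization",
        "Still waiting for new notes #demonetization",
    ]
    scores = [-0.6, 0.5, -0.2]
    print(top_hashtags(tweets))  # [('demonetization', 3), ('cashless', 2)]
    print(average_rating(scores))
```

Lower-casing the tags before counting keeps `#Demonetization` and `#demonetization` from being reported as two separate trends.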
PROPOSED SYSTEM AND ADVANTAGES
Retrieval of Data: Public Twitter data is mined using the existing Twitter APIs for data extraction. Tweets are selected based on a few chosen keywords pertaining to the domain of our concern, i.e. product reviews. We chose the Twitter API because it makes data extraction easy.
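The keyword selection step can be sketched independently of the API client. This sketch assumes the tweets have already been fetched as plain strings; the keyword list and tweets are hypothetical, and whole-word, case-insensitive matching is an assumption rather than the project's stated rule.

```python
import re


def select_tweets(tweets, keywords):
    """Keep tweets that contain at least one keyword as a whole word (case-insensitive)."""
    # \b word boundaries stop 'screen' from matching inside 'Screenshot'
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return [t for t in tweets if pattern.search(t)]


# Hypothetical keywords and tweets
keywords = ["battery", "screen"]
tweets = [
    "Battery life is great",
    "Screenshot of my desk",
    "The screen cracked in a week",
]
print(select_tweets(tweets, keywords))
# ['Battery life is great', 'The screen cracked in a week']
```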
Pre-processing: In this stage, we remove identifying information such as Twitter handles, message timestamps, and embedded links and videos. Such information is largely irrelevant to sentiment and may cause our system to produce false results.
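A minimal sketch of this step, assuming each tweet arrives as a record with hypothetical `user`, `created_at` and `text` fields: only the text is kept, which drops the handle and timestamp fields, and handles and links inside the text are stripped with regular expressions. Embedded media is assumed to appear as links.

```python
import re

MENTION_RE = re.compile(r"@\w+")                # Twitter handles such as @user_1
LINK_RE = re.compile(r"https?://\S+|www\.\S+")  # embedded links (media is assumed to appear as links)


def anonymize(record):
    """Return only the tweet text, with handles and links removed.

    The 'user' and 'created_at' fields (hypothetical names) are dropped simply
    by not being copied into the result.
    """
    text = MENTION_RE.sub("", record["text"])
    text = LINK_RE.sub("", text)
    # collapse the whitespace left behind by the removals
    return " ".join(text.split())


# Hypothetical record, for illustration only
record = {
    "user": "@someone",
    "created_at": "2016-11-09 10:15",
    "text": "@bank why is the ATM empty? https://t.co/abc123",
}
print(anonymize(record))  # why is the ATM empty?
```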
Improve Tweet Correction and Accuracy: Because tweets are written for human readers, they often contain slang, misspellings and other irrelevant data. We therefore correct the misspellings and replace slang with Standard English words that roughly correspond to it. Since slang can express a wide variety of sentiment, often with greater emotional impact, this step is necessary so that slang words are counted as part of the emotion expressed.
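A small sketch of slang replacement followed by spelling correction. The slang map and reference vocabulary are hypothetical placeholders, and `difflib.get_close_matches` from the standard library stands in for whatever spell checker the project used; a real system would need far larger word lists.

```python
import difflib
import re

# Hypothetical slang map: slang -> rough Standard English equivalent
SLANG = {"gr8": "great", "luv": "love", "smh": "disappointed", "lol": "funny"}
# Hypothetical reference vocabulary used for spelling correction
VOCAB = ["money", "bank", "queue", "great", "love", "cash", "notes", "terrible"]


def normalize_word(word):
    """Replace slang first; otherwise snap a near-misspelling to the closest vocabulary word."""
    if word in SLANG:
        return SLANG[word]
    match = difflib.get_close_matches(word, VOCAB, n=1, cutoff=0.8)
    return match[0] if match else word


def normalize_tweet(text):
    """Lower-case the tweet, split it into words and normalize each word."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return " ".join(normalize_word(t) for t in tokens)


print(normalize_tweet("Luv the new notes but the qeue at the bank is terible smh"))
# love the new notes but the queue at the bank is terrible disappointed
```

The 0.8 cutoff keeps short common words such as "the" or "at" from being rewritten; lowering it corrects more typos but also produces more false corrections.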
LITERATURE SURVEY
[1] In 2019, Saad and Yang [1] aimed to provide a comprehensive tweet sentiment analysis based on ordinal regression with machine learning algorithms. Their model first pre-processes the tweets and then uses a feature extraction model to generate effective features.
[2] In 2018, Fang et al. [2] proposed multi-strategy sentiment analysis models that use semantic fuzziness to address these issues. Their results showed that the proposed model achieved high efficiency.
[3] In 2019, Afzaal et al. [3] proposed a novel approach to aspect-based sentiment classification, which identified features precisely and achieved the best classification accuracy.
[4] In 2019, Feizollah et al. [4] focused on tweets related to two halal products, halal cosmetics and halal tourism. Twitter data was extracted using the Twitter search function, and a new model was employed for data filtering.
[5] In 2018, Mukhtar et al. [5] performed sentiment analysis on Urdu blogs collected from several domains, using supervised machine learning and lexicon-based models.
[6] In 2020, Kumar et al. presented a hybrid deep learning approach named ConvNet-SVMBoVW that handles real-time data to predict fine-grained sentiment. An aggregation model was developed to measure the hybrid polarity.
[7] In 2018, Abdi et al. proposed a machine learning technique for summarizing the opinions that users express in reviews. The method merges multiple kinds of features into a single feature set to build an accurate classification model.
[8] In 2019, Zhao et al. [8] proposed a novel image-text consistency driven multi-modal sentiment evaluation model, which explored the correlation between text and image. A multi-modal adaptive sentiment analysis model was then implemented.
[9] In 2019, Park et al. [9] developed a semi-supervised sentiment-discriminative objective that addresses the problem by using the partial sentiment information in documents.
[10] In 2019, Vashishtha and Susan [10] computed the sentiment of social media posts using a new set of fuzzy rules involving multiple datasets and lexicons.
SOFTWARE AND HARDWARE REQUIREMENTS
Software Requirements
Language : Python 3.7
Environment : Anaconda
Libraries : NumPy, pandas, Matplotlib, seaborn, NLTK and other machine learning libraries
Operating System : Windows
Hardware Requirements
Hard Disk/SSD : 512GB
RAM : 8GB
Processor : Intel Core i5
DIAGRAM
ALGORITHM
VADER Sentiment
The first algorithm compares each word in a tweet against a lexicon of words labelled as having positive or negative sentiment. Many such lexicons exist; for this analysis, I downloaded a list of positive and negative sentiment words from Kaggle datasets and split each tweet into words with the NLTK word tokenizer. NLTK is one of the more popular natural language processing toolkits for Python.
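The word-list comparison described above can be sketched as follows. The tiny positive and negative word sets are hypothetical stand-ins for the downloaded Kaggle lists, and a regular-expression tokenizer replaces NLTK's word tokenizer so the sketch runs without downloading NLTK data. This is the plain lexicon lookup the paragraph describes, not VADER's full rule-based scoring.

```python
import re

# Hypothetical stand-ins for the downloaded positive/negative word lists
POSITIVE = {"good", "great", "happy", "support", "easy"}
NEGATIVE = {"bad", "poor", "angry", "worst", "problem"}


def tokenize(text):
    """Lower-case word tokenizer (a simple stand-in for nltk.word_tokenize)."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_score(text):
    """Number of positive words minus number of negative words in the tweet."""
    tokens = tokenize(text)
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return pos - neg


def label(text):
    """Map the score to a polarity label."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(label("Great move, happy to support it"))  # positive
print(label("Bad planning and a big problem"))   # negative
```

Counting matches ignores negation ("not good") and intensity ("very bad"); handling those cases is exactly what VADER's extra rules add.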
Convert to lower case
Remove @ mentions in tweets
Remove hyperlinks
Expand contractions (e.g. convert “won’t” to “will not”)
Remove punctuation
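The five cleaning steps above can be chained in the order listed. This is a sketch under assumptions: the contraction map is a hypothetical, deliberately small example (a real one would be much larger), and a generic rule handles the remaining "n't" forms.

```python
import re
import string

# Hypothetical, deliberately small contraction map
CONTRACTIONS = {"won't": "will not", "can't": "can not", "i'm": "i am", "it's": "it is"}


def expand_contractions(text):
    """Expand contractions, e.g. "won't" -> "will not"."""
    text = text.replace("\u2019", "'")  # normalise curly apostrophes to straight ones
    for short, full in CONTRACTIONS.items():
        text = re.sub(r"\b" + re.escape(short) + r"\b", full, text)
    # generic rule for the remaining "n't" forms, e.g. "don't" -> "do not"
    return re.sub(r"(\w+)n't\b", r"\1 not", text)


def clean_tweet(text):
    text = text.lower()                                # 1. convert to lower case
    text = re.sub(r"@\w+", "", text)                   # 2. remove @ mentions
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # 3. remove hyperlinks
    text = expand_contractions(text)                   # 4. expand contractions
    text = text.translate(str.maketrans("", "", string.punctuation))  # 5. remove punctuation
    return " ".join(text.split())                      # tidy leftover whitespace


print(clean_tweet("@RBI Why won’t the ATMs work?! https://t.co/xyz"))
# why will not the atms work
```

Contractions are expanded before punctuation is removed; in the reverse order the apostrophe would already be gone and "won't" would become the unrecognisable "wont".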
SAMPLE OUTPUT
IMPORTING THE DATASET
VIEWING THE TWEET LABELS AND MESSAGES
CHECKING THE DISTRIBUTION OF TWEETS
USING NLTK
SPLIT THE FREQUENT WORD CHART
WORD CLOUD
DECISION TREE ACCURACY
LOGISTIC REGRESSION ACCURACY
RANDOM FOREST ACCURACY
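The last three outputs compare a decision tree, logistic regression and a random forest. A comparison of that kind can be sketched with scikit-learn, which is an assumption here because the slides only name "machine learning libraries". The tiny labelled corpus is hypothetical, so the accuracies it prints only demonstrate the workflow and say nothing about the project's results.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical labelled tweets (1 = negative, 0 = not negative), illustration only
texts = [
    "long queues and no cash anywhere", "atm empty again terrible",
    "banks ran out of notes", "worst week ever no money",
    "digital payments made it easy", "good step against black money",
    "happy to pay cashless", "great move for the economy",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

# Bag-of-words features, fitted on the training split only
vectorizer = CountVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train_vec, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test_vec))
    print(f"{name} accuracy: {accuracies[name]:.2f}")
```

Fitting the vectorizer on the training split only keeps test-set vocabulary from leaking into training, so the three accuracies are measured on the same held-out tweets.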
SAMPLE CODE
Import Libraries
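import warnings

import numpy as np  # arrays
import pandas as pd  # data frames
import matplotlib.pyplot as plt  # charts
import seaborn as sns  # statistical visualisation

warnings.filterwarnings('ignore')  # hide library warnings in the output

# load the training and test tweets
train = pd.read_csv('train_tweet.csv')
test = pd.read_csv('test_tweets.csv')
print(train.shape)
print(test.shape)

# preview the first rows of each set
print(train.head(10))
print(test.head(10))

# check each column for missing values
print(train.isnull().any())
print(test.isnull().any())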
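Because `train_tweet.csv` is not included here, the next check from the sample output, the distribution of tweets, is sketched on a small hypothetical DataFrame. The column names `label` and `tweet` are assumptions about the CSV's layout; adjust them to the real file.

```python
import pandas as pd

# Hypothetical stand-in for train_tweet.csv; the column names are assumed
train = pd.DataFrame({
    "label": [0, 0, 1, 0, 1, 0],
    "tweet": [
        "cashless was easy", "good step", "atm empty again",
        "happy with digital payment", "no notes anywhere", "queue moved fast",
    ],
})

# Number of tweets in each class (the class balance)
label_counts = train["label"].value_counts().sort_index()
print(label_counts)

# Share of each class, useful for spotting imbalance before training
print(train["label"].value_counts(normalize=True).sort_index())

# Average tweet length in characters per class
train["length"] = train["tweet"].str.len()
print(train.groupby("label")["length"].mean())
```

A strongly imbalanced label distribution is worth spotting here, because plain accuracy (as reported for the three classifiers) can look high while the minority class is mostly misclassified.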
REFERENCES
[1] Yi, S., & Liu, X. (2020). Machine learning based customer sentiment analysis for recommending shoppers, shops based on customers’ review. Complex & Intelligent Systems, 1(1). DOI: https://doi.org/10.1007/s40747-020-00155-2
[2] Vohra, S., & Teraiya, J. (2013). A Comparative Study of Sentiment Analysis Techniques. International Journal of Information, Knowledge and Research in Computer Engineering, 2(2), 313–317.
[3] Machine Learning & its Applications. Outsource to India. (2020). Twitter sentiment analysis using modern techniques. Retrieved May 18, 2020, from https://www.outsource2india.com/software/articles/machine-learning-applications-how-it-works-who-uses-it.asp
[4] Jain, A. P., & Dandannavar, P. (2016). Application of machine learning techniques to sentiment analysis. Second International Conference on Applied and Theoretical Computing and Communication Technology (ICATccT), 1(1), 628–632. DOI: https://doi.org/10.1109/ICATCCT.2016.7912076
[5] Medhat, W., Hassan, A., & Korashy, H. (2014). Sentiment analysis algorithms and applications: A survey. Ain Shams Engineering Journal, 5(4), 1093–1113. DOI: https://doi.org/10.1016/j.asej.2014.04.011
[6] Aydogan, E., & Akcayol, M. A. (2016). A comprehensive survey for sentiment analysis tasks using machine learning techniques. 2016 International Symposium on Innovations in Intelligent Systems and Applications (INISTA), 1(1), 1–7. DOI: https://doi.org/10.1109/INISTA.2016.7571856
[7] Ahmad, M., Aftab, S., Muhammad, S. S., & Ahmad, S. (2017). Machine learning techniques for sentiment analysis: A review. International Journal of Multi-disciplinary Science and Engineering, 8(3), 27–35.
[8]Mäntylä, M. V., Graziotin, D., & Kuutila, M. (2018). The evolution of sentiment analysis—A
review of research topics, venues, and top cited papers. Computer Science Review, 27(1), 16–
32. DOI: https://doi.org/10.1016/j.cosrev.2017.10.002
[9] Kumar, A., & Sebastian, T. M. (2012). Sentiment Analysis: A Perspective on its Past, Present and Future. International Journal of Intelligent Systems and Applications, 4(10), 1–14. DOI: https://doi.org/10.5815/ijisa.2012.10.01
[10] Swathi, R., & Seshadri, R. (2017). Systematic survey on evolution of machine learning for big data. International Conference on Intelligent Computing and Control Systems (ICICCS), 1(1), 204–209. DOI: https://doi.org/10.1109/ICCONS.2017.8250711
THANK YOU