Report Final 5th Sem

EMOTION DETECTION WITH TEXT BY
USING PYTHON
A MINI PROJECT REPORT
Submitted by
MANTHU KAVYA(111521202032)
MARAM LASYA(111521202033)
PABBISETTY BABY SURAKSHA(111521202039)
in partial fulfillment for the award of the

degree of
BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND BUSINESS SYSTEMS
R.M.D. ENGINEERING COLLEGE

(An Autonomous Institution)
KAVARAIPETTAI – 601 206
NOVEMBER/DECEMBER 2023
BONAFIDE CERTIFICATE
Certified that this project titled “AUTISM VOX”, submitted by MANTHU
KAVYA (111521202032), MARAM LASYA (111521202033) and PABBISETTY VENKATA BABY
SURAKSHA (111521202039), is the bonafide work of the above students, who carried out
the project work under my supervision.
SIGNATURE
Dr. G. Amudha, M.E., Ph.D.,
HEAD OF THE DEPARTMENT
Professor
Department of Computer Science and Business Systems,
R.M.D. Engineering College,
R.S.M. Nagar,
Kavaraipettai - 601206.

SIGNATURE
Dr. G. Amudha, M.E., Ph.D.,
SUPERVISOR
Assistant Professor
Department of Computer Science and Business Systems,
R.M.D. Engineering College,
R.S.M. Nagar,
Kavaraipettai - 601206.
CERTIFICATE OF EVALUATION
College Name: R.M.D. ENGINEERING COLLEGE
Department: Computer Science and Business Systems
Semester: 05
Title of Project: EMOTION DETECTION WITH TEXT USING PYTHON
Name of the Students with Register Numbers: MANTHU KAVYA (111521202032), MARAM LASYA (111521202033), PABBISETTY VENKATA BABY SURAKSHA (111521202039)
Name of the Supervisor with Designation: DEPARTMENT OF COMPUTER SCIENCE AND BUSINESS SYSTEMS
The report of the project work submitted by the above students in partial fulfillment for
the award of the Bachelor of Technology degree in the Department of Computer Science and Business
Systems of R.M.D. Engineering College was evaluated and confirmed to be the report of the
work done by the above students.
Submitted the project during the Viva-Voce held on
INTERNAL EXAMINER
ACKNOWLEDGEMENT
The success and outcome of this project required a lot of guidance, support and kind
cooperation from many people. We wish to express our sincere thanks to all
those who were involved in the completion of this project.
It is our immense pleasure to express our deep sense of gratitude to our respected chairman Thiru
R. S. Munirathinam, our vice chairman Thiru R. M. Kishore, and our director Thiru R. Jothi Naidu
for the facilities and support given by them in the college.
We are extremely thankful to our principal, Dr. N. Anbuchezhian, M.S, M.B.A, M.E, Ph.D., for
giving us an opportunity to serve the purpose of education.
We are indebted to Dr. G. Amudha, M.E, Ph.D., Professor, Head of the Department in
Computer Science and Business Systems for providing the necessary guidance and constant
encouragement for successful completion of this project on time.
We extend our sincere thanks and gratitude to our project guide, Ms. CH. SRILAKSHMI,
B.E., M.E., (Ph.D.), Assistant Professor in the Department of Computer Science and Business
Systems, who guided us all along till the completion of our project work.
Last but not least, we wish to thank all the teaching and non-teaching staff of the CSBS department
for their help in the completion of the project.
CHAPTER TITLE
1. Abstract
2. Introduction
3. Process of sentiment analysis and emotion detection
4. Datasets for sentiment analysis and emotion detection
5. Pre-processing of text
6. Feature extraction
7. Sentiment analysis and emotion analysis techniques
8. Confusion Matrix
9. Model evaluation and interpretation
10. Result
11. Conclusion
1. Abstract
Social media is a popular way for people to share their feelings with
the world, using text, images, audio, and videos. However, handling all this text
on social networks can be tough. The internet constantly generates loads of
unstructured data on social media every second. To understand how people
think, we use something called sentiment analysis. It helps us figure out if
someone has a positive, negative, or neutral opinion about a product, service,
person, or place.
Sometimes, sentiment analysis isn't enough, and we need to know exactly how
someone is feeling. That's where emotion detection comes in. It helps us
understand a person's emotional or mental state more precisely. In this
report, we explore different levels of sentiment analysis, various models for
understanding emotions, and the processes used to analyze sentiments and
detect emotions from text. We also look at the challenges faced in studying
how people feel and think.
2. Introduction
In the world of language understanding and communication, there are two
essential components: understanding what people say and creating our own
messages. The first aspect, which is understanding, can be quite challenging
because human language is often ambiguous and can have various meanings
depending on the context and tone.
With the widespread use of the internet, social media has become a significant
platform for people to share their thoughts and feelings using text, pictures,
audio, and videos. Managing and making sense of this vast amount of textual
content on social media can be a daunting task, especially given the constant
influx of new content.
To help us make sense of this digital deluge, we rely on a technique known as
"sentiment analysis." This method assists us in determining whether the text
expresses a positive, negative, or neutral sentiment towards a particular product,
service, person, or place. However, sometimes, a simple positive or negative
classification isn't sufficient. We need to understand the specific emotions
people are experiencing. This is where "emotion detection" comes into play,
enabling us to pinpoint emotions like happiness, anger, or sadness.
Although the terms "sentiment analysis" and "emotion detection" are sometimes
used interchangeably, they have distinct purposes. Sentiment analysis provides
an overall sentiment assessment, whereas emotion detection delves deeper into
the identification of specific emotions.
These techniques have wide-ranging applications across various fields. In the
business world, companies use social media for marketing and customer
feedback, and understanding customer sentiments is crucial for improving their
products and services. In the healthcare sector, individuals share their
health-related experiences on social media platforms. Emotion analysis can be a
valuable tool for identifying when someone may require mental health support.
3. Process of sentiment analysis and emotion detection
Sentiment analysis and emotion detection proceed through several stages:
collecting a dataset, pre-processing the text, feature extraction, and model development.
4. Datasets for sentiment analysis and emotion detection
Sentiment analysis and emotion detection
datasets are used to train and evaluate machine learning models that can identify
the sentiment or emotion expressed in a piece of text. Table 2 lists a few popular
datasets, along with their domain, size, and type. One of the most popular
datasets is SemEval, which includes data from Twitter, reviews, and news
articles. Another popular dataset is the Stanford Sentiment Treebank (SST), which
contains movie reviews. The International Survey on Emotion Antecedents
and Reactions (ISEAR) dataset consists of survey responses in which
participants described situations where they experienced one of seven emotions:
joy, fear, anger, sadness, disgust, shame, or guilt.
Many datasets also include data from social media sites such as Twitter,
YouTube, and Facebook. This data is often unstructured and needs to be
preprocessed before it can be used by machine learning models.
5. Pre-processing of text
On social media platforms, people naturally express their
feelings and emotions. However, the data extracted from posts, reviews,
comments, and critiques on these platforms are often unstructured, posing a
challenge for machines when it comes to sentiment and emotion analysis.
Therefore, the pre-processing stage plays a pivotal role in data refinement, as
data quality significantly influences subsequent analysis techniques.
Organizing a dataset involves several pre-processing steps, including
tokenization, the removal of stop words, POS tagging, and more. It's essential
to strike a balance during pre-processing since some techniques can
inadvertently remove vital information needed for sentiment and emotion
analysis.
Tokenization is the process of dividing a document, paragraph, or
sentence into smaller units called tokens. For instance, the sentence
"this place is so beautiful" is tokenized into "this," "place," "is," "so," and "beautiful."
Standardization is another crucial step, where text is normalized to ensure
consistency. This may involve converting text into standard forms and
correcting spelling errors.
Inessential words, such as articles and certain prepositions that don't contribute
to emotion recognition or sentiment analysis, should be omitted. For example,
stop words like "is," "at," "an," and "the" are removed to streamline
computations. POS tagging is employed to identify the different parts of speech
in a sentence. This aids in extracting relevant aspects from a sentence, as
sentiments and emotions are often conveyed through adjectives.
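The tokenization, normalization, and stop-word steps described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names and the very short stop-word list are assumptions made for the example, not part of the project code.

```python
import re

# Tiny illustrative stop-word list (an assumption); real lists are much longer.
STOP_WORDS = {"a", "an", "the", "is", "at", "of", "to", "in", "and"}

def tokenize(text):
    """Lowercase the text (a simple normalization) and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]

tokens = tokenize("This place is so beautiful")
print(tokens)                     # ['this', 'place', 'is', 'so', 'beautiful']
print(remove_stop_words(tokens))  # ['this', 'place', 'so', 'beautiful']
```

Words such as "not" and "so" are deliberately left out of the stop-word list here, since removing negations or intensifiers could discard information that matters for emotion analysis.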
Stemming and lemmatization are critical pre-processing steps. In stemming,
words are reduced to a common stem by stripping suffixes, and the stem need not
be a dictionary word. For example, the Porter stemmer reduces both "argued" and
"argue" to "argu." This simplifies sentence processing and computation.
Lemmatization analyzes the morphology of words to remove inflectional endings
and return each word to its dictionary base form, or lemma. For instance,
"caught" becomes "catch."
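The difference between the two steps can be illustrated with a toy sketch. The suffix rules and the small lemma table below are assumptions made for this example; they are not the Porter algorithm or a real lexicon, and a stem produced this way need not be a dictionary word.

```python
# Suffixes tried in order, longest first (a toy rule set, not the Porter algorithm).
SUFFIXES = ("ing", "ed", "es", "s", "e")

# Hand-written lemma lookup (hypothetical); real lemmatizers use a full lexicon.
LEMMAS = {"caught": "catch", "went": "go", "better": "good"}

def stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word

def lemmatize(word):
    """Return the dictionary form if it is known, otherwise the word unchanged."""
    return LEMMAS.get(word, word)

print([stem(w) for w in ["argue", "argued", "argues", "arguing"]])
# ['argu', 'argu', 'argu', 'argu']
print(lemmatize("caught"))  # catch
```

Stemming only needs string rules, so it is fast but crude. Lemmatization needs vocabulary knowledge, which is why irregular forms such as "caught" can only be handled by a lookup.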
6. Feature extraction
Feature extraction is a critical step in Natural Language
Processing (NLP) tasks, particularly when dealing with textual data. This
process involves the conversion of text-based information into numerical
representations that can be understood and processed by machine learning
algorithms. In this report, we will explore three common methods for feature
extraction using machine learning in Python: Count Vectorization (Bag of
Words), TF-IDF Vectorization, and Hashing Vectorization.
*Count Vectorization*
Count Vectorization, often referred to as the "Bag of Words" approach, is a
straightforward method for feature extraction. It creates a matrix where each
row corresponds to a sentence or document, and each column represents a word
from the dataset's vocabulary. The values in the cells signify the frequency of
each word within the corresponding sentence.
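A minimal sketch of this matrix construction is given below, assuming whitespace tokenization and lowercase text. The helper names are illustrative; in practice scikit-learn's `CountVectorizer` provides the same idea as a library class.

```python
from collections import Counter

def build_vocabulary(documents):
    """Map each distinct word to a column index, in sorted order."""
    words = sorted({w for doc in documents for w in doc.lower().split()})
    return {w: i for i, w in enumerate(words)}

def count_vectorize(documents, vocabulary):
    """Return one row of word counts per document."""
    matrix = []
    for doc in documents:
        counts = Counter(doc.lower().split())
        row = [0] * len(vocabulary)
        for word, n in counts.items():
            if word in vocabulary:  # words outside the vocabulary are ignored
                row[vocabulary[word]] = n
        matrix.append(row)
    return matrix

docs = ["I am happy happy today", "I am sad today"]
vocab = build_vocabulary(docs)
print(vocab)                         # {'am': 0, 'happy': 1, 'i': 2, 'sad': 3, 'today': 4}
print(count_vectorize(docs, vocab))  # [[1, 2, 1, 0, 1], [1, 0, 1, 1, 1]]
```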
*TF-IDF Vectorization*
TF-IDF (Term Frequency-Inverse Document Frequency) Vectorization is
another technique used for feature extraction. It weights each word by how
often it occurs in a document (term frequency), scaled down by how many
documents in the collection contain it (inverse document frequency). Words that
appear frequently in a document but infrequently in others are considered more significant.
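One common textbook form of this weighting is tf(w, d) * log(N / df(w)), where N is the number of documents and df(w) is the number of documents containing w. The sketch below assumes that form and whitespace tokenization. Library implementations such as scikit-learn's `TfidfVectorizer` apply smoothing and normalization by default, so their numbers will differ.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return (vocabulary, matrix) of TF-IDF weights for whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({w for tokens in tokenized for w in tokens})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each word.
    df = {w: sum(1 for tokens in tokenized if w in tokens) for w in vocabulary}
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        row = []
        for w in vocabulary:
            tf = counts[w] / len(tokens) if tokens else 0.0  # relative term frequency
            idf = math.log(n_docs / df[w])  # 0 when the word occurs in every document
            row.append(tf * idf)
        matrix.append(row)
    return vocabulary, matrix

vocabulary, weights = tf_idf(["happy happy day", "sad day"])
print(vocabulary)  # ['day', 'happy', 'sad']
for row in weights:
    print([round(x, 3) for x in row])
```

In this example "day" occurs in both documents, so its weight is zero, while "happy" and "sad" receive positive weights because each is specific to one document.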
*Hashing Vectorization*
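Hashing Vectorization is a method for transforming text data into a fixed
number of features. It uses a hash function to map words to specific positions in
a vector, so no vocabulary needs to be stored. The trade-off is that different
words can collide on the same position, and the original words cannot be
recovered from the vector.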
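The hashing idea can be sketched as follows. The choice of `zlib.crc32` as the hash function and the default of 8 features are assumptions made for illustration; scikit-learn's `HashingVectorizer` is the library counterpart and uses a different hash function and a much larger default size.

```python
import zlib

def hashing_vectorize(document, n_features=8):
    """Map a document to a fixed-length count vector using a hash of each word."""
    vector = [0] * n_features
    for word in document.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash() for str.
        index = zlib.crc32(word.encode("utf-8")) % n_features
        vector[index] += 1
    return vector

short = hashing_vectorize("I am happy")
longer = hashing_vectorize("I am very very happy about this wonderful place")
print(len(short), len(longer))  # 8 8
```

Both documents map to vectors of the same length regardless of how many distinct words they contain, which is what makes the method suitable for large or streaming datasets.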
Feature extraction is a fundamental process in preparing text data for machine
learning tasks. In this report, we explored three common techniques: Count
Vectorization, TF-IDF Vectorization, and Hashing Vectorization. The choice of
method depends on the specific NLP task and dataset characteristics. Each
technique has its advantages and limitations, making it essential to consider the
nature of the data and the objectives of the analysis when selecting the most
appropriate feature extraction approach.
7. Sentiment analysis and emotion analysis techniques

The application of machine learning techniques in
sentiment analysis has gained significant attention. This approach leverages the
power of algorithms and statistical models to automatically classify and
understand the sentiments expressed in textual data. Here, we delve deeper into
the essential aspects of this machine learning-based approach.
*Data Splitting: Training and Testing Datasets*
One fundamental step in the machine learning-based approach is data splitting,
which involves dividing the dataset into two main segments: the training dataset
and the testing dataset. The training dataset serves as the foundation for
educating the machine learning model. It is here that the model learns to
recognize patterns and characteristics associated with different sentiment
categories. The testing dataset, on the other hand, is reserved for evaluating the
model's performance and assessing how effectively it can generalize from the
training data.
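A minimal sketch of such a split is shown below. The helper name `split_dataset`, the 80/20 ratio, and the fixed seed are assumptions made for the example; scikit-learn offers `train_test_split` for the same purpose.

```python
import random

def split_dataset(samples, labels, test_ratio=0.2, seed=42):
    """Shuffle the data reproducibly and split it into training and testing parts."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # fixed seed so the split is repeatable
    n_test = int(len(indices) * test_ratio)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [samples[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    x_test = [samples[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, x_test, y_train, y_test

texts = [f"text {i}" for i in range(10)]
emotions = ["happy" if i % 2 == 0 else "sad" for i in range(10)]
x_train, x_test, y_train, y_test = split_dataset(texts, emotions)
print(len(x_train), len(x_test))  # 8 2
```

Shuffling before splitting avoids a test set that contains only the last few records of the file, which matters when a dataset is sorted by label or by date.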
*Supervised Classification Algorithms*

Machine learning algorithms used for sentiment analysis are often categorized
under supervised classification. In this setup, the algorithm is provided with
labeled examples of text, where each example is associated with a sentiment
category (e.g., positive, negative, neutral). The algorithm learns to make
predictions by identifying patterns and relationships between the textual content
and the sentiment labels in the training data. Some of the commonly employed
supervised classification algorithms for sentiment analysis include:
1. *Naïve Bayes*: This probabilistic algorithm is based on Bayes' theorem and
is particularly effective for text classification tasks. Naïve Bayes assumes that
features (words) are conditionally independent given the class, making it computationally
efficient.
2. *Support Vector Machine (SVM)*: SVM is a powerful algorithm for
classification tasks. It works by finding an optimal hyperplane that separates
different sentiment categories while maximizing the margin between them.
3. *Decision Trees*: Decision trees are tree-like structures that recursively split
data into smaller subsets based on the most discriminative features. They are
interpretable and can be used for sentiment analysis tasks.
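As an illustration of supervised classification, the sketch below implements a small multinomial Naïve Bayes classifier with add-one (Laplace) smoothing, using only the standard library. The class name, the six training sentences, and the three emotion labels are made up for this example and are not the project's dataset or model.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log P(class) + sum of log P(word | class), words assumed independent
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                if token not in self.vocabulary:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

train_texts = [
    "i am so happy today",
    "what a wonderful happy day",
    "i feel sad and lonely",
    "this is a sad day",
    "i am angry at you",
    "this makes me so angry",
]
train_labels = ["happy", "happy", "sad", "sad", "angry", "angry"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(model.predict("such a happy wonderful morning"))  # happy
print(model.predict("i feel lonely"))                   # sad
```

Working with log-probabilities avoids numeric underflow on long texts, and the smoothing term keeps a single unseen word-class pair from driving the whole score to zero.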
8. Confusion Matrix
A confusion matrix is a fundamental tool in the evaluation of
classification models, including those used for emotion detection with text in
Python. It provides a way to assess the performance of your model by
comparing the predicted labels to the actual labels. In the context of emotion
detection with text, this matrix can help you understand how well your model is
classifying emotions (e.g., happy, sad, angry, etc.) based on the text data.
A confusion matrix is typically a square matrix with dimensions equal to the
number of classes (emotions in this case) you are trying to predict. When a single
emotion is treated as the positive class and all other emotions as the negative
class, it reduces to the following 2 x 2 layout:
                     Actual Positive    Actual Negative
Predicted Positive   True Positive      False Positive
Predicted Negative   False Negative     True Negative
Here's what each of these terms means in the context of emotion detection:
True Positive (TP): The model correctly predicted the positive class (e.g.,
correctly classified a text as "happy").
False Positive (FP): The model incorrectly predicted the positive class when it
should have been negative (e.g., incorrectly classifying a text as "happy" when
it's not).
False Negative (FN): The model incorrectly predicted the negative class when it
should have been positive (e.g., failing to classify a text as "happy" when it is).
True Negative (TN): The model correctly predicted the negative class (e.g.,
correctly classifying a text as "not happy").
Once you have your confusion matrix, you can calculate various evaluation
metrics to assess the performance of your emotion detection model, including:
Accuracy: (TP + TN) / (TP + TN + FP + FN) - It measures the overall
correctness of predictions.
Precision: TP / (TP + FP) - It measures the proportion of positive predictions
that were actually correct. In the context of emotion detection, it is the
share of texts predicted as a particular emotion that truly express that emotion.
Recall (Sensitivity or True Positive Rate): TP / (TP + FN) - It measures the
proportion of actual positive samples that were correctly classified. In emotion
detection, it indicates how well the model identifies texts of a particular
emotion.
F1 Score: 2 * (Precision * Recall) / (Precision + Recall) - It is the harmonic
mean of precision and recall, combining them into a single metric that balances
the trade-off between the two.
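The counts and metrics above can be computed directly from lists of actual and predicted labels. The sketch below treats one emotion ("happy") as the positive class and every other emotion as negative. The function names and the six example labels are assumptions made for illustration.

```python
from collections import Counter

def confusion_counts(y_true, y_pred, positive):
    """Count TP, FP, FN, TN treating one emotion as the positive class."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["TP" if actual == positive else "FP"] += 1
        else:
            counts["FN" if actual == positive else "TN"] += 1
    return counts["TP"], counts["FP"], counts["FN"], counts["TN"]

def metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1, returning 0.0 on empty denominators."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

y_true = ["happy", "happy", "sad", "angry", "happy", "sad"]
y_pred = ["happy", "sad", "sad", "happy", "happy", "angry"]
tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive="happy")
print(tp, fp, fn, tn)  # 2 1 1 2
scores = metrics(tp, fp, fn, tn)
print({name: round(value, 3) for name, value in scores.items()})
```

Repeating the calculation with each emotion in turn as the positive class gives per-emotion precision and recall, which shows which emotions the model tends to confuse.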
9. Model evaluation and interpretation

Model interpretation in Python involves techniques and tools to
understand and explain the predictions of machine learning models. It helps
uncover insights, feature importance, and decision-making processes within
complex models. Common methods include feature importance analysis,
partial dependence plots, SHAP values, LIME, and various model-agnostic
approaches. These techniques assist in making machine learning models more
transparent and accountable, providing valuable insights into their behavior
and aiding in trust and decision-making.
10. Result
Emotion detection with text in Python is a valuable and versatile
application of natural language processing and machine learning. It allows for
the automated categorization of emotions in textual content, enabling
applications in sentiment analysis, customer feedback analysis, and emotionally
intelligent chatbots. Accurate emotion detection enhances user experiences,
provides valuable business insights, and aids psychological research.
Leveraging Python's libraries and tools, developers and data scientists can
create powerful systems that better understand and respond to human emotions
in written communication.
11. Conclusion
In conclusion, implementing emotion detection with text using
Python offers a powerful means to understand and categorize human emotions
in written content. By leveraging natural language processing techniques and
machine learning models, we can build systems that automatically recognize
and classify emotions, enabling applications such as sentiment analysis,
customer feedback analysis, and chatbots with emotional intelligence. Accurate
emotion detection enhances user experiences, business insights, and
psychological research, making it a valuable and versatile tool in the realm of
text analysis and human-computer interaction.
