Minor Project Report Soumya2
Submitted By
Soumya Ranjan Jena
(2224100031)
CERTIFICATE
This is to certify that the project report entitled “WHATSAPP CHAT ANALYZER”
being submitted by Mr. Soumya Ranjan Jena bearing the registration no: 2224100031 in
partial fulfilment for the award of the Degree of Master in Computer Application to the Odisha
University of Technology and Research is a record of bonafide work carried out by him under
my guidance and supervision.
The results embodied in this project report have not been submitted to any other
University or Institute for the award of any Degree or Diploma.
DECLARATION
ACKNOWLEDGEMENT
I would like to express my sincere gratitude to my advisor, Mrs. Swarna Lata Pati, School of
Computer Science, whose knowledge and guidance have motivated me to achieve my goals. She has
consistently been a source of motivation, encouragement, and inspiration. The time I have spent
working under her supervision has truly been a pleasure.
I take it as a great privilege to express my heartfelt gratitude to Dr Jibitesh Mishra, Head of
the School of Computer Science, for his valuable support, and to all senior faculty members of the
School of Computer Science for their help during my course. Thanks also to the programmers and
non-teaching staff of the School of Computer Science.
Finally, special thanks to my parents for their support and encouragement throughout my life
and this course. Thanks to all my friends and well-wishers for their constant support.
WhatsApp is one of the most widely used and efficient means of communication in recent
times. WhatsApp chats consist of many kinds of conversations held among groups of people,
and these conversations cover a wide range of topics. This information can provide a large
amount of data for technologies such as machine learning. The most important thing for a
machine learning model is the right learning experience, which is in turn affected by the
data that we provide to the model. This tool aims to provide an in-depth analysis of the chat
data exported from WhatsApp. Irrespective of the topic of the conversation, the developed
code can be applied to obtain a better understanding of the data. The advantage of this tool
is that it is implemented using simple Python modules such as pandas, matplotlib and seaborn,
together with sentiment analysis, which are used to create data frames and plot different
graphs. The results are then displayed in the Flutter application. Because the approach is
efficient and consumes few resources, it can easily be applied to larger datasets.
CONTENTS
1. INTRODUCTION
1.1. Introduction
1.2. Problem Statement
1.3. Existing System
1.4. Proposed System
1.5. Objectives
2. LITERATURE SURVEY
2.1. Feasibility Study
2.1.1. Technical Feasibility
2.1.2. Economical Feasibility
2.1.3. Operational Feasibility
3. REQUIREMENT ANALYSIS
3.1. Requirements Analysis
3.2. Platform Specification
3.3. Functional Requirements
3.4. Non-Functional Requirements
3.5. System Specification
3.5.1. Software Requirements
3.5.2. Hardware Requirements
4. DESIGN
4.1. Software Requirement Specification
4.1.1. Glossary
4.1.2. Use Case Model
4.2. Conceptual Level Class Diagram
4.3. Conceptual Level Activity Diagram
5. SYSTEM MODELING
5.1. Conceptual Level Sequence Diagram
5.2. Conceptual Level Collaboration Diagram
5.3. Conceptual Level State Diagram
5.4. Conceptual Level Component Diagram
5.5. Conceptual Level Deployment Diagram
5.6. Methodology and Implementation Phase
5.6.1. Methodology
5.6.1.1. Description
Incremental model
5.6.1.2. Advantages and Disadvantages
5.6.1.3. Reason for Use
5.6.2. Implementation Phase
5.6.2.1. Language Used
Characteristics
5.7. Testing
5.7.1. Testing Objectives
5.7.2. Testing Methods & Strategies used along with Test Data
1.1. Introduction
This tool is based on data analysis and processing. The first step in implementing a machine
learning algorithm is to identify the right learning experience from which the model
starts improving. Data pre-processing plays a major role in machine learning. To make a
model more efficient we need a lot of data, so we turned our focus to one of the
large-scale data producers, WhatsApp, which is owned by Facebook. WhatsApp claims that
nearly 55 billion messages are sent each day. The average user spends 195 minutes per week
on WhatsApp and is a member of many groups. With this treasure house of data right under
our noses, it is imperative that we embark on a mission to gain insights from the messages
that our phones bear witness to.
This project aims to provide a better understanding of various types of chats.
Such analysis serves as better input to machine learning models that explore
chat data. These models require proper learning instances to achieve better
accuracy. Our project aims to provide an in-depth exploratory data analysis of
various types of WhatsApp chats.
Sentiment Analysis: Determine the overall sentiment of the conversations, whether they
are positive, negative, or neutral. This can help in understanding the emotional tone of
the discussions.
Topic Modeling: Identify and categorize the main topics or themes discussed in the
conversations. This can be achieved through techniques such as topic modeling
algorithms (e.g., Latent Dirichlet Allocation).
User Behavior Analysis: Analyze the behavior of individual users within the chat,
including the frequency of messages, response times, and participation levels. This can
provide insights into user engagement.
Keyword Extraction: Extract relevant keywords or phrases that are frequently used in
the conversations. This can help in identifying key topics or trends.
Named Entity Recognition (NER): Recognize and classify named entities such as
people, locations, organizations, and dates mentioned in the conversations. This can
assist in understanding the context of the discussions.
Anomaly Detection: Identify unusual patterns or outliers in the chat data. This can be
useful for detecting potential issues or abnormal behavior within the group.
Time Series Analysis: Explore how conversation patterns change over time, identifying
peak activity periods, and understanding the dynamics of the group or individual
interactions.
User Profiling: Create profiles for individual users based on their language use, topics
of interest, and overall behavior in the chat. This can be valuable for personalized
insights.
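Three of the analyses listed above (user behavior, time series and keyword extraction) reduce to simple counting once the chat has been parsed. The following is a minimal sketch using only the Python standard library. The sample messages, the tuple layout `(timestamp, user, text)`, the stop-word set and the function names are all assumptions made for illustration and are not taken from the project's code.

```python
from collections import Counter
from datetime import datetime

# Hypothetical parsed messages: (timestamp, user, text).
# The layout and the sample data are assumptions for illustration only.
messages = [
    (datetime(2023, 5, 1, 9, 15), "Asha", "Project meeting at ten today"),
    (datetime(2023, 5, 1, 9, 20), "Ravi", "Okay, see you at the meeting"),
    (datetime(2023, 5, 1, 21, 5), "Asha", "Meeting notes are shared"),
    (datetime(2023, 5, 2, 9, 40), "Asha", "Project deadline is Friday"),
]

# A tiny illustrative stop-word set; a real analysis would use a fuller list.
STOP_WORDS = {"at", "the", "is", "are", "you", "see", "okay", "today"}


def messages_per_user(msgs):
    """User behavior: how many messages each participant sent."""
    return Counter(user for _, user, _ in msgs)


def messages_per_hour(msgs):
    """Time series: how many messages were sent in each hour of the day."""
    return Counter(ts.hour for ts, _, _ in msgs)


def top_keywords(msgs, n=3):
    """Keyword extraction: the n most frequent words that are not stop words."""
    words = Counter()
    for _, _, text in msgs:
        for raw in text.lower().split():
            word = raw.strip(".,!?")
            if word and word not in STOP_WORDS:
                words[word] += 1
    return words.most_common(n)


print(messages_per_user(messages))
print(messages_per_hour(messages))
print(top_keywords(messages, n=2))
```

In the project itself the same counts would more naturally be computed on a pandas data frame; the standard library is used here only to keep the sketch self-contained.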
The technical feasibility study reports whether the required resources and technologies
for project development are available. It is a measure of the specific technical
solution and of the availability of technical resources and expertise.
In our project we will be using Jupyter Notebook (a web-based application) and VS
Code (a text editor), both of which are open-source software, along with various
Python libraries.
A software requirement specification (SRS) is a technical specification of requirements for the software
product. The SRS presents an overview of the product and its features, and summarises the processing
environments for development, operation and maintenance of the product.
Requirement Specification:
Description
The class diagram has the following two classes with their respective attributes and methods:
DataFrame
o Attributes : user, message, date, time, year, month, day, dayname,
dayofweek, weeknum, hour, minute, meridian
o Methods : separateDateTime
Generate report
o Attributes : selectedUser, message, dataFrame, timeFormat
o Methods : fetch_stats, chat_form, most_talkative, hourly_timeline,
daily_timeline, weekly_timeline, most_busy_day, most_busy_month,
create_wordcloud
The DataFrame class creates the Generate report class, so the DataFrame class includes
the Generate report class.
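The DataFrame class above lists the calendar fields derived from each message timestamp by `separateDateTime`. The following is a minimal sketch of how one timestamp could be split into those fields, using only the standard library. The function name `separate_date_time`, the dictionary return type and the exact value of each field (for example, month as a number and Monday as day 0) are assumptions, since the report only names the attributes.

```python
from datetime import datetime

# Explicit names avoid any dependence on the system locale.
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")


def separate_date_time(ts):
    """Split one timestamp into the date and time attributes of the class diagram."""
    return {
        "date": ts.date().isoformat(),
        "time": ts.strftime("%H:%M"),
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "dayname": DAY_NAMES[ts.weekday()],
        "dayofweek": ts.weekday(),        # assumption: Monday = 0
        "weeknum": ts.isocalendar()[1],   # assumption: ISO week number
        "hour": ts.hour,
        "minute": ts.minute,
        "meridian": "AM" if ts.hour < 12 else "PM",
    }


fields = separate_date_time(datetime(2023, 5, 1, 21, 5))
print(fields["dayname"], fields["weeknum"], fields["meridian"])
```

With pandas, the same fields could be derived for a whole column at once; the per-timestamp form is shown here because it makes each attribute explicit.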
4.3. Activity Diagram
In the activity diagram, the initial activity starts when the user uploads the file as
input, which is the first action; in the next action the time format is selected.
The decision box "check chat format" represents the check on the validity of the time
format of the file.
If the time format is correct, the analysis is done and the process ends.
If the time format is wrong, the user has to check again for the correct format.
5. SYSTEM MODELING
5.1. Sequence Diagram
The sequence diagram starts with the chat upload in the front end. The time format
check is then executed: it matches the time format of the uploaded chat against the
time format the user selected. The request then goes to the server, which performs
the analysis and sends the result back to the user end.
If the time format of the chat and the time format selected by the user do not match,
an "invalid time format selected" error is displayed.
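The time format check described above can be sketched with two regular expressions, one per clock style. The line layouts assumed here (`DD/MM/YYYY, H:MM PM - Name: text` and `DD/MM/YYYY, HH:MM - Name: text`) are assumptions: real WhatsApp exports vary by device and locale, and the report does not give the exact patterns. The function names and the format labels are likewise hypothetical.

```python
import re

# Assumed line layouts; real exports differ between devices and locales.
PATTERNS = {
    "12-hour": re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2}\s?[APap][Mm] - "),
    "24-hour": re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2} - "),
}


def detect_time_format(chat_text):
    """Return the format name of the first line that looks like a message, else None."""
    for line in chat_text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.match(line):
                return name
    return None


def check_time_format(chat_text, selected_format):
    """Raise ValueError when the chat does not match the format the user selected."""
    detected = detect_time_format(chat_text)
    if detected != selected_format:
        raise ValueError("Invalid time format selected")
    return detected


chat_12h = "01/05/2023, 9:15 PM - Asha: Hello\n01/05/2023, 9:16 PM - Ravi: Hi"
print(check_time_format(chat_12h, "12-hour"))
```

Raising an exception keeps the check separate from how the error is shown; the front end can catch it and display the message to the user.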
The state diagram starts with the uploading of the file. In the next state the time
format is selected; if the time format is valid, the analysis is done in the following
state. The analysis state is complete when the overall result is shown on the user
interface.
In the analysis state the user can select whose analysis he or she wants to see, and
this leads to the corresponding next state, display result.
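The states described above can be written down as a small transition table. This is a minimal sketch, not the project's implementation: the state names, the event names and the `next_state` helper are hypothetical, and only the transitions mentioned in the text are included. The text does not say what happens when the format is invalid, so the sketch simply rejects any other event.

```python
from enum import Enum, auto


class State(Enum):
    FILE_UPLOADED = auto()
    FORMAT_SELECTED = auto()
    ANALYSIS = auto()
    RESULT_DISPLAYED = auto()


# Only the transitions named in the description; event names are assumptions.
TRANSITIONS = {
    (State.FILE_UPLOADED, "select_format"): State.FORMAT_SELECTED,
    (State.FORMAT_SELECTED, "format_valid"): State.ANALYSIS,
    (State.ANALYSIS, "select_user"): State.RESULT_DISPLAYED,
}


def next_state(state, event):
    """Return the state reached from `state` on `event`, or raise ValueError."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"event {event!r} is not allowed in state {state.name}"
        ) from None


state = State.FILE_UPLOADED
for event in ("select_format", "format_valid", "select_user"):
    state = next_state(state, event)
print(state.name)
```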
First, a simple working system implementing only a few basic features is built and delivered to the
customer. Thereafter, many successive iterations are implemented and delivered to the customer until
the desired system is realised. At any time, the plan is made only for the next increment and not for
the long term. It is therefore easier to modify each version as per the needs of the customer. The
development team first undertakes to develop the core features of the system.
5.6.2. Implementation
Python is a general-purpose and very popular programming language. It is used in web
development, machine learning applications and many other cutting-edge technologies
in the software industry. Python is also very well suited for beginners.
1. Python is currently one of the most widely used multi-purpose, high-level
programming languages.
2. Python allows programming in both object-oriented and procedural paradigms.
3. Python programs are generally shorter than equivalent programs in languages like
Java. Programmers have to type relatively less, and the indentation requirement of
the language keeps the code readable.
4. Python is used by almost all technology companies, such as Google,
Amazon, Facebook, Instagram, Dropbox and Uber.
5.6.2.1. Software requirements for developing application
Jupyter Notebook
VS Code
Technologies
Python and its libraries (streamlit)
ML algorithm
NLTK
5.7. Testing
Testing is the major quality control measure used during software development. Its
basic function is to detect errors in the software. During requirement analysis and
design, the output is a document that is usually textual and non-executable. After the
coding phase, a computer program is available that can be executed for testing
purposes.
5.7.2. Testing Methods & Strategies used along with Test Data
Software Testing Strategies : Software testing is defined as an activity to check
whether the actual results match the expected results and to ensure that the software
system is defect free. It involves execution of a software component or system component
to evaluate one or more properties of interest. Software testing also helps to identify
errors, gaps or missing requirements in contrast to the actual requirements. It can be either
done manually or using automated tools.
In simple terms, Software Testing means Verification of Application under Test (AUT).
1. Functional Testing
2. Non-Functional Testing
1. Functional Testing
Functional testing is a type of testing that verifies that each function of the
software application operates in conformance with the requirement specification. It
involves checking the user interface, APIs, database, security, client/server
applications and functionality of the Application under Test. The testing can be done
either manually or using automation.
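An automated functional test of one analysis function could look like the following sketch, written with Python's built-in `unittest` module. The `fetch_stats` function here is a hypothetical stand-in: the report names a `fetch_stats` method but does not give its signature or return value, so the `(message count, word count)` result, the `"Overall"` default and the test data are all assumptions.

```python
import unittest


def fetch_stats(messages, selected_user="Overall"):
    """Hypothetical stand-in: return (message count, word count) for a user."""
    if selected_user != "Overall":
        messages = [m for m in messages if m[0] == selected_user]
    word_count = sum(len(text.split()) for _, text in messages)
    return len(messages), word_count


class FetchStatsTest(unittest.TestCase):
    # Test data: (user, text) pairs with known expected counts.
    CHAT = [
        ("Asha", "Good morning all"),
        ("Ravi", "Morning"),
        ("Asha", "Meeting at ten"),
    ]

    def test_overall_counts_every_message(self):
        self.assertEqual(fetch_stats(self.CHAT), (3, 7))

    def test_single_user_is_filtered(self):
        self.assertEqual(fetch_stats(self.CHAT, "Asha"), (2, 6))

    def test_unknown_user_has_no_messages(self):
        self.assertEqual(fetch_stats(self.CHAT, "Nobody"), (0, 0))
```

Saved in a file, the tests can be run with `python -m unittest`. Each test compares the actual result with the expected result for a small, hand-checked chat, which is the conformance check that functional testing calls for.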
2. Non-Functional Testing
Non-functional testing is a type of software testing that checks the non-functional
aspects of a software application. It is designed to test the readiness of a
system against non-functional parameters that are never addressed by functional
testing. A good example of a non-functional test is checking how many people can
log in to the software simultaneously. Non-functional testing is as important as
functional testing and affects client satisfaction. Non-functional
testing should increase usability, efficiency and maintainability.
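For a chat analyzer, one non-functional parameter that is easy to check is how the analysis time grows with chat size. The following sketch times a simple per-user count on synthetic data. The data, the `count_per_user` and `timed` helpers and the chosen size of 200,000 messages are assumptions for illustration; it does not measure the project's actual code, and timings will differ between machines.

```python
import time
from collections import Counter


def count_per_user(messages):
    """Count messages per user; a stand-in for one analysis step."""
    return Counter(user for user, _ in messages)


def timed(func, *args):
    """Run func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# Synthetic chat: 200,000 messages spread evenly over 50 hypothetical users.
synthetic = [(f"user{i % 50}", "hello") for i in range(200_000)]
counts, seconds = timed(count_per_user, synthetic)
print(f"{len(synthetic)} messages counted in {seconds:.3f} s")
```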
6. CONCLUSION & FUTURE WORK
6.1. Conclusion