
VISVESVARAYA TECHNOLOGICAL UNIVERSITY

JNANA SANGAMA, BELAGAVI-590018

An
Internship Report
“SENTIMENT CLASSIFICATION USING N-GRAM IDF”
COMPUTER SCIENCE AND ENGINEERING

Submitted by

K R TEJASWINI

USN: 3VC16CS034

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

ACCREDITED BY NATIONAL BOARD OF ACCREDITATION
RAO BAHADUR Y MAHABALESWARAPPA ENGINEERING COLLEGE

ACCREDITED BY NAAC WITH B++

CANTONMENT, BALLARI-583104, KARNATAKA
2019 – 2020
VEERASHAIVA VIDYAVARDHAKA SANGHA’S
RAO BAHADUR Y MAHABALESWARAPPA ENGINEERING COLLEGE
ACCREDITED BY NAAC WITH B++
CANTONMENT, BALLARI-583104, KARNATAKA
2019 – 2020

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

CERTIFICATE
Certified that the internship work entitled “SENTIMENT CLASSIFICATION USING N-GRAM IDF” has been carried out by K R Tejaswini, bearing USN: 3VC16CS034, in partial fulfillment of the requirements for the award of Bachelor of Engineering in Computer Science and Engineering of the Visvesvaraya Technological University, Belgaum during the year 2019-2020. It is certified that all corrections/suggestions indicated for internal assessment have been incorporated in the report deposited in the library. The internship report has been approved as it satisfies the academic requirements in respect of internship work prescribed for the said degree.

Signature of Internship Co-ordinator                Signature of HOD
Name of Examiners: Signature:
1)
2)

DECLARATION

I, K R TEJASWINI, a student of Bachelor of Engineering in Computer Science and Engineering, Rao Bahadur Y. Mahabaleswarappa Engineering College, Ballari, hereby declare that the dissertation entitled “SENTIMENT CLASSIFICATION USING N-GRAM IDF” embodies the report of my internship work carried out independently by me during the 7th semester of BE in Computer Science and Engineering under the supervision and guidance of Shivakumar, Computer Science and Engineering Department, RYMEC Ballari, and Sravani Valmiki, Project Manager, GVS India Private Limited, Hyderabad. This work has been submitted in partial fulfillment for the award of Bachelor of Engineering in Computer Science and Engineering by Visvesvaraya Technological University, Belgaum during the academic year 2019-2020.

Date:                          Name: K R TEJASWINI
Place: Ballari                 USN: 3VC16CS034
ACKNOWLEDGEMENT

I would like to express my regards and acknowledgement to all those who helped in making this internship possible.

I am grateful to the Principal, Dr. K Veeresh, for providing facilities and for the untiring zeal which constantly inspired me towards the attainment of everlasting knowledge throughout the course.

I am deeply indebted to Dr. T Hanumantha Reddy, HOD of the Computer Science and Engineering department, for the valuable suggestions and constant encouragement provided for the successful completion of the internship.

I would like to thank my guide, Shivakumar, Assistant Professor of the Computer Science and Engineering department, for the constant guidance towards the successful completion of the project.

Finally, I would like to thank all the staff members of the Computer Science and Engineering department for their guidance and support. I am also thankful to my family and friends, who continue to give me the best support.

NAME: K R TEJASWINI
USN: 3VC16CS034
TABLE OF CONTENTS
1  About the Organization
   1.1  Brief History
   1.2  Company Services
   1.3  Company Products
   1.4  Company Management Directory
   1.5  Company Domains
   1.6  Clients and Present Projects
2  About the Department
   2.1  Introduction
3  Internship Domain
   3.1  Introduction
   3.2  Technologies and Libraries Used
   3.3  Tools Used
4  Task Performed
   4.1  Technical Activities
   4.2  Non-Technical Activities
5  Internship Outcome
   5.1  Project Title: Sentiment Classification
   5.2  Internship Summary
   5.3  Snapshots
   5.4  Conclusion
Bibliography
CHAPTER 1
ABOUT THE ORGANIZATION

1.1 Brief History of the Organization

GVS INDIA PVT. LTD. was started in 2016 with its main focus on software development and services. Later, it started providing IoT-based security systems for hotels and apartments. At present, GVS INDIA is about to launch two new products into the market, and the company is looking ahead to working with Artificial Intelligence and Machine Learning. The vision of the organization is to achieve global recognition as a provider of Information Technology (IT) solutions to clients by leveraging its core strengths. The mission of the organization is to achieve its business objectives by providing state-of-the-art technology solutions to its clients, and to maintain and expand its tradition of “Excellence through Quality”.

After the successful completion of the project, the team started approaching clients who needed its services. The company gained a couple of good clients and started serving them, which is how it began generating revenue. Since the team members were experts in Java, Python, Machine Learning and Android, the company simultaneously started developing websites and a few of the latest apps needed by its clients.

1.2 Company Services

Web Design and Development

We not only design websites but also develop them, adding various multimedia features, and deliver the best, unique, low-cost outcome in a very short time.

IT Services

With Information Technology as one of our service areas, we provide all the necessary related services.

Server Maintenance

Knowing the importance of server maintenance for website hosting and multimedia-rich web applications, we have our own separate servers and maintain them ourselves.

Pharma

We provide a product for one of the leading pharmaceutical companies that lets customers purchase medicine online.

Food Ordering System

Following the trend of online food ordering, we are developing one more new product in this area.

IT Training

Knowing the importance of skilled trainers in the industry, we offer industrial training to interested candidates, training them with thorough knowledge and the latest skills and making them confident enough to face the challenges of the IT world.

Online Web and Multimedia

We design and develop online web applications enriched with multimedia features, delivering unique, low-cost solutions in a short time.

Cloud Computing

We provide cloud computing and related services. Cloud computing is the practice of using a network of remote servers hosted on the Internet to store, manage, and process data, rather than a local server or a personal computer.

Matrimonial Site

Marriage is one of the main events in our lives. Some matrimonial apps already exist in the market, but their service is slow, so we are launching a new app.

Project Management

We provide project management, which is the process and activity of planning, organizing, motivating, and controlling resources, procedures and protocols to achieve specific goals in scientific or everyday problems effectively.

1.3 Company Products

• Smart Surveillance System
• Electric Wheelchair
• Safety and Security Systems
• Biometrics
• Oxygen Concentrator
• Personal Safety Equipment
• Software Applications
• Website Design and Development

1.4 Company Management Directory

Founder: Gudluri Venu
Technical Head R&D: Mr. Vybhav
Chairpersons: Mr. Mohammed Hussain, Ms. Fatima Zahra
HR: Nagamani
General Manager: Mohammed Yaseen
Project Manager: Sravani Valmiki
Secretary: Banu
Board of Directors: Founder and Co-founder, followed by the technical heads of all departments.
1.5 Company Domains

Internet of Things (IoT)

Real-time monitoring of enterprise and industrial requirements, remotely, safely, securely and efficiently.

Machine & Deep Learning

Building custom algorithms to recognize a person by face, retina and fingerprint.

Java Enterprise Application Development

Using the latest versions of Java enterprise technologies to provide solutions to complex problems efficiently and at greater speed.

UI Modernization

Using cutting-edge technologies to create the most powerful visual experience for end users.

Hybrid and Native Applications

Creating platform-independent applications that work irrespective of the mobile operating system, as well as native applications for whichever platform the client chooses.

Security Systems

Security systems are used in apartments, schools, colleges, etc. They provide security in situations such as fire: if fire or smoke is detected, the system sounds an alarm and releases water to put out the fire. This security system is IoT-based. These types of sensors are very common and are either wired directly to an alarm control panel or found as sub-components in wireless door or window contacts.

Food Ordering System

Nowadays there are many applications for ordering food, but this one is quite different from existing systems. The main aim of this app is to collect food that is about to be wasted in restaurants, hostels, etc., and deliver it to orphanages. The app uses Java for the backend, built with Spring Boot microservices, and Android and iOS apps for communication with users. Food delivery riders do not usually get any insurance cover or sick pay, since they are independent contractors; Deliveroo chose to give its riders insurance in the United Kingdom.

HR Management

The company has a separate HR department for training and recruitment, and offers HR management services to those who need them.

1.6 Clients and Present Projects


Clients:

• Next Power Systems Pvt Ltd.
• NSK Electronics.
• Gray Logix.
• Indian Army.
• Indian Air Force.

Present Projects:

• The company is presently working for Next Power Systems Pvt Ltd to fulfill their requirements with respect to advancements in network towers. The project concentrates on providing 24/7 power supply, switching between AC supply and battery, fault monitoring and RTC. The project is nearing completion. The company is also working on embedded products and Android apps in the field of e-commerce.
CHAPTER 2
ABOUT THE DEPARTMENT

2.1 Introduction

GVS INDIA PVT. LTD. was started in 2016 with its main focus on software development and services. Its vision is to achieve global recognition as a provider of Information Technology (IT) solutions to clients by leveraging its core strengths. Its mission is to achieve its business objectives by providing state-of-the-art technology solutions to its clients, and to maintain and expand its tradition of “Excellence through Quality”. Later, it started providing IoT-based security systems for hotels and apartments. At present, GVS INDIA is about to launch two new products into the market, and the company is looking ahead to working with Artificial Intelligence and Machine Learning.

There are many uncertainties and challenges that companies must face throughout the process. The use of best practices and the elimination of barriers to communication are the main concerns.
CHAPTER 3
INTERNSHIP DOMAIN

3.1 Introduction

Machine learning is a method of data analysis that automates analytical model building. It is inherently different from conventional programming: rather than the programmer issuing commands that specify how to solve a problem, the program is shown how to learn to solve the problem on its own. These technologies are widely used in applications including spelling correction in web search engines, analysis of information from IoT devices, real-time language translation and much more.
Machine learning algorithms are expected to take over a large number of jobs across the world in the upcoming years. The algorithms can be broadly classified as supervised, unsupervised and reinforcement learning, among others.
Fig 3.1 Classification of Machine Learning

Supervised Machine Learning Algorithm


These algorithms learn from a given set of labelled samples, searching for patterns that they then use to make predictions. Supervised machine learning algorithms attempt to model the relationships and dependencies between the target prediction output and the input features: starting from input variables (x) and an output variable (y), they learn a mapping function from the input to the output that can then be used for prediction. Common supervised algorithms include linear regression for regression problems, random forest for classification and regression problems, support vector machines, nearest neighbour and others.
Supervised learning is commonly used in classification problems such as digit recognition, speech recognition, diagnostics and identity fraud detection, and in regression problems such as weather forecasting, life expectancy estimation and population growth prediction.
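As a minimal sketch of the supervised setting described above, the snippet below fits a straight line y = a + b·x to a handful of labelled (x, y) pairs by ordinary least squares and then predicts y for an unseen x. The data and the function names (`fit_line`, `predict`) are hypothetical and chosen only for illustration; a real project would more likely use a library such as scikit-learn.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept passes through the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def predict(model, x):
    """Apply the learned mapping to a new input."""
    a, b = model
    return a + b * x

# Hypothetical labelled training data: every output y is known for its input x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
model = fit_line(xs, ys)
print(model, predict(model, 5.0))  # (1.0, 2.0) 11.0
```

The learned pair (a, b) is the "mapping function" from input to output; once it is fitted, predictions for new inputs need no further labelled data.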

Unsupervised Machine Learning Algorithm


Unsupervised machine learning algorithms arrange the data into groups (clusters). They describe the data's structure and make complex data look simple and organised for analysis. Unsupervised learning is used when no labelled data is available for training: there is only input data and no corresponding output variables, which is why these algorithms are called unsupervised. Unsupervised problems can be further grouped into clustering problems, which discover inherent groupings in the data, and association problems. Popular examples of unsupervised learning algorithms are k-means for clustering problems, used in recommender systems, customer segmentation and targeted marketing, and dimensionality reduction for big data visualisation, feature elicitation, structure discovery, etc.
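To illustrate the clustering idea, here is a minimal k-means sketch on 2-D points with no labels. The points, the initial centroids and the function name are hypothetical; the algorithm alternates an assignment step (each point joins its nearest centroid) and an update step (each centroid moves to the mean of its cluster) until the centroids stop moving.

```python
def kmeans(points, centroids, max_iter=100):
    """Plain k-means on 2-D points, starting from the given initial centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for x, y in points:
            dists = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            clusters[dists.index(min(dists))].append((x, y))
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            (sum(px for px, _ in c) / len(c), sum(py for _, py in c) / len(c)) if c else old
            for c, old in zip(clusters, centroids)
        ]
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical unlabelled data: two visibly separate groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 8.5), (8.0, 9.0)]
centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(clusters)
```

No output labels are given; the two groups emerge purely from distances between points, which is what distinguishes this from the supervised example.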

Reinforcement Machine Learning Algorithm


These algorithms take an action for each data point and later assess how good that decision was. The algorithm uses the observations collected from its interaction with the environment and takes actions so as to minimise risk and maximise benefit, learning in an iterative fashion. Common reinforcement learning algorithms include Q-learning, Deep Q-Networks and Temporal Difference (TD) learning. These algorithms are applied in game AI, skill acquisition, learning tasks, robot navigation and real-time decision making.
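The act-then-assess loop described above can be illustrated with tabular Q-learning, one of the algorithms named in the text. This sketch assumes a hypothetical five-state corridor in which the agent earns a reward of 1 only on reaching the rightmost state; the environment, hyperparameters and names are illustrative choices, not part of the project. For simplicity the agent explores with a uniformly random behaviour policy, which Q-learning tolerates because it is an off-policy method.

```python
import random

N_STATES = 5        # states 0..4; state 4 is the goal (terminal)
ACTIONS = (0, 1)    # 0 = move left, 1 = move right

def step(state, action):
    """Deterministic corridor: reward 1.0 only when the goal is reached."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a = rng.choice(ACTIONS)  # explore with a uniformly random action
            s2, r, done = step(s, a)
            # Assess the decision: reward plus discounted best value of the next state.
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s = s2
            if done:
                break
    return q

q = q_learning()
# Greedy policy for the non-terminal states: pick the action with the highest Q-value.
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

After training, the greedy policy should choose "move right" (action 1) in every non-terminal state, since that path reaches the reward soonest and so has the highest discounted value.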

3.2 Technologies and Libraries Used

3.2.1 Python

Python is an object-oriented programming language created by Guido van Rossum, who began working on it in 1989. It is well suited to rapid prototyping of complex applications. It has interfaces to many OS system calls and libraries and is extensible in C or C++. Organizations that use Python include NASA, Google, YouTube, BitTorrent, etc.

Python is widely used in Artificial Intelligence, Natural Language Generation, Neural Networks and other advanced fields of Computer Science. Python places a strong focus on code readability.

3.2.2 NumPy

NumPy is a very popular Python library for large multi-dimensional array and matrix processing, with a large collection of high-level mathematical functions. It is very useful for fundamental scientific computations in machine learning, particularly for linear algebra, Fourier transforms and random number generation. High-end libraries like TensorFlow use NumPy internally for the manipulation of tensors.

3.2.3 Flask

Flask is a lightweight WSGI web application framework. It is designed to make getting started quick and easy, with the ability to scale up to complex applications. It began as a simple wrapper around Werkzeug and Jinja and has become one of the most popular Python web application frameworks.

3.2.4 Matplotlib

Matplotlib is a very popular Python library for data visualization. Like Pandas, it is not directly related to machine learning, but it comes in handy when a programmer wants to visualize patterns in the data. It is a 2D plotting library used for creating 2D graphs and plots. Its pyplot module makes plotting easy by providing features to control line styles, font properties, axis formatting, etc. It provides various kinds of graphs and plots for data visualization, viz., histograms, error charts, bar charts, etc.
3.2.5 TextBlob

TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple
API for diving into common natural language processing (NLP) tasks such as part-of-
speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and
more.

3.2.6 PyMySQL

PyMySQL is an interface for connecting to a MySQL database server from Python. It implements the Python Database API v2.0 and contains a pure-Python MySQL client library. The goal of PyMySQL is to be a drop-in replacement for MySQLdb.
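Because PyMySQL follows the Python Database API v2.0, the connect, cursor, execute, fetch and commit pattern is the same one shared by other DB-API drivers. The sketch below uses the standard-library `sqlite3` module as a stand-in so it runs without a MySQL server; the `comments` table and its columns are hypothetical. With PyMySQL the connection would come from `pymysql.connect(...)` and the query placeholders would be `%s` instead of sqlite3's `?`.

```python
import sqlite3

# sqlite3 stands in for PyMySQL: both implement the DB-API 2.0 interface.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, username TEXT, body TEXT, sentiment TEXT)"
)

rows = [
    ("alice", "great post", "positive"),
    ("bob", "this is bad", "negative"),
    ("alice", "see you tomorrow", "neutral"),
]
# Parameterised queries let the driver escape values safely (no string formatting).
cur.executemany("INSERT INTO comments (username, body, sentiment) VALUES (?, ?, ?)", rows)
conn.commit()

# Count one user's comments per sentiment class.
cur.execute(
    "SELECT sentiment, COUNT(*) FROM comments WHERE username = ? "
    "GROUP BY sentiment ORDER BY sentiment",
    ("alice",),
)
alice_counts = cur.fetchall()
print(alice_counts)  # [('neutral', 1), ('positive', 1)]
conn.close()
```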

3.2.7 MonkeyLearn

MonkeyLearn provides ready-to-use models for specific text analysis tasks such as sentiment analysis, keyword extraction or urgency detection. Custom models can also be trained: you upload a set of texts and tag them manually, and after you have fed the model a few examples, it starts making predictions on its own.

3.2.8 CSV

A CSV file (Comma Separated Values file) is a type of plain text file that uses specific
structuring to arrange tabular data. Because it's a plain text file, it can contain only actual
text data—in other words, printable ASCII or Unicode characters. The structure of a CSV
file is given away by its name.
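As a small sketch of how such a file can be read in Python, the snippet below parses labelled comments with the standard `csv` module. An in-memory string stands in for a file, and the column names `comment` and `label` are hypothetical. Note how the quoted field keeps its embedded comma; for a real file, the csv documentation recommends opening it with `open(path, newline="")`.

```python
import csv
import io

# Inline CSV standing in for a file of labelled comments.
data = io.StringIO(
    "comment,label\n"
    "\"Nice, clean API\",positive\n"
    "Build keeps failing,negative\n"
    "Updated the README,neutral\n"
)

# DictReader uses the header row as keys for every following row.
rows = list(csv.DictReader(data))
for row in rows:
    print(row["label"], "->", row["comment"])
```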
3.3 Tools Used

3.3.1 PYCHARM

PyCharm is an integrated development environment (IDE) used in computer programming, specifically for the Python language. It is developed by the Czech company JetBrains. It provides code analysis, a graphical debugger, an integrated unit tester and integration with version control systems (VCSes), and supports web development with Django as well as data science with Anaconda. PyCharm provides an API so that developers can write their own plugins to extend its features. Several plugins from other JetBrains IDEs also work with PyCharm, and more than 1000 plugins are compatible with it.

3.3.2 Web Browser

A web browser is a computer program that is used to access the web (to view webpages). A browser can also be used to download files and to send and receive email or short messages across the internet.

Commonly used web browsers include:

• Microsoft Internet Explorer (IE)
• Firefox (Mozilla)
• Safari (Apple devices)
• Opera
• Netscape Navigator (NN)
CHAPTER 4
TASK PERFORMED

4.1 Technical Activities


The technical activities carried out during the internship were:

• Installing PyCharm, Python and MySQL.
• Importing Python packages such as Flask, PyMySQL, Matplotlib, NumPy and TextBlob.
• Creating a website with admin and user roles.
• Classifying words as positive, negative and neutral, and applying the classified words to website operations.
• Admin tasks: authorizing users.
• User tasks: users register first; after authorization they can view their profile, post, send friend requests, etc. Users can view their comments classified as positive, negative or neutral, also as a bar graph and a pie chart.
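The comment classification step can be sketched as a polarity score mapped to three classes, followed by the per-class counts that a bar graph or pie chart would display. With TextBlob, the score would come from `TextBlob(text).sentiment.polarity`, which ranges from -1.0 to 1.0; to keep the sketch self-contained, a tiny hypothetical word lexicon stands in for TextBlob here, and the thresholds (above zero positive, below zero negative, otherwise neutral) are an assumption rather than the project's documented rule.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for TextBlob's polarity scores.
LEXICON = {"good": 0.7, "great": 0.8, "love": 0.5, "bad": -0.7, "terrible": -1.0, "hate": -0.8}

def polarity(text):
    """Average lexicon score of the known words in `text` (0.0 if none are known)."""
    scores = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

def label(text):
    """Map polarity to the three classes used on the website."""
    p = polarity(text)
    if p > 0:
        return "positive"
    if p < 0:
        return "negative"
    return "neutral"

comments = ["Great photo", "I hate Mondays", "See you at six", "Good and bad news"]
labels = [label(c) for c in comments]
counts = Counter(labels)  # per-class totals to feed a bar graph or pie chart
for cls in ("positive", "negative", "neutral"):
    print(cls, counts[cls])
```

The `counts` values could be passed directly to Matplotlib's `pyplot.bar` or `pyplot.pie` to draw the charts the users see.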

4.1.1 Problem statement


Because of the poor accuracy of existing sentiment analysis tools trained with general sentiment expressions, recent studies have tried to customize such tools with software engineering datasets. However, it is reported that no tool is ready to accurately classify sentences as negative, neutral or positive, even if the tools are specifically customized for certain software engineering tasks.
4.1.2 Experimental Design
In this project, we propose a machine learning based approach using n-gram features and an automated machine learning tool for sentiment classification. Although n-gram phrases are considered more informative and useful than single words, using all n-gram phrases is not a good idea because of the large volume of data and the many useless features. To address this problem, we utilize n-gram IDF, a theoretical extension of Inverse Document Frequency (IDF) proposed by Shirakawa et al. IDF measures how much information a word provides, but it cannot handle multiple words; n-gram IDF extends this weighting to n-gram phrases of any length.
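To make the weighting concrete, the sketch below computes classic IDF, log(|D| / df(t)), and an n-gram IDF in the form we understand from Shirakawa et al. (2015): log(|D| · df(g) / df(θ(g))²), where df(g) counts documents containing the n-gram g as a contiguous phrase and df(θ(g)) counts documents containing all of g's words anywhere. Please verify this formula against the original paper before relying on it. The toy corpus, whitespace tokenisation and helper names are hypothetical.

```python
import math

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def phrase_df(docs, gram):
    """Documents containing `gram` as a contiguous phrase."""
    return sum(1 for d in docs if gram in set(ngrams(d, len(gram))))

def cooccur_df(docs, gram):
    """Documents containing every word of `gram`, in any position."""
    words = set(gram)
    return sum(1 for d in docs if words <= set(d))

def idf(docs, word):
    """Classic IDF: defined for single words only."""
    return math.log(len(docs) / phrase_df(docs, (word,)))

def ngram_idf(docs, gram):
    """n-gram IDF (assumed form): log(|D| * df(g) / df(theta(g))**2)."""
    df_g = phrase_df(docs, gram)
    if df_g == 0:
        raise ValueError("n-gram does not occur in the corpus")
    return math.log(len(docs) * df_g / cooccur_df(docs, gram) ** 2)

docs = [d.split() for d in [
    "this api is not working",
    "this api works fine",
    "it is not bad at all",
    "not working again",
    "working fine but not on windows",
]]
print(idf(docs, "api"), ngram_idf(docs, ("api",)))   # equal for single words
print(ngram_idf(docs, ("not", "bad")))               # log(5)
print(ngram_idf(docs, ("not", "working")))           # log(10/9)
```

For a single word, df(θ(g)) equals df(g), so the formula reduces to ordinary IDF. "not working" scores lower than "not bad" because its two words also co-occur in the last document without forming the phrase, which is how n-gram IDF can down-weight word combinations that are not genuine phrases.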
4.1.3 Flow diagram

1. The DFD is also called a bubble chart. It is a simple graphical formalism that can be used to represent a system in terms of the input data to the system, the various processing carried out on this data, and the output data generated by the system.

2. The data flow diagram (DFD) is one of the most important modeling tools. It is used to model the system components: the system processes, the data used by the processes, the external entities that interact with the system, and the information flows in the system.

3. A DFD shows how information moves through the system and how it is modified by a series of transformations. It is a graphical technique that depicts information flow and the transformations that are applied as data moves from input to output.
4.2 Non-Technical Activities

Performance Requirements

1. Dependability
The dependability of a computer system is a property of the system that equates to its trustworthiness. Trustworthiness essentially means the degree of user confidence that the system will operate as they expect and that the system will not 'fail' in normal use.
2. Availability
Availability is the ability of the system to deliver services when requested. The program executes without errors.
3. Reliability
Reliability is the ability of the system to deliver services as specified. The program is compatible with all types of operating systems without any failure.
4. Safety
Safety is the ability of the system to operate without catastrophic failure. The program is user-friendly and does not harm the host system.
5. Security
Security is the ability of the system to protect itself against accidental or deliberate intrusion.
CHAPTER 5
INTERNSHIP OUTCOME

5.1 Project Title: Sentiment classification

In this project, we propose a machine learning based approach using n-gram features and an automated machine learning tool for sentiment classification. Although n-gram phrases are considered more informative and useful than single words, using all n-gram phrases is not a good idea because of the large volume of data and the many useless features. To address this problem, we utilize n-gram IDF, a theoretical extension of Inverse Document Frequency (IDF). IDF measures how much information a word provides, but it cannot handle multiple words.
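The report does not include the project's classifier code, so the following is only a sketch of the overall idea: texts are turned into n-gram features (unigrams and bigrams) and a classifier assigns one of the three sentiment classes. A plain multinomial Naive Bayes with add-one smoothing stands in for the automated machine learning tool, all n-grams are kept rather than being filtered by n-gram IDF, and the training sentences and names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def ngram_features(text, max_n=2):
    """Lower-cased unigrams and bigrams of a whitespace-tokenised text."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, max_n + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing over n-gram counts."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feat_counts[label].update(ngram_features(text))
        self.vocab = {f for counts in self.feat_counts.values() for f in counts}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            counts = self.feat_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # log P(label) + sum of log P(feature | label)
            score = math.log(count / total)
            for f in ngram_features(text):
                score += math.log((counts[f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Tiny hypothetical training set covering the three sentiment classes.
texts = ["this works great", "great answer thanks",
         "not working at all", "this is not good",
         "the method returns a list", "see the documentation"]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
clf = NaiveBayes().fit(texts, labels)
print(clf.predict("not good"), clf.predict("works great"), clf.predict("the documentation"))
```

The bigram "not good" is what lets the model tie the negation to the sentiment word; in the approach described above, n-gram IDF would decide which such phrases are worth keeping as features.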

5.2 Internship Summary


The internship covered machine learning, a method of data analysis that automates analytical model building, and its main categories: supervised, unsupervised and reinforcement learning. On the practical side, PyCharm, Python and MySQL were installed, Python packages such as Flask, PyMySQL, Matplotlib, NumPy and TextBlob were used, and a website with admin and user roles was built. The admin authorizes registered users; authorized users can view their profile, post and send friend requests, and can view their comments classified as positive, negative or neutral, including as a bar graph and a pie chart.

The project responds to the poor accuracy of existing sentiment analysis tools trained with general sentiment expressions: even when customized for software engineering tasks, no tool is reported to classify sentences accurately as negative, neutral or positive. The project therefore takes a machine learning based approach that uses n-gram features and an automated machine learning tool. Because using all n-gram phrases produces a large volume of data and many useless features, it applies n-gram IDF, an extension of Inverse Document Frequency (IDF), which measures how much information a word provides but cannot handle multiple words.
5.3 Snapshots

Home page
Admin login

User details
Registration

User login
Profile

Send request
5.4 Conclusion
In this project, we proposed a sentiment classification method using n-gram IDF and automated machine learning. We applied this method to three datasets: questions and answers from Stack Overflow, reviews of mobile applications, and comments on JIRA issue trackers. Our good classification performance is not based only on advanced automated machine learning; n-gram IDF also worked well to capture dataset-specific, software-engineering-related positive, neutral and negative expressions. Because n-gram IDF can extract useful sentiment expressions, our method can be applicable to various software engineering datasets.
BIBLIOGRAPHY
[1] Y. Zhang and D. Hou, “Extracting problematic API features from forum discussions,” in Proceedings of the 21st International Conference on Program Comprehension (ICPC), 2013, pp. 142–151.

[2] S. Panichella, A. Di Sorbo, E. Guzman, C. A. Visaggio, G. Canfora, and H. C. Gall, “How can I improve my app? Classifying user reviews for software maintenance and evolution,” in Proceedings of the 31st IEEE International Conference on Software Maintenance and Evolution (ICSME), 2015, pp. 281–290.

[3] M. Ortu, B. Adams, G. Destefanis, P. Tourani, M. Marchesi, and R. Tonelli, “Are bullies more productive?: Empirical study of affectiveness vs. issue fixing time,” in Proceedings of the 12th Working Conference on Mining Software Repositories (MSR), 2015, pp. 303–313.
