Natural Language Processing

Install nltk
conda install -c anaconda nltk

Data Set: Restaurant_Reviews.tsv (Tab Separated File)

Import Libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import os

Import Data Set


os.chdir('C:\\Noble\\Training\\Deep Learning\\Training\\Data\\')
os.getcwd()
# delimiter = '\t' – the file is tab separated
# quoting = 3 (csv.QUOTE_NONE) – ignore double quotes while parsing
dataset = pd.read_csv('Restaurant_Reviews.tsv', delimiter = '\t', quoting = 3)
dataset
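
As a quick sanity check that the file loaded correctly (assuming the label column is named 'Liked', as in the standard version of this dataset):

print(dataset.shape) # expect (1000, 2): 1000 reviews, 2 columns
print(dataset.head()) # first five rows: Review text and Liked label (1 = positive, 0 = negative)
print(dataset['Liked'].value_counts()) # class balance of the labels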

Get one row from the data set – for example, the review at index 5


dataset['Review'][5]

To Print / View all stop words


import nltk # for stop words
from nltk.corpus import stopwords
nltk.download('stopwords')
all_stopwords = stopwords.words('english')
print (all_stopwords)
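
Note that 'not' is part of NLTK's default English stop word list, so a plain stop word filter would also drop negations such as "not good". The cleaning step below removes 'not' from the list for exactly this reason. A quick check:

print(len(all_stopwords)) # size of the default English stop word list
print('not' in all_stopwords) # True – 'not' would be filtered out unless we keep it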

Cleaning the Data Set


import re
# re – Regular expression - https://docs.python.org/3/library/re.html
import nltk # for stop words
nltk.download('stopwords') # download the stop word corpus
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer # for stemming, to reduce each word to its root
corpus = [] # create a list to store all cleaned reviews
for i in range(0, 1000):
    # dataset['Review'][i] - the i-th review in the data set
    review = re.sub('[^a-zA-Z]', ' ', dataset['Review'][i]) # replace anything other than letters with a space
    review = review.lower()
    review = review.split() # split into individual words
    ps = PorterStemmer() # stemmer to get root words
    all_stopwords = stopwords.words('english') # get the English stop words
    all_stopwords.remove('not') # keep 'not' so that negations survive cleaning
    review = [ps.stem(word) for word in review if word not in set(all_stopwords)]
    review = ' '.join(review)
    corpus.append(review)
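
To see what each step of the loop does, here is the same pipeline applied to a single made-up review (reusing ps and all_stopwords from the loop above; the sample text is purely illustrative):

sample = "The pasta wasn't good... I will NOT return!"
step1 = re.sub('[^a-zA-Z]', ' ', sample) # punctuation and digits replaced by spaces
step2 = step1.lower().split() # lower-cased and split into words
step3 = [ps.stem(w) for w in step2 if w not in set(all_stopwords)] # stop words out, stems in
print(' '.join(step3)) # roughly: "pasta good not return"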

Print Corpus
print (corpus)

Check the Number of Distinct Words


from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer() # first run without max_features; len(X[0]) below shows the full vocabulary size
X = cv.fit_transform(corpus).toarray()
len(X[0])

Create a Bag of Words (tokenization)


from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features = 1500) # 1500 was chosen from len(X[0]) above; first execute without max_features
X = cv.fit_transform(corpus).toarray()
y = dataset.iloc[:, -1].values # this is dependent variable
print(len(X[0])) # this gives me the max_features count
print (X)
print (y)
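
To inspect which words made it into the 1500-column vocabulary, the fitted vectorizer can list its tokens (get_feature_names_out needs scikit-learn 1.0+; older versions use get_feature_names):

vocab = cv.get_feature_names_out() # one token per column of X, in alphabetical order
print(vocab[:20]) # first 20 tokens of the vocabulary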

Train Test Split


from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0)

Print Size
print (X.shape)
print (X_train.shape)
print (X_test.shape)

Create Naïve Bayes Classifier


from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(X_train, y_train)
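
GaussianNB assumes continuous, normally distributed features. For word-count features like these, MultinomialNB is a common alternative worth trying – a sketch, not part of the original notes:

from sklearn.naive_bayes import MultinomialNB
mnb = MultinomialNB() # Naive Bayes variant designed for count data
mnb.fit(X_train, y_train)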

Prediction
y_pred = classifier.predict(X_test)
Print Actual vs. Predicted Values
print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)), 1))

Confusion Matrix and Accuracy Score


from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)
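
Beyond overall accuracy, per-class precision and recall can be printed with scikit-learn's classification_report – a quick sketch:

from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred)) # precision, recall and F1 for both classes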

Create Decision Tree Classifier

from sklearn.tree import DecisionTreeClassifier
dt = DecisionTreeClassifier()
dt.fit(X_train, y_train)
dt_pred = dt.predict(X_test)
cm = confusion_matrix(y_test, dt_pred)
print(cm)
accuracy_score(y_test, dt_pred)
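
To classify a new, unseen review, it must pass through the same cleaning steps and the same fitted vectorizer – note cv.transform, not fit_transform. A sketch reusing re, ps, all_stopwords, cv and the Naive Bayes classifier from above, with a made-up example review:

new_review = 'The food was not good at all'
new_review = re.sub('[^a-zA-Z]', ' ', new_review).lower().split()
new_review = [ps.stem(word) for word in new_review if word not in set(all_stopwords)]
new_X = cv.transform([' '.join(new_review)]).toarray() # same 1500-column representation
print(classifier.predict(new_X)) # 0 = negative, 1 = positive (per the Liked column)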
