Exploratory Data Analysis Main Concepts

Uploaded by Zeinab Hamzeh

Original title: DSMLCheatSheet-zaka

Data Science & Machine Learning Cheat Sheet
© 2021, Zaka AI, Inc. All Rights Reserved.

1 Main Concepts

[Figure: Data Science Life Cycle]

[Figure: Machine Learning Framework]

Next, we will apply these concepts to the medical cost prediction problem from the course example, but they are also applicable to other machine learning problems.

2 Data Loading

Import Python modules
import numpy as np                                    # NumPy
import pandas as pd                                   # pandas
import matplotlib.pyplot as plt                       # Matplotlib
from sklearn.model_selection import train_test_split  # scikit-learn
from sklearn.linear_model import LinearRegression     # scikit-learn

Read and visualize the data
data = pd.read_csv(Path_to_data)  # Read the CSV file into pandas (Path_to_data: path to your file)
data.head()                       # Display the first 5 rows

3 Exploratory Data Analysis

The idea is that in this phase, we can understand how the features are correlated through different plots.

Understand your data
rows = data.shape[0]     # Number of samples
columns = data.shape[1]  # Number of columns
data.info()              # Data types, missing values
data.describe()          # Statistical description of the columns

Removing missing data
data.isnull().sum()      # Missing values in each column
data = data.dropna()     # Drop rows with missing values

Distribution of charges
data["charges"].plot(kind="hist")
plt.title("Distribution of charges")
plt.xlabel("Charges")
plt.ylabel("Frequency")
plt.show()

Correlation between smoking and cost of treatment
smokers = data[data.smoker == "yes"]     # Get smokers
non_smokers = data[data.smoker == "no"]  # Get non-smokers
fig = plt.figure(figsize=(12, 5))        # Create the figure
ax = fig.add_subplot(121)                # 1st subplot: smokers
ax.hist(smokers["charges"])              # Smokers histogram
ax.set_title("charges for smokers")      # Set subplot title
ax = fig.add_subplot(122)                # Repeat the subplot for non-smokers
ax.hist(non_smokers["charges"])
ax.set_title("charges for non-smokers")
plt.show()

Correlation between age and cost of treatment
plt.scatter(smokers["age"], smokers["charges"], color="r")          # Smokers in red
plt.scatter(non_smokers["age"], non_smokers["charges"], color="b")  # Non-smokers in blue
plt.xlabel("Age")
plt.ylabel("Charges")
plt.show()

Correlation between BMI and cost of treatment
# obese, overweight, healthy and underweight are BMI-based subsets of data;
# their definitions are not shown on this sheet
plt.hist(obese["charges"], color="r")
plt.hist(overweight["charges"], color="y")
plt.hist(healthy["charges"], color="g")
plt.hist(underweight["charges"], color="b")
plt.title("Charges distribution")
plt.xlabel("Charges")
plt.ylabel("Frequency")
plt.show()

4 Data Preprocessing

Removing unused columns
data.drop("region", axis=1, inplace=True)
The idea is to remove columns that do not contribute to our prediction. In our example, region does not affect the cost charged.

Convert categorical columns to numerical
gender = {'male': 0, 'female': 1}
data['sex'] = data['sex'].apply(lambda x: gender[x])
smoker_codes = {'no': 0, 'yes': 1}
data['smoker'] = data['smoker'].apply(lambda x: smoker_codes[x])

Normalization
data_max = data.max()
data = data.divide(data_max)
The idea is to divide each column by its maximum value.

5 Model Training and Testing

Data splits
X = data.iloc[:, 0:-1].values  # Store all columns except the last one as inputs in X
y = data.iloc[:, -1].values    # Store the last column as the output (label) in y
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)  # Split the dataset 80/20

Linear regression modeling
model = LinearRegression()   # Define our regression model
model.fit(x_train, y_train)  # Train our model

Model evaluation
print('Model score {}'.format(model.score(x_test, y_test)))  # For LinearRegression, score() is the R² on the test set
Note that there are several ways to evaluate your model, which you will see later on in other courses.

Feature importance
columns_names = data.columns[0:-1].values  # Input column names
features_importance = model.coef_          # One regression coefficient per input column
plt.barh(columns_names, features_importance)
plt.title('Features Importance')
plt.xlabel('importance')
plt.ylabel('feature')
plt.show()
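The "Understand your data" and "Removing missing data" commands can be tried without the course CSV. This is a minimal sketch that assumes a hypothetical four-row frame with the sheet's column names (the values are made up, not the course data) and shows what `shape`, `isnull().sum()` and `dropna()` report.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows with the sheet's column names (made-up values)
data = pd.DataFrame({
    "age": [19, 33, 45, np.nan],
    "sex": ["female", "male", "male", "female"],
    "bmi": [27.9, 22.7, np.nan, 31.0],
    "smoker": ["yes", "no", "no", "yes"],
    "charges": [16000.0, 4500.0, 8200.0, 27000.0],
})

rows = data.shape[0]           # number of samples
columns = data.shape[1]        # number of columns
missing = data.isnull().sum()  # missing values in each column
data = data.dropna()           # drop every row that has at least one missing value

print(rows, columns)                   # 4 5
print(missing["age"], missing["bmi"])  # 1 1
print(data.shape)                      # (2, 5)
```

Note that `dropna()` removes a whole row as soon as any one of its columns is missing, so two of the four toy rows disappear even though each has only a single gap.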
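The smoking and age correlation plots can be backed by a couple of numbers. This sketch assumes a hypothetical six-row frame; `mean_by_smoker` and `age_corr` are illustrative names, and the Pearson coefficient from `Series.corr` is swapped in as a numeric companion to the scatter plot, not something the sheet itself uses.

```python
import pandas as pd

# Hypothetical toy data; on the sheet, data comes from pd.read_csv(...)
data = pd.DataFrame({
    "age": [20, 30, 40, 50, 25, 45],
    "smoker": ["yes", "no", "no", "yes", "no", "yes"],
    "charges": [20000.0, 4000.0, 6000.0, 40000.0, 3000.0, 35000.0],
})

# Average charges for smokers vs non-smokers (companion to the two histograms)
mean_by_smoker = data.groupby("smoker")["charges"].mean()

# Pearson correlation between age and charges (companion to the scatter plot)
age_corr = data["age"].corr(data["charges"])

print(mean_by_smoker)
print(age_corr)
```

This must run before the "Convert categorical columns to numerical" step, because it filters on the original "yes"/"no" strings.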
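The BMI histograms use `obese`, `overweight`, `healthy` and `underweight` without defining them. A plausible definition, assuming the standard WHO adult BMI cut-offs (under 18.5, 18.5 to under 25, 25 to under 30, and 30 or more), is sketched below on a hypothetical toy frame; the course may have used different thresholds.

```python
import pandas as pd

# Hypothetical toy data; on the sheet, data comes from pd.read_csv(...)
data = pd.DataFrame({
    "bmi": [17.0, 22.5, 27.3, 33.8, 30.0, 18.5],
    "charges": [2000.0, 3500.0, 5200.0, 41000.0, 12000.0, 2500.0],
})

# Assumed cut-offs: WHO adult BMI categories
underweight = data[data.bmi < 18.5]
healthy = data[(data.bmi >= 18.5) & (data.bmi < 25)]
overweight = data[(data.bmi >= 25) & (data.bmi < 30)]
obese = data[data.bmi >= 30]

for name, group in [("underweight", underweight), ("healthy", healthy),
                    ("overweight", overweight), ("obese", obese)]:
    print(name, len(group), group["charges"].mean())
```

Using half-open intervals (`>=` on the lower bound, `<` on the upper) makes the four groups non-overlapping, so every row lands in exactly one histogram.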
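The three preprocessing steps (drop `region`, encode `sex` and `smoker`, divide by the column maximum) can be chained end to end. This is a sketch on a hypothetical three-row frame; `smoker_codes` is an illustrative name, and `.map(dict)` is used as the idiomatic equivalent of the sheet's `.apply(lambda x: ...)` lookup.

```python
import pandas as pd

# Hypothetical toy data with the course's column layout (made-up values)
data = pd.DataFrame({
    "age": [18, 36, 54],
    "sex": ["male", "female", "female"],
    "bmi": [20.0, 25.0, 40.0],
    "smoker": ["no", "yes", "no"],
    "region": ["southwest", "northeast", "southeast"],
    "charges": [2000.0, 30000.0, 10000.0],
})

data = data.drop("region", axis=1)  # remove the unused column

gender = {"male": 0, "female": 1}
smoker_codes = {"no": 0, "yes": 1}
# .map(dict) does the same lookup as .apply(lambda x: gender[x]) for known keys
data["sex"] = data["sex"].map(gender)
data["smoker"] = data["smoker"].map(smoker_codes)

data_max = data.max()         # per-column maximum
data = data.divide(data_max)  # every column now has maximum 1.0

print(data)
```

One design point worth knowing: as on the sheet, the maximum here is computed on the full dataset before the train/test split; a stricter setup computes it on the training split only, so no information from the test rows leaks into preprocessing.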
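The model training, evaluation and feature-importance steps can be checked on data whose true relationship is known. This sketch assumes hypothetical noise-free inputs generated as y = 3*x0 + 0.5*x1 + 1, so the fitted `coef_` should recover 3 and 0.5; it also recomputes `model.score` by hand as R² = 1 - SS_res / SS_tot.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical noise-free data: y = 3*x0 + 0.5*x1 + 1
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(50, 2))
y = 3 * X[:, 0] + 0.5 * X[:, 1] + 1

# Same 80/20 split call as on the sheet
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(x_train, y_train)

score = model.score(x_test, y_test)  # R² on the test split

# The same quantity by hand: R² = 1 - SS_res / SS_tot
y_pred = model.predict(x_test)
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(len(x_train), len(x_test))  # 40 10
print(model.coef_)                # close to [3.0, 0.5]
print(score, r2_manual)
```

An R² of 1.0 means a perfect fit; it can drop below 0 when a model does worse than always predicting the mean. Reading `coef_` as feature importance is only meaningful when the inputs share a comparable scale, which is one reason the sheet normalizes every column before training.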
