
Practical 4


PRACTICAL 4.ipynb - Colaboratory

PRACTICAL 4

Generate the Probability Density Function (PDF) and Cumulative Distribution Function (CDF) for the given Iris dataset to find the distribution of
its various attributes.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
#Load Dataset
iris = pd.read_csv('/content/Iris.csv')

iris.shape

(150, 6)

iris.describe()

               Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
count  150.000000     150.000000    150.000000     150.000000    150.000000
mean    75.500000       5.843333      3.054000       3.758667      1.198667
std     43.445368       0.828066      0.433594       1.764420      0.763161
min      1.000000       4.300000      2.000000       1.000000      0.100000
25%     38.250000       5.100000      2.800000       1.600000      0.300000
50%     75.500000       5.800000      3.000000       4.350000      1.300000
75%    112.750000       6.400000      3.300000       5.100000      1.800000
max    150.000000       7.900000      4.400000       6.900000      2.500000

plt.title('Species Count')
# Pass the column by keyword; positional data arguments trigger a seaborn FutureWarning
sns.countplot(x='Species', data=iris)

<AxesSubplot:title={'center':'Species Count'}, xlabel='Species', ylabel='count'>

Each species (Iris setosa, Iris versicolor, Iris virginica) has a count of 50.

plt.figure(figsize=(17,9))
plt.title('Comparison between various species based on sepal length and width')
# Pass x, y and hue by keyword; positional data arguments trigger a seaborn FutureWarning
sns.scatterplot(x='SepalLengthCm', y='SepalWidthCm', hue='Species', data=iris, s=50)

<AxesSubplot:title={'center':'Comparison between various species based on sepal length and width'}, xlabel='SepalLengthCm', ylabel='SepalWidthCm'>

Correlation is a statistical measure of whether a linear relationship exists between two variables: it shows whether large (or small) values of one
variable tend to occur with large or small values of the other.
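As a rough illustration of what a correlation coefficient measures, the sketch below computes the Pearson coefficient by hand and compares it with NumPy's built-in result. The helper name pearson_r and the five sample values are hypothetical and are not taken from Iris.csv.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    # Sum of cross-deviations divided by the product of the deviation norms
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Made-up illustrative values (NOT rows from Iris.csv)
length = [1.0, 2.0, 3.0, 4.0, 5.0]
width = [2.1, 3.9, 6.2, 8.0, 9.8]

r = pearson_r(length, width)
r_numpy = np.corrcoef(length, width)[0, 1]
print(r, r_numpy)
```

A value near +1 means the two variables rise together, a value near -1 means one falls as the other rises, and a value near 0 means there is little linear relationship. In the notebook, pearson_r(iris['SepalLengthCm'], iris['SepalWidthCm']) should agree with the matching entry of iris.corr(), since pandas uses the Pearson method by default.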

#The correlation coefficients between measurement variables:
iris.groupby("Species").corr()

                                      Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm
Species
Iris-setosa     Id              1.000000      -0.033561     -0.066688       0.053253      0.087492
                SepalLengthCm  -0.033561       1.000000      0.746780       0.263874      0.279092
                SepalWidthCm   -0.066688       0.746780      1.000000       0.176695      0.279973
                PetalLengthCm   0.053253       0.263874      0.176695       1.000000      0.306308
                PetalWidthCm    0.087492       0.279092      0.279973       0.306308      1.000000
Iris-versicolor Id              1.000000      -0.269056     -0.081867      -0.189481     -0.168846
                SepalLengthCm  -0.269056       1.000000      0.525911       0.754049      0.546461
                SepalWidthCm   -0.081867       0.525911      1.000000       0.560522      0.663999
                PetalLengthCm  -0.189481       0.754049      0.560522       1.000000      0.786668
                PetalWidthCm   -0.168846       0.546461      0.663999       0.786668      1.000000
Iris-virginica  Id              1.000000      -0.012549      0.130884      -0.204204      0.036446
                SepalLengthCm  -0.012549       1.000000      0.457228       0.864225      0.281108
                SepalWidthCm    0.130884       0.457228      1.000000       0.401045      0.537728
                PetalLengthCm  -0.204204       0.864225      0.401045       1.000000      0.322108
                PetalWidthCm    0.036446       0.281108      0.537728       0.322108      1.000000

Bi-variate Analysis

sns.pairplot(iris,hue="Species",height=4)


<seaborn.axisgrid.PairGrid at 0x7fceaa57d730>

Checking Correlation

plt.figure(figsize=(10,11))
# numeric_only=True excludes the non-numeric Species column from the correlation matrix
sns.heatmap(iris.corr(numeric_only=True),annot=True)
plt.show()

Box plots to examine the distribution of each attribute by species

fig, axes = plt.subplots(2, 2, figsize=(16,9))
sns.boxplot( y="PetalWidthCm", x= "Species", data=iris, orient='v' , ax=axes[0, 0])
sns.boxplot( y="PetalLengthCm", x="Species", data=iris, orient='v' , ax=axes[0, 1])
sns.boxplot( y="SepalLengthCm", x= "Species", data=iris, orient='v' , ax=axes[1, 0])
sns.boxplot( y="SepalWidthCm", x= "Species", data=iris, orient='v' , ax=axes[1, 1])
plt.show()
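The practical's stated goal is a PDF and CDF for each attribute. One common way to approximate both is from a histogram; the sketch below is a minimal version of that idea, not necessarily the method intended by the original notebook. The helper name empirical_pdf_cdf and the sample values are hypothetical and are not taken from Iris.csv. Note that the "PDF" here is the fraction of observations per bin (a discrete approximation), and the CDF is its running sum.

```python
import numpy as np

def empirical_pdf_cdf(values, bins=10):
    """Histogram-based approximation of the PDF and CDF of a 1-D sample.

    Returns the bin edges, the fraction of observations in each bin
    (sums to 1), and the cumulative sum of those fractions.
    """
    values = np.asarray(values, dtype=float)
    counts, bin_edges = np.histogram(values, bins=bins)
    pdf = counts / counts.sum()  # per-bin probability mass
    cdf = np.cumsum(pdf)         # running total, ends at 1.0
    return bin_edges, pdf, cdf

# Made-up illustrative values (NOT rows from Iris.csv)
sample = [1.4, 1.3, 1.5, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9, 5.6]

bin_edges, pdf, cdf = empirical_pdf_cdf(sample, bins=5)
print(pdf)
print(cdf)
```

In the notebook, the same helper could be applied to one attribute per species, for example iris.loc[iris['Species'] == 'Iris-setosa', 'PetalLengthCm'], and the results drawn with plt.plot(bin_edges[1:], pdf) and plt.plot(bin_edges[1:], cdf). Attributes whose per-species curves overlap little are the ones that best separate the species.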
