Exploratory Data Analysis Main Concepts
Data Science Life Cycle

Distribution of charges
import matplotlib.pyplot as plt
data["charges"].plot(kind="hist")
plt.title("Distribution of charges")
plt.xlabel("Charges")
plt.ylabel("Frequency")
plt.show()

Convert Categorical columns to numerical
gender = {'male':0, 'female':1}
data['sex'] = data['sex'].apply(lambda x: gender[x])
smokers = {'no':0, 'yes':1}
data['smoker'] = data['smoker'].apply(lambda x: smokers[x])

Normalization
data_max = data.max()
data = data.divide(data_max)
The idea is to divide each column by its maximum value.
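
The encoding and normalization steps above can be tried on a small table. The sketch below is only an illustration: the `toy` DataFrame and its values are made up, and `Series.map` is used in place of `apply` with a lambda (both look up each value in the dictionary). It also assumes every column is numeric and non-negative by the time the division runs, which is what keeps the result in the range 0 to 1.

```python
import pandas as pd

# Hypothetical toy rows; the column names follow the code above,
# the values are invented for illustration.
toy = pd.DataFrame({
    "age": [20, 40, 60],
    "sex": ["male", "female", "male"],
    "smoker": ["yes", "no", "no"],
    "charges": [1000.0, 2000.0, 4000.0],
})

# Replace each category label with its integer code.
gender_codes = {"male": 0, "female": 1}
smoker_codes = {"no": 0, "yes": 1}
toy["sex"] = toy["sex"].map(gender_codes)
toy["smoker"] = toy["smoker"].map(smoker_codes)

# Divide every column by its own maximum value.
# Assumption: all columns are numeric, non-negative, and have a nonzero maximum.
toy_max = toy.max()
toy_norm = toy.divide(toy_max)
print(toy_norm)
```

Any column that still holds text at this point would have to be encoded or dropped first, since dividing strings by a maximum is not defined.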
Correlation between smoking and cost of treatment
# Run this before the 'smoker' column is converted to 0/1:
# the filters compare against the original "yes"/"no" labels.
# Get smokers
smokers = data[(data.smoker == "yes")]
# Get non-smokers
non_smokers = data[(data.smoker == "no")]
# Create the figure
fig = plt.figure(figsize=(12,5))
# 1st subplot: smokers
ax = fig.add_subplot(121)
# Smokers histogram
ax.hist(smokers["charges"])
# Set subplot title
ax.set_title('charges for smokers')

5 Model Training and testing
Data Splits
# Store all columns except the last one as inputs in X
X = data.iloc[:,0:-1].values
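
A data split along these lines might look like the sketch below. Only the `X` line comes from the text above. Taking the last column as the target `y`, the 80/20 ratio, the fixed seed, and the `toy` table are all assumptions made for illustration. The split is done by hand with a NumPy permutation; scikit-learn's `train_test_split` is a common alternative.

```python
import numpy as np
import pandas as pd

# Hypothetical, already-normalized toy data (values invented).
toy = pd.DataFrame({
    "age": [0.3, 0.5, 0.7, 0.9, 1.0],
    "smoker": [1, 0, 0, 1, 0],
    "charges": [0.6, 0.2, 0.3, 1.0, 0.4],
})

# All columns except the last one are the inputs.
X = toy.iloc[:, 0:-1].values
# Assumption: the last column is the target to predict.
y = toy.iloc[:, -1].values

# Shuffle the row indices, then hold out 20% of the rows for testing.
rng = np.random.default_rng(0)
order = rng.permutation(len(toy))
n_test = int(round(0.2 * len(toy)))
test_idx, train_idx = order[:n_test], order[n_test:]

X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]
print(X_train.shape, X_test.shape)
```

Shuffling before splitting avoids a test set made only of the last rows of the file, which matters when the rows are sorted in some way.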