2) Theoretical Background: 2.1 EDA (Exploratory Data Analysis)

This chapter defines the key parts of a machine learning project: exploratory data analysis (EDA), feature engineering, feature selection, and model building. EDA involves investigating the data through summary statistics and visualizations to discover patterns and relationships. Feature engineering transforms and extracts features from raw data. Feature selection reduces the number of input variables to improve model performance and reduce costs. Model building uses training data to construct a model that can make predictions on new data through learning patterns in the training data.

Uploaded by

Hussain Mujtaba

Original Title

2 Theoretical Background

2) Theoretical Background

This chapter defines the following parts of the project: EDA (Exploratory Data Analysis), Feature
Engineering, Feature Selection, and Model Building. It forms the basis for the rest of the project.
2.1 EDA (Exploratory Data Analysis)
Exploratory Data Analysis refers to the critical process of performing initial investigations on data so
as to discover patterns, spot anomalies, test hypotheses, and check assumptions with the help
of summary statistics and graphical representations.

It is good practice to understand the data first and to gather as many insights from it as possible. EDA is
all about making sense of the data in hand before getting your hands dirty with it.

In EDA we:
• Load the dataset
• Describe the data using functions such as df.describe() and df.info()
• Find any missing values (df.isna())
• Find the outliers (using a histogram or box plot)
• Visualize the data (using Matplotlib or Seaborn)
Figure 2.1 (A histogram plot)
Figure 2.1 is a histogram; it shows how the values of a single variable (feature or column) are
distributed.
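
The EDA steps listed above can be sketched with pandas. This is only a minimal illustration: the small in-memory table and its column names ("age", "city") are hypothetical stand-ins for a real dataset that would normally be loaded from a file, and the plotting step is left as a comment so the sketch runs without a plotting backend.

```python
import pandas as pd

# Hypothetical toy data; a real project would load a file, e.g. with pd.read_csv.
df = pd.DataFrame(
    {
        "age": [23, 25, 31, None, 29, 95],
        "city": ["Pune", "Delhi", None, "Pune", "Delhi", "Pune"],
    }
)

# Summary statistics (count, mean, std, min, quartiles, max) of numeric columns.
summary = df.describe()

# Column dtypes and non-null counts, printed to standard output.
df.info()

# Number of missing values in each column.
missing_per_column = df.isna().sum()

print(summary)
print(missing_per_column)

# Visual checks for outliers would follow here, for example
# df["age"].plot.hist() or df.boxplot(column="age") (both need Matplotlib).
```

In this toy table each column has one missing value, and the value 95 in "age" is the kind of point a histogram or box plot would make stand out.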
2.2 Feature Engineering

Feature engineering is the next step in a data science or machine learning project
after EDA (Exploratory Data Analysis).

Feature engineering is the process of using domain knowledge to extract
features (characteristics, properties, attributes) from raw data. A feature is a property shared by
independent units on which analysis or prediction is to be done. Features are used by predictive
models and influence results.
In EDA we only get to know the data (missing values, outliers), but in feature engineering
we clean the data by handling missing values and removing outliers.

Handling Missing Values
• Drop missing values
• Fill missing values with the mean, median, or mode
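
Both options can be sketched with pandas as follows. The column name "age" and its values are hypothetical, chosen so that the mean, median, and mode differ and the effect of each fill strategy is visible.

```python
import pandas as pd

# Hypothetical column with one missing value (the last entry).
df = pd.DataFrame({"age": [10.0, 10.0, 20.0, 30.0, 80.0, None]})

# Option 1: drop every row that contains a missing value.
dropped = df.dropna()

# Option 2: fill the gap with a statistic computed from the observed values.
filled_mean = df["age"].fillna(df["age"].mean())      # mean of observed values is 30
filled_median = df["age"].fillna(df["age"].median())  # median is 20
filled_mode = df["age"].fillna(df["age"].mode()[0])   # most frequent value is 10

print(len(dropped), filled_mean.iloc[-1], filled_median.iloc[-1], filled_mode.iloc[-1])
```

The large value 80 pulls the mean well above the median, which is one reason the median is often preferred for skewed columns; the mode is the usual choice for categorical columns.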

Handling Outliers
• Using the standard deviation
• Normal distribution
• IQR (interquartile range)
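
A minimal sketch of the standard-deviation rule and the IQR rule, using only Python's standard library. The data, the threshold k = 2, and the 1.5 × IQR fences are assumptions made for illustration; the standard-deviation rule is usually justified by assuming the data are roughly normally distributed.

```python
import statistics

# Hypothetical measurements with one obvious outlier (100).
values = [10, 12, 11, 13, 12, 11, 14, 12, 13, 100]

# Standard-deviation rule: flag points more than k standard deviations
# from the mean. k = 2 is an assumed threshold; 3 is also common.
mean = statistics.mean(values)
std = statistics.pstdev(values)
k = 2
std_outliers = [v for v in values if abs(v - mean) > k * std]

# IQR rule: flag points beyond 1.5 * IQR outside the first/third quartile.
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_outliers = [v for v in values if v < lower or v > upper]

print(std_outliers, iqr_outliers)
```

In a sample this small the outlier inflates the standard deviation itself, so a stricter threshold such as k = 3 would no longer flag it here; the IQR rule is less sensitive to this effect because quartiles barely move when one extreme value is added.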

2.3 Feature Selection

Feature selection is the process of reducing the number of input variables when developing a
predictive model.

It is desirable to reduce the number of input variables to both reduce the computational cost of
modeling and, in some cases, to improve the performance of the model.
Statistical-based feature selection methods involve evaluating the relationship between each
input variable and the target variable using statistics and selecting those input variables that
have the strongest relationship with the target variable. These methods can be fast and
effective, although the choice of statistical measures depends on the data type of both the input
and output variables.

As such, it can be challenging for a machine learning practitioner to select an appropriate
statistical measure for a dataset when performing filter-based feature selection.

The key points on choosing statistical measures for filter-based feature
selection with numerical and categorical data are:

• There are two main types of feature selection techniques: supervised and unsupervised.
Supervised methods may be divided into wrapper, filter, and intrinsic methods.

• Filter-based feature selection methods use statistical measures to score the correlation
or dependence between each input variable and the target variable; the scores are then
used to filter out all but the most relevant features.

• Statistical measures for feature selection must be carefully chosen based on the data
type of the input variable and the output or response variable.
2.4 Model Building

A machine learning model is built by learning and generalizing from training data, then
applying that acquired knowledge to new data it has never seen before in order to make
predictions. Without data the model cannot be built, but access to data alone is not
enough.
