This repository contains INSAID GCDAI curriculum assignments and practice samples.
- The basics of Data Analysis and Data Visualisation using Pandas, NumPy, Matplotlib, Seaborn, etc. libraries.
- Covers Exploratory Data Analysis (EDA)
-
What is EDA?
- EDA is a process within data analysis used to gain a better understanding of the data: its main features, its variables and the relationships between them, and which variables are important for the problem at hand.
- Exploratory Data Analysis (EDA) helps in understanding data sets by summarizing their main characteristics, often by plotting them visually.
- The lifecycle of a Data Analysis project consists of:
-
EDA Methods involve:
- Table of Contents
Steps in Data Exploration and Preprocessing:
- Identification of variables and data types
- Analyzing the basic metrics
- Non-Graphical Univariate Analysis
- Graphical Univariate Analysis
- Bivariate Analysis
- Variable transformations
- Missing value treatment
- Outlier treatment
- Correlation Analysis
- Dimensionality Reduction
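The non-graphical steps listed above can be sketched with Pandas and NumPy as follows. This is a minimal illustration, not the code from the notebooks in this repository: the toy DataFrame, its column names (`area`, `rooms`, `price`, `city`) and the specific choices (median imputation, 1.5 × IQR clipping, `log1p` transform) are assumptions made for the example. Graphical analysis and dimensionality reduction are left out to keep it short.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are made up for illustration.
df = pd.DataFrame({
    "area": [50.0, 60.0, 65.0, np.nan, 80.0, 85.0, 90.0, 400.0],
    "rooms": [1, 2, 2, 2, 3, 3, 3, 4],
    "price": [100.0, 120.0, 130.0, 125.0, 160.0, 170.0, 180.0, 200.0],
    "city": ["A", "B", "A", "B", "A", "B", "A", "B"],
})

# Identification of variables and data types.
dtypes = df.dtypes
numeric_cols = df.select_dtypes(include="number").columns.tolist()

# Basic metrics and non-graphical univariate analysis.
summary = df[numeric_cols].describe()      # count, mean, std, quartiles
city_counts = df["city"].value_counts()    # frequency of each category

# Missing value treatment: count the gaps, then fill with the column median
# (one common choice among several).
missing_before = df.isna().sum()
df["area"] = df["area"].fillna(df["area"].median())

# Outlier treatment: clip values outside the 1.5 * IQR fences.
q1, q3 = df["area"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
df["area_clipped"] = df["area"].clip(lower, upper)

# Variable transformation: log-transform a numeric variable.
df["log_price"] = np.log1p(df["price"])

# Bivariate / correlation analysis: pairwise Pearson correlations.
corr = df[["area_clipped", "rooms", "price"]].corr()

print(dtypes)
print(summary)
print(city_counts)
print(corr)
```

On a real dataset, the imputation and outlier strategies should be chosen per column after looking at the distributions rather than applied blindly.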
-
CheatSheets contains
-
EDA - This section implements EDA on loaded datasets, applying Python fundamentals to them.
- EDA - Summer Olympic Dataset
- EDA - Wine Quality Dataset
- EDA - House Price Prediction
-
ML - This repository contains practice samples of various ML algorithms such as Linear Regression, Logistic Regression, Decision Trees, Random Forests, KNN, K-means, etc. on openly available datasets, along with model evaluation.
- ML - Fifa 2018 Man of the match Prediction
- ML - Forest Cover Type Prediction
- ML - Iris Dataset Decision Trees and Random Forest
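As a rough idea of what such a practice sample looks like, here is a minimal sketch that trains a Decision Tree and a Random Forest on the Iris dataset and evaluates both with accuracy. It assumes scikit-learn is installed and uses its bundled copy of Iris; the split ratio, `max_depth`, `n_estimators` and `random_state` values are illustrative choices, not the settings used in the notebooks of this repository.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the Iris features and labels bundled with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold out 30% of the rows for evaluation, keeping class proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Hyperparameters below are assumptions chosen for the example.
models = {
    "decision_tree": DecisionTreeClassifier(max_depth=3, random_state=42),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Fit each model and record its accuracy on the held-out data.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {scores[name]:.3f}")
```

Accuracy is only one evaluation metric; for imbalanced datasets, a confusion matrix or precision and recall give a fuller picture.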
-
Python Primer - Basics of Python for Data Science and Machine Learning