Artificial Intelligence - (Unit - 1)
So, without further ado, let's jump straight into the stages of building an AI project that will strengthen your base:
1) Understanding the Problem
2) Data Gathering
3) Feature Definition
4) AI Model Construction
5) Model Evaluation
6) Deployment
1. Understanding the Problem:
➢ The premise that underlies all Machine Learning disciplines is that there
needs to be a pattern.
➢ If there is no pattern, then the problem cannot be solved with AI
technology.
If it is believed that there is a pattern in the data, then AI development
techniques may be employed.
Applied uses of these techniques are typically geared towards answering five types of questions, all of which fall under the umbrella of predictive analytics.
➢ In this phase the data requirements are revised and decisions are made as to
whether or not the collection requires more or less data. Once the data
ingredients are collected, the data scientist will have a good understanding of
what they will be working with.
1. Understand the problem and then restate the problem in your own words
❖ Know what the desired inputs and outputs are
❖ Ask questions for clarification
2. Break the problem down into a few large pieces. Write these down, either on
paper or as comments in a file.
3. Break complicated pieces down into smaller pieces. Keep doing this until all of
the pieces are small.
This list has broken down the complex problem of creating an app into much
simpler problems that can now be worked out.
You may also be able to get other people to help you with different
individual parts of the app.
For example, you may have a friend who can create the graphics, while another
friend tests the app.
Example 1:
Data:
    1  2  3
    2  4  3
➢ The data scientist will use a training set for predictive modelling
➢ A training set is a set of historical data in which the outcomes are already known
The training set acts like a gauge to determine if the model needs to be calibrated.
In this stage, the data scientist will experiment with different algorithms to ensure
that the variables in play are actually required.
➢ The success of data compilation, preparation and modelling depends on
understanding the problem at hand and taking the appropriate analytical
approach. The data supports the answering of the question and, like the
quality of the ingredients in cooking, sets the stage for the outcome.
Constant refinement, adjustment and tweaking are necessary within each step
to ensure the outcome is solid. The framework is geared to do three
things.
➢ The end goal is to move the data scientist to a point where a data model can be
built to answer the question
How to Validate Model Quality
Train-Test Split Evaluation
➢ The train test split is a technique for evaluating the performance of a machine
learning algorithm.
➢ It can be used for classification or regression problems and can be used for
any supervised learning algorithm.
➢ The procedure involves taking a dataset and dividing it into two subsets. The
first subset is used to fit the model and is referred to as the training dataset. The
second subset is not used to train the model; instead, the input element of the
dataset is provided to the model, then predictions are made and compared to the
expected values. This second dataset is referred to as the test dataset.
Train Dataset: Used to fit the machine learning model.
Test Dataset: Used to evaluate the fit machine learning model.
➢ The objective is to estimate the performance of the machine learning model
on new data: data not used to train the model.
➢ This is how we expect to use the model in practice. Namely, to fit it on available
data with known inputs and outputs, then make predictions on new examples in
the future where we do not have the expected output or target values.
➢ The train-test procedure is appropriate when there is a sufficiently large
dataset available.
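As a minimal sketch of this procedure (using a synthetic array rather than a real dataset, so the numbers are made up purely for illustration), the split can be done with scikit-learn's train_test_split:

```python
# Minimal train-test split sketch on a synthetic dataset.
import numpy as np
from sklearn.model_selection import train_test_split

# 100 examples, 3 input features each, plus one target value per example
X = np.arange(300).reshape(100, 3)
y = np.arange(100)

# Hold out 25% of the examples for testing; random_state makes the
# shuffle reproducible across runs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

print(X_train.shape, X_test.shape)  # (75, 3) (25, 3)
```

The model would then be fit on `X_train`/`y_train` and evaluated by comparing its predictions on `X_test` against `y_test`.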
How to Configure the Train-Test Split
➢ The procedure has one main configuration parameter, which is the size of the train and test
sets. This is most commonly expressed as a percentage between 0 and 1 for either the
train or test datasets.
➢ For example, a training set size of 0.67 (67 percent) means that the remaining
0.33 (33 percent) is assigned to the test set.
➢ There is no optimal split percentage.
You must choose a split percentage that meets your project’s objectives with
considerations that include:
❖ Computational cost in training the model.
❖ Computational cost in evaluating the model.
❖ Training set representativeness.
❖ Test set representativeness.
The following walks through the process of creating train and test sets in Python
ML. So, let's take a dataset first.
Loading the Data set
Let’s load the forestfires dataset using pandas.
>>> import pandas as pd
>>> data = pd.read_csv('forestfires.csv')
>>> data.head()
temp is the label we want to predict, so it goes into y; the drop() function puts all
the other columns into x. Then, we split the data.
>>> from sklearn.model_selection import train_test_split
>>> x = data.drop('temp', axis=1)
>>> y = data['temp']
>>> x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
>>> x_train.head()
Train and Test Set in Python Machine Learning
>>> x_train.shape
(413, 12)
>>> x_test.head()
Regression functions predict a quantity, and classification functions predict a label.
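A toy sketch of the distinction (the data and labels below are invented purely for illustration):

```python
# Regression predicts a quantity; classification predicts a label.
from sklearn.linear_model import LinearRegression, LogisticRegression

X = [[1], [2], [3], [4]]

# Regression: the target is a continuous quantity
reg = LinearRegression().fit(X, [1.0, 2.0, 3.0, 4.0])
print(reg.predict([[5]]))   # a number, roughly 5.0

# Classification: the target is a discrete label
clf = LogisticRegression().fit(X, ['small', 'small', 'big', 'big'])
print(clf.predict([[5]]))   # a label: 'big'
```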
Python Libraries:
A library is a collection of modules that together satisfy a
specific type of need.
Pandas Library:
➢ Pandas is the most popular library for data analysis and
manipulation.
➢ It has functionality to find and fill missing data.
➢ It supports reshaping of data into different forms.
➢ It supports data visualization by integrating with matplotlib and seaborn.
✓ We can analyze data in pandas with Series (1-D) and
DataFrame (2-D) structures.
✓ It is used in economics, stock prediction, big data, finance,
data science and data analytics.
Syntax:
import pandas (or) import pandas as pd
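A small sketch of the features mentioned above (the column names and values are made up for illustration): a 1-D Series, a 2-D DataFrame, and finding/filling missing data:

```python
import pandas as pd
import numpy as np

# 1-D labelled data: Series
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(s['b'])  # 20

# 2-D labelled data: DataFrame, with one missing value
df = pd.DataFrame({'price': [100.0, np.nan, 120.0],
                   'volume': [5, 7, 3]})

# Find and fill missing data
print(df['price'].isna().sum())                    # 1 missing value
df['price'] = df['price'].fillna(df['price'].mean())
print(df['price'].tolist())                        # [100.0, 110.0, 120.0]
```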
Scikit Learn:
➢ It is a popular library used to perform machine learning.
➢ Also used in statistical modeling including classification,
regression, clustering and dimensionality reduction.
➢ PyTorch is another machine learning library, developed by
Facebook.
Syntax:
import sklearn as sk
NumPy:
✓ The NumPy library provides high-level math functionality to
create and manipulate numeric arrays.
✓ Data manipulation in the Pandas library is performed using
NumPy.
✓ It is used for matrix processing, linear algebra, Fourier
transforms, 2-D arrays, etc.
Syntax:
import numpy (or ) import numpy as np
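A brief sketch of these capabilities on a small 2-D array:

```python
import numpy as np

# Create a numeric 2-D array (a matrix)
a = np.array([[1, 2], [3, 4]])

print(a.T)               # transpose: [[1, 3], [2, 4]]
print(a @ a)             # matrix product: [[7, 10], [15, 22]]
print(a.mean())          # mean of all elements: 2.5
print(np.linalg.det(a))  # determinant: -2.0
```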
When to use mean squared error
Use MSE when you are doing regression, believe that your target (conditioned on
the input) is normally distributed, and want large errors to be penalized
significantly (quadratically) more than small ones.
Example-1: You want to predict future house prices. The price is a continuous
value, and therefore we want to do regression. MSE can be used here as the loss
function.
Example-2: Consider the data points (1,1), (2,1), (3,2), (4,2) and (5,4). Fitting a
least-squares regression line to these points gives y = 0.7x - 0.1.
Output: MSE = 0.22
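As a check, the least-squares line and its MSE for these five points can be computed directly with NumPy:

```python
import numpy as np

# The five data points
x = np.array([1, 2, 3, 4, 5])
y = np.array([1, 1, 2, 2, 4])

# Least-squares fit of a straight line y = m*x + c
m, c = np.polyfit(x, y, 1)
print(f"slope = {m:.1f}, intercept = {c:.1f}")  # slope = 0.7, intercept = -0.1

# Mean squared error of the fitted line over the data points
mse = np.mean((y - (m * x + c)) ** 2)
print(f"MSE = {mse:.2f}")  # MSE = 0.22
```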
Example 1 (RMSE)
Let us write Python code to find the RMSE value of our model. We will be
predicting the brain weight of the users, using linear regression to train our model;
the dataset used in this code can be downloaded from here:
import time
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
#reading the data
"""
here the directory of my code and the headbrain6.csv file is the same; make sure
both files are stored in the same folder or directory
"""
data = pd.read_csv('headbrain6.csv')
data.head()
x = data.iloc[:, 2:3].values  # head size
y = data.iloc[:, 3:4].values  # brain weight
#splitting the data into training and test sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=1/4, random_state=0)
#fitting simple linear regression to the training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(x_train, y_train)
#predict the test result
y_pred = regressor.predict(x_test)
#to see the relationship between the training data values
plt.scatter(x_train, y_train, c='red')
plt.show()
#to see the relationship between the predicted
#brain weight values using a scatter graph
plt.plot(x_test, y_pred)
plt.scatter(x_test, y_test, c='red')
plt.xlabel('head size')
plt.ylabel('brain weight')
plt.show()
#error in each value
for i in range(0, 60):
    print("Error in value number", i, (y_test[i] - y_pred[i]))
    time.sleep(1)
#combined rmse value
mse = np.mean((y_test - y_pred) ** 2)
print("Final rmse value is =", np.sqrt(mse))
Output:
The RMSE value of our model comes out to be approximately 73, which is not bad. A
good model should have an RMSE value of less than 180. If you have a higher RMSE
value, it probably means that you need to change your features or tweak your
hyperparameters.