Lecture 4: Regression Analysis
Outline
•What is Regression Analysis?
•Population Regression Line
•Why do we use Regression Analysis?
•What are the types of Regression?
•Simple Linear Regression Model
•Least Squares Estimation for Parameters
•Least Squares for Linear Regression
•References
What is Regression Analysis?
Regression analysis is a predictive modelling technique that investigates the relationship between a dependent (target) variable and one or more independent (predictor) variables.
The technique is used for forecasting, time-series modelling, and estimating causal relationships between variables.
For example, the relationship between rash driving and the number of road accidents a driver has is best studied through regression.
Population Regression Line
[Figure: a scatter of actual observations around the fitted regression line, with the dependent variable y on the vertical axis and the independent variable x on the horizontal axis; the vertical gaps between the actual and estimated values are the errors.]
Population Regression Line: Example
[Figure: estimated grades plotted against study time, with the fitted regression line.]
Population regression function: ŷ = b0 + b1x
where ŷ = estimated grades, x = study time, b1 = slope, and b0 = intercept.
Why do we use Regression Analysis?
Typically, a regression analysis is used for these purposes:
(1) Prediction of the target variable (forecasting).
(2) Modelling the relationship between the dependent variable and the explanatory variables.
(3) Testing of hypotheses.
Benefits
1. It indicates the strength of the impact of multiple independent variables on a dependent variable.
2. It indicates the significant relationships between the dependent variable and the independent variables.
These benefits help market researchers, data analysts, and data scientists to evaluate and select the best set of variables for building predictive models.
Types of Regression Analysis
Regression analysis is generally classified into two kinds: simple (one explanatory variable) and multiple (two or more explanatory variables).
Simple Regression:
It involves only two variables: a dependent variable and one explanatory (independent) variable.
Simple Linear Regression Model
A simple linear regression model is a model with a single regressor x that has a linear relationship with a response y:
y = b0 + b1x + ɛ
where y is the response variable and x is the regressor variable.
Some basic assumptions on the model:
ɛi is a normally distributed random variable with mean zero and variance σ².
For the n observations, the model is written as:
yi = b0 + b1xi + ɛi, for i = 1, 2, …, n
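As a concrete illustration of these assumptions, here is a short Python sketch that simulates observations from this model; all parameter values (b0 = 2, b1 = 0.5, σ = 1) are made up for illustration.

```python
import numpy as np

# Simulate n observations from the simple linear regression model
#   y_i = b0 + b1 * x_i + eps_i,   eps_i ~ N(0, sigma^2)
# The parameter values below are hypothetical, chosen only for illustration.
rng = np.random.default_rng(0)

b0, b1, sigma = 2.0, 0.5, 1.0           # intercept, slope, error std. dev.
n = 100
x = rng.uniform(0, 10, size=n)          # regressor values
eps = rng.normal(0.0, sigma, size=n)    # i.i.d. normal errors with mean zero
y = b0 + b1 * x + eps                   # responses generated by the model

print(x.shape, y.shape)
```

Each εi is drawn independently from the same N(0, σ²) distribution, matching the assumption above.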
Least Squares Estimation for Parameters
The parameters b0 and b1 are unknown and must be estimated using sample data:
(x1, y1), (x2, y2), …, (xn, yn)
Population regression line: y = b0 + b1x + ɛ
Fitted (estimated) regression line: ŷi = b̂0 + b̂1xi
The line fitted by least squares is the one that makes the sum of squares of all vertical discrepancies as small as possible.
[Figure: an observation (x1, y1), its fitted value (x1, ŷ1) on the line ŷi = b̂0 + b̂1xi, and the vertical difference (y1 − ŷ1) = ɛ1.]
We estimate the parameters so that the sum of squares of all the vertical differences between the observations and the fitted line is minimized:
S = Σ_{i=1}^{n} (yi − b0 − b1xi)²
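A minimal Python sketch of the criterion S on toy data (the numbers are invented, not from the slides), showing that a line passing near the data gives a much smaller S than a poorly chosen one:

```python
# Hypothetical toy data (invented for illustration)
x = [1, 2, 3, 4, 5]
y = [2.1, 2.9, 4.2, 4.8, 6.1]

def S(b0, b1):
    """Sum of squared vertical deviations of the data from the line b0 + b1*x."""
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

# A line close to the data (intercept 1, slope 1) versus a poor choice (0, 0):
print(S(1.0, 1.0), S(0.0, 0.0))
```

Least squares picks the (b0, b1) pair that makes S as small as possible over all candidate lines.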
Minimizing the function requires taking the first-order conditions with respect to b0 and b1 and setting them to zero:

S = Σ_{i=1}^{n} (yi − b0 − b1xi)²

I:  ∂S/∂b0 = −2 Σ_{i=1}^{n} (yi − b0 − b1xi) = 0
II: ∂S/∂b1 = −2 Σ_{i=1}^{n} (yi − b0 − b1xi) xi = 0

From condition I:
Σ_{i=1}^{n} (yi − b0 − b1xi) = 0
b̂0 = (1/n) Σ_{i=1}^{n} (yi − b̂1xi) = ȳ − b̂1x̄
where ȳ = Σ yi / n and x̄ = Σ xi / n.
From condition II:
∂S/∂b1 = −2 Σ_{i=1}^{n} (yi − b̂0 − b̂1xi) xi = 0
Σ_{i=1}^{n} xi (yi − b̂0 − b̂1xi) = 0
Substituting b̂0 = ȳ − b̂1x̄:
Σ_{i=1}^{n} xi (yi − ȳ + b̂1x̄ − b̂1xi) = 0
Σ_{i=1}^{n} xi (yi − ȳ) = b̂1 Σ_{i=1}^{n} xi (xi − x̄)

b̂1 = Σ_{i=1}^{n} (yi − ȳ) xi / Σ_{i=1}^{n} (xi − x̄) xi
    = Σ_{i=1}^{n} (yi − ȳ)(xi − x̄) / Σ_{i=1}^{n} (xi − x̄)²
    = Cov(x, y) / Var(x)

Proof that the two numerator forms agree: Σ_{i=1}^{n} (yi − ȳ) x̄ = x̄ Σ yi − n x̄ ȳ = n x̄ ȳ − n x̄ ȳ = 0, so Σ (yi − ȳ) xi = Σ (yi − ȳ)(xi − x̄); the same argument applies to the denominator.

In matrix form the least squares estimate is b̂ = (X′X)⁻¹ X′ y, and b̂0 = ȳ − b̂1x̄.
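The closed-form estimators derived above can be computed directly. A sketch on the same invented toy data as before:

```python
# Closed-form least squares estimates on hypothetical data
x = [1, 2, 3, 4, 5]
y = [2.1, 2.9, 4.2, 4.8, 6.1]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# b1_hat = sum((yi - ybar)(xi - xbar)) / sum((xi - xbar)^2)  = Cov(x,y)/Var(x)
sxy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1_hat = sxy / sxx
b0_hat = y_bar - b1_hat * x_bar   # b0_hat = ybar - b1_hat * xbar

print(b0_hat, b1_hat)
```

The same estimates come out of the matrix form (X′X)⁻¹X′y when X has a column of ones and a column of the xi values.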
Example
b̂1 = Σ (yi − ȳ)(xi − x̄) / Σ (xi − x̄)² = 6/10 = 0.6
b̂0 = ȳ − b̂1x̄ = 2.2
Calculating R² Using Regression Analysis
R-squared is a statistical measure of how close the data are to the fitted regression line (it measures goodness of fit). It is also known as the coefficient of determination.
First, we calculate the deviation of each actual value from the mean, and the deviation of each estimated value from the mean.
Then we compare the two sets of deviations.
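The comparison of the two sets of deviations can be sketched as the ratio of sums of squares, R² = SSR/SST. A minimal sketch on invented data:

```python
# R^2 = SSR / SST: deviations of the fitted values from the mean, compared
# with deviations of the actual values from the mean. Data are hypothetical.
x = [1, 2, 3, 4, 5]
y = [2.1, 2.9, 4.2, 4.8, 6.1]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Fit by the closed-form least squares formulas
b1 = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) / \
     sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)       # actual values vs. mean
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)   # fitted values vs. mean
r_squared = ssr / sst
print(r_squared)
```

An R² near 1 means the fitted line accounts for almost all of the variation of y around its mean.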
Example
[Slide figure not reproduced.]
Performance of Model
[Slide figure not reproduced.]
Standard Error of the Estimate (Mean Square Error)
The standard error of the estimate is a measure of the accuracy of predictions.
Note: The regression line is the line that minimizes the sum of squared deviations of the predictions (also called the sum of squares error, SSE).
The standard error of the estimate is closely related to this quantity: it is the square root of the mean squared deviation, √(Σ(Y − Y′)²/N).
Example
X Y Y' Y-Y' (Y-Y')2
1.00 1.00 1.210 -0.210 0.044
2.00 2.00 1.635 0.365 0.133
3.00 1.30 2.060 -0.760 0.578
4.00 3.75 2.485 1.265 1.600
5.00 2.25 2.910 -0.660 0.436
Sum 15.00 10.30 10.30 0.000 2.791
20
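This table can be reproduced in a few lines of Python. The Y′ column is consistent with the fitted line ŷ = 0.785 + 0.425x, which the code below recovers from the (X, Y) data before summing the squared errors:

```python
import math

# Data from the table above
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [1.0, 2.0, 1.3, 3.75, 2.25]
n = len(X)

x_bar, y_bar = sum(X) / n, sum(Y) / n
b1 = sum((y - y_bar) * (x - x_bar) for x, y in zip(X, Y)) / \
     sum((x - x_bar) ** 2 for x in X)
b0 = y_bar - b1 * x_bar

Y_pred = [b0 + b1 * x for x in X]                     # the Y' column
sse = sum((y - yp) ** 2 for y, yp in zip(Y, Y_pred))  # sum of (Y - Y')^2

# Standard error of the estimate; one common convention divides by N (as in
# the "mean square error" reading above), another by N - 2 to account for
# the two estimated parameters.
see = math.sqrt(sse / n)
print(b0, b1, sse, see)
```

The sum of squared errors comes out to about 2.791, matching the last column's total.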
Difference
[Slide figure not reproduced.]
Least Squares for Linear Regression
Solve: Ax = b
[Figure: the columns a1, a2 of A span a plane; Ax = x1a1 + x2a2 lies in that plane, while b lies outside it.]
The columns of A define a vector space, range(A).
If b is a vector in Rn but not in the column space of A, then Ax = b has no solution.
Instead we try to find the x̂ that makes Ax̂ as close to b as possible; this is called the least squares solution of our problem.
[Figure: b, its projection Ax̂ onto the plane spanned by a1 and a2, and the residual b − Ax̂.]
Ax̂ is the orthogonal projection of b onto range(A), so the residual b − Ax̂ is orthogonal to the columns of A:
Aᵀ(b − Ax̂) = 0  ⟹  AᵀAx̂ = Aᵀb
[Figure: b, the projection Ax̂, and the orthogonal residual b − Ax̂.]
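A numerical sketch of the normal equations AᵀAx̂ = Aᵀb on a hypothetical overdetermined system (the matrix and right-hand side are invented):

```python
import numpy as np

# Hypothetical 3x2 overdetermined system: b is generally not in range(A),
# so Ax = b has no exact solution.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# Solve the normal equations A^T A x_hat = A^T b for the least squares solution
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
residual = b - A @ x_hat

# A x_hat is the orthogonal projection of b onto range(A), so the residual
# is orthogonal to every column of A: A^T (b - A x_hat) = 0.
print(x_hat, A.T @ residual)
```

In practice one would use `np.linalg.lstsq`, which avoids explicitly forming AᵀA, but the normal equations make the projection argument concrete.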
Matlab Implementation (Linear_Regression3.m)
[Slides show the MATLAB code and its output; not reproduced here.]
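Since the MATLAB source of Linear_Regression3.m is not reproduced in these slides, the following is a hypothetical Python/NumPy equivalent of such a script: generate noisy linear data and fit it by least squares.

```python
import numpy as np

# Hypothetical stand-in for Linear_Regression3.m (original code not shown):
# simulate data from a made-up true line y = 2 + 0.5x plus noise, then fit.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=x.size)

# Design matrix with a column of ones (intercept) and a column of x values
A = np.column_stack([np.ones_like(x), x])
(b0, b1), *_ = np.linalg.lstsq(A, y, rcond=None)   # least squares fit

print(b0, b1)
```

With this much data and modest noise, the fitted coefficients land close to the true intercept and slope used to generate the data.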
References
1. Sykes, Alan O. "An Introduction to Regression Analysis." (1993).
2. Chatterjee, Samprit, and Ali S. Hadi. Regression Analysis by Example. John Wiley & Sons, 2015.
3. Draper, Norman Richard, Harry Smith, and Elizabeth Pownell. Applied Regression Analysis. Vol. 3. New York: Wiley, 1966.
4. Montgomery, Douglas C., Elizabeth A. Peck, and G. Geoffrey Vining. Introduction to Linear Regression Analysis. John Wiley & Sons, 2015.
5. Seber, George A. F., and Alan J. Lee. Linear Regression Analysis. Vol. 936. John Wiley & Sons, 2012.
THANK YOU