
Lecture 4: Regression Analysis


Basics of Regression Analysis
Outline
•What is Regression Analysis?
•Population Regression Line
•Why do we use Regression Analysis?
•What are the types of Regression?
•Simple Linear Regression Model
•Least Squares Estimation of Parameters
•Least Squares for Linear Regression
•References
What is Regression Analysis?
 Regression analysis is a predictive modelling technique that investigates the relationship between a dependent (target) variable and one or more independent (predictor) variables.
 This technique is used for forecasting, time-series modelling, and studying cause-and-effect relationships between variables.
 For example, the relationship between rash driving and the number of road accidents caused by a driver is best studied through regression.
Population Regression Line
[Figure: actual observations scattered around the estimated regression line; the vertical gaps between the actual and estimated values are the errors. The dependent variable y is on the vertical axis and the independent variable x on the horizontal axis.]
Population Regression Line: Example
Population regression function (regression line):

𝑦 = 𝑏0 + 𝑏1x

where 𝑦 = estimated grades, x = study time, 𝑏0 = intercept, and 𝑏1 = slope.
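Once the intercept and slope are known, the line gives an estimated grade for any study time. A minimal sketch with hypothetical coefficient values (the slide does not give numbers):

```python
# Hypothetical coefficients for the grades-vs-study-time line (illustrative only).
b0 = 40.0  # intercept: estimated grade at zero study time
b1 = 5.0   # slope: estimated grade gained per extra hour of study

def estimated_grade(study_time):
    """Point on the regression line: y_hat = b0 + b1 * x."""
    return b0 + b1 * study_time

print(estimated_grade(4.0))  # 40 + 5*4 = 60.0
```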
Why do we use Regression Analysis?
Typically, a regression analysis is used for these purposes:
(1) Prediction of the target variable (forecasting).
(2) Modelling the relationship between the dependent variable and the explanatory variables.
(3) Testing of hypotheses.
Benefits
1. It indicates the strength of impact of multiple independent variables on a dependent variable.
2. It indicates the significant relationships between the dependent variable and the independent variables.

These benefits help market researchers, data analysts, and data scientists evaluate candidate variables and select the best set for building predictive models.
Types of Regression Analysis
Regression analysis is generally classified into two kinds: simple and multiple.
Simple regression involves only two variables: one dependent variable and one explanatory (independent) variable. Multiple regression involves two or more explanatory variables.
A regression analysis may involve a linear model or a nonlinear model. The term linear can be interpreted in two different ways:
1. Linearity in the variables
2. Linearity in the parameters
Simple Linear Regression Model
The simple linear regression model is a model with a single regressor x that has a linear relationship with a response y:

y = 𝑏0 + 𝑏1x + ɛ

where y is the response variable, x is the regressor variable, 𝑏0 is the intercept, 𝑏1 is the slope, and ɛ is the random error component.

In this technique the dependent variable is a continuous random variable; the independent variable(s) can be continuous or discrete but are not random variables; and the regression line is linear.
Some basic assumptions on the model:

Simple linear regression model:

yi = 𝑏0 + 𝑏1xi + ɛi for i = 1, 2, …, n

 ɛi is a random variable with zero mean and variance σ2, i.e.

E(ɛi) = 0 ; V(ɛi) = σ2

 ɛi and ɛj are uncorrelated for i ≠ j, i.e.

cov(ɛi, ɛj) = 0

 ɛi is a normally distributed random variable with mean zero and variance σ2:

ɛi ~ind N(0, σ2)
yi = 𝑏0 + 𝑏1xi + ɛi for i = 1, 2, …, n

E(yi) = E(𝑏0 + 𝑏1xi + ɛi) = 𝑏0 + 𝑏1xi, since E(ɛi) = 0

V(yi) = V(𝑏0 + 𝑏1xi + ɛi) = V(ɛi) = σ2

=> ɛi ~ind N(0, σ2)

=> yi ~ind N(𝑏0 + 𝑏1xi, σ2)

NOTE: The dataset should satisfy these basic assumptions.
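These distributional facts can be checked empirically: drawing many realizations of yi at a fixed xi, the sample mean and variance should approach 𝑏0 + 𝑏1xi and σ2. A small simulation sketch with hypothetical parameter values:

```python
import random

random.seed(0)
b0, b1, sigma = 2.0, 0.5, 1.0  # hypothetical true parameters
x = 3.0                        # fixed regressor value

# y = b0 + b1*x + eps with eps ~ N(0, sigma^2), drawn independently.
ys = [b0 + b1 * x + random.gauss(0.0, sigma) for _ in range(200_000)]
mean_y = sum(ys) / len(ys)
var_y = sum((y - mean_y) ** 2 for y in ys) / len(ys)

print(round(mean_y, 2))  # close to b0 + b1*x = 3.5
print(round(var_y, 2))   # close to sigma^2 = 1.0
```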
Least Squares Estimation of Parameters
The parameters 𝑏0 and 𝑏1 are unknown and must be estimated using sample data:
(𝑥1,𝑦1), (𝑥2,𝑦2), …, (𝑥𝑛,𝑦𝑛)

The line fitted by least squares is the one that makes the sum of squares of all vertical discrepancies as small as possible: we estimate the parameters so that the sum of squared vertical differences between the observations (xi, yi) and the fitted values 𝑦̂i = 𝑏0 + 𝑏1xi is minimized:

S = Σ (𝑦𝑖 − 𝑏0 − 𝑏1𝑥𝑖)2, where the sum runs over i = 1, …, n.
Minimizing this function requires taking the first-order conditions with respect to 𝑏0 and 𝑏1 and setting them to zero:

I: ∂S/∂𝑏0 = −2 Σ (𝑦𝑖 − 𝑏0 − 𝑏1𝑥𝑖) = 0

II: ∂S/∂𝑏1 = −2 Σ (𝑦𝑖 − 𝑏0 − 𝑏1𝑥𝑖)𝑥𝑖 = 0

We can solve condition I for 𝑏0. Dividing by n, and writing ȳ = Σ𝑦𝑖/n and x̄ = Σ𝑥𝑖/n for the sample means, gives

𝑏0 = ȳ − 𝑏1x̄
From condition II:

∂S/∂𝑏1 = −2 Σ (𝑦𝑖 − 𝑏0 − 𝑏1𝑥𝑖)𝑥𝑖 = 0

Σ 𝑥𝑖(𝑦𝑖 − 𝑏0 − 𝑏1𝑥𝑖) = 0

Substituting 𝑏0 = ȳ − 𝑏1x̄:

Σ 𝑥𝑖(𝑦𝑖 − ȳ + 𝑏1x̄ − 𝑏1𝑥𝑖) = 0

Σ 𝑥𝑖(𝑦𝑖 − ȳ) = 𝑏1 Σ 𝑥𝑖(𝑥𝑖 − x̄)

𝑏1 = Σ(𝑦𝑖 − ȳ)𝑥𝑖 / Σ(𝑥𝑖 − x̄)𝑥𝑖 = Σ(𝑦𝑖 − ȳ)(𝑥𝑖 − x̄) / Σ(𝑥𝑖 − x̄)2

Proof of the last step: Σ(𝑦𝑖 − ȳ)x̄ = x̄ Σ𝑦𝑖 − n x̄ ȳ = n x̄ ȳ − n x̄ ȳ = 0, and similarly Σ(𝑥𝑖 − x̄)x̄ = 0, so subtracting these terms changes nothing.

Thus

𝑏1 = Cov(x, y) / Var(x) ; 𝑏0 = ȳ − 𝑏1x̄

or, in matrix form, 𝑏 = (X′X)−1X′y.
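The closed-form estimates above translate directly into code. A sketch with made-up sample data (illustrative only; any (xi, yi) sample works):

```python
def least_squares_fit(xs, ys):
    """Closed-form estimates: b1 = Sxy / Sxx, b0 = ybar - b1 * xbar."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))  # Sxy
    sxx = sum((x - xbar) ** 2 for x in xs)                      # Sxx
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

# Hypothetical sample data (illustrative only).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 2.9, 3.6, 4.4, 5.1]
b0, b1 = least_squares_fit(xs, ys)
print(round(b0, 3), round(b1, 3))  # 1.37 0.75
```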
Example

𝑏1 = Σ(𝑦𝑖 − ȳ)(𝑥𝑖 − x̄) / Σ(𝑥𝑖 − x̄)2 = 6/10 = 0.6

𝑏0 = ȳ − 𝑏1x̄ = 2.2
Calculating R2 Using Regression Analysis
 R-squared is a statistical measure of how close the data are to the fitted regression line (it measures the goodness of fit). It is also known as the coefficient of determination.
 First we calculate the distances between the actual values and the mean value, and the distances between the estimated values and the mean value.
 Then we compare the two sets of distances.
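In code, R2 is usually computed as 1 − SSE/SST. A sketch using the actual (Y) and estimated (Y′) values from the worked example tabulated on a later slide:

```python
def r_squared(ys, yhats):
    """Coefficient of determination: 1 - SSE/SST."""
    ybar = sum(ys) / len(ys)
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, yhats))  # error sum of squares
    sst = sum((y - ybar) ** 2 for y in ys)                # total sum of squares
    return 1.0 - sse / sst

Y  = [1.00, 2.00, 1.30, 3.75, 2.25]       # actual values
Yp = [1.210, 1.635, 2.060, 2.485, 2.910]  # estimated values Y'
print(round(r_squared(Y, Yp), 3))  # 0.393
```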
Example
[figure omitted]

Performance of Model
[figure omitted]
Standard Error of the Estimate (Mean Square Error)
The standard error of the estimate is a measure of the accuracy of predictions.

Note: The regression line is the line that minimizes the sum of squared deviations of the predictions (also called the sum of squares error).

The standard error of the estimate is closely related to this quantity and is defined as:

σest = √( Σ(Y − Y′)2 / N )

where Y = actual value, Y′ = estimated value, and N = number of observations.
Example
X Y Y' Y-Y' (Y-Y')2
1.00 1.00 1.210 -0.210 0.044
2.00 2.00 1.635 0.365 0.133
3.00 1.30 2.060 -0.760 0.578
4.00 3.75 2.485 1.265 1.600
5.00 2.25 2.910 -0.660 0.436
Sum 15.00 10.30 10.30 0.000 2.791

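Plugging the table's columns into the definition (SSE divided by N, then the square root) reproduces the sum shown in the table:

```python
from math import sqrt

Y  = [1.00, 2.00, 1.30, 3.75, 2.25]       # actual values
Yp = [1.210, 1.635, 2.060, 2.485, 2.910]  # estimated values Y'

sse = sum((y - yp) ** 2 for y, yp in zip(Y, Yp))
sigma_est = sqrt(sse / len(Y))  # standard error of the estimate
print(round(sse, 3))        # 2.791, matching the table
print(round(sigma_est, 3))  # 0.747
```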
Difference
[figure omitted]
Least Squares for Linear Regression
Solve: Ax = b

The columns of A define a vector space, range(A).
Ax is an arbitrary vector in range(A).
If b is a vector in Rn that also lies in the column space of A, then the system has a solution:

x1a1 + x2a2 = Ax = b

[Figure: b lying in the plane spanned by a1 and a2]
If b is a vector in Rn but not in the column space of A, then the system has no solution.

Instead, we try to find x̂ that makes Ax̂ as close to b as possible; this is called the least squares solution of our problem.

[Figure: b outside the plane spanned by a1 and a2; the residual is b − Ax̂]
Ax̂ is the orthogonal projection of b onto range(A), so the residual b − Ax̂ is orthogonal to the columns of A:

AT(b − Ax̂) = 0  =>  ATAx̂ = ATb

These are the normal equations; solving them gives the least squares solution x̂.

[Figure: b projected orthogonally onto the plane spanned by a1 and a2]
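For simple linear regression with A = [1 x] (a column of ones and the regressor), the normal equations ATAx̂ = ATb reduce to a 2×2 system. A sketch solving it by Cramer's rule for the data from the earlier table (as a check, the result matches the line behind the table's Y′ column):

```python
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
b  = [1.00, 2.00, 1.30, 3.75, 2.25]  # observations (Y column of the table)

n   = len(xs)
sx  = sum(xs)                        # A^T A = [[n, sx], [sx, sxx]]
sxx = sum(x * x for x in xs)
sy  = sum(b)                         # A^T b = [sy, sxy]
sxy = sum(x * y for x, y in zip(xs, b))

det = n * sxx - sx * sx              # Cramer's rule on the 2x2 system
b0 = (sxx * sy - sx * sxy) / det     # intercept
b1 = (n * sxy - sx * sy) / det       # slope
print(round(b0, 3), round(b1, 3))    # 0.785 0.425
```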
Matlab Implementation (Linear_Regression3.m)
[code listing omitted]
References
1. Sykes, Alan O. "An Introduction to Regression Analysis." (1993).
2. Chatterjee, Samprit, and Ali S. Hadi. Regression Analysis by Example. John Wiley & Sons, 2015.
3. Draper, Norman Richard, Harry Smith, and Elizabeth Pownell. Applied Regression Analysis. Vol. 3. New York: Wiley, 1966.
4. Montgomery, Douglas C., Elizabeth A. Peck, and G. Geoffrey Vining. Introduction to Linear Regression Analysis. John Wiley & Sons, 2015.
5. Seber, George A. F., and Alan J. Lee. Linear Regression Analysis. Vol. 936. John Wiley & Sons, 2012.

THANK YOU
