
Econometrics Unit 3 Tedy Best


The Simple Regression Model
Chapter 3
• Learning Objectives
• After completing this topic, students will be able to:

• Determine the significance of the predictor variable in explaining variability in the dependent variable;
• Predict values of the dependent variable for given values of the explanatory variable;
• Use linear regression methods to estimate empirical relationships;
• Evaluate and mitigate the effects of departures from classical statistical assumptions on linear regression estimates; and
• Critically evaluate simple econometric analyses.

• Keywords: simple linear regression model, regression parameters, regression line, residuals, principle of least squares, least squares estimates, least squares line, fitted values, predicted values, coefficient of determination, least squares estimators, hypotheses on regression parameters, confidence intervals for regression.
Outline

3.1. Introduction to Simple Regression
3.2. Ordinary Least Squares Method (OLS) and Classical Assumptions
3.3. Hypothesis Testing of OLS Estimates
3.1. Simple Linear Regression
• Our objective is to study the relationship between two variables X and Y.
• One way is by means of regression.
• Regression analysis is the process of estimating a functional relationship between X and Y. A regression equation is often used to predict a value of Y for a given value of X.
• Another way to study the relationship between two variables is correlation, which measures the direction and the strength of the linear relationship.
Examples
• Multiple Linear Regression:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
• Polynomial Linear Regression:
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \varepsilon_i$
• Linear Regression:
$\log_{10}(Y_i) = \beta_0 + \beta_1 X_i + \beta_2 \exp(X_i) + \varepsilon_i$
• Nonlinear Regression:
$Y_i = \beta_0 / (1 + \beta_1 \exp(-\beta_2 X_i)) + \varepsilon_i$
What matters is whether the model is linear or nonlinear in the parameters.
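The distinction is worth seeing concretely: the polynomial model above is still a *linear* regression because it is linear in the parameters, so it can be estimated by ordinary least squares on an augmented design matrix. The following is a minimal sketch; the simulated data and true coefficient values are illustrative assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 1.5 * x - 0.3 * x**2 + rng.normal(0, 1, size=100)  # assumed true model

# The polynomial model Y = b0 + b1*X + b2*X^2 + e is linear in b0, b1, b2,
# so OLS applies with design-matrix columns [1, x, x^2].
X = np.column_stack([np.ones_like(x), x, x**2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # estimates of (b0, b1, b2), close to (2.0, 1.5, -0.3)
```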
Cont…

The most frequently used regression models include:
• Simple regression model
• Multiple regression model
• Multivariate regression model
• Logit, probit, multinomial regressions, etc.
Cont…
Now let us examine simple linear regression in this chapter.

Econometric research or inquiry generally proceeds along the following lines/stages:
1. Specification of the model
2. Estimation of the model
3. Evaluation of the estimates
4. Evaluation of the forecasting power of the estimated model
Specification of the model
Starting with the postulated theoretical relationship among economic variables:

Let $Y_i = \alpha_0 + \alpha_1 X_i + u_i$  ... (1)

where $Y_i$ = dependent variable, $X_i$ = independent variable, and $u_i$ = disturbance term.
The simple linear regression model
• We consider the modelling between the dependent variable and one independent variable.
• When there is only one independent variable in the linear regression model, the model is generally termed a simple linear regression model.
• When there is more than one independent variable in the model, the linear model is termed a multiple linear regression model.
Cont…

What is a simple regression model?
A simple regression model is a statistical equation that characterizes the relationship between a dependent variable and only one independent variable.
• The linear model
• Consider a simple linear regression model
$y = \beta_0 + \beta_1 X + \varepsilon$
• where y is termed the dependent or study variable and X is termed the independent or explanatory variable.
• The terms $\beta_0$ and $\beta_1$ are the parameters of the model.
• The parameter $\beta_0$ is termed the intercept term, and
• the parameter $\beta_1$ is termed the slope parameter.
• These parameters are usually called regression coefficients.
Cont….
Using a specific mathematical expression for one variable: normally the explained variable is designated by y and the explanatory variable by x.
y = f(x) + ε
where y = production of maize and x = land size,
or y = sales and x = advertising expenditure.
Cont…
The variables involved in a regression model are of three kinds: observable variables, unobservable variables, and unknown parameters.

1. Observable Variables
These are the variables whose values are collected from the field through questionnaires, interviews, and other data collection mechanisms.
$y_i$ = the ith value of the dependent variable.
$x_i$ = the ith value of the independent variable.
Cont…

2. Unobservable Variables
These are the values that are determined from the observations and estimated values of the data set.
The random error term $\varepsilon_i$ for the ith member of the population is also called:
– The disturbance term
– The stochastic term
Cont…
The stochastic error term measures the residual variation in Y not explained by X.

This is akin to saying there is measurement error and our predictions/models will not be perfect.

The more X variables we add to a model, the lower the error of estimation.
Cont….
3. Unknown Parameters (or regression coefficients)
The regression coefficients are the values that will be estimated from the sample data on the dependent and independent variables.
Cont….
Why include the disturbance term ε?

The reason is that we cannot hope to capture every influence on an economic variable in the model, no matter how elaborate it is.

The name is given to it because it disturbs an otherwise stable relationship.
Cont….
Contributors to ε:
– Measurement errors
– Exclusion of important variables
– Simultaneity

In other words, why do we need to include the stochastic (random) component, for example in the consumption function?
— Omission of variables leads to a misspecification problem. For example, income is not the only determinant of consumption.
— There may be measurement error in collecting data.
— We may use poor proxy variables.
— The functional form may not be correct.
— There is randomness in human behavior.
Some details…
• The unobservable error component accounts for the failure of the data to lie on a straight line.
• It represents the difference between the true and observed realizations of y.
• There can be several reasons for such a difference, e.g., the effect of all omitted variables in the model, measurement error, etc.
• We assume that ε is an independent and identically distributed random variable with mean zero and constant variance $\sigma^2$.
• Later, we will additionally assume that ε is normally distributed.
• The independent variable is viewed as controlled by the experimenter, so it is considered non-stochastic, whereas y is viewed as a random variable with
$E(y) = \beta_0 + \beta_1 X$ and $Var(y) = \sigma^2$.
SIMPLE REGRESSION MODEL

[Figure: the line $Y = \beta_1 + \beta_2 X$ with points $Q_1, \ldots, Q_4$ on the line at $X_1, \ldots, X_4$; the intercept is $\beta_1$.]

If the relationship were an exact one, the observations would lie on a straight line and we would have no trouble obtaining accurate estimates of $\beta_1$ and $\beta_2$.
• Sometimes X can also be a random variable.
• In such a case, instead of the sample mean and sample variance of y, we consider the conditional mean of y given X = x (meaning the expected value of y at a given level of x):
$E(y \mid x) = \beta_0 + \beta_1 x$ and $Var(y \mid x) = \sigma^2$.
• Regression is the estimation or prediction of the average value of a dependent variable on the basis of fixed values of other variables.
• In regression, we have a stochastic dependent variable and a non-stochastic (fixed) independent variable.
[Figure: the line $Y = \beta_1 + \beta_2 X$ with fitted points $Q_1, \ldots, Q_4$ on the line and actual observations $P_1, \ldots, P_4$ off the line at $X_1, \ldots, X_4$.]

In practice, most economic relationships are not exact and the actual values of Y are different from those corresponding to the straight line.
SIMPLE REGRESSION MODEL

[Figure: as above, with the disturbance term $u_1$ shown as the vertical distance between $P_1$ and $Q_1$; the intercept is $\beta_1$ and the line height at $X_1$ is $\beta_1 + \beta_2 X_1$.]

Each value of Y thus has a non-random component, $\beta_1 + \beta_2 X$, and a random component, u. The first observation has been decomposed into these two components.
Simple Linear Regression Model (continued)

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

[Figure: for a given $X_i$, the observed value of Y differs from the predicted value on the line by the random error $\varepsilon_i$; the line has intercept $\beta_0$ and slope $\beta_1$.]
Simple Linear Regression Equation (Prediction Line)

The simple linear regression equation provides an estimate of the population regression line:

$\hat{Y}_i = b_0 + b_1 X_i$

where $\hat{Y}_i$ is the estimated (or predicted) Y value for observation i, $b_0$ is the estimate of the regression intercept, $b_1$ is the estimate of the regression slope, and $X_i$ is the value of X for observation i.


• The parameters $\beta_0$, $\beta_1$, and $\sigma^2$ are generally unknown in practice and ε is unobserved.
• The determination of the statistical/econometric model depends on the estimation of $\beta_0$, $\beta_1$, and $\sigma^2$.
• In order to know the values of these parameters, n pairs of observations $(X_i, y_i)$ ($i = 1, \ldots, n$) on (X, y) are observed/collected and used to estimate these unknown parameters.
• Various methods of estimation can be used to determine the estimates of the parameters.
• Among them, the methods of least squares and maximum likelihood are the most popular.
3.2. Ordinary Least Squares Method (OLS) and Classical Assumptions
• There are two major ways of estimating regression functions:
• the ordinary least squares (OLS) method and the maximum likelihood (MLH) method.
• Both methods are broadly similar in their application to estimation.
• The ordinary least squares method is the easiest and most commonly used, as opposed to the maximum likelihood method, which is limited by its assumptions.
Cont…
• For instance, the MLH method is valid only for large samples, as opposed to the OLS method, which can be applied to smaller samples.
• Owing to this merit, our discussion mainly focuses on ordinary least squares (OLS).
• The ordinary least squares (OLS) method of estimating the parameters or regression function is about finding or estimating the values of the parameters ($\alpha_0, \alpha_1$) of the simple linear regression function given below for which the sum of squared errors is smallest.
Cont…
Estimation
The sample regression line is given as:

$Y_i = \hat{\alpha}_0 + \hat{\alpha}_1 X_i + \hat{\varepsilon}_i$  ... (2)

Ordinary Least Squares (OLS) method: determines the best-fitting straight line as the line that minimizes the sum of squares (SS) of the differences between the observations, $Y_i$, and the fitted values, $\hat{Y}_i$.
Cont…
The difference between the observed value and the fitted value is known as the error (or residual).

Mathematically, the error term is expressed as

$\hat{\varepsilon}_i = Y_i - \hat{Y}_i$
Ordinary Least Squares (OLS)
OLS is the technique used to estimate the line that minimizes the error:
the difference between the predicted and the actual values of Y.
Cont…
The ordinary least squares method chooses estimates of the parameters ($\alpha_i$) by minimizing the sum of squared differences between the actual $y_i$'s and the estimated $\hat{y}_i$'s.

Before estimating the parameters, let us look at the OLS assumptions.
(i) Classical Assumptions of OLS

1. The error terms or disturbance terms $u_i$ are not correlated.
This means that there is no systematic variation or relation among the values of the error terms ($u_i$ and $u_j$); thus, $Cov(u_i, u_j) = 0$.
The value the error term assumes in one period does not depend on the value it assumed in any other period.
This assumption is known as the assumption of no autocorrelation or non-serial correlation.
If the error terms are correlated, an autocorrelation problem arises.


Classical Assumptions of OLS

2. The disturbance terms $u_i$ have zero mean.
• The deviations of the values of some of the disturbance terms are negative, some are zero, and some are positive, and the sum or the average is zero.
• This is given by the following identity:

$E(u_i) = \frac{\sum u_i}{n} = 0$

Multiplying both sides by the sample size n, we obtain $\sum u_i = 0$.
Classical Assumptions of OLS

• If this condition is not met, then the position of the regression function (or curve) will not be where it is supposed to be.
• This results in an upward shift (if the mean of the error or residual term is positive) or a downward shift (if the mean of the error or residual term is negative) in the regression function.
• The estimated model will be biased and the regression function will shift; for instance, if $E(u_i) > 0$ (positive), the function shifts upward.
Figure 2: Regression function/curve if the mean of the error term is not zero. [Figure not reproduced.]
3. The disturbance terms have constant variance in each period.
• This assumption is known as the assumption of homoscedasticity.
• The constant variance itself is called homoscedastic variance.
• If this condition is not fulfilled, i.e. if the variance of the error terms varies as the sample size changes or as the value of the explanatory variables changes, this leads to a heteroscedasticity problem.
Classical Assumptions of OLS

4. The explanatory variable $X_i$ and the disturbance terms $u_i$ are uncorrelated or independent.
This means there is no correlation between the random error term and the explanatory variable.
If two variables are unrelated, their covariance is zero.
Classical Assumptions of OLS

5. The explanatory variable $X_i$ is fixed in repeated samples.
Each value of $X_i$ does not vary, for instance owing to a change in sample size.
This means the explanatory variables are non-random and hence distribution-free.
Classical Assumptions of OLS

6. Linearity of the model in parameters.
The classical theory assumes that the model should be linear in the parameters, regardless of whether the explanatory and dependent variables are linear or not.
This is because if the parameters are non-linear they are difficult to estimate, since their values are not known and only data on the dependent and independent variables are given.
Classical Assumptions of OLS

• Example 1. A model such as $Y = \alpha_0 + \alpha_1 X + u$ is linear in both the parameters and the variables, so it satisfies the assumption.
• Example 2. A model such as $Y = \alpha_0 + \alpha_1 X^2 + u$ is linear only in the parameters. Since the classical assumptions concern the parameters, this model also satisfies the assumption.
• What is important is transforming the data as required.
Classical Assumptions of OLS

7. Normality assumption
• The disturbance term $u_i$ is assumed to have a normal distribution with zero mean and a constant variance:

$u_i \sim N(0, \sigma^2)$

• This assumption is a combination of the zero-mean-of-error-term assumption and the homoscedasticity assumption.
• This assumption, or combination of assumptions, is used in testing hypotheses about the significance of parameters.
• It is also useful in both estimating parameters and testing their significance.
Classical Assumptions of OLS

8. Independence: the observations on the explanatory variables, the x's, are statistically independent of one another.
• Mathematically, it means the covariance between any two observations is zero.
• Meaning, the x observations are independent of each other, denoted as $Cov(x_i, x_j) = 0$.
• However, it is not unusual for there to be some association between the independent variables.
• This is to say that low correlations do not lead to inconsistency of the parameter estimates.
Classical Assumptions of OLS

A violation of the independence assumption ($Cov(x_i, x_j) = 0$) indicates that there is a multicollinearity problem among the explanatory variables, which leads to a very high value of the coefficient of determination and inconsistent parameter estimates.
• We can now use the above assumptions to derive the following basic concepts.

A. The dependent variable $Y_i$ is normally distributed.
Con’t…
Normal distribution: for any fixed explanatory value x, the response y has a normal distribution.
• Generally, the observations are normally distributed if, when graphed, a bell-shaped (normal) curve appears, centered at zero mean error.
• A violation of this assumption occurs when there are outliers in the data set, and it leads to problems of wider confidence intervals and wrong hypothesis tests.
Con’t…
Homoskedasticity (or constant variance): the variance of the dependent variable is the same across independent observations or explanatory variables.
• There exists a constant variance for the given regression model.
• Mathematically, it means the variance of the response variable does not vary across observations.
• It is denoted as $Var(y) = \sigma^2$ if y is the dependent or response variable.
Con’t…
B.
Existence: for any fixed value of the independent variable x, the dependent variable y is a random variable with a certain probability distribution, having finite mean and variance. A violation of this assumption may indicate that there is no relationship between the variables involved.
Continuity: the dependent variable is a continuous random variable, whereas values of the independent variable are fixed; they can take continuous or discrete values. Caution must be taken that if the dependent variable is not continuous, then other types of regression models such as probit, logit, tobit, etc. should be used accordingly.
Cont…
(ii) Deriving the Ordinary Least Squares Estimates

• Derivation of the normal equations

Let $Y_i = \alpha_0 + \alpha_1 X_i + u_i$  ... (3)

• The first normal equation is obtained as follows.
Con’t…
• Sum equation (3) over all observations:

$\sum Y_i = \sum (\alpha_0 + \alpha_1 X_i + u_i) = n\alpha_0 + \alpha_1 \sum X_i + \sum u_i$

• Divide by n:

$\frac{\sum Y_i}{n} = \alpha_0 + \alpha_1 \frac{\sum X_i}{n} + \frac{\sum u_i}{n}$

• Then impose $\frac{\sum u_i}{n} = 0$ (by assumption), which gives the first normal equation:

$\bar{Y} = \hat{\alpha}_0 + \hat{\alpha}_1 \bar{X}$  ... (4)
Con’t…
• The second normal equation:
• Now returning to the model equation and multiplying both sides by $X_i$ gives us

$X_i Y_i = \hat{\alpha}_0 X_i + \hat{\alpha}_1 X_i^2 + \hat{u}_i X_i$

• and summing over all observations:

$\sum X_i Y_i = \hat{\alpha}_0 \sum X_i + \hat{\alpha}_1 \sum X_i^2 + \sum \hat{u}_i X_i$
Cont…
• If we divide by n we obtain

$\frac{\sum X_i Y_i}{n} = \hat{\alpha}_0 \bar{X} + \hat{\alpha}_1 \frac{\sum X_i^2}{n} + \frac{\sum \hat{u}_i X_i}{n}$

• If we now impose the condition $\sum \hat{u}_i X_i = 0$,
• our second normal equation becomes

$\frac{\sum X_i Y_i}{n} = \hat{\alpha}_0 \bar{X} + \hat{\alpha}_1 \frac{\sum X_i^2}{n}$  ... (5)
Estimators
• We now have two equations and two unknowns, and we will try to find solutions for the unknowns.
• We are in a position to solve for the estimators, i.e. formulas for the two alphas:

$\frac{\sum X_i Y_i}{n} = \hat{\alpha}_0 \bar{X} + \hat{\alpha}_1 \frac{\sum X_i^2}{n}$

$\bar{Y} = \hat{\alpha}_0 + \hat{\alpha}_1 \bar{X}$

• Solving these equations:
Con’t…
• The two formulas are given as:

$\hat{\alpha}_0 = \bar{Y} - \hat{\alpha}_1 \bar{X}$

$\hat{\alpha}_1 = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}$

• These are the two estimators.
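As a concrete check of these formulas, here is a minimal sketch in Python that computes $\hat{\alpha}_0$ and $\hat{\alpha}_1$ directly from the closed-form expressions above; the sample data are illustrative assumptions, not from the slides:

```python
import numpy as np

def ols_simple(x, y):
    """Return (alpha0_hat, alpha1_hat) for y = alpha0 + alpha1*x + u,
    using the closed-form OLS formulas from the normal equations."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    x_bar, y_bar = x.mean(), y.mean()
    # alpha1_hat = (sum(XY) - n*Xbar*Ybar) / (sum(X^2) - n*Xbar^2)
    a1 = (np.sum(x * y) - n * x_bar * y_bar) / (np.sum(x**2) - n * x_bar**2)
    # alpha0_hat = Ybar - alpha1_hat * Xbar
    a0 = y_bar - a1 * x_bar
    return a0, a1

# Illustrative data (assumed for the sketch)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
a0, a1 = ols_simple(x, y)
print(f"alpha0_hat = {a0:.2f}, alpha1_hat = {a1:.2f}")  # 0.14 and 1.96
```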
Con’t…
• The OLS method selects the estimates $\hat{\alpha}_0, \hat{\alpha}_1, \hat{\alpha}_2, \ldots$ that minimize the sum of squared residuals, summed over all the sample data points.
• The estimators obtained in this way are known as the least squares estimators, and they have the unbiasedness and efficiency properties.
• Example 2.4: Given a sample of three pairs of observations on 'Y' (dependent variable) and 'X' (independent variable), find the simple linear regression function Y = f(X). [The data table and worked solution from the original slides are not reproduced here.]
Simple Linear Regression Example

• A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet).
• A random sample of 10 houses is selected.
• Dependent variable (Y) = house price in $1000s
• Independent variable (X) = square feet


Simple Linear Regression Example: Data

House Price in $1000s (Y)    Square Feet (X)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700


Simple Linear Regression Example: Excel Output

Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10

The regression equation is:
house price = 98.24833 + 0.10977 (square feet)

ANOVA
             df    SS           MS           F         Significance F
Regression    1    18934.9348   18934.9348   11.0848   0.01039
Residual      8    13665.5652   1708.1957
Total         9    32600.5000

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580
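This output can be reproduced in Python; the following is a minimal sketch using numpy and statsmodels (assuming statsmodels is installed) on the ten observations above:

```python
import numpy as np
import statsmodels.api as sm

price = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255])          # Y, in $1000s
sqft = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700])  # X

# OLS with an intercept: house price = b0 + b1 * square feet
model = sm.OLS(price, sm.add_constant(sqft)).fit()
print(model.params)    # approx [98.248, 0.110] -> b0, b1
print(model.rsquared)  # approx 0.581
```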


Simple Linear Regression Example: Interpretation of b0

house price = 98.24833 + 0.10977 (square feet)

• b0 is the estimated mean value of Y when the value of X is zero (if X = 0 is in the range of observed X values).
• Because a house cannot have a square footage of 0, b0 has no practical application here.


Simple Linear Regression Example: Interpreting b1

house price = 98.24833 + 0.10977 (square feet)

• b1 estimates the change in the mean value of Y as a result of a one-unit increase in X.
• Here, b1 = 0.10977 tells us that the mean value of a house increases by 0.10977 × ($1000) = $109.77, on average, for each additional square foot of size.


Simple Linear Regression Example: Making Predictions

Predict the price for a house with 2000 square feet:

house price = 98.25 + 0.1098 (sq. ft.)
            = 98.25 + 0.1098 (2000)
            = 317.85

The predicted price for a house with 2000 square feet is 317.85 × ($1000s) = $317,850.
Mean and Variance of Parameter Estimates

The formulas for the mean and variance of the respective parameter estimates and the error term are the standard OLS results:

$E(\hat{\alpha}_0) = \alpha_0, \qquad E(\hat{\alpha}_1) = \alpha_1$

$Var(\hat{\alpha}_1) = \frac{\sigma^2}{\sum (X_i - \bar{X})^2}, \qquad Var(\hat{\alpha}_0) = \frac{\sigma^2 \sum X_i^2}{n \sum (X_i - \bar{X})^2}, \qquad \hat{\sigma}^2 = \frac{\sum \hat{u}_i^2}{n - 2}$
3.3. Evaluation of the estimates

• This stage consists of deciding whether the estimates of the parameters are theoretically meaningful and statistically satisfactory.
• This stage enables the econometrician to evaluate the results of the calculations and determine the reliability of the results.
Con’t…
• Economic a priori criteria: these criteria are determined by economic theory and refer to the size and sign of the parameters of economic relationships.
• Statistical criteria (first-order tests): these are determined by statistical theory and aim at evaluating the statistical reliability of the estimates of the parameters of the model. The correlation coefficient test, standard error test, t-test, F-test, and R²-test are some of the most commonly used statistical tests.
Con’t…
• Econometric criteria (second-order tests):
– These are set by the theory of econometrics and aim at investigating whether the assumptions of the econometric method employed are satisfied in any particular case.
– The econometric criteria serve as second-order tests (tests of the statistical tests), i.e. they determine the reliability of the statistical criteria.
Con’t….
• They help us establish whether the estimates have the desirable properties of unbiasedness, consistency, etc.
• Econometric criteria aim at detecting the violation or validity of the assumptions of the various econometric techniques.
Measurement of the explanatory power of the regression model

• We would like to know the explanatory power of the regression model.
• That is, how much of the variation in Y is due to its relationship to X?
• We need a measure of the strength of the relationship.
Quality of straight-line fit
• How do you test whether the fit (or the estimates) is good?
• How do you test the validity of a model?
• What qualifies a model as adequately representing the data?
Con’t…
• What is known as the 'Test of Goodness of Fit' determines whether a regression model is valid or adequately fits the data under investigation.
• The better the fit, the higher the share of the variation in the dependent variable explained by the estimated regression equation.
• The total variation in the dependent variable, y, is equal to the explained variation in the dependent variable plus the residual variation.
Con’t…
• Mathematically, this decomposition is formulated as

$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$

• The difference between the Total Sum of Squares (TSS) and the Regression Sum of Squares (RSS) is the Error Sum of Squares (ESS), which is expressed as:
ESS = TSS – RSS
• By dividing both sides by TSS and decomposing the response and explanatory variables, it is possible to calculate the coefficient of determination, R², as follows:
Con’t…
• Thus the coefficient of determination is given by

$R^2 = \frac{RSS}{TSS} = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} = 1 - \frac{ESS}{TSS} = 1 - \frac{\sum (Y_i - \hat{Y}_i)^2}{\sum (Y_i - \bar{Y})^2}$

• This indicates the proportion of the variation in the response (dependent) variable, y, that is explained by the independent variables in the model.
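For a concrete check, the sketch below computes TSS, RSS, ESS, and R² for the house price regression fitted earlier, using the coefficients from the Excel output above:

```python
import numpy as np

price = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255])
sqft = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700])

y_hat = 98.24833 + 0.10977 * sqft           # fitted values from the estimated line
tss = np.sum((price - price.mean())**2)     # total sum of squares
rss = np.sum((y_hat - price.mean())**2)     # regression (explained) sum of squares
ess = np.sum((price - y_hat)**2)            # error (residual) sum of squares

print(tss, rss, ess)   # approx 32600.5, 18934.9, 13665.6 (matches the ANOVA table)
print(1 - ess / tss)   # R^2, approx 0.581
```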


Con’t…
There are two distinct cases to consider as regards R².
Case I. The regression equation explains all the variation in Y (all the observations fall on the fitted line), making ESS = 0. In this case, TSS = RSS and hence R² = 1. This implies that $Y_i$ is a perfect linear combination of $X_i$.
Case II. The regression equation explains nothing. In this case, ESS = TSS, which implies that RSS = 0 and R² = 0. Since RSS = 0, we must have $\hat{Y}_i = \bar{Y}$ for all i.
Con’t…
What does the coefficient of determination not measure?
• Be aware of the misconceptions about the coefficient of determination, R²:
– R² is not a measure of the magnitude of the slope of the regression line.
– R² is not a complete measure of the overall fitness of the straight-line model.
– R² is not a verification of the appropriateness or correct specification of a fitted model.
Hypothesis Testing of OLS Estimates
• After estimation of the parameters, there are important issues to be considered by the researcher.
• We have to know to what extent our estimates are reliable enough and acceptable for further purposes.
• That means we have to evaluate the degree of representativeness of the estimates of the true population parameters.
• Simply put, a model must be tested for its significance before it can be used for any other purpose.
• In this subsection we evaluate the reliability of the estimated model using the procedures explained above.
The coefficient of determination (R²)

• measures the amount of the total variation of the dependent variable that is explained by the explanatory variable in the model.
• The total variation of the dependent variable is split into two additive components: a part explained by the model and a part represented by the random term.
Hypothesis Testing of OLS Estimates

• The significance of a model can be seen in terms of the amount of variation in the dependent variable that it explains and the significance of the regression coefficients.
• Different tests are available to test the statistical reliability of the parameter estimates. The following are the common ones:
• The standard error test
• The standard normal test
• The Student's t-test
Hypothesis Testing of OLS Estimates
1. The Standard Error Test
• This test first establishes the two hypotheses to be tested, commonly known as the null and alternative hypotheses:
• H0: βi = 0
• H1: βi ≠ 0
• The standard error test is outlined as follows:
1. Compute the standard deviations (standard errors) of the parameter estimates; the standard error is the positive square root of the variance of the estimate.
Hypothesis Testing of OLS Estimates
2. Compare the standard errors of the estimates with the numerical values of the estimates and make a decision.

A) If the standard error of the estimate is less than half of the numerical value of the estimate, i.e. $SE(\hat{\beta}_i) < |\hat{\beta}_i|/2$, reject the null hypothesis and conclude that the estimate is statistically significant.
B) If the standard error of the estimate is greater than half of the numerical value of the estimate, i.e. $SE(\hat{\beta}_i) > |\hat{\beta}_i|/2$, accept the null hypothesis and conclude that the estimate is not statistically significant (not statistically reliable).
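Applied to the house price regression above, this rule of thumb is easy to check; the sketch below uses the slope estimate and its standard error from the Excel output:

```python
def standard_error_test(estimate, std_error):
    """Rule-of-thumb standard error test: significant if SE < |estimate| / 2."""
    return "significant" if std_error < abs(estimate) / 2 else "not significant"

# Slope of the house price regression: b1 = 0.10977, SE(b1) = 0.03297
print(standard_error_test(0.10977, 0.03297))  # 'significant', since 0.03297 < 0.0549
```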
Hypothesis Testing of OLS Estimates

The Student t-Test

The t-test procedure is outlined as follows:
• Set up the hypotheses. The hypotheses for testing a given regression coefficient are H0: βi = 0 against H1: βi ≠ 0.
• Determine the level of significance for carrying out the test. We usually use a 5% level of significance in applied econometric research.
• Determine the tabulated value of t from the table with n − k degrees of freedom, where k is the number of parameters estimated.
• Determine the calculated value of t. The test statistic (using the t-test) is $t = \hat{\beta}_i / SE(\hat{\beta}_i)$.
• The test rule or decision is given as follows.
• Step 2: Choose the level of significance. The level of significance is the probability of making a 'wrong' decision, i.e. the probability of rejecting the hypothesis when it is actually true (the probability of committing a Type I error).
• It is customary in econometric research to choose the 5% or the 1% level of significance. This means that in making our decision we allow (tolerate) being 'wrong' five times out of a hundred, i.e. rejecting the hypothesis when it is actually true.
Inferences About the Slope

• The standard error of the regression slope coefficient (b1) is estimated by

$S_{b_1} = \frac{S_{YX}}{\sqrt{SSX}} = \frac{S_{YX}}{\sqrt{\sum (X_i - \bar{X})^2}}$

where $S_{b_1}$ is the estimate of the standard error of the slope and $S_{YX} = \sqrt{\frac{SSE}{n-2}}$ is the standard error of the estimate (SSE being the error sum of squares).


Inferences About the Slope: t Test

• t test for a population slope: is there a linear relationship between X and Y?
• Null and alternative hypotheses:
H0: β1 = 0 (no linear relationship)
H1: β1 ≠ 0 (a linear relationship does exist)
• Test statistic:

$t_{STAT} = \frac{b_1 - \beta_1}{S_{b_1}}, \qquad d.f. = n - 2$

where b1 = regression slope coefficient, β1 = hypothesized slope, and $S_{b_1}$ = standard error of the slope.


Inferences About the Slope: t Test Example

Estimated regression equation (from the house price and square feet data shown earlier):

house price = 98.25 + 0.1098 (sq. ft.)

The slope of this model is 0.1098.
Is there a relationship between the square footage of the house and its sales price?


Inferences About the Slope: t Test Example

H0: β1 = 0;  H1: β1 ≠ 0

From Excel output:
              Coefficients   Standard Error   t Stat    P-value
Intercept     98.24833       58.03348         1.69296   0.12892
Square Feet   0.10977        0.03297          3.32938   0.01039

From Minitab output:
Predictor     Coef       SE Coef    T      P
Constant      98.25      58.03      1.69   0.129
Square Feet   0.10977    0.03297    3.33   0.010

$t_{STAT} = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{0.10977 - 0}{0.03297} = 3.32938$


Inferences About the Slope: t Test Example

H0: β1 = 0;  H1: β1 ≠ 0

Test statistic: tSTAT = 3.329, with d.f. = 10 − 2 = 8.
At α/2 = 0.025, the critical values are ±2.3060; H0 is rejected if tSTAT < −2.3060 or tSTAT > 2.3060.

Decision: Reject H0, since 3.329 > 2.3060.
There is sufficient evidence that square footage affects house price.


Inferences About the Slope: t Test Example (p-value approach)

H0: β1 = 0;  H1: β1 ≠ 0

From the Excel output, the p-value for Square Feet is 0.01039; the Minitab output reports p = 0.010.

Decision: Reject H0, since p-value < α (0.010 < 0.05).
There is sufficient evidence that square footage affects house price.
• (iii) Confidence intervals
• In order to define how close the estimate is to the true parameter, we construct a confidence interval for the true parameter;
• in other words, we establish limiting values around the estimate within which the true parameter is expected to lie with a certain "degree of confidence".
• In this respect we say that, with a given probability, the population parameter will be within the defined confidence interval (confidence limits).
• It is customary in econometrics to choose the 95% confidence level.
• This means that in repeated sampling the confidence limits, computed from the sample, would include the true population parameter in 95% of the cases.
• In the other 5% of the cases the population parameter will fall outside the confidence interval.
Confidence Interval Estimate for the Slope

Confidence interval estimate of the slope:

$b_1 \pm t_{\alpha/2} S_{b_1}, \qquad d.f. = n - 2$

Excel printout for house prices:
              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580

At the 95% level of confidence, the confidence interval for the slope is (0.0337, 0.1858).
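The interval can be reproduced directly from the formula; here is a minimal sketch using scipy for the t critical value (the coefficient and standard error come from the output above):

```python
from scipy import stats

b1, s_b1, n = 0.10977, 0.03297, 10
t_crit = stats.t.ppf(0.975, df=n - 2)  # two-sided 95%, d.f. = 8 -> about 2.306

lower = b1 - t_crit * s_b1
upper = b1 + t_crit * s_b1
print(f"95% CI for the slope: ({lower:.4f}, {upper:.4f})")  # about (0.0337, 0.1858)
```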


Confidence Interval Estimate for the Slope (continued)

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580

Since the units of the house price variable are $1000s, we are 95% confident that the mean impact on sales price is between $33.74 and $185.80 per square foot of house size.

This 95% confidence interval does not include 0.
Conclusion: there is a significant relationship between house price and square feet at the 0.05 level of significance.


Confidence Interval Estimate for the Slope from Minitab (continued)

Minitab does not automatically calculate a confidence interval for the slope but provides the quantities necessary to use the confidence interval formula $b_1 \pm t_{\alpha/2} S_{b_1}$:

Predictor     Coef       SE Coef    T      P
Constant      98.25      58.03      1.69   0.129
Square Feet   0.10977    0.03297    3.33   0.010


Evaluation of Estimators
So far we have established formulas for the estimation of α0 and α1.
Our next question is: are these estimators good estimators of the parameters?
We shall now show that $\hat{\alpha}_0$ and $\hat{\alpha}_1$ are good estimators of α0 and α1.
To be good estimators they have to satisfy, in particular, the following conditions:
Con’t…
• $E(\hat{\alpha}_0) = \alpha_0$ and $E(\hat{\alpha}_1) = \alpha_1$. This is the unbiasedness property of the estimators of α0 and α1.
• The variances of $\hat{\alpha}_0$ and $\hat{\alpha}_1$ are relatively small compared with the variances of all other estimators of α0 and α1. This is known as the efficiency property of $\hat{\alpha}_0$ and $\hat{\alpha}_1$.


Con’t…
Unbiasedness: an estimator is unbiased if the mean of its sampling distribution (i.e., the expected value of the estimate) equals the true or population parameter.
• Mathematically, it is given as $E(\hat{w}) = w$, where w is any population parameter, such as a coefficient βi in a model, a mean (μ), a variance (σ²), or a proportion (ρ), and $\hat{w}$ is its sample estimate.


Con’t…
Consistent estimator: two conditions are required for an estimator to be consistent.
• First, as the sample size increases, the estimator must approach the true (or population) parameter more and more closely (technically known as asymptotic unbiasedness).
• Second, as the sample size approaches infinity in the limit (basically as the sample size reaches the population size), the sampling distribution of the estimator must collapse to a vertical line of height 1 (maximum probability value of 1) above the value of the true parameter.


Con’t…
• This can be observed on the bell-shaped or normal curve.
• As the sample size increases, the variability among the observations becomes less and less, until finally no variability exists between the sample and the population, since in the limit the whole population becomes the sample itself.
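These two properties can be illustrated by simulation. The sketch below (with assumed true parameters α0 = 2 and α1 = 0.5, chosen only for illustration) repeatedly draws samples from a simple regression model; the average of the slope estimates lands close to the true slope (unbiasedness), and its spread shrinks as the sample size grows (consistency):

```python
import numpy as np

rng = np.random.default_rng(42)
alpha0, alpha1 = 2.0, 0.5  # assumed true parameters for the simulation

def slope_estimates(n, reps=2000):
    """OLS slope estimates from `reps` simulated samples of size n."""
    estimates = np.empty(reps)
    for r in range(reps):
        x = rng.uniform(0, 10, size=n)
        y = alpha0 + alpha1 * x + rng.normal(0, 1, size=n)
        x_bar, y_bar = x.mean(), y.mean()
        estimates[r] = (np.sum(x * y) - n * x_bar * y_bar) / (np.sum(x**2) - n * x_bar**2)
    return estimates

for n in (20, 200):
    est = slope_estimates(n)
    # Mean near 0.5 illustrates unbiasedness; shrinking std illustrates consistency.
    print(n, est.mean().round(4), est.std().round(4))
```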


Properties of OLS Estimators

• The ideal or optimum properties that the OLS estimates possess may be summarized by the well-known Gauss-Markov theorem.
• According to this theorem, under the basic assumptions of the classical linear regression model, the least squares estimators are linear, unbiased, and have minimum variance (i.e. they are the best of all linear unbiased estimators).
• The theorem is sometimes referred to as the BLUE theorem, i.e. Best, Linear, Unbiased Estimator. An estimator is called BLUE if it is:
Properties of OLS Estimators

• Linear: a linear function of the random variable, such as the dependent variable Y.
• Unbiased: its average or expected value is equal to the true population parameter.
• Minimum variance: it has minimum variance in the class of linear and unbiased estimators. An unbiased estimator with the least variance is known as an efficient estimator.
• According to the Gauss-Markov theorem, the OLS estimators possess all the BLUE properties.
End of Chapter