
Econometrics Unit 3 Tedy Best


The Simple Regression Model
Chapter 3
• Learning Objectives
• After completing this topic, students will be able to:

• Determine the significance of the predictor variable in explaining variability in the dependent variable;
• Predict values of the dependent variable for given values of the explanatory variable;
• Use linear regression methods to estimate empirical relationships;
• Evaluate and mitigate the effects of departures from classical statistical assumptions on linear regression estimates; and
• Critically evaluate simple econometric analyses.

• Keywords: simple linear regression model, regression parameters, regression line, residuals, principle of least squares, least squares estimates, least squares line, fitted values, predicted values, coefficient of determination, least squares estimators, hypotheses on regression parameters, confidence intervals for regression.
Outline

3.1. Introduction to Simple Regression
3.2. Ordinary Least Squares Method (OLS) and Classical Assumptions
3.3. Hypothesis Testing of OLS Estimates
3.1. Simple Linear Regression
• Our objective is to study the relationship between two variables X and Y.
• One way is by means of regression.
• Regression analysis is the process of estimating a functional relationship between X and Y. A regression equation is often used to predict a value of Y for a given value of X.
• Another way to study the relationship between two variables is correlation, which measures the direction and the strength of the linear relationship.
Examples
• Multiple Linear Regression:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
• Polynomial Linear Regression:
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \varepsilon_i$
• Linear Regression:
$\log_{10}(Y_i) = \beta_0 + \beta_1 X_i + \beta_2 \exp(X_i) + \varepsilon_i$
• Nonlinear Regression:
$Y_i = \beta_0 / (1 + \beta_1 \exp(-\beta_2 X_i)) + \varepsilon_i$
What matters is whether the model is linear or nonlinear in the parameters.
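The distinction is worth seeing concretely: the polynomial model above is still a *linear* regression because it is linear in the parameters, so it can be estimated by ordinary least squares on an augmented design matrix. The following is a minimal sketch; the simulated data and true coefficient values are illustrative assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 1.5 * x - 0.3 * x**2 + rng.normal(0, 1, size=100)  # assumed true model

# The polynomial model Y = b0 + b1*X + b2*X^2 + e is linear in b0, b1, b2,
# so OLS applies with design-matrix columns [1, x, x^2].
X = np.column_stack([np.ones_like(x), x, x**2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # estimates of (b0, b1, b2), close to (2.0, 1.5, -0.3)
```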
Cont…

The most frequently used regression models include:
• Simple regression model
• Multiple regression model
• Multivariate regression model
• Logit, probit, multinomial regressions, etc.
Cont…
Now let us examine simple linear regression in this chapter.

Econometric research or inquiry generally proceeds along the following lines/stages:
1. Specification of the model
2. Estimation of the model
3. Evaluation of the estimates
4. Evaluation of the forecasting power of the estimated model
Specification of the model
Starting with the postulated theoretical relationship among economic variables:

Let $Y_i = \alpha_0 + \alpha_1 X_i + u_i$  ... (1)

where $Y_i$ = dependent variable, $X_i$ = independent variable, and $u_i$ = disturbance term.
The simple linear regression model
• We consider the modelling between the dependent variable and one independent variable.
• When there is only one independent variable in the linear regression model, the model is generally termed a simple linear regression model.
• When there is more than one independent variable in the model, the linear model is termed a multiple linear regression model.
Cont…

What is a simple regression model?
A simple regression model is a statistical equation that characterizes the relationship between a dependent variable and only one independent variable.
• The linear model
• Consider a simple linear regression model
$y = \beta_0 + \beta_1 X + \varepsilon$
• where y is termed the dependent or study variable and X is termed the independent or explanatory variable.
• The terms $\beta_0$ and $\beta_1$ are the parameters of the model.
• The parameter $\beta_0$ is termed the intercept term, and
• the parameter $\beta_1$ is termed the slope parameter.
• These parameters are usually called regression coefficients.
Cont….
Using a specific mathematical expression for one variable: normally the explained variable is designated by y and the explanatory variable by x.
y = f(x) + ε
where y = production of maize and x = land size,
or y = sales and x = advertising expenditure.
Cont…
The variables involved in a regression model are of three kinds: observable variables, unobservable variables, and unknown parameters.

1. Observable Variables
These are the variables whose values are collected from the field through questionnaires, interviews, and other data collection mechanisms.
$y_i$ = the ith value of the dependent variable.
$x_i$ = the ith value of the independent variable.
Cont…

2. Unobservable Variables
These are the values that are determined from the observations and estimated values of the data set.
The random error term $\varepsilon_i$ for the ith member of the population is also called:
– The disturbance term
– The stochastic term
Cont…
The stochastic error term measures the residual variation in Y not explained by X.

This is akin to saying there is measurement error and our predictions/models will not be perfect.

The more X variables we add to a model, the lower the error of estimation.
Cont….
3. Unknown Parameters (or regression coefficients)
The regression coefficients are the values that will be estimated from the sample data on the dependent and independent variables.
Cont….
Why include the disturbance term ε?

The reason is that we cannot hope to capture every influence on an economic variable in the model, no matter how elaborate it is.

The name is given to it because it disturbs an otherwise stable relationship.
Cont….
Contributors to ε:
– Measurement errors
– Exclusion of important variables
– Simultaneity

In other words, why do we need to include the stochastic (random) component, for example in the consumption function?
— Omission of variables leads to a misspecification problem. For example, income is not the only determinant of consumption.
— There may be measurement error in collecting data.
— We may use poor proxy variables.
— The functional form may not be correct.
— There is randomness in human behavior.
Some details…
• The unobservable error component accounts for the failure of the data to lie on a straight line.
• It represents the difference between the true and observed realizations of y.
• There can be several reasons for such a difference, e.g., the effect of all omitted variables in the model, measurement error, etc.
• We assume that ε is an independent and identically distributed random variable with mean zero and constant variance $\sigma^2$.
• Later, we will additionally assume that ε is normally distributed.
• The independent variable is viewed as controlled by the experimenter, so it is considered non-stochastic, whereas y is viewed as a random variable with
$E(y) = \beta_0 + \beta_1 X$ and $Var(y) = \sigma^2$.
SIMPLE REGRESSION MODEL

[Figure: the line $Y = \beta_1 + \beta_2 X$ with points $Q_1, \ldots, Q_4$ on the line at $X_1, \ldots, X_4$; the intercept is $\beta_1$.]

If the relationship were an exact one, the observations would lie on a straight line and we would have no trouble obtaining accurate estimates of $\beta_1$ and $\beta_2$.
• Sometimes X can also be a random variable.
• In such a case, instead of the sample mean and sample variance of y, we consider the conditional mean of y given X = x (meaning the expected value of y at a given level of x):
$E(y \mid x) = \beta_0 + \beta_1 x$ and $Var(y \mid x) = \sigma^2$.
• Regression is the estimation or prediction of the average value of a dependent variable on the basis of fixed values of other variables.
• In regression, we have a stochastic dependent variable and a non-stochastic (fixed) independent variable.
[Figure: the line $Y = \beta_1 + \beta_2 X$ with fitted points $Q_1, \ldots, Q_4$ on the line and actual observations $P_1, \ldots, P_4$ off the line at $X_1, \ldots, X_4$.]

In practice, most economic relationships are not exact and the actual values of Y are different from those corresponding to the straight line.
SIMPLE REGRESSION MODEL

[Figure: as above, with the disturbance term $u_1$ shown as the vertical distance between $P_1$ and $Q_1$; the intercept is $\beta_1$ and the line height at $X_1$ is $\beta_1 + \beta_2 X_1$.]

Each value of Y thus has a non-random component, $\beta_1 + \beta_2 X$, and a random component, u. The first observation has been decomposed into these two components.
Simple Linear Regression Model (continued)

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

[Figure: for a given $X_i$, the observed value of Y differs from the predicted value on the line by the random error $\varepsilon_i$; the line has intercept $\beta_0$ and slope $\beta_1$.]
Simple Linear Regression Equation (Prediction Line)

The simple linear regression equation provides an estimate of the population regression line:

$\hat{Y}_i = b_0 + b_1 X_i$

where $\hat{Y}_i$ is the estimated (or predicted) Y value for observation i, $b_0$ is the estimate of the regression intercept, $b_1$ is the estimate of the regression slope, and $X_i$ is the value of X for observation i.


• The parameters $\beta_0$, $\beta_1$, and $\sigma^2$ are generally unknown in practice and ε is unobserved.
• The determination of the statistical/econometric model depends on the estimation of $\beta_0$, $\beta_1$, and $\sigma^2$.
• In order to know the values of these parameters, n pairs of observations $(X_i, y_i)$ ($i = 1, \ldots, n$) on (X, y) are observed/collected and used to estimate these unknown parameters.
• Various methods of estimation can be used to determine the estimates of the parameters.
• Among them, the methods of least squares and maximum likelihood are the most popular.
3.2. Ordinary Least Squares Method (OLS) and Classical Assumptions
• There are two major ways of estimating regression functions:
• the ordinary least squares (OLS) method and the maximum likelihood (MLH) method.
• Both methods are broadly similar in their application to estimation.
• The ordinary least squares method is the easiest and most commonly used, as opposed to the maximum likelihood method, which is limited by its assumptions.
Cont…
• For instance, the MLH method is valid only for large samples, as opposed to the OLS method, which can be applied to smaller samples.
• Owing to this merit, our discussion mainly focuses on ordinary least squares (OLS).
• The ordinary least squares (OLS) method of estimating the parameters or regression function is about finding or estimating the values of the parameters ($\alpha_0, \alpha_1$) of the simple linear regression function given below for which the sum of squared errors is smallest.
Cont…
Estimation
The sample regression line is given as:

$Y_i = \hat{\alpha}_0 + \hat{\alpha}_1 X_i + \hat{\varepsilon}_i$  ... (2)

Ordinary Least Squares (OLS) method: determines the best-fitting straight line as the line that minimizes the sum of squares (SS) of the differences between the observations, $Y_i$, and the fitted values, $\hat{Y}_i$.
Cont…
The difference between the observed value and the fitted value is known as the error (or residual).

Mathematically, the error term is expressed as

$\hat{\varepsilon}_i = Y_i - \hat{Y}_i$
Ordinary Least Squares (OLS)
OLS is the technique used to estimate the line that minimizes the error:
the difference between the predicted and the actual values of Y.
Cont…
The ordinary least squares method chooses estimates of the parameters ($\alpha_i$) by minimizing the sum of squared differences between the actual $y_i$'s and the estimated $\hat{y}_i$'s.

Before estimating the parameters, let us look at the OLS assumptions.
(i) Classical Assumptions of OLS

1. The error terms or disturbance terms $u_i$ are not correlated.
This means that there is no systematic variation or relation among the values of the error terms ($u_i$ and $u_j$); thus, $Cov(u_i, u_j) = 0$.
The value the error term assumes in one period does not depend on the value it assumed in any other period.
This assumption is known as the assumption of no autocorrelation or non-serial correlation.
If the error terms are correlated, an autocorrelation problem arises.


Classical Assumptions of OLS

2. The disturbance terms $u_i$ have zero mean.
• The deviations of the values of some of the disturbance terms are negative, some are zero, and some are positive, and the sum or the average is zero.
• This is given by the following identity:

$E(u_i) = \frac{\sum u_i}{n} = 0$

Multiplying both sides by the sample size n, we obtain $\sum u_i = 0$.
Classical Assumptions of OLS

• If this condition is not met, then the position of the regression function (or curve) will not be where it is supposed to be.
• This results in an upward shift (if the mean of the error or residual term is positive) or a downward shift (if the mean of the error or residual term is negative) in the regression function.
• The estimated model will be biased and the regression function will shift; for instance, if $E(u_i) > 0$ (positive), the function shifts upward.
Figure 2: Regression function/curve if the mean of the error term is not zero. [Figure not reproduced.]
3. The disturbance terms have constant variance in each period.
• This assumption is known as the assumption of homoscedasticity.
• The constant variance itself is called homoscedastic variance.
• If this condition is not fulfilled, i.e. if the variance of the error terms varies as the sample size changes or as the value of the explanatory variables changes, this leads to a heteroscedasticity problem.
Classical Assumptions of OLS

4. The explanatory variable $X_i$ and the disturbance terms $u_i$ are uncorrelated or independent.
This means there is no correlation between the random error term and the explanatory variable.
If two variables are unrelated, their covariance is zero.
Classical Assumptions of OLS

5. The explanatory variable $X_i$ is fixed in repeated samples.
Each value of $X_i$ does not vary, for instance owing to a change in sample size.
This means the explanatory variables are non-random and hence distribution-free.
Classical Assumptions of OLS

6. Linearity of the model in parameters.
The classical theory assumes that the model should be linear in the parameters, regardless of whether the explanatory and dependent variables are linear or not.
This is because if the parameters are non-linear they are difficult to estimate, since their values are not known and only data on the dependent and independent variables are given.
Classical Assumptions of OLS

• Example 1. A model such as $Y = \alpha_0 + \alpha_1 X + u$ is linear in both the parameters and the variables, so it satisfies the assumption.
• Example 2. A model such as $Y = \alpha_0 + \alpha_1 X^2 + u$ is linear only in the parameters. Since the classical assumptions concern the parameters, this model also satisfies the assumption.
• What is important is transforming the data as required.
Classical Assumptions of OLS

7. Normality assumption
• The disturbance term $u_i$ is assumed to have a normal distribution with zero mean and a constant variance:

$u_i \sim N(0, \sigma^2)$

• This assumption is a combination of the zero-mean-of-error-term assumption and the homoscedasticity assumption.
• This assumption, or combination of assumptions, is used in testing hypotheses about the significance of parameters.
• It is also useful in both estimating parameters and testing their significance.
Classical Assumptions of OLS

8. Independence: the observations on the explanatory variables, the x's, are statistically independent of one another.
• Mathematically, it means the covariance between any two observations is zero.
• Meaning, the x observations are independent of each other, denoted as $Cov(x_i, x_j) = 0$.
• However, it is not unusual for there to be some association between the independent variables.
• This is to say that low correlations do not lead to inconsistency of the parameter estimates.
Classical Assumptions of OLS

A violation of the independence assumption ($Cov(x_i, x_j) = 0$) indicates that there is a multicollinearity problem among the explanatory variables, which leads to a very high value of the coefficient of determination and inconsistent parameter estimates.
• We can now use the above assumptions to derive the following basic concepts.

A. The dependent variable $Y_i$ is normally distributed.
Con’t…
Normal distribution: for any fixed explanatory value x, the response y has a normal distribution.
• Generally, the observations are normally distributed if, when graphed, a bell-shaped (normal) curve appears, centered at zero mean error.
• A violation of this assumption occurs when there are outliers in the data set, and it leads to problems of wider confidence intervals and wrong hypothesis tests.
Con’t…
Homoskedasticity (or constant variance): the variance of the dependent variable is the same across independent observations or explanatory variables.
• There exists a constant variance for the given regression model.
• Mathematically, it means the variance of the response variable does not vary across observations.
• It is denoted as $Var(y) = \sigma^2$ if y is the dependent or response variable.
Con’t…
B.
Existence: for any fixed value of the independent variable x, the dependent variable y is a random variable with a certain probability distribution, having finite mean and variance. A violation of this assumption may indicate that there is no relationship between the variables involved.
Continuity: the dependent variable is a continuous random variable, whereas values of the independent variable are fixed; they can take continuous or discrete values. Caution must be taken that if the dependent variable is not continuous, then other types of regression models such as probit, logit, tobit, etc. should be used accordingly.
Cont…
(ii) Deriving the Ordinary Least Squares Estimates

• Derivation of the normal equations

Let $Y_i = \alpha_0 + \alpha_1 X_i + u_i$  ... (3)

• The first normal equation is obtained as follows.
Con’t…
• Sum equation (3) over all observations:

$\sum Y_i = \sum (\alpha_0 + \alpha_1 X_i + u_i) = n\alpha_0 + \alpha_1 \sum X_i + \sum u_i$

• Divide by n:

$\frac{\sum Y_i}{n} = \alpha_0 + \alpha_1 \frac{\sum X_i}{n} + \frac{\sum u_i}{n}$

• Then impose $\frac{\sum u_i}{n} = 0$ (by assumption), which gives the first normal equation:

$\bar{Y} = \hat{\alpha}_0 + \hat{\alpha}_1 \bar{X}$  ... (4)
Con’t…
• The second normal equation:
• Now returning to the model equation and multiplying both sides by $X_i$ gives us

$X_i Y_i = \hat{\alpha}_0 X_i + \hat{\alpha}_1 X_i^2 + \hat{u}_i X_i$

• and summing over all observations:

$\sum X_i Y_i = \hat{\alpha}_0 \sum X_i + \hat{\alpha}_1 \sum X_i^2 + \sum \hat{u}_i X_i$
Cont…
• If we divide by n we obtain

$\frac{\sum X_i Y_i}{n} = \hat{\alpha}_0 \bar{X} + \hat{\alpha}_1 \frac{\sum X_i^2}{n} + \frac{\sum \hat{u}_i X_i}{n}$

• If we now impose the condition $\sum \hat{u}_i X_i = 0$,
• our second normal equation becomes

$\frac{\sum X_i Y_i}{n} = \hat{\alpha}_0 \bar{X} + \hat{\alpha}_1 \frac{\sum X_i^2}{n}$  ... (5)
Estimators
• We now have two equations and two unknowns, and we will try to find solutions for the unknowns.
• We are in a position to solve for the estimators, i.e. formulas for the two alphas:

$\frac{\sum X_i Y_i}{n} = \hat{\alpha}_0 \bar{X} + \hat{\alpha}_1 \frac{\sum X_i^2}{n}$

$\bar{Y} = \hat{\alpha}_0 + \hat{\alpha}_1 \bar{X}$

• Solving these equations:
Con’t…
• The two formulas are given as:

$\hat{\alpha}_0 = \bar{Y} - \hat{\alpha}_1 \bar{X}$

$\hat{\alpha}_1 = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}$

• These are the two estimators.
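As a concrete check of these formulas, here is a minimal sketch in Python that computes $\hat{\alpha}_0$ and $\hat{\alpha}_1$ directly from the closed-form expressions above; the sample data are illustrative assumptions, not from the slides:

```python
import numpy as np

def ols_simple(x, y):
    """Return (alpha0_hat, alpha1_hat) for y = alpha0 + alpha1*x + u,
    using the closed-form OLS formulas from the normal equations."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    x_bar, y_bar = x.mean(), y.mean()
    # alpha1_hat = (sum(XY) - n*Xbar*Ybar) / (sum(X^2) - n*Xbar^2)
    a1 = (np.sum(x * y) - n * x_bar * y_bar) / (np.sum(x**2) - n * x_bar**2)
    # alpha0_hat = Ybar - alpha1_hat * Xbar
    a0 = y_bar - a1 * x_bar
    return a0, a1

# Illustrative data (assumed for the sketch)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
a0, a1 = ols_simple(x, y)
print(f"alpha0_hat = {a0:.2f}, alpha1_hat = {a1:.2f}")  # 0.14 and 1.96
```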
Con’t…
• The OLS method selects the estimates $\hat{\alpha}_0, \hat{\alpha}_1, \hat{\alpha}_2, \ldots$ that minimize the sum of squared residuals, summed over all the sample data points.
• The estimators obtained in this way are known as the least squares estimators, and they have the unbiasedness and efficiency properties.
• Example 2.4: Given a sample of three pairs of observations on 'Y' (dependent variable) and 'X' (independent variable), find the simple linear regression function Y = f(X). [The data table and worked solution from the original slides are not reproduced here.]
Simple Linear Regression Example

• A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet).
• A random sample of 10 houses is selected.
• Dependent variable (Y) = house price in $1000s
• Independent variable (X) = square feet


Simple Linear Regression Example: Data

House Price in $1000s (Y)    Square Feet (X)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700


Simple Linear Regression Example: Excel Output

Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10

The regression equation is:
house price = 98.24833 + 0.10977 (square feet)

ANOVA
             df    SS           MS           F         Significance F
Regression    1    18934.9348   18934.9348   11.0848   0.01039
Residual      8    13665.5652   1708.1957
Total         9    32600.5000

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580
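This output can be reproduced in Python; the following is a minimal sketch using numpy and statsmodels (assuming statsmodels is installed) on the ten observations above:

```python
import numpy as np
import statsmodels.api as sm

price = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255])          # Y, in $1000s
sqft = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700])  # X

# OLS with an intercept: house price = b0 + b1 * square feet
model = sm.OLS(price, sm.add_constant(sqft)).fit()
print(model.params)    # approx [98.248, 0.110] -> b0, b1
print(model.rsquared)  # approx 0.581
```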


Simple Linear Regression Example: Interpretation of b0

house price = 98.24833 + 0.10977 (square feet)

• b0 is the estimated mean value of Y when the value of X is zero (if X = 0 is in the range of observed X values).
• Because a house cannot have a square footage of 0, b0 has no practical application here.


Simple Linear Regression Example: Interpreting b1

house price = 98.24833 + 0.10977 (square feet)

• b1 estimates the change in the mean value of Y as a result of a one-unit increase in X.
• Here, b1 = 0.10977 tells us that the mean value of a house increases by 0.10977 × ($1000) = $109.77, on average, for each additional square foot of size.


Simple Linear Regression Example: Making Predictions

Predict the price for a house with 2000 square feet:

house price = 98.25 + 0.1098 (sq. ft.)
            = 98.25 + 0.1098 (2000)
            = 317.85

The predicted price for a house with 2000 square feet is 317.85 × ($1000s) = $317,850.
Mean and Variance of Parameter Estimates

The formulas for the mean and variance of the respective parameter estimates and the error term are the standard OLS results:

$E(\hat{\alpha}_0) = \alpha_0, \qquad E(\hat{\alpha}_1) = \alpha_1$

$Var(\hat{\alpha}_1) = \frac{\sigma^2}{\sum (X_i - \bar{X})^2}, \qquad Var(\hat{\alpha}_0) = \frac{\sigma^2 \sum X_i^2}{n \sum (X_i - \bar{X})^2}, \qquad \hat{\sigma}^2 = \frac{\sum \hat{u}_i^2}{n - 2}$
3.3. Evaluation of the estimates

• This stage consists of deciding whether the estimates of the parameters are theoretically meaningful and statistically satisfactory.
• This stage enables the econometrician to evaluate the results of the calculations and determine the reliability of the results.
Con’t…
• Economic a priori criteria: these criteria are determined by economic theory and refer to the size and sign of the parameters of economic relationships.
• Statistical criteria (first-order tests): these are determined by statistical theory and aim at evaluating the statistical reliability of the estimates of the parameters of the model. The correlation coefficient test, standard error test, t-test, F-test, and R²-test are some of the most commonly used statistical tests.
Con’t…
• Econometric criteria (second-order tests):
– These are set by the theory of econometrics and aim at investigating whether the assumptions of the econometric method employed are satisfied in any particular case.
– The econometric criteria serve as second-order tests (tests of the statistical tests), i.e. they determine the reliability of the statistical criteria.
Con’t….
• They help us establish whether the estimates have the desirable properties of unbiasedness, consistency, etc.
• Econometric criteria aim at detecting the violation or validity of the assumptions of the various econometric techniques.
Measurement of the explanatory power of the regression model

• We would like to know the explanatory power of the regression model.
• That is, how much of the variation in Y is due to its relationship to X?
• We need a measure of the strength of the relationship.
Quality of straight-line fit
• How do you test whether the fit (or the estimates) is good?
• How do you test the validity of a model?
• What qualifies a model as adequately representing the data?
Con’t…
• What is known as the 'Test of Goodness of Fit' determines whether a regression model is valid or adequately fits the data under investigation.
• The better the fit, the higher the share of the variation in the dependent variable explained by the estimated regression equation.
• The total variation in the dependent variable, y, is equal to the explained variation in the dependent variable plus the residual variation.
Con’t…
• Mathematically, this decomposition is formulated as

$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$

• The difference between the Total Sum of Squares (TSS) and the Regression Sum of Squares (RSS) is the Error Sum of Squares (ESS), which is expressed as:
ESS = TSS – RSS
• By dividing both sides by TSS and decomposing the response and explanatory variables, it is possible to calculate the coefficient of determination, R², as follows:
Con’t…
• Thus the coefficient of determination is given by

$R^2 = \frac{RSS}{TSS} = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} = 1 - \frac{ESS}{TSS} = 1 - \frac{\sum (Y_i - \hat{Y}_i)^2}{\sum (Y_i - \bar{Y})^2}$

• This indicates the proportion of the variation in the response (dependent) variable, y, that is explained by the independent variables in the model.
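For a concrete check, the sketch below computes TSS, RSS, ESS, and R² for the house price regression fitted earlier, using the coefficients from the Excel output above:

```python
import numpy as np

price = np.array([245, 312, 279, 308, 199, 219, 405, 324, 319, 255])
sqft = np.array([1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700])

y_hat = 98.24833 + 0.10977 * sqft           # fitted values from the estimated line
tss = np.sum((price - price.mean())**2)     # total sum of squares
rss = np.sum((y_hat - price.mean())**2)     # regression (explained) sum of squares
ess = np.sum((price - y_hat)**2)            # error (residual) sum of squares

print(tss, rss, ess)   # approx 32600.5, 18934.9, 13665.6 (matches the ANOVA table)
print(1 - ess / tss)   # R^2, approx 0.581
```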


Con’t…
There are two distinct cases to consider as regards R².
Case I. The regression equation explains all the variation in Y (all the observations fall on the fitted line), making ESS = 0. In this case, TSS = RSS and hence R² = 1. This implies that $Y_i$ is a perfect linear combination of $X_i$.
Case II. The regression equation explains nothing. In this case, ESS = TSS, which implies that RSS = 0 and R² = 0. Since RSS = 0, we must have $\hat{Y}_i = \bar{Y}$ for all i.
Con’t…
What does the coefficient of determination not measure?
• Be aware of the misconceptions about the coefficient of determination, R²:
– R² is not a measure of the magnitude of the slope of the regression line.
– R² is not a complete measure of the overall fitness of the straight-line model.
– R² is not a verification of the appropriateness or correct specification of a fitted model.
Hypothesis Testing of OLS Estimates
• After estimation of the parameters, there are important issues to be considered by the researcher.
• We have to know to what extent our estimates are reliable enough and acceptable for further purposes.
• That means we have to evaluate the degree of representativeness of the estimates of the true population parameters.
• Simply put, a model must be tested for its significance before it can be used for any other purpose.
• In this subsection we evaluate the reliability of the estimated model using the procedures explained above.
The coefficient of determination (R²)

• measures the amount of the total variation of the dependent variable that is explained by the explanatory variable in the model.
• The total variation of the dependent variable is split into two additive components: a part explained by the model and a part represented by the random term.
Hypothesis Testing of OLS Estimates

• The significance of a model can be seen in terms of the amount of variation in the dependent variable that it explains and the significance of the regression coefficients.
• Different tests are available to test the statistical reliability of the parameter estimates. The following are the common ones:
• The standard error test
• The standard normal test
• The Student's t-test
Hypothesis Testing of OLS Estimates
1. The Standard Error Test
• This test first establishes the two hypotheses to be tested, commonly known as the null and alternative hypotheses:
• H0: βi = 0
• H1: βi ≠ 0
• The standard error test is outlined as follows:
1. Compute the standard deviations (standard errors) of the parameter estimates; the standard error is the positive square root of the variance of the estimate.
Hypothesis Testing of OLS Estimates
2. Compare the standard errors of the estimates with the numerical values of the estimates and make a decision.

A) If the standard error of the estimate is less than half of the numerical value of the estimate, i.e. $SE(\hat{\beta}_i) < |\hat{\beta}_i|/2$, reject the null hypothesis and conclude that the estimate is statistically significant.
B) If the standard error of the estimate is greater than half of the numerical value of the estimate, i.e. $SE(\hat{\beta}_i) > |\hat{\beta}_i|/2$, accept the null hypothesis and conclude that the estimate is not statistically significant (not statistically reliable).
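Applied to the house price regression above, this rule of thumb is easy to check; the sketch below uses the slope estimate and its standard error from the Excel output:

```python
def standard_error_test(estimate, std_error):
    """Rule-of-thumb standard error test: significant if SE < |estimate| / 2."""
    return "significant" if std_error < abs(estimate) / 2 else "not significant"

# Slope of the house price regression: b1 = 0.10977, SE(b1) = 0.03297
print(standard_error_test(0.10977, 0.03297))  # 'significant', since 0.03297 < 0.0549
```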
Hypothesis Testing of OLS Estimates

The Student t-Test

The t-test procedure is outlined as follows:
• Set up the hypotheses. The hypotheses for testing a given regression coefficient are H0: βi = 0 against H1: βi ≠ 0.
• Determine the level of significance for carrying out the test. We usually use a 5% level of significance in applied econometric research.
• Determine the tabulated value of t from the table with n − k degrees of freedom, where k is the number of parameters estimated.
• Determine the calculated value of t. The test statistic (using the t-test) is $t = \hat{\beta}_i / SE(\hat{\beta}_i)$.
• The test rule or decision is given as follows.
• Step 2: Choose the level of significance. The level of significance is the probability of making a 'wrong' decision, i.e. the probability of rejecting the hypothesis when it is actually true (the probability of committing a Type I error).
• It is customary in econometric research to choose the 5% or the 1% level of significance. This means that in making our decision we allow (tolerate) being 'wrong' five times out of a hundred, i.e. rejecting the hypothesis when it is actually true.
Inferences About the Slope

• The standard error of the regression slope coefficient (b1) is estimated by

$S_{b_1} = \frac{S_{YX}}{\sqrt{SSX}} = \frac{S_{YX}}{\sqrt{\sum (X_i - \bar{X})^2}}$

where $S_{b_1}$ is the estimate of the standard error of the slope and $S_{YX} = \sqrt{\frac{SSE}{n-2}}$ is the standard error of the estimate (SSE being the error sum of squares).


Inferences About the Slope: t Test

• t test for a population slope: is there a linear relationship between X and Y?
• Null and alternative hypotheses:
H0: β1 = 0 (no linear relationship)
H1: β1 ≠ 0 (a linear relationship does exist)
• Test statistic:

$t_{STAT} = \frac{b_1 - \beta_1}{S_{b_1}}, \qquad d.f. = n - 2$

where b1 = regression slope coefficient, β1 = hypothesized slope, and $S_{b_1}$ = standard error of the slope.


Inferences About the Slope: t Test Example

Estimated regression equation (from the house price and square feet data shown earlier):

house price = 98.25 + 0.1098 (sq. ft.)

The slope of this model is 0.1098.
Is there a relationship between the square footage of the house and its sales price?


Inferences About the Slope: t Test Example

H0: β1 = 0;  H1: β1 ≠ 0

From Excel output:
              Coefficients   Standard Error   t Stat    P-value
Intercept     98.24833       58.03348         1.69296   0.12892
Square Feet   0.10977        0.03297          3.32938   0.01039

From Minitab output:
Predictor     Coef       SE Coef    T      P
Constant      98.25      58.03      1.69   0.129
Square Feet   0.10977    0.03297    3.33   0.010

$t_{STAT} = \frac{b_1 - \beta_1}{S_{b_1}} = \frac{0.10977 - 0}{0.03297} = 3.32938$


Inferences About the Slope: t Test Example

H0: β1 = 0;  H1: β1 ≠ 0

Test statistic: tSTAT = 3.329, with d.f. = 10 − 2 = 8.
At α/2 = 0.025, the critical values are ±2.3060; H0 is rejected if tSTAT < −2.3060 or tSTAT > 2.3060.

Decision: Reject H0, since 3.329 > 2.3060.
There is sufficient evidence that square footage affects house price.


Inferences About the Slope: t Test Example (p-value approach)

H0: β1 = 0;  H1: β1 ≠ 0

From the Excel output, the p-value for Square Feet is 0.01039; the Minitab output reports p = 0.010.

Decision: Reject H0, since p-value < α (0.010 < 0.05).
There is sufficient evidence that square footage affects house price.
• (iii) Confidence intervals
• In order to define how close the estimate is to the true parameter, we construct a confidence interval for the true parameter;
• in other words, we establish limiting values around the estimate within which the true parameter is expected to lie with a certain "degree of confidence".
• In this respect we say that, with a given probability, the population parameter will be within the defined confidence interval (confidence limits).
• It is customary in econometrics to choose the 95% confidence level.
• This means that in repeated sampling the confidence limits, computed from the sample, would include the true population parameter in 95% of the cases.
• In the other 5% of the cases the population parameter will fall outside the confidence interval.
Confidence Interval Estimate for the Slope

Confidence interval estimate of the slope:

$b_1 \pm t_{\alpha/2} S_{b_1}, \qquad d.f. = n - 2$

Excel printout for house prices:
              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580

At the 95% level of confidence, the confidence interval for the slope is (0.0337, 0.1858).
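The interval can be reproduced directly from the formula; here is a minimal sketch using scipy for the t critical value (the coefficient and standard error come from the output above):

```python
from scipy import stats

b1, s_b1, n = 0.10977, 0.03297, 10
t_crit = stats.t.ppf(0.975, df=n - 2)  # two-sided 95%, d.f. = 8 -> about 2.306

lower = b1 - t_crit * s_b1
upper = b1 + t_crit * s_b1
print(f"95% CI for the slope: ({lower:.4f}, {upper:.4f})")  # about (0.0337, 0.1858)
```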


Confidence Interval Estimate for the Slope (continued)

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
Intercept     98.24833       58.03348         1.69296   0.12892   -35.57720   232.07386
Square Feet   0.10977        0.03297          3.32938   0.01039   0.03374     0.18580

Since the units of the house price variable are $1000s, we are 95% confident that the mean impact on sales price is between $33.74 and $185.80 per square foot of house size.

This 95% confidence interval does not include 0.
Conclusion: there is a significant relationship between house price and square feet at the 0.05 level of significance.


Confidence Interval Estimate for the Slope from Minitab (continued)

Minitab does not automatically calculate a confidence interval for the slope but provides the quantities necessary to use the confidence interval formula $b_1 \pm t_{\alpha/2} S_{b_1}$:

Predictor     Coef       SE Coef    T      P
Constant      98.25      58.03      1.69   0.129
Square Feet   0.10977    0.03297    3.33   0.010


Evaluation of Estimators
So far we have established formulas for the estimation of α0 and α1.
Our next question is: are these estimators good estimators of the parameters?
We shall now show that $\hat{\alpha}_0$ and $\hat{\alpha}_1$ are good estimators of α0 and α1.
To be good estimators they have to satisfy, in particular, the following conditions:
Con’t…
• $E(\hat{\alpha}_0) = \alpha_0$ and $E(\hat{\alpha}_1) = \alpha_1$. This is the unbiasedness property of the estimators of α0 and α1.
• The variances of $\hat{\alpha}_0$ and $\hat{\alpha}_1$ are relatively small compared with the variances of all other estimators of α0 and α1. This is known as the efficiency property of $\hat{\alpha}_0$ and $\hat{\alpha}_1$.


Con’t…
Unbiasedness: an estimator is unbiased if the mean of its sampling distribution (i.e., the expected value of the estimate) equals the true or population parameter.
• Mathematically, it is given as $E(\hat{w}) = w$, where w is any population parameter, such as a coefficient βi in a model, a mean (μ), a variance (σ²), or a proportion (ρ), and $\hat{w}$ is its sample estimate.


Con’t…
Consistent estimator: two conditions are required for an estimator to be consistent.
• First, as the sample size increases, the estimator must approach the true (or population) parameter more and more closely (technically known as asymptotic unbiasedness).
• Second, as the sample size approaches infinity in the limit (basically as the sample size reaches the population size), the sampling distribution of the estimator must collapse to a vertical line of height 1 (maximum probability value of 1) above the value of the true parameter.


Con’t…
• This can be observed on the bell-shaped or normal curve.
• As the sample size increases, the variability among the observations becomes less and less, until finally no variability exists between the sample and the population, since in the limit the whole population becomes the sample itself.
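These two properties can be illustrated by simulation. The sketch below (with assumed true parameters α0 = 2 and α1 = 0.5, chosen only for illustration) repeatedly draws samples from a simple regression model; the average of the slope estimates lands close to the true slope (unbiasedness), and its spread shrinks as the sample size grows (consistency):

```python
import numpy as np

rng = np.random.default_rng(42)
alpha0, alpha1 = 2.0, 0.5  # assumed true parameters for the simulation

def slope_estimates(n, reps=2000):
    """OLS slope estimates from `reps` simulated samples of size n."""
    estimates = np.empty(reps)
    for r in range(reps):
        x = rng.uniform(0, 10, size=n)
        y = alpha0 + alpha1 * x + rng.normal(0, 1, size=n)
        x_bar, y_bar = x.mean(), y.mean()
        estimates[r] = (np.sum(x * y) - n * x_bar * y_bar) / (np.sum(x**2) - n * x_bar**2)
    return estimates

for n in (20, 200):
    est = slope_estimates(n)
    # Mean near 0.5 illustrates unbiasedness; shrinking std illustrates consistency.
    print(n, est.mean().round(4), est.std().round(4))
```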


Properties of OLS Estimators

• The ideal or optimum properties that the OLS estimates possess may be summarized by the well-known Gauss-Markov theorem.
• According to this theorem, under the basic assumptions of the classical linear regression model, the least squares estimators are linear, unbiased, and have minimum variance (i.e. they are the best of all linear unbiased estimators).
• The theorem is sometimes referred to as the BLUE theorem, i.e. Best, Linear, Unbiased Estimator. An estimator is called BLUE if it is:
Properties of OLS Estimators

• Linear: a linear function of the random variable, such as the dependent variable Y.
• Unbiased: its average or expected value is equal to the true population parameter.
• Minimum variance: it has minimum variance in the class of linear and unbiased estimators. An unbiased estimator with the least variance is known as an efficient estimator.
• According to the Gauss-Markov theorem, the OLS estimators possess all the BLUE properties.
End of Chapter