Econometrics Slides

The Nature of Econometrics

and Economic Data

Chapter 1

Wooldridge: Introductory Econometrics:


A Modern Approach, 5e

© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
What is econometrics?

Econometrics = use of statistical methods to analyze economic data

Econometricians typically analyze nonexperimental data

Typical goals of econometric analysis

Estimating relationships between economic variables

Testing economic theories and hypotheses

Forecasting economic variables

Evaluating and implementing government and business policy

Steps in econometric analysis

1) Economic model (this step is often skipped)

2) Econometric model

Economic models

May be micro- or macromodels

Often use optimizing behaviour, equilibrium modeling, …

Establish relationships between economic variables

Examples: demand equations, pricing equations, …

Economic model of crime (Becker (1968))
Derives equation for criminal activity based on utility maximization

y = f(x1, x2, x3, x4, x5, x6, x7)

y: hours spent in criminal activities
x1: „wage“ of criminal activities
x2: wage for legal employment
x3: other income
x4: probability of getting caught
x5: probability of conviction if caught
x6: expected sentence
x7: age

Functional form of relationship not specified

Equation could have been postulated without economic modeling

Model of job training and worker productivity
What is effect of additional training on worker productivity?
Formal economic theory not really needed to derive equation:

wage = f(educ, exper, training)

wage: hourly wage
educ: years of formal education
exper: years of workforce experience
training: weeks spent in job training

Other factors may be relevant, but these are the most important (?)

Econometric model of criminal activity
The functional form has to be specified
Variables may have to be approximated by other quantities

crime = β0 + β1 wage_m + β2 othinc + β3 freqarr + β4 freqconv + β5 avgsen + β6 age + u

crime: measure of criminal activity
wage_m: wage for legal employment
othinc: other income
freqarr: frequency of prior arrests
freqconv: frequency of conviction
avgsen: average sentence length after conviction
age: age
u: unobserved determinants of criminal activity,
e.g. moral character, wage in criminal activity, family background, …

Econometric model of job training and worker productivity

wage = β0 + β1 educ + β2 exper + β3 training + u

wage: hourly wage
educ: years of formal education
exper: years of workforce experience
training: weeks spent in job training
u: unobserved determinants of the wage,
e.g. innate ability, quality of education, family background, …

Most of econometrics deals with the specification of the error

Econometric models may be used for hypothesis testing

For example, the parameter β3 represents the effect of training on the wage

How large is this effect? Is it different from zero?

Econometric analysis requires data

Different kinds of economic data sets

Cross-sectional data

Time series data

Pooled cross sections

Panel/Longitudinal data

Econometric methods depend on the nature of the data used

Use of inappropriate methods may lead to misleading results

Cross-sectional data sets

Sample of individuals, households, firms, cities, states, countries,

or other units of interest at a given point of time/in a given period

Cross-sectional observations are more or less independent

For example, pure random sampling from a population

Sometimes pure random sampling is violated, e.g. units refuse to

respond in surveys, or if sampling is characterized by clustering

Cross-sectional data typically encountered in applied microeconomics

Cross-sectional data set on wages and other characteristics

[Table: observation number, hourly wage, and further worker
characteristics, including indicator variables (1=yes, 0=no)]

Cross-sectional data on growth rates and country characteristics

[Table: growth rate of real per capita GDP, government consumption
as percentage of GDP, and adult secondary education rates]

Time series data
Observations of a variable or several variables over time

For example, stock prices, money supply, consumer price index,


gross domestic product, annual homicide rates, automobile sales, …

Time series observations are typically serially correlated

Ordering of observations conveys important information

Data frequency: daily, weekly, monthly, quarterly, annually, …

Typical features of time series: trends and seasonality

Typical applications: applied macroeconomics and finance

Time series data on minimum wages and related variables

[Table: average minimum wage for given year, average coverage rate,
unemployment rate, and gross national product]

Pooled cross sections
Two or more cross sections are combined in one data set

Cross sections are drawn independently of each other

Pooled cross sections often used to evaluate policy changes

Example:

• Evaluate effect of change in property taxes on house prices

• Random sample of house prices for the year 1993

• A new random sample of house prices for the year 1995

• Compare before/after (1993: before reform, 1995: after reform)
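A toy sketch of such a before/after comparison (not from the slides; all prices are invented) simply contrasts the mean price in the two independently drawn samples:

```python
# Hypothetical pooled cross sections: house prices (in $1000s) sampled
# independently in 1993 (before the reform) and 1995 (after the reform)
prices_1993 = [120, 135, 128, 140, 132]
prices_1995 = [150, 160, 155, 170, 158]

def mean(xs):
    return sum(xs) / len(xs)

# Naive before/after comparison of average prices; a real analysis would
# also control for house characteristics and general price trends
price_change = mean(prices_1995) - mean(prices_1993)
```
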

Pooled cross sections on housing prices

[Table: property tax, size of house in square feet, number of bathrooms;
first the observations before the reform, then those after the reform]

Panel or longitudinal data
The same cross-sectional units are followed over time

Panel data have a cross-sectional and a time series dimension

Panel data can be used to account for time-invariant unobservables

Panel data can be used to model lagged responses

Example:

• City crime statistics; each city is observed in two years

• Time-invariant unobserved city characteristics may be modeled

• Effect of police on crime rates may exhibit time lag

Two-year panel data on city crime statistics

[Table: each city has two time series observations, including the number
of police in 1986 and the number of police in 1990]

Causality and the notion of ceteris paribus

Definition of the causal effect of x on y:

„How does variable y change if variable x is changed,

but all other relevant factors are held constant“

Most economic questions are ceteris paribus questions

It is important to define which causal effect one is interested in

It is useful to describe how an experiment would have to be


designed to infer the causal effect in question

Causal effect of fertilizer on crop yield
„By how much will the production of soybeans increase if one
increases the amount of fertilizer applied to the ground“
Implicit assumption: all other factors that influence crop yield such
as quality of land, rainfall, presence of parasites etc. are held fixed
Experiment:
Choose several one-acre plots of land; randomly assign different
amounts of fertilizer to the different plots; compare yields
Experiment works because amount of fertilizer applied is unrelated
to other factors influencing crop yields
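The logic of this experiment can be illustrated with a tiny simulation (not from the slides; the true effect and all numbers are invented): because fertilizer amounts are assigned independently of the other factors, the difference in mean yields recovers the causal effect.

```python
import random

random.seed(42)

TRUE_EFFECT = 2.0  # invented: one extra unit of fertilizer adds 2 bushels

def simulate_plot(fertilizer):
    # Yield depends on fertilizer plus other factors (land quality,
    # rainfall, parasites, ...), here lumped into one random term
    other_factors = random.gauss(0, 5)
    return 30 + TRUE_EFFECT * fertilizer + other_factors

# Randomly assign 10 units of fertilizer to 2000 plots, none to 2000 others
treated = [simulate_plot(10) for _ in range(2000)]
control = [simulate_plot(0) for _ in range(2000)]

# Random assignment makes fertilizer unrelated to the other factors,
# so the difference in mean yields estimates the effect of 10 units (= 20)
diff_in_means = sum(treated) / len(treated) - sum(control) / len(control)
```
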

Measuring the return to education
„If a person is chosen from the population and given another
year of education, by how much will his or her wage increase? “
Implicit assumption: all other factors that influence wages such as
experience, family background, intelligence etc. are held fixed
Experiment:
Choose a group of people; randomly assign different amounts of
education to them (infeasible!); compare wage outcomes
Problem without random assignment: amount of education is related
to other factors that influence wages (e.g. intelligence)

Effect of law enforcement on city crime level
„If a city is randomly chosen and given ten additional police officers,
by how much would its crime rate fall? “
Alternatively: „If two cities are the same in all respects, except that
city A has ten more police officers, by how much would the two cities'
crime rates differ?“
Experiment:
Randomly assign number of police officers to a large number of cities
In reality, number of police officers will be determined by crime rate
(simultaneous determination of crime and number of police)

Effect of the minimum wage on unemployment
„By how much (if at all) will unemployment increase if the minimum
wage is increased by a certain amount (holding other things fixed)? “
Experiment:
Government randomly chooses minimum wage each year and
observes unemployment outcomes
Experiment will work because level of minimum wage is unrelated
to other factors determining unemployment
In reality, the level of the minimum wage will depend on political
and economic factors that also influence unemployment

Testing predictions of economic theories

Economic theories are not always stated in terms of causal effects

For example, the expectations hypothesis states that long term


interest rates equal compounded expected short term interest rates

An implication is that the interest rate of a three-month T-bill should
be equal to the expected interest rate for the first three months of a
six-month T-bill; this can be tested using econometric methods

The Simple
Regression Model

Chapter 2

Wooldridge: Introductory Econometrics:


A Modern Approach, 5e

Definition of the simple linear regression model

„Explains variable y in terms of variable x“

y = β0 + β1 x + u

β0: intercept
β1: slope parameter
y: dependent variable, explained variable, response variable, …
x: independent variable, explanatory variable, regressor, …
u: error term, disturbance, unobservables, …

Interpretation of the simple linear regression model

„Studies how y varies with changes in x:“

Δy = β1 Δx as long as Δu = 0

By how much does the dependent variable change if the independent
variable is increased by one unit? The interpretation is only correct if
all other things remain equal when the independent variable is
increased by one unit.

The simple linear regression model is rarely applicable in practice,
but its discussion is useful for pedagogical reasons

Example: Soybean yield and fertilizer

yield = β0 + β1 fertilizer + u

u: rainfall, land quality, presence of parasites, …
β1 measures the effect of fertilizer on yield,
holding all other factors fixed

Example: A simple wage equation

wage = β0 + β1 educ + u

u: labor force experience, tenure with current employer,
work ethic, intelligence, …
β1 measures the change in hourly wage given another year
of education, holding all other factors fixed

When is there a causal interpretation?
Conditional mean independence assumption

E(u|x) = E(u) = 0

The explanatory variable must not contain information
about the mean of the unobserved factors

Example: wage equation, where u includes e.g. intelligence, …

The conditional mean independence assumption is unlikely to hold, because
individuals with more education will also be more intelligent on average.

Population regression function (PRF)
The conditional mean independence assumption implies that

E(y|x) = β0 + β1 x

This means that the average value of the dependent variable
can be expressed as a linear function of the explanatory variable


Population regression function

[Figure: the line E(y|x) = β0 + β1 x; for individuals with x = x_i,
the average value of y is β0 + β1 x_i]

In order to estimate the regression model one needs data

A random sample of n observations

{(x_i , y_i): i = 1, …, n}

x_i: value of the explanatory variable of the i-th observation
y_i: value of the dependent variable of the i-th observation

Fit as good as possible a regression line through the data points:

[Figure: scatter of the data points with the fitted regression line
ŷ = β̂0 + β̂1 x; for example, the i-th data point (x_i , y_i)]

What does „as good as possible“ mean?
Regression residuals

û_i = y_i − ŷ_i = y_i − β̂0 − β̂1 x_i

Minimize sum of squared regression residuals

min Σ û_i²

Ordinary Least Squares (OLS) estimates

β̂1 = Σ (x_i − x̄)(y_i − ȳ) / Σ (x_i − x̄)² ,  β̂0 = ȳ − β̂1 x̄
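The OLS formulas for the slope and intercept can be sketched in a few lines of Python (not from the slides; the data are invented for illustration):

```python
# Minimal simple-regression OLS: slope and intercept from sample moments
def ols(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    # slope: sum of cross deviations over sum of squared x deviations
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
         sum((xi - xbar) ** 2 for xi in x)
    # intercept: the regression line passes through the sample means
    b0 = ybar - b1 * xbar
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = ols(x, y)  # roughly y = 0.05 + 1.99 x for these data
```
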

CEO Salary and return on equity

salary: salary in thousands of dollars; roe: return on equity of the CEO‘s firm

Fitted regression: salary^ = 963.191 + 18.501 roe

If the return on equity increases by 1 percent,
then salary is predicted to change by 18,501 $
Causal interpretation?


[Figure: fitted regression line ŷ = β̂0 + β̂1 x (depends on sample)
versus the unknown population regression line E(y|x) = β0 + β1 x]

Wage and education

wage: hourly wage in dollars; educ: years of education

Fitted regression: wage^ = −0.90 + 0.54 educ

In the sample, one more year of education was
associated with an increase in hourly wage by 0.54 $
Causal interpretation?

Voting outcomes and campaign expenditures (two parties)

voteA: percentage of vote for candidate A
shareA: percentage of campaign expenditures of candidate A

Fitted regression: voteA^ = 26.81 + 0.464 shareA

If candidate A‘s share of spending increases by one
percentage point, he or she receives 0.464 percentage
points more of the total vote
Causal interpretation?

Properties of OLS on any sample of data
Fitted values and residuals

ŷ_i = β̂0 + β̂1 x_i (fitted or predicted values)
û_i = y_i − ŷ_i (deviations from regression line = residuals)

Algebraic properties of OLS regression

Σ û_i = 0: deviations from the regression line sum up to zero
Σ x_i û_i = 0: correlation between deviations and regressors is zero
ȳ = β̂0 + β̂1 x̄: sample averages of y and x lie on the regression line
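These three properties hold on any sample and can be checked numerically (a sketch, not from the slides; the data are invented):

```python
# Verify the algebraic properties of an OLS fit on a toy data set
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 2.0, 5.0, 4.0, 6.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

sum_resid = sum(residuals)                                  # property 1: 0
sum_x_resid = sum(xi * ui for xi, ui in zip(x, residuals))  # property 2: 0
mean_on_line = b0 + b1 * xbar                               # property 3: ybar
```
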


For example, CEO number 12‘s salary was
526,023 $ lower than predicted using the
information on his firm‘s return on equity

Goodness-of-Fit

„How well does the explanatory variable explain the dependent variable?“

Measures of Variation

SST = Σ (y_i − ȳ)²: total sum of squares,
represents total variation in the dependent variable
SSE = Σ (ŷ_i − ȳ)²: explained sum of squares,
represents variation explained by the regression
SSR = Σ û_i²: residual sum of squares,
represents variation not explained by the regression

Decomposition of total variation

SST = SSE + SSR
(total variation = explained part + unexplained part)

Goodness-of-fit measure (R-squared)

R² = SSE/SST = 1 − SSR/SST

R-squared measures the fraction of the total variation
that is explained by the regression
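The decomposition and R² can be computed directly from a fitted line (a sketch, not from the slides; data invented):

```python
# SST = SSE + SSR and R-squared for a simple OLS fit
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 2.0, 5.0, 4.0, 6.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

sst = sum((yi - ybar) ** 2 for yi in y)               # total variation
sse = sum((fi - ybar) ** 2 for fi in yhat)            # explained variation
ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))  # unexplained variation
r_squared = sse / sst                                 # equals 1 - ssr/sst
```

For these invented data the decomposition gives SST = 10, SSE = 6.4, SSR = 3.6, so R² = 0.64.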

CEO Salary and return on equity

R² = 0.013: the regression explains only 1.3 %
of the total variation in salaries

Voting outcomes and campaign expenditures

R² = 0.856: the regression explains 85.6 % of the
total variation in election outcomes

Caution: A high R-squared does not necessarily mean that the
regression has a causal interpretation!

Incorporating nonlinearities: Semi-logarithmic form
Regression of log wages on years of education

log(wage) = β0 + β1 educ + u

log(wage): natural logarithm of wage

This changes the interpretation of the regression coefficient:

%Δwage ≈ 100 · β1 Δeduc
(percentage change of wage if years of education
are increased by one year)

Fitted regression

log(wage)^ = 0.584 + 0.083 educ

The wage increases by 8.3 % for
every additional year of education
(= return to education)

For example: the growth rate of the wage
is 8.3 % per year of education
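Note that 8.3 % is the approximation 100·β1; the exact percentage change implied by the log form is exp(β1) − 1. A quick check (not from the slides):

```python
import math

b1 = 0.083  # estimated slope from the log-wage regression above

approx_pct = 100 * b1                 # approximate % change per year of education
exact_pct = 100 * (math.exp(b1) - 1)  # exact % change implied by the log form
# For small coefficients the two are close (8.3 % vs. roughly 8.65 %)
```
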

Incorporating nonlinearities: Log-logarithmic form
CEO salary and firm sales

log(salary) = β0 + β1 log(sales) + u

log(salary): natural logarithm of CEO salary
log(sales): natural logarithm of his/her firm‘s sales

This changes the interpretation of the regression coefficient:

%Δsalary ≈ β1 %Δsales
(percentage change of salary if sales increase by 1 %)

Logarithmic changes are always percentage changes

CEO salary and firm sales: fitted regression

log(salary)^ = 4.822 + 0.257 log(sales)

For example: + 1 % sales → + 0.257 % salary

The log-log form postulates a constant elasticity model,
whereas the semi-log form assumes a semi-elasticity model

Expected values and variances of the OLS estimators
The estimated regression coefficients are random variables
because they are calculated from a random sample

Data is random and depends on particular sample that has been drawn

The question is what the estimators will estimate on average


and how large their variability in repeated samples is

Standard assumptions for the linear regression model

Assumption SLR.1 (Linear in parameters)

y = β0 + β1 x + u

In the population, the relationship
between y and x is linear

Assumption SLR.2 (Random sampling)

{(x_i , y_i): i = 1, …, n}

The data is a random sample
drawn from the population

y_i = β0 + β1 x_i + u_i

Each data point therefore follows
the population equation

Discussion of random sampling: Wage and education
The population consists, for example, of all workers of country A
In the population, a linear relationship between wages (or log wages)
and years of education holds
Draw completely randomly a worker from the population
The wage and the years of education of the worker drawn are random
because one does not know beforehand which worker is drawn

Throw back the worker into the population and repeat the random draw n times

The wages and years of education of the sampled workers are used to
estimate the linear relationship between wages and education


[Figure: the values (x_i , y_i) drawn for the i-th worker; the implied
deviation from the population relationship for the i-th worker is
u_i = y_i − β0 − β1 x_i]

Assumptions for the linear regression model (cont.)

Assumption SLR.3 (Sample variation in explanatory variable)

Σ (x_i − x̄)² > 0

The values of the explanatory variables are not all
the same (otherwise it would be impossible to study
how different values of the explanatory variable
lead to different values of the dependent variable)

Assumption SLR.4 (Zero conditional mean)

E(u_i | x_i) = 0

The value of the explanatory variable must
contain no information about the mean of
the unobserved factors

Theorem 2.1 (Unbiasedness of OLS)

Under assumptions SLR.1 – SLR.4: E(β̂0) = β0 and E(β̂1) = β1

Interpretation of unbiasedness
The estimated coefficients may be smaller or larger, depending on
the sample that is the result of a random draw
However, on average, they will be equal to the values that
characterize the true relationship between y and x in the population
„On average“ means if sampling was repeated, i.e. if drawing the
random sample and doing the estimation was repeated many times
In a given sample, estimates may differ considerably from true values
Variances of the OLS estimators
Depending on the sample, the estimates will be nearer or farther
away from the true population values
How far can we expect our estimates to be away from the true
population values on average (= sampling variability)?
Sampling variability is measured by the estimator‘s variances

Assumption SLR.5 (Homoskedasticity)

Var(u_i | x_i) = σ²

The value of the explanatory variable must
contain no information about the variability
of the unobserved factors

Graphical illustration of homoskedasticity

[Figure: the variability of the unobserved influences does not depend
on the value of the explanatory variable]

An example for heteroskedasticity: Wage and education

[Figure: the variance of the unobserved determinants of wages
increases with the level of education]

Theorem 2.2 (Variances of OLS estimators)

Under assumptions SLR.1 – SLR.5:

Var(β̂1) = σ² / Σ (x_i − x̄)²
Var(β̂0) = σ² n⁻¹ Σ x_i² / Σ (x_i − x̄)²

Conclusion:
The sampling variability of the estimated regression coefficients will be
the higher, the larger the variability of the unobserved factors, and the
lower, the higher the variation in the explanatory variable
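The conclusion can be read off the slope-variance formula directly (a numeric sketch, not from the slides; σ² and the x values are invented):

```python
# Var(b1) = sigma^2 / sum_i (x_i - xbar)^2:
# more error variance -> less precision; more x variation -> more precision
def slope_variance(x, sigma2):
    xbar = sum(x) / len(x)
    return sigma2 / sum((xi - xbar) ** 2 for xi in x)

sigma2 = 4.0  # invented error variance
x_small_spread = [4.0, 4.5, 5.0, 5.5, 6.0]   # little variation in x
x_large_spread = [0.0, 2.5, 5.0, 7.5, 10.0]  # much variation in x

v_small = slope_variance(x_small_spread, sigma2)  # 4 / 2.5  = 1.6
v_large = slope_variance(x_large_spread, sigma2)  # 4 / 62.5 = 0.064
```
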

Estimating the error variance

The variance of u does not depend on x,


i.e. is equal to the unconditional variance

One could estimate the variance of the


errors by calculating the variance of the
residuals in the sample; unfortunately
this estimate would be biased

An unbiased estimate of the error variance can be obtained by


substracting the number of estimated regression coefficients
from the number of observations

Theorem 2.3 (Unbiasedness of the error variance)

Under assumptions SLR.1 – SLR.5: E(σ̂²) = σ²

Calculation of standard errors for regression coefficients

se(β̂1) = σ̂ / √( Σ (x_i − x̄)² )  (plug in σ̂ for the unknown σ)

The estimated standard deviations of the regression coefficients are called „standard
errors“. They measure how precisely the regression coefficients are estimated.
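Putting the error-variance estimate and the slope-variance formula together gives the standard error; a sketch (not from the slides; data invented):

```python
import math

# OLS slope plus its standard error, using sigma^2_hat = SSR / (n - 2)
def ols_with_se(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    residuals = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # unbiased error-variance estimate: divide by n - 2, not n
    sigma2_hat = sum(u ** 2 for u in residuals) / (n - 2)
    se_b1 = math.sqrt(sigma2_hat / sxx)  # estimated sd of the slope estimator
    return b1, se_b1

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b1, se_b1 = ols_with_se(x, y)
```
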

Multiple Regression
Analysis: Estimation

Chapter 3

Wooldridge: Introductory Econometrics:


A Modern Approach, 5e

Definition of the multiple linear regression model

„Explains variable y in terms of variables x1, x2, …, xk“

y = β0 + β1 x1 + β2 x2 + … + βk xk + u

β0: intercept; β1, …, βk: slope parameters
y: dependent variable, explained variable, response variable, …
x1, …, xk: independent variables, explanatory variables, regressors, …
u: error term, disturbance, unobservables, …

Motivation for multiple regression
Incorporate more explanatory factors into the model
Explicitly hold fixed other factors that otherwise would be in
Allow for more flexible functional forms

Example: Wage equation

wage = β0 + β1 educ + β2 exper + u

wage: hourly wage; educ: years of education;
exper: labor market experience; u: all other factors …

Now β1 measures the effect of education explicitly holding experience fixed

Example: Average test scores and per student spending

avgscore = β0 + β1 expend + β2 avginc + u

avgscore: average standardized test score of school
expend: per student spending at this school
avginc: average family income of students at this school
u: other factors

Per student spending is likely to be correlated with average family


income at a given high school because of school financing
Omitting average family income in regression would lead to biased
estimate of the effect of spending on average test scores
In a simple regression model, effect of per student spending would
partly include the effect of family income on test scores

Example: Family income and family consumption

cons = β0 + β1 inc + β2 inc² + u

cons: family consumption; inc: family income;
inc²: family income squared; u: other factors

Model has two explanatory variables: income and income squared
Consumption is explained as a quadratic function of income
One has to be very careful when interpreting the coefficients:

Δcons/Δinc = β1 + 2 β2 inc

By how much does consumption increase if income is increased
by one unit? It depends on how much income is already there
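With invented coefficients, the income-dependent marginal effect of the quadratic model looks like this (a sketch, not from the slides):

```python
# Marginal effect of income in cons = b0 + b1*inc + b2*inc^2
b1, b2 = 0.8, -0.002  # invented coefficients (effect diminishes with income)

def marginal_effect(inc):
    # derivative of the quadratic consumption function: b1 + 2*b2*inc
    return b1 + 2 * b2 * inc

effect_low = marginal_effect(10)    # at low income:  0.8 - 0.04 = 0.76
effect_high = marginal_effect(100)  # at high income: 0.8 - 0.4  = 0.40
```
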

Example: CEO salary, sales and CEO tenure

log(salary) = β0 + β1 log(sales) + β2 ceoten + β3 ceoten² + u

log(salary): log of CEO salary; log(sales): log sales;
ceoten, ceoten²: quadratic function of CEO tenure with firm

Model assumes a constant elasticity relationship between CEO salary
and the sales of his or her firm
Model assumes a quadratic relationship between CEO salary and his
or her tenure with the firm
Meaning of „linear“ regression
The model has to be linear in the parameters (not in the variables)

OLS Estimation of the multiple regression model

Random sample: {(xi1, xi2, …, xik, yi): i = 1, …, n}

Regression residuals: ûi = yi − β̂0 − β̂1xi1 − … − β̂kxik

Minimize sum of squared residuals: min Σi ûi² with respect to β̂0, β̂1, …, β̂k

Minimization will be carried out by computer
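The minimization the computer carries out can be sketched with NumPy via the normal equations (a minimal sketch with made-up illustrative data, not the textbook's own code):

```python
import numpy as np

# Illustrative (made-up) data: n = 5 observations, constant plus k = 2 regressors
X = np.column_stack([np.ones(5), [1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 1.0, 4.0, 3.0, 5.0]])
y = np.array([3.0, 4.0, 8.0, 8.0, 12.0])

# OLS minimizes the sum of squared residuals; the first-order conditions
# of that minimization are the normal equations X'X b = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

residuals = y - X @ beta_hat
print(beta_hat)                # estimated intercept and slope coefficients
print(residuals @ residuals)   # minimized sum of squared residuals
```

The first-order conditions imply the residuals sum to zero and are orthogonal to every regressor, which can be checked directly on `residuals`.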

Interpretation of the multiple regression model

By how much does the dependent variable change if the j-th


independent variable is increased by one unit, holding all
other independent variables and the error term constant

The multiple linear regression model manages to hold the values


of other explanatory variables fixed even if, in reality, they are
correlated with the explanatory variable under consideration
„Ceteris paribus“-interpretation
It has still to be assumed that unobserved factors do not change if
the explanatory variables are changed

Example: Determinants of college GPA

colGPA = 1.29 + .453 hsGPA + .0094 ACT   (grade point average at college; high school grade point average; achievement test score)

Interpretation
Holding ACT fixed, another point on high school grade point average
is associated with another .453 points college grade point average
Or: If we compare two students with the same ACT, but the hsGPA of
student A is one point higher, we predict student A to have a colGPA
that is .453 higher than that of student B
Holding high school grade point average fixed, another 10 points on
ACT are associated with less than one point on college GPA
„Partialling out“ interpretation of multiple regression
One can show that the estimated coefficient of an explanatory
variable in a multiple regression can be obtained in two steps:
1) Regress the explanatory variable xj on all other explanatory variables
2) Regress y on the residuals from this regression
Why does this procedure work?
The residuals from the first regression are the part of the explanatory
variable that is uncorrelated with the other explanatory variables
The slope coefficient of the second regression therefore represents
the isolated effect of the explanatory variable on the dep. variable
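The two-step „partialling out“ result (the Frisch–Waugh theorem) can be verified numerically on simulated, made-up data — the partialled-out slope coincides exactly with the multiple-regression coefficient:

```python
import numpy as np

# Simulated (made-up) data where x1 and x2 are correlated
rng = np.random.default_rng(0)
n = 200
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# Full multiple regression of y on a constant, x1 and x2
X = np.column_stack([np.ones(n), x1, x2])
b_full = np.linalg.lstsq(X, y, rcond=None)[0]

# Step 1: regress x1 on the other regressors and keep the residuals
Z = np.column_stack([np.ones(n), x2])
r1 = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]

# Step 2: simple regression of y on those residuals
b1_partial = (r1 @ y) / (r1 @ r1)

print(b_full[1], b1_partial)  # the two coefficients coincide
```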

Properties of OLS on any sample of data
Fitted values and residuals

Fitted or predicted values Residuals

Algebraic properties of OLS regression

Deviations from the regression line (residuals) sum up to zero: Σi ûi = 0
Correlations between residuals and regressors are zero
The sample averages of y and of the regressors lie on the regression line

Goodness-of-Fit

Decomposition of total variation: SST = SSE + SSR

Notice that R-squared can only increase if another explanatory variable is added to the regression

R-squared: R² = SSE/SST = 1 − SSR/SST

Alternative expression: R-squared is equal to the squared correlation coefficient between the actual and the predicted value of the dependent variable
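Both expressions for R-squared can be computed and compared on a small made-up sample (the squared-correlation identity requires an intercept in the regression):

```python
import numpy as np

# Illustrative (made-up) data; the design matrix includes an intercept
X = np.column_stack([np.ones(6), [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [1.0, 0.0, 2.0, 1.0, 3.0, 2.0]])
y = np.array([2.0, 3.0, 5.0, 4.0, 7.0, 6.0])

beta = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ beta

sst = np.sum((y - y.mean()) ** 2)   # total variation
ssr = np.sum((y - y_hat) ** 2)      # residual variation
r2_from_decomposition = 1.0 - ssr / sst

# Alternative expression: squared correlation between actual and fitted values
r2_from_correlation = np.corrcoef(y, y_hat)[0, 1] ** 2
print(r2_from_decomposition, r2_from_correlation)
```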

Example: Explaining arrest records

narr86 = .712 − .150 pcnv − .034 ptime86 − .104 qemp86

(narr86 = number of times arrested in 1986; pcnv = proportion of prior arrests that led to conviction; ptime86 = months in prison in 1986; qemp86 = quarters employed in 1986)

Interpretation:
Proportion prior arrests +0.5 ⇒ −.150(.5) = −.075, i.e. 7.5 fewer arrests per 100 men
Months in prison +12 ⇒ −.034(12) = −.408 arrests for a given man
Quarters employed +1 ⇒ −.104, i.e. 10.4 fewer arrests per 100 men

Example: Explaining arrest records (cont.)
An additional explanatory variable is added:

Average sentence in prior convictions

R-squared increases only slightly


Interpretation:
Average prior sentence increases number of arrests (?)
Limited additional explanatory power as R-squared increases by little
General remark on R-squared
Even if R-squared is small (as in the given example), regression may
still provide good estimates of ceteris paribus effects
Standard assumptions for the multiple regression model

Assumption MLR.1 (Linear in parameters)

y = β0 + β1x1 + β2x2 + … + βkxk + u

In the population, the relationship between y and the explanatory variables is linear

Assumption MLR.2 (Random sampling)

{(xi1, xi2, …, xik, yi): i = 1, …, n}

The data is a random sample drawn from the population

Each data point therefore follows the population equation:
yi = β0 + β1xi1 + β2xi2 + … + βkxik + ui

Standard assumptions for the multiple regression model (cont.)

Assumption MLR.3 (No perfect collinearity)


„In the sample (and therefore in the population), none
of the independent variables is constant and there are
no exact relationships among the independent variables“

Remarks on MLR.3
The assumption only rules out perfect collinearity/correlation between explanatory variables; imperfect correlation is allowed
If an explanatory variable is a perfect linear combination of other
explanatory variables it is superfluous and may be eliminated
Constant variables are also ruled out (collinear with intercept)

Example for perfect collinearity: small sample

In a small sample, avginc may accidentally be an exact multiple of expend; it will not
be possible to disentangle their separate effects because there is exact covariation

Example for perfect collinearity: relationships between regressors

Either shareA or shareB will have to be dropped from the regression because there
is an exact linear relationship between them: shareA + shareB = 1
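Perfect collinearity can be seen numerically: with made-up vote-share data where shareA + shareB = 1, the design matrix loses full column rank and X'X cannot be inverted.

```python
import numpy as np

# Hypothetical vote-share data: shareA + shareB = 1 by construction
shareA = np.array([0.2, 0.45, 0.6, 0.85])
shareB = 1.0 - shareA

# Design matrix: constant, shareA, shareB
X = np.column_stack([np.ones(4), shareA, shareB])

# The columns satisfy an exact linear relationship (constant = shareA + shareB),
# so X does not have full column rank and the OLS coefficients are not identified
print(np.linalg.matrix_rank(X))  # 2, not 3
```

Dropping either shareA or shareB restores full rank, which is why one of them must be eliminated from the regression.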

Standard assumptions for the multiple regression model (cont.)
Assumption MLR.4 (Zero conditional mean)

The value of the explanatory variables


must contain no information about the
mean of the unobserved factors

In a multiple regression model, the zero conditional mean assumption


is much more likely to hold because fewer things end up in the error
Example: Average test scores

If avginc was not included in the regression, it would end up in the error term;
it would then be hard to defend that expend is uncorrelated with the error

Discussion of the zero mean conditional assumption
Explanatory variables that are correlated with the error term are
called endogenous; endogeneity is a violation of assumption MLR.4
Explanatory variables that are uncorrelated with the error term are
called exogenous; MLR.4 holds if all explanatory variables are exogenous
Exogeneity is the key assumption for a causal interpretation of the
regression, and for unbiasedness of the OLS estimators

Theorem 3.1 (Unbiasedness of OLS)

Under assumptions MLR.1 – MLR.4:  E(β̂j) = βj,  j = 0, 1, …, k

Unbiasedness is an average property in repeated samples; in a given sample, the estimates may still be far away from the true values

Including irrelevant variables in a regression model

No problem because E(β̂3) = β3 = 0 in the population

However, including irrelevant variables may increase sampling variance.

Omitting relevant variables: the simple case

True model (contains x1 and x2): y = β0 + β1x1 + β2x2 + u

Estimated model (x2 is omitted): ỹ = β̃0 + β̃1x1

Omitted variable bias
If x1 and x2 are correlated, assume a linear regression relationship between them: x2 = δ0 + δ1x1 + v

Substituting into the true model gives y = (β0 + β2δ0) + (β1 + β2δ1)x1 + (β2v + u). If y is only regressed on x1, β0 + β2δ0 will be the estimated intercept, β1 + β2δ1 will be the estimated slope on x1, and β2v + u is the error term

Conclusion: All estimated coefficients will be biased

Example: Omitting ability in a wage equation

β2 and δ1 will both be positive

The return to education will be overestimated because E(β̃1) = β1 + β2δ1 > β1. It will look as if people with many years of education earn very high wages, but this is partly due to the fact that people with more education are also more able on average.

When is there no omitted variable bias?

If the omitted variable is irrelevant (β2 = 0) or uncorrelated with the included variable (δ1 = 0)

Omitted variable bias: more general cases

True model (contains x1, x2 and x3): y = β0 + β1x1 + β2x2 + β3x3 + u

Estimated model (x3 is omitted): ỹ = β̃0 + β̃1x1 + β̃2x2

No general statements possible about direction of bias


Analysis as in simple case if one regressor uncorrelated with others
Example: Omitting ability in a wage equation

If exper is approximately uncorrelated with educ and abil, then the direction
of the omitted variable bias can be as analyzed in the simple two variable case.

Standard assumptions for the multiple regression model (cont.)
Assumption MLR.5 (Homoscedasticity)

The value of the explanatory variables


must contain no information about the
variance of the unobserved factors

Example: Wage equation


This assumption may also be hard
to justify in many cases

Short hand notation All explanatory variables are


collected in a random vector

with

Theorem 3.2 (Sampling variances of OLS slope estimators)

Under assumptions MLR.1 – MLR.5:

Var(β̂j) = σ² / [SSTj (1 − Rj²)],  j = 1, …, k

where σ² is the variance of the error term, SSTj = Σi (xij − x̄j)² is the total sample variation in explanatory variable xj, and Rj² is the R-squared from a regression of explanatory variable xj on all other independent variables (including a constant)
Components of OLS Variances:
1) The error variance
A high error variance increases the sampling variance because there is
more „noise“ in the equation
A large error variance necessarily makes estimates imprecise
The error variance does not decrease with sample size
2) The total sample variation in the explanatory variable
More sample variation leads to more precise estimates
Total sample variation automatically increases with the sample size
Increasing the sample size is thus a way to get more precise estimates

3) Linear relationships among the independent variables

Regress xj on all other independent variables (including a constant)

The R-squared of this regression (Rj²) will be the higher, the better xj can be linearly explained by the other independent variables

The sampling variance of β̂j will be the higher, the better explanatory variable xj can be linearly explained by the other independent variables

The problem of almost linearly dependent explanatory variables is called multicollinearity (i.e. Rj² close to 1 for some j)

An example for multicollinearity

Average standardized test score of school = β0 + β1·(expenditures for teachers) + β2·(expenditures for instructional materials) + β3·(other expenditures) + u

The different expenditure categories will be strongly correlated because if a school has a lot
of resources it will spend a lot on everything.

It will be hard to estimate the differential effects of different expenditure categories because
all expenditures are either high or low. For precise estimates of the differential effects, one
would need information about situations where expenditure categories change differentially.

As a consequence, sampling variance of the estimated effects will be large.

Discussion of the multicollinearity problem
In the above example, it would probably be better to lump all expen-
diture categories together because effects cannot be disentangled
In other cases, dropping some independent variables may reduce
multicollinearity (but this may lead to omitted variable bias)
Only the sampling variance of the variables involved in multicollinearity
will be inflated; the estimates of other effects may be very precise
Note that multicollinearity is not a violation of MLR.3 in the strict sense
Multicollinearity may be detected through „variance inflation factors“:

VIFj = 1 / (1 − Rj²)

As an (arbitrary) rule of thumb, the variance inflation factor should not be larger than 10
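A variance inflation factor can be computed directly from the auxiliary regression of one regressor on the others; here on simulated, made-up data with two nearly collinear regressors:

```python
import numpy as np

# Simulated (made-up) regressors with strong but imperfect correlation
rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=n)   # nearly collinear with x1

# R_j^2 from the auxiliary regression of x1 on a constant and x2
Z = np.column_stack([np.ones(n), x2])
fitted = Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
r2_j = 1.0 - np.sum((x1 - fitted) ** 2) / np.sum((x1 - x1.mean()) ** 2)

# Variance inflation factor: VIF_j = 1 / (1 - R_j^2)
vif = 1.0 / (1.0 - r2_j)
print(r2_j, vif)   # high R_j^2 -> VIF well above the rule-of-thumb value 10
```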

Variances in misspecified models
The choice of whether to include a particular variable in a regression
can be made by analyzing the tradeoff between bias and variance

True population model: y = β0 + β1x1 + β2x2 + u

Estimated model 1: ŷ = β̂0 + β̂1x1 + β̂2x2

Estimated model 2: ỹ = β̃0 + β̃1x1

It might be the case that the likely omitted variable bias in the misspecified model 2 is overcompensated by a smaller variance

Variances in misspecified models (cont.)

Conditional on x1 and x2, the variance of β̃1 in model 2 is always smaller than that of β̂1 in model 1

Case 1 (β2 = 0): Conclusion: Do not include irrelevant regressors

Case 2 (β2 ≠ 0): Trade off bias and variance; Caution: bias will not vanish even in large samples

Estimating the error variance

σ̂² = SSR / (n − k − 1) = [1/(n − k − 1)] Σi ûi²

An unbiased estimate of the error variance is obtained by dividing the sum of squared residuals by the number of observations minus the number of estimated parameters. This difference is also called the degrees of freedom.
The n estimated squared residuals in the sum are not completely independent but related through the k+1 equations that define the first order conditions of the minimization problem.

Theorem 3.3 (Unbiased estimator of the error variance)

Under assumptions MLR.1 – MLR.5:  E(σ̂²) = σ²
Estimation of the sampling variances of the OLS estimators

The true sampling variation of the estimated β̂j:

sd(β̂j) = sqrt( σ² / [SSTj (1 − Rj²)] )

Plug in σ̂² for the unknown σ². The estimated sampling variation of the estimated β̂j (the standard error):

se(β̂j) = sqrt( σ̂² / [SSTj (1 − Rj²)] )

Note that these formulas are only valid under assumptions MLR.1-MLR.5 (in particular, there has to be homoscedasticity)
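The component formula for se(β̂j) agrees exactly with the familiar matrix formula σ̂²·(X'X)⁻¹; a check on simulated, made-up data:

```python
import numpy as np

# Simulated (made-up) data to compare the two equivalent se(beta_j) formulas
rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = 0.3 * x1 + rng.normal(size=n)
y = 1.0 + 0.5 * x1 - 0.2 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
sigma2_hat = resid @ resid / (n - 2 - 1)            # SSR / (n - k - 1)

# Matrix form: estimated Var(beta_hat) = sigma2_hat * (X'X)^-1; take entry for x1
se_matrix = np.sqrt(sigma2_hat * np.linalg.inv(X.T @ X)[1, 1])

# Component form for j = 1: sqrt( sigma2_hat / (SST_1 * (1 - R_1^2)) )
Z = np.column_stack([np.ones(n), x2])
fitted = Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
r2_1 = 1.0 - np.sum((x1 - fitted) ** 2) / np.sum((x1 - x1.mean()) ** 2)
sst_1 = np.sum((x1 - x1.mean()) ** 2)
se_component = np.sqrt(sigma2_hat / (sst_1 * (1.0 - r2_1)))

print(se_matrix, se_component)  # identical up to rounding
```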

Efficiency of OLS: The Gauss-Markov Theorem
Under assumptions MLR.1 - MLR.5, OLS is unbiased
However, under these assumptions there may be many other
estimators that are unbiased
Which one is the unbiased estimator with the smallest variance?
In order to answer this question one usually limits oneself to linear estimators, i.e. estimators linear in the dependent variable:

β̃j = Σi wij yi

where the weights wij may be an arbitrary function of the sample values of all the explanatory variables; the OLS estimator can be shown to be of this form

Theorem 3.4 (Gauss-Markov Theorem)
Under assumptions MLR.1 - MLR.5, the OLS estimators are the best
linear unbiased estimators (BLUEs) of the regression coefficients, i.e.

Var(β̂j) ≤ Var(β̃j),  j = 0, 1, …, k,  for all linear estimators β̃j = Σi wij yi for which E(β̃j) = βj.

OLS is only the best estimator if MLR.1 – MLR.5 hold; if there is heteroscedasticity for example, there are better estimators.

Multiple Regression Analysis: Inference

Chapter 4

Wooldridge: Introductory Econometrics:


A Modern Approach, 5e

Statistical inference in the regression model
Hypothesis tests about population parameters
Construction of confidence intervals

Sampling distributions of the OLS estimators

The OLS estimators are random variables

We already know their expected values and their variances

However, for hypothesis tests we need to know their distribution

In order to derive their distribution we need additional assumptions

Assumption about distribution of errors: normal distribution

Assumption MLR.6 (Normality of error terms)

u ~ Normal(0, σ²), independently of x1, x2, …, xk

It is assumed that the unobserved factors are normally distributed around the population regression function.

The form and the variance of the distribution does not depend on any of the explanatory variables.

It follows that: y | x ~ Normal(β0 + β1x1 + … + βkxk, σ²)

Discussion of the normality assumption
The error term is the sum of „many“ different unobserved factors
Sums of independent factors are normally distributed (CLT)
Problems:
• How many different factors? Number large enough?
• Possibly very heterogeneous distributions of individual factors
• How independent are the different factors?
The normality of the error term is an empirical question
At least the error distribution should be „close“ to normal
In many cases, normality is questionable or impossible by definition

Discussion of the normality assumption (cont.)
Examples where normality cannot hold:
• Wages (nonnegative; also: minimum wage)
• Number of arrests (takes on a small number of integer values)
• Unemployment (indicator variable, takes on only 1 or 0)
In some cases, normality can be achieved through transformations
of the dependent variable (e.g. use log(wage) instead of wage)
Under normality, OLS is the best (even nonlinear) unbiased estimator
Important: For the purposes of statistical inference, the assumption
of normality can be replaced by a large sample size

Terminology

„Gauss-Markov assumptions“ = MLR.1 – MLR.5;  „Classical linear model (CLM) assumptions“ = MLR.1 – MLR.6

Theorem 4.1 (Normal sampling distributions)

Under assumptions MLR.1 – MLR.6:

β̂j ~ Normal(βj, Var(β̂j))   and   (β̂j − βj) / sd(β̂j) ~ Normal(0, 1)

The estimators are normally distributed around the true parameters with the variance that was derived earlier; the standardized estimators follow a standard normal distribution
Testing hypotheses about a single population parameter
Theorem 4.2 (t-distribution for standardized estimators)

Under assumptions MLR.1 – MLR.6:

(β̂j − βj) / se(β̂j) ~ t(n − k − 1)

If the standardization is done using the estimated standard deviation (= standard error), the normal distribution is replaced by a t-distribution

Note: The t-distribution is close to the standard normal distribution if n−k−1 is large.

Null hypothesis (for more general hypotheses, see below): H0: βj = 0

The population parameter is equal to zero, i.e. after controlling for the other independent variables, there is no effect of xj on y

t-statistic (or t-ratio): t = β̂j / se(β̂j)

The t-statistic will be used to test the above null hypothesis. The farther the estimated coefficient is away from zero, the less likely it is that the null hypothesis holds true. But what does „far“ away from zero mean?

This depends on the variability of the estimated coefficient, i.e. its standard deviation. The t-statistic measures how many estimated standard deviations the estimated coefficient is away from zero.

Distribution of the t-statistic if the null hypothesis is true: t ~ t(n − k − 1)

Goal: Define a rejection rule so that, if the null hypothesis is true, H0 is rejected only with a small probability (= significance level, e.g. 5%)

Testing against one-sided alternatives (greater than zero)

Test H0: βj = 0 against H1: βj > 0.

Reject the null hypothesis in favour of the alternative hypothesis if the estimated coefficient is „too large“ (i.e. larger than a critical value).

Construct the critical value so that, if the null hypothesis is true, it is rejected in, for example, 5% of the cases.

In the given example, this is the point of the t-distribution with 28 degrees of freedom that is exceeded in 5% of the cases.

⇒ Reject if t-statistic is greater than 1.701

Example: Wage equation
Test whether, after controlling for education and tenure, higher work experience leads to higher hourly wages

log(wage) = .284 + .092 educ + .0041 exper + .022 tenure,  n = 526
           (.104)  (.007)     (.0017)       (.003)
(standard errors in parentheses)

Test H0: βexper = 0 against H1: βexper > 0.

One would either expect a positive effect of experience on hourly wage or no effect at all.

Example: Wage equation (cont.)
t-statistic: t = .0041/.0017 ≈ 2.41

Degrees of freedom: n − k − 1 = 526 − 3 − 1 = 522; here the standard normal approximation applies

Critical values for the 5% and the 1% significance level (these are conventional significance levels): 1.645 and 2.326

The null hypothesis is rejected because the t-statistic exceeds the critical value.

„The effect of experience on hourly wage is statistically greater than zero at the 5% (and even at the 1%) significance level.“

Testing against one-sided alternatives (less than zero)

Test H0: βj = 0 against H1: βj < 0.

Reject the null hypothesis in favour of the alternative hypothesis if the estimated coefficient is „too small“ (i.e. smaller than a critical value).

Construct the critical value so that, if the null hypothesis is true, it is rejected in, for example, 5% of the cases.

In the given example, this is the point of the t-distribution with 18 degrees of freedom so that 5% of the cases are below the point.

⇒ Reject if t-statistic is less than -1.734

Example: Student performance and school size
Test whether smaller school size leads to better student performance

math10 = β0 + β1 totcomp + β2 staff + β3 enroll + u

(percentage of students passing maths test; average annual teacher compensation; staff per one thousand students; school enrollment = school size)

Test H0: βenroll = 0 against H1: βenroll < 0.

Do larger schools hamper student performance or is there no such effect?

Example: Student performance and school size (cont.)
t-statistic: t = −.00020/.00022 ≈ −0.91

Degrees of freedom: n − k − 1 = 408 − 3 − 1 = 404; here the standard normal approximation applies

Critical values for the 5% and the 15% significance level: −1.645 and −1.04

The null hypothesis is not rejected because the t-statistic is not smaller than the critical value.

One cannot reject the hypothesis that there is no effect of school size on student performance (not even for a lax significance level of 15%).

Example: Student performance and school size (cont.)
Alternative specification of functional form:

R-squared slightly higher

Test H0: βlog(enroll) = 0 against H1: βlog(enroll) < 0.

Example: Student performance and school size (cont.)

t-statistic: t ≈ −1.87

Critical value for the 5% significance level: −1.645 ⇒ reject null hypothesis

The hypothesis that there is no effect of school size on student performance can be rejected in favor of the hypothesis that the effect is negative.

How large is the effect? +10% enrollment ⇒ −0.129 percentage points of students pass test (small effect)

Testing against two-sided alternatives

Test H0: βj = 0 against H1: βj ≠ 0.

Reject the null hypothesis in favour of the alternative hypothesis if the absolute value of the estimated coefficient is too large.

Construct the critical value so that, if the null hypothesis is true, it is rejected in, for example, 5% of the cases.

In the given example, these are the points of the t-distribution so that 5% of the cases lie in the two tails.

⇒ Reject if the t-statistic is less than -2.06 or greater than 2.06

Example: Determinants of college GPA (skipped = lectures missed per week)

For critical values, use standard normal distribution

The effects of hsGPA and skipped are


significantly different from zero at the
1% significance level. The effect of ACT
is not significantly different from zero,
not even at the 10% significance level.

„Statistically significant“ variables in a regression
If a regression coefficient is different from zero in a two-sided test, the
corresponding variable is said to be „statistically significant“
If the number of degrees of freedom is large enough so that the nor-
mal approximation applies, the following rules of thumb apply:

|t| > 1.645 ⇒ „statistically significant at 10 % level“

|t| > 1.96 ⇒ „statistically significant at 5 % level“

|t| > 2.576 ⇒ „statistically significant at 1 % level“

Guidelines for discussing economic and statistical significance
If a variable is statistically significant, discuss the magnitude of the
coefficient to get an idea of its economic or practical importance
The fact that a coefficient is statistically significant does not necessarily mean it is economically or practically significant!
If a variable is statistically and economically important but has the
„wrong“ sign, the regression model might be misspecified
If a variable is statistically insignificant at the usual levels (10%, 5%,
1%), one may think of dropping it from the regression
If the sample size is small, effects might be imprecisely estimated so
that the case for dropping insignificant variables is less strong

Testing more general hypotheses about a regression coefficient
Null hypothesis: H0: βj = aj, where aj is the hypothesized value of the coefficient

t-statistic: t = (β̂j − aj) / se(β̂j)

The test works exactly as before, except that the hypothesized value is subtracted from the estimate when forming the statistic

Example: Campus crime and enrollment
An interesting hypothesis is whether crime increases by one percent
if enrollment is increased by one percent

Estimate is different from


one but is this difference
statistically significant?

The hypothesis is rejected at the 5% level.

Computing p-values for t-tests
If the significance level is made smaller and smaller, there will be a
point where the null hypothesis cannot be rejected anymore
The reason is that, by lowering the significance level, one increasingly
guards against the error of rejecting a correct H0
The smallest significance level at which the null hypothesis is still
rejected, is called the p-value of the hypothesis test
A small p-value is evidence against the null hypothesis because one
would reject the null hypothesis even at small significance levels
A large p-value is evidence in favor of the null hypothesis
P-values are more informative than tests at fixed significance levels
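Under the normal approximation used in the rules of thumb above, the two-sided p-value can be computed directly; a minimal sketch (for small samples one would use the t-distribution instead):

```python
from statistics import NormalDist

def two_sided_p_value(t_stat):
    """Two-sided p-value via the standard normal approximation
    (adequate when the degrees of freedom are large)."""
    return 2 * (1 - NormalDist().cdf(abs(t_stat)))

# |t| = 1.85: significant at the 10% level but not at the 5% level
print(round(two_sided_p_value(1.85), 3))  # 0.064
```

A null hypothesis is then rejected at level α exactly when the returned p-value is smaller than α.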

How the p-value is computed (here: two-sided test)

The p-value is the significance level at which one is indifferent
between rejecting and not rejecting the null hypothesis.

In the two-sided case, the p-value is thus the probability that the
t-distributed variable takes on a larger absolute value than the
realized value of the test statistic, e.g. p = P(|T| > |t|).

From this, it is clear that a null hypothesis is rejected if and only
if the corresponding p-value is smaller than the significance level.

(Figure: the two shaded tails beyond the realized value of the test
statistic make up the p-value; for comparison, the critical values
for a 5% significance level are marked. In the example shown, the
t-statistic would not lie in the 5% rejection region.)

Confidence intervals

Simple manipulation of the result in Theorem 4.2 implies that

β̂j ± c · se(β̂j) covers βj with the stated confidence level,

where c is the critical value of a two-sided test; β̂j − c·se(β̂j) is the
lower bound and β̂j + c·se(β̂j) the upper bound of the confidence interval.

Interpretation of the confidence interval


The bounds of the interval are random
In repeated samples, the interval that is constructed in the above way
will cover the population regression coefficient in 95% of the cases

Confidence intervals for typical confidence levels

Use rules of thumb, e.g. β̂j ± 1.96·se(β̂j) for 95% confidence
(1.645 for 90%, 2.576 for 99%)

Relationship between confidence intervals and hypothesis tests

If aj lies outside the confidence interval, reject H0: βj = aj in favor of H1: βj ≠ aj
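The rule-of-thumb interval can be sketched as follows, using the normal critical value (close to the exact t value when df is large); the estimate and standard error below are hypothetical:

```python
from statistics import NormalDist

def conf_interval(beta_hat, se, level=0.95):
    """Confidence interval beta_hat +/- c * se, with c the two-sided
    normal critical value (e.g. c is about 1.96 for a 95% interval)."""
    c = NormalDist().inv_cdf(0.5 + level / 2)
    return beta_hat - c * se, beta_hat + c * se

lo, hi = conf_interval(0.0128, 0.0036)  # hypothetical estimate and se
# zero lies outside the interval, so H0: beta_j = 0 is rejected at 5%
print(0 < lo)  # True
```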

Example: Model of firms‘ R&D expenditures

Spending on R&D Annual sales Profits as percentage of sales

The effect of sales on R&D is relatively precisely estimated, as the interval is narrow;
moreover, the effect is significantly different from zero, because zero is outside the interval.
The effect of the profit margin is imprecisely estimated, as the interval is very wide;
it is not even statistically significant, because zero lies in the interval.

Testing hypotheses about a linear combination of parameters
Example: Return to education at 2 year vs. at 4 year colleges
Years of education Years of education
at 2 year colleges at 4 year colleges

Test H0: β1 = β2 against H1: β1 < β2.

A possible test statistic would be:

t = (β̂1 − β̂2) / se(β̂1 − β̂2)

The difference between the estimates is normalized by the estimated
standard deviation of the difference. The null hypothesis would have
to be rejected if the statistic is „too negative“ to believe that the true
difference between the parameters is equal to zero.

Impossible to compute with standard regression output, because
se(β̂1 − β̂2) involves Cov(β̂1, β̂2), which is usually not available in
regression output


Alternative method

Define θ1 = β1 − β2 and test H0: θ1 = 0 against H1: θ1 < 0.

Substituting β1 = θ1 + β2 into the original regression creates a new
regressor (= total years of college); the coefficient on the first
education variable is then θ1
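The substitution behind this trick can be checked numerically: with θ = β1 − β2, the fitted part β1·jc + β2·univ equals θ·jc + β2·(jc + univ). The coefficient values below are made up for illustration:

```python
# Original fitted part vs. the reparametrized version with
# totcoll = jc + univ; both must agree for every (jc, univ).
def original_part(b1, b2, jc, univ):
    return b1 * jc + b2 * univ

def rewritten_part(theta, b2, jc, univ):
    return theta * jc + b2 * (jc + univ)

b1, b2 = 0.0667, 0.0769          # hypothetical returns to jc and univ years
theta = b1 - b2
for jc, univ in [(0, 4), (2, 2), (3, 0)]:
    assert abs(original_part(b1, b2, jc, univ)
               - rewritten_part(theta, b2, jc, univ)) < 1e-12
```

Because θ appears as an ordinary coefficient in the rewritten regression, its standard error comes straight out of the regression output.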

Total years of college
Estimation results

The hypothesis is rejected at the 10% level but not at the 5% level

This method always works for single linear hypotheses

Testing multiple linear restrictions: The F-test
Testing exclusion restrictions
Salary of major league baseball player; years in the league; average number of games per year

Batting average Home runs per year Runs batted in per year

H0: the coefficients on the three performance measures are all equal to zero, against H1: H0 is not true

Test whether the performance measures have no effect and can be excluded from the regression.

Estimation of the unrestricted model

None of these variables is statistically significant when tested individually

Idea: How would the model fit be if these variables were dropped from the regression?

Estimation of the restricted model

The sum of squared residuals necessarily increases, but is the increase statistically significant?

Test statistic (q = number of restrictions):

F = [(SSRr − SSRur)/q] / [SSRur/(n − k − 1)]

The relative increase of the sum of squared residuals when going from
H1 to H0 follows an F-distribution (if the null hypothesis H0 is correct)
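This statistic is easy to compute once both regressions have been run; a sketch with illustrative (made-up) SSR values, not the estimates from the slide:

```python
def f_statistic(ssr_r, ssr_ur, q, n, k):
    """F statistic for q exclusion restrictions:
    F = [(SSR_r - SSR_ur)/q] / [SSR_ur/(n - k - 1)],
    where n - k - 1 is the df of the unrestricted model."""
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))

# dropping three regressors raises the SSR from 183.2 to 198.3 (made-up values)
F = f_statistic(ssr_r=198.3, ssr_ur=183.2, q=3, n=353, k=5)
print(round(F, 2))
```

The resulting value is compared with the critical value of the F(q, n − k − 1) distribution.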

Rejection rule (Figure 4.7)

An F-distributed variable only takes on positive


values. This corresponds to the fact that the
sum of squared residuals can only increase if
one moves from H1 to H0.

Choose the critical value so that the null hypothesis is rejected in,
for example, 5% of the cases, although it is true.

Test decision in example Number of restrictions to be tested

Degrees of freedom in
the unrestricted model

The null hypothesis is overwhelmingly rejected (even at very small significance levels).

Discussion
The three variables are „jointly significant“
They were not significant when tested individually
The likely reason is multicollinearity between them
Test of overall significance of a regression

The null hypothesis states that the explanatory


variables are not useful at all in explaining the
dependent variable
Restricted model
(regression on constant)

The test of overall significance is reported in most regression


packages; the null hypothesis is usually overwhelmingly rejected
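Since the restricted model is just a regression on a constant, its R-squared is zero and the overall F statistic can be written in terms of the R-squared of the full regression; a sketch:

```python
def overall_f(r_squared, n, k):
    """F statistic for H0: all k slope coefficients are zero:
    F = [R^2/k] / [(1 - R^2)/(n - k - 1)]."""
    return (r_squared / k) / ((1 - r_squared) / (n - k - 1))

# even a modest R-squared is highly significant with many observations
print(round(overall_f(0.10, 1000, 5), 1))
```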

Testing general linear restrictions with the F-test
Example: Test whether house price assessments are rational
Actual house price; the assessed housing value (before the house was sold);
size of lot (in feet); square footage; number of bedrooms

If house price assessments are rational, a 1% change in the
assessment should be associated with a 1% change in price.

In addition, other known factors should not influence the price
once the assessed value has been controlled for.

Unrestricted regression

Restricted regression: the restricted model is actually a regression of [y − x1] on a constant

Test statistic

H0 cannot be rejected

Regression output for the unrestricted regression

When tested individually,


there is also no evidence
against the rationality of
house price assessments

The F-test works for general multiple linear hypotheses


All tests and confidence intervals assume the validity of
MLR.1 – MLR.6; the tests may be invalid otherwise.

Multiple Regression
Analysis: OLS Asymptotics

Chapter 5

Wooldridge: Introductory Econometrics:


A Modern Approach, 5e

So far we focused on properties of OLS that hold for any sample
Properties of OLS that hold for any sample/sample size
Expected values/unbiasedness under MLR.1 – MLR.4
Variance formulas under MLR.1 – MLR.5
Gauss-Markov Theorem under MLR.1 – MLR.5
Exact sampling distributions/tests under MLR.1 – MLR.6

Properties of OLS that hold in large samples


Consistency under MLR.1 – MLR.4

Asymptotic normality/tests under MLR.1 – MLR.5

(both without assuming normality of the error term!)

Consistency

An estimator θ̂ is consistent for a population parameter θ if

P(|θ̂ − θ| > ε) → 0 as n → ∞, for arbitrary ε > 0.

Alternative notation: plim θ̂ = θ; the estimate converges in probability to the true population value
Interpretation:
Consistency means that the probability that the estimate is arbitrarily
close to the true population value can be made arbitrarily high by
increasing the sample size
Consistency is a minimum requirement for sensible estimators

Theorem 5.1 (Consistency of OLS)

Special case of the simple regression model: one can see that the
slope estimate is consistent if the explanatory variable is exogenous,
i.e. uncorrelated with the error term.

Assumption MLR.4‘: all explanatory variables must be uncorrelated with
the error term. This assumption is weaker than the zero conditional
mean assumption MLR.4.

For consistency of OLS, only the weaker MLR.4‘ is needed
Asymptotic analog of omitted variable bias

True model

Misspecified
model

Bias

There is no omitted variable bias if the omitted variable is


irrelevant or uncorrelated with the included variable

Asymptotic normality and large sample inference
In practice, the normality assumption MLR.6 is often questionable
If MLR.6 does not hold, the results of t- or F-tests may be wrong
Fortunately, F- and t-tests still work if the sample size is large enough
Also, OLS estimates are normal in large samples even without MLR.6

Theorem 5.2 (Asymptotic normality of OLS)

Under assumptions MLR.1 – MLR.5:


In large samples, the standardized estimates are also normally distributed

Practical consequences
In large samples, the t-distribution is close to the N(0,1) distribution
As a consequence, t-tests are valid in large samples without MLR.6
The same is true for confidence intervals and F-tests
Important: MLR.1 – MLR.5 are still necessary, esp. homoscedasticity

Asymptotic analysis of the OLS sampling errors

(Each sample average in the expression for the sampling error converges in probability to a fixed population moment.)

Asymptotic analysis of the OLS sampling errors (cont.)

The standard deviation and the standard error of the estimates shrink at the rate 1/√n.

This is why large samples are better


Example: Standard errors in a birth weight equation

Use only the first half of observations

Multiple Regression
Analysis: Further Issues

Chapter 6

Wooldridge: Introductory Econometrics:


A Modern Approach, 5e

More on Functional Form
More on using logarithmic functional forms
Convenient percentage/elasticity interpretation
Slope coefficients of logged variables are invariant to rescalings
Taking logs often eliminates/mitigates problems with outliers
Taking logs often helps to secure normality and homoscedasticity
Variables measured in units such as years should not be logged
Variables measured in percentage points should also not be logged
Logs must not be used if variables take on zero or negative values
It is hard to reverse the log-operation when constructing predictions

Using quadratic functional forms
Example: Wage equation Concave experience profile

Marginal effect of experience: the first year of experience increases
the wage by some .30$, the second year by .298 − 2(.0061)(1) = .29$, etc.

Wage maximum with respect to work experience

Does this mean the return to experience


becomes negative after 24.4 years?

Not necessarily. It depends on how many


observations in the sample lie right of the
turnaround point.

In the given example, these are about 28% of the observations.
There may be a specification problem (e.g. omitted variables).
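The turnaround point of a quadratic profile β1·x − β2·x² is x* = β1/(2β2); plugging in the wage-equation coefficients (.298 and .0061) reproduces the 24.4 years:

```python
def turnaround(b1, b2):
    """Maximizing value of x in a profile b1*x - b2*x**2 (b1, b2 > 0),
    obtained by setting the derivative b1 - 2*b2*x to zero."""
    return b1 / (2 * b2)

print(round(turnaround(0.298, 0.0061), 1))  # 24.4
```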

Nitrogen oxide in air, distance from employment centers, student/teacher ratio

Example: Effects of pollution on housing prices

Does this mean that, at a low number of rooms,


more rooms are associated with lower prices?

Calculation of the turnaround point

Turnaround point:

This area can be ignored, as it concerns only 1% of the observations.

Increase rooms from 5 to 6:

Increase rooms from 6 to 7:

Other possibilities

Higher polynomials

Models with interaction terms

Interaction term

The effect of the number of bedrooms depends on the level of square footage

Interaction effects complicate interpretation of parameters

Effect of number of bedrooms, but for a square footage of zero

Reparametrization of interaction effects: center the interaction at the
population means (which may be replaced by sample means); the coefficient
on x2 then gives the effect of x2 if all variables take on their mean values

Advantages of reparametrization
Easy interpretation of all parameters
Standard errors for partial effects at the mean values available
If necessary, interaction may be centered at other interesting values
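The reparametrization can be verified numerically: replacing x1·x2 by (x1 − μ1)(x2 − μ2) changes the coefficients on x1 and x2 into the partial effects at the means without changing fitted values. A sketch with made-up coefficients and means:

```python
# Original model:   y = b0 + b1*x1 + b2*x2 + b3*x1*x2
# Reparametrized:   y = a0 + d1*x1 + d2*x2 + b3*(x1 - m1)*(x2 - m2)
# with d1 = b1 + b3*m2 and d2 = b2 + b3*m1 (partial effects at the means).
b0, b1, b2, b3 = 1.0, 0.5, -0.2, 0.1   # hypothetical coefficients
m1, m2 = 3.0, 2.0                       # means of x1 and x2

d1, d2 = b1 + b3 * m2, b2 + b3 * m1
a0 = b0 - b3 * m1 * m2                  # adjusted intercept

for x1, x2 in [(0.0, 0.0), (3.0, 2.0), (5.0, -1.0)]:
    orig = b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2
    repar = a0 + d1 * x1 + d2 * x2 + b3 * (x1 - m1) * (x2 - m2)
    assert abs(orig - repar) < 1e-12
```

In practice one simply regresses y on x1, x2 and the centered interaction; the standard error of d2 then comes directly from the output.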

More on goodness-of-fit and selection of regressors
General remarks on R-squared
A high R-squared does not imply that there is a causal interpretation
A low R-squared does not preclude precise estimation of partial effects
Adjusted R-squared
What is the ordinary R-squared supposed to measure? It is an estimate
for the population R-squared,

1 − σu²/σy²,

i.e. the share of the variance of y in the population that is explained by the regressors.

Adjusted R-squared (cont.)

A better estimate, using the correct degrees of freedom of numerator
and denominator, would be

R̄² = 1 − [SSR/(n − k − 1)] / [SST/(n − 1)]
The adjusted R-squared imposes a penalty for adding new regressors


The adjusted R-squared increases if, and only if, the t-statistic of a
newly added regressor is greater than one in absolute value
Relationship between R-squared and adjusted R-squared:

R̄² = 1 − (1 − R²)(n − 1)/(n − k − 1)

The adjusted R-squared may even become negative
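Both the penalty for extra regressors and the possibility of a negative value are easy to check numerically; a sketch:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared: 1 - (1 - R^2)(n - 1)/(n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(round(adjusted_r2(0.30, 100, 5), 3))   # below 0.30: penalty for k = 5
print(round(adjusted_r2(0.05, 20, 5), 3))    # negative: small R^2, small n
```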

Using adjusted R-squared to choose between nonnested models
Models are nonnested if neither model is a special case of the other

A comparison between the R-squared of both models would be unfair


to the first model because the first model contains fewer parameters
In the given example, even after adjusting for the difference in
degrees of freedom, the quadratic model is preferred

Comparing models with different dependent variables
R-squared or adjusted R-squared must not be used to compare models
which differ in their definition of the dependent variable
Example: CEO compensation and firm performance

There is much
less variation
in log(salary)
that needs to
be explained
than in salary

Controlling for too many factors in regression analysis
In some cases, certain variables should not be held fixed
In a regression of traffic fatalities on state beer taxes (and other
factors) one should not directly control for beer consumption
In a regression of family health expenditures on pesticide usage
among farmers one should not control for doctor visits
Different regressions may serve different purposes
In a regression of house prices on house characteristics, one would
only include price assessments if the purpose of the regression is to
study their validity; otherwise one would not include them

Adding regressors to reduce the error variance

Adding regressors may exacerbate multicollinearity problems

On the other hand, adding regressors reduces the error variance

Variables that are uncorrelated with other regressors should be added


because they reduce error variance without increasing multicollinearity

However, such uncorrelated variables may be hard to find

Example: Individual beer consumption and beer prices

Including individual characteristics in a regression of beer consumption


on beer prices leads to more precise estimates of the price elasticity

Predicting y when log(y) is the dependent variable

Under the additional assumption that the error u is independent of the
explanatory variables:

Prediction for y: ŷ = exp(σ̂²/2) · exp(predicted log(y))
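A sketch of the corrected prediction; the estimated error variance of the log(y) regression (written sigma2_hat below) and the predicted log value are hypothetical inputs:

```python
import math

def predict_y(log_y_hat, sigma2_hat):
    """Prediction for y from a log(y) regression, scaled by exp(sigma2/2);
    valid if the error is independent of the regressors."""
    return math.exp(sigma2_hat / 2) * math.exp(log_y_hat)

# naive exp(log_y_hat) systematically underpredicts y:
naive = math.exp(6.9)
corrected = predict_y(6.9, 0.25)
print(corrected > naive)  # True
```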

Comparing R-squared of a logged and an unlogged specification

These are the R-squareds for the predictions of the unlogged


salary variable (although the second regression is originally for
logged salaries). Both R-squareds can now be directly compared.

Multiple Regression Analysis
with Qualitative Information

Chapter 7

Wooldridge: Introductory Econometrics:


A Modern Approach, 5e

Qualitative Information
Examples: gender, race, industry, region, rating grade, …
A way to incorporate qualitative information is to use dummy variables
They may appear as the dependent or as independent variables

A single dummy independent variable

Dummy variable: = 1 if the person is a woman, = 0 if the person is a man

Coefficient on the dummy = the wage gain/loss if the person is a woman
rather than a man (holding other things fixed)

Graphical Illustration

Alternative interpretation of coefficient:

i.e. the difference in mean wage between


men and women with the same level of
education.

Intercept shift

Dummy variable trap: a model that contains dummies for all categories
plus an intercept cannot be estimated (perfect collinearity)

When using dummy variables, one category always has to be omitted:

The base category is men

The base category is women

Alternatively, one could omit the intercept. Disadvantages:

1) More difficult to test for differences between the parameters

2) R-squared formula only valid if regression contains intercept

Estimated wage equation with intercept shift

Holding education, experience,


and tenure fixed, women earn
1.81$ less per hour than men

Does that mean that women are discriminated against?


Not necessarily. Being female may be correlated with other productivity characteristics that have not been controlled for.

Comparing means of subpopulations described by dummies

Not holding other factors constant, women


earn 2.51$ per hour less than men, i.e. the
difference between the mean wage of men
and that of women is 2.51$.

Discussion
It can easily be tested whether the difference in means is significant
The wage difference between men and women is larger if no other
things are controlled for; i.e. part of the difference is due to differences in education, experience and tenure between men and women

Further example: Effects of training grants on hours of training

Hours training per employee Dummy indicating whether firm received training grant

This is an example of program evaluation


Treatment group (= grant receivers) vs. control group (= no grant)
Is the effect of treatment on the outcome of interest causal?

Using dummy explanatory variables in equations for log(y)

Dummy indicating
whether house is of
colonial style

As the dummy for colonial style changes from 0 to 1, the house price
increases by about 5.4% (the coefficient times 100 gives the
approximate percentage change)
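The 5.4 figure is only the approximate percentage effect (100·β̂); the exact percentage change for a dummy in a log(y) equation is 100·[exp(β̂) − 1]. A sketch, assuming the coefficient implied by the slide is .054:

```python
import math

def exact_percent_effect(beta_hat):
    """Exact percentage change in y when a dummy goes from 0 to 1
    in a log(y) equation: 100*(exp(beta) - 1)."""
    return 100.0 * (math.exp(beta_hat) - 1)

print(round(exact_percent_effect(0.054), 2))  # close to the approximate 5.4
print(round(exact_percent_effect(0.50), 1))   # far from the approximate 50
```

The approximation is good for small coefficients but deteriorates quickly as the coefficient grows.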

Using dummy variables for multiple categories
1) Define membership in each category by a dummy variable
2) Leave out one category (which becomes the base category)

Holding other things fixed, married


women earn 19.8% less than single
men (= the base category)

Incorporating ordinal information using dummy variables
Example: City credit ratings and municipal bond interest rates

Municipal bond rate Credit rating from 0-4 (0=worst, 4=best)

This specification would probably not be appropriate as the credit rating only contains
ordinal information. A better way to incorporate this information is to define dummies:

Dummies indicating whether the particular rating applies, e.g. CR1=1 if CR=1 and CR1=0
otherwise. All effects are measured in comparison to the worst rating (= base category).

Interactions involving dummy variables Interaction term
Allowing for different slopes

= intercept men = slope men

= intercept women = slope women

Interesting hypotheses:

The return to education is the same for men and women

The whole wage equation is the same for men and women

Graphical illustration

Interacting both the intercept and


the slope with the female dummy
enables one to model completely
independent wage equations for
men and women

Estimated wage equation with interaction term

No evidence against the hypothesis that the return to education is the
same for men and women.

Does this mean that there is no significant evidence of lower pay for
women at the same levels of educ, exper, and tenure? No: this is only
the effect for educ = 0. To answer the question one has to recenter the
interaction term, e.g. around educ = 12.5 (= average education).

Testing for differences in regression functions across groups
Unrestricted model (contains full set of interactions)

College grade point average; standardized aptitude test score;
high school rank percentile; total hours spent in college courses

Restricted model (same regression for both groups)

Null hypothesis: all interaction effects are zero, i.e. the same
regression coefficients apply to men and women

Estimation of the unrestricted model

Tested individually,
the hypothesis that
the interaction effects
are zero cannot be
rejected

Joint test with F-statistic: the null hypothesis is rejected

Alternative way to compute F-statistic in the given case


Run separate regressions for men and for women; the unrestricted
SSR is given by the sum of the SSR of these two regressions
Run regression for the restricted model and store SSR
If the test is computed in this way it is called the Chow-Test
Important: The test assumes a constant error variance across groups
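The computation described above can be sketched as follows; the SSR values and sample sizes are made up for illustration:

```python
def chow_f(ssr_pooled, ssr_1, ssr_2, n, k):
    """Chow test for equality of all k+1 coefficients across two groups:
    unrestricted SSR = SSR_1 + SSR_2 (separate regressions),
    q = k + 1 restrictions, df = n - 2(k + 1).
    Assumes a constant error variance across the groups."""
    ssr_ur = ssr_1 + ssr_2
    q = k + 1
    df = n - 2 * (k + 1)
    return ((ssr_pooled - ssr_ur) / q) / (ssr_ur / df)

F = chow_f(ssr_pooled=85.5, ssr_1=40.0, ssr_2=42.1, n=366, k=3)  # made-up SSRs
print(round(F, 2))
```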

A Binary dependent variable: the linear probability model
Linear regression when the dependent variable is binary

If the dependent variable only


takes on the values 1 and 0

Linear probability
model (LPM)

In the linear probability model, the coefficients


describe the effect of the explanatory variables
on the probability that y=1

Example: Labor force participation of married women

=1 if in labor force, =0 otherwise Non-wife income (in thousand dollars per year)

If the number of kids under six years increases by one, the
probability that the woman works falls by .262, i.e. by
26.2 percentage points

Does not look significant (but see below)

Example: Labor force participation of married women (cont.)

Graph for nwifeinc=50, exper=5,


age=30, kidslt6=1, kidsge6=0

The maximum level of education in


the sample is educ=17. For the gi-
ven case, this leads to a predicted
probability to be in the labor force
of about 50%.

Negative predicted probability but


no problem because no woman in
the sample has educ < 5.

© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Qualitative Information
Disadvantages of the linear probability model
Predicted probabilities may be larger than one or smaller than zero
Marginal probability effects sometimes logically impossible
The linear probability model is necessarily heteroskedastic

Variance of Ber-
noulli variable

Heteroscedasticity-consistent standard errors need to be computed

Advantages of the linear probability model


Easy estimation and interpretation
Estimated effects and predictions often reasonably good in practice

© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Qualitative Information
More on policy analysis and program evaluation
Example: Effect of job training grants on worker productivity

Dependent variable: percentage of defective items; dummy: =1 if firm received training grant, =0 otherwise

No apparent effect of
grant on productivity

Treatment group: grant receivers; control group: firms that received no grant

Grants were given on a first-come, first-served basis. This is not the same as giving them out
randomly. It might be the case that firms with less productive workers saw an opportunity to
improve productivity and applied first.

© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Qualitative Information
Self-selection into treatment as a source for endogeneity
In the given and in related examples, the treatment status is probably
related to other characteristics that also influence the outcome
The reason is that subjects self-select themselves into treatment
depending on their individual characteristics and prospects
Experimental evaluation
In experiments, assignment to treatment is random
In this case, causal effects can be inferred using a simple regression

The dummy indicating whether or not there was


treatment is unrelated to other factors affecting
the outcome.

© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Qualitative Information
Further example of an endogenous dummy regressor
Are nonwhite customers discriminated against?

Variables: dummy indicating whether loan was approved (dep. var.); race dummy; credit rating

It is important to control for other characteristics that may be


important for loan approval (e.g. profession, unemployment)
Omitting important characteristics that are correlated with the non-
white dummy will produce spurious evidence for discrimination

© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Heteroscedasticity

Chapter 8

Wooldridge: Introductory Econometrics:


A Modern Approach, 5e

© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Heteroscedasticity
Consequences of heteroscedasticity for OLS

OLS is still unbiased and consistent under heteroscedasticity!

Also, interpretation of R-squared is not changed

Unconditional error variance is unaffected


by heteroscedasticity (which refers to the
conditional error variance)

Heteroscedasticity invalidates variance formulas for OLS estimators

The usual F-tests and t-tests are not valid under heteroscedasticity

Under heteroscedasticity, OLS is no longer the best linear unbiased


estimator (BLUE); there may be more efficient linear estimators

© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Heteroscedasticity
Heteroscedasticity-robust inference after OLS
Formulas for OLS standard errors and related statistics have been
developed that are robust to heteroscedasticity of unknown form
All formulas are only valid in large samples
Formula for heteroscedasticity-robust OLS standard error
Also called White/Eicker standard errors. They involve
the squared residuals from the regression and from a
regression of xj on all other explanatory variables.

Using these formulas, the usual t-test is valid asymptotically


The usual F-statistic does not work under heteroscedasticity, but
heteroscedasticity robust versions are available in most software
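The White/Eicker "sandwich" formula can be computed directly in numpy. A minimal sketch on simulated heteroscedastic data (the data-generating process is hypothetical), comparing the usual and the robust standard errors:

```python
import numpy as np

rng = np.random.default_rng(2)

# simulated heteroscedastic data: the error standard deviation grows with x
n = 1000
x = rng.uniform(1, 5, size=n)
u = rng.normal(scale=x, size=n)              # sd(u|x) proportional to x
y = 2.0 + 1.0 * x + u

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

XtX_inv = np.linalg.inv(X.T @ X)

# usual OLS variance estimate (valid only under homoscedasticity)
sigma2 = resid @ resid / (n - X.shape[1])
se_usual = np.sqrt(np.diag(sigma2 * XtX_inv))

# heteroscedasticity-robust "sandwich" estimate using the squared residuals
meat = X.T @ (X * resid[:, None] ** 2)
se_robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
```

In statistical software the same numbers are typically available as an option (e.g. a robust covariance choice) rather than computed by hand.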

© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Heteroscedasticity
Example: Hourly wage equation

Heteroscedasticity robust standard errors may be


larger or smaller than their nonrobust counterparts.
The differences are often small in practice.

F-statistics are also often not too different.

If there is strong heteroscedasticity, differences may be larger.


To be on the safe side, it is advisable to always compute robust
standard errors.

© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Heteroscedasticity
Testing for heteroscedasticity
It may still be interesting to test whether there is heteroscedasticity because
then OLS may not be the most efficient linear estimator anymore

Breusch-Pagan test for heteroscedasticity

Under MLR.4

The mean of u2 must not


vary with x1, x2, …, xk

© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Heteroscedasticity
Breusch-Pagan test for heteroscedasticity (cont.)

Regress squared residuals on all expla-


natory variables and test whether this
regression has explanatory power.

A large test statistic (= a high R-


squared) is evidence against the
null hypothesis.

Alternative test statistic (= Lagrange multiplier statistic, LM).


Again, high values of the test statistic (= high R-squared) lead
to rejection of the null hypothesis that the expected value of u2
is unrelated to the explanatory variables.
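The Breusch-Pagan auxiliary regression described above can be sketched as follows; the simulated data are hypothetical, chosen so that the error variance does depend on x:

```python
import numpy as np

rng = np.random.default_rng(3)

n = 800
x = rng.uniform(0, 2, size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=1 + x, size=n)   # heteroscedastic errors

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
u2 = (y - X @ beta) ** 2                              # squared OLS residuals

# auxiliary regression of u^2 on the explanatory variables
g, *_ = np.linalg.lstsq(X, u2, rcond=None)
fit = X @ g
r2 = 1 - np.sum((u2 - fit) ** 2) / np.sum((u2 - u2.mean()) ** 2)

k = X.shape[1] - 1                                    # number of slope regressors
F = (r2 / k) / ((1 - r2) / (n - k - 1))               # Breusch-Pagan F statistic
LM = n * r2                                           # Lagrange multiplier version
```

A high R-squared in the auxiliary regression translates into large F and LM statistics and hence rejection of homoscedasticity.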

© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Heteroscedasticity
Example: Heteroscedasticity in housing price equations

Heteroscedasticity

In the logarithmic specification, homoscedasticity cannot be rejected

© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Heteroscedasticity
White test for heteroscedasticity: regress squared residuals on all explanatory variables, their squares, and interactions (here: example for k=3)

The White test detects more general
deviations from homoscedasticity
than the Breusch-Pagan test

Disadvantage of this form of the White test


Including all squares and interactions leads to a large number of esti-
mated parameters (e.g. k=6 leads to 27 parameters to be estimated)

© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Heteroscedasticity
Alternative form of the White test

This regression indirectly tests the dependence of the squared residuals


on the explanatory variables, their squares, and interactions, because the
predicted value of y and its square implicitly contain all of these terms.
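A sketch of this special form of the White test on simulated data (the model below is hypothetical): the squared residuals are regressed on the fitted values and their squares only, so just two restrictions are tested regardless of k.

```python
import numpy as np

rng = np.random.default_rng(4)

n = 600
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1 + x1 - 0.5 * x2 + rng.normal(size=n) * np.exp(0.3 * x1)   # heteroscedastic

X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ b
u2 = (y - yhat) ** 2

# special form of the White test: regress u^2 on yhat and yhat^2
Z = np.column_stack([np.ones(n), yhat, yhat ** 2])
g, *_ = np.linalg.lstsq(Z, u2, rcond=None)
r2 = 1 - np.sum((u2 - Z @ g) ** 2) / np.sum((u2 - u2.mean()) ** 2)
LM = n * r2            # compare to a chi-squared distribution with 2 df
```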

Example: Heteroscedasticity in (log) housing price equations

© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Heteroscedasticity
Weighted least squares estimation
Heteroscedasticity is known up to a multiplicative constant

The functional form of the


heteroscedasticity is known

Transformed model

© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Heteroscedasticity
Example: Savings and income

Note that this regression


model has no intercept
The transformed model is homoscedastic

If the other Gauss-Markov assumptions hold as well, OLS applied


to the transformed model is the best linear unbiased estimator!

© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Heteroscedasticity
OLS in the transformed model is weighted least squares (WLS)

Observations with a large


variance get a smaller weight
in the optimization problem

Why is WLS more efficient than OLS in the original model?


Observations with a large variance are less informative than observa-
tions with small variance and therefore should get less weight

WLS is a special case of generalized least squares (GLS)
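When the variance function h(x) is known, WLS is just OLS on the transformed model. A minimal sketch with simulated data where Var(u|x) is proportional to x (a hypothetical setup):

```python
import numpy as np

rng = np.random.default_rng(5)

# Var(u|x) = sigma^2 * x  (heteroscedasticity known up to a constant)
n = 1000
x = rng.uniform(1, 10, size=n)
y = 3.0 + 2.0 * x + rng.normal(scale=np.sqrt(x), size=n)

X = np.column_stack([np.ones(n), x])

# WLS = OLS on the transformed model: divide every variable by sqrt(h(x)) = sqrt(x)
w = 1.0 / np.sqrt(x)
Xs = X * w[:, None]
ys = y * w
beta_wls, *_ = np.linalg.lstsq(Xs, ys, rcond=None)

# OLS on the original model, for comparison (still consistent, less efficient)
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Both estimators are consistent here; the efficiency gain of WLS shows up in smaller standard errors, not in different point estimates on average.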

© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Heteroscedasticity
Example: Financial wealth equation
Net financial wealth

Assumed form of heteroscedasticity:

WLS estimates have considerably

smaller standard errors (which is
in line with the expectation that
they are more efficient).

Participation in 401K pension plan

© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Heteroscedasticity
Important special case of heteroscedasticity
If the observations are reported as averages at the city/county/state/
country/firm level, they should be weighted by the size of the unit

Variables: average contribution to pension plan in firm i (dep. var.); average earnings and age in firm i; percentage the firm contributes to the plan; heteroscedastic error term

Error variance if errors


are homoscedastic at
the employee level

If errors are homoscedastic at the employee level, WLS with weights equal to firm size mi should
be used. If the assumption of homoscedasticity at the employee level is not exactly right, one can
calculate robust standard errors after WLS (i.e. for the transformed model).

© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Heteroscedasticity
Unknown heteroscedasticity function (feasible GLS)
Assumed general form
of heteroscedasticity;
exp-function is used to
ensure positivity

Multiplicative error (assumption:


independent of the explanatory
variables)

Use inverse values of the


estimated heteroscedasticity
function as weights in WLS

Feasible GLS is consistent and asymptotically more efficient than OLS.
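The FGLS steps above (OLS, then a regression of log(u²) on the regressors, then WLS with the fitted variance function) can be sketched on simulated data; the variance function below is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(6)

n = 1500
x = rng.uniform(0, 3, size=n)
y = 1.0 + 1.5 * x + rng.normal(size=n) * np.exp(0.5 * x)   # Var(u|x) = exp(x)

X = np.column_stack([np.ones(n), x])

# step 1: OLS and its residuals
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b_ols

# step 2: regress log(u^2) on the regressors to estimate the variance function
log_u2 = np.log(resid ** 2 + 1e-12)        # small constant guards against log(0)
d, *_ = np.linalg.lstsq(X, log_u2, rcond=None)
h_hat = np.exp(X @ d)                      # estimated heteroscedasticity function, positive by construction

# step 3: WLS using the inverse of the estimated variance function as weights
w = 1.0 / np.sqrt(h_hat)
b_fgls, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
```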

© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Heteroscedasticity
Example: Demand for cigarettes
Estimation by OLS
Variables: cigarettes smoked per day (dep. var.); logged income and cigarette price; smoking restrictions in restaurants

Reject homo-
scedasticity

© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Heteroscedasticity
Estimation by FGLS (the income elasticity is now statistically significant)

Discussion
The income elasticity is now statistically significant; other coefficients
are also more precisely estimated (without changing the qualitative results)

© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Heteroscedasticity
What if the assumed heteroscedasticity function is wrong?
If the heteroscedasticity function is misspecified, WLS is still consistent
under MLR.1 – MLR.4, but robust standard errors should be computed
WLS is consistent under MLR.4 but not necessarily under MLR.4‘

If OLS and WLS produce very different estimates, this typically indi-
cates that some other assumptions (e.g. MLR.4) are wrong
If there is strong heteroscedasticity, it is still often better to use a
wrong form of heteroscedasticity in order to increase efficiency

© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Heteroscedasticity
WLS in the linear probability model

In the LPM, the exact form of


heteroscedasticity is known

Use inverse values


as weights in WLS
Discussion
Infeasible if LPM predictions are below zero or greater than one
If such cases are rare, they may be adjusted to values such as .01/.99
Otherwise, it is probably better to use OLS with robust standard errors
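A sketch of WLS for the LPM on simulated data (hypothetical setup): fitted probabilities from a first-stage OLS are clipped away from 0 and 1 and then used to form the known variance function p(1-p).

```python
import numpy as np

rng = np.random.default_rng(7)

n = 4000
x = rng.uniform(0, 1, size=n)
p = 0.2 + 0.5 * x
y = (rng.uniform(size=n) < p).astype(float)

X = np.column_stack([np.ones(n), x])

# first stage: OLS fitted probabilities
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
phat = X @ b_ols

# clip rare out-of-range predictions before forming weights
phat = np.clip(phat, 0.01, 0.99)
h = phat * (1 - phat)                 # Var(y|x) in the LPM
w = 1.0 / np.sqrt(h)
b_wls, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
```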

© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
More on Specification and
Data Issues

Chapter 9

Wooldridge: Introductory Econometrics:


A Modern Approach, 5e

© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Specification and Data Issues
Tests for functional form misspecification
One can always test whether explanatory variables should appear as squares or
higher order terms by testing whether such terms can be excluded
Otherwise, one can use general specification tests such as RESET

Regression specification error test (RESET)


The idea of RESET is to include squares and possibly higher order
fitted values in the regression (similarly to the reduced White test)

Test for the exclusion of these terms. If they cannot be excluded, this is evidence for
omitted higher order terms and interactions, i.e. for misspecification of the functional form.
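RESET can be computed by hand from two R-squareds. A sketch on simulated data where the true model is nonlinear in x, so a linear specification should be flagged (the setup is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(8)

n = 500
x = rng.uniform(1, 5, size=n)
y = 1.0 + 2.0 * np.log(x) + rng.normal(scale=0.3, size=n)   # truth is nonlinear in x

def r2_of(y, X):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return 1 - e @ e / np.sum((y - y.mean()) ** 2)

X = np.column_stack([np.ones(n), x])          # (misspecified) linear model
b, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ b

# RESET: add squared and cubed fitted values and F-test their joint exclusion
Xr = np.column_stack([X, yhat ** 2, yhat ** 3])
r2_r, r2_u = r2_of(y, X), r2_of(y, Xr)
q, k_u = 2, Xr.shape[1]
F = ((r2_u - r2_r) / q) / ((1 - r2_u) / (n - k_u))   # compare to F(2, n - k_u)
```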

© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Specification and Data Issues
Example: Housing price equation

Evidence for
misspecification

Less evidence for


misspecification
Discussion
One may also include higher order terms, which implies complicated
interactions and higher order terms of all explanatory variables
RESET provides little guidance as to where misspecification comes from
© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Specification and Data Issues
Testing against nonnested alternatives
Which specification
is more appropriate?
Model 1:

Model 2:

Define a general model that contains both models as subcases and test:

Discussion
Can always be done; however, a clear winner need not emerge
Cannot be used if the models differ in their definition of the dep. var.
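The comprehensive-model approach can be sketched as two nested F-tests against a model that contains both specifications; the levels-versus-logs example below is simulated and hypothetical:

```python
import numpy as np

rng = np.random.default_rng(9)

n = 400
x = rng.uniform(1, 5, size=n)
y = 2.0 + 1.0 * x + rng.normal(scale=0.5, size=n)   # model 1 (levels) is the truth

def fit(y, X):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return b, e @ e

X1 = np.column_stack([np.ones(n), x])               # model 1: y on x
X2 = np.column_stack([np.ones(n), np.log(x)])       # model 2: y on log(x)
Xc = np.column_stack([np.ones(n), x, np.log(x)])    # comprehensive model nests both

_, ssr1 = fit(y, X1)
_, ssr2 = fit(y, X2)
_, ssrc = fit(y, Xc)

# F test of model 1 against the comprehensive model (H0: log(x) can be excluded)
F1 = (ssr1 - ssrc) / (ssrc / (n - 3))
# F test of model 2 against the comprehensive model (H0: x can be excluded)
F2 = (ssr2 - ssrc) / (ssrc / (n - 3))
```

Here model 2's restriction is strongly rejected while model 1's is not, which is the pattern one hopes for; as noted above, a clear winner need not emerge in general.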

© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Specification and Data Issues
Using proxy variables for unobserved explanatory variables
Example: Omitted ability in a wage equation Replace by proxy

In general, the estimates for the returns to education and experience will be biased because
one has to omit the unobservable ability variable. Idea: find a proxy variable for ability which is
able to control for ability differences between individuals so that the coefficients of the other
variables will not be biased. A possible proxy for ability is the IQ score or similar test scores.

General approach to using proxy variables

Omitted variable, e.g. ability

Regression of the omitted variable on its proxy

© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Specification and Data Issues
Assumptions necessary for the proxy variable method to work
The proxy is „just a proxy“ for the omitted variable; it does not belong
in the population regression, i.e. it is uncorrelated with its error
If the error and the proxy were correlated, the proxy
would actually have to be included in the population
regression function

The proxy variable is a „good“ proxy for the omitted variable, i.e. using
other variables in addition will not help to predict the omitted variable

Otherwise x1 and x2 would


have to be included in the
regression for the omitted
variable

© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Specification and Data Issues
Under these assumptions, the proxy variable method works:

In this regression model, the error term is uncorrelated with all explanatory variables. As a
consequence, all coefficients will be correctly estimated using OLS. The coefficients for the
explanatory variables x1 and x2 will be correctly identified. The coefficient for the proxy va-
riable may also be of interest (it is a multiple of the coefficient of the omitted variable).

Discussion of the proxy assumptions in the wage example


Assumption 1: Should be fulfilled as the IQ score is not a direct wage
determinant; what matters is how able the person proves at work
Assumption 2: Most of the variation in ability should be explainable by
variation in the IQ score, leaving only a small remainder to educ and exper
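The bias-reducing effect of a proxy can be seen in a small simulation (all parameter values below are hypothetical, chosen only to mimic the wage/ability/IQ story): omitting ability biases the return to education upward, and adding the noisy IQ proxy shrinks that bias.

```python
import numpy as np

rng = np.random.default_rng(10)

n = 20000
abil = rng.normal(size=n)                                  # unobserved ability
educ = 12 + 2 * abil + rng.normal(size=n)                  # education correlated with ability
iq = 100 + 15 * abil + rng.normal(scale=5, size=n)         # proxy: iq = delta0 + delta1*abil + v
logwage = 1.0 + 0.05 * educ + 0.10 * abil + rng.normal(scale=0.2, size=n)

def coefs(y, X):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

b_short = coefs(logwage, np.column_stack([np.ones(n), educ]))      # omits ability: biased upward
b_proxy = coefs(logwage, np.column_stack([np.ones(n), educ, iq]))  # IQ as proxy for ability

bias_short = abs(b_short[1] - 0.05)   # true return to education is 0.05 by construction
bias_proxy = abs(b_proxy[1] - 0.05)
```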

© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Specification and Data Issues

As expected, the measured return to


education decreases if IQ is included
as a proxy for unobserved ability.

The coefficient for the proxy suggests


that ability differences between indivi-
duals are important (e.g. + 15 points
IQ score are associated with a wage
increase of about 5.4%).

Even if IQ score imperfectly soaks up


the variation caused by ability, inclu-
ding it will at least reduce the bias in
the measured return to education.

No significant interaction effect bet-


ween ability and education.

© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Specification and Data Issues
Using lagged dependent variables as proxy variables
In many cases, omitted unobserved factors may be proxied by the
value of the dependent variable from an earlier time period

Example: City crime rates

Including the past crime rate will at least partly control for the many
omitted factors that also determine the crime rate in a given year
Another way to interpret this equation is that one compares cities
which had the same crime rate last year; this avoids comparing cities
that differ very much in unobserved crime factors

© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Specification and Data Issues
Models with random slopes (= random coefficient models)

The model has a random


intercept and a random slope

Average Random Average Random


intercept component slope component
Error term

Assumptions: the individual random com-
ponents are independent of
the explanatory variable

WLS or OLS with robust standard


errors will consistently estimate the
average intercept and average
slope in the population

© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Specification and Data Issues
Properties of OLS under measurement error
Measurement error in the dependent variable

Mismeasured value = True value + Measurement error

Population regression

Estimated regression

Consequences of measurement error in the dependent variable


Estimates will be less precise because the error variance is higher
Otherwise, OLS will be unbiased and consistent (as long as the mea-
surement error is unrelated to the values of the explanatory variables)

© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Specification and Data Issues
Measurement error in an explanatory variable
Mismeasured value = True value + Measurement error

Population regression

Estimated regression

Classical errors-in-variables assumption: the measurement error is unrelated to the true value

The mismeasured
variable x1 is cor-
related with the
error term!

© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Specification and Data Issues
Consequences of measurement error in an explanatory variable
Under the classical errors-in-variables assumption, OLS is biased and
inconsistent because the mismeasured variable is endogenous
One can show that the inconsistency is of the following form:

This factor (which involves the error


variance of a regression of the true value
of x1 on the other explanatory variables)
will always be between zero and one

The effect of the mismeasured variable suffers from attenuation bias,


i.e. the magnitude of the effect will be attenuated towards zero
In addition, the effects of the other explanatory variables will be biased
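Attenuation bias is easy to demonstrate by simulation (the data-generating process below is hypothetical): with Var(x*) = Var(e1) = 1, the plim of the OLS slope is the true slope times 1/2.

```python
import numpy as np

rng = np.random.default_rng(11)

n = 20000
xstar = rng.normal(size=n)                 # true regressor, Var = 1
e1 = rng.normal(size=n)                    # classical measurement error, Var = 1
x = xstar + e1                             # observed, mismeasured regressor
y = 1.0 + 2.0 * xstar + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# attenuation factor Var(x*)/(Var(x*)+Var(e1)) = 1/2, so plim of the slope is 2 * 1/2 = 1
attenuation = 1.0 / (1.0 + 1.0)
```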

© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Specification and Data Issues
Missing data and nonrandom samples
Missing data as sample selection
Missing data is a special case of sample selection (= nonrandom samp-
ling) as the observations with missing information cannot be used
If the sample selection is based on independent variables there is no
problem as a regression conditions on the independent variables
In general, sample selection is no problem if it is uncorrelated with the
error term of a regression (= exogenous sample selection)
Sample selection is a problem, if it is based on the dependent variable
or on the error term (= endogenous sample selection)

© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Specification and Data Issues
Example for exogenous sample selection

If the sample was nonrandom in the way that certain age groups, income groups, or household sizes
were over- or undersampled, this is not a problem for the regression because it examines the savings
for subgroups defined by income, age, and hh-size. The distribution of subgroups does not matter.

Example for endogenous sample selection

If the sample is nonrandom in the way individuals refuse to take part in the sample survey if their
wealth is particularly high or low, this will bias the regression results because these individuals may
be systematically different from those who do not refuse to take part in the sample survey.

© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Specification and Data Issues
Outliers and influential observations
Extreme values and outliers may be a particular problem for OLS
because the method is based on squaring deviations
If outliers are the result of mistakes that occurred when keying in the
data, one should just discard the affected observations
If outliers are the result of the data generating process, the decision
whether to discard the outliers is not so easy

Example: R&D intensity and firm size

© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Specification and Data Issues
Example: R&D intensity and firm size (cont.)

The outlier is not the result of a mistake:


One of the sampled firms is much larger than the others.
The regression without the outlier makes more sense.

© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Multiple Regression Analysis:
Specification and Data Issues
Least absolute deviations estimation (LAD)
The least absolute deviations estimator minimizes the sum of absolute
deviations (instead of the sum of squared deviations, i.e. OLS)

It may be more robust to outliers as deviations are not squared


The least absolute deviations estimator estimates the parameters of
the conditional median (instead of the conditional mean with OLS)
The least absolute deviations estimator is a special case of quantile
regression, which estimates parameters of conditional quantiles
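A sketch of LAD on simulated data with gross outliers. The minimization is done here by iteratively reweighted least squares, one simple numerical scheme for the LAD problem (not the only one, and not one the slides prescribe):

```python
import numpy as np

rng = np.random.default_rng(12)

n = 300
x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
y[:5] += 50                                  # a few gross outliers in the dependent variable

X = np.column_stack([np.ones(n), x])

def lad(y, X, iters=200, eps=1e-6):
    # LAD via iteratively reweighted least squares:
    # repeatedly solve a weighted LS problem with weights 1/|residual|
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    for _ in range(iters):
        w = 1.0 / np.maximum(np.abs(y - X @ b), eps)
        Xw = X * w[:, None]
        b = np.linalg.solve(X.T @ Xw, Xw.T @ y)
    return b

b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)   # pulled toward the outliers
b_lad = lad(y, X)                               # estimates the conditional median
```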

© 2012 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Basic Regression Analysis with
Time Series Data

Chapter 10

Wooldridge: Introductory Econometrics:


A Modern Approach, 5e

© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Analyzing Time Series:
Basic Regression Analysis
The nature of time series data
Temporal ordering of observations; may not be arbitrarily reordered
Typical features: serial correlation/nonindependence of observations
How should we think about the randomness in time series data?
• The outcome of economic variables (e.g. GNP, Dow Jones) is
uncertain; they should therefore be modeled as random variables
• Time series are sequences of r.v. (= stochastic processes)
• Randomness does not come from sampling from a population
• „Sample“ = the one realized path of the time series out of the
many possible paths the stochastic process could have taken

© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Analyzing Time Series:
Basic Regression Analysis
Example: US inflation and unemployment rates 1948-2003

Here, there are only two time series. There may


be many more variables whose paths over time
are observed simultaneously.

Time series analysis focuses on modeling the


dependency of a variable on its own past, and
on the present and past values of other variables.

© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Analyzing Time Series:
Basic Regression Analysis
Examples of time series regression models
Static models
In static time series models, the current value of one variable is
modeled as the result of the current values of explanatory variables

Examples for static models


There is a contemporaneous relationship between
unemployment and inflation (= Phillips-Curve).

The current murder rate is determined by the current conviction rate, unemployment rate,
and fraction of young males in the population.

© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Analyzing Time Series:
Basic Regression Analysis
Finite distributed lag models
In finite distributed lag models, the explanatory variables are allowed
to influence the dependent variable with a time lag

Example for a finite distributed lag model


The fertility rate may depend on the tax value of a child, but for
biological and behavioral reasons, the effect may have a lag

Variables: children born per 1,000 women in year t (dep. var.); tax exemption in year t, in year t-1, and in year t-2
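A finite distributed lag regression amounts to building lagged copies of the explanatory variable and running OLS; the long-run propensity is then the sum of the lag coefficients. A sketch on simulated data (the lag coefficients 0.5, 0.3, 0.1 are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(13)

T = 500
z = rng.normal(size=T)
u = rng.normal(scale=0.5, size=T)

# FDL of order two: y_t = a0 + d0*z_t + d1*z_{t-1} + d2*z_{t-2} + u_t
d0, d1, d2 = 0.5, 0.3, 0.1
y = np.empty(T)
y[:2] = 0.0                                  # first two periods undefined; dropped below
for t in range(2, T):
    y[t] = 1.0 + d0 * z[t] + d1 * z[t - 1] + d2 * z[t - 2] + u[t]

# build the lagged regressor matrix and drop the first two periods
X = np.column_stack([np.ones(T - 2), z[2:], z[1:-1], z[:-2]])
b, *_ = np.linalg.lstsq(X, y[2:], rcond=None)

lrp = b[1] + b[2] + b[3]     # long-run propensity = cumulative effect of a permanent shock
```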

© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Analyzing Time Series:
Basic Regression Analysis
Interpretation of the effects in finite distributed lag models

Effect of a past shock on the current value of the dep. variable

Effect of a transitory shock: Effect of permanent shock:


If there is a one time shock in a If there is a permanent shock in a past period, i.e.
past period, the dep. variable will the explanatory variable permanently increases by
change temporarily by the amount one unit, the effect on the dep. variable will be the
indicated by the coefficient of the cumulated effect of all relevant lags. This is a long-
corresponding lag. run effect on the dependent variable.

© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Analyzing Time Series:
Basic Regression Analysis
Graphical illustration of lagged effects

For example, the effect is biggest


after a lag of one period. After that,
the effect vanishes (if the initial
shock was transitory).

The long run effect of a permanent


shock is the cumulated effect of all
relevant lagged effects. It does not
vanish (if the initial shock is a per-
manent one).

© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Analyzing Time Series:
Basic Regression Analysis
Finite sample properties of OLS under classical assumptions

Assumption TS.1 (Linear in parameters)

The time series involved obey a linear relationship. The stochastic processes yt, xt1,…,
xtk are observed, the error process ut is unobserved. The definition of the explanatory
variables is general, e.g. they may be lags or functions of other explanatory variables.

Assumption TS.2 (No perfect collinearity)


„In the sample (and therefore in the underlying time series
process), no independent variable is constant nor a perfect
linear combination of the others.“

© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Analyzing Time Series:
Basic Regression Analysis
Notation
This matrix collects all the
information on the complete time
paths of all explanatory variables

The values of all explanatory


variables in period number t

Assumption TS.3 (Zero conditional mean)

The mean value of the unobserved factors is unrelated to


the values of the explanatory variables in all periods

© 2013 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Analyzing Time Series:
Basic Regression Analysis
Discussion of assumption TS.3

The mean of the error term is unrelated to the


Exogeneity: explanatory variables of the same period

The mean of the error term is unrelated to the


Strict exogeneity: values of the explanatory variables of all periods

Strict exogeneity is stronger than contemporaneous exogeneity


TS.3 rules out feedback from the dep. variable on future values of the
explanatory variables; this is often questionable esp. if explanatory
variables „adjust“ to past changes in the dependent variable
If the error term is related to past values of the explanatory variables,
one should include these values as contemporaneous regressors

Analyzing Time Series:
Basic Regression Analysis
Theorem 10.1 (Unbiasedness of OLS)

Assumption TS.4 (Homoscedasticity)

The volatility of the errors must not be related to


the explanatory variables in any of the periods

A sufficient condition is that the volatility of the error is independent of


the explanatory variables and that it is constant over time
In the time series context, homoscedasticity may also be easily violated,
e.g. if the volatility of the dep. variable depends on regime changes

Analyzing Time Series:
Basic Regression Analysis
Assumption TS.5 (No serial correlation)
Conditional on the explanatory variables, the un-
observed factors must not be correlated over time

Discussion of assumption TS.5


Why was such an assumption not made in the cross-sectional case?
The assumption may easily be violated if, conditional on knowing the
values of the indep. variables, omitted factors are correlated over time
The assumption may also serve as substitute for the random sampling
assumption if sampling a cross-section is not done completely randomly
In this case, given the values of the explanatory variables, errors have
to be uncorrelated across cross-sectional units (e.g. states)
Analyzing Time Series:
Basic Regression Analysis
Theorem 10.2 (OLS sampling variances)

Under assumptions TS.1 – TS.5: The same formula as in


the cross-sectional case

The conditioning on the values of the explanatory variables is not easy to understand. It effectively
means that, in a finite sample, one ignores the sampling variability coming from the randomness of
the regressors. This kind of sampling variability will normally not be large (because of the sums).

Theorem 10.3 (Unbiased estimation of the error variance)

Analyzing Time Series:
Basic Regression Analysis
Theorem 10.4 (Gauss-Markov Theorem)
Under assumptions TS.1 – TS.5, the OLS estimators have the minimal
variance of all linear unbiased estimators of the regression coefficients
This holds conditional as well as unconditional on the regressors

Assumption TS.6 (Normality) This assumption implies TS.3 – TS.5

independently of

Theorem 10.5 (Normal sampling distributions)


Under assumptions TS.1 – TS.6, the OLS estimators have the usual nor-
mal distribution (conditional on ). The usual F- and t-tests are valid.

Analyzing Time Series:
Basic Regression Analysis
Example: Static Phillips curve
Contrary to theory, the estimated Phillips
Curve does not suggest a tradeoff between
inflation and unemployment

The error term contains factors such as monetary shocks, income/demand shocks, oil price shocks, supply shocks, or exchange rate shocks

Discussion of CLM assumptions

TS.1: A linear relationship might be restrictive, but it should be a good approximation.

TS.2: Perfect collinearity is not a problem as long as unemployment varies over time.

Analyzing Time Series:
Basic Regression Analysis
Discussion of CLM assumptions (cont.)

TS.3: Easily violated
For example, past unemployment shocks may lead to future demand shocks, which may dampen inflation
For example, an oil price shock means more inflation and may lead to future increases in unemployment

TS.4: Assumption is violated if monetary policy is more „nervous“ in times of high unemployment
TS.5: Assumption is violated if exchange rate influences persist over time (they cannot be explained by unemployment)
TS.6: Questionable

Analyzing Time Series:
Basic Regression Analysis
Example: Effects of inflation and deficits on interest rates

Interest rate on 3-months T-bill Government deficit as percentage of GDP

The error term represents other factors that determine interest rates in general, e.g. business cycle effects

Discussion of CLM assumptions

TS.1: A linear relationship might be restrictive, but it should be a good approximation.

TS.2: Perfect collinearity will seldom be a problem in practice.

Analyzing Time Series:
Basic Regression Analysis
Discussion of CLM assumptions (cont.)
TS.3: Easily violated
For example, past deficit spending may boost economic activity, which in turn may lead to general interest rate rises
For example, unobserved demand shocks may increase interest rates and lead to higher inflation in future periods

TS.4: Assumption is violated if higher deficits lead to more uncertainty about state finances and possibly more abrupt rate changes
TS.5: Assumption is violated if business cycle effects persist across years (and they cannot be completely accounted for by inflation and the evolution of deficits)
TS.6: Questionable

Analyzing Time Series:
Basic Regression Analysis
Using dummy explanatory variables in time series

Children born per 1,000 women in year t; tax exemption in year t; dummy for World War II years (1941-45); dummy for availability of contraceptive pill (1963-present)

Interpretation
During World War II, the fertility rate was temporarily lower
It has been permanently lower since the introduction of the pill in 1963

Analyzing Time Series:
Basic Regression Analysis
Time series with trends

Example for a time


series with a linear
upward trend

Analyzing Time Series:
Basic Regression Analysis
Modelling a linear time trend

Abstracting from random deviations, the dependent


variable increases by a constant amount per time unit

Alternatively, the expected value of the dependent


variable is a linear function of time

Modelling an exponential time trend

Abstracting from random deviations, the dependent variable increases by a constant percentage per time unit
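As a quick numeric sketch (synthetic data, numpy only; all variable names are my own), both trend models can be estimated by OLS; the exponential trend is fit by regressing log(y) on t:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(1.0, 101.0)                     # time index t = 1, ..., 100
X = np.column_stack([np.ones_like(t), t])

# Linear trend: y_t = a0 + a1*t + e_t (constant amount a1 per period)
y_lin = 5.0 + 0.3 * t + rng.normal(0, 1, 100)
a_hat = np.linalg.lstsq(X, y_lin, rcond=None)[0]

# Exponential trend: log(y_t) = b0 + b1*t + e_t (constant growth rate b1)
y_exp = np.exp(0.1 + 0.02 * t + rng.normal(0, 0.05, 100))
b_hat = np.linalg.lstsq(X, np.log(y_exp), rcond=None)[0]

print(a_hat[1], b_hat[1])  # close to 0.3 and 0.02 (about 2% growth per period)
```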

Analyzing Time Series:
Basic Regression Analysis
Example for a time series with an exponential trend

Abstracting from
random deviations,
the time series has a
constant growth rate

Analyzing Time Series:
Basic Regression Analysis
Using trending variables in regression analysis
If trending variables are regressed on each other, a spurious re-
lationship may arise if the variables are driven by a common trend
In this case, it is important to include a trend in the regression

Example: Housing investment and prices

Per capita housing investment Housing price index

It looks as if investment and


prices are positively related

Analyzing Time Series:
Basic Regression Analysis
Example: Housing investment and prices (cont.)

There is no significant relationship


between price and investment anymore

When should a trend be included?


If the dependent variable displays an obvious trending behaviour
If both the dependent and some independent variables have trends
If only some of the independent variables have trends; their effect on
the dep. var. may only be visible after a trend has been subtracted

Analyzing Time Series:
Basic Regression Analysis
A detrending interpretation of regressions with a time trend
It turns out that the OLS coefficients in a regression including a trend
are the same as the coefficients in a regression without a trend but
where all the variables have been detrended before the regression
This follows from the general interpretation of multiple regressions
Computing R-squared when the dependent variable is trending
Due to the trend, the variance of the dep. var. will be overstated
It is better to first detrend the dep. var. and then run the regression
on all the indep. variables (plus a trend if they are trending as well)
The R-squared of this regression is a more adequate measure of fit
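The detrending interpretation can be verified numerically. In this sketch (synthetic data, my own variable names), the slope from a regression that includes a trend coincides with the slope from regressing detrended y on detrended x, by the partialling-out logic of multiple regression:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
t = np.arange(n, dtype=float)
x = 0.5 * t + rng.normal(0, 1, n)             # a trending regressor
y = 1.0 + 2.0 * x + 0.1 * t + rng.normal(0, 1, n)

# (a) regression of y on a constant, x, and a linear trend
beta_full = np.linalg.lstsq(np.column_stack([np.ones(n), x, t]), y, rcond=None)[0]

# (b) detrend y and x first (residuals from regressing on constant + t), then regress
T = np.column_stack([np.ones(n), t])
detrend = lambda v: v - T @ np.linalg.lstsq(T, v, rcond=None)[0]
beta_detr = np.linalg.lstsq(detrend(x).reshape(-1, 1), detrend(y), rcond=None)[0]

print(beta_full[1], beta_detr[0])  # identical up to floating-point error
```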

Analyzing Time Series:
Basic Regression Analysis
Modelling seasonality in time series
A simple method is to include a set of seasonal dummies:

= 1 if obs. from December
= 0 otherwise

Similar remarks apply as in the case of deterministic time trends


The regression coefficients on the explanatory variables can be seen as
the result of first deseasonalizing the dep. and the explanat. variables
An R-squared that is based on first deseasonalizing the dep. var. may
better reflect the explanatory power of the explanatory variables
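A minimal sketch of constructing such dummies (hypothetical monthly data, numpy only; names are my own):

```python
import numpy as np

n_years = 10
month = np.tile(np.arange(1, 13), n_years)        # month index 1..12, repeated over years
dec = (month == 12).astype(float)                 # = 1 if obs. from December, 0 otherwise

# Full seasonal set: 11 dummies, with January as the omitted base category
D = np.column_stack([(month == m).astype(float) for m in range(2, 13)])
print(int(dec.sum()), D.shape)  # 10 December observations; D has shape (120, 11)
```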

Further Issues Using OLS with
Time Series Data

Chapter 11

Wooldridge: Introductory Econometrics:


A Modern Approach, 5e

Analyzing Time Series:
Further Issues Using OLS
The assumptions used so far seem to be too restrictive
Strict exogeneity, homoscedasticity, and no serial correlation are very
demanding requirements, especially in the time series context
Statistical inference rests on the validity of the normality assumption
Much weaker assumptions are needed if the sample size is large
A key requirement for large sample analysis of time series is that
the time series in question are stationary and weakly dependent
Stationary time series
Loosely speaking, a time series is stationary if its stochastic properties
and its temporal dependence structure do not change over time

Analyzing Time Series:
Further Issues Using OLS
Stationary stochastic processes
A stochastic process {xt : t = 1, 2, …} is stationary if, for every collection of
time indices 1 ≤ t1 < t2 < … < tm, the joint distribution of (xt1, xt2, …, xtm)
is the same as that of (xt1+h, xt2+h, …, xtm+h) for all integers h ≥ 1.

Covariance stationary processes


A stochastic process {xt} is covariance stationary if its
expected value, its variance, and its covariances are constant over time:
1) E(xt) is constant, 2) Var(xt) is constant, and 3) Cov(xt, xt+h) depends only on h, not on t.

Analyzing Time Series:
Further Issues Using OLS
Weakly dependent time series
A stochastic process {xt} is weakly dependent if xt and xt+h are
„almost independent“ as h grows to infinity (for all t).

Discussion of the weak dependence property


An implication of weak dependence is that the correlation between
xt and xt+h must converge to zero as h grows to infinity
For the LLN and the CLT to hold, the individual observations must not
be too strongly related to each other; in particular their relation must
become weaker (and this fast enough) the farther they are apart
Note that a series may be nonstationary but weakly dependent

Analyzing Time Series:
Further Issues Using OLS
Examples for weakly dependent time series
Moving average process of order one (MA(1))

The process is a short moving average of an i.i.d. series et

The process is weakly dependent because observations that are more than one
time period apart have nothing in common and are therefore uncorrelated.

Autoregressive process of order one (AR(1))


The process carries over to a certain extent the value of the
previous period (plus random shocks from an i.i.d. series et)

If the stability condition holds, the process is weakly dependent because serial
correlation converges to zero as the distance between observations grows to infinity.
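A small simulation can illustrate the decay of serial correlation in a stable AR(1) process (synthetic data; the sample autocorrelation at lag h should be close to rho**h):

```python
import numpy as np

rng = np.random.default_rng(2)
n, rho = 100_000, 0.6
e = rng.normal(0, 1, n)

# AR(1): y_t = rho*y_{t-1} + e_t, with |rho| < 1 (the stability condition)
y = np.empty(n)
y[0] = e[0]
for t in range(1, n):
    y[t] = rho * y[t - 1] + e[t]

def autocorr(x, h):
    """Sample autocorrelation at lag h."""
    x = x - x.mean()
    return (x[:-h] * x[h:]).mean() / x.var()

for h in (1, 2, 5, 10):
    print(h, autocorr(y, h))  # close to rho**h, shrinking toward zero
```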

Analyzing Time Series:
Further Issues Using OLS
Asymptotic properties of OLS

Assumption TS.1‘ (Linear in parameters)


Same as assumption TS.1 but now the dependent and independent
variables are assumed to be stationary and weakly dependent
Assumption TS.2‘ (No perfect collinearity)
Same as assumption TS.2
Assumption TS.3‘ (Zero conditional mean)
Now the explanatory variables are assumed to be only contemporaneously
exogenous rather than strictly exogenous, i.e.
The explanatory variables of the same period are
uninformative about the mean of the error term

Analyzing Time Series:
Further Issues Using OLS
Theorem 11.1 (Consistency of OLS)

Important note: For consistency it would even suffice to assume that the explanatory
variables are merely contemporaneously uncorrelated with the error term.

Why is it important to relax the strict exogeneity assumption?


Strict exogeneity is a serious restriction because it rules out all kinds of
dynamic relationships between explanatory variables and the error term
In particular, it rules out feedback from the dep. var. on future values of
the explanat. variables (which is very common in economic contexts)
Strict exogeneity precludes the use of lagged dep. var. as regressors

Analyzing Time Series:
Further Issues Using OLS
Why do lagged dependent variables violate strict exogeneity?

This is the simplest possible regression


model with a lagged dependent variable

Contemporaneous exogeneity:

Strict exogeneity: Strict exogeneity would imply


that the error term is uncorrelated with all yt, t=1, …, n-1
This leads to a contradiction because:

OLS estimation in the presence of lagged dependent variables


Under contemporaneous exogeneity, OLS is consistent but biased

Analyzing Time Series:
Further Issues Using OLS
Assumption TS.4‘ (Homoscedasticity)

The errors are contemporaneously homoscedastic

Assumption TS.5‘ (No serial correlation)

Conditional on the explanatory variables in


periods t and s, the errors are uncorrelated

Theorem 11.2 (Asymptotic normality of OLS)


Under assumptions TS.1‘ – TS.5‘, the OLS estimators are asymptotically
normally distributed. Further, the usual OLS standard errors, t-statistics
and F-statistics are asymptotically valid.

Analyzing Time Series:
Further Issues Using OLS
Example: Efficient Markets Hypothesis (EMH)

The EMH in a strict form states that information observable to the market prior to week t should
not help to predict the return during week t. A simplification assumes in addition that only past
returns are considered as relevant information to predict the return in week t. This implies that

A simple way to test the EMH is to specify an AR(1) model. Under the EMH assumption, TS.3‘ holds
so that an OLS regression can be used to test whether this week‘s returns depend on last week‘s.

There is no evidence against the EMH. Including more lagged returns yields similar results.

Analyzing Time Series:
Further Issues Using OLS
Using trend-stationary series in regression analysis
Time series with deterministic time trends are nonstationary
If they are stationary around the trend and in addition weakly
dependent, they are called trend-stationary processes
Trend-stationary processes also satisfy assumption TS.1‘
Using highly persistent time series in regression analysis
Unfortunately many economic time series violate weak dependence
because they are highly persistent (= strongly dependent)
In this case OLS methods are generally invalid (unless the CLM hold)
In some cases transformations to weak dependence are possible

Analyzing Time Series:
Further Issues Using OLS
Random walks
The random walk is called random walk because it wanders
from the previous position yt-1 by an i.i.d. random amount et

The value today is the accumulation of all past shocks plus an initial value. This is the reason why
the random walk is highly persistent: The effect of a shock will be contained in the series forever.

The random walk is not covariance stationary


because its variance and its covariance depend
on time.

It is also not weakly dependent because the


correlation between observations vanishes very
slowly and this depends on how large t is.
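A simulation sketch of why the random walk is not covariance stationary (synthetic data): across many simulated paths, the sample variance of yt grows linearly with t:

```python
import numpy as np

rng = np.random.default_rng(3)
n_paths, T = 20_000, 100

# Random walk: y_t = y_{t-1} + e_t with y_0 = 0 and e_t i.i.d. N(0,1),
# so y_t is the accumulation of all past shocks and Var(y_t) = t
y = rng.normal(0, 1, (n_paths, T)).cumsum(axis=1)

v10, v100 = y[:, 9].var(), y[:, 99].var()
print(v10, v100)  # roughly 10 and 100: the variance grows with t
```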

Analyzing Time Series:
Further Issues Using OLS
Examples for random walk realizations

The random walks


wander around with
no clear direction

Analyzing Time Series:
Further Issues Using OLS
Three-month T-bill rate as a possible example for a random walk

A random walk is a special case


of a unit root process.

Unit root processes are defined


as the random walk, but et may be an arbitrary weakly dependent process.

From an economic point of view


it is important to know whether
a time series is highly persistent.
In highly persistent time series,
shocks or policy changes have
lasting/permanent effects, in
weakly dependent processes
their effects are transitory.

Analyzing Time Series:
Further Issues Using OLS
Random walks with drift
In addition to the usual random walk mechanism, there is
a deterministic increase/decrease (= drift) in each period

This leads to a linear time trend around which the series follows its random walk behaviour. As there
is no clear direction in which the random walk develops, it may also wander away from the trend.

Otherwise, the random walk with drift has similar


properties as the random walk without drift.

Random walks with drift are not covariance stationary and not weakly dependent.

Analyzing Time Series:
Further Issues Using OLS
Sample path of a random walk with drift

Note that the series does not


regularly return to the trend line.

Random walks with drift may be


good models for time series that
have an obvious trend but are not
weakly dependent.

Analyzing Time Series:
Further Issues Using OLS
Transformations on highly persistent time series
Order of integration
Weakly dependent time series are integrated of order zero (= I(0))
If a time series has to be differenced one time in order to obtain a
weakly dependent series, it is called integrated of order one (= I(1))
Examples for I(1) processes
After differencing, the
resulting series are weakly
dependent (because et is
weakly dependent).

Differencing is often a way to achieve weak dependence

Analyzing Time Series:
Further Issues Using OLS
Deciding whether a time series is I(1)
There are statistical tests for testing whether a time series is I(1)
(= unit root tests); these will be covered in later chapters
Alternatively, look at the sample first order autocorrelation:

Measures how strongly adjacent times series


observations are related to each other.

If the sample first order autocorrelation is close to one, this suggests


that the time series may be highly persistent (= contains a unit root)
Alternatively, the series may have a deterministic trend
Both unit root and trend may be eliminated by differencing
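The sample first order autocorrelation can be computed directly; in this sketch (simulated data, my own names) it is near one for a random walk and near zero for its first difference:

```python
import numpy as np

rng = np.random.default_rng(4)
y = rng.normal(0, 1, 2000).cumsum()     # a random walk, i.e. an I(1) series

def rho1(x):
    """Sample first order autocorrelation."""
    x = x - x.mean()
    return (x[:-1] * x[1:]).sum() / (x * x).sum()

print(rho1(y))           # close to 1: highly persistent, so difference the series
print(rho1(np.diff(y)))  # close to 0: the first difference is weakly dependent
```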

Analyzing Time Series:
Further Issues Using OLS
Example: Fertility equation

This equation could be estimated by OLS if the CLM assumptions hold. These may be questionable,
so that one would have to resort to large sample analysis. For large sample analysis, the fertility
series and the series of the personal tax exemption have to be stationary and weakly dependent.
This is questionable because the two series are highly persistent:

It is therefore better to estimate the equation in first differences. This makes sense because if the
equation holds in levels, it also has to hold in first differences:

Estimate of

Analyzing Time Series:
Further Issues Using OLS
Example: Wages and productivity
Include a trend because both series display clear trends.

The elasticity of hourly wage with respect


to output per hour (=productivity) seems
implausibly large.

It turns out that even after detrending, both series display sample autocorrelations
close to one so that estimating the equation in first differences seems more adequate:

This estimate of the elasticity of hourly


wage with respect to productivity makes
much more sense.

Analyzing Time Series:
Further Issues Using OLS
Dynamically complete models
A model is said to be dynamically complete if enough lagged variables have
been included as explanatory variables so that further lags do not help to
explain the dependent variable:

Dynamic completeness implies absence of serial correlation


If further lags actually belong in the regression, their omission will
cause serial correlation (if the variables are serially correlated)
One can easily test for dynamic completeness
If lags cannot be excluded, this suggests there is serial correlation

Analyzing Time Series:
Further Issues Using OLS
Sequential exogeneity
A set of explanatory variables is said to be sequentially exogenous if
„enough“ lagged explanatory variables have been included:

Sequential exogeneity is weaker than strict exogeneity


Sequential exogeneity is equivalent to dynamic completeness if the
explanatory variables contain a lagged dependent variable
Should all regression models be dynamically complete?
Not necessarily: If sequential exogeneity holds, causal effects will be
correctly estimated; absence of serial correlation is not crucial

Serial Correlation and
Heteroscedasticity in
Time Series Regressions

Chapter 12

Wooldridge: Introductory Econometrics:


A Modern Approach, 5e

Analyzing Time Series:
Serial Correl. and Heterosced.
Properties of OLS with serially correlated errors
OLS still unbiased and consistent if errors are serially correlated
Correctness of R-squared also does not depend on serial correlation
OLS standard errors and tests will be invalid if there is serial correlation
OLS will not be efficient anymore if there is serial correlation
Serial correlation and the presence of lagged dependent variables
Is OLS inconsistent if there are ser. corr. and lagged dep. variables?
No: Including enough lags so that TS.3‘ holds guarantees consistency
Including too few lags will cause an omitted variable problem and serial
correlation because some lagged dep. var. end up in the error term

Analyzing Time Series:
Serial Correl. and Heterosced.
Testing for serial correlation
Testing for AR(1) serial correlation with strictly exog. regressors

AR(1) model for serial correlation (with an i.i.d. series et)

Replace true unobserved errors by estimated residuals

Test H0: rho = 0 in the regression of the residual on its first lag

Example: Static Phillips curve (see above)


Reject null hypothesis
of no serial correlation
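A bare-bones version of this two-step test (simulate, run OLS, regress residuals on lagged residuals; synthetic data and my own variable names):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
x = rng.normal(0, 1, n)
u = np.empty(n)
u[0] = rng.normal()
for t in range(1, n):                    # AR(1) errors with rho = 0.5
    u[t] = 0.5 * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u

# Step 1: OLS of y on x, keep the residuals
X = np.column_stack([np.ones(n), x])
uhat = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

# Step 2: regress uhat_t on uhat_{t-1} (through the origin here) and
# test H0: rho = 0 with the usual t-statistic
u0, u1 = uhat[:-1], uhat[1:]
rho_hat = (u0 * u1).sum() / (u0 * u0).sum()
sigma2 = ((u1 - rho_hat * u0) ** 2).sum() / (len(u1) - 1)
t_stat = rho_hat / np.sqrt(sigma2 / (u0 * u0).sum())
print(rho_hat, t_stat)  # rho_hat near 0.5; t-statistic far beyond 1.96
```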

Analyzing Time Series:
Serial Correl. and Heterosced.
Durbin-Watson test under classical assumptions
Under assumptions TS.1 – TS.6, the Durbin-Watson test is an exact
test (whereas the previous t-test is only valid asymptotically).

Unfortunately, the Durbin-Watson test works with a lower bound dL and an
upper bound dU for the critical value; in the area between the bounds the
test result is inconclusive.
Reject if DW < dL, „accept“ if DW > dU.

Example: Static Phillips curve (see above)

Reject null hypothesis of no serial correlation
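The Durbin-Watson statistic itself is easy to compute from residuals; as a rough sketch (synthetic series), DW is about 2(1 - rho_hat), so values near 2 indicate no serial correlation:

```python
import numpy as np

def durbin_watson(resid):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2), roughly 2*(1 - rho_hat)."""
    d = np.diff(resid)
    return (d * d).sum() / (resid * resid).sum()

rng = np.random.default_rng(6)
e_iid = rng.normal(0, 1, 10_000)
u = np.zeros(10_000)
for t in range(1, 10_000):               # AR(1) residual series with rho = 0.5
    u[t] = 0.5 * u[t - 1] + rng.normal()

dw_iid, dw_ar = durbin_watson(e_iid), durbin_watson(u)
print(dw_iid, dw_ar)  # near 2 (no serial correlation) and near 1 (rho = 0.5)
```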

Analyzing Time Series:
Serial Correl. and Heterosced.
Testing for AR(1) serial correlation with general regressors
The t-test for autocorrelation can be easily generalized to allow for the
possibility that the explanatory variables are not strictly exogenous:

Test for H0: rho = 0 (the coefficient on the lagged residual).
The test now allows for the possibility that
the strict exogeneity assumption is violated.

The test may be carried out in a heteroscedasticity robust way


General Breusch-Godfrey test for AR(q) serial correlation

Test joint significance of the q residual lags (e.g. with an F- or LM-test)

Analyzing Time Series:
Serial Correl. and Heterosced.
Correcting for serial correlation with strictly exog. regressors
Under the assumption of AR(1) errors, one can transform the model
so that it satisfies all GM-assumptions. For this model, OLS is BLUE.

Simple case of regression with only one explanatory variable. The general case works analogously.

Lag and multiply by

The transformed error satisfies the GM-assumptions.

Problem: The AR(1)-coefficient is not known and has to be estimated

Analyzing Time Series:
Serial Correl. and Heterosced.
Correcting for serial correlation (cont.)
Replacing the unknown by leads to a FGLS-estimator
There are two variants:
• Cochrane-Orcutt estimation omits the first observation
• Prais-Winsten estimation adds a transformed first observation
In smaller samples, Prais-Winsten estimation should be more efficient
Comparing OLS and FGLS with autocorrelation
For consistency of FGLS more than TS.3‘ is needed (e.g. TS.3) because
the transformed regressors include variables from different periods
If OLS and FGLS differ dramatically this might indicate violation of TS.3
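A minimal Cochrane-Orcutt sketch under these assumptions (synthetic AR(1) errors, my own variable names; iterate between estimating rho from residuals and re-running OLS on the quasi-differenced data):

```python
import numpy as np

def cochrane_orcutt(y, x, n_iter=10):
    """Sketch of Cochrane-Orcutt: iterate (OLS residuals -> rho estimate ->
    OLS on quasi-differenced data). Drops the first observation;
    Prais-Winsten would add a transformed version of it instead."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    rho = 0.0
    for _ in range(n_iter):
        u = y - X @ beta
        rho = (u[:-1] * u[1:]).sum() / (u[:-1] ** 2).sum()
        ys = y[1:] - rho * y[:-1]
        Xs = np.column_stack([(1 - rho) * np.ones(n - 1), x[1:] - rho * x[:-1]])
        beta = np.linalg.lstsq(Xs, ys, rcond=None)[0]   # still (beta0, beta1)
    return beta, rho

rng = np.random.default_rng(7)
n = 2000
x = rng.normal(0, 1, n)
u = np.zeros(n)
for t in range(1, n):                     # AR(1) errors with rho = 0.7
    u[t] = 0.7 * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u

beta, rho = cochrane_orcutt(y, x)
print(beta, rho)  # beta near (1, 2), rho near 0.7
```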

Analyzing Time Series:
Serial Correl. and Heterosced.
Serial correlation-robust inference after OLS
In the presence of serial correlation, OLS standard errors overstate
statistical significance because there is less independent variation
One can compute serial correlation-robust std. errors after OLS
This is useful because FGLS requires strict exogeneity and assumes a
very specific form of serial correlation (AR(1) or, generally, AR(q))
Serial correlation-robust standard errors:
The usual OLS standard errors are
normalized and then „inflated“ by
a correction factor.

Serial correlation-robust F- and t-tests are also available

Analyzing Time Series:
Serial Correl. and Heterosced.
Correction factor for serial correlation (Newey-West formula)

This term is the product of the residuals and the residuals


of a regression of xtj on all other explanatory variables

The integer g controls how much serial correlation is allowed:

g=2, g=3: The weight of higher order autocorrelations is declining
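A bare-bones implementation of this correction for the slope of a simple regression (Bartlett weights; synthetic data with an autocorrelated regressor and autocorrelated errors, where the naive OLS standard error understates the uncertainty; function and variable names are my own):

```python
import numpy as np

def slope_ses(y, x, g=3):
    """Naive OLS and Newey-West (HAC) standard errors for the slope in a
    simple regression, with Bartlett weights w_h = 1 - h/(g+1)."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    uhat = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    xt = x - x.mean()                 # regressor residualized on the constant
    sxx = (xt * xt).sum()
    naive = np.sqrt((uhat * uhat).sum() / (n - 2) / sxx)
    v = xt * uhat                     # the products entering the Newey-West sum
    S = (v * v).sum()
    for h in range(1, g + 1):
        S += 2 * (1 - h / (g + 1)) * (v[h:] * v[:-h]).sum()
    return naive, np.sqrt(S) / sxx

rng = np.random.default_rng(8)
n = 4000
x, u = np.zeros(n), np.zeros(n)
for t in range(1, n):                 # AR(1) regressor and AR(1) error
    x[t] = 0.8 * x[t - 1] + rng.normal()
    u[t] = 0.8 * u[t - 1] + rng.normal()
y = 2.0 * x + u

naive, hac = slope_ses(y, x, g=5)
print(naive, hac)  # the HAC standard error is noticeably larger here
```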

Analyzing Time Series:
Serial Correl. and Heterosced.
Discussion of serial correlation-robust standard errors
The formulas are also robust to heteroscedasticity; they are therefore
called „heteroscedasticity and autocorrelation consistent“ (=HAC)
For the integer g, values such as g=2 or g=3 are normally sufficient
(there are more involved rules of thumb for how to choose g)
Serial correlation-robust standard errors are only valid asymptotically;
they may be severely biased if the sample size is not large enough
The bias is larger the more autocorrelation there is; if the series
are highly persistent, it may be a good idea to difference them first
Serial correlation-robust errors should be used if there is serial corr.
and strict exogeneity fails (e.g. in the presence of lagged dep. var.)
Analyzing Time Series:
Serial Correl. and Heterosced.
Heteroscedasticity in time series regressions
Heteroscedasticity usually receives less attention than serial correlation
Heteroscedasticity-robust standard errors also work for time series
Heteroscedasticity is automatically corrected for if one uses the serial
correlation-robust formulas for standard errors and test statistics
Testing for heteroscedasticity
The usual heteroscedasticity tests assume absence of serial correlation
Before testing for heteroscedasticity one should therefore test for serial
correlation first, using a heteroscedasticity-robust test if necessary
After ser. corr. has been corrected for, test for heteroscedasticity

Analyzing Time Series:
Serial Correl. and Heterosced.
Example: Serial correlation and homoscedasticity in the EMH

Test equation for the EMH: return_t = β0 + β1·return_{t−1} + u_t

Test for serial correlation (regression of the OLS residuals on their own lag): no evidence for serial correlation

Test for heteroscedasticity (regression of the squared OLS residuals on return_{t−1}): strong evidence for heteroscedasticity

Note: Volatility is higher if returns are low
Analyzing Time Series:
Serial Correl. and Heterosced.
Autoregressive Conditional Heteroscedasticity (ARCH)

Even if there is no heteroscedasticity in the usual sense (the error variance depends
on the explanatory variables), there may be heteroscedasticity in the sense that the
variance depends on how volatile the time series was in previous periods:

E(u_t² | u_{t−1}, u_{t−2}, …) = α0 + α1·u²_{t−1}   (ARCH(1) model)

Consequences of ARCH in static and distributed lag models


If there are no lagged dependent variables among the regressors, i.e.
in static or distributed lag models, OLS remains BLUE under TS.1-TS.5
Also, OLS is consistent etc. for this case under assumptions TS.1‘-TS.5‘
As explained, in this case, assumption TS.4 still holds under ARCH
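The point that ARCH errors can be unconditionally homoscedastic while their volatility clusters over time can be seen in a small simulation (parameter values are my own):

```python
import numpy as np

# ARCH(1): Var(u_t | past) = a0 + a1 * u_{t-1}^2
rng = np.random.default_rng(4)
a0, a1, n = 1.0, 0.5, 5000
u = np.zeros(n)
for t in range(1, n):
    u[t] = rng.normal() * np.sqrt(a0 + a1 * u[t-1]**2)

# the unconditional variance is constant: a0 / (1 - a1) = 2 here
uncond_var = u.var()

# but squared shocks are positively autocorrelated: volatility clustering
u2 = u**2
vol_clustering = np.corrcoef(u2[1:], u2[:-1])[0, 1]
```

The series itself is uncorrelated over time; only its squares are, which is why ARCH does not by itself invalidate OLS in static models.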

Analyzing Time Series:
Serial Correl. and Heterosced.
Consequences of ARCH in dynamic models
In dynamic models, i.e. models including lagged dependent variables,
the homoscedasticity assumption TS.4 will necessarily be violated:

Var(u_t | y_{t−1}, y_{t−2}, …) = α0 + α1·u²_{t−1}   because   u_{t−1} = y_{t−1} − β0 − β1·y_{t−2}

This means the error variance indirectly depends on explanatory variables


In this case, heteroscedasticity-robust standard errors and test statistics
should be computed, or a FGLS/WLS-procedure should be applied
Using a FGLS/WLS-procedure will also increase efficiency

Analyzing Time Series:
Serial Correl. and Heterosced.
Example: Testing for ARCH-effects in stock returns

Are there ARCH-effects in the errors of the EMH test equation?

Estimating equation for the ARCH(1) model: regress the squared OLS residuals û_t² on û²_{t−1}

There are statistically significant ARCH-effects:

If returns were particularly high or low (squared
returns were high) they tend to be particularly
high or low again, i.e. high volatility is followed
by high volatility.
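A hedged sketch of this test on simulated data (true α1 = 0.5 here; these are not the textbook's stock-return estimates):

```python
import numpy as np

# simulate a regression whose errors follow an ARCH(1) process
rng = np.random.default_rng(5)
n = 1000
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = rng.normal() * np.sqrt(0.5 + 0.5 * u[t-1]**2)
y = 1.0 + 0.5 * x + u

# OLS residuals, then the ARCH(1) test regression: u2_t on u2_{t-1}
X = np.column_stack([np.ones(n), x])
uhat = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
u2 = uhat**2
Z = np.column_stack([np.ones(n - 1), u2[:-1]])
gamma = np.linalg.lstsq(Z, u2[1:], rcond=None)[0]
e = u2[1:] - Z @ gamma
s2 = (e @ e) / (n - 1 - 2)
se_gamma1 = np.sqrt(s2 * np.linalg.inv(Z.T @ Z)[1, 1])
t_arch = gamma[1] / se_gamma1   # a large t-statistic indicates ARCH effects
```

The slope of this auxiliary regression estimates α1, and its t-statistic is the usual test for ARCH(1) effects.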

Analyzing Time Series:
Serial Correl. and Heterosced.
A FGLS procedure for serial correlation and heteroscedasticity

Given or estimated model for heteroscedasticity: Var(u_t | x_t) = σ²·h_t, i.e. u_t = √h_t · v_t

Model for serial correlation: v_t = ρ·v_{t−1} + e_t,  |ρ| < 1

Divide the regression equation by √h_t and estimate the transformed model by Cochrane-Orcutt or Prais-Winsten
techniques (because of serial correlation in the transformed error term v_t)
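A minimal numpy sketch of this two-step recipe, assuming h_t is known (in practice h_t would itself be estimated, e.g. by feasible GLS; the function name is my own):

```python
import numpy as np

def fgls_het_ar1(y, X, h):
    """FGLS sketch for Var(u_t) = sigma^2 * h_t with an AR(1) process in the
    standardized error. Step 1: divide through by sqrt(h_t).
    Step 2: one Prais-Winsten pass on the weighted data."""
    w = 1.0 / np.sqrt(h)
    yw, Xw = y * w, X * w[:, None]
    # residuals of weighted OLS approximate the AR(1) error v_t
    v = yw - Xw @ np.linalg.lstsq(Xw, yw, rcond=None)[0]
    rho = (v[:-1] @ v[1:]) / (v[:-1] @ v[:-1])
    ys, Xs = np.empty_like(yw), np.empty_like(Xw)
    ys[0], Xs[0] = np.sqrt(1 - rho**2) * yw[0], np.sqrt(1 - rho**2) * Xw[0]
    ys[1:] = yw[1:] - rho * yw[:-1]
    Xs[1:] = Xw[1:] - rho * Xw[:-1]
    return np.linalg.lstsq(Xs, ys, rcond=None)[0], rho
```

The weighting step removes the heteroscedasticity; the Prais-Winsten step then handles the remaining AR(1) serial correlation.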

Pooling Cross Sections across
Time: Simple Panel Data Methods

Chapter 13

Wooldridge: Introductory Econometrics:


A Modern Approach, 5e

Pooled Cross Sections and
Simple Panel Data Methods
Policy analysis with pooled cross sections
Two or more independently sampled cross sections can be used to
evaluate the impact of a certain event or policy change
Example: Effect of new garbage incinerator on housing prices
Examine the effect of the location of a house on its price before and
after the garbage incinerator was built:
Regression using data from after the incinerator was built: house price regressed on a dummy for being near the incinerator site

Regression using data from before the incinerator was built: the same regression on the earlier cross section

Pooled Cross Sections and
Simple Panel Data Methods
Example: Garbage incinerator and housing prices (cont.)
It would be wrong to conclude from the regression after the incinerator
is there that being near the incinerator depresses prices so strongly
One has to compare with the situation before the incinerator was built:

δ̂1 = (mean price near − mean price far)_after − (mean price near − mean price far)_before

In the given case, this is equivalent to the difference between the estimated coefficients on the location dummy after and before the incinerator was built

The incinerator depresses prices, but the location was one with lower prices anyway

This is the so called difference-in-differences estimator (DiD)


Pooled Cross Sections and
Simple Panel Data Methods
Difference-in-differences in a regression framework

y = β0 + δ0·after + β1·near + δ1·(after · near) + u

δ1: differential effect of being in the location and after the incinerator was built

In this way standard errors for the DiD-effect can be obtained


If houses sold before and after the incinerator was built were sys-
tematically different, further explanatory variables should be included
This will also reduce the error variance and thus standard errors
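A simulated sketch of this regression (the numbers are invented, not the textbook's incinerator estimates); it also verifies that the interaction coefficient equals the difference-in-differences of the four group means:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2000
near = rng.integers(0, 2, n)    # = 1 if near the incinerator site
after = rng.integers(0, 2, n)   # = 1 if sold after the incinerator was built
# true effects: location discount -10, common time trend +5, DiD effect -8
y = 100 - 10*near + 5*after - 8*near*after + rng.normal(0, 5, n)

# regression form of DiD: coefficient on the interaction term
X = np.column_stack([np.ones(n), after, near, after * near])
b = np.linalg.lstsq(X, y, rcond=None)[0]
did_ols = b[3]

# identical to the difference-in-differences of the four group means
m = lambda a, nr: y[(after == a) & (near == nr)].mean()
did_means = (m(1, 1) - m(1, 0)) - (m(0, 1) - m(0, 0))
```

The regression form is preferred in practice because it delivers standard errors and allows further controls.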

Before/After comparisons in „natural experiments“


DiD can be used to evaluate policy changes or other exogenous events

Pooled Cross Sections and
Simple Panel Data Methods
Policy evaluation using difference-in-differences

Compare outcomes of the two groups before and after the policy change:

δ̂1 = (ȳ_treatment,after − ȳ_treatment,before) − (ȳ_control,after − ȳ_control,before)

Compare the difference in outcomes of the units that are affected by the policy change (= treatment
group) and those who are not affected (= control group) before and after the policy was enacted.

For example, the level of unemployment benefits is cut but only for group A (= treatment group).
Group A normally has longer unemployment durations than group B (= control group). If the diffe-
rence in unemployment durations between group A and group B becomes smaller after the reform,
reducing unemployment benefits reduces unemployment duration for those affected.

Caution: Difference-in-differences only works if the difference in outcomes between the two groups
is not changed by other factors than the policy change (e.g. there must be no differential trends).

Pooled Cross Sections and
Simple Panel Data Methods
Two-period panel data analysis
Example: Effect of unemployment on city crime rate
Assume that no other explanatory variables are available. Will it be
possible to estimate the causal effect of unemployment on crime?
Yes, if cities are observed for at least two periods and other factors
affecting crime stay approximately constant over those periods:

crmrte_it = β0 + δ0·d2_t + β1·unem_it + a_i + u_it

d2_t: time dummy for the second period;  a_i: unobserved time-constant factors (= fixed effect);  u_it: other unobserved factors (= idiosyncratic error)

Pooled Cross Sections and
Simple Panel Data Methods
Example: Effect of unemployment on city crime rate (cont.)

Subtract the first-period equation from the second-period equation:

Δcrmrte_i = δ0 + β1·Δunem_i + Δu_i

Estimate differenced equation by OLS: Fixed effect drops out!

β̂1 = 2.22: a 1 percentage point increase in the unemployment rate leads to 2.22 more crimes per 1,000 people

δ̂0 > 0: secular increase in crime

Pooled Cross Sections and
Simple Panel Data Methods
Discussion of first-differenced panel estimator
Further explanatory variables may be included in original equation
Note that there may be arbitrary correlation between the unobserved
time-invariant characteristics and the included explanatory variables
OLS in the original equation would therefore be inconsistent
The first-differenced panel estimator is thus a way to consistently
estimate causal effects in the presence of time-invariant endogeneity
For consistency, strict exogeneity has to hold in the original equation
First-differenced estimates will be imprecise if the explanatory variables
vary only a little over time (no estimate is possible if they are time-invariant)
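A simulated two-period example illustrating these points: pooled OLS is biased when a_i is correlated with the regressor, while first differencing recovers the causal effect (all numbers are invented):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 1000
a = rng.normal(size=n)                      # fixed effect, correlated with x
x1 = a + rng.normal(size=n)                 # period 1 regressor
x2 = a + rng.normal(size=n) + 0.5           # period 2 regressor
y1 = 1 + 2*x1 + a + rng.normal(size=n)      # true slope = 2
y2 = 1 + 3 + 2*x2 + a + rng.normal(size=n)  # +3 = second-period effect

# pooled OLS with a period dummy ignores a_i and is biased here
Xp = np.column_stack([np.ones(2*n),
                      np.r_[np.zeros(n), np.ones(n)],
                      np.r_[x1, x2]])
b_pool = np.linalg.lstsq(Xp, np.r_[y1, y2], rcond=None)[0]

# first differencing removes a_i: dy_i = delta0 + beta1*dx_i + du_i
dy, dx = y2 - y1, x2 - x1
b_fd = np.linalg.lstsq(np.column_stack([np.ones(n), dx]), dy, rcond=None)[0]
```

Here the FD slope is close to the true value 2 and the FD intercept picks up the second-period effect, while the pooled OLS slope is pulled upward by the correlation between x and a_i.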

