
Asynchronous Learning Module - Session 8


Individual Reading (90 minutes)

Part 14.1 Covariance and the Correlation Coefficient


We often need to examine the relationship between two variables. Several methods and
measures are available for this purpose:
1. Scatterplot
2. Sample covariance
3. Correlation Coefficient
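
As a minimal sketch of the second and third measures, the Python snippet below computes the sample covariance and the correlation coefficient using their standard textbook definitions. The five (x, y) pairs are hypothetical values chosen only for illustration; they are not data from this module.

```python
from math import sqrt

def sample_covariance(x, y):
    """Sample covariance: sum of cross-deviations from the means, divided by n - 1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Correlation coefficient: covariance divided by the product of the standard deviations."""
    s_x = sqrt(sample_covariance(x, x))  # sample standard deviation of x
    s_y = sqrt(sample_covariance(y, y))  # sample standard deviation of y
    return sample_covariance(x, y) / (s_x * s_y)

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

s_xy = sample_covariance(x, y)
r_xy = correlation(x, y)
print(f"sample covariance = {s_xy:.4f}")
print(f"correlation coefficient = {r_xy:.4f}")
```

The covariance depends on the units of x and y, whereas the correlation coefficient is unit-free and always lies between -1 and 1.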

Assignment 1
A. Define each of the three measures mentioned above, and explain how each is used to
describe the relationship between two variables.
B. What are the limitations of correlation analysis?
C. What is the difference between regression analysis and the correlation coefficient?

Part 14.2 The Simple Linear Regression Model

A. Regression Model:

A regression model is a mathematical model that captures the relationship between the
response variable y and the k explanatory variables x1, x2, . . . , xk.
In order to develop a linear regression model, we start with a deterministic component that
approximates the relationship we want to model, and then add a random term to it, making
the relationship inexact.

Assignment 2
Define the terms below.
a. Explanatory variable
b. Response variable
c. Deterministic
d. Inexact

1. The Simple Linear Regression Model

The simple linear regression model is defined as

y = β0 + β1x + ε,

where y and x are the response variable and the explanatory variable, respectively,
and ε is the random error term.
The coefficients β0 and β1 are the unknown parameters to be estimated. We use sample
data to estimate the population parameters of interest (β0 and β1).

Let b0 and b1 represent the estimates of β0 and β1, respectively. We form the sample
regression equation as
𝑦̂ = b0 + b1x,
where 𝑦̂ (read as y-hat) is the predicted value of the response variable given a specified
value of the explanatory variable x.
We refer to the difference between the observed and the predicted values of y, that is,
y – 𝑦̂, as the residual e.

A fundamental assumption underlying the simple linear regression model is that the
expected value of y lies on a straight line, denoted by β0 + β1x, where β0 and β1 are the
unknown intercept and slope parameters, respectively.

The expression β0 + β1x is the deterministic component of the simple linear regression
model, which can be thought of as the expected value of y for a given value of x. In other
words, conditional on x, E(y) = β0 + β1x.
The slope parameter β1 determines whether the linear relationship between x and E(y) is
positive (β1 > 0) or negative (β1 < 0); β1 = 0 indicates that there is no linear relationship.

(Taken from FIGURE 14.2) Various examples of a simple linear regression model

The figure above shows the expected value of y for various values of the intercept β0 and
the slope β1.
The actual value y may differ from the expected value E(y). Therefore, we add a random
error term ε to develop a simple linear regression model.
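
To illustrate how the random error term makes the relationship inexact, the sketch below simulates a few observations from y = β0 + β1x + ε. The parameter values, the x values, and the choice of a normally distributed error with standard deviation `sigma` are all assumptions made for this example; the module does not specify them.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population parameters (assumed for illustration only)
beta0, beta1 = 2.0, 0.5
sigma = 1.0  # assumed standard deviation of the random error term

def expected_y(x):
    """Deterministic component: E(y) = beta0 + beta1 * x."""
    return beta0 + beta1 * x

def observed_y(x):
    """Actual value: deterministic component plus a random error draw."""
    return expected_y(x) + random.gauss(0.0, sigma)

xs = [1, 2, 3, 4, 5]
rows = [(x, expected_y(x), observed_y(x)) for x in xs]

for x, ey, y in rows:
    print(f"x={x}  E(y)={ey:.2f}  y={y:.2f}  epsilon={y - ey:+.2f}")
```

Each printed epsilon is the gap between the actual value y and the expected value E(y) for that observation.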

Assignment 3
1.1 Using Excel to Construct a Scatterplot and a Trendline

A. Copy the table below (from Case I) into Excel in order to create a scatterplot of
weight against consumption.
B. To create the scatterplot: select the data for weight and consumption, then choose
Insert > Scatter from the menu. Select the graph on the top left.
C. Based on the scatter plot, what can you conclude about the correlation between weight
and consumption?

CASE I
A dog trainer is exploring the relationship between the size of the dog (weight in pounds)
and its daily food consumption (measured in standard cups). Below is the result of a
sample of 18 observations.

1.2 Ordinary Least Squares

A common approach to fitting a line to the scatterplot is the method of least squares, also
referred to as ordinary least squares (OLS). In other words, we use OLS to estimate the
parameters β0 and β1. OLS estimators have many desirable properties if certain
assumptions hold (these assumptions are discussed further in econometrics).

The OLS method chooses the line whereby the sum of squares due to error, SSE, also
referred to as the error sum of squares, is minimized, where SSE = Σ(yi – ŷi)² = Σei². SSE is
the sum of the squared differences between the observed values y and their predicted
values 𝑦̂, or equivalently, the sum of the squared distances from the regression equation.
Thus, using this distance measure, we say that the OLS method produces the straight line
that is “closest” to the data. In the context of Figure 14.3, the superimposed line has been
estimated by OLS.
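Using calculus, equations have been developed for b0 and b1 that satisfy the OLS criterion:

b1 = Σ(xi – x̄)(yi – ȳ) / Σ(xi – x̄)²
b0 = ȳ – b1x̄

where x̄ and ȳ are the sample means of x and y.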
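
The sketch below applies the standard closed-form OLS solutions for a single explanatory variable: the slope is the sum of cross-deviations divided by the sum of squared deviations of x, and the intercept is the mean of y minus the slope times the mean of x. Because the Case I table is not reproduced here, the data are hypothetical and the function names `ols_simple` and `sse` are chosen for this illustration.

```python
def ols_simple(x, y):
    """Return (b0, b1) that minimize SSE for the line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx          # slope estimate
    b0 = y_bar - b1 * x_bar   # intercept estimate
    return b0, b1

def sse(x, y, b0, b1):
    """Sum of squared residuals e_i = y_i - (b0 + b1 * x_i)."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data, for illustration only (not the Case I sample)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = ols_simple(x, y)
print(f"y-hat = {b0:.2f} + {b1:.2f} x")
print(f"SSE at the OLS line: {sse(x, y, b0, b1):.2f}")
print(f"SSE at a nearby line: {sse(x, y, b0 + 0.1, b1):.2f}")
```

Any other line, such as the nearby one in the last print statement, gives a larger SSE than the OLS line, which is what "closest to the data" means here.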

1.3 Interpreting the estimated regression coefficients


It is not always possible to provide an economic interpretation of the intercept estimate b0;
mathematically, however, it represents the predicted value 𝑦̂ when x has a value of zero.
The slope estimate b1 represents the change in 𝑦̂ when x increases by one unit.

1.4 Exercise (Assignment 4)


Problem 1
Using Case I, (a) use the formulas to calculate b0 and b1, and determine the sample
regression equation.
(b) Predict the weight of a dog whose daily food consumption is 7 cups.
Problem 2
Use Excel to estimate the sample regression equation with weight as the response variable
and consumption as the explanatory variable.
i. Use your Excel table from Case I.
ii. Choose Data > Data Analysis > Regression from the menu.
iii. See Figure below.

In the Regression dialog box, click on the box next to Input Y Range, then select the Weight
data, including its heading. For Input X Range, select the consumption data, including its
heading. Check Labels, since we are using Weight and Consumption as headings.
iv. Click OK.

Note: The result you obtain from Excel should match the result of your manual calculation
using the formulas.

Problem 3
The director of graduate admissions at a large university is analyzing the relationship
between scores on the math portion of the Graduate Record Examination (GRE) and
subsequent performance in graduate school, as measured by a student’s grade point
average (GPA). She uses a sample of 8 students who graduated within the past five years.
The data are as follows:
a. Construct a scatterplot placing GRE on the horizontal axis.
b. Find the sample regression equation for the model: GPA = β0 + β1GRE + ε.
c. What is a student’s predicted GPA if he/she scored 710 on the math portion of the
GRE?

Part 14.3 The Multiple Linear Regression Model


The simple linear regression model allows us to analyze the linear relationship between
one explanatory variable and the response variable. In the multiple linear regression
model, by contrast, the response variable is influenced by two or more explanatory variables.
The choices of the explanatory variables are based on economic theory, intuition, and/or
prior research.

The multiple linear regression model is a straightforward extension of the simple linear
regression model.

The difference between the observed and the predicted values of y represents the
residual e; that is, e = y – 𝑦̂

As in the case of the simple linear regression model, we apply the OLS method that
minimizes SSE, where SSE = Σ(yi – ŷi)² = Σei².

For each explanatory variable xj (j = 1, . . . , k), the corresponding slope coefficient bj is the
estimate of βj.
Each slope coefficient bj is interpreted as the change in the predicted value of the response
variable 𝑦̂ given a unit increase in the associated explanatory variable xj, holding all other
explanatory variables constant. In other words, it represents the partial influence of xj on 𝑦̂.

CASE II
Salsberry Realty sells homes along the east coast of the United States. One of the questions
most frequently asked by prospective buyers is: If we purchase this home, how much can
we expect to pay to heat it during the winter? The research department at Salsberry has
been asked to develop some guidelines regarding heating costs for single family homes.
Three variables are thought to relate to the heating costs: (1) the mean daily outside
temperature, (2) the number of inches of insulation in the attic, and (3) the age in years of
the furnace. To investigate, Salsberry’s research department selected a random sample of
20 recently sold homes. It determined the cost to heat each home last January, as well as
the January outside temperature in the region, the number of inches of insulation in the
attic, and the age of the furnace. The sample information is reported in the table below

Home   Heating    Mean Outside       Attic Insulation   Age of Furnace
       Cost ($)   Temperature (°F)   (inches)           (years)
1      250        35                 3                  6
2      360        29                 4                  10
3      165        36                 7                  3
4      43         60                 6                  9
5      92         65                 5                  6
6      200        30                 5                  5
7      355        10                 6                  7
8      290        7                  10                 10
9      230        21                 9                  11
10     120        55                 2                  5
11     73         54                 12                 4
12     205        48                 5                  1
13     400        20                 5                  15
14     320        39                 4                  7
15     72         60                 8                  6
16     272        20                 5                  8
17     94         58                 7                  3
18     190        40                 8                  11
19     235        27                 9                  8
20     139        30                 7                  5

The dependent variable is the January heating cost. It is represented by Y. There are
three independent variables:
• The mean outside temperature in January, represented by X1.
• The number of inches of insulation in the attic, represented by X2.
• The age in years of the furnace, represented by X3.
Given these definitions, the general form of the multiple regression equation follows.
The value 𝑌̂ is used to estimate the value of Y.

𝑌̂ = b0 + b1X1 + b2X2 + b3X3

Now that we have defined the regression equation, we are ready to use Excel to compute all
the statistics needed for the analysis.
We follow similar steps as we did when we estimated the simple linear regression model.
i. Enter the table from Case II into Excel.
ii. Choose Data > Data Analysis > Regression from the menu.
iii. In the Regression dialog box, click on the box next to Input Y Range, then select the
data for Heating Cost. For Input X Range, simultaneously select the data for Mean
Outside Temperature, Attic Insulation, and Age of Furnace. Select Labels, since we are
using Mean Outside Temperature, Attic Insulation, Age of Furnace, and Heating Cost
as headings.
iv. Click OK.

The Excel output is shown below.


SUMMARY OUTPUT

Regression Statistics
Multiple R           0.897
R Square             0.804
Adjusted R Square    0.767
Standard Error       51.049
Observations         20

ANOVA
             df   SS           MS          F       Significance F
Regression   3    171,220.47   57,073.49   21.90   6.56E-06
Residual     16   41,695.28    2,605.95
Total        19   212,915.75

                                Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept                       427.19         59.60            7.17     0.00      300.84      553.54      300.84        553.54
Mean Outside Temperature (°F)   -4.58          0.77             -5.93    0.00      -6.22       -2.95       -6.22         -2.95
Attic Insulation (inches)       -14.83         4.75             -3.12    0.01      -24.91      -4.75       -24.91        -4.75
Age of Furnace (years)          6.10           4.01             1.52     0.15      -2.40       14.61       -2.40         14.61

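The Regression Statistics block of the output above can be reproduced from the ANOVA table. The sketch below uses the standard definitions (R Square = SSR/SST, Standard Error = square root of the residual mean square, F = regression mean square divided by residual mean square); the variable names are chosen for this illustration, and the sums of squares are copied from the output.

```python
from math import sqrt

n, k = 20, 3        # observations and explanatory variables (from the output)
ssr = 171220.47     # regression sum of squares
sse = 41695.28      # residual (error) sum of squares
sst = 212915.75     # total sum of squares

r_square = ssr / sst
multiple_r = sqrt(r_square)
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)
mse = sse / (n - k - 1)          # residual mean square
standard_error = sqrt(mse)
f_stat = (ssr / k) / mse

print(f"Multiple R        = {multiple_r:.3f}")
print(f"R Square          = {r_square:.3f}")
print(f"Adjusted R Square = {adjusted_r_square:.3f}")
print(f"Standard Error    = {standard_error:.3f}")
print(f"F                 = {f_stat:.2f}")
```

The printed values agree with the output to the number of decimals shown there.
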
Based on the Coefficients column of the output, the sample regression equation is

Predicted Heating Cost = 427.19 – 4.58 Mean Outside Temperature – 14.83 Attic Insulation
+ 6.10 Age of Furnace
Or, equivalently,

𝑌̂ = 427.19 – 4.58 X1 – 14.83 X2 + 6.10 X3
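
For readers who want to cross-check the estimates outside Excel, the sketch below solves the OLS normal equations (X'X)b = X'y in plain Python using the Case II sample. The helper names `solve_linear_system` and `ols_multiple` are chosen for this illustration and are not part of any library. If the table has been transcribed faithfully, the printed estimates should be close to the Coefficients column of the Excel output.

```python
def solve_linear_system(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - s) / m[i][i]
    return z

def ols_multiple(x_rows, y):
    """Return [b0, b1, ..., bk] from the normal equations (X'X) b = X'y."""
    design = [[1.0] + list(row) for row in x_rows]  # leading 1 for the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Case II sample: (heating cost, mean outside temperature, attic insulation, furnace age)
homes = [
    (250, 35, 3, 6), (360, 29, 4, 10), (165, 36, 7, 3), (43, 60, 6, 9),
    (92, 65, 5, 6), (200, 30, 5, 5), (355, 10, 6, 7), (290, 7, 10, 10),
    (230, 21, 9, 11), (120, 55, 2, 5), (73, 54, 12, 4), (205, 48, 5, 1),
    (400, 20, 5, 15), (320, 39, 4, 7), (72, 60, 8, 6), (272, 20, 5, 8),
    (94, 58, 7, 3), (190, 40, 8, 11), (235, 27, 9, 8), (139, 30, 7, 5),
]
y = [h[0] for h in homes]
x_rows = [h[1:] for h in homes]

coefficients = ols_multiple(x_rows, y)
fitted = [coefficients[0] + sum(b * v for b, v in zip(coefficients[1:], row))
          for row in x_rows]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

print("b0, b1, b2, b3 =", [round(b, 2) for b in coefficients])
print("SSE =", round(sum(e ** 2 for e in residuals), 2))
```

Solving the normal equations directly is adequate for a small, well-behaved sample like this one; statistical software typically uses more numerically stable methods.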

Interpretation of the estimated coefficients:

b1 (–4.58): a 1 degree Fahrenheit increase in the mean outside temperature decreases the
predicted heating cost by 4.58 dollars, holding attic insulation and furnace age constant.
b2 (–14.83): a 1 inch increase in attic insulation decreases the predicted heating cost by
14.83 dollars, holding outside temperature and furnace age constant.
b3 (6.10): a 1 year increase in the age of the furnace increases the predicted heating cost
by 6.10 dollars, holding outside temperature and attic insulation constant.
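
Beyond interpreting the slopes, the sample regression equation can be used for prediction. The sketch below evaluates it for a hypothetical home (30 °F mean outside temperature, 5 inches of insulation, a 10-year-old furnace); these input values are assumptions for illustration and are not part of the Case II sample. It then adds one inch of insulation while holding the other inputs fixed.

```python
# Estimated coefficients from the Excel output above
b0, b1, b2, b3 = 427.19, -4.58, -14.83, 6.10

def predict_heating_cost(temperature_f, insulation_in, furnace_age_yr):
    """Predicted January heating cost in dollars from the sample regression equation."""
    return b0 + b1 * temperature_f + b2 * insulation_in + b3 * furnace_age_yr

# Hypothetical home, for illustration only
baseline = predict_heating_cost(30, 5, 10)

# Same home with one more inch of attic insulation, everything else unchanged
thicker_insulation = predict_heating_cost(30, 6, 10)

print(f"Predicted cost: ${baseline:.2f}")
print(f"Change from one extra inch of insulation: {thicker_insulation - baseline:+.2f}")
```

The change in the prediction equals b2, which is what "holding all other explanatory variables constant" means in practice.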

Assignment 5
A sociologist believes that the crime rate in an area is significantly influenced by the area’s
poverty rate and median income. Specifically, she hypothesizes crime will increase with
poverty and decrease with income. She collects data on the crime rate (crimes per 100,000
residents), the poverty rate (in %), and the median income (in $1,000s) from 41 New
England cities. A portion of the regression results is shown in the following table.

i. Are the signs as expected on the slope coefficients?
ii. Interpret the slope coefficient for Poverty.
iii. Predict the crime rate in an area with a poverty rate of 20% and a median income
of $50,000.
