
Chapter 13 Simple Regression


Linear Regression and Correlation
GOALS

⚫ Understand and interpret the terms dependent and independent variable.
⚫ Calculate and interpret:
→ the coefficient of correlation,
→ the coefficient of determination,
→ and the standard error of estimate.
⚫ Conduct a test of hypothesis.
⚫ Calculate the least squares regression line.

History of Regression

⚫ The term Regression was introduced by Francis Galton:

“Although there is a tendency for tall parents to have tall children, and for short parents to have short children, the distribution of height in a population does not change markedly from generation to generation.”

⚫ Regression = “A move backward or forward toward the average”

ILLUSTRATION
Definition of Regression

⚫ Regression analysis is a set of statistical methods used for the estimation of
relationships between a dependent variable and one or more independent
variables.
⚫ It can be utilized to assess the strength of the
relationship between variables and for
modeling the future relationship between
them.
Regression Analysis - Uses

Some examples.
⚫ Is there a relationship between the amount Healthtex
spends per month on advertising and its sales in
the month?
⚫ Can we base an estimate of the cost to heat a home
in January on the number of square feet in the
home?
⚫ Is there a relationship between the miles per gallon
achieved by large pickup trucks and the size of
the engine?
⚫ Is there a relationship between the number of hours
that students studied for an exam and the score
earned?

Correlation Analysis

⚫ Correlation Analysis is the study of the relationship between variables. It is also
defined as a group of techniques to measure the association between two variables.
⚫ A Scatter Diagram is a chart that portrays the relationship between the two variables. It
is the usual first step in correlation analysis.
– The Dependent Variable is the variable being predicted or estimated.
– The Independent Variable provides the basis for estimation. It is the predictor variable.

Fundamental difference between
correlation and regression

Correlation
⚫ Correlation only indicates that a relationship exists.
⚫ In correlation, the variables are not distinguished as dependent and independent.

Regression
⚫ Regression indicates a relationship of influence (the effect of one variable on another).
⚫ In regression, the variables are distinguished as dependent and independent.
Variable terms and notations in regression

Y
⚫ Dependent Variable
⚫ Explained Variable
⚫ Predictand
⚫ Regressand
⚫ Response

X
⚫ Independent Variable
⚫ Explanatory Variable
⚫ Predictor
⚫ Regressor
⚫ Stimulus or control variable
Regression Example

The sales manager of Copier Sales of America, which has a large
sales force throughout the United States and Canada,
wants to determine whether there is a relationship between
the number of sales calls made in a month and the number of
copiers sold that month. The manager selects a random
sample of 10 representatives and determines the number of
sales calls each representative made last month and the
number of copiers sold.

Scatter Diagram

The Coefficient of Correlation, r/R

The Coefficient of Correlation (r) is a measure of the strength of the relationship
between two variables. It requires interval or ratio-scaled data.
⚫ It can range from -1.00 to 1.00.
⚫ Values of -1.00 or 1.00 indicate perfect and strong
correlation.
⚫ Values close to 0.0 indicate weak correlation.
⚫ Negative values indicate an inverse relationship and
positive values indicate a direct relationship.

Perfect Correlation

Minitab Scatter Plots

Correlation Coefficient - Interpretation

Correlation Coefficient - Formula

Coefficient of Determination

The coefficient of determination ($r^2$) is the proportion of the total variation in the
dependent variable (Y) that is explained or accounted for by the variation in the
independent variable (X). It is the square of the coefficient of correlation.
⚫ It ranges from 0 to 1.
⚫ It does not give any information on the
direction of the relationship between the
variables.
Correlation Coefficient - Example

Using the Copier Sales of America data, for which a scatterplot was
developed earlier, compute the correlation coefficient and the
coefficient of determination.
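
A minimal sketch of this calculation in Python follows. The ten (sales calls, copiers sold) pairs are not listed in the text, so the values below are an assumption, chosen because they reproduce the r = 0.759 and r² = 0.576 quoted in this chapter. The function name `pearson_r` is likewise only illustrative.

```python
import math

# ASSUMED sample: (sales calls, copiers sold) for 10 representatives.
# These pairs are not given in the text; they are an assumption chosen
# because they reproduce the chapter's r = 0.759.
calls = [20, 40, 20, 30, 10, 10, 20, 20, 20, 30]
sold = [30, 60, 40, 60, 30, 40, 40, 50, 30, 70]


def pearson_r(x, y):
    """Sample correlation coefficient from deviations about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(calls, sold)
r_squared = r ** 2  # coefficient of determination
print(round(r, 3), round(r_squared, 3))
```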

Correlation Coefficient - Example

Correlation Coefficient – Excel Example

Correlation Coefficient - Example

How do we interpret a correlation of 0.759?


First, it is positive, so we see there is a direct relationship between
the number of sales calls and the number of copiers sold. The value
of 0.759 is fairly close to 1.00, so we conclude that the association
is strong.

However, does this mean that more sales calls cause more sales?
No, we have not demonstrated cause and effect here, only that the
two variables—sales calls and copiers sold—are related.
Coefficient of Determination ($r^2$) - Example

• The coefficient of determination, $r^2$, is 0.576, found by $(0.759)^2$.

• This is a proportion or a percent; we can say that
57.6 percent of the variation in the number of
copiers sold is explained, or accounted for, by the
variation in the number of sales calls.

Linear Regression Model

Computing the Slope of the Line

Computing the Y-Intercept

Regression Analysis

In regression analysis we use the independent variable (X) to estimate the
dependent variable (Y).
⚫ The relationship between the variables is linear.
⚫ Both variables must be at least interval scale.
⚫ The least squares criterion is used to determine the
equation.

Regression Analysis – Least Squares
Principle

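⚫ The least squares principle is used to obtain a and b.
⚫ The equations to determine a and b are:

$$b = \frac{n\left(\sum XY\right) - \left(\sum X\right)\left(\sum Y\right)}{n\left(\sum X^2\right) - \left(\sum X\right)^2}$$

$$a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n}$$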
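
The two formulas can be sketched in Python as below. The function name `least_squares` is illustrative. The ten data pairs are an assumption: they are not listed in the text and were chosen because they reproduce the slope 1.1842 reported for the Copier Sales of America example.

```python
def least_squares(x, y):
    """Return (a, b) for the line Y-hat = a + bX using the sums formulas."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Slope: b = [n(ΣXY) - (ΣX)(ΣY)] / [n(ΣX²) - (ΣX)²]
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: a = ΣY/n - b ΣX/n
    a = sum_y / n - b * sum_x / n
    return a, b


# ASSUMED sample (not given in the text): sales calls and copiers sold.
calls = [20, 40, 20, 30, 10, 10, 20, 20, 20, 30]
sold = [30, 60, 40, 60, 30, 40, 40, 50, 30, 70]

a, b = least_squares(calls, sold)
print(round(a, 4), round(b, 4))
```

With unrounded arithmetic the intercept comes out near 18.9474; the 18.9476 used in the chapter's example is consistent with rounding b to 1.1842 before computing a.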

Illustration of the Least Squares
Regression Principle

Regression Equation - Example

Recall the example involving Copier Sales of America. The
sales manager gathered information on the number of
sales calls made and the number of copiers sold for a
random sample of 10 sales representatives. Use the least
squares method to determine a linear equation to express the
relationship between the two variables.
What is the expected number of copiers sold by a representative
who made 20 calls?

Finding the Regression Equation - Example

The regression equation is:

$$\hat{Y} = a + bX$$

$$\hat{Y} = 18.9476 + 1.1842X$$

$$\hat{Y} = 18.9476 + 1.1842(20) = 42.6316$$
Computing the Estimates of Y

Step 1 – Using the regression equation, substitute the value of each X to solve
for the estimated sales.

Tom Keller (X = 20):

$$\hat{Y} = 18.9476 + 1.1842(20) = 42.6316$$

Soni Jones (X = 30):

$$\hat{Y} = 18.9476 + 1.1842(30) = 54.4736$$
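
The substitution step can be sketched in a few lines of Python. The coefficients and the two call counts come from the example; the function name `estimate_sales` is only illustrative.

```python
# Coefficients of the fitted line Y-hat = a + bX, as given in the example.
A = 18.9476
B = 1.1842


def estimate_sales(calls):
    """Estimated number of copiers sold for a given number of sales calls."""
    return A + B * calls


# Representatives and call counts named in the example.
for name, calls in [("Tom Keller", 20), ("Soni Jones", 30)]:
    print(name, round(estimate_sales(calls), 4))
```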
Plotting the Estimated and the Actual Y’s

The Standard Error of Estimate

⚫ The standard error of estimate measures the scatter, or dispersion, of the
observed values around the line of regression.
⚫ The formulas that are used to compute the standard error:

$$s_{y \cdot x} = \sqrt{\frac{\sum Y^2 - a\sum Y - b\sum XY}{n - 2}}$$

$$s_{y \cdot x} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}}$$

Standard Error of the Estimate - Example

Recall the example involving Copier Sales of America.
The sales manager determined the least squares regression
equation given below. Determine the standard error
of estimate as a measure of how well the values fit
the regression line.

$$\hat{Y} = 18.9476 + 1.1842X$$

$$s_{y \cdot x} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - 2}} = \sqrt{\frac{784.211}{10 - 2}} = 9.901$$
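
As a quick check of this arithmetic, here is a short Python sketch. It takes the sum of squared residuals (784.211) and the sample size (10) from the example; the function name `standard_error_of_estimate` is only illustrative.

```python
import math


def standard_error_of_estimate(sse, n):
    """s_y.x = sqrt(SSE / (n - 2)), where SSE is the sum of (Y - Y-hat)^2."""
    return math.sqrt(sse / (n - 2))


# Values taken from the example: SSE = 784.211, n = 10 representatives.
s_yx = standard_error_of_estimate(784.211, 10)
print(round(s_yx, 3))
```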

Graphical Illustration of the Differences between Actual Y and Estimated Y, $(Y - \hat{Y})$

Standard Error of the Estimate - Excel

Testing the Significance of
the Correlation Coefficient

$H_0: \rho = 0$ (the correlation in the population is 0)
$H_1: \rho \neq 0$ (the correlation in the population is not 0)
Reject $H_0$ if:
$t > t_{\alpha/2,\,n-2}$ or $t < -t_{\alpha/2,\,n-2}$

Testing the Significance of
the Correlation Coefficient - Example

$H_0: \rho = 0$ (the correlation in the population is 0)
$H_1: \rho \neq 0$ (the correlation in the population is not 0)
Reject $H_0$ if:
$t > t_{\alpha/2,\,n-2}$ or $t < -t_{\alpha/2,\,n-2}$
$t > t_{0.025,\,8}$ or $t < -t_{0.025,\,8}$
$t > 2.306$ or $t < -2.306$

Computed t = 3.40
Computed t = -5.1
Computed t = 1.23
Testing the Significance of
the Correlation Coefficient - Example

The computed t (3.297) is within the rejection region; therefore, we reject H0. This means
the correlation in the population is not zero. From a practical standpoint, it indicates to the
sales manager that there is a correlation between the number of sales calls made
and the number of copiers sold in the population of salespeople.
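
The computed t of 3.297 can be reproduced with a short Python sketch. It assumes the usual test statistic for a correlation coefficient, t = r√(n − 2) / √(1 − r²), which is not spelled out in the text above. The critical value 2.306 is taken from the example rather than computed, and the function name `correlation_t` is only illustrative.

```python
import math


def correlation_t(r, n):
    """t statistic for H0: rho = 0, assuming t = r*sqrt(n-2)/sqrt(1-r^2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Values from the example: r = 0.759, n = 10, two-tailed critical t = 2.306.
t_computed = correlation_t(0.759, 10)
t_critical = 2.306
reject_h0 = abs(t_computed) > t_critical  # two-tailed decision rule
print(round(t_computed, 3), reject_h0)
```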
Minitab

Two-tailed; right-tailed (one-tailed); left-tailed (one-tailed)

Hypothesis Formulation
$H_0: \beta_i = 0$ ($X_i$ individually has no significant effect on Y)
$H_a$ ($H_1$): $\beta_i > 0$ ($X_i$ individually has a significant positive effect on Y) → right-tailed test

$H_0: \beta_i = 0$ ($X_i$ individually has no significant effect on Y)
$H_a: \beta_i < 0$ ($X_i$ individually has a significant negative effect on Y) → left-tailed test

$H_0: \beta_i = 0$ ($X_i$ individually has no significant effect on Y)
$H_a: \beta_i \neq 0$ ($X_i$ individually has a significant effect on Y) → two-tailed test

End of Chapter

Assumptions Underlying Linear
Regression
⚫ For each value of X, there is a group of Y values, and these Y values are
normally distributed. The means of these normal distributions of Y values all
lie on the straight line of regression.
⚫ The standard deviations of these normal distributions are equal.
⚫ The Y values are statistically independent. This means that in
the selection of a sample, the Y values chosen for a particular X
value do not depend on the Y values for any other X values.
