Scatter Plot/Diagram Simple Linear Regression Model
INTRODUCTORY
LINEAR REGRESSION
Chapter Outline
3.1 Simple Linear Regression
•Scatter Plot/Diagram
•Simple Linear Regression Model
3.2 Curve Fitting
3.3 Inferences About Estimated Parameters
3.4 Adequacy of the Model: Coefficient of
Determination
3.5 Pearson Product Moment Correlation
Coefficient
3.6 Test for Linearity of Regression
3.7 ANOVA Approach: Testing for Linearity of
Regression
INTRODUCTION TO LINEAR
REGRESSION
Simple (2 variables)
Multiple (more than 2 variables)
Many problems in science and engineering
involve exploring the relationship between two
or more variables.
Two statistical techniques:
(1) Regression Analysis
(2) Computing the Correlation Coefficient (r).
Linear regression is the study of the linear
relationship between two or more variables.
This is done by fitting a linear equation to the
observed data.
The fitted equation is then used to predict
values of the dependent variable.
In simple linear regression only two variables
are involved:
i. X is the independent variable.
variable).
b) Y is the weight (dependent variable).
$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X$
3.2 CURVE FITTING (SCATTER PLOT)
SCATTER PLOT
Scatter plots show the relationship between
two variables by displaying data points on
a two-dimensional graph.
The variable that might be considered as
an explanatory variable is plotted on the
x-axis, and the response variable is plotted
on the y-axis.
Scatter plots are especially useful when
there are a large number of data points.
They provide the following information about
the relationship between two variables:
(1) Strength
(2) Shape - linear, curved, etc.
(3) Direction - positive or negative
(4) Presence of outliers
EXAMPLES:
PLOTTING LINEAR REGRESSION MODEL
The given table contains values for 2 variables, X and Y. Plot
the given data and draw a freehand estimate of the regression line.
X -3 -2 -1 0 1 2 3
Y 1 2 3 5 8 11 12
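A freehand line can be checked against a computed one. The sketch below is only an illustration: it assumes the usual least-squares formulas (slope = S_xy / S_xx, intercept = mean of Y minus slope times mean of X), and the variable names are illustrative rather than taken from the example.

```python
# Data from the table above.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [1, 2, 3, 5, 8, 11, 12]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Corrected sums of squares and cross-products (deviation form).
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares slope and intercept.
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

print(f"fitted line: y = {intercept:.1f} + {slope:.1f}x")
# prints: fitted line: y = 6.0 + 2.0x
```

A freehand line that passes close to (0, 6) and rises by about 2 units of Y per unit of X is therefore a reasonable estimate for this data.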
3.3 INFERENCES ABOUT ESTIMATED PARAMETERS
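$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$

Before, x   65  63  76  46  68  72  68  57  36  96
After, y    68  66  86  48  65  66  71  57  42  87

$S_{xy} = 44435 - \frac{(647)(656)}{10} = 1991.8$
$S_{xx} = 44279 - \frac{647^2}{10} = 2418.1$
$S_{yy} = 44884 - \frac{656^2}{10} = 1850.4$

a) $\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{1991.8}{2418.1} = 0.8237$
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 65.6 - 0.8237(64.7) = 12.3063$
$\hat{Y} = 12.3063 + 0.8237X$
b) $X = 60$
$\hat{Y} = 12.3063 + 0.8237(60) = 61.7283$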
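The hand calculation above can be reproduced with a short script. This is a minimal sketch using the computational (sum-based) formulas from the example; the names `before`, `after` and `predict` are illustrative choices, not part of the original notes.

```python
# Scores from the example: x = before, y = after.
before = [65, 63, 76, 46, 68, 72, 68, 57, 36, 96]
after = [68, 66, 86, 48, 65, 66, 71, 57, 42, 87]
n = len(before)

# Raw sums used by the computational formulas.
sum_x, sum_y = sum(before), sum(after)
sum_xy = sum(x * y for x, y in zip(before, after))
sum_xx = sum(x * x for x in before)
sum_yy = sum(y * y for y in after)

# S_xy, S_xx, S_yy.
s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_xx - sum_x ** 2 / n
s_yy = sum_yy - sum_y ** 2 / n

# Estimated slope and intercept.
beta1 = s_xy / s_xx
beta0 = sum_y / n - beta1 * sum_x / n


def predict(x_new):
    """Predicted 'after' score for a given 'before' score."""
    return beta0 + beta1 * x_new


print(round(s_xy, 1), round(s_xx, 1), round(s_yy, 1))
print(round(beta1, 4), round(beta0, 4))
print(round(predict(60), 2))
```

Because the script keeps the slope unrounded, the prediction at X = 60 can differ from the hand value 61.7283 in the last decimal places.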
EXERCISE 3.1:
INCOME, x FOOD EXPENDITURE, y
55 14
83 24
38 13
61 16
33 9
49 15
67 17
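3.4 ADEQUACY OF THE MODEL: COEFFICIENT OF DETERMINATION
The coefficient of determination measures the proportion of the
variation in the dependent variable (Y) that is
explained by the regression line and the
independent variable (X).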
The symbol for the coefficient of determination is $r^2$
or $R^2$.
If $r = 0.90$, then $r^2 = 0.81$. This means that 81% of the
variation in the dependent variable (Y) is accounted
for by the variation in the independent variable (X).
The rest of the variation, 0.19 or 19%, is
unexplained; this proportion is called the coefficient of
nondetermination.
The formula for the coefficient of nondetermination
is $1.00 - r^2$.
Relationship Among SST, SSR, SSE
$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$
SST = SSR + SSE
where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error
The coefficient of determination is:
$r^2 = \frac{SSR}{SST} = \frac{S_{xy}^2}{S_{xx} S_{yy}}$
where:
SSR = sum of squares due to regression
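The decomposition of the total variation and the two expressions for the coefficient of determination can be checked numerically. The sketch below reuses the before/after scores from the earlier example and assumes a least-squares fit; the variable names are illustrative.

```python
x = [65, 63, 76, 46, 68, 72, 68, 57, 36, 96]  # before
y = [68, 66, 86, 48, 65, 66, 71, 57, 42, 87]  # after
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares line and fitted values.
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

# The three sums of squares.
sst = sum((yi - y_bar) ** 2 for yi in y)               # total
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # error

r2_from_ss = ssr / sst
r2_from_s = s_xy ** 2 / (s_xx * s_yy)

print(round(sst, 1), round(ssr + sse, 1))  # both sides of SST = SSR + SSE
print(round(r2_from_ss, 4), round(r2_from_s, 4))
print(round(1 - r2_from_ss, 4))  # coefficient of nondetermination
```

For this data roughly 89% of the variation in the after scores is explained by the before scores, and about 11% is left unexplained.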
3.5 PEARSON PRODUCT
MOMENT CORRELATION
COEFFICIENT (r)
Correlation measures the strength of a linear
relationship between the two variables.
Also known as Pearson’s product moment coefficient
of correlation.
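The symbol for the sample coefficient of correlation
is r.
Formula:
$r = \frac{S_{xy}}{\sqrt{S_{xx} \cdot S_{yy}}}$
or $r = (\text{sign of } b_1)\sqrt{r^2}$, where $b_1$ is the estimated slope $\hat{\beta}_1$.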
Properties of r:
$-1 \le r \le 1$
Values of r close to 1 imply a strong
positive linear relationship between x and y.
Values of r close to -1 imply a strong
negative linear relationship between x and y.
Values of r close to 0 imply little or no linear
relationship between x and y.
ASSUMPTIONS ABOUT THE ERROR
TERM $\varepsilon$
1. The error $\varepsilon$ is a random variable with mean of zero.
2. The variance of $\varepsilon$, denoted by $\sigma^2$, is the same for
all values of the independent variable.
3. The values of $\varepsilon$ are independent.
4. The error $\varepsilon$ is a normally distributed random
variable.
EXAMPLE 3.4: REFER TO THE PREVIOUS EXAMPLE
3.2, STUDENTS' SCORES IN HISTORY
SOLUTION:
$r = \frac{S_{xy}}{\sqrt{S_{xx} \cdot S_{yy}}} = \frac{1991.8}{\sqrt{(2418.1)(1850.4)}} = 0.9416$
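The same correlation coefficient can be computed from the raw scores. The sketch below is illustrative only; it evaluates r with the S_xy, S_xx, S_yy formula and again through the sign of the slope times the square root of r squared, using the standard library's `math.sqrt` and `math.copysign`.

```python
import math

x = [65, 63, 76, 46, 68, 72, 68, 57, 36, 96]  # before
y = [68, 66, 86, 48, 65, 66, 71, 57, 42, 87]  # after
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Direct formula: r = S_xy / sqrt(S_xx * S_yy).
r = s_xy / math.sqrt(s_xx * s_yy)

# Alternative route: (sign of the slope) * sqrt(r^2).
b1 = s_xy / s_xx
r_squared = s_xy ** 2 / (s_xx * s_yy)
r_alt = math.copysign(math.sqrt(r_squared), b1)

print(round(r, 4))      # 0.9416
print(round(r_alt, 4))  # 0.9416
```

A value this close to 1 indicates a strong positive linear relationship between the before and after scores.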
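3.6 TEST FOR LINEARITY OF REGRESSION
(i) t-Test
(ii) F-Test
(i) t-Test
1. Determine the hypotheses.
$H_0: \beta_1 = 0$ (no linear relationship)
$H_1: \beta_1 \neq 0$ (a linear relationship exists)
2. Specify the level of significance $\alpha$ and find the critical value
$t_{\alpha/2,\,n-2}$ (or use the p-value).
3. Compute the test statistic $t_{test} = \frac{\hat{\beta}_1}{\sqrt{Var(\hat{\beta}_1)}}$.
4. Determine the rejection region. Reject $H_0$ if:
$t > t_{\alpha/2,\,n-2}$ or $t < -t_{\alpha/2,\,n-2}$, or
p-value < $\alpha$
5. Conclusion.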
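SOLUTION:
1) $H_0: \beta_1 = 0$ (no linear relationship)
$H_1: \beta_1 \neq 0$ (a linear relationship exists)
2) $\alpha = 0.05$
$t_{0.05/2,\,8} = t_{0.025,\,8} = 2.306$
3) $t_{test} = \frac{\hat{\beta}_1}{\sqrt{Var(\hat{\beta}_1)}}$, where $Var(\hat{\beta}_1) = \left(\frac{S_{yy} - \hat{\beta}_1 S_{xy}}{n-2}\right)\frac{1}{S_{xx}}$
$Var(\hat{\beta}_1) = \left(\frac{1850.4 - (0.8237)(1991.8)}{8}\right)\frac{1}{2418.1} = 0.0108$
$t_{test} = \frac{0.8237}{\sqrt{0.0108}} = 7.926$
4) Rejection Rule:
$t_{test} > t_{0.025,\,8}$
$7.926 > 2.306$
5) Conclusion:
Thus, we reject $H_0$. The score before the trip (x) has a linear relationship
with the score after the trip (y).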
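The t-test steps can be sketched in code from the summary quantities of the example. This is an illustration under stated assumptions: the critical value 2.306 is taken from the solution above rather than computed, and the variable names are hypothetical.

```python
import math

# Summary quantities from the worked example.
n = 10
s_xy, s_xx, s_yy = 1991.8, 2418.1, 1850.4
t_critical = 2.306  # t_{0.025, 8}, read from a t table as in the solution

# Estimated slope and its estimated variance.
beta1 = s_xy / s_xx
var_beta1 = ((s_yy - beta1 * s_xy) / (n - 2)) / s_xx

# Test statistic for H0: beta1 = 0.
t_stat = beta1 / math.sqrt(var_beta1)

# Two-sided rejection rule.
reject_h0 = abs(t_stat) > t_critical

print(round(var_beta1, 4), round(t_stat, 2), reject_h0)
```

The script gives a statistic of about 7.91 instead of 7.926 because it does not round the variance to 0.0108 before taking the square root; the conclusion, rejecting the null hypothesis, is the same.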
EXERCISE 3.4:
EXERCISE 3.5:
(ii) F-Test
1. Determine the hypotheses.
$H_0: \beta_1 = 0$ (no linear relationship)
$H_1: \beta_1 \neq 0$ (a linear relationship exists)
2. Specify the level of significance $\alpha$ and find the critical value
$F_{\alpha,\,1,\,n-2}$ (or use the p-value).
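The notes list only the first two steps of the F-test. As a hedged illustration, the sketch below assumes the standard ANOVA statistic F = MSR / MSE with 1 and n - 2 degrees of freedom and applies it to the before/after example; the critical value 5.32 for $F_{0.05,\,1,\,8}$ is an assumption taken from a standard F table, and the names are illustrative.

```python
# Summary quantities from the before/after example.
n = 10
s_xy, s_xx, s_yy = 1991.8, 2418.1, 1850.4
f_critical = 5.32  # assumed table value for F_{0.05, 1, 8}

beta1 = s_xy / s_xx

# Sums of squares.
sst = s_yy
ssr = beta1 * s_xy   # regression
sse = sst - ssr      # error

# Mean squares: regression has 1 degree of freedom, error has n - 2.
msr = ssr / 1
mse = sse / (n - 2)

f_stat = msr / mse
reject_h0 = f_stat > f_critical

print(round(ssr, 2), round(sse, 2), round(f_stat, 2), reject_h0)
```

In simple linear regression the F statistic equals the square of the t statistic for the slope, so both tests lead to the same decision.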
pH Yield
4.6 1056
4.8 1833
5.2 1629
5.4 1852
5.6 1783
5.8 2647
6.0 2131
a) Construct a scatter plot of yield (y) versus pH (x). Verify
that a linear model is appropriate.
b) Compute the estimated regression line for predicting
Yield from pH.
c) If the pH is increased by 0.1, by how much would you
predict the yield to increase or decrease?
d) For what pH would you predict a yield of 1500 pounds
per acre?
e) Calculate the correlation coefficient, and interpret the
result.
Answers:
c) The predicted yield $\hat{y}$ increases by 73.71.
d) pH = 4.872
EXERCISE 3.7
A regression analysis relating the current market value in dollars to
the size in square feet of homes in Greeny County, Tennessee,
follows. A portion of the regression software output is shown below:
Analysis of Variance
Source       DF   SS      MS      F      P
Regression   1    10354   10354   15.46  0.001
Error        18   12054   670
Total        19   22408
a) Determine how many homes are in the sample.
b)Determine the regression equation.
0.05