Lecture - Correlation and Regression GEG 222
STATISTICS
Dr K. O. Orolu
Covariance and Correlation
Variance
Variance is a measure of the dispersion of a univariate distribution.
Additional statistics are required to describe the joint distribution of two or more variables.
[Figure: scatter plots showing a positive relationship, a negative relationship (Reliability vs. Age of Car) and no relationship between two variables]
Variance vs Covariance
Do two variables change together?
Variance:
• Gives information on variability of a single
variable.
Covariance:
• Gives information on the degree to which
two variables vary together.
• Note how similar covariance is to variance: the covariance equation multiplies x’s error scores (deviations from the mean) by y’s error scores, whereas the variance equation squares x’s error scores.
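The comparison above can be written out as a pair of formulas. This assumes the sample form with divisor n − 1, which is not stated on the slide; some texts divide by n instead. Writing the variance as a product of x’s deviation with itself makes the parallel explicit:

$$ s_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})}{n-1} \qquad \operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1} $$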
Covariance
x     y     xi - x̄    yi - ȳ    (xi - x̄)(yi - ȳ)
0     3      -3         0             0
2     2      -1        -1             1
3     4       0         1             0
4     0       1        -3            -3
6     6       3         3             9
x̄ = 3   ȳ = 3                     Σ = 7
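The table can be reproduced with a short script that computes the deviations, their cross-products and the covariance. This is a sketch: it assumes the sample divisor n − 1, because the slide shows only the sum of cross-products (Σ = 7).

```python
# Five (x, y) pairs from the covariance table
x = [0, 2, 3, 4, 6]
y = [3, 2, 4, 0, 6]

n = len(x)
x_bar = sum(x) / n   # 3.0
y_bar = sum(y) / n   # 3.0

# Cross-products of deviations: (x_i - x_bar) * (y_i - y_bar)
cross_products = [(xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)]
sum_cross = sum(cross_products)   # 7.0

# Sample covariance (assumed n - 1 divisor)
cov_xy = sum_cross / (n - 1)      # 1.75
print(cross_products, sum_cross, cov_xy)
```

If the population form (divisor n) is preferred, the result would be 7 / 5 = 1.4 instead.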
If r = 1, the correlation is perfect and positive; if r = -1, it is perfect and negative.
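How to compute the simple correlation coefficient (r)
r = (degree to which X and Y vary together) / (degree to which X and Y vary separately)
$$ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}} $$
The two sums under the square root are the numerators of the variances of X and Y.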
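The ratio above can be sketched as a small Python function. The function name `pearson_r` is hypothetical. It uses the deviation form: the sum of cross-products divided by the square root of the product of the two sums of squared deviations.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: joint variation of X and Y divided by
    their separate variation (numerators of the two variances)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, n >= 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Same five pairs as the covariance table: S_xy = 7, S_xx = S_yy = 20
print(pearson_r([0, 2, 3, 4, 6], [3, 2, 4, 0, 6]))  # 0.35
```

For those pairs, the positive covariance of 1.75 corresponds to only a weak positive correlation, r = 7 / √(20 × 20) = 0.35.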
Example:
A sample of 6 children was selected, and their age in years and weight in kilograms were recorded as shown in the following table. Find the correlation between age and weight.
No.   Age (x, years)   Weight (y, kg)   xy    x²    y²
2     6                8                48    36    64
3     8                12               96    64    144
4     5                10               50    25    100
5     6                11               66    36    121
6     9                13               117   81    169
Anxiety (X)   Test score (Y)   X²     Y²    XY
10            2                100    4     20
8             3                64     9     24
2             9                4      81    18
1             7                1      49    7
5             6                25     36    30
6             5                36     25    30
ΣX = 32       ΣY = 32          ΣX² = 230    ΣY² = 204    ΣXY = 129
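Calculating Correlation Coefficient
Using the column totals (n = 6):
$$ r = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{\sqrt{\left(\sum X^2 - \frac{(\sum X)^2}{n}\right)\left(\sum Y^2 - \frac{(\sum Y)^2}{n}\right)}} = \frac{129 - \frac{32 \times 32}{6}}{\sqrt{\left(230 - \frac{32^2}{6}\right)\left(204 - \frac{32^2}{6}\right)}} = \frac{-41.67}{\sqrt{59.33 \times 33.33}} = -0.94 $$
This is a strong negative correlation: higher anxiety goes with lower test scores.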
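The value r = −0.94 can be reproduced from the anxiety and test-score table with the computational formula r = (ΣXY − ΣXΣY/n) / √[(ΣX² − (ΣX)²/n)(ΣY² − (ΣY)²/n)]. The sketch below recomputes the column totals first, so they can be checked against the table.

```python
import math

# Anxiety (X) and test score (Y) from the table
X = [10, 8, 2, 1, 5, 6]
Y = [2, 3, 9, 7, 6, 5]
n = len(X)

sum_x = sum(X)                              # 32
sum_y = sum(Y)                              # 32
sum_x2 = sum(v * v for v in X)              # 230
sum_y2 = sum(v * v for v in Y)              # 204
sum_xy = sum(a * b for a, b in zip(X, Y))   # 129

# Computational (raw-score) form of Pearson's r
numerator = sum_xy - sum_x * sum_y / n
denominator = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
r = numerator / denominator
print(round(r, 2))  # -0.94
```

The raw-score form gives the same r as the deviation form. It is convenient for hand calculation because it only needs the five column totals.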
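Slope of the least-squares regression line:
$$ b = b_1 = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} $$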
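The slope b = (Σxy − ΣxΣy/n) / (Σx² − (Σx)²/n) can be computed directly from column totals. The intercept then follows from a = ȳ − b x̄, because the least-squares line passes through the point of means. The helper name `fit_line_from_totals` is hypothetical. Purely to illustrate the arithmetic, the sketch applies it to the anxiety totals above; the slides do not fit a regression to that data.

```python
def fit_line_from_totals(n, sum_x, sum_y, sum_xy, sum_x2):
    """Least-squares intercept a and slope b for y-hat = a + b*x,
    computed from column totals only (hypothetical helper)."""
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = sum_y / n - b * sum_x / n  # line passes through (x_bar, y_bar)
    return a, b


# Illustration only: anxiety-score totals (n = 6)
a, b = fit_line_from_totals(6, 32, 32, 129, 230)
print(round(a, 3), round(b, 3))  # 9.079 -0.702
```

The numerator of b is the same quantity that appears in the numerator of r. For that reason, the slope and the correlation always share the same sign.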
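Regression Equation
The regression equation describes the regression line mathematically:
ŷ = a + bx
where a is the intercept (the value of ŷ when x = 0) and b is the slope (the change in ŷ for a one-unit increase in x).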
Linear Equations
Hours studying and grades
Regressing grades on hours
Linear Regression
[Figure: final grade in course plotted against hours of study, with fitted line Final grade in course = 59.95 + 3.17 * study; R-Square = 0.88]
B.P. = 112.13 + 0.4547x  (x = age in years)
For age 25:
B.P. = 112.13 + 0.4547 × 25 = 123.50 ≈ 123.5 mm Hg
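Once a and b are known, prediction is a single evaluation of ŷ = a + bx. The sketch below uses the blood-pressure coefficients above and the grades-on-hours line from the earlier figure. The function name `predict` and the 10-hour input are hypothetical, chosen only for illustration.

```python
def predict(a, b, x):
    """Value on the fitted line y-hat = a + b*x (hypothetical helper)."""
    return a + b * x


# Blood pressure (mm Hg) for age 25, using the slide's coefficients
bp_25 = predict(112.13, 0.4547, 25)
print(round(bp_25, 1))  # 123.5

# Hypothetical input: final grade after 10 hours of study
grade_10 = predict(59.95, 3.17, 10)
print(round(grade_10, 2))
```

Predictions are only trustworthy within the range of x values used to fit the line. Extrapolating far outside that range can give meaningless results.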
Example
The strength of paper used in the manufacture of cardboard boxes (y)
is related to the percentage of hardwood concentration in the original
pulp (x). Under controlled conditions, a pilot plant manufactures 16
samples, each from a different batch of pulp, and measures the tensile
strength as shown in the table below.
y 101.4 117.4 117.1 106.2 131.9 146.9 146.8 133.9 111.0 123.0 125.1 145.2 134.3 144.5 143.7 146.9
x 1 1.5 1.5 1.5 2 2 2.2 2.4 2.5 2.5 2.8 2.8 3.0 3.0 3.2 3.3
• a. Describe the correlation between the tensile strength of
paper and the hardwood concentration.
• b. Derive a simple linear regression equation to predict
tensile strength from percentage of hardwood concentration
in the pulp.
• c. Predict tensile strength when concentration = 1.7.
• d. Obtain the fitted value of y when x = 2.2 and calculate the
corresponding residual.
Solution
a. Describe the correlation between the tensile strength of paper and
the hardwood concentration.
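• The correlation (r) between the tensile strength of paper and the hardwood concentration in the original pulp can be expressed as:
$$ r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}, \quad S_{xy} = \sum xy - \frac{\sum x \sum y}{n}, \quad S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}, \quad S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} $$
Hence the correlation is calculated as follows: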
N      Concentration (x)   Strength (y)   xy        x²      y²
1      1                   101.4          101.4     1       10281.96
2      1.5                 117.4          176.1     2.25    13782.76
3      1.5                 117.1          175.65    2.25    13712.41
4      1.5                 106.2          159.3     2.25    11278.44
5      2                   131.9          263.8     4       17397.61
6      2                   146.9          293.8     4       21579.61
7      2.2                 146.8          322.96    4.84    21550.24
8      2.4                 133.9          321.36    5.76    17929.21
9      2.5                 111            277.5     6.25    12321
10     2.5                 123            307.5     6.25    15129
11     2.8                 125.1          350.28    7.84    15650.01
12     2.8                 145.2          406.56    7.84    21083.04
13     3                   134.3          402.9     9       18036.49
14     3                   144.5          433.5     9       20880.25
15     3.2                 143.7          459.84    10.24   20649.69
16     3.3                 146.9          484.77    10.89   21579.61
TOTAL  37.2                2075.3         4937.22   93.66   272841.33
Mean   2.325               129.70625
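$$ r = \frac{4937.22 - \frac{37.2 \times 2075.3}{16}}{\sqrt{\left(93.66 - \frac{37.2^2}{16}\right)\left(272841.33 - \frac{2075.3^2}{16}\right)}} = \frac{112.15}{\sqrt{7.17 \times 3661.95}} \approx 0.692 $$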
From the result, there is a moderate positive (direct) correlation between the tensile strength of paper and the hardwood concentration in the original pulp.
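The correlation for the paper-strength data can be recomputed from the sixteen raw pairs. This is a sketch that builds the corrected sums S_xx, S_yy and S_xy from the column totals; with these data it gives r ≈ 0.692.

```python
import math

# Hardwood concentration (x, %) and tensile strength (y) for the 16 samples
x = [1, 1.5, 1.5, 1.5, 2, 2, 2.2, 2.4, 2.5, 2.5, 2.8, 2.8, 3.0, 3.0, 3.2, 3.3]
y = [101.4, 117.4, 117.1, 106.2, 131.9, 146.9, 146.8, 133.9,
     111.0, 123.0, 125.1, 145.2, 134.3, 144.5, 143.7, 146.9]
n = len(x)

# Corrected sums of squares and cross-products
s_xx = sum(v * v for v in x) - sum(x) ** 2 / n
s_yy = sum(v * v for v in y) - sum(y) ** 2 / n
s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n

r = s_xy / math.sqrt(s_xx * s_yy)
print(f"S_xx={s_xx:.2f}  S_yy={s_yy:.2f}  S_xy={s_xy:.4f}  r={r:.3f}")
```

Squaring r gives r² ≈ 0.48. In other words, roughly half of the variation in tensile strength is associated with hardwood concentration.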
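Regression
b. Derive a simple linear regression equation to predict tensile strength from percentage of hardwood concentration in the pulp.
• To derive the linear regression line ŷ = a + bx, compute the slope b and then the intercept a:
$$ b = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}} = \frac{4937.22 - \frac{37.2 \times 2075.3}{16}}{93.66 - \frac{37.2^2}{16}} = \frac{112.1475}{7.17} = 15.641 $$
$$ a = \bar{y} - b\bar{x} = 129.70625 - 15.641 \times 2.325 \approx 93.34 $$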
The required simple linear regression equation to predict tensile strength from percentage of hardwood concentration in the pulp is:
ŷ = 93.34 + 15.641x
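The slope and intercept for part b can be computed in a few lines using the deviation form of the least-squares estimates. This is a sketch, not the lecture's own procedure: b = S_xy / S_xx and a = ȳ − b x̄, both computed from the raw data.

```python
# Hardwood concentration (x, %) and tensile strength (y)
x = [1, 1.5, 1.5, 1.5, 2, 2, 2.2, 2.4, 2.5, 2.5, 2.8, 2.8, 3.0, 3.0, 3.2, 3.3]
y = [101.4, 117.4, 117.1, 106.2, 131.9, 146.9, 146.8, 133.9,
     111.0, 123.0, 125.1, 145.2, 134.3, 144.5, 143.7, 146.9]
n = len(x)
x_bar = sum(x) / n   # 2.325
y_bar = sum(y) / n   # 129.70625

# Deviation form of the least-squares estimates
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)

b = s_xy / s_xx          # slope
a = y_bar - b * x_bar    # intercept
print(f"y-hat = {a:.2f} + {b:.3f} x")
```

The slope agrees with b = 15.641. The intercept comes out at about 93.34; the slide does not print this value.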
c. Predict tensile strength when concentration = 1.7
At x = 1.7: ŷ = 93.34 + 15.641 × 1.7 ≈ 119.93
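Parts c and d both evaluate the fitted line. For part d, the residual is the observed value minus the fitted value. This sketch uses rounded coefficients: the slope b ≈ 15.6412 from part b, and the intercept a = ȳ − b x̄ ≈ 93.3404. The only observation at x = 2.2 is sample 7, with y = 146.8. The slides do not work part d, so the figures in the test are this sketch's own arithmetic.

```python
# Rounded coefficients of the fitted line y-hat = a + b*x
a, b = 93.3404, 15.6412


def y_hat(x):
    """Fitted tensile strength for hardwood concentration x (%)."""
    return a + b * x


# c. Prediction at x = 1.7
pred_17 = y_hat(1.7)

# d. Fitted value at x = 2.2 and residual for the observed y = 146.8
fitted_22 = y_hat(2.2)
residual_22 = 146.8 - fitted_22   # residual = observed - fitted
print(f"{pred_17:.2f} {fitted_22:.2f} {residual_22:.2f}")
```

A positive residual means the observed strength lies above the fitted line. Here sample 7 is about 19 units stronger than the line predicts.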
Multiple Regression