Correlation and Regression
Correlation is concerned with measuring the 'strength of association' between variables,
while regression is concerned with the 'prediction' of the most likely value of one variable when the
value of the other variable is known.
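$\mathrm{Cov}(x, y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
$= \frac{1}{n} \sum_{i=1}^{n} x_i y_i - \bar{x}\,\bar{y}$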
Variance can never be negative, whereas covariance may be positive, negative or zero.
If x and y are two independent variables, then their covariance is zero, i.e. Cov(x, y) = 0.
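The two forms of the covariance formula can be checked against each other numerically. The following is a minimal sketch using only the Python standard library; the function names and the sample data are hypothetical, chosen purely for illustration.

```python
from statistics import fmean


def covariance(xs, ys):
    """Population covariance: (1/n) * sum((x - x_bar) * (y - y_bar))."""
    n = len(xs)
    x_bar, y_bar = fmean(xs), fmean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n


def covariance_shortcut(xs, ys):
    """Equivalent shortcut form: (1/n) * sum(x * y) - x_bar * y_bar."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) / n - fmean(xs) * fmean(ys)


# Hypothetical sample data (an assumption, not taken from the notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

cov_xy = covariance(xs, ys)                   # positive association
cov_xx = covariance(xs, xs)                   # Cov(x, x) is the variance of x
cov_neg = covariance(xs, [-y for y in ys])    # reversing y flips the sign

print(cov_xy, covariance_shortcut(xs, ys), cov_xx, cov_neg)
```

Both functions divide by n, matching the formula in the notes; a sample covariance with divisor n - 1 would give slightly different values.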
ASSUMPTIONS:
1. X and Y have a linear relationship.
2. Both variables should be normally distributed.
3. Homoscedasticity of the variables.
$r = \frac{\mathrm{Cov}(x, y)}{\sigma_x \sigma_y}$
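r is independent of the choice of both origin and scale of observation:
the correlation coefficient between x and y equals the correlation coefficient between u and v
if $u = \frac{x - a}{b}$ and $v = \frac{y - a'}{b'}$, where a, a', b, b' are constants and b and b' have the same sign.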
r is a pure number and is unit free.
r lies between -1 and +1: $-1 \le r \le +1$.
When r = +1 there is perfect positive correlation between the variables.
When r = -1 there is perfect negative correlation between the variables.
Rajib Dolai
https://rajib1.weebly.com/
r is a measure of degree of association between two variables.
This correlation coefficient is due to Karl Pearson.
If two variables are independent, their correlation coefficient is zero. But the converse is not true.
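Several of the properties of r listed above can be verified on small data sets. The sketch below is illustrative only: `pearson_r`, the sample data, and the constants used for the change of origin and scale are all assumptions introduced here, not part of the original notes.

```python
from math import sqrt
from statistics import fmean


def pearson_r(xs, ys):
    """Pearson correlation: Cov(x, y) / (sigma_x * sigma_y), population form."""
    n = len(xs)
    x_bar, y_bar = fmean(xs), fmean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    sd_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)


# Hypothetical sample data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r_xy = pearson_r(xs, ys)

# Change of origin and scale with positive scale factors:
# u = (x - 3) / 2, v = (y - 10) / 5. r should be unchanged.
us = [(x - 3) / 2 for x in xs]
vs = [(y - 10) / 5 for y in ys]
r_uv = pearson_r(us, vs)

# Exact linear relationships give r = +1 or r = -1.
r_pos = pearson_r(xs, [2 * x + 1 for x in xs])
r_neg = pearson_r(xs, [-2 * x + 1 for x in xs])

# Zero correlation does not imply independence: y = x^2 is fully
# determined by x, yet r = 0 when x is symmetric about zero.
sym = [-2, -1, 0, 1, 2]
r_square = pearson_r(sym, [x * x for x in sym])

print(r_xy, r_uv, r_pos, r_neg, r_square)
```

The last case illustrates why the converse fails: r only captures linear association, so a purely quadratic dependence can leave it at zero.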
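$b_{yx} = \frac{\mathrm{Cov}(x, y)}{\sigma_x^2} = r \frac{\sigma_y}{\sigma_x}$
$b_{xy} = \frac{\mathrm{Cov}(x, y)}{\sigma_y^2} = r \frac{\sigma_x}{\sigma_y}$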
1. The regression line of y on x is $y - \bar{y} = b_{yx} (x - \bar{x})$ and the regression line of x on y is
$x - \bar{x} = b_{xy} (y - \bar{y})$,
where $b_{yx}$ and $b_{xy}$ are respectively the regression coefficient of y on x and the regression
coefficient of x on y.
2. The product of the two regression coefficients is equal to the square of the correlation coefficient:
$b_{yx} \cdot b_{xy} = r^2$
3. r, $b_{yx}$ and $b_{xy}$ all have the same sign. If the correlation coefficient r is zero, the regression coefficients
$b_{yx}$ and $b_{xy}$ are also zero.
4. The regression lines always intersect at the point $(\bar{x}, \bar{y})$. The slopes of the regression line of y on x and
the regression line of x on y are respectively $b_{yx}$ and $1/b_{xy}$.
5. The angle between the two regression lines depends on the correlation coefficient r. When r = 0, the
two lines are perpendicular to each other; when r = +1 or r = -1, they coincide. As r increases
numerically from 0 to 1, the angle between the regression lines diminishes from 90° to 0°.
6. The two regression equations are usually different. However, when r = ±1, they become identical,
and in this case there is an exact linear relationship between the variables. When r = 0, the regression
equations reduce to $y = \bar{y}$ and $x = \bar{x}$, and neither y nor x can be estimated from linear regression
equations.
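7. If the variables are uncorrelated, i.e. r = 0, then the regression lines are perpendicular.
8. If one of the regression coefficients is numerically greater than one, the other must be numerically less than one, because their product r² cannot exceed one.
9. The A.M. of the regression coefficients, $\frac{b_{yx} + b_{xy}}{2}$, is greater than or equal to the correlation coefficient r (in absolute value when r is negative).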
10. Regression coefficients are independent of change of origin but not of scale.
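The properties in the list above can be checked numerically. The sketch below is a rough illustration under stated assumptions: the names `b_yx` and `b_xy` stand for the regression coefficients of y on x and of x on y, and the helper functions and sample data are hypothetical.

```python
from math import atan, degrees, sqrt
from statistics import fmean


def regression_summary(xs, ys):
    """Return (x_bar, y_bar, b_yx, b_xy, r) using population moments."""
    n = len(xs)
    x_bar, y_bar = fmean(xs), fmean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    var_x = sum((x - x_bar) ** 2 for x in xs) / n
    var_y = sum((y - y_bar) ** 2 for y in ys) / n
    b_yx = cov / var_x                 # slope of the line of y on x
    b_xy = cov / var_y                 # regression coefficient of x on y
    r = cov / sqrt(var_x * var_y)
    return x_bar, y_bar, b_yx, b_xy, r


def angle_between_lines(b_yx, b_xy):
    """Acute angle in degrees between lines with slopes b_yx and 1/b_xy."""
    if b_yx == 0 or b_xy == 0:
        return 90.0                    # r = 0: y = y_bar and x = x_bar
    m1, m2 = b_yx, 1 / b_xy
    return degrees(atan(abs((m2 - m1) / (1 + m1 * m2))))


# Hypothetical sample data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
x_bar, y_bar, b_yx, b_xy, r = regression_summary(xs, ys)

product = b_yx * b_xy                  # property 2: equals r^2
mean_b = (b_yx + b_xy) / 2             # property 9: at least r
angle = angle_between_lines(b_yx, b_xy)

# Property 10: shifting the origin leaves b_yx unchanged,
# while rescaling x by a factor of 10 divides b_yx by 10.
b_shift = regression_summary([x + 100 for x in xs], [y - 7 for y in ys])[2]
b_scale = regression_summary([10 * x for x in xs], ys)[2]

print(b_yx, b_xy, r, product, mean_b, angle, b_shift, b_scale)
```

Property 4 holds by construction here, since both lines are written in deviation form around the means, so the point (x_bar, y_bar) satisfies both equations.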
Correlation need not imply a cause and effect relationship between the variables. Regression
analysis, by contrast, treats one variable as dependent on the other and so models a directional relationship, although a fitted regression does not by itself prove causation.
Example 1:
Let the two regression lines be given as 3x = 10 + 5y and 4y = 5 + 15x. Then the correlation
coefficient between x and y is .......
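Suppose first that the first equation is the regression line of x on y and the second is the regression line of y on x:
$x = \frac{10}{3} + \frac{5}{3} y$ ............. (1)
$y = \frac{5}{4} + \frac{15}{4} x$ ............. (2)
$r^2 = \frac{5}{3} \times \frac{15}{4} = \frac{25}{4}$, so $r = \frac{5}{2} = 2.5 > 1$ [this is impossible]
So the roles must be reversed. Solving the 1st equation for y and the 2nd equation for x, we get
$y = -\frac{10}{5} + \frac{3}{5} x$ and $x = -\frac{5}{15} + \frac{4}{15} y$
$r^2 = \frac{3}{5} \times \frac{4}{15} = \frac{4}{25}$, so $r = \frac{2}{5} = 0.4 < 1$
Both regression coefficients are positive, so r is positive and the answer is 0.4.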
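The trial-and-error step in this example (try one assignment of the two lines, reject it if r² exceeds 1, then try the other) can be sketched in code. This is a minimal illustration, not a general solver: the function `correlation_from_lines` and the (a, b, c) encoding of a line a*x + b*y = c are assumptions introduced here, and all coefficients are assumed to be non-zero integers.

```python
from fractions import Fraction
from math import sqrt


def correlation_from_lines(line1, line2):
    """Each line is (a, b, c), meaning a*x + b*y = c.

    Tries both assignments of 'y on x' and 'x on y' and keeps the one
    whose product of regression coefficients lies between 0 and 1.
    Returns (r, b_yx, b_xy).
    """
    (a1, b1, _), (a2, b2, _) = line1, line2
    candidates = [
        # line1 read as y on x (slope -a1/b1), line2 read as x on y
        (Fraction(-a1, b1), Fraction(-b2, a2)),
        # line2 read as y on x (slope -a2/b2), line1 read as x on y
        (Fraction(-a2, b2), Fraction(-b1, a1)),
    ]
    for b_yx, b_xy in candidates:
        r_squared = b_yx * b_xy
        if 0 <= r_squared <= 1:
            sign = 1 if b_yx > 0 else -1   # r shares the sign of the coefficients
            return sign * sqrt(r_squared), b_yx, b_xy
    raise ValueError("no assignment gives 0 <= r^2 <= 1")


# 3x = 10 + 5y  ->  3x - 5y = 10 ;  4y = 5 + 15x  ->  -15x + 4y = 5
r, b_yx, b_xy = correlation_from_lines((3, -5, 10), (-15, 4, 5))
print(r, b_yx, b_xy)
```

Exact fractions are used for the coefficients so that the check against 1 is not affected by rounding; only the final square root is taken in floating point.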
Example 2:
In a two-variable regression, Y is the dependent variable and X is the independent variable. The correlation
coefficient between Y and X is 0.6. For this case, how much of the result (the variation in Y) is explained by X?
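The notes stop before giving a solution. Assuming the question asks what share of the variation in Y is explained by X, the standard result for simple linear regression is that this share equals r², the coefficient of determination. The sketch below first checks that identity on hypothetical data (the function name and data are illustrative assumptions) and then evaluates it for r = 0.6.

```python
from statistics import fmean


def explained_share(xs, ys):
    """Return (1 - SSE/SST for the least-squares line of y on x, r^2)."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    slope = s_xy / s_xx
    # Residual sum of squares around the fitted line of y on x.
    sse = sum((y - (y_bar + slope * (x - x_bar))) ** 2 for x, y in zip(xs, ys))
    return 1 - sse / s_yy, s_xy ** 2 / (s_xx * s_yy)


# Hypothetical sample data, used only to check the identity.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
fit_share, r_squared = explained_share(xs, ys)

# Applying the identity to the value given in the example.
r = 0.6
explained = r ** 2          # share of the variation in Y explained by X
unexplained = 1 - explained

print(fit_share, r_squared, explained, unexplained)
```

Under that reading of the question, r = 0.6 corresponds to about 36% of the variation in Y being explained by X, with the remaining 64% unexplained.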