
Unit 7 Correlation Analysis


UNIT 7

CORRELATION ANALYSIS

Mathematics Department
XAVIER UNIVERSITY-ATENEO DE CAGAYAN
Correlation Analysis
Correlation analysis is a statistical method that attempts to measure the
strength of the linear relationship between two quantitative variables by
means of a single value called correlation coefficient.
r = sample correlation coefficient
ρ ("rho") = population correlation coefficient

Pearson Correlation Coefficient


The Pearson correlation coefficient (ρ) measures the strength of the linear
relationship between two random variables X and Y and is estimated by the
sample correlation coefficient r, where

\[
r = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}
         {\sqrt{\left[ n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 \right]
                \left[ n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2 \right]}}
\]
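The formula for r can be sketched in code as follows. This is a minimal illustration rather than part of the lecture: the function name `pearson_r` and the demo data are hypothetical, and the demo data were chosen to be perfectly linear so that the result is easy to check by hand.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient via the sums formula."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical, perfectly linear data (not from the lecture): y = 2x + 1
x_demo = [1, 2, 3, 4, 5]
y_demo = [3, 5, 7, 9, 11]
r_demo = pearson_r(x_demo, y_demo)
print(r_demo)  # 1.0, a perfect direct linear relationship
```

If either variable is constant, the denominator is zero and r is undefined; this sketch lets Python raise a ZeroDivisionError in that case.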
Scatter Plot
A scatter plot is a graphical way of presenting the linear
relationship between two quantitative variables X and Y.

Note: Correlation alone does not imply a cause-effect relationship. Even if
the two variables x and y are linearly related, it does not necessarily mean
that one of them is causing the change in the other. The change may be due
to external factors not accounted for by the relationship.
Suggested Interpretation of the Pearson Correlation
Coefficient

Correlation Coefficient r     Interpretation
(positive or negative)
0.00 to 0.19                  no correlation to very weak correlation
0.20 to 0.39                  weak correlation
0.40 to 0.69                  moderate correlation
0.70 to 0.89                  strong correlation
0.90 to 1.00                  very strong to perfect correlation
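The suggested interpretation table can be expressed as a small lookup. This is only a sketch: the function name `interpret_r` is hypothetical, and rounding |r| to two decimals before comparing (so that values such as 0.195 fall into a band) is an assumption, since the table does not say how to treat values between bands.

```python
def interpret_r(r):
    """Return the suggested verbal interpretation for a correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    # Assumption: round |r| to the two decimals used by the table's bands.
    size = round(abs(r), 2)
    if size <= 0.19:
        strength = "no correlation to very weak correlation"
    elif size <= 0.39:
        strength = "weak correlation"
    elif size <= 0.69:
        strength = "moderate correlation"
    elif size <= 0.89:
        strength = "strong correlation"
    else:
        strength = "very strong to perfect correlation"
    # The table applies to both signs; report the sign separately.
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "zero"
    return f"{strength} ({direction})"

print(interpret_r(0.9315))  # very strong to perfect correlation (positive)
print(interpret_r(-0.45))   # moderate correlation (negative)
```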

Remarks: An r near 0 (zero) means that there is a lack of
linearity between two variables or there is no linear
relationship between them. Note that this doesn't mean
they are not associated at all.
Coefficient of Determination

■ The coefficient of determination R² measures the proportion of the total
variation in the values of Y that is explained by its linear
relationship with the values of X.
■ It is usually expressed as a percentage.

coefficient of determination = R² × 100%

Although R² is an extremely useful measure of the substantive importance
of an effect, it cannot be used to infer causal relationships. Although we
usually talk in terms of 'the variance in y accounted for by x', or even the
variation in one variable explained by the other, this still says nothing about
which way causality runs. (Andy Field)
Example: The sales manager of a certain company wants to determine whether
there is a linear relationship between the number of sales calls and the
number of copier machines sold in a month. The manager selected a random
sample of 10 sales representatives and determined the number of sales calls
each sales representative made last month and the number of copier machines
sold.

No. of sales calls    No. of copiers sold
 9                    3
25                    6
15                    4
20                    6
 7                    3
10                    4
17                    4
20                    5
13                    3
30                    7

a) Draw the scatter plot of the given data.
b) Compute the Pearson correlation coefficient r and interpret.
c) Compute the coefficient of determination and interpret.
EXCEL: Scatter Plot
1. Open Microsoft Excel. Encode the bivariate data separately into two columns in the
spreadsheet. Highlight the data.
2. Go to Insert. Click Scatter in the tool bar.
3. Click the figure – Chart Tools: Design – Add Chart Element

a) The scatter plot indicates that there is a direct linear relationship
between the number of sales calls and the number of copier machines sold
in a month.

More copiers are sold for a greater number of sales calls.


b) Pearson correlation coefficient r
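No. of sales calls (X)   No. of copiers sold (Y)   x²      y²    xy
 9                       3                          81      9     27
25                       6                         625     36    150
15                       4                         225     16     60
20                       6                         400     36    120
 7                       3                          49      9     21
10                       4                         100     16     40
17                       4                         289     16     68
20                       5                         400     25    100
13                       3                         169      9     39
30                       7                         900     49    210

\[
\sum_{i=1}^{10} x_i = 166, \quad
\sum_{i=1}^{10} y_i = 45, \quad
\sum_{i=1}^{10} x_i^2 = 3{,}238, \quad
\sum_{i=1}^{10} y_i^2 = 221, \quad
\sum_{i=1}^{10} x_i y_i = 835
\]

\[
r = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}
         {\sqrt{\left[ n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 \right]
                \left[ n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2 \right]}}
  = \frac{10(835) - (166)(45)}{\sqrt{\left[10(3{,}238) - (166)^2\right]\left[10(221) - (45)^2\right]}}
  = \frac{880}{\sqrt{(4{,}824)(185)}} \approx 0.9315
\]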

A correlation coefficient of 0.9315 indicates a very strong direct linear
relationship between the number of sales calls and the number of copier
machines sold in a month.
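The hand computation in part (b) can be checked with a short script. This is a sketch using only the ten data pairs from the example; the variable names are illustrative and not part of the lecture.

```python
import math

# Data from the example: sales calls (x) and copiers sold (y) for 10 representatives
calls = [9, 25, 15, 20, 7, 10, 17, 20, 13, 30]
copiers = [3, 6, 4, 6, 3, 4, 4, 5, 3, 7]

n = len(calls)
sum_x = sum(calls)                                   # 166
sum_y = sum(copiers)                                 # 45
sum_x2 = sum(x * x for x in calls)                   # 3238
sum_y2 = sum(y * y for y in copiers)                 # 221
sum_xy = sum(x * y for x, y in zip(calls, copiers))  # 835

numerator = n * sum_xy - sum_x * sum_y               # 880
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(round(r, 4))  # 0.9315
```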
EXCEL: Correlation Coefficient
1. Open Microsoft Excel. Encode the bivariate data separately into two
columns in the spreadsheet. Highlight the data.

2. Select Data – Data Analysis – Correlation. Click OK.

3. In the dialogue box, enter
Input Range: Highlight all data
Grouped by: Columns
Check: Labels in First Row
Output Range: select any cell where you want to display the output

4. Click OK.

Output:

                       No. of sales calls   No. of copiers sold
No. of sales calls     1
No. of copiers sold    0.93152209           1
c) Sample Coefficient of Determination R²

R² % = (0.9315)² × 100% = 86.77%

This means that 86.77% of the total variation in the number of copier
machines sold in a month is explained by its linear relationship with the
number of sales calls made in a month. Only 13.23% (computed from
100% minus 86.77%) of the sample variability in the number of copier
machines sold in a month is due to factors other than what is accounted for
by its linear relationship with the number of sales calls made in a month.
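The arithmetic in part (c) can be reproduced in a few lines. This sketch assumes the rounded value r = 0.9315 used in the computation above; the variable names are illustrative.

```python
r = 0.9315  # rounded sample correlation coefficient from part (b)

explained_pct = r ** 2 * 100         # coefficient of determination, in percent
unexplained_pct = 100 - explained_pct

print(f"{explained_pct:.2f}% explained, {unexplained_pct:.2f}% unexplained")
# 86.77% explained, 13.23% unexplained
```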
Test for the significance of the linear relationship

Null hypothesis
H0: There is no significant linear relationship between X and Y.
H0: ρ = 0

Alternative hypothesis
H1: There is a significant linear relationship between X and Y.
H1: ρ ≠ 0 (two-tailed test)
H1: ρ > 0 or H1: ρ < 0 (one-tailed test)

Note: If the test of significance for the correlation coefficient yields a
significant result, then regression analysis can be performed.
Example: The sales manager of a certain company wants to determine whether
there is a linear relationship between the number of sales calls and the
number of copier machines sold in a month. The manager selected a random
sample of 10 sales representatives and determined the number of sales calls
each sales representative made last month and the number of copier machines
sold.

No. of sales calls    No. of copiers sold
 9                    3
25                    6
15                    4
20                    6
 7                    3
10                    4
17                    4
20                    5
13                    3
30                    7

Test at the 0.05 level of significance whether there is a linear relationship
between the number of sales calls and the number of copier machines sold in
a month.
Steps
1. Null and alternative hypotheses:
H0: There is no significant linear relationship between number of sales calls
and the number of copier machines sold.
H1: There is a significant linear relationship between number of sales calls
and the number of copier machines sold.
(In symbols) H0: ρ = 0 versus H1: ρ ≠ 0

2. Level of significance: α = 0.05

3. Test Statistic: t test (for correlation coefficient)

\[
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}, \qquad df = n - 2
\]
4. Rejection Regions: Since the test is two-tailed, the rejection regions are
given by

\[
t < -t_{\alpha/2} \quad \text{or} \quad t > t_{\alpha/2}
\]

\[
t < -t_{0.025} \quad \text{or} \quad t > t_{0.025}
\]

Based on the t distribution table with degrees of freedom df = n – 2 = 10 – 2 = 8,
the critical values of the test are ±2.306.

Thus, H0 is rejected if t < –2.306 or t > 2.306.
Otherwise, H0 is not rejected.
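5. Computation of the test statistic
Based on the previous results, the computed sample correlation coefficient is
r = 0.9315, where n = 10 sales representatives.

\[
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}
  = \frac{0.9315\sqrt{10-2}}{\sqrt{1-(0.9315)^2}} \approx 7.243
\]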

6. Statistical Decision
Since t = 7.243 is in the critical region (7.243 > 2.306), H0 is rejected.

Conclusion: There is a significant linear relationship
between sales calls and number of copiers sold.
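Steps 5 and 6 can be sketched in code as follows. The critical value 2.306 is taken from the t table as stated in step 4, not computed here, and the variable names are illustrative.

```python
import math

r = 0.9315          # rounded sample correlation coefficient
n = 10              # number of sales representatives
t_critical = 2.306  # two-tailed critical value from the t table, df = n - 2 = 8

# Test statistic for H0: rho = 0
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Two-tailed decision rule: reject H0 if t < -2.306 or t > 2.306
reject_h0 = abs(t_stat) > t_critical

print(round(t_stat, 3), reject_h0)
```

Using the unrounded r = 0.93152209 instead gives a t value closer to 7.245, which is why software output can differ from the hand computation in the third decimal place.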
Note: There is no direct drop-down menu for testing r in Excel, but a simple linear
regression analysis can be run to check its significance.

EXCEL: Data – Data Analysis – Regression

Check this table:

                     Coefficients   Standard Error   t Stat     P-value
Intercept            1.471808       0.453107         3.248257   0.011732
No. of sales calls   0.182421       0.02518          7.244578   8.85E-05

Since the p-value (8.85E-05) is less than the 0.05 level of significance, H0 is
rejected.

Conclusion: There is a significant linear relationship between sales
calls and number of copiers sold.
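The regression route works because, in simple linear regression, the t statistic of the slope equals the t statistic for r. The sketch below recomputes the "No. of sales calls" row of the Excel table from the example data using the standard least-squares formulas; it is an illustration with hypothetical variable names, and it does not compute the p-value, which would need a t distribution function.

```python
import math

calls = [9, 25, 15, 20, 7, 10, 17, 20, 13, 30]
copiers = [3, 6, 4, 6, 3, 4, 4, 5, 3, 7]
n = len(calls)

mean_x = sum(calls) / n
mean_y = sum(copiers) / n
sxx = sum((x - mean_x) ** 2 for x in calls)
syy = sum((y - mean_y) ** 2 for y in copiers)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(calls, copiers))

# Least-squares line: copiers = intercept + slope * calls
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Standard error and t statistic of the slope
sse = syy - slope * sxy                      # residual sum of squares
se_slope = math.sqrt(sse / (n - 2) / sxx)
t_slope = slope / se_slope

# t statistic computed directly from r, for comparison
r = sxy / math.sqrt(sxx * syy)
t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

print(round(slope, 6), round(intercept, 6), round(se_slope, 5))
print(round(t_slope, 4), round(t_from_r, 4))
```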
Optional
SPSS: Graphs – Chart Builder – Scatter/Dot

Optional
SPSS: Analyze – Correlate – Bivariate

Optional
SPSS Output

H0: ρ = 0 versus H1: ρ ≠ 0

Since the p-value (.000) is less than 0.05, H0 is rejected. There is a
significant linear relationship between sales calls and number of copiers
sold. Since the computed r is positive, there is a direct linear
relationship between sales calls and number of copiers sold.
