Pearson Chi Square Test

Pearson's Chi-Squared Tests
Test of Goodness-of-Fit
Hypothesis tests can also assist in assessing the quality of a model. In particular, the chi-squared goodness-of-fit
test checks whether a proposed distribution agrees with observed data.
Start with $n$ independent observations that must be classified as one of $r$ mutually exclusive categories. Define
$n_i$ as the number of observations classified as Category $i$, where $i = 1, 2, \ldots, r$. Hence, $n_i / n$ is the
proportion of observations in Category $i$.
Now consider a model that describes the distribution among the categories. If the model is properly specified, then
$p_i$, the probability an observation belongs to Category $i$, should be similar to $n_i / n$ for all $i$. As such, the
hypotheses can be written as
• $H_0$: $p_i = \dfrac{n_i}{n}$ for all $i = 1, 2, \ldots, r$
• $H_1$: At least one $p_i \ne \dfrac{n_i}{n}$ for $i = 1, 2, \ldots, r$
In other words, failing to reject $H_0$ suggests that the model fits the data adequately, whereas rejecting $H_0$
suggests that the model fits the data poorly.
Without discussing the proof, this is a right-tailed test with a test statistic calculated as
$$\sum_{i=1}^{r} \frac{(n_i - n p_i)^2}{n p_i}$$
which, under $H_0$ and for large $n$, approximately follows a $\chi^2$ distribution with $r - 1$ degrees of freedom.
Therefore, reject $H_0$ when
$$\sum_{i=1}^{r} \frac{(n_i - n p_i)^2}{n p_i} \ge \chi^2_{1-\alpha,\, r-1}$$
As a reminder,
• $r$ is the number of unique categories,
• $n_i$ is the number of Category $i$ observations,
• $n$ is the total number of observations,
• $p_i$ is the model's probability of a Category $i$ observation,
• $\alpha$ is the significance level, and
• $\chi^2_{p,\,\nu}$ is the $100p$th percentile of a $\chi^2$ random variable with $\nu$ degrees of freedom.
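The statistic and rejection rule above can be sketched in Python. This is a minimal illustration rather than part of the lesson: the function names `gof_statistic` and `reject_h0` are hypothetical, the sample counts are made up, and the critical value is assumed to be supplied by the caller (for example, read from a percentile table) rather than computed.

```python
from typing import Sequence


def gof_statistic(counts: Sequence[float], probs: Sequence[float]) -> float:
    """Pearson goodness-of-fit statistic: sum of (n_i - n*p_i)^2 / (n*p_i)."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    n = sum(counts)  # total number of observations
    return sum((n_i - n * p_i) ** 2 / (n * p_i) for n_i, p_i in zip(counts, probs))


def reject_h0(statistic: float, critical_value: float) -> bool:
    """Right-tailed rule: reject H0 when the statistic reaches the critical value."""
    return statistic >= critical_value


# Illustrative data (not from the lesson): 3 categories, 60 observations,
# tested against a model that assigns probability 1/3 to each category.
stat = gof_statistic([10, 20, 30], [1 / 3, 1 / 3, 1 / 3])
print(stat)
```

Passing the critical value in explicitly keeps the sketch free of external dependencies; a statistics library could supply the percentile instead.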
EXAMPLE 4.2.5
The outcomes of 150 die rolls were recorded as follows:
Die Roll Frequency
1 17
2 18
3 24
4 29
5 33
6 29
Let $\chi^2_{p,\,\nu}$ be the $100p$th percentile of a chi-squared random variable with $\nu$ degrees of freedom. The
following table lists values of $\chi^2_{p,\,\nu}$ for specific combinations of $p$ and $\nu$:
         p = 0.94   p = 0.96   p = 0.98
ν = 5     10.596     11.644     13.388
ν = 6     12.090     13.198     15.033
Test whether the die is fair using the chi-squared goodness-of-fit test.
SOLUTION
There are six categories, one for each die roll outcome. Therefore, $r = 6$.
In addition, a fair die implies that each die roll outcome is equally likely, meaning $p_i = \frac{1}{6}$ for all $i$.
With 150 observations,
$i$    $n p_i$
1      $150 \cdot \frac{1}{6} = 25$
2      $150 \cdot \frac{1}{6} = 25$
⋮      ⋮
6      $150 \cdot \frac{1}{6} = 25$
The test statistic is
$$\sum_{i=1}^{r} \frac{(n_i - n p_i)^2}{n p_i} = \frac{(17 - 25)^2}{25} + \frac{(18 - 25)^2}{25} + \cdots + \frac{(29 - 25)^2}{25} = 8.4$$
This test involves 6 − 1 = 5 degrees of freedom. Note that
8.4 < 10.596
Determine the significance level associated with 10.596.
$$10.596 = \chi^2_{0.94,\,5} = \chi^2_{1-\alpha,\,5} \;\Rightarrow\; \alpha = 1 - 0.94 = 0.06$$
In conclusion, we fail to reject $H_0$ at the 6% significance level, suggesting that the assumption of a fair die
seems reasonable for this sample of 150 rolls.
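As a check on the arithmetic in Example 4.2.5, the following Python sketch recomputes the test statistic and compares it with the three tabulated percentiles for $\nu = 5$. The variable names are illustrative, and the critical values are copied from the table in the example rather than computed.

```python
# Observed frequencies for faces 1..6 (from Example 4.2.5).
observed = [17, 18, 24, 29, 33, 29]
n = sum(observed)             # 150 rolls
expected = n / len(observed)  # n * p_i = 150 * (1/6) = 25 for a fair die

# Pearson statistic: sum over faces of (n_i - n*p_i)^2 / (n*p_i).
statistic = sum((n_i - expected) ** 2 / expected for n_i in observed)

# Tabulated percentiles chi^2_{p, 5} from the example, keyed by p.
critical_values = {0.94: 10.596, 0.96: 11.644, 0.98: 13.388}

# For each p, the significance level is alpha = 1 - p; True means "reject H0".
decisions = {round(1 - p, 2): statistic >= cv for p, cv in critical_values.items()}

print(f"statistic = {statistic:.1f}")
for alpha, reject in decisions.items():
    print(f"alpha = {alpha:.2f}: {'reject' if reject else 'fail to reject'} H0")
```

Because the statistic lies below every tabulated value for $\nu = 5$, the fail-to-reject conclusion also holds at the 4% and 2% levels.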
Test of Independence
A contingency table records the frequency of observations described by two categorical variables. It is used to
examine the presence of dependence between the two variables. This is achieved using the same procedure as
the goodness-of-fit test. The hypotheses are
• $H_0$: The two variables are independent
• $H_1$: The two variables are dependent
One variable has $r$ categories, while the other has $s$. Each of the $n$ observations belongs to one of the
$r \times s$ combinations. Let
• $n_{ij}$ be the number of observations in Category $i$ for the first variable and Category $j$ for the second variable,
• $n_{i\cdot}$ be the subtotal number of observations in Category $i$ for the first variable, across all categories of
the second variable, and
• $n_{\cdot j}$ be the subtotal number of observations in Category $j$ for the second variable, across all categories of
the first variable,
for $i = 1, 2, \ldots, r$ and $j = 1, 2, \ldots, s$. Thus, a contingency table resembles
                                      Second Variable
                     Cat 1        Cat 2        ⋯    Cat s        Total
First       Cat 1    $n_{11}$     $n_{12}$     ⋯    $n_{1s}$     $n_{1\cdot}$
Variable    Cat 2    $n_{21}$     $n_{22}$     ⋯    $n_{2s}$     $n_{2\cdot}$
            ⋮        ⋮            ⋮            ⋱    ⋮            ⋮
            Cat r    $n_{r1}$     $n_{r2}$     ⋯    $n_{rs}$     $n_{r\cdot}$
            Total    $n_{\cdot 1}$  $n_{\cdot 2}$  ⋯    $n_{\cdot s}$  $n$
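The test statistic is calculated as
$$\frac{1}{n} \sum_{i=1}^{r} \sum_{j=1}^{s} \frac{(n_{ij}\, n - n_{i\cdot}\, n_{\cdot j})^2}{n_{i\cdot}\, n_{\cdot j}}$$
which, under $H_0$ and for large $n$, approximately follows a $\chi^2$ distribution with $(r - 1)(s - 1)$ degrees of
freedom. Therefore, reject $H_0$ when
$$\frac{1}{n} \sum_{i=1}^{r} \sum_{j=1}^{s} \frac{(n_{ij}\, n - n_{i\cdot}\, n_{\cdot j})^2}{n_{i\cdot}\, n_{\cdot j}} \ge \chi^2_{1-\alpha,\, (r-1)(s-1)}$$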
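A small Python sketch of this statistic follows. It is an illustration under stated assumptions rather than the lesson's own method: the function names `independence_statistic` and `expected_count_form` are hypothetical, the demo counts are made up, and the table is assumed to be a list of rows with no zero row or column totals. The second function computes the algebraically equivalent form $\sum_i \sum_j (n_{ij} - e_{ij})^2 / e_{ij}$ with expected counts $e_{ij} = n_{i\cdot}\, n_{\cdot j} / n$, which is how the statistic is often written elsewhere.

```python
from typing import Sequence, Tuple


def independence_statistic(table: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Return (statistic, degrees of freedom) for an r-by-s table of counts."""
    row_totals = [sum(row) for row in table]        # n_i. for each row i
    col_totals = [sum(col) for col in zip(*table)]  # n_.j for each column j
    n = sum(row_totals)                             # total number of observations
    total = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            product = row_totals[i] * col_totals[j]
            total += (n_ij * n - product) ** 2 / product
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return total / n, dof


def expected_count_form(table: Sequence[Sequence[float]]) -> float:
    """Same statistic written as the sum of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n  # expected count under independence
            total += (n_ij - e_ij) ** 2 / e_ij
    return total


# Illustrative 2-by-3 table (made-up counts, not from the lesson).
demo = [[12, 18, 30], [28, 22, 10]]
stat, dof = independence_statistic(demo)
print(stat, dof, expected_count_form(demo))
```

The two forms agree because $(n_{ij} - e_{ij})^2 / e_{ij} = (n_{ij}\, n - n_{i\cdot}\, n_{\cdot j})^2 / (n \, n_{i\cdot}\, n_{\cdot j})$ once $e_{ij}$ is substituted.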
EXAMPLE 4.2.6
150 vehicles were stopped at random by the police for inspection.

                             Year
                      < 2015   ≥ 2015   Total
Type    Cars            40       60      100
        Motorcycles     10       40       50
        Total           50      100      150
Let $\chi^2_{p,\,\nu}$ be the $100p$th percentile of a chi-squared random variable with $\nu$ degrees of freedom. The
following table lists values of $\chi^2_{p,\,\nu}$ for specific combinations of $p$ and $\nu$:
         p = 0.95   p = 0.975   p = 0.99
ν = 1      3.841      5.024      6.635
ν = 2      5.991      7.378      9.210
Test whether the vehicle type and year are independent.
SOLUTION
Note that $r = 2$ and $s = 2$, since each variable (type and year) has two categories.
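The test statistic is
$$
\begin{aligned}
\frac{1}{n} \sum_{i=1}^{r} \sum_{j=1}^{s} \frac{(n_{ij}\, n - n_{i\cdot}\, n_{\cdot j})^2}{n_{i\cdot}\, n_{\cdot j}}
&= \frac{1}{150} \left( \frac{[40(150) - 100(50)]^2}{100(50)} + \frac{[60(150) - 100(100)]^2}{100(100)} \right. \\
&\qquad \left. {} + \frac{[10(150) - 50(50)]^2}{50(50)} + \frac{[40(150) - 50(100)]^2}{50(100)} \right) \\
&= 6
\end{aligned}
$$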
This test involves (2 − 1)(2 − 1) = 1 degree of freedom. Note that
5.024 < 6 < 6.635
Determine the significance levels associated with 5.024 and 6.635.

$$5.024 = \chi^2_{0.975,\,1} = \chi^2_{1-\alpha,\,1} \;\Rightarrow\; \alpha = 1 - 0.975 = 0.025$$
$$6.635 = \chi^2_{0.99,\,1} = \chi^2_{1-\alpha,\,1} \;\Rightarrow\; \alpha = 1 - 0.99 = 0.01$$
In conclusion, reject $H_0$ at the 2.5% significance level, but not at the 1% level, suggesting strong evidence
that vehicle type and year are dependent.
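The arithmetic in Example 4.2.6 can be checked with a short Python sketch. The variable names are illustrative, and the critical values are copied from the $\nu = 1$ row of the table in the example rather than computed.

```python
# Observed counts from Example 4.2.6: rows are vehicle type, columns are year.
table = [
    [40, 60],  # cars: < 2015, >= 2015
    [10, 40],  # motorcycles: < 2015, >= 2015
]
row_totals = [sum(row) for row in table]        # n_i. = [100, 50]
col_totals = [sum(col) for col in zip(*table)]  # n_.j = [50, 100]
n = sum(row_totals)                             # 150 vehicles

# (1/n) * sum over cells of (n_ij * n - n_i. * n_.j)^2 / (n_i. * n_.j).
statistic = sum(
    (table[i][j] * n - row_totals[i] * col_totals[j]) ** 2
    / (row_totals[i] * col_totals[j])
    for i in range(len(row_totals))
    for j in range(len(col_totals))
) / n
dof = (len(row_totals) - 1) * (len(col_totals) - 1)

# Tabulated percentiles chi^2_{1 - alpha, 1} from the example, keyed by alpha.
critical_values = {0.05: 3.841, 0.025: 5.024, 0.01: 6.635}

# True means "reject H0" at that significance level.
decisions = {alpha: statistic >= cv for alpha, cv in critical_values.items()}

print(f"statistic = {statistic:g}, degrees of freedom = {dof}")
for alpha, reject in decisions.items():
    print(f"alpha = {alpha}: {'reject' if reject else 'fail to reject'} H0")
```

The sketch rejects at the 5% and 2.5% levels and fails to reject at the 1% level, matching the conclusion above.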