
STATISTICS Grade 12


Revision
Terminology
Measures of central tendency:

Provide information on the data values at the centre of the data set.

• The mean is the 'average' value of a data set. It is calculated as

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

where the $x_i$ are the data values and $n$ is the number of data entries. We read $\bar{x}$ as “x bar”.
• The median is the middle value of an ordered data set.

To find the median: we first sort the data in ascending or descending order and then
pick out the value in the middle of the sorted list. If the middle is in between two values,
the median is the average of those two values.
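The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample data are made up for the example and do not come from the notes.

```python
def mean(values):
    """Arithmetic mean: the sum of the data divided by the number of entries."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data.

    If the middle falls between two values (n is even),
    the median is the average of those two values.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


sample = [7, 3, 9, 4]      # hypothetical data, not from the notes
print(mean(sample))        # 5.75
print(median(sample))      # sorted: 3, 4, 7, 9 -> (4 + 7) / 2 = 5.5
```

Python's standard `statistics` module offers `statistics.mean` and `statistics.median`, which follow the same definitions.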

MEASURES OF DISPERSION:

Tell us how spread out a data set is. If a measure of dispersion is small, the data are
clustered in a small region.

If a measure of dispersion is large, the data are spread out over a large region.

THE RANGE is the difference between the maximum and minimum values in the data
set.

The inter-quartile range is the difference between the first and third quartiles of the
data set. The quartiles are computed in a similar way to the median.

The median is halfway into the ordered data set and is sometimes also called the
second quartile.

THE FIRST QUARTILE is one quarter of the way into the ordered data set, whereas the
THIRD QUARTILE is three quarters of the way into the ordered data set.

If you begin numbering your ordered data set with the number 1, the formulae for the
location of each quartile are as follows:

Location of $Q_1 = \frac{1}{4}(n - 1) + 1$
Location of $Q_2 = \frac{1}{2}(n - 1) + 1$
Location of $Q_3 = \frac{3}{4}(n - 1) + 1$
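As a rough sketch, the location formulae can be turned into two small Python helpers. The names `quartile_location` and `value_at` and the sample data are hypothetical. The rule of averaging the two neighbouring values when a location falls between two positions is the convention followed in the worked examples of these notes; other textbooks and software may interpolate differently.

```python
def quartile_location(n, k):
    """1-based location of the k-th quartile (k = 1, 2, 3) among n ordered values."""
    return (k / 4) * (n - 1) + 1


def value_at(ordered, location):
    """Value at a 1-based location in an ordered list.

    Assumption: if the location falls between two positions,
    take the average of the two neighbouring values.
    """
    lower = int(location)              # position at or just below the location
    if location == lower:
        return ordered[lower - 1]
    return (ordered[lower - 1] + ordered[lower]) / 2


ordered = list(range(10, 101, 10))     # hypothetical data: 10, 20, ..., 100
for k in (1, 2, 3):
    location = quartile_location(len(ordered), k)
    print(k, location, value_at(ordered, location))
# 1 3.25 35.0
# 2 5.5 55.0
# 3 7.75 75.0
```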
The variance of the data is the average squared distance between the mean and each
data value.

THE VARIANCE OF THE DATA IS


$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$$

in a population of $n$ elements, $\{x_1; x_2; \ldots; x_n\}$, with a mean of $\bar{x}$.


THE STANDARD DEVIATION measures how spread out the values in a data set are
around the mean. More precisely, it is a measure of the average distance between the
values of the data in the set and the mean.

THE STANDARD DEVIATION OF THE DATA IS

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$$

in a population of $n$ elements, $\{x_1; x_2; \ldots; x_n\}$, with a mean of $\bar{x}$.
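The population formulae for the variance and standard deviation can be illustrated with a short Python sketch. The function names and the data set are invented for the example; note that both functions divide by n, as in the definitions above, not by n − 1.

```python
import math


def population_variance(values):
    """Average squared distance between each value and the mean (divide by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def population_std_dev(values):
    """Square root of the population variance."""
    return math.sqrt(population_variance(values))


data = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical data with mean 5
print(population_variance(data))       # 4.0
print(population_std_dev(data))        # 2.0
```

The standard library functions `statistics.pvariance` and `statistics.pstdev` compute the same population quantities.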


THE FIVE NUMBER SUMMARY combines a measure of central tendency, the median,
with measures of dispersion, namely the range and the inter-quartile range.

More precisely, the five number summary is written in the following order:

• minimum;
• first quartile;
• median;
• third quartile;
• maximum.

THE FIVE NUMBER SUMMARY is often presented visually using a box and whisker
diagram, illustrated below.

WORKED EXAMPLE :
FIVE NUMBER SUMMARY

Draw a box and whisker diagram for the following data set:

1,25 ; 1,5 ; 2,5 ; 2,5 ; 3,1 ; 3,2 ; 4,1 ; 4,25 ; 4,75 ; 4,8 ; 4,95 ; 5,1
STEP 1:

DETERMINE THE MINIMUM AND MAXIMUM

Since the data set is already ordered, we can read off the minimum as the first value
(1,25) and the maximum as the last value ( 5,1).
STEP 2:

DETERMINE THE QUARTILES

There are 12 values in the data set. We can use the figure below or the formulae to
determine where the quartiles are located.

Using the figure above we can see that the median is between the sixth and seventh
values. We can confirm this using the formula:

Location of $Q_2 = \frac{1}{2}(n - 1) + 1 = \frac{1}{2}(11) + 1 = 6,5$

Therefore, the value of the median is $\frac{3,2 + 4,1}{2} = 3,65$.
The first quartile lies between the third and fourth values. We can confirm this using the
formula:

Location of $Q_1 = \frac{1}{4}(n - 1) + 1 = \frac{1}{4}(11) + 1 = 3,75$

Therefore, the value of the first quartile is

$$Q_1 = \frac{2,5 + 2,5}{2} = 2,5$$
The third quartile lies between the ninth and tenth values. We can confirm this using the
formula:

Location of $Q_3 = \frac{3}{4}(n - 1) + 1 = \frac{3}{4}(11) + 1 = 9,25$

Therefore, the value of the third quartile is

$$Q_3 = \frac{4,75 + 4,8}{2} = 4,775$$
STEP 3.

DRAW THE BOX AND WHISKER DIAGRAM

We now have the five number summary as (1,25; 2,5; 3,65; 4,775; 5,1). The box and
whisker diagram representing the five number summary is given below.
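The five number summary for this worked example can be double-checked with a short Python sketch. The helper names are hypothetical, and the quartile rule (average the two neighbouring values when the location is not a whole number) is an assumption that mirrors the steps above rather than a universal definition.

```python
def quartile(ordered, k):
    """k-th quartile (k = 1, 2, 3) of an ordered list, using the location formula."""
    location = (k / 4) * (len(ordered) - 1) + 1    # 1-based location
    lower = int(location)
    if location == lower:
        return ordered[lower - 1]
    # Assumption: between two positions, average the two neighbouring values.
    return (ordered[lower - 1] + ordered[lower]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    return (ordered[0], quartile(ordered, 1), quartile(ordered, 2),
            quartile(ordered, 3), ordered[-1])


data = [1.25, 1.5, 2.5, 2.5, 3.1, 3.2, 4.1, 4.25, 4.75, 4.8, 4.95, 5.1]
summary = five_number_summary(data)
print([round(v, 3) for v in summary])    # [1.25, 2.5, 3.65, 4.775, 5.1]
```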

WORKED EXAMPLE 2: VARIANCE AND STANDARD DEVIATION

You flip a coin 100 times and it lands on heads 44 times. You then use the same
coin and do another 100 flips. This time it lands on heads 49 times. You repeat
this experiment a total of 10 times and get the following results for the number of
heads.
{44;49;52;62;53;48;54;49;46;51}
For the data set above:

• Calculate the mean.
• Calculate the variance and standard deviation using a table.
• Confirm your answer for the variance and standard deviation using a calculator.

Step 1.

CALCULATE THE MEAN

The formula for the mean is

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

In this case, we sum the data and divide by 10 to get 𝑥̅ = 50,8


Step 2

Calculate the variance using a table

The formula for the variance is $\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$

We first subtract the mean from each data point and then square the result.

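$x_i$ | 44 | 49 | 52 | 62 | 53 | 48 | 54 | 49 | 46 | 51 | Total
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
$x_i - \bar{x}$ | -6,8 | -1,8 | 1,2 | 11,2 | 2,2 | -2,8 | 3,2 | -1,8 | -4,8 | 0,2 |
$(x_i - \bar{x})^2$ | 46,24 | 3,24 | 1,44 | 125,44 | 4,84 | 7,84 | 10,24 | 3,24 | 23,04 | 0,04 | 225,6

The variance is the sum of the last row in this table divided by 10, so $\sigma^2 = \frac{225,6}{10} = 22,56$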
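The table calculation can be reproduced step by step in Python, which also gives the standard deviation as the square root of the variance. This is a sketch for checking the arithmetic; the variable names are chosen for the example.

```python
import math

heads = [44, 49, 52, 62, 53, 48, 54, 49, 46, 51]
n = len(heads)
mean = sum(heads) / n                      # 508 / 10 = 50.8

total = 0.0
print("x    x - mean   (x - mean)^2")
for x in heads:
    deviation = x - mean                   # second row of the table
    squared = deviation ** 2               # third row of the table
    total += squared
    print(f"{x:<4} {deviation:>8.1f}   {squared:>10.2f}")

variance = total / n                       # population variance: divide by n
std_dev = math.sqrt(variance)

print(f"sum of squares = {total:.1f}")          # 225.6
print(f"variance = {variance:.2f}")             # 22.56
print(f"standard deviation = {std_dev:.2f}")    # 4.75
```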

CALCULATE THE VARIANCE USING A CALCULATOR

Using the SHARP EL-531VH calculator:


Using your calculator, change the mode from normal to “Stat x ”. Do this by pressing
[2ndF] and then 1. This mode enables you to type in univariate data.
Key in the data, row by row:

Symmetric and skewed data

Last year you learnt about three shapes of data distribution: symmetric, left skewed and
right skewed.

A symmetric distribution is one where the left and right hand sides of the distribution
are roughly equally balanced around the mean. The histogram below shows a typical
symmetric distribution.

For symmetric distributions, the mean is approximately equal to the median and the left
and right tails are equally balanced, meaning that they have about the same length.

If large numbers of data are collected from a population, the graph will often have a bell
shape.

If the data was, say, examination results, a few learners usually get very high marks, a
few very low marks and most get a mark in the middle range. This is a common type of
symmetric data known as a normal distribution.

We say a distribution is normal if

• the mean, median and mode are equal.

• it is symmetric around the mean.

• approximately 68% of the data lies within one standard deviation of the mean, 95% within
two standard deviations and 99,7% within three standard deviations of the mean.
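The percentages quoted for a normal distribution are rounded values. As a quick check, they can be computed with `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names below are chosen for the example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of a normal distribution within k standard deviations of the mean.
shares = {}
for k in (1, 2, 3):
    shares[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} standard deviation(s): {shares[k]:.1%}")
# within 1 standard deviation(s): 68.3%
# within 2 standard deviation(s): 95.4%
# within 3 standard deviation(s): 99.7%
```

The result does not depend on the particular mean and standard deviation, so the standard normal distribution is used for convenience.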

What happens if the test was very easy or very difficult? Then the distribution may not
be symmetrical. If extremely high or extremely low scores are added to a distribution,
then the mean and median tend to shift towards these scores and the curve becomes
skewed.

If the test was very difficult, the mean and median scores are shifted to the left. In this
case, we say the distribution is positively skewed, or skewed right.

A distribution that is skewed right has the following characteristics:

1. the mean is typically more than the median;

2. the tail of the distribution is longer on the right hand side than on the left hand
side; and
3. the median is closer to the first quartile than to the third quartile.

If the test was very easy, then many learners would get high scores, and the mean and
median of the distribution would be shifted to the right. We say the distribution
is negatively skewed, or skewed left.

A distribution that is skewed left has the following characteristics:

1. the mean is typically less than the median;

2. the tail of the distribution is longer on the left hand side than on the right hand
side; and

3. the median is closer to the third quartile than to the first quartile.

The table below summarises the different categories visually.

WORKED EXAMPLE 3: SKEWED AND SYMMETRIC DATA

Three Matric classes wrote a Mathematics test. The test is out of 40 marks and each
class has 21 learners. The results of the test are shown in the table below:

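Gr. 12A | Gr. 12B | Gr. 12C
--- | --- | ---
4 | 4 | 4
8 | 12 | 6
8 | 16 | 6
12 | 16 | 10
12 | 16 | 14
12 | 16 | 14
12 | 16 | 16
16 | 20 | 16
16 | 20 | 18
16 | 28 | 18
16 | 32 | 21
32 | 32 | 24
32 | 32 | 24
32 | 32 | 24
36 | 36 | 28
36 | 36 | 30
36 | 36 | 32
40 | 36 | 36
40 | 40 | 36
40 | 40 | 36
40 | 40 | 40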

1. For each class, determine the five number summary and draw a box and whisker
diagram on the same set of axes using an appropriate scale.
2. Determine the mean and standard deviation for each class.
3. Comparing the mean and median values for each class, comment on the
distribution of the test marks for each class.
1.
First, we order the data from smallest to largest. This has already been done for us.
Then, we divide our data into quartiles:

The minimum of each data set is 4.


The maximum of each data set is 40.

Since there are 21 values in the data set, the median lies on the eleventh mark, making
it equal to 16 for Gr. 12A, 32 for Gr. 12B and 21 for Gr. 12C.

The first quartile lies between the fifth and sixth values, making it equal to 12 for Gr.
12A, 16 for Gr. 12B and 14 for Gr. 12C.

The third quartile lies between the 16th and 17th values, making it equal to 36 for Gr.
12A and Gr. 12B, and $\frac{30 + 32}{2} = 31$ for Gr. 12C.
Therefore, we are able to formulate the following five number summaries and
subsequent box and whisker plots:
• Gr. 12A = [4; 12; 16; 36; 40]
• Gr. 12B = [4; 16; 32; 36; 40]
• Gr. 12C = [4; 14; 21; 31; 40]

DETERMINE THE MEAN AND STANDARD DEVIATION FOR EACH CLASS.

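2. The formulae for the mean and the standard deviation are

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}; \qquad \sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$$

Gr. 12A:

mean: $\bar{x} = \frac{496}{21} \approx 23,6$

standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}} \approx 12,70$

Gr. 12B:

mean: $\bar{x} = \frac{556}{21} \approx 26,5$

standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}} \approx 10,65$

Gr. 12C:

mean: $\bar{x} = \frac{453}{21} \approx 21,6$

standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}} \approx 10,54$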
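The means and standard deviations for the three classes can be cross-checked with Python's standard `statistics` module. The sketch below uses `statistics.pstdev`, the population standard deviation (dividing by n), which matches the formula used in these notes; the dictionary layout is just one convenient way to hold the marks.

```python
import statistics

classes = {
    "Gr. 12A": [4, 8, 8, 12, 12, 12, 12, 16, 16, 16, 16,
                32, 32, 32, 36, 36, 36, 40, 40, 40, 40],
    "Gr. 12B": [4, 12, 16, 16, 16, 16, 16, 20, 20, 28, 32,
                32, 32, 32, 36, 36, 36, 36, 40, 40, 40],
    "Gr. 12C": [4, 6, 6, 10, 14, 14, 16, 16, 18, 18, 21,
                24, 24, 24, 28, 30, 32, 36, 36, 36, 40],
}

results = {}
for name, marks in classes.items():
    mean = statistics.mean(marks)
    std_dev = statistics.pstdev(marks)     # population standard deviation
    results[name] = (mean, std_dev)
    print(f"{name}: mean = {mean:.1f}, standard deviation = {std_dev:.2f}")
# Gr. 12A: mean = 23.6, standard deviation = 12.70
# Gr. 12B: mean = 26.5, standard deviation = 10.65
# Gr. 12C: mean = 21.6, standard deviation = 10.54
```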

3.
If the mean is greater than the median, the data is typically positively skewed and if
the mean is less than the median, the data is typically negatively skewed.

Gr. 12A:
mean−median=23,6−16=7,6
The marks for 12A are therefore positively skewed, meaning that there were many
low marks in the class with the high marks being more spread out.

Gr. 12B:
mean−median=26,5−32=−5,5
The marks for 12B are therefore negatively skewed, meaning that there were many
high marks in the class with the low marks being more spread out.

Gr. 12C:
mean−median=21,6−21=0,6
The marks for 12C are therefore approximately symmetric, meaning that there are about as
many low marks in the class as there are high marks.
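The comparison of mean and median used for the three classes can be written as a small Python helper. This is only a sketch: the function name is hypothetical, and the `tolerance` of 1 mark for calling a difference "small" is an assumption made for illustration, since the notes do not state a cut-off.

```python
def describe_skew(mean, median, tolerance=1.0):
    """Classify skewness from the sign of (mean - median).

    `tolerance` is a hypothetical cut-off: differences smaller than
    this are treated as too small to indicate skewness.
    """
    difference = mean - median
    if difference > tolerance:
        return "skewed right (positively skewed)"
    if difference < -tolerance:
        return "skewed left (negatively skewed)"
    return "approximately symmetric"


# Means and medians taken from the worked example above.
print("Gr. 12A:", describe_skew(23.6, 16))    # skewed right (positively skewed)
print("Gr. 12B:", describe_skew(26.5, 32))    # skewed left (negatively skewed)
print("Gr. 12C:", describe_skew(21.6, 21))    # approximately symmetric
```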

EXERCISE 9.1
State whether each of the following data sets is symmetric, skewed right or skewed left.
a. A data set with this distribution:

skewed right
b. A data set with this box and whisker plot:

symmetric
c. A data set with this histogram:

skewed left
d. A data set with this frequency polygon:

skewed right

e. A data set with this distribution:

skewed left

f. The following data set:


105 ; 44 ; 94 ; 149 ; 83 ; 178 ; −4 ; 112 ; 50 ; 188



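Position | Value | Quartile | Location | Quartile value
--- | --- | --- | --- | ---
1 | -4 | | |
2 | 44 | | |
3 | 50 | | |
 | | $Q_1$ | 3,25 | 66,5
4 | 83 | | |
5 | 94 | | |
 | | $Q_2$ | 5,5 | 99,5
6 | 105 | | |
7 | 112 | | |
 | | $Q_3$ | 7,75 | 130,5
8 | 149 | | |
9 | 178 | | |
10 | 188 | | |

Sum: 999
Mean: $\frac{999}{10} = 99,9$
First quartile: 66,5
Median: 99,5
Third quartile: 130,5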

Note that we get contradicting indications from the different ways of determining whether the
data is skewed right or left.
• The mean is slightly greater than the median. This would indicate that the data set is
skewed right.
• The median is slightly closer to the third quartile than the first quartile. This would
indicate that the data set is skewed left.
Since these differences are so small and since they contradict each other, we conclude that the
data set is symmetric.
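The two indicators compared in this exercise can be laid side by side in a short Python sketch. The function name is hypothetical; the inputs are the mean, quartiles and median found for data set f.

```python
def skew_hints(mean, q1, median, q3):
    """Return the skewness suggested by two separate indicators.

    Indicator 1: mean versus median.
    Indicator 2: which quartile the median lies closer to.
    """
    if mean > median:
        mean_hint = "skewed right"
    elif mean < median:
        mean_hint = "skewed left"
    else:
        mean_hint = "no skew indicated"

    lower_gap = median - q1        # distance from the first quartile
    upper_gap = q3 - median        # distance to the third quartile
    if lower_gap < upper_gap:
        quartile_hint = "skewed right"
    elif lower_gap > upper_gap:
        quartile_hint = "skewed left"
    else:
        quartile_hint = "no skew indicated"

    return mean_hint, quartile_hint


mean_hint, quartile_hint = skew_hints(mean=99.9, q1=66.5, median=99.5, q3=130.5)
print(mean_hint)        # skewed right (99.9 > 99.5)
print(quartile_hint)    # skewed left  (gap of 33 below the median, 31 above)
```

The sketch only reports the direction of each indicator; judging whether the differences are large enough to matter is still left to the reader, as in the conclusion above.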

Example 2

For the following data sets:

• Determine the mean and five number summary.

• Draw the box and whisker plot.

• Determine the skewness of the data.

a. 40 ; 45 ; 12 ; 6 ; 9 ; 16 ; 11 ; 7 ; 35 ; 7 ; 31 ; 3

Ordered data: 3 ; 6 ; 7 ; 7 ; 9 ; 11 ; 12 ; 16 ; 31 ; 35 ; 40 ; 45

Sum = 222, so the mean is $\frac{222}{12} = 18,5$.

STEP 1:
DETERMINE THE MINIMUM AND MAXIMUM

With the data set ordered, we can read off the minimum as the first value (3) and the
maximum as the last value (45).
Minimum: 3
Maximum: 45

STEP 2:
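DETERMINE THE QUARTILES

Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
Value | 3 | 6 | 7 | 7 | 9 | 11 | 12 | 16 | 31 | 35 | 40 | 45

There are 12 values in the data set.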
Using the figure above we can see that the median is between the sixth and seventh
values. We can confirm this using the formula:

Location of $Q_2 = \frac{1}{2}(12 - 1) + 1 = 6,5$
Therefore, the value of the median is $\frac{11 + 12}{2} = 11,5$.
The first quartile lies between the third and fourth values. We can
confirm this using the formula:

Location of $Q_1 = \frac{1}{4}(12 - 1) + 1 = 3,75$
Therefore, the value of the first quartile is $Q_1 = \frac{7 + 7}{2} = 7$.
The third quartile lies between the ninth and tenth values. We can
confirm this using the formula:

Location of $Q_3 = \frac{3}{4}(12 - 1) + 1 = 9,25$
Therefore, the value of the third quartile is $Q_3 = \frac{31 + 35}{2} = 33$.

STEP 3.
DRAW THE BOX AND WHISKER DIAGRAM
We now have the five number summary as (3; 7; 11,5; 33; 45).
The box and whisker diagram representing the five number
summary is given below.
