
FHMM1034 Topic 1 B Descriptive Statistics Student


Centre For Foundation Studies

Department of Science and Engineering

FHMM1034 Mathematics III

Topic 1 (Part 2)
Descriptive Statistics
Contents (Part 1)

1.1 Population Vs Sample


1.2 Types of Variables
1.3 Organizing and Graphing Quantitative Data
1.4 Shapes of Histograms
1.5 Cumulative Frequency Distributions
1.6 Stem-and-Leaf Displays

Jan 2021, FHMM1034 Mathematics III Page 2


Contents (Part 2)

1.7 Measures of Central Tendency for Ungrouped Data
1.8 Measures of Central Tendency for Grouped Data
1.9 Measures of Dispersion for Ungrouped Data
1.10 Measures of Dispersion for Grouped Data
1.11 Symmetry and Skewness in Data Distribution
1.12 Measures of Position
1.13 Box-and-Whisker Plot and Outlier
Topics & Learning Outcomes

• Find mean, median & mode of ungrouped & grouped data
• Find range, variance & standard deviation of ungrouped and grouped data
• Relate mean, median & mode to skewness of data distribution
• Find quartiles of ungrouped and grouped data
• Draw Box-and-Whisker Plot
1.7 Measures of Central Tendency for Ungrouped Data
Introduction

• A measure of central tendency gives the centre of a histogram or a frequency distribution curve.
• There are 3 measures of central tendency for ungrouped and grouped data:
(1) Mean
(2) Median
(3) Mode


Parameters & Statistics

Measure              Notation for population parameters   Notation for sample statistics
Mean                 $\mu$                                $\bar{x}$
Variance             $\sigma^2$                           $s^2$
Standard deviation   $\sigma$                             $s$
Proportion           $p$                                  $\hat{p}$


Mean for Ungrouped Data

Mean for population data $x_1, x_2, \ldots, x_N$:

$$\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

Mean for sample data $x_1, x_2, \ldots, x_n$:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$


Example 1
The following are the ages (in years) of all eight
employees of a small company:
53 32 61 27 39 44 49 57
Find the mean age of these employees.
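
As a quick check of the arithmetic, here is a minimal Python sketch of the mean formula applied to the eight ages above. The helper name `mean` is an illustrative choice and is not part of these notes. Because the data cover all eight employees, the result is a population mean.

```python
# Ages (in years) of all eight employees from Example 1.
ages = [53, 32, 61, 27, 39, 44, 49, 57]


def mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    return sum(values) / len(values)


mean_age = mean(ages)
print(mean_age)  # 45.25
```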



Example 2
The table shows the lifetime philanthropic giving of six wealthy Americans up to 2007, as reported in Businessweek. Find the mean of these data.

Donors                  Total philanthropic giving till 2007 (millions of dollars)
Warren Buffett          40,780
Bill & Melinda Gates    28,144
George Kaiser           2,522
George Soros            6,401
Gordon & Betty Moore    7,404
Walton family           2,015
Total                   87,266
Mean for Ungrouped Data with Frequency Distribution

If the value $x_1$ occurs with frequency $f_1$, $x_2$ occurs with frequency $f_2$, …, $x_n$ occurs with frequency $f_n$, then

$$\bar{x} \text{ or } \mu = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{\sum fx}{\sum f}$$


Example 3

Calculate the mean for the sample data in the frequency table below and interpret the result.

Sales (units)    4    5    7    10   11   15   17
Number of days   3    12   23   10   14   8    2
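
The frequency-weighted mean can be sketched in a few lines of Python. This is only an illustration of the formula $\sum fx / \sum f$ applied to the table above; the variable names are assumptions made for the example.

```python
# Example 3 data: units sold (x) and the number of days (f) each value occurred.
sales = [4, 5, 7, 10, 11, 15, 17]
days = [3, 12, 23, 10, 14, 8, 2]

total_days = sum(days)                               # sum of f
total_fx = sum(f * x for x, f in zip(sales, days))   # sum of f*x
mean_sales = total_fx / total_days

print(total_days, total_fx, round(mean_sales, 2))
```

One way to read the result: on average, roughly 8.9 units were sold per day over the 72 sampled days.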



Example 3 Solution

Sales (units), x   Number of days, f   fx
4                  3
5                  12
7                  23
10                 10
11                 14
15                 8
17                 2
                   $\sum f =$
Median

• The median is the value of the middle term in a set of data that has been ranked in ascending or descending order.
• The median is not influenced by outliers (extreme values).
• Being the middle value implies that 50% of the observations will be more than the median and the other 50% will be less than the median.
Median for Ungrouped Data

Median = the value of the $\left(\frac{n+1}{2}\right)$th term in a ranked data set, where n is the total number of data values.

Note:
1. If n is odd, the median is the value of the middle term in the ranked data.
2. If n is even, the median is the average of the values of the two middle terms.
3. The median is not influenced by extreme values or outliers.
Example 4

Find the median of each of these sets of data.


(a) 1, 9, 6, 7, 12, 8, 3, 10, 11

(b) 2, 5, 1, 6, 7, 11, 13, 8
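
The two cases (odd and even n) can be checked with a short Python sketch. The function name `median` is chosen for illustration; it ranks the data first and then applies the rule for the $\left(\frac{n+1}{2}\right)$th term.

```python
def median(values):
    """Median of ungrouped data: the middle term, or the mean of the two middle terms."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]                      # odd n: single middle term
    return (ranked[mid - 1] + ranked[mid]) / 2  # even n: average the two middle terms


median_a = median([1, 9, 6, 7, 12, 8, 3, 10, 11])  # n = 9, the 5th ranked term
median_b = median([2, 5, 1, 6, 7, 11, 13, 8])      # n = 8, mean of the 4th and 5th terms
print(median_a, median_b)  # 8 6.5
```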



Median for Ungrouped Data with Frequency Distribution

When calculating the median for an ungrouped frequency distribution, take the position of the median as the $\left(\frac{n+1}{2}\right)$th term, where $n = \sum f$.

The median lies at the 50th percentile of the data, implying that 50% of the data are less than the median and 50% are more than the median.
In other words, 50% of the data are at most the median value, and 50% are at least the median value.


Example 5

Find the median of the following frequency distribution and interpret the result.

No. of children   0   1   2    3   4   5
No. of families   3   5   12   9   4   2


Example 5 Solution

Number of children, x   Number of families, f   Cumulative frequency, F
0                       3
1                       5
2                       12
3                       9
4                       4
5                       2
                        $\sum f =$


Mode for Ungrouped Data

The mode is the value that occurs with the highest frequency in a data set.

Note:
1. The mode is not influenced by extreme values or outliers.
2. The mode may not exist; a data set may have one mode (unimodal), two modes (bimodal) or more than two modes (multimodal).
3. The mode can be used for both quantitative and qualitative data.
Example 6
Find the mode of each of the following data sets.
(i) Speed (miles/hr):
77 82 74 81 79 84 74 78
(ii) Income (RM):
46,150 95,750 64,985 87,490 53,740
(iii) Prices (RM):
895 886 903 895 870 905 870 899
(iv) Age (years):
21 19 27 22 29 19 25 21 22 30
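
A small Python sketch can count frequencies and report every mode. It assumes the convention stated in the notes above that a data set in which no value repeats has no mode; the function name `modes` is illustrative.

```python
from collections import Counter


def modes(values):
    """Return all values with the highest frequency, or [] when no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once, so the mode does not exist
    return sorted(v for v, c in counts.items() if c == highest)


speed = [77, 82, 74, 81, 79, 84, 74, 78]
income = [46150, 95750, 64985, 87490, 53740]
prices = [895, 886, 903, 895, 870, 905, 870, 899]
age = [21, 19, 27, 22, 29, 19, 25, 21, 22, 30]

print(modes(speed))   # [74]            unimodal
print(modes(income))  # []              no mode
print(modes(prices))  # [870, 895]      bimodal
print(modes(age))     # [19, 21, 22]    multimodal
```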
Example 7

Find the mode of the following frequency distribution and interpret the result.

No. of children   0   1   2    3   4   5
No. of families   3   5   12   9   4   2

Ans: 2 children
1.8 Measures of Central Tendency for Grouped Data
Mean for Grouped Frequency Distribution

Mean for population data: $\mu = \frac{\sum fm}{N}$, where $N = \sum f$

Mean for sample data: $\bar{x} = \frac{\sum fm}{n}$, where $n = \sum f$

m is the midpoint of a class.
f is the frequency of a class.


Example 8
The table gives the frequency distribution of the daily commuting time (in minutes) from home to work of all 25 employees of a company.

Daily Commuting Time (minutes)   Number of Employees
0 to less than 10                4
10 to less than 20               9
20 to less than 30               6
30 to less than 40               4
40 to less than 50               2

Calculate the mean of daily commuting times.
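
The grouped-data mean replaces each observation by its class midpoint. The sketch below, with illustrative variable names, applies $\sum fm / \sum f$ to the commuting-time table; the result is an estimate because the individual times are not known.

```python
# Example 8 classes as (lower boundary, upper boundary, frequency).
commuting = [(0, 10, 4), (10, 20, 9), (20, 30, 6), (30, 40, 4), (40, 50, 2)]

total_f = sum(f for _, _, f in commuting)                          # N = sum of f
total_fm = sum(f * (lower + upper) / 2 for lower, upper, f in commuting)  # sum of f*m
grouped_mean = total_fm / total_f

print(total_f, total_fm, grouped_mean)
```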
Example 8 Solution

Daily Commuting Time (minutes)   Frequency, f    Class mid-point, m   fm
0 – 10                           4
10 – 20                          9
20 – 30                          6
30 – 40                          4
40 – 50                          2
                                 $\sum f = 25$                        $\sum fm =$


Example 9
The table gives the frequency distribution of the number of orders received each day during a sample period of 50 days at the office of a mail-order company.

Number of Orders   Number of Days
10 − 12            4
13 − 15            12
16 − 18            20
19 − 21            14

Calculate the mean.
Example 9 Solution

Number of Orders   Number of Days, f   Class mid-point, m   fm
10 − 12            4
13 − 15            12
16 − 18            20
19 − 21            14
                   $\sum f = 50$                            $\sum fm =$


Median for Grouped Frequency Distribution

When calculating the median for a grouped frequency distribution, take the position of the median as the $\left(\frac{n}{2}\right)$th term, where $n = \sum f$.

The median for grouped data can be estimated using
(1) cumulative distribution curves (ogives)
(2) histogram
(3) formula (linear interpolation)
Median for Grouped Frequency Distribution
Method 1: From cumulative frequency curve

[Figure: three ogives plotted against the data values, with vertical axes showing cumulative frequency, relative cumulative frequency and cumulative percentage (%). The median is read from the horizontal axis at $\frac{\sum f}{2}$, 0.5 and 50 respectively.]


Median for
Grouped Frequency Distribution
Method 2: From Histogram
(1) Find the median class.
(2) Determine the total frequency before the
median class.
(3) Use the method of proportion to calculate
the median.



Median for Grouped Frequency Distribution
Method 3: Using formula (linear interpolation)

[Figure: the median class drawn from its lower boundary $L_M$ to its upper boundary, with the median M between them. The size of the median class is $c$. The cumulative frequency before the median class is $F_B = \sum f_{M-1}$, the cumulative frequency up to the median class is $F_A$, and the median corresponds to the cumulative frequency $\frac{1}{2}\sum f$.]

$$M = L_M + \left(\frac{\frac{1}{2}\sum f - \sum f_{M-1}}{f_M}\right) c$$
Median for Grouped Frequency Distribution
Method 3: Using formula (linear interpolation)

$$M = L_M + \left(\frac{\frac{1}{2}\sum f - \sum f_{M-1}}{f_M}\right) c$$

$L_M$ = lower boundary of median class
$c$ = width of median class
$f_M$ = frequency of median class
$\sum f_{M-1}$ = cumulative frequency before median class
$\sum f$ = total frequency

Note: This formula can be used for grouped data of both equal and unequal widths.
Example 10

Estimate the median of the following frequency distribution by:

Weight (nearest kg)   60 – 62   63 – 65   66 – 68   69 – 71   72 – 74
Frequency             3         4         5         6         2
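
The linear interpolation formula (Method 3) can be sketched in Python as follows. The function name `grouped_median` and the tuple layout are assumptions made for this illustration; class boundaries are obtained by extending each stated class limit by 0.5 kg, since the weights are recorded to the nearest kg.

```python
def grouped_median(classes):
    """Estimate the median by linear interpolation.

    `classes` is a list of (lower boundary, upper boundary, frequency) tuples
    in ascending order.
    """
    half = sum(f for _, _, f in classes) / 2   # (1/2) * sum of f
    before = 0                                 # cumulative frequency before the class
    for lower, upper, f in classes:
        if f > 0 and before + f >= half:       # this is the median class
            return lower + (half - before) / f * (upper - lower)
        before += f
    raise ValueError("the frequency table is empty")


# Example 10: class limits to the nearest kg, converted to class boundaries.
limits = [(60, 62, 3), (63, 65, 4), (66, 68, 5), (69, 71, 6), (72, 74, 2)]
weights = [(lo - 0.5, hi + 0.5, f) for lo, hi, f in limits]

median_weight = grouped_median(weights)
print(round(median_weight, 2))  # 67.3
```

Here the median class is 65.5 to 68.5 (frequency 5, cumulative frequency before it 7), so the estimate is 65.5 + (10 − 7)/5 × 3.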



Example 10 Solution
Method 1: Cumulative Frequency Curve (ogives)

Weight (nearest kg)   Frequency   Weight   Cumulative frequency
60 – 62               3
63 – 65               4
66 – 68               5
69 – 71               6
72 – 74               2


Example 10 Solution



Example 10 Solution
Method 1: Ogive (Cumulative Relative Frequency Curve)

Weight (nearest kg)   Frequency   Weight   Cumulative frequency   Cumulative relative frequency
                                  < 59.5   0                      0/20 = 0.00
60 – 62               3           < 62.5   3                      3/20 = 0.15
63 – 65               4           < 65.5   7                      7/20 = 0.35
66 – 68               5           < 68.5   12                     12/20 = 0.60
69 – 71               6           < 71.5   18                     18/20 = 0.90
72 – 74               2           < 74.5   20                     20/20 = 1.00
Example 10 Solution

[Figure: "Less than" Cumulative Relative Frequency Curve. Horizontal axis: Weight (kg); vertical axis: Cumulative Relative Frequency. Points are plotted at the upper class boundaries 59.5, 62.5, 65.5, 68.5, 71.5 and 74.5, and the value 67.5 is marked on the weight axis.]


Example 10 Solution
Method 1: Ogive (Cumulative Percentage Curve)

Weight (nearest kg)   Frequency   Weight   Cumulative frequency   Cumulative percentage
                                  < 59.5   0                      0
60 – 62               3           < 62.5   3                      15
63 – 65               4           < 65.5   7                      35
66 – 68               5           < 68.5   12                     60
69 – 71               6           < 71.5   18                     90
72 – 74               2           < 74.5   20                     100


Example 10 Solution

[Figure: "Less than" Cumulative Percentage Curve. Horizontal axis: Weight (kg); vertical axis: Cumulative Percentage. Points are plotted at the upper class boundaries 59.5, 62.5, 65.5, 68.5, 71.5 and 74.5, and the value 67.5 is marked on the weight axis.]


Example 10 Solution
Method 2: Histogram

Weight (nearest kg)   Frequency   Boundary   Class width   Cumulative Frequency
60 – 62               3
63 – 65               4
66 – 68               5
69 – 71               6
72 – 74               2


Example 10 Solution



Example 10 Solution
Method 3: Linear Interpolation Formula



Example 10 Solution



Example 11
The Mathematics marks for 100 students in a school are shown in the histogram below. Estimate the median mark for the students in the school.

[Figure: histogram of the number of students (vertical axis, 0 to 24) against marks (horizontal axis, 0 to 100 in steps of 10).]
Example 11 Solution

Boundary   Frequency   Cumulative frequency
0 – 10     2
10 – 20    2
20 – 30    2
30 – 40    10
40 – 50    24
50 – 60    22


Mode for
Grouped Frequency distribution

• Modal class - the class which has the largest standard frequency.
• An estimate of the mode can be obtained from the modal class.
• Mode for grouped data can be obtained by
✓ Graphical method (histogram)
✓ Interpolation method


Mode for
Grouped Frequency distribution
In using formula to estimate the mode of grouped data,
follow the steps below:
(1) Find the modal class, determine its frequency.
(2) Determine the frequency of the class immediately
before the modal class.
(3) Determine the frequency of the class immediately
after the modal class.
(4) Estimate the mode for the grouped data using
similar triangles.



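Mode for Grouped Frequency distribution

After identifying the modal class,

[Figure: histogram bars of the modal class (frequency $f_m$, lower boundary $L_m$, width $C$) and its two neighbours (frequencies $f_b$ and $f_a$), with the similar triangles PQR and PST meeting at P above the mode $m_o$. PU and PV are the heights of the two triangles, and their bases have lengths $f_m - f_b$ and $f_m - f_a$.]

$\triangle PQR \sim \triangle PST$

$$\frac{PU}{PV} = \frac{f_m - f_b}{f_m - f_a}$$

$$\frac{m_o - L_m}{C - (m_o - L_m)} = \frac{f_m - f_b}{f_m - f_a}$$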
Mode for Grouped Frequency distribution

$$\triangle PQR \sim \triangle PST \;\Rightarrow\; \frac{PU}{PV} = \frac{f_m - f_b}{f_m - f_a} \;\Rightarrow\; \frac{m_o - L_m}{C - (m_o - L_m)} = \frac{f_m - f_b}{f_m - f_a}$$

$L_m$ = lower class boundary of the modal class
$f_m$ = frequency of the modal class
$f_b$ = frequency of the class immediately before the modal class
$f_a$ = frequency of the class immediately after the modal class
$C$ = the class width of the modal class

$$\text{Mode, } m_o = L_m + \left(\frac{f_m - f_b}{(f_m - f_b) + (f_m - f_a)}\right) C$$

Note: This formula can only be used for grouped data of equal widths.
Mode for Grouped Frequency distribution

$$\begin{aligned}
\frac{m_o - L_m}{C - (m_o - L_m)} &= \frac{f_m - f_b}{f_m - f_a} \\
(m_o - L_m)(f_m - f_a) &= (f_m - f_b)\left[C - (m_o - L_m)\right] \\
(m_o - L_m)(f_m - f_a) &= C(f_m - f_b) - (f_m - f_b)(m_o - L_m) \\
(m_o - L_m)(f_m - f_a) + (f_m - f_b)(m_o - L_m) &= C(f_m - f_b) \\
(m_o - L_m)\left[(f_m - f_b) + (f_m - f_a)\right] &= C(f_m - f_b) \\
m_o &= L_m + \left(\frac{f_m - f_b}{(f_m - f_b) + (f_m - f_a)}\right) C
\end{aligned}$$
Mode for Grouped Frequency distribution

For the mode of grouped data of unequal widths, the frequency has to be replaced by the frequency density.

$L_m$ = lower class boundary of the modal class (based on frequency density)
$\rho_m$ = frequency density of the modal class
$\rho_b$ = frequency density of the class immediately before the modal class
$\rho_a$ = frequency density of the class immediately after the modal class
$C$ = the class width of the modal class

$$\text{Mode, } m_o = L_m + \left(\frac{\rho_m - \rho_b}{(\rho_m - \rho_b) + (\rho_m - \rho_a)}\right) C$$
Mode for Grouped Frequency distribution

[Figure: three histograms, each with the position of the mode marked.]


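Mode for Grouped Frequency distribution

[Figure: the similar-triangles construction for a modal class from 20 to 30 with $f_m = 15$, $f_b = 13$ and $f_a = 12$, so that $f_m - f_b = 2$ and $f_m - f_a = 3$. The position of the mode between 20 and 30 is marked "?".]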
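Mode for Grouped Frequency distribution

[Figure: the similar-triangles construction with $f_m - f_b = 2$ and $f_m - f_a = 3$, so the mode divides the modal class ($L_m = 20$, upper boundary 30, $C = 10$) in the ratio 2 : 3.]

$$\left(\frac{f_m - f_b}{(f_m - f_b) + (f_m - f_a)}\right) C = \frac{2}{2+3} \times 10 = 4$$

$$m_o = L_m + 4 = 20 + 4 = 24$$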


Example 12

Estimate the mode of the following distribution.

Weight (nearest kg)   60 − 62   63 − 65   66 − 68   69 − 71   72 − 74
Frequency             3         4         5         6         2
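
The interpolation formula for the mode can be sketched in Python. This is an illustration only: it assumes equal class widths (as the formula requires), takes the first class with the largest frequency as the modal class, and treats a missing neighbouring class as having frequency 0. The name `grouped_mode` is hypothetical.

```python
def grouped_mode(classes):
    """Estimate the mode of equal-width grouped data.

    `classes` is a list of (lower boundary, upper boundary, frequency) tuples.
    """
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))                      # position of the modal class
    lower, upper, fm = classes[i]
    fb = freqs[i - 1] if i > 0 else 0                # class before (assumed 0 if none)
    fa = freqs[i + 1] if i < len(freqs) - 1 else 0   # class after (assumed 0 if none)
    return lower + (fm - fb) / ((fm - fb) + (fm - fa)) * (upper - lower)


# Example 12: class limits to the nearest kg, converted to class boundaries.
limits = [(60, 62, 3), (63, 65, 4), (66, 68, 5), (69, 71, 6), (72, 74, 2)]
weights = [(lo - 0.5, hi + 0.5, f) for lo, hi, f in limits]

mode_weight = grouped_mode(weights)
print(round(mode_weight, 2))  # 69.1
```

With the frequencies 13, 15 and 12 used in the earlier worked diagram (modal class 20 to 30), the same function gives 20 + 2/(2 + 3) × 10 = 24.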



Example 12 Solution Method 1

Weight (nearest kg)   Class boundaries   Class width   Frequency, f
60 – 62                                                3
63 – 65                                                4
66 – 68                                                5
69 – 71                                                6
72 – 74                                                2


Example 12 Solution - Histogram



Example 12 Solution - Calculation



Example 13

The following table shows the distribution of the number of weekly accidents in a certain town, recorded for 52 consecutive weeks.

Number of accidents   0 – 4   5 – 9   10 – 14   15 – 19   20 – 24   25 – 29   30 – 34
Number of weeks       4       6       11        15        8         5         3

What is the modal number of weekly accidents?


Example 13 Solution



Example 14
The data below show the distribution of the mass of parcels, to the nearest gram, in a post office on a particular day.

Mass (g)    20-24   25-29   30-39   40-54   55-59
Frequency   40      48      90      60      10

(a) Construct a histogram to show the distribution of mass of parcels and estimate the mode from the histogram.
(b) Calculate an estimate of the median and mode of the distribution.


Example 14 Solution

Mass (g)   Mass (g), Class boundaries   Class width, c   Frequency, f   Frequency density
20 – 24                                                  40
25 – 29                                                  48
30 – 39                                                  90
40 – 54                                                  60
55 – 59                                                  10


Example 14 Solution



Example 14 Solution

Mass (g)   Mass (g), Class boundaries   Frequency, f   Cumulative Frequency
20 – 24                                 40
25 – 29                                 48
30 – 39                                 90
40 – 54                                 60
55 – 59                                 10


Example 14 Solution

Mass (g)   Mass (g), Class boundaries   Class width, c   Frequency, f   Frequency density
20 – 24                                                  40
25 – 29                                                  48
30 – 39                                                  90
40 – 54                                                  60
55 – 59                                                  10


Example 14 Solution



1.9 Measures of Dispersion for Ungrouped Data
Measure of Dispersion
• The measures of central tendency, such as the mean,
median, and mode, do not reveal the whole picture of
the distribution of a data set.
• Two data sets with the same mean may have
completely different spread.
• Thus the mean, median, or mode by itself is not a
sufficient measure to reveal the shape of the
distribution of a data set.
• We need measures that can provide some information
about the variation among data values and they are
called measures of dispersion.
Measure of Dispersion
Consider the following two data sets on the ages of all
workers in each of two small companies:
Company 1: 35 36 38 39 40 45 47
Company 2: 18 27 33 52 70
The mean age of workers in both companies is the same.
However, the variation in the workers’ ages for each of
these companies is very different.
[Figure: the ages of Company 1 (35, 36, 38, 39, 40, 45, 47) and Company 2 (18, 27, 33, 52, 70) marked on parallel number lines, with Mean = 40 indicated.]
Measure of Dispersion

The measures of dispersion to consider:
• range,
• variance,
• standard deviation


Range for Ungrouped data

Range = Largest value – Smallest value

Disadvantages of range:
• Like the mean, the range is influenced by outliers.
• It is based on two values only and all other values in a data set are ignored.


Variance and Standard Deviation

• Variance is a measure of the spread in a set of data.
• It measures how closely the values in a data set are clustered around the mean.
• The values of the variance and the standard deviation are never negative.
• The measurement units of variance are always the square of the measurement units of the original data.
Basic Formulae of Variance and Standard Deviation for Ungrouped data

              Variance (average squared deviation from the mean)   Standard deviation
Population:   $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$              $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
Sample:       $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$           $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$


Deviation from the Mean

The quantity $x - \mu$ or $x - \bar{x}$ in the above formulas is called the deviation of the x value from the mean.

The sum of the deviations of the x values from the mean is always zero; that is,

$$\sum (x - \mu) = 0 \quad \text{and} \quad \sum (x - \bar{x}) = 0$$


Deviation from the Mean
Suppose the midterm scores of 4 students are 82, 95, 67 and 92.

$$\text{mean} = \frac{82 + 95 + 67 + 92}{4} = 84$$

x    x − mean
82   82 – 84 = –2
95   95 – 84 = +11
67   67 – 84 = –17
92   92 – 84 = +8

For this reason we square the deviation to calculate variance and standard deviation.
Deviation from the Mean

[Figure: the four midterm scores plotted on a scale from 60 to 100 about the mean of 84, showing the deviations +11, +8, −2 and −17.]
Deviation from the Mean

POPULATION
$\sum (x - \mu) = 0$
$\sum (x - \mu)^2 = 478$
$\sigma^2 = \frac{\sum (x - \mu)^2}{N} = 119.5$
$\sigma = \sqrt{119.5} = 10.93$

SAMPLE
$\sum (x - \bar{x}) = 0$
$\sum (x - \bar{x})^2 = 478$
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = 159.33$
$s = \sqrt{159.33} = 12.62$
Example 15

The following data are the ages (in years) of a sample of eight students.
12 12 12 12 12 12 12 12
Find the standard deviation.


Calculation-friendly Formulae for Variance & Standard Deviation for Ungrouped Data

              Variance                                                                          Standard deviation
Population:   $\sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N} = \frac{\sum x^2}{N} - \mu^2$   $\sigma = \sqrt{\sigma^2}$
Sample:       $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$                               $s = \sqrt{s^2}$
Calculation-friendly Formulae for Variance & Standard Deviation for Ungrouped Data

Population variance, $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$

$$\begin{aligned}
\sum (x - \mu)^2 &= \sum \left(x^2 - 2\mu x + \mu^2\right) \\
&= \sum x^2 - 2\mu \sum x + \sum \mu^2 \\
&= \sum x^2 - 2\mu \sum x + N\mu^2 \\
&= \sum x^2 - 2\left(\frac{\sum x}{N}\right)\sum x + N\left(\frac{\sum x}{N}\right)^2 \\
&= \sum x^2 - \frac{2(\sum x)^2}{N} + \frac{(\sum x)^2}{N} \\
&= \sum x^2 - \frac{(\sum x)^2}{N}
\end{aligned}$$
Calculation-friendly Formulae for Variance & Standard Deviation for Ungrouped Data

$$\begin{aligned}
\sigma^2 &= \frac{\sum (x - \mu)^2}{N} \\
&= \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N} \\
&= \frac{\sum x^2}{N} - \frac{(\sum x)^2}{N^2} \\
&= \frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2 \\
&= \frac{\sum x^2}{N} - \mu^2
\end{aligned}$$


Example 16

Consider the following two data sets on the ages of all workers in each of two small companies:
Company 1: 35 36 38 39 40 45 47
Company 2: 18 27 33 52 70
Calculate the mean, variance and standard deviation.
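
The basic formulae can be checked numerically with a short Python sketch. The function names are illustrative. The population formula (divide by N) is applied to the two companies because the data cover all workers; a sample version (divide by n − 1) is included for comparison.

```python
import math


def population_variance(values):
    """Average squared deviation from the mean, dividing by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations from the mean, dividing by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


company_1 = [35, 36, 38, 39, 40, 45, 47]
company_2 = [18, 27, 33, 52, 70]

for name, ages in (("Company 1", company_1), ("Company 2", company_2)):
    variance = population_variance(ages)
    print(name, sum(ages) / len(ages), round(variance, 2), round(math.sqrt(variance), 2))
```

Both companies have a mean age of 40, but the variance of Company 2 is far larger. Applied to the four midterm scores 82, 95, 67 and 92, the two functions return 119.5 and about 159.33, matching the values worked out earlier.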



Example 16 Solution

Company 1: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$

Age, x   $x - \mu$   $(x - \mu)^2$
35
36
38
39
40
45
47

∑x = 280


Example 16 Solution

Company 2: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$

Age, x   $x - \mu$   $(x - \mu)^2$
18
27
33
52
70

∑x = 200


Example 16 Solution

Company 1: $\sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N} = \frac{\sum x^2}{N} - \mu^2$

Age, x   $x^2$
35
36
38
39
40
45
47

∑x = 280


Example 16 Solution

Company 2: $\sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N} = \frac{\sum x^2}{N} - \mu^2$

Age, x   $x^2$
18
27
33
52
70

∑x = 200


1.10 Measures of Dispersion for Grouped Data
Range for Grouped data

The range for grouped data can be defined in one of two ways:

Range = mid-point of the largest class − mid-point of the smallest class

Range = upper boundary of the largest class − lower boundary of the smallest class
Basic Formulae for Variance & Standard Deviation for Grouped Frequency Distribution

Variance for population data:

$$\sigma^2 = \frac{\sum f(m - \mu)^2}{N}$$

Variance for sample data:

$$s^2 = \frac{\sum f(m - \bar{x})^2}{n - 1}$$


Calculation-friendly Formulae for Variance & Std Dev for Grouped Frequency Distribution

Population variance:

$$\sigma^2 = \frac{\sum fm^2 - \frac{(\sum fm)^2}{N}}{N} = \frac{\sum fm^2}{N} - \mu^2$$

Sample variance:

$$s^2 = \frac{\sum fm^2 - \frac{(\sum fm)^2}{n}}{n - 1}$$
Example 17
The following data give the frequency distribution of the number of orders received each day during a sample period of 50 days at the office of a mail-order company.

Number of Orders   Number of Days
10−12              4
13−15              12
16−18              20
19−21              14

Calculate the variance and standard deviation.
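
The calculation-friendly sample formula for grouped data can be sketched as below. The function name and tuple layout are assumptions for this illustration; each class is represented by its midpoint m, and the divisor is n − 1 because the 50 days are a sample.

```python
import math


def grouped_sample_variance(classes):
    """Sample variance of grouped data from (lower limit, upper limit, frequency) tuples."""
    n = sum(f for _, _, f in classes)
    sum_fm = sum(f * (lo + hi) / 2 for lo, hi, f in classes)          # sum of f*m
    sum_fm2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)  # sum of f*m^2
    return (sum_fm2 - sum_fm ** 2 / n) / (n - 1)


# Example 17: number of orders per day over a sample of 50 days.
orders = [(10, 12, 4), (13, 15, 12), (16, 18, 20), (19, 21, 14)]

variance = grouped_sample_variance(orders)
std_dev = math.sqrt(variance)
print(round(variance, 2), round(std_dev, 2))  # 7.58 2.75
```

Here $\sum fm = 832$ and $\sum fm^2 = 14216$, so $s^2 = (14216 - 832^2/50)/49$.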
Example 17 Solution 1

Number of Orders   Class mid-point, m   Number of Days, f   fm   $(m - \bar{x})^2$   $f(m - \bar{x})^2$
10−12                                   4
13−15                                   12
16−18                                   20
19−21                                   14
                                        $\sum f = 50$


Example 17 Solution 2

Number of Orders   Class mid-point, m   Number of Days, f   fm   $fm^2$
10−12                                   4
13−15                                   12
16−18                                   20
19−21                                   14
                                        $\sum f = 50$


Example 18

The following data give the frequency distribution of the daily commuting times (in minutes) from home to work for all 25 employees of a company.

Daily Commuting Time (minutes)   Number of Employees
0 to less than 10                4
10 to less than 20               9
20 to less than 30               6
30 to less than 40               4
40 to less than 50               2

Calculate the variance and standard deviation.
Example 18 Solution

Commuting Time   Class mid-point, m   Number of Employees, f   fm   $fm^2$
0 ≤ x < 10                            4
10 ≤ x < 20                           9
20 ≤ x < 30                           6
30 ≤ x < 40                           4
40 ≤ x < 50                           2
                                      $\sum f = 25$


1.11 Symmetry and Skewness in Data Distribution
Mean, Median & Mode for a Symmetric
Histogram & Frequency Distribution Curve

(mean at centre)



Mean, Median & Mode for a Histogram & Frequency Distribution Curve Skewed to the Right

mean > median > mode (mean furthest to the right)


Mean, Median & Mode for a Histogram & Frequency Distribution Curve Skewed to the Left

mean < median < mode (mean furthest to the left)


Skewed Distribution
• Skewness measures how much the mean differs from the median.
• For distributions which are strongly skewed, positively or negatively, the median is a more appropriate measure of central tendency.
• The mean of a data set is distorted by the presence of extreme values (outliers). The median, by comparison, is not affected by outliers.
• For symmetrical (or normal) distributions, both the median and the mean can be used to measure central tendency.
1.12 Measures of Position
What is Measure of Position ?

• A measure of position determines the position of a single value in relation to other values in a sample or a population data set.
• The three commonly used measures of position are quartiles, percentiles, and percentile rank.


Quartiles and Interquartile Range
Quartiles are 3 summary measures that divide a ranked
data set into 4 equal parts.
• Second quartile (Q2) is the median of a data set.
• First quartile (Q1) is the value of the middle term
among the observations that are less than the median.
• Third quartile (Q3) is the value of the middle term
among the observations that are greater than the median.



Quartiles for Ungrouped Data

Steps:
1. Arrange the data in ascending or descending order.
2. Locate the median, i.e. the Second Quartile Q2.
3. For the observations below the median, locate the middle value, i.e. the First Quartile Q1.
4. For the observations above the median, locate the middle value, i.e. the Third Quartile Q3.


Interpretation of Quartiles
• Approximately 25% of the values in a ranked data set are less than or equal to Q1 and about 75% are greater than Q1.
• The second quartile, Q2, divides a ranked data set into two equal parts and is the same as the median. Approximately 50% of the data values are less than or equal to Q2 and about 50% are greater than Q2.
• Approximately 75% of the data values are less than or equal to Q3 and about 25% are greater than Q3.


Interquartile Range

The difference between the third and the first quartiles gives the interquartile range (IQR).

IQR = Q3 − Q1

IQR is a measure of dispersion or spread of the data.

$$\text{Semi-interquartile range (or quartile deviation)} = \frac{Q_3 - Q_1}{2}$$


Skewed Distribution

• For distributions which are strongly skewed, positively or negatively, the interquartile range is a more appropriate measure of dispersion.
• The standard deviation of a data set is distorted by the presence of extreme values (outliers). The interquartile range, by comparison, is not affected by outliers.


Example 19

The following are the scores of 12 students in a mathematics class.
75 80 68 53 99 58 76 73 85 88 91 79
a) Find the values of the three quartiles.
b) Hence find the value of the interquartile range.
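
The quartile steps described in these notes (the median of the ranked data, then the middle values of the observations below and above it) can be sketched in Python. The function names are illustrative. Note that other quartile conventions exist, so library routines such as `statistics.quantiles` may return slightly different values.

```python
def median(ranked):
    """Median of an already ranked list."""
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median of the lower and upper halves."""
    ranked = sorted(values)
    n = len(ranked)
    half = n // 2
    lower = ranked[:half]                                  # observations below the median
    upper = ranked[half + 1:] if n % 2 else ranked[half:]  # observations above the median
    return median(lower), median(ranked), median(upper)


scores = [75, 80, 68, 53, 99, 58, 76, 73, 85, 88, 91, 79]
q1, q2, q3 = quartiles(scores)
print(q1, q2, q3, q3 - q1)  # 70.5 77.5 86.5 16.0
```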



Example 19 Solution



Example 20

The following are the ages of nine employees of an insurance company:
47 28 39 51 33 37 59 24 33

(a) Find the values of the three quartiles. Where does the age of 28 fall in relation to the ages of these employees?
(b) Find the interquartile range.


Example 20 Solution



Example 21
Compute Q1, Q2 and Q3 for the following ungrouped frequency distribution.

Number of hours of viewing TV per week   Number of people (Frequency)
7                                        4
8                                        8
9                                        11
10                                       5
11                                       2
Total                                    30
Example 21 Solution



Quartiles for Grouped Data

$Q_1$ position = $\left(\frac{n}{4}\right)$th, $\quad Q_1 = L_q + \frac{C_q}{f_q}\left(\frac{n}{4} - \sum f_{q-1}\right)$

$Q_2$ position = $\left(\frac{n}{2}\right)$th, $\quad Q_2 = L_q + \frac{C_q}{f_q}\left(\frac{n}{2} - \sum f_{q-1}\right)$

$Q_3$ position = $\left(\frac{3n}{4}\right)$th, $\quad Q_3 = L_q + \frac{C_q}{f_q}\left(\frac{3n}{4} - \sum f_{q-1}\right)$

$L_q$ = lower class boundary of the quartile class
$f_q$ = frequency of the quartile class
$C_q$ = size of the quartile class
$\sum f_{q-1}$ = cumulative frequency before the quartile class
Example 22

Compute Q1, Q2 and Q3 for the following grouped frequency distribution.

Weight (nearest kg)   Frequency
60 – 62               3
63 – 65               4
66 – 68               5
69 – 71               6
72 – 74               2
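
The three quartile formulas differ only in the target position (n/4, n/2 or 3n/4), so one hedged Python sketch can cover them all. The function name `grouped_quantile` and its `fraction` parameter are assumptions made for this illustration; class boundaries are taken as the stated limits extended by 0.5 kg.

```python
def grouped_quantile(classes, fraction):
    """Interpolate the value at position fraction * n in grouped data.

    `classes` is a list of (lower boundary, upper boundary, frequency) tuples
    in ascending order; `fraction` is 0.25 for Q1, 0.5 for Q2, 0.75 for Q3.
    """
    target = fraction * sum(f for _, _, f in classes)
    before = 0                                   # cumulative frequency before the class
    for lower, upper, f in classes:
        if f > 0 and before + f >= target:       # this is the quartile class
            return lower + (upper - lower) / f * (target - before)
        before += f
    raise ValueError("no class contains the requested position")


limits = [(60, 62, 3), (63, 65, 4), (66, 68, 5), (69, 71, 6), (72, 74, 2)]
weights = [(lo - 0.5, hi + 0.5, f) for lo, hi, f in limits]

q1 = grouped_quantile(weights, 0.25)
q2 = grouped_quantile(weights, 0.50)
q3 = grouped_quantile(weights, 0.75)
print(round(q1, 2), round(q2, 2), round(q3, 2))  # 64.0 67.3 70.0
```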
Example 22 Solution

Weight (nearest kg)   Weight boundaries (kg)   Frequency, f   Cumulative Frequency, F
60 – 62                                        3
63 – 65                                        4
66 – 68                                        5
69 – 71                                        6
72 – 74                                        2
                                               ∑f = 20
Example 22 Solution



Example 22 Solution



Example 23

Compute the weight k that is exceeded by 20% of the adults.

Weight (nearest kg)   Number of adults
60 – 62               3
63 – 65               4
66 – 68               5
69 – 71               6
72 – 74               2


Example 23 Solution

Weight (nearest kg)   Weight boundaries (kg)   Frequency, f   Cumulative frequency
60 – 62                                        3
63 – 65                                        4
66 – 68                                        5
69 – 71                                        6
72 – 74                                        2


Example 23 Solution



1.13 Box-and-Whisker Plot (Boxplot) & Outlier
Boxplot

A boxplot shows the spread of a distribution by using the
(1) smallest value
(2) largest value
(3) second quartile (or median)
(4) first quartile
(5) third quartile
(6) lower (inner) fence
(7) upper (inner) fence

It can be displayed horizontally or vertically.


Boxplot

Boxplots displayed horizontally:

[Figure: a horizontal boxplot drawn above a scale from 0 to 60, with the smallest value, 1st quartile, median, 3rd quartile and largest value labelled.]


Boxplot

Boxplots displayed vertically:

[Figure: a vertical boxplot drawn beside a scale from 0 to 60, labelled from bottom to top with the smallest value, first quartile Q1, median Q2, third quartile Q3 and largest value.]
Boxplot

• ‘Box’ starts from Q1 up to Q3 and contains 50% of the data in the middle of the distribution.
• ‘Whisker’ starts from the box to the smallest value and also from the box to the largest value.

The ‘whisker’ displays the range of the data.


Boxplot

Boxplots for 3 types of distribution:
(1) Symmetrical distribution
(2) Positively skewed distribution (skewed to the right)
(3) Negatively skewed distribution (skewed to the left)


Boxplot for Symmetrical Distribution

‘whisker’ : same length


Median : centre of the box
Q2 − Q1 = Q3 − Q2
Boxplot for Distribution Skewed to the Right

‘whisker’ : left side shorter than right side
Median : nearer to 1st quartile
Q2 − Q1 < Q3 − Q2
Boxplot for Distribution Skewed to the Left

‘whisker’ : left side longer than right side
Median : nearer to 3rd quartile
Q2 − Q1 > Q3 − Q2
Example 24
The following data show a summary of the marks for Mathematics and Science for students in a class.

Subjects      Minimum   Maximum   Median   First quartile   Third quartile
Mathematics   10        90        60       45               70
Science       35        85        60       48               72

Draw two boxplots for these data and comment on the distribution of marks for Mathematics and Science.


Example 24 Solution



Lower & Upper Fences

Sometimes a set of data contains values which are unusually small or large.
These extreme values probably occur because of an error in recording the data.

Lower Fence is the value which is 1.5 times the interquartile range smaller than the first quartile.
Upper Fence is the value which is 1.5 times the interquartile range larger than the third quartile.


Outliers

Outliers: Points which lie outside the lower and upper fences, i.e. points which are more than 1.5 times the interquartile range above the 3rd quartile or below the 1st quartile.

[Figure: a boxplot with Q1, Q2 and Q3 marked. The lower and upper fences lie 1.5 (Q3 − Q1) beyond Q1 and Q3, each whisker ends at the last value inside its fence, and the outliers beyond the fences are marked with *.]


Modifying Boxplot

Outliers are excluded from the boxplot. Hence, if there are outliers in the data, the boxplot needs to be modified as follows:
The left whisker extends from Q1 to the smallest value within the lower fence.
The right whisker extends from Q3 to the largest value within the upper fence.
All the outlying data are marked with asterisks (*).


Example 25

The English marks obtained by 14 students are given by
61 35 53 48 61 62 57
42 69 64 55 65 59 67
(a) Find the median and interquartile range of the data.
(b) Construct a boxplot to illustrate the data.
(c) Comment on the distribution.
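
The fence and outlier rules can be sketched in Python to find the numbers a modified boxplot needs. The helper names are illustrative, and the quartiles follow the convention used in these notes (the middle values of the lower and upper halves); the drawing itself is left out.

```python
def median(ranked):
    """Median of an already ranked list."""
    n = len(ranked)
    mid = n // 2
    return ranked[mid] if n % 2 else (ranked[mid - 1] + ranked[mid]) / 2


def boxplot_summary(values):
    """Return quartiles, fences, whisker ends and outliers for a modified boxplot."""
    ranked = sorted(values)
    n = len(ranked)
    half = n // 2
    q1 = median(ranked[:half])
    q2 = median(ranked)
    q3 = median(ranked[half + 1:] if n % 2 else ranked[half:])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in ranked if lower_fence <= x <= upper_fence]
    outliers = [x for x in ranked if x < lower_fence or x > upper_fence]
    return {
        "Q1": q1, "Q2": q2, "Q3": q3, "IQR": iqr,
        "lower_fence": lower_fence, "upper_fence": upper_fence,
        "whiskers": (inside[0], inside[-1]),  # last values inside the fences
        "outliers": outliers,
    }


marks = [61, 35, 53, 48, 61, 62, 57, 42, 69, 64, 55, 65, 59, 67]
summary = boxplot_summary(marks)
print(summary)
```

For these marks the sketch gives a median of 60, Q1 = 53, Q3 = 64 and fences at 36.5 and 80.5, so the mark 35 is flagged as an outlier and the left whisker stops at 42.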



Example 25 Solution



Example 25 Solution



Example 26
The following stem plot shows the maximum temperature in °F for each day from 1st August to 23rd August in a town. Draw a boxplot and use your boxplot to identify the outliers. Comment on the data distribution.

Stem   Leaf
5      1
5      9
6      2 3 3 4 4 4 4 4
6      5 7 8 8 8 9 9
7      0 2 2 3
7      6 7

Key: 5|9 means 59°F
Example 26 Solution



The End of Topic 1
