FHMM1034 Topic 1 B Descriptive Statistics Student

Centre For Foundation Studies
Department of Science and Engineering
FHMM1034 Mathematics III
Topic 1 (Part 2)
Descriptive Statistics
Contents (Part 1)
1.1 Population Vs Sample

1.2 Types of Variables
1.3 Organizing and Graphing Quantitative Data
1.4 Shapes of Histograms
1.5 Cumulative Frequency Distributions
1.6 Stem-and-Leaf Displays
Jan 2021, FHMM1034 Mathematics III Page 2

Contents (Part 2)
1.7 Measures of Central Tendency for Ungrouped Data
1.8 Measures of Central Tendency for Grouped Data
1.9 Measures of Dispersion for Ungrouped Data
1.10 Measures of Dispersion for Grouped Data
1.11 Symmetry and Skewness in Data Distribution
1.12 Measures of Position
1.13 Box-and-Whisker Plot and Outlier
Topics & Learning Outcomes
• Find mean, median & mode of ungrouped &

grouped data
• Find range, variance & standard deviation of
ungrouped and grouped data
• Relate mean, median & mode to skewness of
data distribution
• Find quartiles of ungrouped and grouped data
• Draw Box-and-Whisker Plot
1.7
Measures of Central Tendency
for Ungrouped Data
Introduction
• A measure of central tendency gives the centre

of a histogram or a frequency distribution
curve.
• There are 3 measures of central tendency for
ungrouped and grouped data,
(1) Mean
(2) Median
(3) Mode

Parameters & Statistics
Measure              Population parameter   Sample statistic
Mean                 μ                      x̄
Variance             σ²                     s²
Standard deviation   σ                      s
Proportion           p                      p̂

Mean for Ungrouped Data
Mean for population data x1, x2, …, xN:
$$\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{1}{N}\sum_{i=1}^{N} x_i$$
Mean for sample data x1, x2, …, xn:
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Example 1
The following are the ages (in years) of all eight
employees of a small company:
53 32 61 27 39 44 49 57
Find the mean age of these employees.
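As a quick check of Example 1, the mean can be computed in a few lines of Python. This is only a sketch; the variable names are illustrative and not part of the original notes. The data are treated as a population because they cover all eight employees.

```python
# Ages (in years) of all eight employees, from Example 1.
ages = [53, 32, 61, 27, 39, 44, 49, 57]

# Population mean: the sum of the values divided by the number of values.
mean_age = sum(ages) / len(ages)
print(mean_age)
```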

Example 2
The table shows the philanthropic giving in a lifetime of
six wealthy Americans up to 2007 as reported in
Businessweek. Find the mean of these data.
Donors                  Total philanthropic giving till 2007 (millions of dollars)
Warren Buffett          40,780
Bill & Melinda Gates    28,144
George Kaiser           2,522
George Soros            6,401
Gordon & Betty Moore    7,404
Walton family           2,015
Total                   87,266
Mean for Ungrouped Data with
Frequency Distribution
If the value x1 occurs with frequency f1, x2 occurs with frequency f2, …, xn occurs with frequency fn, then
$$\bar{x} \text{ or } \mu = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{\sum fx}{\sum f}$$

Example 3
Calculate the mean for the sample data in the

frequency table below and interpret the result.
Sales (units) 4 5 7 10 11 15 17
Number of days 3 12 23 10 14 8 2

Example 3 Solution
Sales (units), x   Number of days, f   fx
4 3
5 12
7 23
10 10
11 14
15 8
17 2
∑f =
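The frequency-table mean of Example 3 can be checked with a short Python sketch. The variable names are assumptions made for illustration; the calculation is the sum of fx divided by the sum of f.

```python
# Sales (units) and the number of days on which each value occurred (Example 3).
sales = [4, 5, 7, 10, 11, 15, 17]
days = [3, 12, 23, 10, 14, 8, 2]

sum_f = sum(days)                                  # total frequency
sum_fx = sum(f * x for x, f in zip(sales, days))   # sum of f times x
mean_sales = sum_fx / sum_f                        # sample mean

print(sum_f, sum_fx, round(mean_sales, 2))
```

Interpreted in context, the result says that on average roughly 8.9 units were sold per day over the sampled days.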
Median
• The median is the value of the middle term in

a set of data that has been ranked in ascending
or descending order.
• Median is not influenced by outliers (extreme
values).
• Being the middle value implies that 50% of the
observations will be more than the median and
another 50% of them will be less than the
median.
Median for Ungrouped Data
Median = the value of the $\left(\frac{n+1}{2}\right)$th term in a ranked data set, where n = total number of data values.
Note:
1. If n is odd, the median is the value of the middle term in the ranked data.
2. If n is even, the median is the average of the values of the two middle terms.
3. The median is not influenced by extreme values or outliers.
Example 4
Find the median of each of these sets of data.

(a) 1, 9, 6, 7, 12, 8, 3, 10, 11
(b) 2, 5, 1, 6, 7, 11, 13, 8
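The odd and even cases of the median rule can be sketched in Python as follows. The helper name `median` is an assumption for illustration (Python's standard `statistics.median` behaves the same way).

```python
def median(values):
    """Median of ungrouped data: rank the data, then take the middle term(s)."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        # n odd: the single middle term, i.e. the ((n + 1) / 2)th value.
        return ranked[mid]
    # n even: the average of the two middle terms.
    return (ranked[mid - 1] + ranked[mid]) / 2

median_a = median([1, 9, 6, 7, 12, 8, 3, 10, 11])   # n = 9 (odd)
median_b = median([2, 5, 1, 6, 7, 11, 13, 8])       # n = 8 (even)
print(median_a, median_b)
```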

Median for Ungrouped Data with
Frequency Distribution
When calculating the median for an ungrouped frequency distribution, take the position of the median as the
$\left(\frac{n+1}{2}\right)$th term, where $n = \sum f$.
The median is exactly at the 50th percentile of the data, implying that 50% of the data will be less than the median and 50% will be more than the median.
In other words, 50% of the data is at most the median value, and 50% is at least the median value.

Example 5
Find the median of the following frequency

distribution and interpret the result.
No. of children 0 1 2 3 4 5
No. of families 3 5 12 9 4 2

Example 5 Solution
Number of children, x   Number of families, f   Cumulative frequency, F
0 3
1 5
2 12
3 9
4 4
5 2
∑f =
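The cumulative-frequency approach to Example 5 can be sketched in Python. The names are illustrative assumptions. Because n is odd here, the median position (n + 1)/2 is a whole number; for an even n the two middle terms would need to be averaged, which this sketch does not handle.

```python
# Number of children and number of families (Example 5).
children = [0, 1, 2, 3, 4, 5]
families = [3, 5, 12, 9, 4, 2]

n = sum(families)
position = (n + 1) / 2        # position of the median term

cumulative = 0
median_children = None
for x, f in zip(children, families):
    cumulative += f           # cumulative frequency F up to value x
    if cumulative >= position:
        median_children = x   # first value whose F reaches the position
        break

print(n, position, median_children)
```

In context, half of the families have at most this many children and half have at least this many.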

Mode for Ungrouped Data
The mode is the value that occurs with the highest

frequency in a data set.
Note:
1. Mode is not influenced by the extreme values
or outliers.
2. A data set may have no mode, one mode (unimodal),
two modes (bimodal) or more than two modes (multimodal).
3. Mode can be used for both quantitative and
qualitative data.
Example 6
Find the mode of each of the following data sets.
(i) Speed (miles/hr):
77 82 74 81 79 84 74 78
(ii) Income (RM):
46,150 95,750 64,985 87,490 53,740
(iii) Prices (RM):
895 886 903 895 870 905 870 899
(iv) Age (years):
21 19 27 22 29 19 25 21 22 30
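The four cases in Example 6 (one mode, no mode, two modes, several modes) can be illustrated with a small Python sketch. The helper `modes` is a hypothetical name; it treats a data set in which every value occurs once as having no mode, following the note above.

```python
from collections import Counter

def modes(values):
    """Return the list of modes, or an empty list if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []   # every value occurs exactly once: no mode
    return sorted(v for v, c in counts.items() if c == highest)

speed_modes = modes([77, 82, 74, 81, 79, 84, 74, 78])
income_modes = modes([46150, 95750, 64985, 87490, 53740])
price_modes = modes([895, 886, 903, 895, 870, 905, 870, 899])
age_modes = modes([21, 19, 27, 22, 29, 19, 25, 21, 22, 30])
print(speed_modes, income_modes, price_modes, age_modes)
```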
Example 7
Find the mode of the following frequency

distribution and interpret the result.
No. of children 0 1 2 3 4 5
No. of families 3 5 12 9 4 2
Ans: 2 children
1.8
Measures of Central Tendency
for Grouped Data
Mean for Grouped Frequency Distribution
Mean for population data: $\mu = \frac{\sum fm}{N}$, where $N = \sum f$
Mean for sample data: $\bar{x} = \frac{\sum fm}{n}$, where $n = \sum f$
m is the midpoint of a class.
f is the frequency of a class.

Example 8
The table gives frequency distribution of daily commuting
time (in minutes) from home to work of all 25 employees
of a company.
Daily Commuting Time (minutes)   Number of Employees
0 to less than 10                4
10 to less than 20               9
20 to less than 30               6
30 to less than 40               4
40 to less than 50               2
Calculate the mean of daily commuting times.
Example 8 Solution
Daily Commuting Time (minutes)   Frequency, f   Class midpoint, m   fm
0 – 10                           4
10 – 20                          9
20 – 30                          6
30 – 40                          4
40 – 50                          2
∑f = 25                          ∑fm =
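The grouped mean for Example 8 can be checked with the following Python sketch, which uses the class limits from the solution table. The variable names are assumptions for illustration; the data are a population because they cover all 25 employees.

```python
# Class limits (minutes) and frequencies from the Example 8 solution table.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [4, 9, 6, 4, 2]

# Class midpoint m = (lower limit + upper limit) / 2.
midpoints = [(lo + hi) / 2 for lo, hi in classes]

sum_f = sum(freqs)                                    # N
sum_fm = sum(f * m for f, m in zip(freqs, midpoints)) # sum of f times m
mean_time = sum_fm / sum_f                            # population mean

print(midpoints, sum_fm, mean_time)
```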

Example 9
The table gives frequency distribution of number of
orders received each day during a sample period of
50 days at the office of a mail-order company.
Number of Orders Number of Days
10 − 12 4
13 − 15 12
16 − 18 20
19 − 21 14
Calculate the mean.

Example 9 Solution
Number of Orders   Number of Days, f   Class midpoint, m   fm
10−12              4
13−15              12
16−18              20
19−21              14
∑f = 50            ∑fm =

Median for Grouped Frequency Distribution
When calculating the median for a grouped frequency distribution, take the position of the median as the
$\left(\frac{n}{2}\right)$th term, where $n = \sum f$.
Median for grouped data can be estimated using
(1) cumulative distribution curves (ogives)
(2) histogram
(3) formula (linear interpolation)
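Median for Grouped Frequency Distribution
Method 1: From cumulative frequency curve
[Figure: three ogives plotted against the data values. The median is the data value at which the cumulative frequency equals ∑f/2, the relative cumulative frequency equals 0.5, or the cumulative percentage equals 50%.]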

Median for Grouped Frequency Distribution
Method 2: From Histogram
(1) Find the median class.
(2) Determine the total frequency before the
median class.
(3) Use the method of proportion to calculate
the median.

Median for Grouped Frequency Distribution
Method 3: Using formula (linear interpolation)
[Figure: the median class drawn from its lower boundary L_M to its upper boundary, with class size c. The cumulative frequency rises from F_B = ∑f_{M−1} before the median class to F_A at the end of the median class; the median M is the point at which it reaches ½∑f.]
$$M = L_M + \left(\frac{\frac{1}{2}\sum f - \sum f_{M-1}}{f_M}\right) c$$
L_M = lower boundary of median class
c = width of median class
f_M = frequency of median class
∑f_{M−1} = cumulative frequency before median class
∑f = total frequency
Note: This formula can be used for grouped data of both equal and unequal widths.
Example 10
Estimate the median of the following frequency distribution by:
Weight (nearest kg)   60 – 62   63 – 65   66 – 68   69 – 71   72 – 74
Frequency             3         4         5         6         2

Example 10 Solution
Method 1: Cumulative Frequency Curve (ogives)
Weight (nearest kg)   Frequency   Weight   Cumulative frequency
60 – 62 3
63 – 65 4
66 – 68 5
69 – 71 6
72 – 74 2

Example 10 Solution

Example 10 Solution
Method 1: Ogive (Cumulative Relative Frequency Curve)
Weight (nearest kg)   Frequency   Weight   Cumulative frequency   Cumulative relative frequency
                                  < 59.5   0                      0/20 = 0.00
60 – 62               3           < 62.5   3                      3/20 = 0.15
63 – 65               4           < 65.5   7                      7/20 = 0.35
66 – 68               5           < 68.5   12                     12/20 = 0.60
69 – 71               6           < 71.5   18                     18/20 = 0.90
72 – 74               2           < 74.5   20                     20/20 = 1.00
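Example 10 Solution
[Figure: "Less than" cumulative relative frequency curve. Horizontal axis: weight (kg); vertical axis: cumulative relative frequency. Plotted points: (59.5, 0.00), (62.5, 0.15), (65.5, 0.35), (68.5, 0.60), (71.5, 0.90), (74.5, 1.00). The value 67.5 is marked on the weight axis.]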
Example 10 Solution
Method 1: Ogive (Cumulative Percentage Curve)
Weight (nearest kg)   Frequency   Weight   Cumulative frequency   Cumulative percentage
                                  < 59.5   0                      0
60 – 62               3           < 62.5   3                      15
63 – 65               4           < 65.5   7                      35
66 – 68               5           < 68.5   12                     60
69 – 71               6           < 71.5   18                     90
72 – 74               2           < 74.5   20                     100

Example 10 Solution
[Figure: "Less than" cumulative percentage curve. Horizontal axis: weight (kg); vertical axis: cumulative percentage. Plotted points: (59.5, 0%), (62.5, 15%), (65.5, 35%), (68.5, 60%), (71.5, 90%), (74.5, 100%). The value 67.5 is marked on the weight axis.]

Example 10 Solution
Method 2: Histogram
Weight (nearest kg)   Frequency   Class boundary   Class width   Cumulative frequency
60 – 62 3
63 – 65 4
66 – 68 5
69 – 71 6
72 – 74 2

Example 10 Solution

Example 10 Solution
Method 3: Linear Interpolation Formula
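The linear interpolation formula can be sketched in Python using the Example 10 data. The function name `grouped_median` and its argument layout are assumptions for illustration. Class boundaries are taken 0.5 kg either side of the class limits because the weights are recorded to the nearest kg.

```python
def grouped_median(boundaries, freqs):
    """Median of grouped data by linear interpolation.

    boundaries: (lower, upper) class boundaries in ascending order
    freqs: class frequencies in the same order
    """
    half = sum(freqs) / 2    # position n/2
    before = 0               # cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if before + f >= half:
            # Median class found: M = L_M + ((n/2 - F_before) / f_M) * c
            return lower + (half - before) / f * (upper - lower)
        before += f
    raise ValueError("no classes given")

boundaries = [(59.5, 62.5), (62.5, 65.5), (65.5, 68.5), (68.5, 71.5), (71.5, 74.5)]
freqs = [3, 4, 5, 6, 2]
median_weight = grouped_median(boundaries, freqs)
print(round(median_weight, 1))
```

Here the median class is 66 – 68, with cumulative frequency 7 before it, so the estimate is 65.5 + (10 − 7)/5 × 3.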

Example 10 Solution

Example 11
The Mathematics marks for 100 students in a school are shown in
the histogram below. Estimate the median mark for the students
in the school.
[Figure: histogram. Horizontal axis: marks, from 0 to 100 in classes of width 10; vertical axis: number of students, from 0 to 24.]
Example 11 Solution
Boundary   Frequency   Cumulative frequency
0 – 10 2
10 – 20 2
20 – 30 2
30 – 40 10
40 – 50 24
50 – 60 22
…

Mode for
Grouped Frequency distribution
• Modal class - the class which has the

largest standard frequency.
• An estimate of the mode can be obtained
from the modal class.
• Mode for grouped data can be obtained by
✓ Graphical method (histogram)
✓ Interpolation method

Mode for Grouped Frequency Distribution
In using formula to estimate the mode of grouped data,
follow the steps below:
(1) Find the modal class, determine its frequency.
(2) Determine the frequency of the class immediately
before the modal class.
(3) Determine the frequency of the class immediately
after the modal class.
(4) Estimate the mode for the grouped data using
similar triangles.

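Mode for Grouped Frequency Distribution
After identifying the modal class, use similar triangles in the histogram.
[Figure: histogram of the modal class (frequency f_m, lower boundary L_m, width C) and its neighbours (frequencies f_b and f_a). Diagonals drawn across the top of the modal bar intersect at P, giving similar triangles PQR and PST; the mode m_o lies directly below P, at a distance m_o − L_m from L_m.]
Since ΔPQR ~ ΔPST,
$$\frac{PU}{PV} = \frac{f_m - f_b}{f_m - f_a} \quad\Rightarrow\quad \frac{m_o - L_m}{C - (m_o - L_m)} = \frac{f_m - f_b}{f_m - f_a}$$
L_m = lower class boundary of the modal class
f_m = frequency of the modal class
f_b = frequency of the class immediately before the modal class
f_a = frequency of the class immediately after the modal class
C = the class width of the modal class
$$\text{Mode, } m_o = L_m + \left(\frac{f_m - f_b}{(f_m - f_b) + (f_m - f_a)}\right) C$$
Note: This formula can only be used for grouped data of equal widths.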
Mode for Grouped Frequency Distribution
$$\begin{aligned}
\frac{m_o - L_m}{C - (m_o - L_m)} &= \frac{f_m - f_b}{f_m - f_a} \\
(m_o - L_m)(f_m - f_a) &= (f_m - f_b)\left[C - (m_o - L_m)\right] \\
(m_o - L_m)(f_m - f_a) &= C(f_m - f_b) - (f_m - f_b)(m_o - L_m) \\
(m_o - L_m)(f_m - f_a) + (f_m - f_b)(m_o - L_m) &= C(f_m - f_b) \\
(m_o - L_m)\left[(f_m - f_b) + (f_m - f_a)\right] &= C(f_m - f_b) \\
m_o &= L_m + \left(\frac{f_m - f_b}{(f_m - f_b) + (f_m - f_a)}\right) C
\end{aligned}$$
Mode for Grouped Frequency Distribution
For the mode of grouped data of unequal widths, the frequency has to be replaced by the frequency density.
L_m = lower class boundary of the modal class (based on frequency density)
ρ_m = frequency density of the modal class
ρ_b = frequency density of the class immediately before the modal class
ρ_a = frequency density of the class immediately after the modal class
C = the class width of the modal class
$$\text{Mode, } m_o = L_m + \left(\frac{\rho_m - \rho_b}{(\rho_m - \rho_b) + (\rho_m - \rho_a)}\right) C$$
Mode for Grouped Frequency Distribution
[Figure: three example distributions, each with the position of its mode marked.]

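Mode for Grouped Frequency Distribution
[Figure: worked illustration of the similar-triangles method. The modal class runs from L_m = 20 to 30 (C = 10) with f_m = 15; the neighbouring classes have f_b = 13 and f_a = 12.]
f_m − f_b = 2 and f_m − f_a = 3, so P divides the class width in the ratio 2 : 3.
$$\left(\frac{f_m - f_b}{(f_m - f_b) + (f_m - f_a)}\right) C = \frac{2}{2 + 3} \times 10 = 4$$
Mode = 20 + 4 = 24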

Example 12
Estimate the mode of the following distribution.
Weight (nearest kg)   60 − 62   63 − 65   66 − 68   69 − 71   72 − 74
Frequency             3         4         5         6         2

Example 12 Solution Method 1
Weight (nearest kg)   Class boundaries   Class width   Frequency, f
60 – 62 3
63 – 65 4
66 – 68 5
69 – 71 6
72 – 74 2

Example 12 Solution - Histogram

Example 12 Solution - Calculation
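The equal-width mode formula can be applied to Example 12 with the Python sketch below. The variable names are illustrative assumptions, and a missing neighbouring class is treated as having frequency 0, which is also an assumption (it is not needed for this data set, where the modal class is an interior class).

```python
# Class boundaries (kg) and frequencies for Example 12 (equal class widths).
boundaries = [(59.5, 62.5), (62.5, 65.5), (65.5, 68.5), (68.5, 71.5), (71.5, 74.5)]
freqs = [3, 4, 5, 6, 2]

i = freqs.index(max(freqs))                        # index of the modal class
f_m = freqs[i]
f_b = freqs[i - 1] if i > 0 else 0                 # class before the modal class
f_a = freqs[i + 1] if i < len(freqs) - 1 else 0    # class after the modal class
lower, upper = boundaries[i]
width = upper - lower                              # C

# m_o = L_m + (f_m - f_b) / ((f_m - f_b) + (f_m - f_a)) * C
mode_weight = lower + (f_m - f_b) / ((f_m - f_b) + (f_m - f_a)) * width
print(round(mode_weight, 1))
```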

Example 13
The following table shows the distribution of the number

of weekly accidents at a certain town which has been
recorded for 52 consecutive weeks.
Number of accidents   0 – 4   5 – 9   10 – 14   15 – 19   20 – 24   25 – 29   30 – 34
Number of weeks       4       6       11        15        8         5         3
What is the modal number of weekly accidents?

Example 13 Solution

Example 14
The data below show the distribution of the mass of parcels, to
the nearest gram, in a post office in a particular day.
Mass (g) 20-24 25-29 30-39 40-54 55-59
Frequency 40 48 90 60 10
(a) Construct a histogram to show the distribution of mass

of parcels and estimate the mode from the histogram.
(b) Calculate an estimate of the median and mode of the
distribution.

Example 14 Solution
Mass (g)   Class boundaries   Class width, c   Frequency, f   Frequency density
20 – 24 40
25 – 29 48
30 – 39 90
40 – 54 60
55 – 59 10

Example 14 Solution

Example 14 Solution
Mass (g)   Class boundaries   Frequency, f   Cumulative frequency
20 – 24 40
25 – 29 48
30 – 39 90
40 – 54 60
55 – 59 10

Example 14 Solution
Mass (g)   Class boundaries   Class width, c   Frequency, f   Frequency density
20 – 24 40
25 – 29 48
30 – 39 90
40 – 54 60
55 – 59 10

Example 14 Solution
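Part (b) of Example 14 combines the interpolation formula for the median with the frequency-density version of the mode formula, because the class widths are unequal. The Python sketch below is one way to check the working; the variable names are assumptions, and the boundaries are taken 0.5 g either side of the class limits because masses are recorded to the nearest gram.

```python
# Class boundaries (g) and frequencies for Example 14 (unequal class widths).
boundaries = [(19.5, 24.5), (24.5, 29.5), (29.5, 39.5), (39.5, 54.5), (54.5, 59.5)]
freqs = [40, 48, 90, 60, 10]

widths = [hi - lo for lo, hi in boundaries]
density = [f / w for f, w in zip(freqs, widths)]   # frequency density = f / c

# Mode: the modal class is the one with the highest frequency density.
# It is an interior class for this data, so both neighbours exist.
i = density.index(max(density))
rho_m, rho_b, rho_a = density[i], density[i - 1], density[i + 1]
mode_mass = boundaries[i][0] + (rho_m - rho_b) / ((rho_m - rho_b) + (rho_m - rho_a)) * widths[i]

# Median: linear interpolation inside the class containing the (n/2)th term.
half = sum(freqs) / 2
before = 0
for (lo, hi), f in zip(boundaries, freqs):
    if before + f >= half:
        median_mass = lo + (half - before) / f * (hi - lo)
        break
    before += f

print(density, round(mode_mass, 2), round(median_mass, 2))
```

Note that the class 30 – 39 has the highest frequency (90) but not the highest frequency density, so it is not the modal class.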

1.9
Measures of Dispersion
for Ungrouped Data
Measure of Dispersion
• The measures of central tendency, such as the mean,
median, and mode, do not reveal the whole picture of
the distribution of a data set.
• Two data sets with the same mean may have
completely different spread.
• Thus the mean, median, or mode by itself is not a
sufficient measure to reveal the shape of the
distribution of a data set.
• We need measures that can provide some information
about the variation among data values and they are
called measures of dispersion.
Consider the following two data sets on the ages of all
workers in each of two small companies:
Company 1: 35 36 38 39 40 45 47
Company 2: 18 27 33 52 70
The mean age of workers in both companies is the same.
However, the variation in the workers’ ages for each of
these companies is very different.
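[Figure: the ages of both companies plotted on a common number line, with the shared mean of 40 marked. Company 2's ages are spread much further from the mean than Company 1's.]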
The measures of dispersion to consider:

• range,
• variance,
• standard deviation

Range for Ungrouped data
Range = Largest value – Smallest value
Disadvantages of range:
• Like the mean, range is influenced by

outliers.
• It is based on two values only and all other

values in a data set are ignored.

Variance and Standard Deviation
• Variance is a measure of the spread in a set of

data.
• It measures how closely the values in a data set
are clustered around the mean.
• The values of the variance and the standard
deviation are never negative.
• The measurement units of variance are always
the square of the measurement units of the
original data.
Basic Formulae of Variance and
Standard Deviation for Ungrouped data
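Variance is the average squared deviation from the mean; the standard deviation is its square root.
Population: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$, $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
Sample: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$, $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$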

Deviation from the Mean
The quantity $x - \mu$ or $x - \bar{x}$ in the above formulas is called the deviation of the x value from the mean.
The sum of the deviations of the x values from the mean is always zero; that is,
$$\sum (x - \mu) = 0 \quad\text{and}\quad \sum (x - \bar{x}) = 0$$

Suppose the midterm scores of 4 students are 82, 95, 67 and 92.
$$\text{mean} = \frac{82 + 95 + 67 + 92}{4} = 84$$
x    x − mean
82   82 – 84 = –2
95   95 – 84 = +11
67   67 – 84 = –17
92   92 – 84 = +8
The deviations sum to zero: $\sum (x - \mu) = 0$ and $\sum (x - \bar{x}) = 0$. For this reason we square the deviations to calculate the variance and standard deviation.
[Figure: the four midterm scores plotted about the mean of 84, with deviations −2, +11, −17 and +8.]
POPULATION: $\sum (x - \mu)^2 = 478$, $\sigma^2 = \frac{\sum (x - \mu)^2}{N} = 119.5$, $\sigma = \sqrt{119.5} = 10.93$
SAMPLE: $\sum (x - \bar{x})^2 = 478$, $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = 159.33$, $s = \sqrt{159.33} = 12.62$
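The population and sample figures for the four midterm scores can be reproduced with Python's standard `statistics` module, as in the sketch below. The variable names are illustrative; `pvariance` and `pstdev` divide by N, while `variance` and `stdev` divide by n − 1.

```python
import statistics

scores = [82, 95, 67, 92]
mean = statistics.mean(scores)

deviations = [x - mean for x in scores]           # x - mean for each score
sum_sq = sum(d ** 2 for d in deviations)          # sum of squared deviations

pop_var = statistics.pvariance(scores)            # divides by N
pop_sd = statistics.pstdev(scores)
samp_var = statistics.variance(scores)            # divides by n - 1
samp_sd = statistics.stdev(scores)

print(sum(deviations), sum_sq)
print(pop_var, round(pop_sd, 2), round(samp_var, 2), round(samp_sd, 2))
```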
Example 15
The following data are the ages (in years) of a

sample of eight students.
12 12 12 12 12 12 12 12
Find the standard deviation.

Calculation-friendly Formulae for Variance &
Standard Deviation for Ungrouped Data
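Population: $\sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N} = \frac{\sum x^2}{N} - \mu^2$, $\sigma = \sqrt{\sigma^2}$
Sample: $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$, $s = \sqrt{s^2}$
Derivation for the population variance, $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$, using $\mu = \frac{\sum x}{N}$:
$$\begin{aligned}
\sum (x - \mu)^2 &= \sum \left(x^2 - 2\mu x + \mu^2\right) \\
&= \sum x^2 - 2\mu \sum x + \sum \mu^2 \\
&= \sum x^2 - 2\left(\frac{\sum x}{N}\right)\sum x + N\left(\frac{\sum x}{N}\right)^2 \\
&= \sum x^2 - \frac{2(\sum x)^2}{N} + \frac{(\sum x)^2}{N} \\
&= \sum x^2 - \frac{(\sum x)^2}{N}
\end{aligned}$$
$$\begin{aligned}
\sigma^2 = \frac{\sum (x - \mu)^2}{N} &= \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N} = \frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2 \\
&= \frac{\sum x^2}{N} - \frac{(\sum x)^2}{N^2} = \frac{\sum x^2}{N} - \mu^2
\end{aligned}$$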
Example 16
Consider the following two data sets on the ages

of all workers in each of two small companies:
Company 1: 35 36 38 39 40 45 47
Company 2: 18 27 33 52 70
Calculate the mean, variance and standard
deviation.

Example 16 Solution
Company 1: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
Age, x   x − μ   (x − μ)²
35
36
38
39
40
45
47
∑x = 280

Example 16 Solution
Company 2: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
Age, x   x − μ   (x − μ)²
18
27
33
52
70
∑x = 200

Example 16 Solution
Company 1: $\sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N} = \frac{\sum x^2}{N} - \mu^2$
Age, x   x²
35
36
38
39
40
45
47
∑x = 280

Example 16 Solution
Company 2: $\sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N} = \frac{\sum x^2}{N} - \mu^2$
Age, x   x²
18
27
33
52
70
∑x = 200
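The calculation-friendly formula can be checked against both companies in Example 16 with a short Python sketch. The function name `population_variance` is an assumption for illustration; the population formula is used because the data cover all workers in each company.

```python
import math

def population_variance(values):
    """Population variance via (sum of x^2 - (sum of x)^2 / N) / N."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / n

company_1 = [35, 36, 38, 39, 40, 45, 47]
company_2 = [18, 27, 33, 52, 70]

mean_1 = sum(company_1) / len(company_1)
mean_2 = sum(company_2) / len(company_2)
var_1 = population_variance(company_1)
var_2 = population_variance(company_2)
sd_1 = math.sqrt(var_1)
sd_2 = math.sqrt(var_2)

print(mean_1, round(var_1, 2), round(sd_1, 2))
print(mean_2, round(var_2, 2), round(sd_2, 2))
```

Both companies have the same mean age, but the standard deviation for Company 2 is several times larger, which matches the wider spread of ages visible in the data.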

1.10
Measures of Dispersion
for Grouped Data
Range for Grouped data
The range for grouped data can be defined in one

of two ways:
Range = midpoint of the largest class − midpoint of the smallest class

Range = upper boundary of the largest class − lower boundary of the smallest class
Basic Formulae for Variance & Standard
Deviation for Grouped Frequency Distribution
Variance for population data:
$$\sigma^2 = \frac{\sum f (m - \mu)^2}{N}$$
Variance for sample data:
$$s^2 = \frac{\sum f (m - \bar{x})^2}{n - 1}$$

Std Dev for Grouped Frequency Distribution
Population variance:
$$\sigma^2 = \frac{\sum fm^2 - \frac{(\sum fm)^2}{N}}{N} = \frac{\sum fm^2}{N} - \mu^2$$
Sample variance:
$$s^2 = \frac{\sum fm^2 - \frac{(\sum fm)^2}{n}}{n - 1}$$
Example 17
The following data give the frequency distribution of the
number of orders received each day during a sample
period of 50 days at the office of a mail-order company.
Number of Orders Number of Days

10−12 4
13−15 12
16−18 20
19−21 14
Calculate the variance and standard deviation.
Example 17 Solution 1
Number of Orders   Class midpoint, m   Number of Days, f   fm   (m − x̄)²   f(m − x̄)²
10−12                                  4
13−15                                  12
16−18                                  20
19−21                                  14
                                       ∑f = 50

Example 17 Solution 2
Number of Orders   Class midpoint, m   Number of Days, f   fm   fm²
10−12                                  4
13−15                                  12
16−18                                  20
19−21                                  14
                                       ∑f = 50
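The second solution method for Example 17 can be sketched in Python as below. The variable names are assumptions for illustration. The sample formula (dividing by n − 1) is used because the 50 days are described as a sample period.

```python
import math

# Class midpoints and frequencies for Example 17 (classes 10-12, 13-15, 16-18, 19-21).
midpoints = [11, 14, 17, 20]
freqs = [4, 12, 20, 14]

n = sum(freqs)
sum_fm = sum(f * m for f, m in zip(freqs, midpoints))       # sum of f*m
sum_fm2 = sum(f * m * m for f, m in zip(freqs, midpoints))  # sum of f*m^2

# Sample variance: (sum of f*m^2 - (sum of f*m)^2 / n) / (n - 1)
sample_var = (sum_fm2 - sum_fm ** 2 / n) / (n - 1)
sample_sd = math.sqrt(sample_var)

print(n, sum_fm, sum_fm2, round(sample_var, 2), round(sample_sd, 2))
```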

Example 18
The following data give the frequency distribution of the daily

commuting times (in minutes) from home to work for all 25
employees of a company.
Daily Commuting Time (minutes)   Number of Employees
0 to less than 10                4
10 to less than 20               9
20 to less than 30               6
30 to less than 40               4
40 to less than 50               2
Calculate the variance and standard deviation.

Example 18 Solution
Commuting Time   Class midpoint, m   Number of Employees, f   fm   fm²
0 ≤ x < 10                           4
10 ≤ x < 20                          9
20 ≤ x < 30                          6
30 ≤ x < 40                          4
40 ≤ x < 50                          2
                                     ∑f = 25

1.11
Symmetry and Skewness
in Data Distribution
Mean, Median & Mode for a Symmetric Histogram & Frequency Distribution Curve
mean = median = mode (mean at centre)

Mean, Median & Mode for a Histogram & Frequency Distribution Curve Skewed to the Right
mean > median > mode (mean furthest right)

Mean, Median & Mode for a Histogram & Frequency Distribution Curve Skewed to the Left
mean < median < mode (mean furthest left)

Skewed Distribution
• Skewness measures how much the mean differs
from the median.
• For distributions that are strongly skewed, positively
or negatively, the median is a more appropriate measure
of central tendency.
• The mean of a data set is distorted by the presence of
extreme values (outliers) . Median, by comparison,
is not affected by outliers.
• For symmetrical (or normal) distributions, both
median and mean can be used to measure central
tendency.
1.12
Measures of Position
What is a Measure of Position?
• A measure of position determines the

position of a single value in relation to other
values in a sample or a population data set.
• The three commonly used measures of

position are quartiles, percentiles, and
percentile rank.

Quartiles and Interquartile Range
Quartiles are 3 summary measures that divide a ranked
data set into 4 equal parts.
• Second quartile (Q2) is the median of a data set.
• First quartile (Q1) is the value of the middle term
among the observations that are less than the median.
• Third quartile (Q3) is the value of the middle term
among the observations that are greater than the median.

Quartiles for Ungrouped Data
Steps:
1. Arrange the data in ascending or descending order.
2. Locate the median, i.e. the second quartile, Q2.
3. For the observations below the median, locate the
middle value, i.e. the first quartile, Q1.
4. For the observations above the median, locate the
middle value, i.e. the third quartile, Q3.

Interpretation of Quartiles
• Approximately 25% of the values in a ranked data set
are less than or equal to Q1 and about 75% are greater
than Q1.
• The second quartile, Q2, divides a ranked data set into

two equal parts and is the same as the median.
Approximately 50% of the data values are less than or
equal to Q2 and about 50% are greater than Q2.
• Approximately 75% of the data values are less than or

equal to Q3 and about 25% are greater than Q3.

Interquartile Range
The difference between the third and the first

quartiles gives the Interquartile range (IQR).
IQR = Q3 − Q1
IQR is a measure of dispersion or spread of the

data.
Semi-interquartile range (or quartile deviation) $= \frac{Q_3 - Q_1}{2}$

Skewed Distribution
• For distributions that are strongly skewed,
positively or negatively, the interquartile range
is a more appropriate measure of dispersion.
• The standard deviation of a data set is

distorted by the presence of extreme values
(outliers) . Interquartile range, by
comparison, is not affected by outliers.

Example 19
The following are the scores of 12 students in a

mathematics class.
75 80 68 53 99 58 76 73 85 88 91 79
a) Find the values of the three quartiles.
b) Hence find the value of the inter-quartile range.

Example 19 Solution
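The quartile steps for ungrouped data can be sketched in Python using the Example 19 scores. The helper names are assumptions for illustration. The sketch takes Q1 and Q3 as the medians of the lower and upper halves of the ranked data and leaves the median itself out of both halves when n is odd; other textbooks and software use slightly different conventions and may give different values.

```python
def median(ranked):
    """Median of an already ranked list."""
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ranked = sorted(values)
    n = len(ranked)
    half = n // 2
    lower = ranked[:half]                                    # observations below the median
    upper = ranked[half + 1:] if n % 2 == 1 else ranked[half:]  # observations above the median
    return median(lower), median(ranked), median(upper)

scores = [75, 80, 68, 53, 99, 58, 76, 73, 85, 88, 91, 79]
q1, q2, q3 = quartiles(scores)
iqr = q3 - q1
print(q1, q2, q3, iqr)
```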

Example 20
The following are the ages of nine employees of an

insurance company:
47 28 39 51 33 37 59 24 33
(a) Find the values of the three quartiles.

Where does the age of 28 fall in relation to the ages
of these employees?
(b) Find the interquartile range.

Example 20 Solution

Example 21
Compute the Q1, Q2, Q3 for the following ungrouped
frequency distribution.
Number of hours of Number of people
viewing TV per week (Frequency)
7 4
8 8
9 11
10 5
11 2
Total 30
Example 21 Solution

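Quartiles for Grouped Data
Q1 position = $\left(\frac{n}{4}\right)$th, $Q_1 = L_q + \frac{C_q}{f_q}\left(\frac{n}{4} - \sum f_{q-1}\right)$
Q2 position = $\left(\frac{n}{2}\right)$th, $Q_2 = L_q + \frac{C_q}{f_q}\left(\frac{n}{2} - \sum f_{q-1}\right)$
Q3 position = $\left(\frac{3n}{4}\right)$th, $Q_3 = L_q + \frac{C_q}{f_q}\left(\frac{3n}{4} - \sum f_{q-1}\right)$
L_q = lower class boundary of the quartile class
f_q = frequency of the quartile class
C_q = size of the quartile class
∑f_{q−1} = cumulative frequency before the quartile class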
Example 22
Compute the Q1, Q2, Q3 for the following grouped

frequency distribution.
Weight (nearest kg) Frequency

60 – 62 3
63 – 65 4
66 – 68 5
69 – 71 6
72 – 74 2
Example 22 Solution
Weight (nearest kg)   Weight boundaries (kg)   Frequency, f   Cumulative Frequency, F
60 – 62 3
63 – 65 4
66 – 68 5
69 – 71 6
72 – 74 2
∑f = 20
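The grouped quartile formulas can be checked on the Example 22 data with the Python sketch below. The function `grouped_quantile` and its `fraction` argument are assumptions for illustration: fractions of 0.25, 0.5 and 0.75 correspond to the positions n/4, n/2 and 3n/4.

```python
def grouped_quantile(boundaries, freqs, fraction):
    """Interpolated value at position fraction * n in grouped data."""
    target = fraction * sum(freqs)   # e.g. n/4 for Q1
    before = 0                       # cumulative frequency before the current class
    for (lower, upper), f in zip(boundaries, freqs):
        if before + f >= target:
            # Q = L_q + (C_q / f_q) * (position - cumulative frequency before)
            return lower + (upper - lower) / f * (target - before)
        before += f
    raise ValueError("no classes given")

boundaries = [(59.5, 62.5), (62.5, 65.5), (65.5, 68.5), (68.5, 71.5), (71.5, 74.5)]
freqs = [3, 4, 5, 6, 2]

q1 = grouped_quantile(boundaries, freqs, 0.25)
q2 = grouped_quantile(boundaries, freqs, 0.50)
q3 = grouped_quantile(boundaries, freqs, 0.75)
print(round(q1, 1), round(q2, 1), round(q3, 1))
```

The same idea extends to other positions: a weight exceeded by 20% of the adults sits at the position 0.8n, so it could be estimated with a fraction of 0.8.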
Example 22 Solution

Example 22 Solution

Example 23
Compute the weight k that is exceeded by 20% of

the adults.
Weight (nearest kg) Number of adults
60 – 62 3
63 – 65 4
66 – 68 5
69 – 71 6
72 – 74 2

Example 23 Solution
Weight (nearest kg)   Weight boundaries (kg)   Frequency, f   Cumulative frequency
60 – 62 3
63 – 65 4
66 – 68 5
69 – 71 6
72 – 74 2

Example 23 Solution

1.13
Box-and-Whisker Plot
(Boxplot) & Outlier
Boxplot
A boxplot shows the spread of a distribution by using the

(1) smallest value
(2) largest value
(3) second quartile (or median)
(4) first quartile
(5) third quartile
(6) lower (inner) fence
(7) upper (inner) fence
It can be displayed horizontally or vertically.

Boxplot
Boxplots displayed horizontally:
[Figure: horizontal boxplot on a scale from 0 to 60, labelled from left to right: smallest value, 1st quartile, median, 3rd quartile, largest value.]
Boxplot
Boxplots displayed vertically:
[Figure: vertical boxplot on a scale from 0 to 60, labelled from bottom to top: smallest value, first quartile Q1, median Q2, third quartile Q3, largest value.]
Boxplot
• ‘Box’ starts from Q1 up to Q3 and contains

50% of the data in the middle of the
distribution.
• ‘Whisker’ starts from the box to the smallest

value and also from the box to the largest
value.
The ‘whisker’ displays the range of the data.

Boxplot
Boxplots for 3 types of distribution :
(1) Symmetrical distribution
(2) Positively skewed distribution

(skewed to the right)
(3) Negatively skewed distribution

(skewed to the left)

Boxplot for Symmetrical Distribution
‘whisker’ : same length

Median : centre of the box
Q2 − Q1 = Q3 − Q2
Boxplot for Distribution Skewed to the Right
‘whisker’ : left side shorter than right side

Median : nearer to 1st quartile
Q2 − Q1 < Q3 − Q2
Boxplot for Distribution Skewed to the Left
‘whisker’ : left side longer than right side

Median : nearer to 3rd quartile
Q2 − Q1 > Q3 − Q2
Example 24
The following data shows a summary of the marks for
Mathematics and Science for students in a class.
Subjects      Minimum   Maximum   Median   First quartile   Third quartile
Mathematics   10        90        60       45               70
Science       35        85        60       48               72
Draw two boxplots for this data and give comments

regarding the distribution of marks for Mathematics and
Science.

Example 24 Solution

Lower & Upper Fences
Sometimes, there occur values which are unusually

small or large in a set of data.
These extreme values occur probably because of an
error in recording the data.
Lower Fence is the value which is 1.5 times the

interquartile range smaller than the first quartile.
Upper Fence is the value which is 1.5 times the
interquartile range larger than the third quartile.

Outliers
Outliers: Points which lie outside the lower and upper
fences, i.e. points which are more than 1.5 times the
interquartile range above the 3rd quartile or below
the 1st quartile.
[Figure: boxplot with the lower fence 1.5 (Q3 − Q1) below Q1 and the upper fence 1.5 (Q3 − Q1) above Q3. Each whisker ends at the last value inside its fence, and outliers beyond the fences are marked with asterisks.]

Modifying Boxplot
Outliers are excluded from the boxplot. Hence, if

there are outliers in the data, the boxplot needs to
be modified as follows:
The left whisker extends from Q1 to the smallest
value within the lower fence.
The right whisker extends from Q3 to the largest
value within the upper fence.
All the outlying data are marked with asterisks *

Example 25
The English marks obtained by 14 students are

given by
61 35 53 48 61 62 57
42 69 64 55 65 59 67
(a) Find the median and interquartile range of the
data.
(b) Construct a boxplot to illustrate the data.
(c) Comment on the distribution.

Example 25 Solution
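The quartiles, fences and outliers for Example 25 can be worked through with the Python sketch below. The variable names are assumptions for illustration, and the quartiles use the median-of-halves method described earlier in these notes; other quartile conventions may give slightly different fences.

```python
def median(ranked):
    """Median of an already ranked list."""
    n = len(ranked)
    mid = n // 2
    return ranked[mid] if n % 2 == 1 else (ranked[mid - 1] + ranked[mid]) / 2

marks = [61, 35, 53, 48, 61, 62, 57, 42, 69, 64, 55, 65, 59, 67]
ranked = sorted(marks)
n = len(ranked)
half = n // 2

lower_half = ranked[:half]
upper_half = ranked[half + 1:] if n % 2 == 1 else ranked[half:]
q1, q2, q3 = median(lower_half), median(ranked), median(upper_half)
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr    # 1.5 x IQR below the first quartile
upper_fence = q3 + 1.5 * iqr    # 1.5 x IQR above the third quartile

outliers = [x for x in ranked if x < lower_fence or x > upper_fence]
inside = [x for x in ranked if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = inside[0], inside[-1]   # whisker ends in the modified boxplot

print(q1, q2, q3, iqr)
print(lower_fence, upper_fence, outliers, whisker_low, whisker_high)
```

In the modified boxplot, the left whisker therefore stops at the smallest mark inside the lower fence, and any mark below the fence is drawn as an asterisk.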

Example 25 Solution

Example 26
The following stem plot shows the maximum temperature
in °F for each day from 1st August to 23rd August in a
town. Draw a boxplot and use your boxplot to identify
the outliers. Comment on the data distribution.
Stem   Leaf
5      1
5      9
6      2 3 3 4 4 4 4 4
6      5 7 8 8 8 9 9
7      0 2 2 3
7      6 7
Key: 5|9 means 59°F
Example 26 Solution

The End of Topic 1
