Ibrahim Statistics - 221221 - 085821
Department of Mathematics
imalmanjahi@kku.edu.sa
1442 AH
Dr. Ibrahim Almanjahie Principles of Statistics and Probability
Definition of statistics
Statistics is the science of conducting studies to collect, organize,
summarize, analyze, and draw conclusions from data.
Branches of Statistics
A- Descriptive Statistics consists of the collection, organization,
summarization, and presentation of data.
Methods of Descriptive Statistics
✦ Frequency distributions (frequency tables), graphs, ...
✦ Measures of central tendency (averages), measures of dispersion, ...
B- Inferential Statistics consists of generalizing from samples to
populations, performing estimations and hypothesis tests, determining
relationships among variables, and making predictions.
Types of Population:
Sample:
It is the part of the population from which information is collected.
Taking a sample from the population saves time and effort, for example
when examining a sample of eggs or testing the lifetime of the light
bulbs produced by a factory.
Data
A set of observations taken during a specific study. Data may be
numerical (quantitative), such as the lengths and weights of a group of
students, or non-numerical (qualitative), such as skin color, gender, etc.
Variables:
Characteristics that vary from one person or thing to another.
Types:
✦ Qualitative variables are variables that yield non-numerical
data. For example, gender (male or female), hair colour, eye colour, ...
✦ Quantitative variables are variables that yield numerical data.
For example, weight, height, IQ score, ...
Sources of data collection: Two sources
Example (1-1):
We want to choose a sample of 30 students from the College of Science
students in the first stage. The number of admitted students is 130 in
the Department of Life Sciences, 110 in the Department of Chemistry,
50 in the Department of Mathematics, and 100 in the Department of
Physics. How many students do we choose from each department?
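Sol:
Total number of students $= 130 + 110 + 50 + 100 = 390$
(each share below is rounded to the nearest whole student)
From the Department of Life Sciences: $\frac{130}{390} \times 30 = 10$
From the Department of Chemistry: $\frac{110}{390} \times 30 \approx 8$
From the Department of Mathematics: $\frac{50}{390} \times 30 \approx 4$
From the Department of Physics: $\frac{100}{390} \times 30 \approx 8$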
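The allocation in Example (1-1) can be checked with a short Python sketch. The function name `proportional_allocation` and the dictionary layout are illustrative assumptions, not part of the course notes. Each share is rounded to the nearest whole student, so for other data the rounded shares may not add up exactly to the requested sample size.

```python
def proportional_allocation(stratum_sizes, sample_size):
    """Share of the sample taken from each group, proportional to its size."""
    total = sum(stratum_sizes.values())
    # (group size / total) * sample size, rounded to a whole student
    return {name: round(size / total * sample_size)
            for name, size in stratum_sizes.items()}

# Admitted students per department, as given in Example (1-1)
departments = {
    "Life Sciences": 130,
    "Chemistry": 110,
    "Mathematics": 50,
    "Physics": 100,
}

allocation = proportional_allocation(departments, 30)
for name, count in allocation.items():
    print(name, count)
```

For these figures the rounded shares are 10, 8, 4 and 8, which add up to the required 30 students.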
3- Cluster Sampling: Here, the population is divided into groups and these
groups are divided into subgroups, and so on so that the smallest subgroup is
called a cluster. Then, we choose from each cluster a simple random sample
to get a cluster sample.
Example (1-2):
We want to study the opportunities for appointing King Khalid University
students after graduation. How do we determine the best sample?
Sol: Use a cluster sample, because the students are grouped into colleges
and then into departments.
4- Systematic Sampling:
Researchers obtain systematic samples by numbering each subject of
the population and then selecting every $k$th subject.
Example (1-3):
Suppose there were 2000 subjects in the population and a sample of
50 subjects was needed.
Sol: Since 2000/50 = 40, then k = 40, and every 40th subject would
be selected; however, the first subject (numbered between 1 and 40)
would be selected at random. Suppose subject 12 were the first subject
selected; then the sample would consist of the subjects whose numbers
were 12, 52, 92, etc.
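As an illustration only, the selection in Example (1-3) can be sketched in Python. The function name `systematic_sample` and its `start` parameter are assumptions made for this sketch. When no start is given, the first subject is drawn at random between 1 and $k$.

```python
import random

def systematic_sample(population_size, sample_size, start=None):
    """Select every k-th subject, beginning at a start between 1 and k."""
    k = population_size // sample_size      # sampling interval
    if start is None:
        start = random.randint(1, k)        # random first subject in 1..k
    return [start + i * k for i in range(sample_size)]

# Example (1-3): 2000 subjects, sample of 50, first subject number 12
sample = systematic_sample(2000, 50, start=12)
print(sample[:3], "...", sample[-1])
```

With a start of 12 the list begins 12, 52, 92 and contains 50 subject numbers.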
Note: See page 727 for other types of sampling techniques.
Chapter 2
Frequency Distributions and Graphs
Content:
✦ Organizing Data
✦ Histograms, Frequency Polygons, and Ogives
✦ Other Types of Graphs
1. Organizing Data
✦ When data are collected in original form, they are called raw data.
✦ After collecting raw data, we organize them so that they are easier
to handle and study. The data are organized in a table called a
frequency distribution.
Frequency distribution is the organizing of raw data in table form, using
classes and frequencies; the frequency of a class is denoted by f .
To construct a frequency distribution for the data, we follow these
steps:
1. Find the lowest (L) and the highest (H) values in the raw data.
2. Calculate the range (R), where
$R = \text{highest value} - \text{lowest value} = H - L$.
3. Decide on the number ($n$) of classes (or intervals) desired; use 5 to 15
classes.
4. Find the width (W) of the class using $W = \frac{R}{n}$. (Always round up.)
5. Find the lower limit (LL) and upper limit (UL) of the first class by:
$LL = L, \quad UL = LL + W - 1$
6. For the second class limits, we use
$LL = \text{upper limit of first class} + 1, \quad UL = LL + W - 1$
and use the same method for the other classes. Then, calculate the
frequency for each class.
Example (2-1):
Data below shows the marks for 50 students in mathematics:
27 36 72 47 48 29 18 57 33 61 44 10 76 15 67 52 35
43 71 73 56 32 81 64 85 55 19 69 50 46 68 25 36 43
54 52 27 44 98 64 61 42 36 29 42 51 38 90 67 63.
Summarize the data in a frequency distribution table.
Sol:
1. Note that $L = 10$ and $H = 98$. Then $R = 98 - 10 = 88$.
2. In this example, we will choose $n = 9$. Then
$W = \frac{R}{n} = \frac{88}{9} \approx 9.78$, which is rounded up to $10$.
3. Find the lower limit (LL) and upper limit (UL) of the first class
by: $LL = 10, \quad UL = 10 + 10 - 1 = 19$.
For the second class limits, we use
$LL = 19 + 1 = 20, \quad UL = 20 + 10 - 1 = 29$, and so on for the other
classes. Then, find the frequency for each class.
4. The frequency distribution is finally constructed as follows.
Class Limits    Frequency $f_i$
10 − 19         4
20 − 29         5
30 − 39         7
40 − 49         9
50 − 59         8
60 − 69         9
70 − 79         4
80 − 89         2
90 − 99         2
Sum             50
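The construction steps can be reproduced with a minimal Python sketch for integer data. The function name `frequency_distribution` is an assumption for illustration. The width is rounded up with `math.ceil`, and classes are generated until the highest value is covered.

```python
import math

# Marks of the 50 students in Example (2-1)
marks = [
    27, 36, 72, 47, 48, 29, 18, 57, 33, 61, 44, 10, 76, 15, 67, 52, 35,
    43, 71, 73, 56, 32, 81, 64, 85, 55, 19, 69, 50, 46, 68, 25, 36, 43,
    54, 52, 27, 44, 98, 64, 61, 42, 36, 29, 42, 51, 38, 90, 67, 63,
]

def frequency_distribution(data, n_classes):
    """Return a list of (lower limit, upper limit, frequency) for integer data."""
    low, high = min(data), max(data)
    width = math.ceil((high - low) / n_classes)   # W = R / n, rounded up
    table = []
    lower = low
    while lower <= high:
        upper = lower + width - 1                 # UL = LL + W - 1
        freq = sum(lower <= x <= upper for x in data)
        table.append((lower, upper, freq))
        lower = upper + 1                         # next LL = previous UL + 1
    return table

table = frequency_distribution(marks, 9)
for lower, upper, freq in table:
    print(f"{lower} - {upper}: {freq}")
```

For the marks above this gives the same nine classes and frequencies as the table in the solution.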
Remark:
To facilitate the construction of the statistical tables derived from the
frequency distribution table, as well as the various statistical calculations
explained later, the boundaries of the classes in the frequency tables
must be real.
Sol.
Note that these bivariate data are qualitative data consisting of 20 grades
for chemistry and mathematics. We create a bivariate frequency table for
the data as follows (rows: Math grade, columns: Chemistry grade):
Math \ Chemistry    A    B    C    D    E    Sum
A                   2    1    2    0    0    5
B                   1    1    3    1    0    6
C                   2    3    2    0    0    7
D                   0    0    0    0    0    0
E                   0    0    0    1    1    2
Sum                 5    5    7    2    1    20
Note the ease of creating a bivariate frequency table in Example (2-2).
Sol:
Note that these bivariate data are quantitative (numerical) data for the marks
of 30 students in the subjects of statistics and mathematics. Since the marks
range from 50 to 100, an appropriate width for the class limits for both
statistics and mathematics in this example is 10. We construct the bivariate
frequency table for these data as follows:
(rows: Math marks, columns: Stat marks)
Math \ Stat    50−59    60−69    70−79    80−89    90−99    Sum
50−59          3        1        0        0        0        4
60−69          0        4        2        0        0        6
70−79          0        0        8        1        0        9
80−89          0        0        0        6        0        6
90−99          1        0        0        0        4        5
Sum            4        5        10       7        4        30
2. Graphic display
1. Histogram
[Figure: histogram with class boundaries 9.5, 19.5, ..., 99.5 on the x-axis]
As the histogram shows, the classes with the greatest number of data
values (9) are 39.5–49.5 and 59.5–69.5, followed by 49.5–59.5 with 8. The
graph also has two peaks.
2. Frequency Polygon
The frequency polygon is a graph that displays the data by using lines
that connect points plotted for the frequencies at the midpoints of the
classes (at x-axis). The frequencies are represented by the heights of
the points (at y-axis).
The frequency polygon for Example (2-1) is plotted using the following
steps:
Step 1: Find the midpoints of each class as in Example (2-1).
Then, label the x axis with the midpoint of each class, and the y
axis with the frequencies, i.e.
x 14.5 24.5 34.5 44.5 54.5 64.5 74.5 84.5 94.5
y 4 5 7 9 8 9 4 2 2
Step 2: Using the midpoints for the x values and the frequencies
as the y values, plot the points.
Step 3: Connect adjacent points with line segments.
[Figure: frequency polygon; x-axis: Marks (Midpoints), y-axis: Frequency (Number of students)]
3. Frequency Curve
By following the same steps used to draw the polygon, the frequency
curve can be drawn, except that the broken lines are smoothed into a
curve that passes through as many points as possible. For Example (2-1),
the frequency curve can be drawn as:
[Figure: frequency curve; x-axis: Marks (Midpoints), y-axis: Frequency (Number of students)]
Remark: The relative and percent frequency curves can be drawn in the same
way.
4. Cumulative Frequency graph, or ogive
The ogive is a graph that represents the cumulative frequencies for the
classes in a frequency distribution. Steps for plotting ogive are:
Step 1: Find the cumulative frequency for each class.
Step 2: Draw the x and y axes. Label the x axis with the class
boundaries. Use an appropriate scale for the y axis to represent
the cumulative frequencies.
Step 3: Plot the cumulative frequency at each upper class boundary.
Upper boundaries are used since the cumulative frequencies
represent the number of data values accumulated up to the upper
boundary of each class.
Step 4: Starting with the first class boundary, connect adjacent
points with line segments.
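A minimal Python sketch of Steps 1 to 3, using the class limits and frequencies of Example (2-1). The variable names are illustrative, and the class boundaries are assumed to lie 0.5 below and above the class limits (9.5, 19.5, ...), as on the histogram axis.

```python
from itertools import accumulate

# Upper class limits and frequencies from Example (2-1)
upper_limits = [19, 29, 39, 49, 59, 69, 79, 89, 99]
frequencies = [4, 5, 7, 9, 8, 9, 4, 2, 2]

# Step 1: cumulative frequency for each class
cumulative = list(accumulate(frequencies))

# Steps 2-3: one point per upper class boundary, starting from the
# first lower boundary with a cumulative frequency of zero
first_lower_boundary = 10 - 0.5
upper_boundaries = [u + 0.5 for u in upper_limits]
ogive_points = [(first_lower_boundary, 0)] + list(zip(upper_boundaries, cumulative))

for boundary, cum_freq in ogive_points:
    print(boundary, cum_freq)
```

Joining consecutive points in `ogive_points` with line segments (Step 4) gives the ogive; the last point carries the total frequency.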
[Figure: ogive; x-axis: Marks (Class), with class boundaries 9.5, 19.5, ..., 99.5; y-axis: cumulative frequency]
We introduce other types of graphs, which are among the most important
methods used to illustrate the relationship between variables. These are:
✦ Line Graph
✦ Bar Graph
✦ Pie Chart
✦ Stem-and-Leaf diagrams
Line Graph (or a time series graph)
A line graph is a type of chart used to represent data that occur over
a specific period of time. The horizontal axis represents time (years,
months, or days) and the vertical axis represents the values of data.
Example (2-4):
The following table contains information collected about the speed of
a particle at certain time periods:
Time (s)       0    1    2    3     4     5     6
Speed (m/s)    0    3    7    12    20    30    45
[Figure: line graph for Example (2-4); x-axis: Time (s), y-axis: Speed (m/s)]
Example (2-5):
The following table contains information gathered on the number of
secondary schools for boys and girls in the Kingdom of Saudi Arabia
from 1395 to 1401 H. Construct a compound line graph for the data.
The data in the above table can be plotted using a compound line
graph. A different colour or pattern should be used for each line.
In this example, the x-axis represents “Year” and the y-axis
represents “Number of schools”.
[Figure: compound line graph; x-axis: Years, y-axis: Number of Schools; one line for Boy Schools and one for Girl Schools]
Bar Graphs
A bar graph represents the data by using vertical or horizontal bars
whose heights or lengths represent the frequencies of the data. Bar
graphs can be used when the data are qualitative or categorical.
Bar graphs differ from histograms in three main ways:
• The columns (bars) are positioned over a label that represents a
categorical variable.
• The columns do not have a class width.
• There is a gap between columns.
There are three types of bar graphs:
• Simple bar chart
• Grouped bar chart
• Stacked bar chart
[Figure: simple bar chart; x-axis: Years (1395 to 1400), y-axis: Number of Schools]
• Grouped bar chart
is used to represent and compare different categories of two or more
groups.
The grouped bar graph is drawn using the following steps:
[Figure: grouped bar chart; x-axis: Years, y-axis: Number of Schools; bars for Boy Schools and Girl Schools]
• Stacked bar chart
is used to break down and compare parts of a whole. Each bar in the
chart represents a whole, and segments in the bar represent different
parts or categories of that whole.
For Example (2-5), the evolution of the number of boy schools and the
number of girl schools in each year can be compared by using a
stacked bar graph as follows:
[Figure: stacked bar chart; x-axis: Years (1395 to 1400), y-axis: Number of Schools; segments for Boy Schools and Girl Schools]
Pie Chart
A pie graph is a circle that is divided into sections or wedges according to the
percentage of frequencies in each category of the distribution.
To construct a pie graph for the data, follow these steps:
Step 1: Since there are 360° in a circle, the frequency for each class
must be converted into a proportional part of the circle. This conversion
is done by using the formula
$\text{Degrees} = \frac{f}{n} \times 360^\circ$,
where $f$ is the frequency for each class and $n$ is the sum of the
frequencies.
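The conversion in Step 1 can be sketched in Python as follows. The category names and frequencies here are hypothetical values chosen only to illustrate the formula; they do not come from the course data.

```python
def pie_angles(frequencies):
    """Convert class frequencies to pie-chart angles: degrees = f / n * 360."""
    n = sum(frequencies.values())
    return {label: f * 360 / n for label, f in frequencies.items()}

# Hypothetical frequencies, for illustration only
example_frequencies = {"A": 10, "B": 25, "C": 15}

angles = pie_angles(example_frequencies)
for label, degrees in angles.items():
    print(label, degrees)
```

The angles always add up to 360°, which is a quick check on the arithmetic.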
Using the data in the previous table, the pie chart can be drawn as
follows:
[Figure: pie chart with sectors Africa 23%, Asia 36%, South America 13%, Australia 6%, Europe 4%, North America 18%]
Sometimes the readings are given as percentages. In that case, the
angles are calculated as follows:
$\text{Angle} = \% \times 3.6^\circ$
Exercise (2-1):
The following percentages indicate the sources of energy used
worldwide. Construct the pie graph.
Energy                       Percentage
Petroleum                    39.8
Coal                         23.2
Dry natural gas              22.4
Hydroelectric                7.0
Nuclear                      6.4
Other (wind, solar, etc.)    1.2
Stem-and-Leaf diagrams
A stem and leaf plot is a data plot that uses part of the data value as the
stem and part of the data value as the leaf to form groups or classes. It
was presented by the statistician John Tukey for the first time in 1960.
Its advantages are:
• It helps to gain a broad idea of the data in terms of the extent of
data and how it is centered.
• Clarifies any gaps in the data given and reveals the extreme values
in the data.
A stem-and-leaf entry has two parts:
1- The leaf is the rightmost digit of the number.
2- The stem is the rest of the number.
For example, for the number 35, the leaf is 5 and the stem is 3. For the
number 137, the leaf is 7 and the stem is 13.
To construct the stem-and-Leaf diagram, follow these steps:
Step 1: Arrange the data in order.
Step 2: Separate the data according to the first digit.
Step 3: A display can be made by using the leading digit as the
stem and the trailing digit as the leaf.
Example (2-7):
At an outpatient testing center, the number of cardiograms performed
each day for 20 days is shown. Construct a stem and leaf plot for the
data.
25 31 20 32 13 14 43 02 57 23
36 32 33 32 44 32 52 44 51 45
Sol:
Step 1: 02, 13, 14, 20, 23, 25, 31, 32, 32, 32, 32, 33, 36, 43, 44,
44, 45, 51, 52, 57
Step 2: 02 | 13, 14 | 20, 23, 25 | 31, 32, 32, 32, 32, 33, 36 |
43, 44, 44, 45 | 51, 52, 57
Step 3:
Stem | Leaves
0 | 2
1 | 3 4
2 | 0 3 5
3 | 1 2 2 2 2 3 6
4 | 3 4 4 5
5 | 1 2 7
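The three steps can be sketched in Python for non-negative whole numbers. The function name `stem_and_leaf` is an assumption for illustration. The value 02 is written as 2, since Python does not accept integers with a leading zero.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Group sorted values by stem (all digits but the last) and list the leaves."""
    plot = defaultdict(list)
    for value in sorted(data):            # Step 1: arrange the data in order
        stem, leaf = divmod(value, 10)    # Step 2: split off the last digit
        plot[stem].append(leaf)
    return dict(plot)

# Number of cardiograms per day, Example (2-7)
cardiograms = [25, 31, 20, 32, 13, 14, 43, 2, 57, 23,
               36, 32, 33, 32, 44, 32, 52, 44, 51, 45]

plot = stem_and_leaf(cardiograms)
for stem in sorted(plot):                 # Step 3: display stem | leaves
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
```

The printed rows match the display in Step 3 of the solution.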
From the stem and leaf plot we see that the distribution peaks in the
center and that there are no gaps in the data. For 7 of the 20 days, the
number of patients receiving cardiograms was between 31 and 36. The
plot also shows that the testing center treated from a minimum of 2
patients to a maximum of 57 patients in any one day.
Chapter 3
A. Measures of Central Tendency
Content:
✦ Mean
✦ Weighted Mean
✦ Median
✦ Mode
✦ Geometric Mean
✦ Harmonic Mean
Summation ($\sum$)
The Mean is one of the most important and best measures of central
tendency and one of the most common and used in statistical analysis
due to its good statistical properties and characteristics.
Definition
The mean is the sum of the values, divided by the total number of
values. The symbol x̄ represents the sample mean and µ represents
the population mean.
To find the mean of the data, we must differentiate between two cases:
Assume that the number of data (sample size) is n and that the sample
observations are x1 , x2 , ..., xn . Then, the mean (arithmetic average)
is calculated by
\[ \bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \]
Similarly, for a population of $N$ values, the population mean is
\[ \mu = \frac{x_1 + x_2 + \dots + x_N}{N} = \frac{\sum_{i=1}^{N} x_i}{N} \]
Example(3-1)
The data represent the number of days off per year for a sample of
individuals selected from nine different countries:
20, 26, 40, 36, 23, 42, 35, 24, 30. Find the mean.
Sol:
\[ \bar{x} = \frac{\sum_{i=1}^{9} x_i}{n} = \frac{x_1 + x_2 + \dots + x_9}{9} \]
\[ = \frac{20 + 26 + 40 + 36 + 23 + 42 + 35 + 24 + 30}{9} = \frac{276}{9} \approx 30.7 \text{ (days)} \]
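The same calculation as a short Python check; the variable names are illustrative.

```python
# Days off per year for the nine countries in Example (3-1)
days_off = [20, 26, 40, 36, 23, 42, 35, 24, 30]

total = sum(days_off)                  # sum of the values
sample_mean = total / len(days_off)    # divided by the number of values

print(total, round(sample_mean, 1))
```

The sum is 276 and the mean rounds to 30.7 days, as in the solution.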
9
• The original data are unknown, but the number of observations in each
class (the class frequency) is known.
• To compute the mean, the midpoint of each class is used as the mean
value of all raw data in that class.
Example (3-2)
Find the mean of the daily wage for a number of workers (in Riyals)
in a factory for a sample of 50 people whose wages are summarized in
the following table.
Boundary class limits    Frequency $f_i$
20 − 29                  9
30 − 39                  12
40 − 49                  15
50 − 59                  8
60 − 69                  4
70 − 79                  2
Sum                      $n = \sum f_i = 50$
Sol:
First, we construct the following table:
Boundary class limits    Midpoints $x_i$    Frequency $f_i$       $x_i f_i$
20 − 29                  24.5               9                     220.5
30 − 39                  34.5               12                    414.0
40 − 49                  44.5               15                    667.5
50 − 59                  54.5               8                     436.0
60 − 69                  64.5               4                     258.0
70 − 79                  74.5               2                     149.0
Sum                                         $n = \sum f_i = 50$   $\sum x_i f_i = 2145$
\[ \bar{x} = \frac{\sum_{i=1}^{k} x_i f_i}{\sum f_i} = \frac{2145}{50} = 42.90 \text{ SAR} \]
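A minimal Python sketch of the grouped-data mean, using the wage classes of Example (3-2). The tuple layout (lower limit, upper limit, frequency) is an assumption made for this illustration.

```python
# (lower limit, upper limit, frequency) for each class in Example (3-2)
wage_classes = [
    (20, 29, 9),
    (30, 39, 12),
    (40, 49, 15),
    (50, 59, 8),
    (60, 69, 4),
    (70, 79, 2),
]

n = sum(f for _, _, f in wage_classes)                        # sum of f_i
weighted_total = sum((lo + hi) / 2 * f                        # midpoint x_i times f_i
                     for lo, hi, f in wage_classes)
grouped_mean = weighted_total / n

print(n, weighted_total, grouped_mean)
```

This reproduces the column totals of the table (50 and 2145) and the mean of 42.90 SAR.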
\[ \sum_{i=1}^{n} (x_i - \bar{x}) = (x_1 - \bar{x}) + (x_2 - \bar{x}) + \dots + (x_n - \bar{x}) = 0 \]
Sol:
Note that $\bar{x} = 20$. Then, using the property of the mean, the solution is
\[ \frac{\bar{x} - 11}{3} = \frac{20 - 11}{3} = \frac{9}{3} = 3 \]
3- Assume that we have two samples of data, where $n_1$ and $\bar{x}_1$ are
the sample size and the mean of the first sample, and $n_2$ and $\bar{x}_2$
are the sample size and the mean of the second sample. Then, the mean of
the two samples combined can be calculated by the following formula:
\[ \bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2} \]
• The mean is found by using all the values of the data and is easily
subjected to algebraic operations.
• The mean for the data set is unique and not necessarily one of the data
values.
Demerits of the mean:
Sometimes, you must find the mean of a data set in which not all values
are equally represented. The type of mean used in this case is
called the weighted mean.
Definition
Assume we have the sample x1 , x2 , . . . , xn with the corresponding
weights w1 , w2 , . . . , wn . Then, the weighted mean is computed by
\[ \bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} \]
Example(3-4):
Find the weighted mean of a student’s marks, given that weight is the
number of hours, for the courses listed below.
Course Number of hours (wi ) Marks (xi )
Math 3 60
Physics 4 75
Biology 4 82
Chemistry 4 70
Sol:
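\[ \bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} \]
\[ = \frac{(3 \times 60) + (4 \times 75) + (4 \times 82) + (4 \times 70)}{3 + 4 + 4 + 4} = \frac{1088}{15} \approx 72.53 \text{ marks} \]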
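The weighted mean of Example (3-4) can be checked with a short Python sketch; the function name `weighted_mean` is an assumption for illustration.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights w_i."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Example (3-4): marks and credit hours for Math, Physics, Biology, Chemistry
marks = [60, 75, 82, 70]
hours = [3, 4, 4, 4]

result = weighted_mean(marks, hours)
print(round(result, 2))
```

Setting every weight to 1 in the same function gives the ordinary mean of the values.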
Remarks:
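1- The mean for grouped data is a weighted mean in which the class
frequencies are the weights, i.e.
$w_1 = f_1, \; w_2 = f_2, \; \dots, \; w_k = f_k$
2- The mean for raw data is a weighted mean in which all weights are
equal to one, i.e.
$w_1 = w_2 = \dots = w_n = 1$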
3- The general GPA and the semester GPA for a student at King
Khalid University are two weighted means for points, given that
the hours are the weights.
3. Median
The median is the halfway point in a data set. Before you can find
this point, the data must be arranged in order. When the data set is
ordered, the median is the middle, i.e. 50% of data is equal to or less
than the median and 50% of data is equal to or more than the median.
The median either will be a specific value in the data set or will fall
between two values.
Definition
The median is the midpoint of the data array. The symbol for the
median is MD.
To find the median for raw data, follow these steps:
Step 1: Arrange the data in order.
Step 2: Select the middle value.
Step 3: If the sample size $n$ is odd, then the median is the
actual value in the middle. If the sample size $n$ is even, then the
median falls between the two middle values and is taken as their
average.
Example (3-5):
The following two sample data sets are the heights (in cm) of
science college students. Find the median of each sample:
Sample 1: 130, 145, 138, 142, 160, 158, 148
Sample 2: 135, 130, 145, 138, 142, 160, 158, 148
Sol:
Sample 1: 130, 138, 142, 145, 148, 158, 160
Since $n = 7$, then MD = 145 cm.
Sample 2: 130, 135, 138, 142, 145, 148, 158, 160
Since $n = 8$, then $MD = \frac{142 + 145}{2} = 143.5$ cm.
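The three steps can be sketched in Python as below. The function name `median_raw` is an assumption for illustration; Python's standard `statistics.median` follows the same rule and is used as a cross-check.

```python
import statistics

def median_raw(data):
    """Median of raw data: middle value, or the average of the two middle values."""
    ordered = sorted(data)                 # Step 1: arrange the data in order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                         # odd n: the actual middle value
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2   # even n: average

sample_1 = [130, 145, 138, 142, 160, 158, 148]
sample_2 = [135, 130, 145, 138, 142, 160, 158, 148]

print(median_raw(sample_1), median_raw(sample_2))
print(statistics.median(sample_1), statistics.median(sample_2))
```

Both approaches give 145 cm for Sample 1 and 143.5 cm for Sample 2.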
Second: Median for grouped data
There are two methods to calculate the median for the summarized data
in a frequency table:
A- Computational method: We follow these steps:
Step 1: Construct the cumulative frequency table.
Step 2: Determine $\frac{n}{2}$, one-half of the sample size.
Then, locate the median class in the cumulative frequency table by
using $\frac{n}{2}$.
Step 3: Determine the following:
A: The lower limit of the median class.
L: The class width of the median class.
$F_1$: The cumulative frequency preceding the median class (just before $\frac{n}{2}$).
$F_2$: The cumulative frequency following it (just after $\frac{n}{2}$).
Example(3-6):
Use Example (3-2) to find the median value, computationally, of
worker daily wages (in Riyals).
Sol:
Note that
\[ \frac{n}{2} = \frac{50}{2} = 25 \]
Now, we construct the cumulative frequency table for Example (3-2)
as follows.
Daily wages              Cumulative frequency
Less than 19.5           0
Less than 29.5           9
Less than A = 39.5       21 = $F_1$
                         ($\frac{n}{2} = 25$ falls between $F_1$ and $F_2$)
Less than 49.5           36 = $F_2$
Less than 59.5           44
Less than 69.5           48
Less than 79.5           50
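The median formula itself does not appear in this part of the notes. The Python sketch below therefore assumes the usual linear interpolation inside the median class, $MD = A + \frac{\frac{n}{2} - F_1}{F_2 - F_1} \times L$, built from the quantities A, L, $F_1$ and $F_2$ defined in Step 3. The function name `grouped_median` is also an assumption.

```python
def grouped_median(boundaries, cumulative):
    """Interpolated median from a 'less than' cumulative frequency table.

    boundaries[i] is the boundary whose cumulative frequency is cumulative[i].
    Assumes linear interpolation inside the median class.
    """
    half = cumulative[-1] / 2                       # n / 2
    for i in range(1, len(cumulative)):
        if cumulative[i] >= half:                   # first row reaching n / 2
            A = boundaries[i - 1]                   # lower boundary of median class
            L = boundaries[i] - boundaries[i - 1]   # class width
            F1, F2 = cumulative[i - 1], cumulative[i]
            return A + (half - F1) / (F2 - F1) * L
    raise ValueError("cumulative table never reaches n / 2")

# Cumulative table for the daily wages of Example (3-2)
boundaries = [19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5]
cumulative = [0, 9, 21, 36, 44, 48, 50]

median_wage = grouped_median(boundaries, cumulative)
print(round(median_wage, 2))
```

Under this assumption the median class is 39.5 to 49.5 and the result is about 42.17 SAR.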
[Figure: ogive of the daily wages; x-axis: Daily wage (boundaries 19.5 to 79.5), y-axis: cumulative frequency; the level 25 is read off at MD ≈ 42]
Definition
The value that occurs most often in a data set is called the mode. The
symbol for mode is MOD.
To find the mode of the data, we must differentiate between two cases:
• Mode for raw data.
• Mode for grouped data.
In the following, we discuss the above two cases in detail.
First: Mode for raw data
For the raw data, we look for the value that occurs most often in the
data set. This value is the mode.
Example (3-8):
Find the mode for the following data sets:
Data set 1: 65, 55, 31, 65, 48, 65
Data set 2: 11, 23, 14, 17, 25
Data set 3: 33, 35, 33, 24, 28, 31, 29, 24
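A minimal Python sketch for finding the mode of raw data. The function name `modes` is an assumption, and so is the convention used here: when no value occurs more than once, the data set is reported as having no mode (an empty list); when several values tie for the highest count, all of them are returned.

```python
from collections import Counter

def modes(data):
    """Return the most frequent value(s); an empty list if nothing repeats."""
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []                       # every value occurs once: no mode
    return sorted(value for value, count in counts.items() if count == highest)

data_set_1 = [65, 55, 31, 65, 48, 65]
data_set_2 = [11, 23, 14, 17, 25]
data_set_3 = [33, 35, 33, 24, 28, 31, 29, 24]

print(modes(data_set_1), modes(data_set_2), modes(data_set_3))
```

Data set 1 has the single mode 65, data set 2 has no repeated value, and data set 3 has the two modes 24 and 33.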
Sol:
There are two methods to calculate the mode for the summarized data
in a frequency table:
A- Computational method: We follow these steps:
• Step 1: From the frequency table, identify the class that has the
highest frequency (the modal class). Denote the frequency of the
modal class by $F$.
Example(3-9):
Use Example (3-2) to find the mode value, computationally, of
worker daily wages (in Riyals).
Sol:
MOD = 42.5 SAR
B- Graphical method:
To find the mode graphically, use the following steps:
• Step 1: Plot the histogram of the given grouped data.
• Step 2: Identify the modal class and the bar representing it.
• Step 3: Draw lines from the top corners of the modal class bar to
the near corners of the neighboring bars.
• Step 4: Draw a perpendicular line from the intersection of the two
lines until it touches the horizontal axis. Then, read the mode from
the horizontal axis (x-axis).
Example(3-10):
Use Example (3-2) to find the mode value, graphically, of worker
daily wages (in Riyals).
Sol:
[Figure: histogram of the daily wages with the graphical construction of the mode; x-axis: Daily wage, y-axis: Frequency (Number of workers)]
From the green circle on the x-axis of the above figure, the mode is
approximately equal to 42 SAR.
\[ \frac{\text{Mode} - \text{Mean}}{3} = \text{Median} - \text{Mean} \]
Definition
The geometric mean (GM) is defined as the $n$th root of the product of
$n$ values.
\[ G.M = \sqrt[n]{x_1 x_2 \dots x_n} \]
The above formula is not easy to compute when the sample size is large.
To overcome this problem, we take the logarithm of both sides,
i.e.,
\[ \log G.M = \log \sqrt[n]{x_1 x_2 \dots x_n} = \log (x_1 x_2 \dots x_n)^{\frac{1}{n}} = \frac{1}{n} \log (x_1 x_2 \dots x_n) = \frac{1}{n} \left( \sum_{i=1}^{n} \log x_i \right) \]
Hence,
\[ G.M = 10^{M}, \quad \text{where } M = \frac{1}{n} \left( \sum_{i=1}^{n} \log x_i \right) \]
Example (3-10):
Find the GM for 8, 4, 2.
Sol: $G.M = \sqrt[3]{(2)(4)(8)} = \sqrt[3]{64} = 4$
or
$\log G.M = \frac{1}{3}(\log 2 + \log 4 + \log 8) = 0.60206 \Rightarrow G.M = 10^{0.60206} = 4.$
Remark:
The previous example can also be solved using the natural logarithm
as follows:
$\ln G.M = \frac{1}{3}(\ln 2 + \ln 4 + \ln 8) = 1.386 \Rightarrow G.M = \exp(1.386) = 4.$
Example (3-10):
A person receives a 20% raise after 1 year of service and a 10%
raise after the second year of service. Find the average percentage
raise per year.
Sol: Note that the average percentage raise per year is not 15% but
14.89%, as shown:
\[ G.M = \sqrt[2]{(1.20)(1.10)} \approx 1.1489 \]
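Both geometric mean examples can be checked with a short Python sketch that uses the logarithm form derived above (here with the natural logarithm). The function name `geometric_mean` is an assumption for illustration; all values must be positive.

```python
import math

def geometric_mean(values):
    """exp of the average of ln(x_i); equivalent to the n-th root of the product."""
    return math.exp(sum(math.log(x) for x in values) / len(values))

gm_values = geometric_mean([8, 4, 2])         # first example
gm_raises = geometric_mean([1.20, 1.10])      # growth factors for 20% and 10% raises

print(round(gm_values, 4), round(gm_raises, 4))
```

The second result is a growth factor, so the average raise per year is the factor minus one, about 14.89%.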
To find the geometric mean for grouped data, follow these two steps:
Step 1: Find the midpoints of the classes: $x_1, x_2, \dots, x_k$.
Step 2: Compute the geometric mean for grouped data by the
following formula:
\[ G.M = \sqrt[n]{x_1^{f_1} x_2^{f_2} \dots x_k^{f_k}}, \qquad (1) \]
where $f_1, f_2, \dots, f_k$ are the class frequencies and $n = \sum_{i=1}^{k} f_i$.
When the sample size is large, we may prefer to work with
logarithms. Taking the logarithm of equation (1), we get
\[ \log G.M = \log \left( \sqrt[n]{x_1^{f_1} x_2^{f_2} \dots x_k^{f_k}} \right) = \log \left( x_1^{f_1} x_2^{f_2} \dots x_k^{f_k} \right)^{\frac{1}{n}} = \frac{1}{n} \log \left( x_1^{f_1} x_2^{f_2} \dots x_k^{f_k} \right) = \frac{1}{n} \left( \sum_{i=1}^{k} f_i \log x_i \right) \]
\[ \therefore \; GM = 10^{M}, \quad \text{where } M = \frac{1}{n} \left( \sum_{i=1}^{k} f_i \log x_i \right) \]
Definition
The harmonic mean (HM) is defined as the number of values divided
by the sum of the reciprocals of each value.
Remark:
The harmonic mean is useful for finding the average speed.
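Example(3-11):
Find the harmonic mean for 8, 4, 2.
Sol:
Note that n = 3. Then
$$\frac{1}{H.M} = \frac{1}{3} \sum_{i=1}^{3} \frac{1}{x_i} = \frac{1}{3}\left(\frac{1}{2} + \frac{1}{4} + \frac{1}{8}\right) = \frac{1}{3} \times \frac{7}{8} = \frac{7}{24}$$
Hence,
$$H.M = \frac{24}{7} = 3.43$$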
Example(3-12):
Suppose a person drove 100 miles at 40 miles per hour and returned
at 50 miles per hour. Find the average speed in miles per hour.
Sol: Note that the average speed is not 45 miles per hour,
which is found by adding 40 and 50 and dividing by 2. The average is
found as shown below:
$$HM = \frac{2}{\frac{1}{40} + \frac{1}{50}} = 44.44 \text{ miles per hour.}$$
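The result can be cross-checked against total distance divided by total time. The sketch below is illustrative only; `harmonic_mean` is a hypothetical helper name, and the distances are those stated in the example.

```python
def harmonic_mean(values):
    """Number of values divided by the sum of their reciprocals."""
    return len(values) / sum(1 / v for v in values)

# Average speed over two equal 100-mile legs at 40 mph and 50 mph
hm_speed = harmonic_mean([40, 50])

# Cross-check: total distance / total time
total_time = 100 / 40 + 100 / 50      # hours
direct_speed = 200 / total_time

print(round(hm_speed, 2), round(direct_speed, 2))
```

The two numbers agree because both legs cover the same distance, which is the situation where the harmonic mean is the right average of speeds.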
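For grouped data with class midpoints $x_1, \dots, x_k$ and class frequencies $f_1, \dots, f_k$:
$$\frac{1}{H.M} = \frac{\frac{f_1}{x_1} + \frac{f_2}{x_2} + \dots + \frac{f_k}{x_k}}{n} = \frac{1}{n}\left(\frac{f_1}{x_1} + \frac{f_2}{x_2} + \dots + \frac{f_k}{x_k}\right) = \frac{1}{n} \sum_{i=1}^{k} \frac{f_i}{x_i}$$
$$\therefore H.M = \frac{n}{\sum_{i=1}^{k} \frac{f_i}{x_i}}$$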
Sol:
To simplify the solution, we construct the following table.
$$\frac{1}{H.M} = \frac{1}{n} \sum_{i=1}^{k} \frac{f_i}{x_i} = \frac{1}{50} \times 1.288 \;\Longrightarrow\; H.M = \frac{50}{1.288} = 38.82 \text{ SAR}$$
Note that, from the previous results, we see the following result:
HM ≤ GM ≤ x̄
B. Measures of Position
Quartiles, Deciles and Percentiles
Remember that the median is the midpoint of the arranged data array.
50% 50%
Smallest |...................................|...................................| Largest
MD
Quartiles
divide the distribution into four groups, separated by Q1 , Q2 , Q3 .
The above division points are:
Q1 = 1st quartile = the value with 25% of the data below it, and its rank = $\frac{n}{4}$
Q2 = 2nd quartile = the value with 50% of the data below it, and its rank = $\frac{2n}{4}$
Q3 = 3rd quartile = the value with 75% of the data below it, and its rank = $\frac{3n}{4}$
Deciles
divide the distribution into 10 groups, as shown below.
1. $Med = Q_2 = D_5 = P_{50}$
2. $Q_1 = P_{25}$, $Q_3 = P_{75}$
4. Quartile number k is denoted by $Q_k$ and its rank is $\frac{kn}{4}$.
5. Decile number k is denoted by $D_k$ and its rank is $\frac{kn}{10}$.
6. Percentile number k is denoted by $P_k$ and its rank is $\frac{kn}{100}$.
Example (3-13):
A teacher gives a 20-point test to 10 students. The scores are:
18, 15, 12, 6, 8, 2, 3, 5, 20, 10
1. Find the 3rd quartile.
2. Find the 25th percentile.
3. Find the 5th decile.
4. Find the percentile rank of a score of 12.
Sol:
Arrange the data in order from lowest to highest:
Percentile rank of 12 $= \frac{6 + 0.5}{10} \times 100\% = 65$th percentile.
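The percentile rank in part 4 counts the values below the score, adds 0.5, and divides by n. A short Python sketch of that rule follows; `percentile_rank` is an illustrative name, not something defined in the notes.

```python
def percentile_rank(data, value):
    """(number of values below `value` + 0.5) / n * 100."""
    below = sum(1 for x in data if x < value)
    return (below + 0.5) / len(data) * 100

scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
print(round(percentile_rank(scores, 12), 1))
```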
Example(3-14):
Use Example (3-2) data that represents the worker daily wages (in
Riyals) to find:
(i) D2
(ii) P99
Sol:
$$R(D_2) = \frac{2n}{10} = \frac{2 \times 50}{10} = 10, \qquad R(P_{99}) = \frac{99n}{100} = \frac{99 \times 50}{100} = 49.5$$
3 Use the table to locate the positions of R(D2 ) and R(P99 ) and
then compute D2 and P99 .
Daily wages Cumulative frequency
Less than 19.5 0
Less than 29.5 9 = F1
Less than 39.5 21 = F2
Less than 49.5 36
Less than 59.5 44
Less than 69.5 48 = F1
Less than 79.5 50 = F2
(i) A = 29.5, L = 39.5 − 29.5 = 10, F1 = 9, F2 = 21
$$D_2 = A + \left(\frac{\frac{2n}{10} - F_1}{F_2 - F_1}\right) \times L = 29.5 + \left(\frac{10 - 9}{21 - 9}\right) \times 10 = 30.33$$
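The same interpolation applies to any rank once the cumulative frequency table is available. The sketch below uses the cumulative table of this example; `grouped_quantile` is a hypothetical helper, and it assumes the rank falls inside a class with nonzero frequency.

```python
def grouped_quantile(boundaries, cum_freq, rank):
    """Linear interpolation A + (rank - F1) / (F2 - F1) * L in the class
    whose cumulative frequencies F1, F2 bracket the rank."""
    for i in range(1, len(boundaries)):
        f1, f2 = cum_freq[i - 1], cum_freq[i]
        if f2 > f1 and f1 <= rank <= f2:
            lower = boundaries[i - 1]                  # A
            width = boundaries[i] - boundaries[i - 1]  # L
            return lower + (rank - f1) / (f2 - f1) * width
    raise ValueError("rank is outside the table")

boundaries = [19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5]
cum_freq = [0, 9, 21, 36, 44, 48, 50]
n = 50

d2 = grouped_quantile(boundaries, cum_freq, 2 * n / 10)
p99 = grouped_quantile(boundaries, cum_freq, 99 * n / 100)
print(round(d2, 2), round(p99, 2))
```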
Sol:
Note that the cumulative frequency table is already constructed in
Example(3-6). Therefore, the ogive is plotted as follows:
[Figure: ogive of cumulative frequency (number of workers) against daily wage, with class boundaries from 19.5 to 79.5 on the horizontal axis. Reading from the curve gives D2 ≈ 30 and P99 ≈ 77.]
Remark
The units of the measures of central tendency, as well as of quartiles,
deciles, and percentiles, are the same as the original units of the data.
If the data are in Riyals, then all measures of central tendency, as
well as quartiles, deciles, and percentiles, are in Riyals.
Chapter 4
Measures of Dispersion
Content:
✦ Range and Semi-interquartile range
✦ Mean deviation
✦ Standard deviation
✦ Coefficient of variation
✦ Chebyshev’s Inequality
✦ Z-score
Range
First: Range for raw data
The range is useful for showing the spread within a dataset and for
comparing the spread between similar datasets.
Second: Range for grouped data
There are two methods for calculating the range in the case of grouped data:
Example (4-1):
Find the range for the 30 students marks in the statistics course shown in the
following table:
Classes 24-28 29-33 34-38 39-43 44-48 49-53
Frequency 3 4 7 6 8 2
Sol:
We notice that the two methods give two different answers for the
range. The first method is preferable in calculating the range.
Merits of Range:
Demerits of Range:
Semi-interquartile range
We already know that the range is affected by extreme values and,
because of this, it is not a reliable measure of the dispersion of the
data. Therefore, we need another measure that is not affected by
extreme values at the top and the bottom of the data. This measure is
the semi-interquartile range.
To find the semi-interquartile range for raw data, follow these steps:
Step 1: Arrange the data in order.
Step 2: Calculate the first and third quartiles, Q1 and Q3.
Step 3: Compute the semi-interquartile range by $Q = \frac{Q_3 - Q_1}{2}$.
Example (4-2):
Find the semi-interquartile range for the following two samples:
Sample 1: 22, 24, 36, 21, 25, 30, 20, 28
Sample 2: 21, 20, 25, 17, 19, 15, 22, 18, 23, 24
Sol:
Sample 1: order the data first:
Then, $Q_1 = \frac{21+22}{2} = \frac{43}{2} = 21.5$ and $Q_3 = \frac{28+30}{2} = \frac{58}{2} = 29$. Hence,
$$Q = \frac{Q_3 - Q_1}{2} = \frac{29 - 21.5}{2} = \frac{7.5}{2} = 3.75$$
$$Q = \frac{Q_3 - Q_1}{2} = \frac{23 - 18}{2} = \frac{5}{2} = 2.5$$
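Both results can be reproduced with a short Python sketch. The quartile rule used here is an assumption inferred from the two worked answers: when the rank kn/4 is a whole number, average that value and the next one; otherwise round the rank up. The helper names are illustrative.

```python
import math

def value_at_rank(sorted_data, rank):
    """Assumed rule: whole-number rank -> average of that value and the next;
    fractional rank -> round up to the next position."""
    if rank == int(rank):
        k = int(rank)
        return (sorted_data[k - 1] + sorted_data[k]) / 2
    return sorted_data[math.ceil(rank) - 1]

def semi_interquartile_range(data):
    s = sorted(data)
    n = len(s)
    q1 = value_at_rank(s, n / 4)
    q3 = value_at_rank(s, 3 * n / 4)
    return (q3 - q1) / 2

sample1 = [22, 24, 36, 21, 25, 30, 20, 28]
sample2 = [21, 20, 25, 17, 19, 15, 22, 18, 23, 24]
print(semi_interquartile_range(sample1), semi_interquartile_range(sample2))
```

Other textbooks and software use different quartile conventions, so library functions may return slightly different values for the same data.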
Second: Semi-interquartile range for grouped data
Exercise (4-1):
Use Example (4-1) to find the semi-interquartile range.
2. Mean deviation
The sum of the deviations of the items from their arithmetic mean is
always zero. To avoid this cancellation, we use absolute deviations,
which leads to the mean deviation.
Mean deviation
is the mean of the absolute deviations of a set of data about the data’s
mean. The symbol for mean deviation is MAD.
$$MAD = \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}|$$
Example (4-2):
Find the mean deviation for 14, 16, 10, 8, 2.
Sol:
We should compute the mean first:
$$\bar{x} = \frac{\sum_{i=1}^{5} x_i}{5} = \frac{2 + 8 + 10 + 16 + 14}{5} = \frac{50}{5} = 10.$$
$$MAD = \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}|$$
$$= \frac{1}{5}\left[|2 - 10| + |8 - 10| + |10 - 10| + |16 - 10| + |14 - 10|\right]$$
$$= \frac{1}{5}\left[8 + 2 + 0 + 6 + 4\right] = \frac{20}{5} = 4.$$
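The two steps (mean first, then the average absolute deviation) translate directly into code. A minimal sketch, with an illustrative function name:

```python
def mean_absolute_deviation(data):
    """Mean of |x_i - mean| over the data."""
    mean = sum(data) / len(data)
    return sum(abs(x - mean) for x in data) / len(data)

print(mean_absolute_deviation([14, 16, 10, 8, 2]))
```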
$$MAD = \frac{1}{n} \sum_{i=1}^{k} f_i |x_i - \bar{x}|$$
Sol:
We construct the following table.
Classes    Midpoints xi   Frequency fi   fi xi   xi − x̄   fi |xi − x̄|
24 − 28    26             3              78      −13       39
29 − 33    31             4              124     −8        32
34 − 38    36             7              252     −3        21
39 − 43    41             6              246     2         12
44 − 48    46             8              368     7         56
49 − 53    51             2              102     12        24
Sum                       30             1170              184
$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{n} = \frac{1170}{30} = 39.$$
$$MAD = \frac{\sum_{i=1}^{k} f_i |x_i - \bar{x}|}{n} = \frac{184}{30} = 6.13$$
Standard Deviation
is the square root of the variance. The symbol for the population
standard deviation is σ.
The variance is: $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = 72/6 = 12$
The standard deviation is: $S = \sqrt{12} = 3.46$
Proof:
$$S^2 = \frac{1}{n-1} \sum (x - \bar{x})^2 = \frac{1}{n-1} \sum (x^2 - 2x\bar{x} + \bar{x}^2)$$
$$= \frac{1}{n-1} \left( \sum x^2 - 2\bar{x} \sum x + n\bar{x}^2 \right)$$
$$= \frac{1}{n-1} \left[ \sum x^2 - 2 \left( \frac{\sum x}{n} \right) \sum x + n \left( \frac{\sum x}{n} \right)^2 \right]$$
$$= \frac{1}{n-1} \left[ \sum x^2 - \frac{(\sum x)^2}{n} \right]$$
Example (4-5):
Use Example (4-4) and the formula in (3) to find the variance and
standard deviation.
Sol:
xi     xi²
12     144
15     225
11     121
17     289
18     324
20     400
19     361
Σxi = 112     Σxi² = 1864
$$S^2 = \frac{1}{n-1} \left[ \sum x^2 - \frac{(\sum x)^2}{n} \right] = \frac{1}{7-1} \left[ 1864 - \frac{(112)^2}{7} \right] = \frac{72}{6} = 12$$
$$S = \sqrt{S^2} = \sqrt{12} = 3.46$$
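The definition and the shortcut formula should give the same value. The following Python sketch checks this on the data of the example; the function names are illustrative.

```python
import math

def variance_definition(data):
    """S^2 = sum((x - mean)^2) / (n - 1)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

def variance_shortcut(data):
    """S^2 = (sum(x^2) - (sum(x))^2 / n) / (n - 1)."""
    n = len(data)
    return (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

data = [12, 15, 11, 17, 18, 20, 19]
s2 = variance_shortcut(data)
print(s2, variance_definition(data), round(math.sqrt(s2), 2))
```

The shortcut needs only the two running sums Σx and Σx², which is why it is convenient for hand calculation.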
Remark:
The variance is never negative. It equals zero when all observations
are equal, and its value increases as the observations become more
spread out.
Example (4-6):
The following samples represent the scores of 3 students in the first
monthly test of Statistics. Find the mean and variance. What do you
see?
Sol:
Sample 1: 10, 10, 10; x̄ = 10, S² = 0
Sample 2: 8, 10, 12; x̄ = 10, S² = 4
Sample 3: 4, 10, 16; x̄ = 10, S² = 36
Note that although the mean for all samples is equal, the sample varia-
$$S^2 = \frac{1}{n-1} \sum \left[ (d \pm c) - (\bar{d} \pm c) \right]^2 = \frac{1}{n-1} \sum (d - \bar{d})^2 = \frac{1}{n-1} \left[ \sum d^2 - \frac{1}{n} \left( \sum d \right)^2 \right] \tag{4}$$
$$S_x^2 = \frac{1}{n-1} \sum \left( \frac{d}{c} - \frac{\bar{d}}{c} \right)^2 = \frac{1}{c^2} \, \frac{1}{n-1} \sum (d - \bar{d})^2$$
$$S_x^2 = \frac{1}{c^2} S_d^2 \;\rightarrow\; S_x = \frac{1}{|c|} S_d$$
$$\therefore S_{new}^2 = c^2 S_x^2 \;\rightarrow\; S_{new} = |c| \, S_x$$
Property 3: The sum of squared deviations of the values from their
mean x̄ is smaller than the sum of the squared deviations of the values
from any other assumed mean a, where a ≠ x̄.
Proof:
$$\sum (x - a)^2 = \sum (x + \bar{x} - \bar{x} - a)^2$$
$$= \sum \left[ (x - \bar{x}) + (\bar{x} - a) \right]^2$$
$$= \sum (x - \bar{x})^2 + 2(\bar{x} - a) \sum (x - \bar{x}) + n(\bar{x} - a)^2$$
$$= \sum (x - \bar{x})^2 + n(\bar{x} - a)^2 \qquad \text{since } \sum (x - \bar{x}) = 0$$
$$> \sum (x - \bar{x})^2$$
This implies that $\sum (x - \bar{x})^2 < \sum (x - a)^2$.
Property 4: The variance for combining two samples with sizes $n_1$ and
$n_2$, variances $S_1^2$ and $S_2^2$ respectively, and $\bar{x}_1 = \bar{x}_2$ is obtained as follows. Let $\bar{x}$ denote the common mean. Then
$$S_1^2 = \frac{1}{n_1 - 1} \sum (x_i - \bar{x})^2, \qquad S_2^2 = \frac{1}{n_2 - 1} \sum (y_i - \bar{x})^2$$
$$S_1^2 (n_1 - 1) = \sum (x_i - \bar{x})^2, \qquad S_2^2 (n_2 - 1) = \sum (y_i - \bar{x})^2$$
$$S_1^2 (n_1 - 1) + S_2^2 (n_2 - 1) = \sum (x_i - \bar{x})^2 + \sum (y_i - \bar{x})^2 = \sum_{i=1}^{n_1 + n_2} (x_i - \bar{x})^2,$$
where the values $y_i$ are relabelled as $x_i$ for $i > n_1$. Therefore,
$$S_1^2 (n_1 - 1) + S_2^2 (n_2 - 1) = S^2 (n_1 + n_2 - 1).$$
This implies that
$$S^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 1}$$
Example (4-6):
Find the variance and standard deviation after merging the following
two samples:
Sample 1: $n_1 = 5$, $\bar{x}_1 = 4$, $S_1^2 = 3.5$
Sample 2: $n_2 = 6$, $\bar{x}_2 = 4$, $S_2^2 = 3$
Sol:
If we have a sample data with size n and these data are summarised in
a frequency distribution table where:
• The number of classes is k.
• Midpoints are: x1 , x2 , . . . , xk .
• Class frequencies are: f1 , f2 , . . . , fk .
Then, the variance is computed by using one of the following formulas.
$$S^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n - 1}, \quad \text{where } \bar{x} = \frac{\sum f_i x_i}{n}$$
$$S^2 = \frac{1}{n-1} \left[ \sum f_i x_i^2 - \frac{(\sum f_i x_i)^2}{n} \right]$$
$$S^2 = \frac{1}{n-1} \left[ \sum f_i d_i^2 - \frac{(\sum f_i d_i)^2}{n} \right], \quad \text{where } d_i = x_i - c \text{ for a constant } c$$
The standard deviation is obtained by taking the square root of the
variance.
Example (4-7):
Use Example (4-1) to find the variance and standard deviation.
Sol:
Construct the following table:
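The columns of such a table can also be produced programmatically. The sketch below applies the shortcut formula for grouped data to the class midpoints and frequencies of Example (4-1); the variable names are illustrative and the printed values come from this sketch, not from the original slides.

```python
import math

midpoints = [26, 31, 36, 41, 46, 51]      # class midpoints x_i
frequencies = [3, 4, 7, 6, 8, 2]          # class frequencies f_i

n = sum(frequencies)
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
sum_fx2 = sum(f * x * x for f, x in zip(frequencies, midpoints))

# S^2 = (sum(f x^2) - (sum(f x))^2 / n) / (n - 1)
variance = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
std_dev = math.sqrt(variance)
print(n, sum_fx, sum_fx2, round(variance, 2), round(std_dev, 2))
```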
Whenever two samples have the same units of measure, the variance
and standard deviation for each can be compared directly. But what if
the units of the two samples are different?
Definition
The coefficient of variation is a relative measure, denoted by CV, and
defined as the standard deviation divided by the mean.
The CV can also be computed by using the first and third quartiles
as
$$C.V = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
The sample with the larger coefficient of variation has the greater
relative dispersion, i.e. it is less homogeneous, and vice versa.
Uses of the coefficient of Variation:
• To assess the precision of a technique.
• Used as a measure of variability when the standard deviation is
proportional to the mean.
• To compare the variability of measurements made in different units.
• Data values must be positive.
• The arithmetic mean must be greater than zero.
Example (4-8):
The mean of the number of sales of cars over a 3-month period is 87,
and the standard deviation is 5. The mean of the commissions is
$5225, and the standard deviation is $773. Compare the variations of
the two.
Sol:
The coefficients of variation are
$$CV = \frac{s_x}{\bar{x}} = \frac{5}{87} = 0.057 \quad \text{(sales)}$$
$$CV = \frac{s_y}{\bar{y}} = \frac{773}{5225} = 0.148 \quad \text{(commissions)}$$
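Because the CV is a ratio, the two quantities can be compared even though one is a count and the other is in dollars. A minimal sketch, with an illustrative function name:

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation divided by the mean (the mean must be positive)."""
    if mean <= 0:
        raise ValueError("the mean must be greater than zero")
    return std_dev / mean

cv_sales = coefficient_of_variation(5, 87)
cv_commissions = coefficient_of_variation(773, 5225)
print(round(cv_sales, 3), round(cv_commissions, 3))
```

Since the second value is larger, the commissions are relatively more variable than the sales.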
Sol:
Find first the mean and standard deviation for each variable. The mean
and standard deviation for the heights, say X, are
$$\bar{x} = \frac{\sum x}{n} = \frac{1267}{8} = 158.38, \qquad S_x = \sqrt{\frac{1}{n-1} \sum (x - \bar{x})^2} = 6.78$$
$$C.V_x = \frac{S_x}{\bar{x}} = \frac{6.78}{158.38} = 0.043$$
$$C.V_y = \frac{S_y}{\bar{y}} = \frac{10.41}{67.63} = 0.154$$
Note that $C.V_y > C.V_x$. Since the coefficient of variation is larger for
weights, the weights are more variable (more dispersed) than the heights.
5. Chebyshev’s Theorem
Theorem
For any k > 1, at least $\left(1 - \frac{1}{k^2}\right)$ of the data lie within ±k standard deviations, S, of the mean,
x̄, i.e. in (x̄ − kS, x̄ + kS), regardless of the shape of the distribution.
Example (4-10):
It was found that the mean amount of vitamin C in a particular type of
fruit was 0.24 mg with a standard deviation of 0.004 mg. What is the
minimum percentage of fruits whose vitamin C content falls in
(0.232, 0.248) mg?
Sol:
Note that x̄ = 0.24 and S = 0.004. Then
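A sketch of the remaining calculation, assuming the interval is symmetric about the mean so that k is the half-width divided by S; the function name `chebyshev_lower_bound` is illustrative.

```python
def chebyshev_lower_bound(mean, std_dev, lower, upper):
    """Return k and the minimum proportion 1 - 1/k^2 of data in (lower, upper),
    assuming the interval is mean +/- k * std_dev with k > 1."""
    k = (upper - mean) / std_dev
    if abs((mean - lower) / std_dev - k) > 1e-9 or k <= 1:
        raise ValueError("interval must be symmetric about the mean with k > 1")
    return k, 1 - 1 / k ** 2

k, proportion = chebyshev_lower_bound(0.24, 0.004, 0.232, 0.248)
print(round(k, 2), round(proportion * 100, 1))
```

Here k = 2, so at least 75% of the fruits fall in the stated interval.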
Definition
A z score or standard score for a value is obtained by subtracting the
mean from the value and dividing the result by the standard deviation.
The symbol for a standard score is z. Mathematically, $z = \frac{x - \bar{x}}{S}$.
Example (4-11):
A student scored 83, 85, 80 in statistics, mathematics, physics test
respectively that had means 73, 71, 68 and standard deviations 5, 8,
11 respectively. Compare his relative positions on these three tests.
Sol:
Let $z_1, z_2, z_3$ denote the student's z scores in statistics, mathematics and
physics respectively. Then
$$z_1 = \frac{x_1 - \bar{x}_1}{S_1} = \frac{83 - 73}{5} = \frac{10}{5} = 2,$$
$$z_2 = \frac{x_2 - \bar{x}_2}{S_2} = \frac{85 - 71}{8} = \frac{14}{8} = 1.75,$$
$$z_3 = \frac{x_3 - \bar{x}_3}{S_3} = \frac{80 - 68}{11} = \frac{12}{11} = 1.09.$$
Since the z score for statistics is the largest, his relative position in the
statistics class is higher than his relative position in the other classes.
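The comparison above can be sketched in a few lines of Python; the dictionary layout and names are illustrative choices.

```python
def z_score(value, mean, std_dev):
    """Standard score: (value - mean) / standard deviation."""
    return (value - mean) / std_dev

tests = {
    "statistics": (83, 73, 5),
    "mathematics": (85, 71, 8),
    "physics": (80, 68, 11),
}
z = {name: z_score(*values) for name, values in tests.items()}
best = max(z, key=z.get)
print({name: round(score, 2) for name, score in z.items()}, best)
```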
Chapter 5
Correlation and Simple Linear Regression
Content:
✦ Introduction
✦ Scatter plot
✦ Pearson’s coefficient of linear correlation
✦ Spearman’s rank correlation coefficient
✦ Coefficient of Association, Coefficient of Contingency and Kendall
Rank Coefficient
✦ Simple linear regression
1. Introduction
2. Scatter diagrams
Question: How can the linear correlation strength between the two
variables (X, Y) be measured?
Suppose that X and Y are taken from two populations with size N ,
where
$X = x_1, x_2, \dots, x_N$
$Y = y_1, y_2, \dots, y_N$
Pearson’s coefficient of linear correlation r between X and Y is
given by:
$$r = \frac{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{\sigma_x \sigma_y},$$
where:
N : the population size.
µx : mean of the variable X.
µy : mean of the variable Y.
σx : standard deviation of the variable X.
σy : standard deviation of the variable Y.
For a sample of size n, Pearson’s coefficient of linear correlation
r is given by:
$$r = \frac{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{S_x S_y}$$
• −1 ≤ r ≤ 1
• r = 1 indicates perfect positive correlation between X and Y , while
r = −1 indicates perfect negative correlation.
• The closer the value is to +1 or −1, the stronger the linear correlation.
Based on |r|, Evans (1996) suggested the following table as a guideline for
describing the strength of correlation:
|r| values             0.00 − 0.19   0.20 − 0.39   0.40 − 0.59   0.60 − 0.79   0.80 − 1.00
Types of correlation   very weak     weak          moderate      strong        very strong
Remark 1: Note that:
• The correlation coefficient does not relate to the gradient beyond sharing
its +ve or −ve sign!
Remark 3:
Calculating the correlation coefficient r requires calculating $S_x$, $S_y$,
$\bar{x}$ and $\bar{y}$. These calculations make determining r laborious, so the r
formula can be simplified as follows:
$$r = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{\left(n \sum x_i^2 - (\sum x_i)^2\right)\left(n \sum y_i^2 - (\sum y_i)^2\right)}}$$
where:
n : number of pairs $(x_i, y_i)$.
$\sum x_i y_i$ : sum of X times Y.
$\sum x_i$ : sum of X.
$\sum y_i$ : sum of Y.
$\sum x_i^2$ : sum of the squares of the variable X.
$\sum y_i^2$ : sum of the squares of the variable Y.
Example (5-1):
Use the Pearson method to find the correlation coefficient between the
expense (X) and the daily spending (Y) in riyals for seven students,
whose data are shown in the following table:
Expense (X)    18 20 12 13 14 15 16
Spending (Y)   14 18 11 12 14 12 14
Sol:
Use the following steps to simplify the solution:
• Subtract a constant number, say a = 11, from X and Y values.
(This is an optional step.)
• Construct the following table.
xi    yi    x′i = xi − 11   y′i = yi − 11   x′i y′i   x′i²   y′i²
18    14    7               3               21        49     9
20    18    9               7               63        81     49
12    11    1               0               0         1      0
13    12    2               1               2         4      1
14    14    3               3               9         9      9
15    12    4               1               4         16     1
16    14    5               3               15        25     9
Sum         31              18              114       185    78
$$r = \frac{n \sum x'_i y'_i - \sum x'_i \sum y'_i}{\sqrt{\left(n \sum {x'_i}^2 - (\sum x'_i)^2\right)\left(n \sum {y'_i}^2 - (\sum y'_i)^2\right)}}$$
$$= \frac{7 \times 114 - 31 \times 18}{\sqrt{(7 \times 185 - (31)^2)(7 \times 78 - (18)^2)}} = \frac{240}{272.3} = 0.88$$
That means there is a very strong positive correlation between the student’s
expense and daily spending.
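The simplified formula is easy to check in code. The sketch below computes r from the raw data and again after subtracting 11 from every value, which illustrates why the optional shift in the solution does not change the result. The function name `pearson_r` is an illustrative choice.

```python
import math

def pearson_r(x, y):
    """r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

expense = [18, 20, 12, 13, 14, 15, 16]
spending = [14, 18, 11, 12, 14, 12, 14]

r_raw = pearson_r(expense, spending)
r_shifted = pearson_r([v - 11 for v in expense], [v - 11 for v in spending])
print(round(r_raw, 2), round(r_shifted, 2))
```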
Second: Pearson’s coefficient of linear correlation for grouped data
Sol:
Use the following steps to simplify the solution:
• Find the midpoints for all classes. Let X represent the mathematics
marks and Y the statistics marks. Note that the midpoints are
equal for both subjects. Hence,
$x_i = y_i$: 54.5, 64.5, 74.5, 84.5, 94.5, for i = 1, 2, . . . , 5
• The class width is L = 10. Therefore, subtract a constant number,
say a = 74.5, from the X and Y midpoints, and then divide by L. For
X, $u_i = \frac{x_i - 74.5}{10}$ and for Y, $v_i = \frac{y_i - 74.5}{10}$.
• Construct the following table.
The numbers in red in the small squares are calculated by taking the
product of $u_i$, $v_i$ and $f_{u_i v_i}$. For example, the number 12 is obtained by
(−2) × (−2) × 3 = 12.
$$= \frac{(30 \times 36) - (2 \times 2)}{\sqrt{[(30 \times 48) - 2^2][(30 \times 44) - 2^2]}}$$
$$= \frac{1076}{\sqrt{1436 \times 1316}} = \frac{1076}{1374.69} = 0.78$$
Remark: Rank refers to finding the order of readings for the two
variables (X, Y) with each reading remaining in its position; ranking can
be in descending or ascending order.
Example (5-3):
Find the ranks of X, whose values are given below:
X        14 10 12 8 3 5 6
X        14 10 12 8 3 5 6
Rank X   7  5  6  4 1 2 3
Example (5-4):
Find the ranks of the following grades:
B, C, B, F, D, D, A
Sol:
Rank the grades by arranging them from the lowest grade to the highest
grade:
F, D, D, C, B, B, A
Assign ranks 1, 2, 3, 4, 5, 6, 7 to the above grades. Note that there are
two “D” with different ranks. In this case, we assign to each one the
average of the ranks, that is $\frac{2+3}{2} = 2.5$, and do the same for the grades
“B”. The final solution is
Grades X   B    C   B    F   D    D    A
Ranks X    5.5  4   5.5  1   2.5  2.5  7
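Ranking with tied values can be sketched as follows. The helper `average_ranks` and the numeric coding of the letter grades are illustrative assumptions; the averaging of tied ranks follows the rule described in the example.

```python
def average_ranks(values):
    """Ascending ranks starting at 1; tied values share the average of their ranks."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1      # smallest rank occupied by v
        count = ordered.count(v)          # number of ties
        ranks.append(first + (count - 1) / 2)
    return ranks

# Numeric data from Example (5-3)
print(average_ranks([14, 10, 12, 8, 3, 5, 6]))

# Letter grades from Example (5-4), coded from lowest (F) to highest (A)
grade_order = {"F": 0, "D": 1, "C": 2, "B": 3, "A": 4}
grades = ["B", "C", "B", "F", "D", "D", "A"]
print(average_ranks([grade_order[g] for g in grades]))
```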
Remark:
All of the properties of Pearson’s correlation coefficient apply to Spearman’s
rank correlation coefficient. For quantitative data, the value of $r_s$ is
usually close to r, and $r_s$ is distinguished by ease and accuracy,
especially when the number of value pairs is less than n = 30.
Example (5-5):
The following data represent the grades of eight students in
chemistry and physics.
Chemistry   A B D F C D F B
Physics     A C F D C D F B
Compute the correlation coefficient for the students’ grades in chemistry
and physics.
This means that there is a very strong positive correlation between the
two subjects.
Example (5-6):
Compute the Spearman rank correlation coefficient between X and Y .
X 0 1 2 3 4 5 2
Y −1 2 2 8 4 14 5
This means that there is a very strong positive correlation between the
variables X and Y.
              X
          x1     x2
Y   y1    a      b
    y2    c      d

                    Smoking
Cancer          Smoker   Non-smoker
Infected        55       10
Not infected    5        30
Sol:
Coefficient of Contingency:
Suggested by Cramer in 1946. It is used to measure the strength of the
correlation between two qualitative variables when each variable is divided
into more than two categories (that is, the table contains more than 4 cells).
The coefficient of contingency is computed by the following steps:
1- Suppose we have (X) with r categories, and (Y) with s categories.
The following table shows the contingency between the
two variables:
X \ Y   y1     y2     ...   ys     Sum
x1      f11    f12    ...   f1s    f1.
x2      f21    f22    ...   f2s    f2.
...     ...    ...    ...   ...    ...
xr      fr1    fr2    ...   frs    fr.
Sum     f.1    f.2    ...   f.s    f..
2- Calculate B by
$$B = \frac{f_{11}^2}{f_{.1} \times f_{1.}} + \frac{f_{12}^2}{f_{.2} \times f_{1.}} + \dots + \frac{f_{rs}^2}{f_{.s} \times f_{r.}}$$
                      Children's eyes
Father's eyes    Black   Green   Brown
Black            2       4       4
Green            3       1       6
Brown            5       2       3
Sol:
We add the rows and columns as follows:
                      Children's eyes
Father's eyes    Black   Green   Brown   Sum
Black            2       4       4       10
Green            3       1       6       10
Brown            5       2       3       10
Sum              10      7       13      30
Now, calculate B:
$$B = \frac{2^2}{10 \times 10} + \frac{3^2}{10 \times 10} + \frac{5^2}{10 \times 10} + \frac{4^2}{7 \times 10} + \frac{1^2}{7 \times 10} + \frac{2^2}{7 \times 10} + \frac{4^2}{13 \times 10} + \frac{6^2}{13 \times 10} + \frac{3^2}{13 \times 10} = 1.15$$
Then,
$$r_c = \sqrt{\frac{B - 1}{B}} = \sqrt{\frac{1.15 - 1}{1.15}} = \sqrt{\frac{0.15}{1.15}} = 0.36$$
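The steps above (row and column sums, then B, then the coefficient) can be sketched in Python. The function name is illustrative; the table is the eye-colour data of this example with rows for the father and columns for the children.

```python
import math

def contingency_coefficient(table):
    """B = sum f_ij^2 / (row_i total * col_j total); r_c = sqrt((B - 1) / B)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    b = sum(
        table[i][j] ** 2 / (row_totals[i] * col_totals[j])
        for i in range(len(table))
        for j in range(len(table[0]))
    )
    return b, math.sqrt((b - 1) / b)

eyes = [
    [2, 4, 4],   # father: black
    [3, 1, 6],   # father: green
    [5, 2, 3],   # father: brown
]
b, rc = contingency_coefficient(eyes)
print(round(b, 2), round(rc, 2))
```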
Sol:
1- Arrange the ranks of referee 1, say X, in natural order and put the
corresponding ranks of referee 2, say Y, underneath.
2- Starting from the smallest X rank, count for each Y rank the number of
Y ranks to its right that are smaller than it. Adding the results
gives the value of Q.
Artistic                       E  C  F  G  H  J  B   I  D  A
Referee 1                      1  2  3  4  5  6  7   8  9  10
Referee 2                      2  1  4  3  6  5  10  7  9  8
# of small ranks to the right  1  0  1  0  1  0  3   0  1  0
$$\tau = 1 - \frac{4Q}{n(n-1)} = 1 - \frac{4 \times 7}{10 \times (10 - 1)} = 0.69$$
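The counting in step 2 can be sketched as follows, assuming the Y ranks are already listed in the natural order of the X ranks. The function name is illustrative.

```python
def kendall_tau(y_ranks):
    """Q = number of pairs where a later Y rank is smaller than an earlier one;
    tau = 1 - 4Q / (n(n - 1))."""
    n = len(y_ranks)
    q = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if y_ranks[j] < y_ranks[i]
    )
    return q, 1 - 4 * q / (n * (n - 1))

referee2 = [2, 1, 4, 3, 6, 5, 10, 7, 9, 8]   # ordered by referee 1 ranks 1..10
q, tau = kendall_tau(referee2)
print(q, round(tau, 2))
```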
$$y = \beta_0 + \beta_1 x$$
where x and y represent the values of the random variables X
and Y respectively; $\beta_1$ represents the regression coefficient of Y on X. This
coefficient is also known as the slope of the regression line, and $\beta_0$ is the
point where the line cuts the y-axis, called the intercept.
In the linear regression, if the value of x is given, the value of y can be
estimated. In this case, we use ŷ to distinguish it from the real value of y. The
linear regression can be rewritten as
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x + \epsilon$$
The above regression is simple because there is one independent variable that
is used to predict the dependent variable.
Remarks:
• ŷ is the predicted value of the dependent variable (y) for any given value
of the independent variable (x).
• $\beta_0$ is the intercept, the predicted value of y when x = 0.
• $\beta_1$ is the regression coefficient (slope): how much we expect y to change
as x increases.
• ϵ is the error of the estimate, or how much variation there is in our
estimate of the regression coefficient.
Note that:
the linear regression of X on Y is:
x̂ = β0′ + β1′ y + ϵ
X 3 2 1 1 5 6 1 4
Y 31 44 60 70 18 17 71 29
1- Plot the data and comment on its behaviour.
2- Find the linear regression equation of Y on X.
3- Find the linear regression equation of X on Y .
4- What is the estimated selling price of a 2.5-year-old car?
5- Plot the data with the regression line.
Sol:
[Figure: scatter plot of selling price (Y) against car age (X).]
We see from the above scatter plot that there is a negative correlation
between the car age and the selling price.
To find the simple linear regression, we first construct the following
table:
xi    yi    xi yi   xi²   yi²
3     31    93      9     961
2     44    88      4     1936
1     60    60      1     3600
1     70    70      1     4900
5     18    90      25    324
6     17    102     36    289
1     71    71      1     5041
4     29    116     16    841
Σxi = 23   Σyi = 340   Σxi yi = 690   Σxi² = 93   Σyi² = 17892
ŷ = −10.698x + 73.256
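3- To find the linear regression of X on Y, we first estimate the values
of $\beta'_0$ and $\beta'_1$ as follows.
$$\hat{\beta}'_1 = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum y_i^2 - (\sum y_i)^2} = \frac{(8 \times 690) - (23 \times 340)}{8 \times 17892 - (340)^2} = \frac{-2300}{27536} = -0.08353$$
$$\hat{\beta}'_0 = \frac{\sum x_i}{n} - \hat{\beta}'_1 \frac{\sum y_i}{n} = \frac{23}{8} - \hat{\beta}'_1 \frac{340}{8} = 2.875 - (-0.08353) \times (42.5) = 6.425$$
$$\hat{x} = -0.08353y + 6.425$$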
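Both regression lines come from the same least-squares formulas with the roles of the variables swapped. The sketch below is illustrative; `least_squares_line` is a hypothetical helper, and the data are the car ages (X) and selling prices (Y) of this example.

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    slope = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    intercept = sy / n - slope * sx / n
    return intercept, slope

age = [3, 2, 1, 1, 5, 6, 1, 4]
price = [31, 44, 60, 70, 18, 17, 71, 29]

b0, b1 = least_squares_line(age, price)    # Y on X
c0, c1 = least_squares_line(price, age)    # X on Y
predicted = b0 + b1 * 2.5                  # price of a 2.5-year-old car

print(round(b1, 3), round(b0, 3))
print(round(c1, 5), round(c0, 3))
print(round(predicted, 2))
```

Note that the two lines are different: the regression of X on Y is not obtained by algebraically inverting the regression of Y on X.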
4- To find the selling price of a 2.5-year-old car, we use the first
regression as:
$$\hat{y} = -10.698(2.5) + 73.256 = 46.51$$
5- The following figure shows the line of best fit using linear
regression of Y on X.
[Figure: scatter plot of the data with the fitted regression line of Y on X; vertical axis: Selling Price.]
Content:
✦ Introduction
✦ Random Experiment
✦ Sample Space
✦ Events
✦ Probability
✦ Axioms of Probability
Introduction
Random Experiment
is an experiment whose outcomes cannot be predicted with certainty.
However, in most cases the collection of every possible outcome of a random
experiment can be listed.
Sample Space
Sample space
A sample space of a random experiment is the collection of all possible
outcomes. The symbol Ω is used to denote the sample space, and the
number of outcomes is denoted by n(Ω)
Examples:
• In the experiment of tossing of a coin: Ω = {H, T }.
• In the experiment of checking a lamp: Ω = {0, 1}, where 0 is for
non-defective and 1 is for defective.
• In the experiment of tossing a die: Ω = {1, 2, 3, 4, 5, 6}.
Example (6-1):
Find the sample space for tossing a coin three times.
Sol:
Use the tree diagram to find the solution.
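The branches of the tree diagram can also be enumerated programmatically. A minimal Python sketch, assuming each toss is recorded as H or T (the variable name `omega` is illustrative):

```python
from itertools import product

# Each toss is H or T; three tosses give all sequences of length 3.
omega = ["".join(seq) for seq in product("HT", repeat=3)]

print(omega)       # the outcomes, from HHH to TTT
print(len(omega))  # n(Ω) = 2**3 = 8
```

Each path from the root of the tree to a leaf corresponds to one string in the list.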
Example (6-2):
Find the sample space for tossing two dice at the same time. What is the
number of all possible outcomes?
Sol:
We can use the Cartesian Product to find the solution as
Ω = {1, 2, 3, 4, 5, 6} × {1, 2, 3, 4, 5, 6}
From the diagram, we have
Ω = {(1, 1), (1, 2), (1, 3), . . . , (6, 3), (6, 4), (6, 5), (6, 6)}
Hence, the number of all possible outcomes is n(Ω) = 6 × 6 = 36.
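The Cartesian product for two dice can be built directly in Python. This is only a sketch; `faces` and `omega` are illustrative names.

```python
from itertools import product

faces = range(1, 7)
# Ordered pairs (first die, second die).
omega = list(product(faces, faces))

print(omega[:3], "...", omega[-1])
print(len(omega))  # n(Ω) = 6 * 6 = 36
```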
Events
Event
An event A is a subset of the sample space Ω.
• A is an event iff A ⊆ Ω.
• The event A occurs if the outcome of the experiment belongs to A.
• The number of outcomes for the event A is denoted by n(A).
• The impossible event is the empty set φ, where φ ⊆ Ω.
• The sure event is Ω itself, where Ω ⊆ Ω.
Example (6-3):
In the experiment of tossing a fair coin two times, find all possible
outcomes for the following events and the numbers of elements:
Sol:
Example (6-4):
In the experiment of tossing a die two times, find all possible outcomes for
the following events and the numbers of elements. Denote the first and
second toss outcomes by x and y respectively.
A = {(x, y) : x + y < 4}
B = {(x, y) : x = y}
C = {(x, y) : x = 5}
D = {(x, y) : x + y = 1}
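One way to list the outcomes of these four events is to filter the 36 ordered pairs of the sample space. A minimal Python sketch (set comprehensions are just one convenient way to write the conditions):

```python
from itertools import product

# Sample space for two tosses of a die: ordered pairs (x, y).
omega = set(product(range(1, 7), repeat=2))

A = {(x, y) for (x, y) in omega if x + y < 4}
B = {(x, y) for (x, y) in omega if x == y}
C = {(x, y) for (x, y) in omega if x == 5}
D = {(x, y) for (x, y) in omega if x + y == 1}

for name, event in [("A", A), ("B", B), ("C", C), ("D", D)]:
    print(name, sorted(event), len(event))
```

Since the smallest possible sum is 2, no pair satisfies x + y = 1, so D comes out as the empty set, i.e. the impossible event.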
Intersection: A ∩ B
The event that occurs when both A and B occur simultaneously.
Mathematically,
A ∩ B = {x ∈ Ω : x ∈ A and x ∈ B}
[Venn diagram: A ∩ B is the region common to A and B inside Ω]
Complement: A^c
The event that occurs when A does not occur. Mathematically,
A^c = Ā = {x ∈ Ω : x ∉ A}
[Venn diagram: A^c is the region of Ω outside A]
A − B = {x ∈ Ω : x ∈ A and x ∉ B}
[Venn diagram: events A and B inside Ω]
(A^c)^c = A,   φ^c = Ω,   Ω^c = φ,   A ∩ A = A
A ∪ A = A,   A ∩ Ω = A,   A ∪ Ω = Ω,   A ⊆ B ⇒ A ∩ B = A
A ∪ φ = A,   A^c = Ω − A,   A ∩ φ = φ,   A ⊆ B ⇒ A ∪ B = B
Corollary:
A − B = A ∩ B^c
(A ∪ B)^c = A^c ∩ B^c
(A ∩ B)^c = A^c ∪ B^c
∪_{i=1}^{n} A_i = A_1 ∪ A_2 ∪ . . . ∪ A_n = Ω
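The first three identities can be checked on a concrete case with Python sets. The events A and B below are hypothetical choices for a single toss of a die, and `complement` is an illustrative helper; a check on one example is not a proof.

```python
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # hypothetical event: an even number
B = {1, 2, 3}   # hypothetical event: a number below 4

def complement(event):
    """Complement of an event relative to the sample space omega."""
    return omega - event

assert A - B == A & complement(B)                          # A − B = A ∩ B^c
assert complement(A | B) == complement(A) & complement(B)  # (A ∪ B)^c = A^c ∩ B^c
assert complement(A & B) == complement(A) | complement(B)  # (A ∩ B)^c = A^c ∪ B^c
print("all three identities hold for this choice of A and B")
```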
Example (6-5):
In the experiment of tossing a die, what do you conclude for the following events?
Sol:
Exercise (6-6):
Use Example (6-1) to find the following:
1- Find all possible outcomes and the number of elements for:
A ∩ B, A ∪ C, A^c ∪ B^c, (A ∩ B)^c, A ∩ B^c
Probability
A = {TT}, n(A) = 1
Hence, the probability of getting tails twice is P(A) = n(A)/n(Ω) = 1/4.
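The ratio n(A)/n(Ω) can be computed by counting outcomes. A minimal Python sketch, assuming the experiment is two tosses of a fair coin so that all four outcomes are equally likely:

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses: HH, HT, TH, TT.
omega = ["".join(seq) for seq in product("HT", repeat=2)]
# Event A: tails on both tosses.
A = [outcome for outcome in omega if outcome == "TT"]

p_A = Fraction(len(A), len(omega))
print(p_A)  # 1/4
```

Using `Fraction` keeps the result exact instead of a rounded decimal.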
Remark:
The classical definition of probability is conceptually simple for many
situations. However, it is limited, since many situations do not have
finitely many equally likely outcomes. For example, tossing a weighted
die gives finitely many outcomes, but they are not equally likely.
Studying people's incomes over time is a situation where we need to
consider infinitely many possible outcomes.
Relative Frequency Probability:
If an experiment is repeated an extremely large number of times, say n
times, under the same conditions, and the number of times the event A
occurs is r(A), then the probability P(A) is defined by
\[
P(A) = \lim_{n \to \infty} \frac{r(A)}{n}
\]
Remark:
The relative frequency probability covers more cases than classical.
However, repeating the identical experiment an infinite number of times
is physically impossible.
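Although the limit cannot be reached in practice, a simulation suggests how r(A)/n behaves as n grows. The sketch below assumes a simulated fair coin with A = "heads"; the function name and the fixed seed are illustrative choices.

```python
import random

def relative_frequency(n, seed=0):
    """Fraction of heads in n simulated tosses of a fair coin."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n))
    return heads / n

for n in (10, 1_000, 100_000):
    print(n, relative_frequency(n))
```

For small n the ratio can be far from 0.5, while for large n it tends to settle near 0.5.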
P(A ∩ B^c) = P(A) − P(A ∩ B) ⇐⇒ P(A) = P(A ∩ B) + P(A ∩ B^c)
Example (6-8):
Tickets numbered 1 to 20 are mixed up and then a ticket is drawn at
random. What is the probability that the ticket drawn has a number
which is a multiple of 3 or 5?
Sol:
Here, Ω = {1, 2, 3, 4, . . . , 19, 20}.
Let E1 = event of getting a multiple of 3 = {3, 6, 9, 12, 15, 18}.
Let E2 = event of getting a multiple of 5 = {5, 10, 15, 20}.
Then E1 ∩ E2 = {15}. Hence,
P(E1 ∪ E2) = P(E1) + P(E2) − P(E1 ∩ E2) = 6/20 + 4/20 − 1/20 = 9/20.
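The addition rule for this example can be checked by enumeration. A minimal Python sketch (the helper `prob` is illustrative and assumes every ticket is equally likely):

```python
from fractions import Fraction

omega = set(range(1, 21))                 # tickets 1 to 20
E1 = {k for k in omega if k % 3 == 0}     # multiples of 3
E2 = {k for k in omega if k % 5 == 0}     # multiples of 5

def prob(event):
    """Classical probability n(event) / n(omega)."""
    return Fraction(len(event), len(omega))

p_union = prob(E1) + prob(E2) - prob(E1 & E2)
print(sorted(E1 & E2), p_union)
```

The ticket 15 belongs to both events, which is why the intersection term has to be subtracted once.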
Example (6-9):
A breakdown of the sources of energy used in the United States is
shown below.
Oil    Nuclear    Gas    Hydropower    Coal    Others
39%    8%         24%    3%            23%     3%
Choose one energy source at random. Find the probability that it is
1- Not oil.
2- Gas or oil.
3- Not nuclear and not hydropower.
Sol:
Let O = event of choosing Oil, N = event of choosing Nuclear, G = event
of choosing Gas, H = event of choosing Hydropower, C = event of
choosing Coal, and T = event of choosing Others. Then
1- P(O^c) = 1 − P(O) = 1 − 0.39 = 0.61
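2- Since G and O are mutually exclusive,
P(G ∪ O) = P(G) + P(O) = 0.24 + 0.39 = 0.63
3- Since N and H are mutually exclusive,
P(N^c ∩ H^c) = P((N ∪ H)^c)
= 1 − P(N ∪ H)
= 1 − (P(N) + P(H))
= 1 − (0.08 + 0.03) = 0.89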
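The three answers can be reproduced from the table with a few lines of Python. This is a sketch only; the dictionary `share` and the variable names are illustrative, and the percentages are kept as integers to avoid rounding noise.

```python
# Percentages from the table in Example (6-9).
share = {"Oil": 39, "Nuclear": 8, "Gas": 24,
         "Hydropower": 3, "Coal": 23, "Others": 3}
assert sum(share.values()) == 100  # the sources cover every case

# The sources are mutually exclusive, so probabilities simply add.
p_not_oil = (100 - share["Oil"]) / 100
p_gas_or_oil = (share["Gas"] + share["Oil"]) / 100
p_not_nuclear_not_hydro = (100 - share["Nuclear"] - share["Hydropower"]) / 100

print(p_not_oil, p_gas_or_oil, p_not_nuclear_not_hydro)
```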
Completed, praise be to Allah.