


UNIVERSITY OF MINES AND TECHNOLOGY

TARKWA

FMMT, FGES, FoE, SPetS

DEPARTMENT OF MATHEMATICAL SCIENCES

LECTURE NOTES

ON

PROBABILITY AND STATISTICS

MR/MN/GL/GM/ES/RP/NG/PG/CE/MC/EL/RN 361

COMPILED

BY

ASSOC PROF. L. BREW/ DR P. BOYE/ DR B. ODOI

JAN. 2023

1
Chapter Contents

Chapter 1 Introduction to Statistics
Chapter 2 Introduction to Probability
Chapter 3 Random Variables and Distributions
Chapter 4 Special Probability Distributions
Chapter 5 Mathematical Expectation
Chapter 6 Estimation
Chapter 7 Tests of Hypotheses and Significance
Chapter 8 Regression and Correlation Analysis

CHAPTER 1
Learning Objectives

Having worked through this chapter the student will be able to:
● Discuss the reasons for studying statistics as an engineer.
● Identify basic statistical concepts.
● Identify the levels of measurement.
● Discuss sampling procedures.
● Understand data visualization using several graphical devices.

1.0 Introduction to Statistics

Statistics is a way to get information from data. Statistics is a discipline which is concerned
with:
▪ summarizing information to aid understanding,
▪ drawing conclusions from data,
▪ estimating the present or predicting the future, and
▪ designing experiments and other data collection.
In making predictions, Statistics uses the concept of probability, which models chance
mathematically and enables calculations of chance in complicated cases.

1.1 Why Statistics?

The field of statistics deals with the collection, presentation, analysis, and use of data to make
decisions, solve problems, and design products and processes. In simple terms, statistics is the
science of data.
Because many aspects of engineering practice involve working with data, obviously
knowledge of statistics is just as important to an engineer as are the other engineering
sciences. Specifically, statistical techniques can be powerful aids in designing new products
and systems, improving existing designs, and designing, developing, and improving
production processes.
Statistical analysis provides objective ways of evaluating patterns of events or patterns in our
data by computing the probability of observing such patterns by chance alone.
Insisting that conclusions be based on statistical analyses is an extension of the
argument that objectivity is critical in science. Without the use of statistics, little can be
learnt from most research studies.

Because of the increasing use of statistics in so many areas of our lives, it has become very
desirable to understand and practice statistical thinking. This is important even if you do not
use statistical methods directly.

1.2 Branches of Statistics

1.2.1 Descriptive statistics is the branch of statistics that involves the organization,
summarization, and display of data. Two general techniques are used to accomplish this goal.
i. Organize the entire set of scores into a table or a graph that allows researchers (and others)
to see the whole set of scores. (summarizing data graphically)
ii. Compute one or two summary values (such as the average) that describe the entire group.
(summarizing data numerically).

1.2.2 Inferential statistics is the branch of statistics that involves using a sample to draw
conclusions about a population. A basic tool in the study of inferential statistics is probability.

1.2.3 Variables
In statistics, a variable has two defining characteristics:
▪ A variable is an attribute that describes a person, place, thing, or idea.
▪ The value of the variable can "vary" from one entity to another.
For example, a person's hair color is a potential variable, which could have the value of "blond"
for one person and "brunette" for another.

1.2.4 Qualitative vs. Quantitative Variables


Variables can be classified as qualitative (aka, categorical) or quantitative (aka, numeric).
▪ Qualitative. Qualitative variables take on values that are names or labels. The
color of a ball (e.g., red, green, blue) or the breed of a dog (e.g., collie, shepherd, and terrier)
would be examples of qualitative or categorical variables.
▪ Quantitative. Quantitative variables are numeric. They represent a measurable
quantity. For example, when we speak of the population of a city, we are talking about the
number of people in the city - a measurable attribute of the city. Therefore, population would
be a quantitative variable. In algebraic equations, quantitative variables are represented by
symbols (e.g., x, y, or z).

1.2.5 Discrete vs. Continuous Variables


Quantitative variables can be further classified as discrete or continuous. If a variable can take
on any value between its minimum value and its maximum value, it is called a continuous
variable; otherwise, it is called a discrete variable.

1.3 Univariate vs. Bivariate Data


Statistical data are often classified according to the number of variables being studied.

Univariate data. When we conduct a study that looks at only one variable, we say that we
are working with univariate data. Suppose, for example, that we conducted a survey to
estimate the average weight of high school students. Since we are only working with one
variable (weight), we would be working with univariate data.

Bivariate data. When we conduct a study that examines the relationship between two
variables, we are working with bivariate data. Suppose we conducted a study to see if there
was a relationship between the height and weight of high school students. Since we are
working with two variables (height and weight), we would be working with bivariate data.

1.4 Populations and Samples


The study of statistics revolves around the study of data sets. This lesson describes two
important types of data sets - populations and samples. Along the way, we introduce simple
random sampling, the main method used in this tutorial to select samples.

1.4.1 Population vs. Sample


The main difference between a population and sample has to do with how observations are
assigned to the data set.
A population includes all of the elements from a set of data.

A sample consists of one or more observations from the population.

Depending on the sampling method, a sample can have fewer observations than
the population, the same number of observations, or more observations. More than
one sample can be derived from the same population.

A measurable characteristic of a population, such as a mean or standard deviation,
is called a parameter; a measurable characteristic of a sample is called a statistic.
We will see in future lessons that the mean of a population is denoted by the symbol μ,
while the mean of a sample is denoted by the symbol x̄.
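The parameter–statistic distinction can be made concrete with a short Python sketch; the population below is hypothetical, generated only for illustration:

```python
import random
import statistics

# Hypothetical population: exam scores of all 500 students in a cohort
random.seed(42)
population = [random.gauss(70, 10) for _ in range(500)]

# A parameter describes the whole population, e.g. the population mean (mu)
mu = statistics.mean(population)

# A statistic describes a sample, e.g. the sample mean (x-bar)
sample = random.sample(population, 30)      # simple random sample, n = 30
x_bar = statistics.mean(sample)

print(f"parameter mu    = {mu:.2f}")
print(f"statistic x-bar = {x_bar:.2f}")
```

Repeating the sampling step produces a different x̄ each time, while μ stays fixed; that sampling variability is what inferential statistics quantifies.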

1.5 Summarizing data graphically


Selected graphs for qualitative data

▪ Pie chart

▪ Bar Chart

▪ (Also frequency distribution)

Selected graphs for Numerical data


▪ Box plot

▪ Dot plot

▪ Stem-and-leaf

▪ Histogram

1.6 Summary Statistics

Measure of Location
These provide an indication of the center of the distribution where most of the scores tend to
cluster. There are three principal measures of central tendency: Mode, Median, and Mean.

1.6.1 Measure of Spread/ Variability


Variability is the measure of the spread in the data. The three common variability concepts
are: Range, Variance and Standard deviation.
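These summary values can be computed with Python's standard `statistics` module; the small data set below is purely illustrative:

```python
import statistics

data = [21, 17, 13, 25, 9, 19, 6, 10]         # illustrative sample of 8 observations

# Measures of location (central tendency)
mean_value = statistics.mean(data)            # arithmetic mean = 120 / 8 = 15
median_value = statistics.median(data)        # middle of the ordered data
mode_value = statistics.mode([2, 2, 3, 5])    # most frequently occurring value

# Measures of spread (variability)
data_range = max(data) - min(data)            # largest minus smallest observation
sample_variance = statistics.variance(data)   # divides by (n - 1)
sample_stdev = statistics.stdev(data)         # square root of the variance
```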

1.6.2 How to Describe Data Patterns in Statistics


Graphic displays are useful for seeing patterns in data. Patterns in data are commonly
described in terms of: Center, Spread, Shape, Symmetry, Skewness and Kurtosis

1.7 Unusual Features


Sometimes, statisticians refer to unusual features in a set of data. The two most
common unusual features are gaps and outliers.

1.7.1 Gaps. Gaps refer to areas of a distribution where there are no observations. The first
figure below has a gap; there are no observations in the middle of the distribution.

1.7.2 Outliers. Sometimes, distributions are characterized by extreme values that differ
greatly from the other observations. These extreme values are called outliers. The second
figure below illustrates a distribution with an outlier. Except for one lonely observation
(the outlier on the extreme right), all of the observations fall between 0 and 4. As a "rule of
thumb", an extreme value is often considered to be an outlier if it is at least 1.5 interquartile
ranges below the first quartile (Q1), or at least 1.5 interquartile ranges above the third quartile
(Q3).
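The 1.5-IQR rule of thumb is easy to sketch in Python. Note that `statistics.quantiles` uses one of several quartile conventions, so borderline cases may differ slightly from hand calculations:

```python
import statistics

def iqr_outliers(data):
    """Flag values more than 1.5 interquartile ranges outside [Q1, Q3]."""
    q1, _, q3 = statistics.quantiles(data, n=4)   # Q1, median, Q3
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < lower or x > upper]

# All observations fall between 0 and 4 except one extreme value on the right
scores = [0, 1, 1, 2, 2, 2, 3, 3, 4, 24]
print(iqr_outliers(scores))
```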

1.8 How to Compare Data Sets

Common graphical displays (e.g., dot plots, box plots, stem plots, bar charts) can be
effective tools for comparing data from two or more data sets.

1.8.1 Four Ways to Describe Data Sets


When you compare two or more data sets, focus on four features:

Center. Graphically, the center of a distribution is the point where about half of the
observations are on either side.

Spread. The spread of a distribution refers to the variability of the data. If the observations
cover a wide range, the spread is larger. If the observations are clustered around a single
value, the spread is smaller.

Shape. The shape of a distribution is described by symmetry, skewness, number of peaks,


etc.

Unusual features. Unusual features refer to gaps (areas of the distribution where there are
no observations) and outliers.

1.9 Sampling Procedures

Statisticians employ different procedures in choosing the observations that will
constitute their random samples of the population. The objective of these procedures is
to select samples that will be representative of the population from which they originate.
These samples, also known as random samples, have the property that each sample
has the same probability of being drawn from the population as any other sample.

1.9.1 Simple random sampling is the process of selecting a random sample from a finite
or infinite population. There are a total of C(N, n) = N!/(n!(N − n)!) different samples of
size n that can be obtained from a finite population of size N. If the n observations are
selected randomly, then the samples are random samples, each having an equal
probability 1/C(N, n) of being selected. For an infinite population, the sample is random
if the n observations correspond to n independent random variables, i.e., each
observation is selected independently of the others. Oftentimes, the size of a population
under study is large enough so that the population can be considered infinite; if the
sample size is small relative to the population size, the population can usually be
considered infinite. Fig. 1.1 illustrates this method.

Example 1: There are C(26, 5) = 65780 different samples of 5 letters that can be obtained
from the 26 letters of the alphabet. If a procedure for selecting a sample of 5 letters was
devised such that each of these 65780 samples had an equal probability (equal to
1/65780) of being selected, then the sample selected would be a random sample.
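The count of possible samples is a binomial coefficient, C(N, n), and can be verified with Python's `math.comb`:

```python
import math

def number_of_samples(N, n):
    """Number of distinct samples of size n from a finite population of size N."""
    return math.comb(N, n)      # N! / (n! * (N - n)!)

# Example 1: samples of 5 letters drawn from the 26-letter alphabet
count = number_of_samples(26, 5)
print(count)                    # 65780, so each sample has probability 1/65780
```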

Fig. 1.1 Random Sampling

1.9.2 Systematic (Regular) Sampling is the sampling procedure wherein every kth
element of the population under study is selected for the sample, with the starting point
randomly determined from the first k elements. The value of k often depends on the
structure and objectives of the sampling experiment, as well as the population under
study. In systematic sampling, the sample values are spread more evenly across the
population (as shown in Fig. 1.2); thus, many systematic samples are highly
representative of the population from which they were selected. Yet, one must be careful
that the value of k does not produce a sampling interval whose periodicity would
compromise the randomness of the observations.

Fig. 1.2 Regular Sampling

Example 2: In inspecting a batch of 1000 pipes for defects, we can choose to inspect every
10th item in the batch. The items inspected are the 10th, 20th, 30th, and so on until the
1000th item. In doing so, we must ensure that every 10th item is not specially produced
by a particular process or machine; otherwise, the defects would be concentrated in, or
absent from, the inspected items, and the sample would not be representative of the
entire batch of 1000 pipes.
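A minimal sketch of systematic sampling, with the batch of pipes simulated as a list:

```python
import random

def systematic_sample(population, k, seed=None):
    """Take every kth element, starting at a random point among the first k."""
    start = random.Random(seed).randrange(k)
    return population[start::k]

pipes = list(range(1, 1001))              # a batch of 1000 pipes, labelled 1..1000
inspected = systematic_sample(pipes, 10, seed=0)
print(len(inspected))                     # 100 pipes, evenly spread over the batch
```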

1.9.3 Stratified Random Sampling is the sampling procedure that divides the
population under study into mutually exclusive sub-populations, and then selects
random samples from each of these sub-populations.

Fig.1.3 Stratified Random Sampling

The sub-populations are determined in such a way that the parameter of interest is fairly
homogeneous within a subpopulation. By doing so, the variability of the population
parameter within each sub-population should be considerably less than its variability
for the entire population. Oftentimes, there is a relationship between the characteristics
of a certain population and the population parameter. Fig. 1.3 illustrates this method.

Example 3: In determining the distribution of incomes among engineers in the Bay Area,
we can divide the population of engineers into sub-populations corresponding to each
major engineering speciality (electrical, chemical, mechanical, civil, industrial, etc.).
Random samples can then be selected from each of these sub-populations of engineers.
The logic behind this sampling structure is the reasonable assumption that the income
of an engineer depends, to a large extent, on his particular speciality.
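A sketch of stratified random sampling in Python; the engineer rosters below are hypothetical placeholders:

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample from each mutually exclusive sub-population."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical sub-populations of engineers, keyed by speciality
engineers = {
    "electrical": [f"EE-{i}" for i in range(200)],
    "chemical":   [f"CH-{i}" for i in range(150)],
    "civil":      [f"CV-{i}" for i in range(300)],
}
strat_sample = stratified_sample(engineers, 10, seed=1)
```

Sampling within each stratum keeps every speciality represented, which a single simple random sample of the whole population cannot guarantee.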

1.9.4 Cluster sampling (also called block sampling) is the sampling procedure that
randomly selects clusters of observations from the population under study, and then
chooses all, or a random selection, of the elements of these clusters, as the observations
of the sample (as illustrated in Fig. 1.4). Often, cluster sampling is a cost efficient
procedure for selecting a sample representative of the population; this is especially true
for a widely scattered population.

Fig. 1.4 Cluster sampling

Example 4: In conducting a poll of voter preferences for a statewide election, we can
randomly select congressional districts (or some other applicable grouping of voters),
and then conduct the poll among the people in the chosen congressional districts. Many
voter polls that utilize cluster sampling carefully choose their clusters so that they
best represent the voter preferences of the whole state.
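Cluster sampling can be sketched the same way; the districts and voters below are hypothetical:

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then keep every element of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

# Hypothetical districts (clusters), each containing 50 voters
districts = {f"district-{d}": [f"voter-{d}-{i}" for i in range(50)]
             for d in range(20)}
poll = cluster_sample(districts, 3, seed=7)
print(len(poll))      # 3 clusters x 50 voters = 150 people polled
```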

1.10 Levels of Measurement (Types of Data)

Variables can be classified on the basis of their level of measurement. The way we
classify variables greatly affects how we can use them in our analysis. Variables can
be (1) nominal, (2) ordinal, (3) interval, or (4) ratio.

A nominal measurement is created when names are used to establish categories into
which variables can be exclusively recorded. For example, sex can be classified as
"male" or "female." You could also code it with a "1" or a "2"; oxide and carbonate ores
may also be coded 1 and 2, but the numbers would serve only to indicate the
categories and would carry no numerical significance; mathematical calculations
using these codes would be meaningless. Soft drinks may be classified as Coke, Pepsi,
7-Up, or Ale 8. Each drink could be recorded in one of these categories to the exclusion
of the others.

Nominal Measurements: Names or classifications are used to divide data into
separate and distinct categories.

It is important to remember that a nominal measurement carries no indication of order
of preference, but merely establishes a categorical arrangement into which each
observation can be placed.

Unlike a nominal measurement, an ordinal scale produces a distinct ordering or
arrangement of the data. That is, the observations are ranked on the basis of some
criterion. A retail company may rank its products as "good," "better," and "best."
Opinion polls often use an ordinal scale such as "strongly agree," "agree," "no opinion,"
"disagree," and "strongly disagree."

As with nominal data, numbers can be used to order the rankings. Like nominal data,
the magnitude of the numbers is not important; the ranking depends only on the order
of the values. The retailer could have used the rankings of "1," "2," and "3," or "1," "3,"
and "12," for that matter. The arithmetic differences between the values are
meaningless. A product ranked "2" is not twice as good as one with a ranking of "1."

Ordinal Measurements: Measurements that rank observations into categories with
a meaningful order.

Variables on an interval scale are measured numerically, and, like ordinal data, carry
an inherent ranking or ordering. However, unlike the ordinal rankings, the differences
between the values are important. Thus, the arithmetic operations of addition and
subtraction are meaningful. The Fahrenheit scale for temperatures is an example of an
interval scale. Not only is 70 degrees hotter than 60 degrees, but also the same
difference of 10 degrees exists as between 90 and 100 degrees Fahrenheit.

The value of zero is arbitrarily chosen in an interval scale. There is nothing sacrosanct
about the temperature of zero; it is merely an arbitrary reference point. The Fahrenheit
scale could have been created so that zero was set at a much warmer (or colder)
temperature. No specific meaning is attached to zero other than to say it is 10 degrees
colder than 10 degrees Fahrenheit. Thus, 80 degrees is not twice as hot as 40 degrees
and the ratio 80/40 has no meaning.

Interval Measurements: Measurements on a numerical scale in which the value of
zero is arbitrary but the difference between values is important.

Of all four levels of measurement, only the ratio scale is based on a numbering system
in which zero is meaningful. Therefore, the arithmetic operations of multiplication and
division also take on a rational interpretation. A ratio scale is used to measure many
types of data found in business and geoscientific analyses. Variables such as costs,
profits, inventory levels and grades are expressed as ratio measures. The value of zero
dollars to measure revenues, for example, can be logically interpreted to mean that no
sales have occurred. Furthermore, a firm with a 40 percent market share has twice as
much of the market as a firm with a 20 percent market share. Measurements such as
weight, time, and distance are also measured on a ratio scale since zero is meaningful,
and an item that weighs 100 pounds is one-half as heavy as an item weighing 200
pounds.

Ratio Measurements: Numerical measurements in which zero is a meaningful value
and the difference between values is important.

You may notice that the four levels of measurement increase in sophistication,
progressing from the crude nominal scale to the more refined ratio scale. Each
measurement offers more information about the variable than did the previous one.
This distinction among the various degrees of refinement is important, since different
statistical techniques require different levels of measurements. While most statistical
tests require interval or ratio measurements, other tests, called nonparametric tests
(which will be examined later in this text), are designed to use nominal or ordinal data.

1.11 Frequency Distribution

Graphical representation makes unwieldy data readily intelligible and brings to light the
salient features of the data at a glance. It also makes visual comparison of data easier and
facilitates the comparison of two frequency distributions.

Several graphical devices are often used to portray shapes of distributions. The
following types of graphs are commonly used in representing frequency distributions.
1. Stem-and-leaf display
2. Dot plot
3. Box-and-whiskers display (box plot)
4. Histogram
5. Frequency polygon and frequency curve
6. Cumulative frequency curve or the 'Ogive'
7. Pareto chart
8. Pie chart
9. Bar chart

1.11.1 Stem-and-leaf
The stem-and-leaf technique can be used to show both the rank order and the shape of
a data set simultaneously. As an illustration consider the data in Table 1.1.

Table 1.1
112 72 69 97 107
73 92 76 86 73
126 128 118 127 124
82 104 132 134 83
92 108 96 100 92
115 76 91 102 81
95 141 81 80 106
84 119 113 98 75
68 98 115 106 95
100 85 94 106 119
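A stem-and-leaf display like Fig. 1.5 can be generated from the Table 1.1 data with a few lines of Python (tens digits as stems, units digits as leaves):

```python
from collections import defaultdict

# The 50 observations of Table 1.1
data = [112, 72, 69, 97, 107, 73, 92, 76, 86, 73, 126, 128, 118, 127, 124,
        82, 104, 132, 134, 83, 92, 108, 96, 100, 92, 115, 76, 91, 102, 81,
        95, 141, 81, 80, 106, 84, 119, 113, 98, 75, 68, 98, 115, 106, 95,
        100, 85, 94, 106, 119]

def stem_and_leaf(values):
    """Group each value into a stem (all but the last digit) and a leaf (last digit)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return stems

for stem, leaves in sorted(stem_and_leaf(data).items()):
    print(f"{stem:3d} | {' '.join(map(str, leaves))}")
```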

Fig. 1.5 Stem-and-Leaf Displays

Fig. 1.6 Comparison of Stem-and-leaf and histogram

1.11.2. Dot Plot


This behaves similarly to the stem-and-leaf plot and is effective in detecting outliers.

Fig. 1.7 Dot Plot

1.11.3. Box-and-whiskers plot

Fig. 1.8 Box-and-whiskers Plot

Fig. 1.9 Comparison of Stem-and-leaf and histogram

1.11.4. Pareto Chart


These are used to help identify important quality problems and opportunities for
process improvement. By using these charts one can prioritise problem-solving
activities. When you analyse a Pareto chart, make comparisons by looking at the heights
of the bars.

Fig. 1.10 Pareto Chart

1.12 Forms of Frequency Curves

1.12.1 Symmetrical Curves

A frequency curve is said to be symmetrical if it can be folded along a vertical line
(ordinate) so that the two halves of the figure coincide. A well-known example is the
normal distribution (Fig. 1.13).


Fig 1.13 Symmetrical Distribution.


1.12.2 Skew Curves

A curve is said to be skewed if there is no symmetry. In a skew curve, observations tend
to pile up at one or the other end of the curve. Thus a curve may have a long tail to the
negative (left) side, in which case it is said to be negatively skewed (Fig. 1.14a), or to the
positive (right) side, in which case it is known as positively skewed (Fig. 1.14b).

Fig 1.14 Skewed distributions (a) negatively skewed, (b) positively skewed (After
Journel and Huijbregts, 1991)

1.13 Measure of Location and Dispersion

In addition to the histogram, the information in the frequency distribution can be further
summarised by means of just two numbers. The first is the location of the data, and the
various numbers that provide information about this are known as ‘measures of
location’ or ‘measures of central tendency’. ‘Location of the data’ refers to a value that
is typical of all the sample observations.

The second important aspect of the data is the dispersion of the observations. This
implies how the data are scattered (dispersed). This is also called ‘measure of variation’.

1.13.1 Measures of Location

a. The Mode:
The mode is defined as the observation in the sample which occurs most frequently. If
there is only one mode, the distribution is unimodal; otherwise it is multimodal (see Fig. 1.15).

Fig 1.15 Mode of Sample

b. The Arithmetic Mean

It is the most commonly used measure of location. Let the variable x take the values x1,
x2, ..., xn. The arithmetic mean is defined as:

x̄ = (x1 + x2 + ... + xn)/n = (1/n) Σ xi

For large data sets it may be advantageous to classify the data. If the class mark and the
frequency of the ith class are denoted by xi′ and fi respectively, the total value of the
observations in that class is fi·xi′. Thus,

x̄ = (1/n) Σ fi·xi′  (summed over i = 1, ..., k)

Where
k = number of classes
n = total number of observations.

c. The Geometric Mean
If the n non-zero and positive variate-values x1, x2, ..., xn occur f1, f2, ..., fn times
respectively, then the geometric mean G of the set of observations is defined by:

G = (x1^f1 · x2^f2 · ... · xn^fn)^(1/n), where n = f1 + f2 + ... + fn

Thus,

log G = (1/n) Σ fi log xi

The geometric mean may be used to show percentage changes in a series of positive
numbers.
As such, it has wide application in business and economics, since we are often interested
in determining the percentage change in sales, gross national product, or any other
economic series. The geometric mean (GM) is found by taking the nth root of the product
of n numbers. Thus:

GM = (x1 · x2 · ... · xn)^(1/n)

GM is most often used to calculate the average growth rate over time of some given
series.

Example 4: A farm labourer wishes to determine the average growth rate of his monthly
income based on the figures in Table 1.2. If the average annual growth rate of monthly
salary is less than 10% he will resign. Using GM should he resign?

Table 1.2 Salary of farm labourer

Solution

Applying the geometric mean to the year-to-year growth factors of the Table 1.2 data
gives an average annual increase of about 11.6%. Since this exceeds 10%, he may not resign.
It is worth noting that the geometric mean will always be less than the arithmetic mean
except in the rare case when all the percentage increases are the same; then the two
means are equal.
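The calculation can be sketched in Python. Since the Table 1.2 figures are not reproduced here, the salaries below are hypothetical, chosen so that the average growth rate comes out near the 11.6% of the worked example:

```python
import math

# Hypothetical monthly salaries at the start of four successive years
salaries = [100.0, 112.0, 123.0, 139.0]

# Year-to-year growth factors, e.g. 112/100 = 1.12
factors = [b / a for a, b in zip(salaries, salaries[1:])]

# Geometric mean of the growth factors = average annual growth factor
gm = math.prod(factors) ** (1 / len(factors))
avg_growth_pct = (gm - 1) * 100

print(f"average annual growth rate: {avg_growth_pct:.1f}%")   # about 11.6%
```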

d. Harmonic Mean
In statistics, the harmonic mean is used to find average rates. The harmonic mean, H, of
n non-zero variate-values xi with frequencies fi is given by:

H = n / (Σ fi/xi)

Thus the harmonic mean of the variate-values is the reciprocal of the arithmetic mean of
their reciprocals. For example, if the rate for one lap of a race is a and the rate for a
second lap is b, then the average rate c is given by the harmonic mean of a and b. Suppose
a distance d is travelled first at rate r1 and then again at rate r2, taking times t1 and t2, so
that the total time is t = t1 + t2. The problem is to find the rate r for travel of the total
distance 2d. Finding the common rate for travel over the same distance amounts to
taking the harmonic mean of the individual rates.

Since:

d = r1·t1 and d = r2·t2, so that t1 = d/r1 and t2 = d/r2.

Now, covering the total distance 2d in time t at the average rate r gives:

2d = r·t = r(d/r1 + d/r2)

Dividing both sides by d leads to:

2 = r(1/r1 + 1/r2)

Dividing each side by r and by 2 gives:

1/r = (1/2)(1/r1 + 1/r2)

or

r = 2·r1·r2/(r1 + r2)

Note:
i. The harmonic mean is important in problems in which variate-values are compared
with a constant quantity of another variable, e.g. time, or distance covered within a
certain time.

ii. "Mean" is another word for average. It almost always refers to the arithmetic mean;
in certain contexts, however, it could refer to the geometric mean, harmonic mean, or
root mean square.

iii. For any set of positive numbers, it is always true that:

Harmonic mean ≤ Geometric mean ≤ Arithmetic mean ≤ Root mean square

Example 5: For the numbers 4 and 9,

Harmonic mean = 2/(1/4 + 1/9) = 72/13 ≈ 5.54

Geometric mean = √(4 × 9) = 6

Arithmetic mean = (4 + 9)/2 = 6.5

Root mean square = √((4² + 9²)/2) = √48.5 ≈ 6.96
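These four means, and the inequality in note (iii), can be checked directly in Python:

```python
import math

a, b = 4, 9
hm = 2 / (1 / a + 1 / b)             # harmonic mean  = 72/13, about 5.54
gm = math.sqrt(a * b)                # geometric mean = 6
am = (a + b) / 2                     # arithmetic mean = 6.5
rms = math.sqrt((a**2 + b**2) / 2)   # root mean square, about 6.96

# The inequality HM <= GM <= AM <= RMS holds for any positive numbers
assert hm <= gm <= am <= rms
```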

e. The median

If the sample observations are arranged in order from smallest to largest, the median is
defined as the middle observation if the number of observations is odd, and as the
number halfway between the two middle observations if the number of observations is
even.
If x1, x2, ..., xn represents a random sample of size n, arranged in order of magnitude,
then the sample median is defined by:

Median = x(n+1)/2 if n is odd, and Median = (xn/2 + xn/2+1)/2 if n is even

The general formula for the median of classified data is:

Median = bL + ((n/2 − fm−1)/fm)·c

Where
bL = lower boundary of the median class
n = number of observations
fm = the number of observations in the median class
fm−1 = the cumulative frequency of the class preceding the median class
c = class interval of the median class

Note: Because of the distorting effect of extreme observations on the mean, the median
is often the preferred measure in such situations as salary negotiations.

Example 6: If bL = 199.5, n = 300, fm-1 = 116, fm = 73, c = 50

Solution:

Median = 199.5 + ((300/2 − 116)/73) × 50 = 199.5 + (34/73) × 50 ≈ 222.79
Example 7: Consider the data in Table 1.3

Table 1.3

Median =

f. Relation between Measure of Location and Types of Frequency Curves.


In practice, the frequency curves have the shapes shown in Fig 1.16.

Fig 1.16 Relative positions of measures of location.

1.14 Measures of Dispersion, Skewness


It should be clear that a measure of central tendency by itself can exhibit only one of the
important characteristics of a distribution and therefore while studying a distribution it
is equally important to know how the variates are clustered around or away from the
point of central tendency. The variation of the points about the mean is called
dispersion. Spread or dispersion can be classified into three groups.

i. Measures of the difference between representative variate-values, such as the
range, the interquartile range, or the interdecile range.
ii. Measures obtained from the deviations of every variate-value from some
central value, such as the mean deviation from the mean, the mean deviation
from the median, or the standard deviation.
iii. Measures obtained from the variations of all the variates among themselves,
such as the mean difference.

1. Range: It is the difference between the extreme values of the variate, i.e. (xn − x1),
when the values are arranged in ascending order.
2. The interquartile range is the difference between the 75th and 25th percentiles, i.e.
(X75% − X25%).
3. The interdecile range is the difference between the ninth and first deciles, i.e. X0.9 −
X0.1. This contains eighty percent of the total frequency, while the interquartile
range contains fifty percent. These ranges are mainly used in descriptive statistics
because of the mathematical difficulty of handling them in advanced statistics.
4. Average Deviation or Mean Deviation. The mean deviation is defined as:

M.D. = (1/n) Σ (xi − x̄)

Because some deviations are negative and others positive, they cancel and the M.D.
equals zero even when the individual deviations are numerically large, giving a poor
expression of the intrinsic dispersion. The mean absolute deviation (M.A.D.) provides a
better and more useful measure of dispersion:

M.A.D. = (1/n) Σ |xi − x̄|
This is illustrated in Table 1.4

Table 1.4 (x̄ = 120/8 = 15)
Xi      (xi − x̄)    |xi − x̄|
21      6           6
17      2           2
13      −2          2
25      10          10
9       −6          6
19      4           4
6       −9          9
10      −5          5
Totals: 120   0     44

The mean absolute deviation is based on all the values of the variate and hence it is a
better measure of dispersion than the mean deviation. However it is also difficult to
handle mathematically.
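The Table 1.4 computation can be reproduced in a few lines of Python:

```python
data = [21, 17, 13, 25, 9, 19, 6, 10]       # the Xi column of Table 1.4
n = len(data)
x_bar = sum(data) / n                       # 120 / 8 = 15

deviations = [x - x_bar for x in data]      # sum to 0, as the table shows
mad = sum(abs(d) for d in deviations) / n   # 44 / 8 = 5.5

print(sum(deviations), mad)
```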

1.15 Variance and Standard Deviation

In order to overcome the weakness of the mean deviation outlined earlier, a better
option is the sum of the squared deviations, i.e. Σ (xi − x̄)², known simply as the 'sum of
squares'. The mean of this sum of squares is the sample variance, denoted symbolically
as:

S² = (1/(n − 1)) Σ (xi − x̄)²

For theoretical reasons, the sum of squares is divided by (n − 1) rather than n because
this gives a better estimate of the population variance.
For n > 35 there is practically no significant difference between the two definitions.

The sample standard deviation S is defined as:

S = √[(1/n) Σ (xi − x̄)²], or more appropriately: S = √[(1/(n − 1)) Σ (xi − x̄)²]

For classified data, if the data have k classes:

S² = (1/(n − 1)) Σ fi(xi′ − x̄)²  or  S² = (1/(n − 1)) [Σ fi·xi′² − (Σ fi·xi′)²/n]

Where
xi′ = midpoint (class mark) of the ith class
fi = the number of observations in the ith class
n = the total number of observations
x̄ = sample mean = (1/n) Σ fi·xi′

Example 8:
A test in probability and statistics was taken by 51 students in UMaT. The scores ranged
from 50% to 95% and were classified into 8 classes of width 6 units. Find the variance
and standard deviation.

It could also be worked as in Table 1.5.

Table 1.5
Class Limits  Class Mark (xi′)  Frequency (fi)  fi·xi′  (xi′ − x̄)  fi(xi′ − x̄)²
48–54         51                2               102     −24         1152
54–60         57                3               171     −18         972
60–66         63                5               315     −12         720
66–72         69                8               552     −6          288
72–78         75                10              750     0           0
78–84         81                12              972     6           432
84–90         87                10              870     12          1440
90–96         93                1               93      18          324
Totals                          51              3825                5328

Table 1.6
xi′    fi    xi′²    fi·xi′    fi·xi′²
51     2     2601    102       5202
57     3     3249    171       9747
63     5     3969    315       19845
69     8     4761    552       38088
75     10    5625    750       56250
81     12    6561    972       78732
87     10    7569    870       75690
93     1     8649    93        8649
Totals       51      3825      292203

S² = (292203 − 3825²/51)/50 = (292203 − 286875)/50 = 5328/50 = 106.56
S = √106.56 ≈ 10.32
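Recomputing Example 8 in Python confirms that the definitional form (Table 1.5) and the computational form (Table 1.6) agree:

```python
import math

# Class marks and frequencies from Table 1.5
marks = [51, 57, 63, 69, 75, 81, 87, 93]
freqs = [2, 3, 5, 8, 10, 12, 10, 1]

n = sum(freqs)                                          # 51 students
total = sum(f * x for f, x in zip(freqs, marks))        # 3825
x_bar = total / n                                       # 75

# Definitional form: sum of fi * (xi' - x_bar)^2, divided by (n - 1)
ss = sum(f * (x - x_bar) ** 2 for f, x in zip(freqs, marks))    # 5328
s2 = ss / (n - 1)                                       # 106.56

# Computational form: (sum fi*xi'^2 - (sum fi*xi')^2 / n) / (n - 1)
s2_alt = (sum(f * x * x for f, x in zip(freqs, marks)) - total ** 2 / n) / (n - 1)

s = math.sqrt(s2)
print(round(s2, 2), round(s, 2))    # 106.56 and 10.32
```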

1.16 Coefficient of Variation

Whilst the variance is very important in measuring dispersion, it has certain limitations
when comparing distributions that:
i. have significantly different means, or
ii. are measured in different units.
In such situations it is better to use the coefficient of variation, which assesses the degree
of dispersion of a data set relative to its mean:

CV = (S/x̄) × 100%

Using S ≈ 10.32 and x̄ = 75 from the previous example, CV = (10.32/75) × 100% ≈ 13.8%.

1.17 Skewness

Measures describing the symmetry of distributions are called 'coefficients of skewness'.
One such measure is given by:

α3 = m3/s³, where m3 = (1/n) Σ (xi − x̄)³ and s is the standard deviation.

α3 will be positive or negative according as the distribution is skewed to the right or left
respectively. For a symmetric distribution α3 = 0.

1.18 Kurtosis

Measures of the degree of peakedness of a distribution are called 'coefficients of
kurtosis', or briefly 'kurtosis'. It is often measured as:

α4 = m4/s⁴, where m4 = (1/n) Σ (xi − x̄)⁴ and s is the standard deviation.

This is usually compared with the normal distribution curve, which has a kurtosis of 3.
Fig. 1.17 illustrates curves with different kurtosis: a curve with large kurtosis has a
sharper peak than one with small kurtosis.

Fig 1.17 Illustration of Kurtosis
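Moment-based skewness and kurtosis can be sketched as below. This uses the common convention α3 = m3/m2^(3/2) and α4 = m4/m2² with n-divisor moments; software packages sometimes use slightly different formulas:

```python
def central_moments(data):
    """Second, third and fourth central moments, each with an n divisor."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m2, m3, m4

def skewness(data):
    m2, m3, _ = central_moments(data)
    return m3 / m2 ** 1.5          # 0 for a symmetric distribution

def kurtosis(data):
    m2, _, m4 = central_moments(data)
    return m4 / m2 ** 2            # 3 for a normal distribution

print(skewness([1, 2, 3, 4, 5]))   # symmetric data: 0.0
print(skewness([1, 1, 1, 2, 10]))  # long right tail: positive
```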

CHAPTER 2
Learning Objectives

Having worked through this chapter the student will be able to:
● Interpret probabilities and use probabilities of outcomes to calculate probabilities
of events in discrete sample spaces.
● Interpret and calculate conditional probabilities of events
● Use Bayes’ theorem to calculate conditional probabilities
● Discuss random variables.
● Use counting techniques in calculating probabilities of events

2.0 Introduction to Probability

2.1 Definitions
(a) Experiment:
An experiment is any process that generates a set of data or well-defined outcomes. There are
two types of experiments, namely Deterministic and Random (or Chance) Experiments. In
deterministic experiments the observed results are not subject to chance, while the
outcomes of random experiments cannot be predicted with certainty. A random experiment
could be as simple as tossing a coin or a die and observing an outcome, or as complex as choosing
50 people from a population and testing them for the AIDS disease.

(b) Trial: Each repetition of an experiment is called a trial. That is, a trial is a single
performance of an experiment.

(c) Outcome: The possible result of each trial of an experiment is called an outcome. When
an outcome of an experiment has equal chance of occurring as the others the outcomes are
said to be equally likely. For example, the toss of a coin and a die yield the possible outcomes
in the sets, {H, T} and {1, 2, 3, 4, 5, 6} and a play of a football match yields {win (W), loss (L),
draw (D)}.

(d) Sample Space:


Sample space is the collection of all possible outcomes of a probability experiment. We use
the notation S for the sample space. Each element or outcome of the experiment is called a sample
point. For example,

(i) The results of two and three tosses of a coin give the following sample spaces:
S = {HH, HT, TH, TT}
S = {HHH, HHT, HTH, HTT, THH, THT, TTH,TTT}

(ii) A toss of a die and a coin simultaneously give the results.


S = {H1, H2, H3, H4, H5, H6, T1, T2, T3, T4, T5, T6}

(iii) The outcomes of two tosses of a die are the 36 ordered pairs (T1, T2),
where T1 and T2 represent the scores on the first and second tosses respectively:
S = {(1, 1), (1, 2), . . . , (6, 6)}
(iv) Drawing a card from a packet of playing cards has sample space with 52 cards
made up of 13 Heart, 13 Spade, 13 Diamond and 13 Club cards.
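Small sample spaces like these can be enumerated programmatically; a sketch using `itertools.product` from the Python standard library:

```python
from itertools import product

coin = ['H', 'T']
die = [1, 2, 3, 4, 5, 6]

# (i) two and three tosses of a coin
two_tosses = [''.join(p) for p in product(coin, repeat=2)]
three_tosses = [''.join(p) for p in product(coin, repeat=3)]
# (ii) a coin and a die tossed simultaneously
coin_and_die = [f'{c}{d}' for c, d in product(coin, die)]
# (iii) two tosses of a die: ordered pairs (T1, T2)
two_dice = list(product(die, repeat=2))

print(len(two_tosses), len(three_tosses), len(coin_and_die), len(two_dice))
# 4 8 12 36
```

The sizes 4, 8, 12 and 36 agree with the sample spaces listed above.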

(e) Event: An event is a collection of one or more outcomes from an experiment.


That is, it is a subset of a sample space. It is denoted by a capital letter. For
example we may have:
(i) The event of observing exactly one head (H) in three tosses of a coin,
A = {HTT, THT, TTH}
(ii) The event of obtaining a total score of 8 on two tosses of a die,
B = {(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)}
(iii) Consider a newly married couple planning to have three children. The event of
the family having two girls is:
D = {BGG, GBG, GGB}

(f) Tree Diagram: The tree diagram represents pictorially the outcomes of a random
experiment. An outcome that is a sequence of trial results is
represented by a path through the tree. For example,
(i) Consider a couple planning to have three children, assuming each child born is equally
likely to be a boy (B) or girl (G).
(ii) A soccer team on winning (WT) or losing (LT) a toss can defend either post A or
B. It plays the match and either win (W), draw (D) or lose (L). We illustrate the
experiment on a diagram as follows

2.2 Determination of Probability of an Event
The probability of an event A, denoted, P(A), gives the numerical measure of the likelihood
of the occurrence of event A which is such that 0 ≤ P (A) ≤ 1. If P (A) = 0, the event A is said
to be impossible to occur and if P(A) = 1, A is said to be certain. If A/ is the complement of
the event A, then P(A/) = 1 – P(A), called the probability that event A will not occur. There
are three main schools of thought in defining and interpreting the probability of an event.
These are the Classical Definition, Empirical Concept and the Subjective Approach. The first
two are referred to as the Objective Approach.

(a) The Classical Definition: This is based on the assumption that the outcomes of an
experiment are equally likely. For example, if an experiment can lead to n mutually exclusive
and equally likely outcomes, of which n(A) are favourable to the event A, then the probability of the event A is defined by

P(A) = n(A)/n
The classical definition of probability of event A is referred to as a priori probability because


it is determined before any experiment is performed to observe the outcomes of event A.

(b) The Empirical Concept: This concept uses the relative frequencies of past occurrences
to develop probabilities for future. The probability of an event A happening in future is
determined by observing what fraction of the time similar events happened in the past. That
is,

P(A) ≈ (number of times A occurred in the past) / (total number of observed trials)

The relative frequency of the occurrence of the event A used to estimate P(A) becomes more
accurate as the number of trials increases. The relative frequency approach of defining P(A) is
sometimes called posteriori probability because P(A) is determined only after event A is
observed.
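The empirical concept can be illustrated by simulation: estimate P(head) by its relative frequency over an increasing number of simulated coin tosses. A sketch using Python's `random` module (the seed is fixed only to make the run reproducible):

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

# Relative-frequency estimate of P(H) for increasing numbers of trials
for n in (100, 10_000, 100_000):
    heads = sum(random.random() < 0.5 for _ in range(n))
    print(n, heads / n)
```

The estimates drift toward the true value 0.5 as the number of trials grows, which is exactly the sense in which the empirical estimate "becomes more accurate".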

(c) The Subjective Definition: The subjective concept of probability is based on the
degree of belief through the evidence available. The probability of an event A may therefore
be assessed through experience, intuitiveness, judgment or expertise. For example,
determining the probability of getting a cure of a disease or going to rain today. This approach
to probability has been developed relatively recently and is related to Bayesian Decision
Analysis. Although the subjective view of probability has enjoyed increased attention over the
years, it has not been fully accepted by statisticians who have traditional orientations.

Example 9:
Consider the problem of a couple planning to have three children, assuming each child born
is equally likely to be a boy (B) or a girl (G).
(a) List the possible outcomes in this experiment
(b) What is the probability of the couple having exactly two girls?

Solution:
(a) The sample space for this experiment is
S = {BBB, BBG, BGB, BGG, GBB, GBG, GGB, GGG}
(b) Let A be the event of the couple having exactly two girls. Then,
A = {BGG, GBG, GGB} and P(A) = n(A)/n(S) = 3/8

Example 10:
Suppose a card is randomly selected from a packet of 52 playing cards.
(i) What is the probability that it is a “Heart”?
(ii) What is the probability that the card bears the number 5 or a picture of
a queen?
(b) A box contains 4 red, 2 black and 3 white balls. What is the probability of drawing
a red ball?

Solution:
(a) Let the sample space be the set, S = {playing cards}, A = {Heart cards}, B = {Cards
numbered 5} Q = {Cards with a picture of queen}. Then
n(S) = 52, n(A) =13, n(B) = 4 and n(Q) = 4

(i) P(A) = n(A)/n(S) = 13/52 = 1/4

(ii) P(B ∪ Q) = P(B) + P(Q) = 4/52 + 4/52 = 8/52 = 2/13

(b) The sample space, S = {4R, 2B, 3W balls} and let R = {red balls}. Then
P(R) = n(R)/n(S) = 4/9
Example 11:
A die is tossed twice. List all the outcomes in each of the following events and compute the
probability of each event.
(a) The sum of the scores is less than 4
(b) Each toss results in the same score
(c) The sum of scores on both tosses is a prime number
(d) The product of the scores is at least 20

Solution:
The sample space for the experiment is the set of ordered pairs (m, n), where m and n each
take the values 1, 2, 3, 4, 5 and 6. Thus,
S = {(1, 1), (1, 2), (1, 3), . . . . , (6, 6)}, where n(S) = 36
(a) A = {(1, 1), (1, 2), (2, 1)}, so P(A) = 3/36 = 1/12

(b) B = {each toss results in the same score}

= {(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)}, so P(B) = 6/36 = 1/6

(c) D = {sum of scores on both tosses is prime}

D = {(1, 1), (1, 2), (1, 4), (1, 6), (2, 1), (2, 3), (2, 5), (3, 2), (3, 4), (4, 1), (4, 3),
(5, 2), (5, 6), (6, 1), (6, 5)}, so P(D) = 15/36 = 5/12

(d) E = {product of the scores is at least 20}

= {(4, 5), (4, 6), (5, 4), (5, 5), (5, 6), (6, 4), (6, 5), (6, 6)}, so P(E) = 8/36 = 2/9
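Example 11 can be verified by enumerating the 36 outcomes directly (a sketch):

```python
from itertools import product

S = list(product(range(1, 7), repeat=2))        # 36 equally likely outcomes

A = [(m, n) for m, n in S if m + n < 4]         # (a) sum less than 4
B = [(m, n) for m, n in S if m == n]            # (b) same score on both tosses
primes = {2, 3, 5, 7, 11}
D = [(m, n) for m, n in S if m + n in primes]   # (c) sum is prime
E = [(m, n) for m, n in S if m * n >= 20]       # (d) product at least 20

for name, ev in [('A', A), ('B', B), ('D', D), ('E', E)]:
    print(name, len(ev), len(ev) / len(S))
```

The event sizes come out as 3, 6, 15 and 8, giving the probabilities 1/12, 1/6, 5/12 and 2/9.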

2.3 Probability of Compound Events


Two or more events are combined to form a single event using the set operations ∪ (union) and ∩ (intersection).
The event
(i) (A ∪ B) occurs if either A or B or both occur.
(ii) (A ∩ B) occurs if both A and B occur.

2.3.1 Definitions:
(a) Mutually Exclusive Events: Two or more events which have no common outcome(s)
(i.e. never occur at the same time) are said to be mutually exclusive. If A and B are mutually
exclusive events of an experiment, then P(A ∩ B) = 0 and P(A ∪ B) = P(A) + P(B), since A ∩ B = ∅.

(b) Independent Events: Two or more events are said to be independent if the probability
of occurrence of one is not influenced by the occurrence or non-occurrence of the other(s).
Mathematically, the two events A and B are said to be independent if and only if
P(A ∩ B) = P(A)P(B). However, if A and B are such that P(A ∩ B | C) = P(A|C)P(B|C)
for some event C, they are said to be conditionally independent given C.
(c) Conditional Probability: Let A and B be two events in the sample space, S with P(B)
> 0. The probability that an event A occurs given that event B has already occurred, denoted
P(A|B), is called the conditional probability of A given B. The conditional probability of A
given B is defined as

P(A|B) = P(A ∩ B) / P(B)

In particular, if S is a finite equiprobable space, then

P(A|B) = n(A ∩ B) / n(B)
(d). Exhaustive Events: Two or more events defined on the same sample space are said to
be exhaustive if their union is equal to the sample space. E.g. A and B are exhaustive if A ∪ B = S.
Definition (partition of sample space): The events A1, A2, . . . , An form a partition of
the sample space S if the following hold:
(a) Ai ≠ ∅ for all i

(b) Ai ∩ Aj = ∅ for all i ≠ j

(c) A1 ∪ A2 ∪ . . . ∪ An = S
In other words, the n events form a partition of the sample space if
the n events are (a) nonempty, (b) mutually exclusive and (c) collectively exhaustive.

Example 12:
(a) In a certain population of women, 40% have had breast cancer, 20% are smokers and
13% are smokers and have had breast cancer. If a woman is selected at random from the
population, what is the probability that she had breast cancer, smokes or both?
(b) Let and be events such that and
(i) Find
(ii) Are and independent?

Solution:
(a) Let B be the event of women with breast cancer and W the event of women who smoke.
Then,
P(B) = 0.40, P(W) = 0.20 and P(B ∩ W) = 0.13

P(B ∪ W) = P(B) + P(W) − P(B ∩ W)
= 0.4 + 0.20 – 0.13
= 0.47
(b) Given that and
(i)

(ii) A and B are independent if P(A ∩ B) = P(A)P(B),

which means that A and B are independent.

2.3.2 Example on Conditional Probability

Example 13:
Complex components are assembled in a plant that uses two different assembly lines, A and
A/. Line A uses older equipment than A/, so it is somewhat slower and less reliable. Suppose
on a given day line A has assembled 8 components, of which 2 have been identified as
defective (B) and 6 as nondefective (B/), whereas A/ has produced 1 defective and 9
nondefective components. This information is summarized in the accompanying table.

                 Condition
             B       B/      Total
Line A       2       6         8
     A/      1       9        10
Total        3      15        18

Unaware of this information, the sales manager randomly selects 1 of these 18 components
for a demonstration. Prior to the demonstration

P(line A component selected) = P(A) = 8/18 = 4/9

However, if the chosen component turns out to be defective, then the event B has occurred,
so the component must have been 1 of the 3 in the B column of the table. Since these 3
components are equally likely among themselves after B has occurred,

P(A|B) = 2/3 = (2/18) / (3/18) = P(A ∩ B) / P(B)
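The table computation can be mirrored in code with exact fractions (a sketch of P(A|B) = P(A∩B)/P(B) for the assembly-line counts, writing Bc for the complement B/):

```python
from fractions import Fraction

# Counts from the table: (line, condition) -> number of components
counts = {('A', 'B'): 2, ('A', 'Bc'): 6, ('Ac', 'B'): 1, ('Ac', 'Bc'): 9}
total = sum(counts.values())                    # 18 components in all

P_A = Fraction(counts[('A', 'B')] + counts[('A', 'Bc')], total)
P_B = Fraction(counts[('A', 'B')] + counts[('Ac', 'B')], total)
P_A_and_B = Fraction(counts[('A', 'B')], total)
P_A_given_B = P_A_and_B / P_B                   # P(A|B) = P(A n B) / P(B)

print(P_A, P_B, P_A_given_B)   # 4/9 1/6 2/3
```

Learning that the component is defective raises the probability that it came from line A from 4/9 to 2/3.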

Exercise:

Suppose that of all individuals buying a certain digital camera, 60% include an optional
memory card in their purchase, 40% include an extra battery, and 30% include both a card
and battery. Given that the selected individual purchased an extra battery, what is the
probability that an optional card was also purchased?

The Multiplication Rule for P(A ∩ B)

The definition of conditional probability yields the following result, obtained by multiplying
both sides of the conditional probability equation by P(B):

P(A ∩ B) = P(A|B) · P(B)

This rule is important because it is often the case that P(A ∩ B) is desired, whereas both P(B)
and P(A|B) can be specified from the problem description.

The Law of Total Probability

Let A1, . . . , Ak be mutually exclusive and exhaustive events. Then for any other event B,

P(B) = P(B|A1)P(A1) + . . . + P(B|Ak)P(Ak)
2.4 Bayes’ Rule


The power of Bayes’ rule is that in many situations where we want to compute P(A|B) it
turns out that it is difficult to do so directly, yet we might have direct information about
P(B|A). Bayes’ rule enables us to compute P(A|B) in terms of P(B|A).

2.4.1 Bayes’ Theorem

Let A and Ac constitute a partition of the sample space S with P(A) > 0 and P(Ac) > 0.
Then for any event B in S such that P(B) > 0,

P(A|B) = P(B|A)P(A) / P(B)

The denominator P(B) in the equation can be computed by the law of total probability:

P(B) = P(B|A)P(A) + P(B|Ac)P(Ac)
Example 14:
A paint-store chain produces and sells latex and semigloss paint. Based on long-range sales,
the probability that a customer will purchase latex paint is 0.75. Of those that purchase latex
paint, 60% also purchase rollers. But only 30% of semigloss paint buyers purchase rollers. A
randomly selected buyer purchases a roller and a can of paint. What is the probability that
the paint is latex?

Solution
L = {The customer purchases latex paint.}, P(L) = 0.75
S = {The customer purchases semigloss paint.}, P(S) = 0.25
R = {The customer purchases roller.}
P(R|L) =0.6
P(R|S) =0.3
P(R) = P(R|L)P(L) + P(R|S)P(S) = 0.6 × 0.75 + 0.3 × 0.25 = 0.525
By Bayes’ rule, P(L|R) = P(R|L)P(L) / P(R) = 0.45/0.525 ≈ 0.857
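The same computation as a short script (a sketch of Bayes' rule applied to this example):

```python
# Bayes' rule: P(L|R) = P(R|L) P(L) / P(R), with P(R) by total probability
P_L, P_S = 0.75, 0.25                  # latex vs semigloss buyers
P_R_given_L, P_R_given_S = 0.60, 0.30  # roller purchase rates

P_R = P_R_given_L * P_L + P_R_given_S * P_S   # law of total probability
P_L_given_R = P_R_given_L * P_L / P_R

print(P_R, round(P_L_given_R, 4))   # 0.525 0.8571
```

Seeing the roller raises the probability of a latex purchase from the prior 0.75 to about 0.857.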

2.5 Axioms of Probability


Given an experiment and a sample space, S , the objective of probability is to assign to each
event A a number P(A), called the probability of the event A, which will give a precise
measure of the chance that A will occur. To ensure that the probability assignments will be
consistent with our intuitive notions of probability, all assignments should satisfy the
following axioms (basic properties) of probability.

A.1: For every event A, 0 ≤ P(A) ≤1


A.2: P(S) = 1
A.3: If A and B are mutually exclusive events, i.e. A ∩ B = ∅, then
P(A ∪ B) = P(A) + P(B).
A.4: If A1, A2, . . . is a sequence of mutually exclusive events, then
P(A1 ∪ A2 ∪ . . .) = P(A1) + P(A2) + . . . , or

P(∪ Ai) = Σ P(Ai).
The following theorems arise directly from the above axioms:
(i) Theorem 1: If ∅ is the empty set, then P(∅) = 0.

Proof:
Let A be any event; then A and ∅ are mutually exclusive and A = A ∪ ∅.

Then by A.3, P(A) = P(A ∪ ∅) = P(A) + P(∅), and hence P(∅) = 0.

(ii) Theorem 2: If A/ is the complement of an event A, then P(A/) = 1 − P(A).

Proof
A ∪ A/ = S and A ∩ A/ = ∅, so by A.3 and A.2,
P(A) + P(A/) = P(S) = 1, and hence P(A/) = 1 − P(A).

2.5.1 Some Rules of Probability

(a) The Addition Rule:

Let A1, A2, A3 be events of the sample space, S. Then

(i) P(A1 ∪ A2) = P(A1) + P(A2) − P(A1 ∩ A2)

(ii) P(A1 ∪ A2 ∪ A3) = P(A1) + P(A2) + P(A3) − P(A1 ∩ A2) − P(A1 ∩ A3) − P(A2 ∩ A3) + P(A1 ∩ A2 ∩ A3)

If the events are mutually exclusive, then

(i) P(A1 ∪ A2) = P(A1) + P(A2)
(ii) P(A1 ∪ A2 ∪ A3) = P(A1) + P(A2) + P(A3)
(iii) P(A1 ∪ A2 ∪ . . . ∪ An) = P(A1) + P(A2) + . . . + P(An)

(b) The Multiplication Theorem:

If A1, A2, . . . , An are events of the same sample space, S, then

(i) P(A1 ∩ A2) = P(A1) P(A2|A1)

(ii) P(A1 ∩ A2 ∩ . . . ∩ An) = P(A1) P(A2|A1) P(A3|A1 ∩ A2) . . . P(An|A1 ∩ . . . ∩ An−1)

2.6 Application of Counting Techniques

The classical definition of probability of an event A requires the knowledge of the
number of outcomes of A and the total possible outcomes of the experiment, n(S). To find these
outcomes we list such outcomes explicitly, which may be impossible if they are too many.
Counting techniques may be useful to determine the number of outcomes and compute
P(A). We shall examine three basic counting techniques, namely the Multiplication Principle,
Permutation and Combination.

2.6.1 The Multiplication Principle


The Multiplication Principle, also known as the Basic Counting Principle, states that:
● If an operation can be performed in n1 ways, a second operation can be performed
in n2 ways, and so on for the kth operation which can be performed in nk ways, then the combined
experiment or operations can be performed in n1 × n2 × . . . × nk ways.

For example: A homeowner doing some remodeling requires the services of both a plumbing
contractor and an electrical contractor. If there are 12 plumbing contractors and 9 electrical
contractors available in the area, in how many ways can the contractors be chosen? If we
denote the plumbers by P1, . . . , P12 and the electricians by Q1, . . . , Q9, then we wish the
number of pairs of the form (Pi,Qj). With n1=12 and n2=9, the product rule yields N = (12)(9)
= 108 possible ways of choosing the two types of contractors.
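The contractor count can be confirmed by enumerating the pairs (a sketch; the labels Pi and Qj follow the text):

```python
from itertools import product

plumbers = [f'P{i}' for i in range(1, 13)]       # P1 .. P12
electricians = [f'Q{j}' for j in range(1, 10)]   # Q1 .. Q9

pairs = list(product(plumbers, electricians))    # all (Pi, Qj) choices
print(len(pairs))   # 108 = 12 * 9
```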

Example 15

(1) Tossing a coin has two possible outcomes and tossing a die has six possible outcomes.
Then the combined experiment, tossing the coin and die together, will result in
2 × 6 = 12 possible outcomes provided below:

{H1, H2, H3, H4, H5, H6, T1, T2, T3, T4, T5, T6}
(2) Another example is the number of different ways for a man to get dressed if he has 8
different shirts and 6 different pairs of trousers. The combination of the 8 different shirts and
the six different pairs of trousers results in 8 × 6 = 48 possible ways.

(3) In a certain examination paper, students are required to answer 5 out of 10 questions
from Section A another 3 out of 5 questions from Section B and 2 out of 5 questions from
Section C. In how many ways can the students answer the examination paper?

Solution:
The number of ways of answering the questions in Section A = 10C5 = 252

The number of ways of answering the questions in Section B = 5C3 = 10

The number of ways of answering the questions in Section C = 5C2 = 10

Hence the students can answer the questions in the three sections in
252 × 10 × 10 = 25 200 ways.
Application of the multiplication principle results in the other two counting techniques:
Permutation and Combination, used to find the number of possible ways when a fixed number
of items are to be picked from a lot without replacement.

2.6.2 Permutation of Objects


An ordered arrangement of objects is called a permutation. For example, the possible
permutations of the letters a, b and c are as follows:
abc, acb, bac, bca, cab, cba.

Definitions:
(a) The number of permutations of n distinct objects, taken all together, is:

nPn = n!

(b) The number of permutations of n distinct objects taken r at a time is:

nPr = n! / (n − r)!

(c) The number of permutations of n objects consisting of k groups of which n1 of the first
group are alike, n2 of the second group are alike and so on for the kth group with nk objects
which are alike is:

n! / (n1! n2! . . . nk!), where n1 + n2 + . . . + nk = n

(d) Circular Permutations: Permutations that occur when objects are arranged in a circle are
called circular permutations. The number of ways of arranging n different objects in a circle
is given by:

(n − 1)!

Example 16:
1. (i) The number of permutations of 10 distinct digits taken two at a time is
10P2 = 10!/8! = 10 × 9 = 90
(ii) A company codes its customers by giving each customer an eight character code. The
first 3 characters are the letters A, B and C in any order and the remaining 5 are the digits 1,
2, 3, 4 and 5 also in any order. If each letter and digit can appear only once then the number
of customers the company can code is obtained as follows:
The first 3 letters can be filled in 3! = 6 ways.
The next 5 digits can be filled in 5! = 120 ways.
Then the required number of customer codes = 3! × 5! = 720

2.(a) The number of permutations of the letters of the word, POSSIBILITY, which
contains 3 I’s and 2 S’s is

11! / (3! 2!) = 3 326 400

(b) The number of arrangements of the letters of the word, ADDING, if

the two letters D and D are together (ADDING, treating DD as a single letter) is 5! = 120.

(c) The number of circular permutations of 6 persons sitting around a circular

table is (6 − 1)! = 5! = 120.
(d) In how many ways can 4 boys and 2 girls seat themselves in a row if
(i) the 2 girls are to sit next to each other?
(ii) the 2 girls are not to sit next to each other?

Solution:
(i) If we regard the 2 girls as a single unit (B1 B2 B3 B4 [G1G2]), then the number of
arrangements of 5 different “persons”, taken all at a time = 5! = 120
The 2 girls can exchange places within their unit, and so the required number of ways they can seat
themselves
= 5! x 2! = 240
(ii) The number of ways the boys can arrange themselves = 4! = 24
In any such arrangement there are 5 places (before, between and after the boys,
_ B _ B _ B _ B _) that the 2 girls can occupy, one girl per place, in 5 x 4 = 20 ways.

The required number of permutations (with the 2 girls not sitting next to each other) = 4! x
5 x 4 = 480
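The permutation counts of Example 16 can be checked with `math.perm` and `math.factorial` (Python 3.8+; a sketch):

```python
from math import perm, factorial

print(perm(10, 2))                    # (i)  10P2 = 90
print(factorial(3) * factorial(5))    # (ii) customer codes: 3! * 5! = 720
print(factorial(11) // (factorial(3) * factorial(2)))  # 2(a) POSSIBILITY: 3326400
print(factorial(5))                   # 2(b) DD together, and (c) 6 around a table: 120
print(factorial(5) * factorial(2))    # (d)(i) girls together: 240
print(factorial(4) * perm(5, 2))      # (d)(ii) girls apart: 4! * 5 * 4 = 480
```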

2.6.3 Combination of Objects


A Combination is a selection of objects in which the order of selection does not matter.

Definition:
The number of ways in which r objects can be selected from n distinct objects, irrespective
of their order, is defined by:

nCr = n! / (r! (n − r)!)
Example 17:
1.(a) (i) The number of ways a committee of 5 people can be chosen out of 9 is

9C5 = 9! / (5! 4!) = 126
(ii) The number of combinations of the letters a, b, c, d and e, taken three at a time,

is 5C3 = 10, which are listed below:
abc, abd, abe, acd, ace, ade, bcd, bce, bde, cde

(b) Find the number of ways in which a committee of 4 can be chosen from 6 boys and 5
girls if it must
(i) consist of 2 boys and 2 girls
(ii) consist of at least 1 boy and 1 girl.

Solution:
(i) The number of ways of choosing 2 boys from 6 and 2 girls from 5 is as follows:
6C2 × 5C2 = 15 × 10 = 150

(ii) For the committee to contain at least 1 boy and 1 girl will involve the following:
1 boy and 3 girls, or 2 boys and 2 girls, or 3 boys and 1 girl.
The required number of ways = 6C1 × 5C3 + 6C2 × 5C2 + 6C3 × 5C1 = 60 + 150 + 100 = 310

Example 18:
1. A box contains 6 red, 3 white and 5 blue balls. If three balls are drawn at random, one
after the other without replacement, find the probability that
(i) all are red
(ii) 2 are red and 1 is white
(iii) at least 1 is red
(iv) 1 of each colour

Solution:
The total number of ways of drawing 3 balls from the 14 is 14C3 = 364.

(i) P(all red) = 6C3 / 14C3 = 20/364 = 5/91

(ii) P(2 red and 1 white) = (6C2 × 3C1) / 14C3 = 45/364

(iii) P(at least 1 red) = 1 − P(no red) = 1 − 8C3/14C3 = 1 − 56/364 = 308/364 = 11/13

(iv) P(1 of each colour) = (6C1 × 3C1 × 5C1) / 14C3 = 90/364 = 45/182

2. A board consists of 12 men and 8 women. If a committee of 3 members is to be formed,


what is the probability that
(i) It includes at least one woman?
(ii) It includes more women than men?

Solution:
The number of ways of forming the committee of 3 from the twelve men and 8 women is 20C3 = 1140.

(i) The probability that it includes at least 1 woman
= 1 − P(no woman) = 1 − 12C3/20C3 = 1 − 220/1140 = 920/1140 = 46/57

(ii) The probability that it includes more women than men
= P(2 women, 1 man) + P(3 women) = (8C2 × 12C1 + 8C3)/20C3 = (336 + 56)/1140 = 392/1140 = 98/285
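These committee probabilities can be checked with `math.comb` and exact fractions (a sketch):

```python
from math import comb
from fractions import Fraction

total = comb(20, 3)                      # committees of 3 from 20 people
# (i) at least one woman = 1 - P(no woman)
p_at_least_one_woman = 1 - Fraction(comb(12, 3), total)
# (ii) more women than men: 2 women & 1 man, or 3 women
p_more_women = Fraction(comb(8, 2) * comb(12, 1) + comb(8, 3), total)

print(total, p_at_least_one_woman, p_more_women)   # 1140 46/57 98/285
```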

2.7 Supplementary Questions

Example 19:
If the probability of achieving monthly production targets at Goldfields Ghana Limited,
(A), and Ashanti (Obuasi), (B), are 0.8 and 0.9 respectively, what is P(A∩B)?

Solution:

But production in GGL and production at Ashanti are independent. Hence,

Thus, P(A∩B) = P(A) x P(B) =0.8 x 0.9 = 0.72.


Since the mines are independent of each other, their productions are assumed to
be independent of each other.
For dependent events, the first event is considered in determining the
probability of the second; the principle of conditional probability is required.
Thus, the probability of the joint events A and B is:
P(A∩B) = P(A) x P(B|A).

Example 20:
The Credit Manager at SSB collects data on 100 of her customers. Of the 60 men, 40 have
credit cards (C). Of the 40 women, 30 have credit cards (C). Ten of the men with credit
cards have balances (B), whilst 15 of the women have balances (B). The Credit Manager
wants to determine the probability that a customer selected at random is:
i) A woman with credit card
ii) A man with a balance.

Solution.
(i) P(W∩C) = P(W) x P(C|W) = (40/100) x (30/40) = 0.30

(ii) P(M∩B) = P(M) x P(B|M) = (60/100) x (10/60) = 0.10

OR, counting directly: 10 of the 100 customers are men with balances, so P(M∩B) = 10/100 = 0.10.

Example 21:
The probability that a mining company will make profit at an annual production rate of
5000t/yr is 0.7 if the gold price is $660/oz. If the gold price goes below $660/oz the
probability will fall to 0.40. The current world politics indicates that there is a 50%
probability that the dollar will be strong and gold price will fall below $660/oz. If:
A: Gold price falls below $660/oz
B: The mine is profitable.
a) What is the probability that both A and B occur?
b) What is the probability that either A or B will occur?
Solution:
Here P(A) = 0.5, P(B|A) = 0.40 and P(B|A/) = 0.70.
a) P(A∩B) = P(A) x P(B|A) = 0.5 x 0.40 = 0.20
b) P(B) = P(B|A)P(A) + P(B|A/)P(A/) = 0.40(0.5) + 0.70(0.5) = 0.55
P(A∪B) = P(A) + P(B) − P(A∩B) = 0.5 + 0.55 − 0.20 = 0.85
Example 22:

A coin is tossed twice. What is the probability that at least one head occurs?

Solution: The sample space for this experiment is: S = {HH, HT, TH, TT}

Each branch of the tree diagram carries probability ½, so each of the four outcomes
HH, HT, TH, TT has probability ½ x ½ = ¼.

A = {HH, HT, TH}

∴ P(A) = ¾

In general, if an experiment can result in any one of N different equally likely outcomes,
and if exactly n of these outcomes correspond to event A, then the probability of event A
is: P(A) = n/N

Example 23:

If a player picks 5 cards, find the probability of holding 2 aces and 3 jacks.

Solution:

Number of ways of being dealt 2 aces from 4 is: 4C2 = 6

Number of ways of being dealt 3 jacks from 4 is: 4C3 = 4

For each combination of 2 aces there are 4 combinations of 3 jacks. Thus there
are n = (6)(4) = 24 hands with 2 aces and 3 jacks. The total number of 5-card hands, all of which

are equally likely, is: 52C5 = 2 598 960

Hence P(2 aces and 3 jacks) = 24/2598960 ≈ 0.9 × 10⁻⁵
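A quick check of the card-hand count with `math.comb` (a sketch):

```python
from math import comb

favourable = comb(4, 2) * comb(4, 3)   # 2 aces and 3 jacks: 6 * 4 = 24
total = comb(52, 5)                    # all 5-card hands: 2598960
print(favourable, total, favourable / total)
```

The probability comes out at roughly 9 in a million, in line with the hand calculation.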

Example 24:

The probability of a certain lecturer arriving at lectures on time is P(A) = 0.82. The
probability of his departing on time is P(D) = 0.83. The probability that he arrives and departs on
time is P(A∩D) = 0.78. Find the probability that he will depart on time given that he arrives on time,
P(D|A).

Solution

P(D|A) = P(A∩D) / P(A) = 0.78/0.82 ≈ 0.95

The notion of conditional probability provides the capacity of re-evaluating a
probability when it is known that another event has occurred. The
probability P(A|B) is an “updating” of P(A) based on the knowledge that event B has
occurred.

CHAPTER 3
Learning Objectives

Having worked through this chapter the student will be able to:
● Discuss random variables
● Determine probabilities from probability density functions.
● Determine probabilities from cumulative distribution functions and cumulative
distribution functions from probability density functions, and the reverse.

3.0 Random Variables and Distribution

3.1 Introduction

Experiments are conducted with results that are subject to chance. Suppose that to each point
of a sample space we assign a number; we then have a function defined on the sample space.
This function is called a random variable (or stochastic variable). In other words, in a
particular experiment a random variable X would be some function that assigns a real
number X(s) to each possible outcome s ∈ S.

Definition: Let S be the sample space associated with some experiment E. A random
variable X is a function that assigns a real number X(s) to each sample point s ∈ S.

There are two basic types of random variables: the Discrete random variable and the
Continuous random variable.

3.2 Discrete Random Variable


A discrete random variable is a random variable whose possible values either constitute a
finite set or else can be listed in an infinite sequence in which there is a first element, a second
element, and so on (“countably” infinite).

3.2.1 Probability Distribution of Discrete Random Variable


For a random variable of the discrete type, it is necessary and sufficient to know the
following:
(a) the list of all possible values
(b) the corresponding probability of each of these values.

When such information is available we say that the distribution law or the probability
distribution or simply the distribution of the random variable is known.

Definition: A probability distribution of a discrete random variable X is the sequence of the

values x1, x2, . . . of X, together with the probability f(xi) = P(X = xi) assigned to each
value xi.

The probability distribution of a discrete random variable is more often called a probability
function or a probability mass function and is frequently denoted by f(x), where
f(x) = P(X = x).

Representation of Probability Distribution of the Discrete Random Variable


The probability distribution of a discrete random variable can be represented by a table, a
graph or a function.

Tabular form
It is convenient to specify the probability distribution of a random variable by means of a
table having two rows: the upper row contains the possible values the random variable
assumes and the lower row contains the corresponding probabilities of the values. This is
represented as in Table 1 below:

Table 1: Probability distribution of X

x        x1      x2      . . .    xn
f(x)     f(x1)   f(x2)   . . .    f(xn)

Graphical form
The probability distribution may also be given graphically. The graph represents chance and
not data.

Definition: A graph of f(x) against x is called a probability graph.


To obtain a probability graph, vertical lines or bars are drawn above the possible values of
the random variable on the horizontal axis. The height of each line or bar (rectangle) equals
the probabilities of the corresponding values of the random variables.

Figure 1: A graph of a probability distribution

Formula:
Representing a discrete probability by means of a formula is the most useful method.

For example (a simple illustrative case):

f(x) = x/6, x = 1, 2, 3,

defines a probability distribution of a random variable X, since f(1) + f(2) + f(3) = 1/6 + 2/6 + 3/6 = 1. Not all functions of a discrete

random variable qualify to be called probability mass functions. We shall now give a formal
definition of a probability mass function.

Definition: Probability mass function: Any function f(x) defined on all possible values x of
the discrete random variable X is called a probability mass function if
it satisfies the following properties:

(i) f(x) ≥ 0 for all x
(ii) Σ f(x) = 1,

where the summation is over all possible values of the random variable X.

A random variable in one experiment takes only one of its possible values with a
corresponding probability which is non-negative. Thus, in an experiment, one and only one
of the numerous possible mutually exclusive events will happen. Consequently, the sum of
the probabilities of these events equals one:

Σ f(xi) = 1
Example 25:
A fair coin is tossed three times. Let the random variable X represent the number of heads
which come up.
(a). Find the probability distribution corresponding to the random variable X.
(b). Construct a probability graph.

Solution:
(a). The sample space is
S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}
The probability of each outcome is 1/8, since all the outcomes are equally likely sample events.
With each sample point we can associate a number for the random variable X, as shown in
the table below:

Sample point       HHH   HHT   HTH   THH   HTT   THT   TTH   TTT
Number of Heads     3     2     2     2     1     1     1     0

The table above shows that the random variable X can take the values 0, 1, 2, 3. The next task
is to compute the probability distribution of X. Thus,

P(X = 0) = P(TTT) = 1/8
P(X = 1) = P(HTT) + P(THT) + P(TTH) = 3/8
P(X = 2) = P(HHT) + P(HTH) + P(THH) = 3/8
P(X = 3) = P(HHH) = 1/8

Thus, the probability distribution of X is tabulated as:

x       0     1     2     3
f(x)   1/8   3/8   3/8   1/8

(b). The diagram which follows graphically describes the above distribution.
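The distribution can also be obtained by direct enumeration of the sample space (a sketch):

```python
from itertools import product
from fractions import Fraction

# pmf of X = number of heads in three tosses of a fair coin
outcomes = list(product('HT', repeat=3))   # 8 equally likely sample points
pmf = {}
for w in outcomes:
    x = w.count('H')                       # value of X at this sample point
    pmf[x] = pmf.get(x, 0) + Fraction(1, len(outcomes))

print(dict(sorted(pmf.items())))
# {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
assert sum(pmf.values()) == 1              # total probability is one
```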

Example 26:
A shipment of 8 similar microcomputers to a retail outlet contains 3 that are defective. If a
school makes random purchase of 2 of these computers, find the probability distribution for
the number of defectives

Solution:
Let X be a random variable whose values x are the possible numbers of defective computers
purchased by the school. Then x can be any of the numbers 0, 1 and 2. Now

f(0) = P(X = 0) = (3C0 × 5C2) / 8C2 = 10/28 = 5/14
f(1) = P(X = 1) = (3C1 × 5C1) / 8C2 = 15/28
f(2) = P(X = 2) = (3C2 × 5C0) / 8C2 = 3/28

Thus, the probability distribution of X is:

x       0       1       2
f(x)   5/14   15/28   3/28

As a check, we can see whether the three probabilities we found will sum up to one because
as usual, the total probability associated with a number of mutually exclusive, exhaustive
events must be one.
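The hypergeometric probabilities of Example 26 can be verified in code (a sketch using exact fractions):

```python
from math import comb
from fractions import Fraction

# X = number of defectives among 2 computers chosen from 8 (3 defective, 5 good)
f = {x: Fraction(comb(3, x) * comb(5, 2 - x), comb(8, 2)) for x in range(3)}

print(f)                       # values reduce to 5/14, 15/28 and 3/28
assert sum(f.values()) == 1    # the three probabilities sum to one
```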

Note: A probability distribution is a display of all possible outcomes of an experiment along


with the probabilities of each outcome. In fact, it is a list of all possible outcomes of some
experiment and the probability associated with each outcome.

Example 27:
Verify that the following probability distribution functions are probability mass functions.

(i).

(ii).
Solution:
(i). For the probability distribution function to be a probability mass function
the following must be satisfied:
f(x) ≥ 0 for all x and Σ f(x) = 1

(ii). The value of the constant is determined by assuming that f(x) is a probability
function. Thus,

3.3 Continuous Random Variable

If X is a continuous random variable, the probability that X takes on any one particular
value is generally zero. Therefore, we cannot define a continuous random variable in the
same way as for a discrete random variable. In order to arrive at a probability distribution for
a continuous random variable we note that the probability that X lies between two different
values is meaningful. Thus, a continuous random variable is the type whose space is not
composed of a countable number of points but takes on values in some interval or a union of
intervals of the real line.

Definition: If the set of all possible values of a random variable X takes on an uncountably
infinite number of values or values in some interval or a union of intervals of the real line, it
is called a continuous random variable if there exists a function f(x), called the probability density
function of X, such that
(i) f(x) ≥ 0 (Non-negative)

(ii) ∫ f(x) dx = 1, the integral taken over the whole real line

(iii) P(a < X < b) = ∫ f(x) dx, the integral taken from a to b

3.3.1 Probability Distribution of Continuous Random Variable

Continuous random variable can be considered as one that takes on values in some interval
of the real line. The probability distribution for a continuous random variable is more often
called a probability density function (p.d.f) or simply density function and is denoted by
.

Definition: The probability density function (p.d.f.) f(x) of a random variable X of the
continuous type, with space R that is an interval or union of intervals, defined on the set of
real numbers, is an integrable function satisfying the following conditions:
(i) f(x) ≥ 0 (Non-negative)

(ii) ∫ f(x) dx = 1, the integral taken over R

(iii) The probability of the event X ∈ A is

P(X ∈ A) = ∫ f(x) dx, the integral taken over A

A typical probability function of the continuous type is as shown on the sketch below

fig. 2.1 An example of a Probability Density Function (the shaded region lies between a and b)

In this figure, the total area under the curve must be equal to 1 and the value of
P(a < X < b) is equal to the area of the shaded region.

Example 28:
Suppose that the error in the reaction temperature, in °C, for a controlled laboratory
experiment is a continuous random variable X having the probability density function:

Verify (a). (b).

Solution:

(a).

(b).

Example 29:
(a). Find the constant C such that the function below is a probability density function:

(b). Compute

Solution:

(a).

But since

(b).

Example 30:
A machine produced copper wire, and occasionally there is a flaw at some point along
the wire. The length of wire (in meters) produced between successive flaws is a
continuous random variable X with p.d.f of the form

f(x) = c(1 + x)^(-3), x > 0, where c is a constant

Solution:
For f(x) to be a p.d.f., the integral of c(1 + x)^(-3) from 0 to ∞ must equal 1.
Let u = (1 + x) and du = dx and apply the power rule for integrals:

c ∫ u^(-3) du (from 1 to ∞) = c [−u^(-2)/2] (from 1 to ∞) = c/2 = 1

c = 2

Example 31:
For each of the following functions, find the constant c so that f(x) is a p.d.f of a random
variable X.
(i) f(x) = 4x^c, 0 ≤ x ≤ 1

(ii) , 0 ≤ x ≤ 4

(iii) , 0<x<1

Solution:
(i) ∫₀¹ 4x^c dx = [4x^(c+1)/(c + 1)]₀¹ = 4/(c + 1) = 1
c + 1 = 4
c = 3

(ii)

(iii)

Post-Test
1. Define the following terms:
i. discrete random variable
ii. continuous random variable

2. Let , zero elsewhere be the p.d.f. of X. Find;


i. Pr [1 or 2]
ii. Pr [1< X < 3]
iii. Pr [ ]
3. For each of the following functions, find the constant c so that satisfies the
conditions of being a p.d.f. of a random variable X.
(i) ,

(ii) ,

3.4 Cumulative Distribution Function


The cumulative distribution function (c.d.f.) or simply the distribution function is the most
universal characteristic of a random variable. It exists for all random variables whether they
are discrete or continuous.
Definition: Let X be a random variable and x any real number. The cumulative distribution function of X is a function F(x) defined as the probability that the random variable X takes a value less than or equal to x; that is,

F(x) = P(X ≤ x), −∞ < x < ∞

The function in the above definition may also be written as F_X(x) = P(X ≤ x).
3.4.1 Cumulative distribution function of Discrete Random Variables


The statement of the theorem below for a distribution function of a discrete random variable
follows trivially from the definition of cumulative distribution function.

Theorem (Distribution function of a discrete random variable): Let X be a discrete random variable with probability mass function f(x); then the cumulative distribution function is defined by

F(x) = P(X ≤ x) = Σ_{t ≤ x} f(t)
For the discrete case, the random variable takes on only a finite number of values x₁ < x₂ < … < xₙ. In this case, the distribution function is given by
F(x) = 0 for x < x₁; F(x) = f(x₁) + … + f(xᵢ) for xᵢ ≤ x < xᵢ₊₁; and F(x) = 1 for x ≥ xₙ.
The figure below depicts the graph, which is discontinuous at the possible values x₁, x₂, …, xₙ. At these points F(x) is continuous from the right but discontinuous from the left. Because of the appearance of its graph, the cumulative distribution function for a discrete random variable is also called a staircase function or a step function, having jump discontinuities at the possible values, with a step at xᵢ of height f(xᵢ). The graph increases only through these jumps at x₁, x₂, …, xₙ. Everywhere else between the possible values, the cumulative distribution function is constant.

Figure 2 Cumulative Distribution Function (Discrete Case)

Example 32:
A fair coin is tossed three times. Let represent the number of heads which come up
(a). find the cumulative distribution function and
(b). Sketch the graph.

Solution:
(a). To obtain the cumulative distribution function, we need the following steps:
Step 1:
Find the probability distribution of the random variable . The probability distribution of
this example has been found in an earlier example and produces the results here for
convenience.

x 0 1 2 3
f(x) 1/8 3/8 3/8 1/8

Step 2:
Find the cumulative distribution function:
F(0) = f(0) = 1/8; F(1) = f(0) + f(1) = 1/2; F(2) = 7/8; F(3) = 1

Hence the cumulative distribution function is
F(x) = 0 for x < 0; 1/8 for 0 ≤ x < 1; 1/2 for 1 ≤ x < 2; 7/8 for 2 ≤ x < 3; 1 for x ≥ 3
(b). The graph of F(x) is shown in the figure below:

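The two steps can be sketched in code: the p.m.f. of the number of heads in three fair tosses is binomial with n = 3 and p = ½, and the c.d.f. accumulates its jumps:

```python
from fractions import Fraction
from math import comb

# p.m.f. and step-function c.d.f. for X = number of heads in 3 fair-coin tosses
pmf = {x: Fraction(comb(3, x), 8) for x in range(4)}

def cdf(x):
    """F(x) = P(X <= x): accumulate the jumps at 0, 1, 2, 3."""
    return sum(p for k, p in pmf.items() if k <= x)

print([str(cdf(x)) for x in [-1, 0, 1, 2, 3]])
# → ['0', '1/8', '1/2', '7/8', '1']
```

The printed values trace the staircase: 0 below the smallest possible value, then 1/8, 1/2, 7/8 and 1 at each jump.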
Finding the probability distribution from the cumulative distribution function is a
straightforward situation. If is the cumulative distribution function of a discrete random
variable , we then find the points at which the cumulative distribution function jumps, and
the jump sizes. The probability function has masses exactly at those jump points, with the
probability masses being equal in magnitude to the respective jump sizes. It is for this reason
that it is called probability mass function.

Example 33:
Suppose you are given the cumulative distribution function below:

Find its probability distribution.

Solution:
It would be noted from the graph of the cumulative distribution function that the magnitudes or heights of the jumps (steps) at 0, 1, 2, 3 are 1/8, 3/8, 3/8 and 1/8 respectively; hence the probability mass function places exactly these masses at those points.
Note:
We can obtain this result without the graph by finding the differences between adjacent values of F(x).

3.4.2 Cumulative distribution function of Continuous Random Variables


Again, the statement of the theorem below for a distribution function of a continuous random
variable follows trivially from the definition of cumulative distribution function.

Theorem (Distribution function of a continuous random variable): Let X be a continuous random variable with probability density function f(x). Then the cumulative distribution function is defined by

F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(t) dt
The typical graph of the cumulative distribution function of the continuous random variable
is as shown below.

Figure 2 Cumulative Distribution Function (Continuous Case)

The graph of F(x) is continuous. Its slope need not be everywhere continuous, but where it is, it is equal to the probability density function. Thus, where the derivative F′(x) exists, F′(x) = f(x).

Example 34:
The probability density function of a continuous random variable is given by

Find the cumulative distribution function and sketch its graph.

Solution:
If , then

If , then

If , then

Figure 2 Cumulative Distribution Function (Continuous Case)

CHAPTER 4
Learning Objectives

Having worked through this chapter the student will be able to:
• Understand the assumptions for each of the discrete and continuous probability
distributions presented.
• Select an appropriate discrete and continuous probability distribution to calculate
probabilities in specific applications.
• Calculate probabilities, determine means and variances for each of the discrete
and continuous probability distributions presented.

4.0 Special Probability Distributions

4.1 Introduction
In general, it is useful to present the probability distribution of a random variable by a model.
Probability calculations can then be made convenient by substituting appropriate values into
the algebraic model.

In this section, we shall briefly define certain special distributions that are widely used in
applications of probability and statistics. Distributions that would be considered include both
discrete and continuous univariate distributions.

4.2 Discrete Probability Distribution

4.2.1 Bernoulli Distribution:


A single trial of an experiment may result in one of two mutually exclusive outcomes, such as defective and non-defective, dead or alive, yes or no, male or female, etc. Such a trial is called a Bernoulli trial, and a sequence of these trials forms a Bernoulli process, satisfying the following conditions:
(i). Each trial results in one of the two mutually exclusive outcomes, success and failure.
(ii). The probability of a success, p, remains constant from trial to trial. The probability of failure is denoted by q = 1 − p.
(iii). The trials are independent. That is, the outcome of any particular trial is not affected by the outcome of any other trial.

Definition:
A random variable X is said to have a Bernoulli distribution if it assumes the values 0 and 1 for the two outcomes. The probability distribution for a success in a trial is defined by

f(x) = pˣ(1 − p)¹⁻ˣ, x = 0, 1

where the mean and variance of the distribution are as follows:
μ = p and σ² = p(1 − p) = pq
An important distribution arising from counting the number of successes in a fixed


number of independent Bernoulli trials is the Binomial distribution.

Example 35: An urn contains 5 red and 15 green balls. Draw one ball at random from
the urn. Let X=1 if the ball drawn is red, and X=0 if a green ball is drawn. Obtain
(i) the p.d.f. of X,
(ii) mean of X and
(iii) variance of X.

Solution:
The p.d.f. of a Bernoulli distribution is f(x) = pˣ(1 − p)¹⁻ˣ, x = 0, 1,
where p = P(red) = 5/20 = 1/4 and 1 − p = 3/4.

(i) f(x) = (1/4)ˣ(3/4)¹⁻ˣ, x = 0, 1

(ii) Mean of X = p = 1/4

(iii) Variance of X = p(1 − p) = (1/4)(3/4) = 3/16

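A minimal sketch of the computation, taking p = 5/20 = 1/4 from the counts given in the example:

```python
# Bernoulli model for the urn draw: X = 1 if the ball drawn is red
p = 5 / 20
pmf = {0: 1 - p, 1: p}

mean = sum(x * pr for x, pr in pmf.items())               # E(X) = p
var = sum((x - mean) ** 2 * pr for x, pr in pmf.items())  # Var(X) = pq
print(mean, var)
# → 0.25 0.1875
```

Both values agree with the closed forms μ = p = 1/4 and σ² = pq = 3/16.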
4.2.2 The Binomial Distribution


The binomial distribution is a discrete probability distribution, where the experiment is
repeated n times under identical conditions and each of the n trials is independent of each
other which results in one of the two outcomes.

Thus, in the case of n independent trials (often called Bernoulli trials), let p be the probability that an event will happen (success) and q = 1 − p the probability that the event will fail in any single trial. Such experiments are called binomial experiments, and the probability that the event will happen exactly x times in n trials is given by the probability function:

b(x; n, p) = C(n, x) pˣ qⁿ⁻ˣ, x = 0, 1, 2, …, n

where the random variable X denotes the number of successes in n trials and C(n, x) = n!/[x!(n − x)!].

The shape of the distribution depends on the two parameters n and p.

(i) when p < 0.5 and n is small, the distribution will be skewed to the right.
(ii) when p > 0.5 and n is small, the distribution will be skewed to the left.
(iii) when p = 0.5 the distribution will be symmetric.
(iv) In all cases, as n gets larger the distribution gets closer to being a symmetric, bell-shaped distribution.

Properties
Mean μ = np
Variance σ² = npq

Standard Deviation σ = √(npq)

Example 36:
If 20% of the bolts produced by a machine are bad. Determine the probability that out of 4
bolts chosen at random.
(i) one is defective
(ii) none is defective
(iii) at most 2 bolts will be defective.

Solution:
n = 4, p = 0.2, q = 0.8

(i) P(X = 1) = C(4, 1)(0.2)¹(0.8)³ = 0.4096
(ii) P(X = 0) = (0.8)⁴ = 0.4096
(iii) P[X ≤ 2] = P[X = 0] + P[X = 1] + P[X = 2]
= 0.4096 + 0.4096 + 0.1536 = 0.9728
or P[X ≤ 2] = 1 − P[X ≥ 3] = 1 − P(X = 3) − P(X = 4)
= 1 − 0.0256 − 0.0016
= 0.9728
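The three probabilities can be checked directly from the binomial formula:

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 4, 0.2
p_one = binom_pmf(1, n, p)                            # exactly one defective
p_none = binom_pmf(0, n, p)                           # none defective
p_at_most_2 = sum(binom_pmf(x, n, p) for x in range(3))  # at most two defective
print(round(p_one, 4), round(p_none, 4), round(p_at_most_2, 4))
# → 0.4096 0.4096 0.9728
```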
Example 37:
(a) Suppose that it is known that 30% of a certain population is immune to some
disease. If a random sample of 10 is selected from this population. What is the
probability that it will contain exactly 4 immune persons?
n = 10, p = 0.3, x = 4
P(X = 4) = C(10, 4)(0.3)⁴(0.7)⁶ = 0.2001 ≈ 0.2
(b) In a certain population 10% of the members are colour-blind. If a random sample of 25 people is drawn from this population, find (using tables):
i. P(X ≥ 5) = 1 − P(X ≤ 4) = 1 − 0.9020 = 0.0980
ii. P(X ≤ 4) = 0.9020, or 1 − P(X ≥ 5) = 1 − 0.0980 = 0.9020
iii. P(6 ≤ X ≤ 10) = p(6) + p(7) + … + p(10) = P(X ≤ 10) − P(X ≤ 5) = 0.0333
(≈ 0.0334 = P(X ≥ 6), since P(X > 10) is negligible)

Example 38:
From the experiment “toss four coins and count the number of tails” what is the variance
of X?
n = 4, p = ½, q = ½
V(X) = npq = 4 x ½ x ½ = 1

Example 39: Roll a fair 6-sided die 20 times and count the number of times a 6 shows up. What is the standard deviation of your random variable?

n = 20, p = 1/6, q = 5/6
V(X) = npq = 20 × (1/6) × (5/6) = 25/9 ≈ 2.78

σ = √(25/9) = 5/3 ≈ 1.67

Example 40:
The following data are the number of seeds germinating out of 10 on damp filter paper for 80 sets
of seeds. Fit a binomial distribution to these data.
x 0 1 2 3 4 5 6 7 8 9 10 Total
f 6 20 28 12 8 6 0 0 0 0 0 80

Solution:
Here n = 10, N = 80 and ∑fᵢ = 80

Arithmetic mean = ∑fᵢxᵢ/∑fᵢ = 174/80 = 2.175

np = 2.175, so p = 2.175/10 = 0.2175
Hence the binomial distribution to be fitted is b(x; 10, 0.2175). The expected frequencies 80·b(x; 10, 0.2175) are, approximately,
x 0 1 2 3 4 5 6 7 8 9 10 Total
f 6.89 19.14 23.94 17.74 8.63 2.88 0.67 0.10 0.01 0.00 0.00 80
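The fitting procedure (estimate p by matching np to the sample mean, then compute expected frequencies N·b(x; 10, p̂)) can be sketched as:

```python
from math import comb

# Fit a binomial b(x; 10, p) to the germination data by matching the mean
x_vals = list(range(11))
freq = [6, 20, 28, 12, 8, 6, 0, 0, 0, 0, 0]
N = sum(freq)                                    # 80 sets of seeds
mean = sum(x * f for x, f in zip(x_vals, freq)) / N
p_hat = mean / 10                                # np = mean, with n = 10

expected = [N * comb(10, x) * p_hat**x * (1 - p_hat)**(10 - x) for x in x_vals]
print(round(p_hat, 4), [round(e, 2) for e in expected[:4]])
```

The estimate is p̂ = 0.2175 and the first few expected frequencies (6.89, 19.14, 23.94, 17.74) match the fitted table above.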

4.2.3 Negative Binomial Distribution (Pascal’s Distribution)


Let us consider an experiment in which the properties are the same as those listed for a binomial experiment, with the exception that the trials are repeated until a fixed number of successes occurs. Therefore, instead of finding the probability of x successes in n trials, where n is fixed, we are now interested in the probability that the kth success occurs on the xth trial. Experiments of this kind are called 'negative binomial experiments' (Walpole and Myers, 1993). The number X of trials needed to produce k successes in a negative binomial experiment is called a 'negative binomial random variable', and its probability distribution is called the 'negative binomial distribution'. Since its probabilities depend on the number of successes desired and the probability of success on a given trial, we shall denote them by the symbol b*(x; k, p).

To derive the general formula b*(x; k, p), consider the probability of a success on the xth trial preceded by k − 1 successes and x − k failures in some specified order. The probability for a specified order ending in success is p^(k−1) q^(x−k) p = p^k q^(x−k). The total number of sample points ending in a success, after the occurrence of k − 1 successes and x − k failures in any order, is equal to the number of partitions of the first x − 1 trials into two groups, with k − 1 successes corresponding to one group and x − k failures corresponding to the other. This number is given by the binomial coefficient C(x − 1, k − 1), each such point being mutually exclusive and occurring with equal probability p^k q^(x−k). We obtain the general formula by multiplying p^k q^(x−k) by C(x − 1, k − 1). In other words:

b*(x; k, p) = C(x − 1, k − 1) p^k q^(x−k), x = k, k + 1, k + 2, …

where
p = probability of success
q = (1 − p) = probability of failure
x = total number of trials on which the kth success occurs.

Areas of application of negative binomial distribution include many biological situations such
as death of insects, number of insect bites per fruit (e.g. mango).

Example 41:
Consider an exploration company that is determined to discover two new fields in a virgin basin
it is prospecting, and will drill as many holes as required to achieve its goal. We can investigate
the probability that it will require 2, 3, 4,…, n exploratory holes before two discoveries are made.

The same conditions that govern the binomial distribution may be assumed, except that the
number of trials is not fixed.

Solution:
Here x − k dry holes will be drilled before the k discoveries are made.

If the regional success ratio is assumed to be 10% (p = 0.1), then the probability that a two-hole program will meet the company's goal of two discoveries is:
b*(2; 2, 0.1) = C(1, 1)(0.1)²(0.9)⁰ = 0.01

The probability that five holes will be required to achieve two successes is:
b*(5; 2, 0.1) = C(4, 1)(0.1)²(0.9)³ = 4(0.01)(0.729) = 0.029

Example 42:
Find the probability that a person tossing three coins will get either all heads or all tails for the
second time in the fifth toss?
Solution:
In a toss of three coins, P(all heads or all tails) = 2/8 = 1/4; so p = 1/4, q = 3/4, k = 2 and x = 5:
b*(5; 2, 1/4) = C(4, 1)(1/4)²(3/4)³ = 27/256 ≈ 0.1055
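A short sketch of the negative binomial formula applied to both examples above:

```python
from math import comb

def neg_binom_pmf(x, k, p):
    """P(kth success occurs on trial x): C(x-1, k-1) p^k (1-p)^(x-k)."""
    return comb(x - 1, k - 1) * p**k * (1 - p)**(x - k)

# Exploration example: success ratio p = 0.1, two discoveries (k = 2)
print(round(neg_binom_pmf(2, 2, 0.1), 4))   # both of the first two holes hit
print(round(neg_binom_pmf(5, 2, 0.1), 4))   # second discovery on the fifth hole

# Coin example: p = P(all heads or all tails) = 1/4, second occurrence on toss 5
print(round(neg_binom_pmf(5, 2, 0.25), 4))
```

The three printed values are 0.01, 0.0292 and 0.1055, matching the worked results.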
The negative binomial distribution derives its name from the fact that each term in the binomial expansion of p^k(1 − q)^(−k) corresponds to the values of b*(x; k, p) for x = k, k + 1, k + 2, …

4.2.4 Geometric Distribution


The geometric distribution is a special case of the negative binomial distribution for which k = 1. It is the probability distribution of the number of trials required for a single success.
Thus:
b*(x; 1, p) = pq^(x−1)
The geometric distribution is denoted by:
g(x; p) = pq^(x−1), x = 1, 2, 3, …

Example 43:
In a certain theodolite manufacturing process, it is known that on the average, 1 in every 100 is
defective. What is the probability that the fifth item inspected is the first defective theodolite
found?
Solution
Using the geometric distribution with x = 5 and p = 0.01, we have:
g(5; 0.01) = (0.01)(0.99)⁴ = 0.0096
For the geometric distribution:
μ = 1/p and σ² = q/p²
Example 44:
Consider the case in Example 4.4, what is the probability that the first discovery will be the fifth
exploration hole?

Solution:
x = 5, p = 0.1
g(5; 0.1) = (0.1)(0.9)⁴ = 0.0656

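Both geometric calculations can be reproduced directly from g(x; p) = p(1 − p)^(x−1):

```python
def geometric_pmf(x, p):
    """P(first success occurs on trial x) = p * (1 - p)**(x - 1)."""
    return p * (1 - p)**(x - 1)

print(round(geometric_pmf(5, 0.01), 4))  # first defective theodolite on the 5th item
print(round(geometric_pmf(5, 0.10), 4))  # first discovery on the 5th exploration hole
# → 0.0096 and 0.0656
```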
4.2.5 Poisson Distribution

Introduction:
Experiments yielding numerical values of a random variable (x), the number of successes
occuring during a given time interval or in a specified region, are often called poisson
experiments. The given time interval may be of any length, such as a minute, a day, a week,
a month or even a year. Hence, a poisson experiment might generate observations for the

random variable representing the number of telephone calls per hour received by an office,
the number of days school is closed due to snow during the winter, or the number of
postponed games due to rain during a basketball season. The specified region could be a line
segment, an area, a volume or perhaps a piece of material. In this case, X might represent the number
of field mice per acre, the number of bacteria in a given culture, or the number of typing errors
per page.

The Poisson process:


A Poisson experiment is derived from the Poisson process and possesses the following properties:
● The numbers of successes occurring in one time interval or specified region are independent of those occurring in any other disjoint time interval or region of space.
● The probability of a single success occurring during a very short time interval or in a small region is proportional to the length of the time interval or the size of the region, and does not depend on the number of successes occurring outside this time interval or region.
● The probability of more than one success occurring in such a short time interval or falling in such a small region is negligible.

The probability distribution of the Poisson random variable X is called the Poisson distribution and is denoted by p(x; λ), since its values depend only on λ, the average number of successes occurring in the given time interval or specified region. The formula is given by the definition below:

Definition: The probability distribution of the Poisson random variable X, representing the number of successes occurring in a given time interval or specified region, is given by:

p(x; λ) = e^(−λ) λˣ/x!, x = 0, 1, 2, …

where λ is the average number of successes occurring in the given time interval or specified region and e = 2.71828…

Theorem: The mean and variance of the Poisson distribution both have the value λ.
Example 45:
Suppose that an urn contains 100,000 marbles and 120 are red. If a random sample of
1000 is drawn what are the probabilities that 0, 1, 2, 3, and 4 respectively will be red.

n = 1000, p = 120/100000 = 0.0012, q = 0.9988

Solution:
Binomial: b(x; 1000, 0.0012) = C(1000, x)(0.0012)ˣ(0.9988)¹⁰⁰⁰⁻ˣ

For x = 3,
b(3; 1000, 0.0012) = 166 167 000 × 1.728 × 10⁻⁹ × 0.30206 ≈ 0.0867
Using the Poisson method,
λ = np = 1000 × 0.0012 = 1.2
e⁻¹·² = 0.3012
P(X = x) = e⁻¹·²(1.2)ˣ/x!, giving P(X = 0) = 0.3012, P(X = 1) = 0.3614, P(X = 2) = 0.2169, P(X = 3) = 0.0867 and P(X = 4) = 0.0260

P(X > 5) = 1 − P(X ≤ 5)
= 1 − 0.9985 = 0.0015

Example 46:
Let X have a Poisson distribution with a mean of λ = 5. Find

i. P(X<6)
ii. P(X>5)
iii. P(X=6)
iv. P(X>4)

Solution:

i. P(X < 6) = P(X ≤ 5) = 0.616
ii. P(X > 5) = 1 − P(X ≤ 5) = 1 − 0.616 = 0.384
iii. P(X = 6) = P(X ≤ 6) − P(X ≤ 5) = 0.762 − 0.616 = 0.146
iv. P(X > 4) = 1 − P(X ≤ 4) = 1 − 0.440 = 0.560

Example 47:
A hospital administrator, who has been studying daily emergency admissions over a
period of several years, has come to the conclusion that they are distributed according
to the Poisson law. Hospital records reveal that emergency admissions have averaged
three per day during this period. If the administrator is correct in assuming a Poisson
distribution. Find the probability that
i. exactly two emergency admissions will occur on a given day.
ii. No emergency admissions will occur on a particular day.
iii. Either 3 or 4 emergency cases will be admitted on a particular day.

Solution:

i. λ = 3
P(X = 2) = e⁻³(3)²/2! = (0.0498)(4.5) = 0.224

ii. P(X = 0) = e⁻³ = 0.0498

iii. P(X = 3) + P(X = 4) = e⁻³(3³/3!) + e⁻³(3⁴/4!)
= e⁻³(4.5 + 3.375)
= 0.05 (7.875)
= 0.394
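The three admission probabilities follow directly from the Poisson formula with λ = 3:

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**x / factorial(x)

lam = 3.0
print(round(poisson_pmf(2, lam), 4))                        # exactly two admissions
print(round(poisson_pmf(0, lam), 4))                        # no admissions
print(round(poisson_pmf(3, lam) + poisson_pmf(4, lam), 4))  # three or four admissions
```

The printed values are 0.224, 0.0498 and 0.3921 (the worked solution's 0.394 comes from rounding e⁻³ to 0.05).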

Example 48:
Fit a Poisson distribution to the following data which gives the number of yeast cells per square
for 400 squares.
No. of cells per square (x) 0 1 2 3 4 5 6 7 8 9 10 Total
No. of squares (f) 103 143 98 42 8 4 2 0 0 0 0 400

Solution:
The expected theoretical frequency for r successes is Ne⁻ᵐmʳ/r!, but m is not given in this example. The mean of the Poisson distribution is m. Hence
m = ∑fx/∑f = 529/400 = 1.3225
thus, the expected frequencies are 400e⁻¹·³²²⁵(1.3225)ʳ/r!, r = 0, 1, 2, …; therefore,
No. of cells per square (x) 0 1 2 3 4 5 6 7 8 9 10 Total
No. of squares (f) 107 141 93 41 14 4 0 0 0 0 0 400
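The fit above (estimate m by the sample mean, then compute Ne⁻ᵐmʳ/r!) can be sketched as:

```python
from math import exp, factorial

# Fit a Poisson distribution to the yeast-cell counts by matching the sample mean
x_vals = list(range(11))
freq = [103, 143, 98, 42, 8, 4, 2, 0, 0, 0, 0]
N = sum(freq)                                     # 400 squares
m = sum(x * f for x, f in zip(x_vals, freq)) / N  # estimate of the Poisson mean

expected = [N * exp(-m) * m**x / factorial(x) for x in x_vals]
print(m, [round(e) for e in expected[:5]])
# → 1.3225 [107, 141, 93, 41, 14]
```

The rounded expected frequencies reproduce the fitted table above.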

4.2.6 Relationship between the Binomial and Poisson distributions


In the binomial distribution, if n is large while the probability p of occurrence of an event is close
to zero, so that q = 1 – p is close to one, the event is called a ‘rare event’. In practice, we shall
consider an event as rare if the number of trials is at least 50 (n > 50) while np is less than 5. For
such cases the Poisson distribution very closely approximates the binomial distribution

with λ = np (Spiegel, 1980). Thus the Poisson distribution is a limiting case of the binomial distribution when n → ∞, p → 0 and np = μ remains constant (Davis, 1986).

Example 49:
In a manufacturing process in which glass is being produced, defects or bubbles occur,
occasionally rendering the pieces undesirable for marketing. If it is known that on the average 1
in every 1000 of these items produced have one or more bubbles. What is the probability that a
random sample of 8000 will yield fewer than 7 items possessing bubbles?

Solution:
This is essentially a binomial experiment with n = 8000 and p = 0.001. Since p is very close to zero and n is quite large, we shall approximate with the Poisson distribution, using λ = (8000)(0.001) = 8. Hence, if X represents the number of bubbles, we have:

P(X < 7) = Σ_{x=0}^{6} b(x; 8000, 0.001) ≈ Σ_{x=0}^{6} p(x; 8) = 0.3134
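The quality of the approximation can be checked by computing both the exact binomial sum and its Poisson counterpart:

```python
from math import comb, exp, factorial

# Compare the exact binomial P(X < 7) with its Poisson approximation
n, p = 8000, 0.001
lam = n * p                                      # λ = 8

binom = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(7))
poisson = sum(exp(-lam) * lam**x / factorial(x) for x in range(7))
print(round(binom, 4), round(poisson, 4))
```

Both sums agree to about three decimal places, illustrating why the Poisson limit is a safe substitute here.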
Post-Test
1. Suppose that 24% of a certain population have blood group B, for a sample of size
20 drawn from this population, find the probability that
a) Exactly 3 persons with blood group B will be found.
b) Three or more persons with the characteristic of interest will be found.
c) Fewer than three will be found.
d) Exactly five will be found.

2. In a large population, 16% of the members are left-handed. In a random sample


of size 10, find
a) The probability that exactly 2 will be left-handed p(X = 2)
b) P(X ≥ 2)
c) P(X < 2)
d) P(1 ≤ X ≤ 4)

3. Suppose mortality rate of a certain disease is 0.1, suppose 10 people in a


community contract the disease, what is the probability that
a) None will survive

b) Exactly 50% will survive
c) At least 3 will die
d) Exactly 3 will die
4. Suppose it is known that the probability of recovery from a certain disease is 0.4.
If 15 people are stricken with the disease what is the probability that
a) or more will recover?
b) 4 or more will recover?
c) at least 5 will recover?
d) fewer than three recover?
5. In the study of a certain aquatic organism, a large number of samples were taken
from a pond, and the number of organisms in each sample was counted. The
average number of organisms per sample was found to be two. Assuming the
number of organisms to be Poisson distributed. Find the probability that:
a) The next sample taken will contain one or more organisms.
b) The next sample taken will contain exactly three organisms.
c) The next sample taken will contain fewer than five organisms.
6. It has been observed that the number of particles emitted by a radioactive
substance, which reach a given portion of space during time t, follows closely the
Poisson distribution with parameter =100. Calculate the probability that:
a) No particles reach the portion of space under consideration during time t;
b) Exactly 120 particles do so;
c) At least 50 particles do so.
7. The phone calls arriving at a given telephone exchange within one minute follow
the Poisson distribution with parameter value equal to ten. What is the
probability that in a given minute:
a) No calls arrive?
b) Exactly 10 calls arrive?
c) At least 10 calls arrive

4.3 Continuous Probability Distribution

4.3.1 The Normal Distribution:


The graph of the normal distribution which is a bell-shaped smooth curve approximately
describes many phenomena that occur in nature, industry and research. In addition, errors in
scientific measurements are extremely well approximated by a normal distribution. Thus, the
normal distribution is one of the most widely used probability distributions for modelling
random experiments. It provides a good model for continuous random variables involving
measurements such as time, heights/weights of persons, marks scored in an examination,
amount of rainfall, growth rate and many other scientific measurements.

Definition:
The probability density function for the normal random variable X, which is simply called the normal distribution, is defined by:

f(x) = (1/(σ√(2π))) e^(−(x−μ)²/(2σ²)), −∞ < x < ∞

where π = 3.14159… and e = 2.71828…, and the mean and variance of the measurements are μ and σ².
If the random variable X is modelled by the normal distribution with mean μ and variance σ², then it is simply denoted as X ~ N(μ, σ²).
This fact considerably simplifies the calculations of probabilities concerning normally distributed variables, as seen in the following illustration:
Suppose that X is N(μ, σ²); let c₁ < c₂, and since Z = (X − μ)/σ is N(0, 1), then

P(c₁ < X < c₂) = P((c₁ − μ)/σ < Z < (c₂ − μ)/σ) = Φ((c₂ − μ)/σ) − Φ((c₁ − μ)/σ)

Note that P(c₁ ≤ X ≤ c₂) = P(c₁ < X < c₂), since P(X = c) = 0 for a continuous random variable.
The normal distribution possesses the following properties.
Properties
Mean μ
Variance σ²
Standard Deviation σ

Example 50:
1. If Z is , find;
i.
ii.
2. If X is , find
i.
ii.

3. If X is normally distributed with a mean of 6 and a variance 25, find

Solution:
1. (i) =

(ii)

(2)

4.3.2 Normal approximation to the Binomial


Whenever the number of trials in a binomial experiment is small, it is easy to find the probabilities associated with the experiment; as n becomes large, however, the calculations become laborious. The normal distribution is often a good approximation to a discrete distribution when the latter takes on a symmetric bell shape.

If X is a binomial random variable with mean μ = np and variance σ² = npq, then the limiting form of the distribution of:

Z = (X − np)/√(npq), as n → ∞,

is the standard normal distribution n(z; 0, 1).

It turns out that the normal distribution with μ = np and σ² = npq = np(1 − p) not only provides a very accurate approximation to the binomial distribution when n is large and p is not extremely close to 0 or 1, but also provides a fairly good approximation even when n is small and p is reasonably close to ½.

In fact, X is asymptotically normal.

Example 51:
Consider a binomial distribution with p.d.f. of b(x; 15, 0.4). Calculate the probability that x
assumes values from 7 to 9 inclusive.

Solution:
The exact probability is given by:

P(7 ≤ X ≤ 9) = Σ_{x=7}^{9} b(x; 15, 0.4) = P(X ≤ 9) − P(X ≤ 6)
= 0.9662 − 0.6098 (from tables)
= 0.3564

Using the normal approximation, with the corresponding values under a histogram:

μ = np = 15 × 0.4 = 6

σ² = npq = 15 × 0.4 × 0.6 = 3.6

∴ σ = √3.6 = 1.897

z₁ = (6.5 − 6)/1.897 = 0.26 and z₂ = (9.5 − 6)/1.897 = 1.85

(The 0.5 used here is called the 'Continuity Correction Factor'.)

Now,
P(7 ≤ X ≤ 9) ≈ P(0.26 < Z < 1.85) = 0.9678 − 0.6026 = 0.3652
Note:
The 0.5 that was added to or subtracted from the 9 and 7 respectively is called “continuity
correction factor”. It corrects for the fact that we are using a continuous distribution to
approximate a discrete distribution. Generally, the degree of accuracy, which depends on how
well the curve fits the histogram, will increase as n increases. If both np and nq are greater than
or equal to 5, the approximation will be good.
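The whole comparison (exact binomial sum versus normal approximation with continuity correction) can be reproduced using the error function, since Φ(z) = ½(1 + erf(z/√2)):

```python
from math import comb, erf, sqrt

def phi(z):
    """Standard normal c.d.f. via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 15, 0.4
mu, sigma = n * p, sqrt(n * p * (1 - p))

exact = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(7, 10))
approx = phi((9.5 - mu) / sigma) - phi((6.5 - mu) / sigma)  # continuity correction
print(round(exact, 4), round(approx, 4))
```

The exact sum is 0.3564 and the approximation is about 0.365, within 0.01 as the worked example shows.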

Fig.4.4 shows the areas of the normal distribution representing probabilities within some defined
standard deviations.

Fig. 4.4 Areas enclosed by successive standard deviation of the standard normal distribution.

4.3.3 Uniform Distribution

Suppose that a continuous random variable X can assume values in a bounded interval only, say the open interval (a, b), and suppose the p.d.f. of X is given as

f(x) = 1/(b − a), a < x < b
= 0, elsewhere.

This distribution is referred to as the Uniform or Rectangular Distribution on the interval (a, b) and is simply written as U(a, b), where 'a' and 'b' are the parameters of the distribution. It provides a probability model for selecting a point at random from the interval (a, b).

Properties

Mean μ = (a + b)/2

Variance σ² = (b − a)²/12

Standard Deviation σ = (b − a)/√12

Example 52:
The hardness of a certain alloy (measured on Rockwell scale) is a random variable X.
Assume that .
a) Find
b) Find E(X)
c) Find Var (X)

Solution:

a) =

b) E(X) = =
Or

E(X) =

c)
Or

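A sketch of the uniform-distribution formulas, using an assumed interval a = 50, b = 70 for illustration (the example's actual interval is not reproduced here):

```python
# Uniform U(a, b): closed-form moments checked by a midpoint-rule integral.
# The interval a = 50, b = 70 is an assumed illustration only.
a, b = 50.0, 70.0
mean = (a + b) / 2          # E(X) = (a + b)/2
var = (b - a)**2 / 12       # Var(X) = (b - a)^2 / 12

# check E(X) by integrating x * f(x) with f(x) = 1/(b - a)
n = 100_000
h = (b - a) / n
num_mean = sum((a + (i + 0.5) * h) / (b - a) for i in range(n)) * h
print(mean, round(var, 4), round(num_mean, 4))
```

The numerically integrated mean agrees with the closed form (a + b)/2 = 60.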
4.3.4 The Gamma Distribution:


The Gamma distribution arises in the study of waiting times, for example, in the lifetime of
devices. It is also useful in modeling many nonnegative continuous variables. The gamma
distribution requires the knowledge of the gamma function.

Definition: The gamma function is defined by

Γ(α) = ∫₀^∞ x^(α−1) e⁻ˣ dx, for α > 0

Γ(α) is read as "gamma function of α".
Integrating by parts with u = x^(α−1) and dv = e⁻ˣ dx, we obtain

Γ(α) = (α − 1)∫₀^∞ x^(α−2) e⁻ˣ dx

which yields the recursion formula

Γ(α) = (α − 1)Γ(α − 1).

Repeated application of the recursion formula gives

Γ(α) = (α − 1)(α − 2)Γ(α − 2) = (α − 1)(α − 2)(α − 3)Γ(α − 3)

and so forth. Note that when α = n, where n is a positive integer,

Γ(n) = (n − 1)!

General Gamma Distribution:


The continuous random variable X has a gamma distribution, with parameters α and β, if its density function is given by:

f(x) = x^(α−1) e^(−x/β)/(β^α Γ(α)), x > 0
= 0, elsewhere

where α > 0 and β > 0.

Properties of the Gamma distribution

a) E(X) = αβ
b) Var(X) = αβ²
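The factorial identity and the moment formulas can be checked numerically; the parameter values α = 3, β = 2 below are an assumed illustration:

```python
from math import gamma, factorial

# Check Γ(n) = (n - 1)! for small integers n
for n in range(1, 7):
    assert gamma(n) == factorial(n - 1)

# Gamma-distribution moments E(X) = αβ and Var(X) = αβ²
alpha, beta = 3.0, 2.0        # illustrative parameter values
print(alpha * beta, alpha * beta**2)
# → 6.0 12.0
```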

4.3.5 Exponential Distribution
An exponential distribution is a special gamma distribution for which α = 1. It has many applications in the field of statistics, particularly in the areas of reliability theory and waiting times or queueing problems. It is a continuous distribution that can be related to the Poisson distribution in the discrete sense.
Definition: A continuous random variable X has an exponential distribution with parameter θ if its density function is given by:

f(x) = (1/θ)e^(−x/θ), x > 0
= 0, elsewhere

where θ > 0.

Properties
Mean μ = θ
Variance σ² = θ²
Standard Deviation σ = θ

So if λ is the mean number of changes in the unit interval, then θ = 1/λ is the mean waiting time for the first change.

Example 53:

Let the p.d.f. of X be .


i. What is the mean and variance of X?
ii. Calculate

iii. Calculate
iv. Calculate

Solution:
i. and

ii.

iii.

iv.

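A short sketch of the exponential survival function P(X > x) = e^(−x/θ), using an assumed mean θ = 2 (not the value from Example 53); the last line illustrates the memoryless property:

```python
from math import exp

# Exponential(θ) basics with an assumed mean θ = 2
theta = 2.0

def surv(x):
    """P(X > x) for an exponential distribution with mean theta."""
    return exp(-x / theta)

print(round(1 - surv(1.0), 4))   # P(X <= 1)
print(round(surv(3.0), 4))       # P(X > 3)

# memoryless property: P(X > s + t | X > s) = P(X > t)
s, t = 1.5, 3.0
print(round(surv(s + t) / surv(s), 4), round(surv(t), 4))
```

The last two printed values coincide, which is the memoryless property that links the exponential waiting time to the Poisson count of events.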
NB. Discuss other discrete and continuous distributions with students by stating their
properties respectively.

Post-Test
1. Let X have an exponential distribution with a mean of . Compute
i.
ii.
iii.
iv.

2. Telephone calls enter a college switchboard according to a Poisson process on the


average of two every 3 minutes. Let X denote the waiting time until the first
call that arrives after 10 A.M.
i. What is the p.d.f. of X?
ii. Find
3. Customers arrive randomly at a bank teller’s window. Given that one
customer arrived during a particular 10-minute period, let X equal the time
within the 10 minutes that the customer arrived. If X is , find
i. The p.d.f. of X.
ii.
iii.
iv.
v.
4. Explain the relationship that exists between the Poisson and the Exponential
distributions.
5. If X is , find and .
6. If Z is , find values of c such that
i.
ii.
iii.
7. Let X be , so that and . Find and
.
8. Show that the random variable is distributed .
9. Suppose that . Find the following probabilities:
i.
ii.
iii.
iv.
10. Find the value of ‘a’ and ‘b’ such that
i.
ii.

CHAPTER 5

Learning Objectives
Having worked through this chapter the student will be able to:
● understand and compute the expected values of random variables using
engineering data.

5.0 Mathematical Expectation

5.1 Introduction

A very important concept in probability and statistics is that of mathematical


expectation, expected value or briefly expectation of a random variable. The expectation
of X is very often called the mean of X and is denoted by µx or simply µ when a particular
random variable is understood. This expected value of x gives a simple value, which
acts as a representative, or average of the value of x and for this reason it is often called
a measure of central tendency.
Consider that the random variable X has the values x₁, x₂, …, xₙ and f(x₁), f(x₂), …, f(xₙ) as the probabilities of occurrence. The mean or expected value of X is:

E(X) = μ = Σ xᵢ f(xᵢ) (discrete case)

and

E(X) = ∫_{−∞}^{∞} x f(x) dx (continuous case)

It is said that the expectation E(X) exists for a continuous distribution if and only if the integral is absolutely convergent, that is, if and only if

∫_{−∞}^{∞} |x| f(x) dx < ∞
Example 54 (Spiegel, 1980)

Suppose that a game is played with a single die assumed fair. In this game a player wins
$20 if a 2 turns up, $40 if a 4 turns up; loses $30 if a 6 turns up; while he neither wins nor
loses if any other face turns up. Find the expected sum of money won.

Solutions:

Table 5.1
I 1 2 3 4 5 6
Xi 0 20 0 40 0 -30
f(xi) 1/6 1/6 1/6 1/6 1/6 1/6

The probability function of X is displayed in Table 5.1. Thus the expected value is:

E(X) = 0(1/6) + 20(1/6) + 0(1/6) + 40(1/6) + 0(1/6) + (−30)(1/6) = 5

It follows that the player can expect to win $5. In a fair game, therefore, he should be expected to pay $5 in order to play the game.
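The same expectation can be computed exactly with fractions:

```python
from fractions import Fraction

# Expected winnings for the die game: win $20 on a 2, $40 on a 4, lose $30 on a 6
payoff = {1: 0, 2: 20, 3: 0, 4: 40, 5: 0, 6: -30}
expectation = sum(Fraction(1, 6) * v for v in payoff.values())
print(expectation)
# → 5
```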

Example 55:
Suppose that the p.d.f. of a random variable X with a continuous distribution is

f(x)=
Then

E(X)=

5.2 The Expectation of a Function

Let us consider a new random variable g(X), which depends on X; that is, each value of g(X) is determined by knowing the value of x. For instance, g(X) might be X² or 3X − 1, so that whenever X assumes the value 2, g(X) assumes the value g(2). In particular, if X is a discrete random variable with probability distribution f(x), x = −1, 0, 1, 2, and g(X) = X², then

P[g(X) = 0] = P(X = 0) = f(0)

P[g(X) = 1] = P(X = −1) + P(X = 1) = f(−1) + f(1)
P[g(X) = 4] = P(X = 2) = f(2)

So that the probability distribution of g(x) may be written

g(x) 0 1 4
P[g(X)=g(x)] f(0) f(-1)+f(1) f(2)

By definition of the expected value of a random variable, we obtain

μ_g(X) = E[g(X)] = 0f(0) + 1[f(−1) + f(1)] + 4f(2)
= (−1)²f(−1) + (0)²f(0) + (1)²f(1) + (2)²f(2)
= Σₓ g(x) f(x)

In the general form, for a random variable X with a probability distribution f(x), the expectation of the random variable g(X) is:

E[g(X)] = Σₓ g(x) f(x), if X is discrete, and

E[g(X)] = ∫_{−∞}^{∞} g(x) f(x) dx, if X is continuous.
Example 56:
Suppose that the pdf of a random variable X with a continuous distribution is

Determine the expectation of Y when Y = X^(1/2)

Solution:

Example 57:
Suppose that the number of cars, X that pass through a car wash between 4pm and 5pm
on any sunny Friday has the following probability distribution:

X 4 5 6 7 8 9
f(x) 1/12 1/12 1/4 1/4 1/6 1/6

Let g(X) = 2X − 1 represent the amount of money, in dollars, paid to the attendant by the manager. How much can the attendant expect to earn during this particular time period on any sunny Friday?

Solution:
The attendant can expect to receive
E[g(X)] = E(2X − 1) = Σ_{x=4}^{9} (2x − 1) f(x)
= 7(1/12) + 9(1/12) + 11(1/4) + 13(1/4) + 15(1/6) + 17(1/6) = $12.67
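The computation, together with a check of the linearity property E(2X − 1) = 2E(X) − 1, can be sketched as:

```python
from fractions import Fraction

# Expected earnings E(2X - 1) for the car-wash attendant
pmf = {4: Fraction(1, 12), 5: Fraction(1, 12), 6: Fraction(1, 4),
       7: Fraction(1, 4), 8: Fraction(1, 6), 9: Fraction(1, 6)}

e_x = sum(x * p for x, p in pmf.items())            # E(X)
e_g = sum((2 * x - 1) * p for x, p in pmf.items())  # E(2X - 1)
print(e_x, e_g, 2 * e_x - 1)   # linearity: E(2X - 1) = 2E(X) - 1
# → 41/6 38/3 38/3
```

E(2X − 1) = 38/3 ≈ $12.67, in agreement with the worked solution.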
Example 58:
Let X be a continuous random variable with density functions

f
Find the expected value of g(X) = 4x + 3

Solution:

E(4x + 3)=

5.3 Properties of Expectations

Basic Theorems
Suppose that X is a random variable for which the expectation E(X) exists. We shall
present several results pertaining to the basic properties of expectations.

Theorem 1
If Y = ax + b where a and b are constants, then E(Y)=aE(x)+b.
Proof: First we shall assume, for convenience, that X has a continuous distribution for
which the pdf is f(x). Then

E(Y) = E(aX + b) = ∫ (ax + b)f(x) dx

= a∫ xf(x) dx + b∫ f(x) dx

= aE(X) + b,

since ∫ f(x) dx = 1.

A similar proof can be given for a discrete distribution or for more general types of
distribution.

Example 59:
Suppose that E(X) = 5. Then
(i) E(3X − 5) = 3E(X) − 5 = 3(5) − 5 = 10
(ii) E

Theorem 2

If there exists a constant a such that P(X ≥ a) = 1, then E(X) ≥ a. If there exists a constant b
such that P(X ≤ b) = 1, then E(X) ≤ b.

Theorem 3
If X1, …, Xn are n random variables such that each expectation E(Xi) exists (i = 1, 2, …, n),
then

E(X1 + ⋯ + Xn) = E(X1) + ⋯ + E(Xn)

This theorem means that the expectation of the sum of several random variables must
equal the sum of their individual expectations, regardless of whether or not the
random variables are independent.

It follows from Theorems 1 and 3 that

E(a1X1 + ⋯ + anXn + b) = a1E(X1) + ⋯ + anE(Xn) + b

Theorem 4

If X1, …, Xn are n independent random variables such that each expectation E(Xi) exists,
then

E(X1X2 ⋯ Xn) = E(X1)E(X2) ⋯ E(Xn)

It must be emphasised that the expectation of the product of a group of random
variables is NOT ALWAYS equal to the product of their individual expectations; if the
random variables are independent, the two are equal.
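These properties are easy to check empirically. The Python sketch below simulates two independent random variables (the distributions are chosen only for illustration) and compares both sides of Theorems 3 and 4:

```python
import random

random.seed(1)
n = 200_000
xs = [random.gauss(2.0, 1.0) for _ in range(n)]    # independent draws, E(X) = 2
ys = [random.gauss(-1.0, 0.5) for _ in range(n)]   # independent draws, E(Y) = -1

def mean(v):
    return sum(v) / len(v)

# Theorem 3: E(X + Y) = E(X) + E(Y), whether or not X and Y are independent
sum_lhs = mean([x + y for x, y in zip(xs, ys)])
sum_rhs = mean(xs) + mean(ys)

# Theorem 4: E(XY) = E(X)E(Y) here, because the samples are independent
prod_lhs = mean([x * y for x, y in zip(xs, ys)])
prod_rhs = mean(xs) * mean(ys)
```

With dependent variables the sum property still holds exactly, while the product property generally fails; rerunning with `ys = xs` makes the gap visible.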

Example 60:

Suppose that X1, X2, and X3 are independent random variables such that E(Xi) = 0 and
E(Xi²) = 1 for i = 1, 2, 3. Determine E[X1²(X2 − 4X3)²].

Solution:

Since X1, X2 and X3 are independent, it follows that the two random variables X1² and
(X2 − 4X3)² are also independent. Therefore

E[X1²(X2 − 4X3)²] = E(X1²)E[(X2 − 4X3)²]

= E(X2² − 8X2X3 + 16X3²), since E(X1²) = 1

= 1 − 8E(X2)E(X3) + 16

= 1 − 0 + 16 = 17

5.4 Expectation of Joint Probability Function

Let X and Y be random variables with joint probability distribution f(x, y). The mean, or
expected value, of the random variable g(X, Y) is

μ_g(X,Y) = E[g(X, Y)] = Σ_x Σ_y g(x, y)f(x, y)   if X and Y are discrete, and

μ_g(X,Y) = E[g(X, Y)] = ∫∫ g(x, y)f(x, y) dx dy   if X and Y are continuous.
5.4.1 Marginal Distribution

The marginal distributions of X alone and of Y alone are given by

g(x) = Σ_y f(x, y) and h(y) = Σ_x f(x, y)   in the discrete case, and

g(x) = ∫ f(x, y) dy and h(y) = ∫ f(x, y) dx   in the continuous case.

5.4.2 Conditional Distribution

Let X and Y be two random variables, discrete or continuous. The conditional
distribution of the random variable X, given that Y = y, is given by

f(x|y) = f(x, y)/h(y),   h(y) > 0.

For the joint density function,

P(a < X < b | Y = y) = ∫_a^b f(x|y) dx

Example 61:

The joint density of the random variables (X, Y) is given by:

Find the marginal densities g(x) and h(y) and the conditional density f(y|x).

Solution:

for 0 < x < 1

Now,

Example 62:

Let X and Y be the random variables with joint probability distribution given by the table
below. Find the expected value of g(X,Y) = XY

Solution:

(a)

= (0)(0)f(0,0) + (0)(1)f(0,1) + (0)(2)f(0,2)

+(1)(0)f(1,0) + (1)(1)f(1,1) + (1)(2)f(1,2)

+(2)(0)f(2,0) + (2)(1)f(2,1) + (2)(2)f(2,2)

= f(1,1) =

Example 63:

Find for the density function

Solution:

=

5.5 Variance and covariance

The expected value of a random variable X is of special importance in statistics because
it describes where the probability distribution is centred. By itself, however, the mean
does not give an adequate description of the shape of the distribution. We also need to
characterise the variability in the distribution.

5.5.1. Variance

Suppose that X is a random variable with mean μ. The variance of X, denoted by Var(X)
or σ², is

Var(X) = E[(X − μ)²];  equivalently, Var(X) = E(X²) − μ²

It must be kept in mind that this may or may not be finite. If the expectation is not finite,
it is said that Var(X) does not exist. However, if the possible values of X are bounded,
the Var(X) must exist.

For a random variable X with probability distribution f(x) and mean μ, the variance of X
is

σ² = Σ_x (x − μ)² f(x)   if X is discrete, and

σ² = ∫ (x − μ)² f(x) dx   if X is continuous

Example 64

Let the random variable X represent the number of automobiles that are used for official
business purposes on any given workday. The probability distribution for company A
is given by:

X 1 2 3

f(x) 0.3 0.4 0.3

And for company B by:

X 0 1 2 3 4

f(x) 0.2 0.1 0.3 0.3 0.1

Show that the variance of the probability distribution for company B is greater than that
of company A.

Solution:

For company A:

μ = E(X) = (1)(0.3) + (2)(0.4) + (3)(0.3) = 2.0, and then

σ² = (1 − 2)²(0.3) + (2 − 2)²(0.4) + (3 − 2)²(0.3) = 0.6

For company B:

μ = E(X) = (0)(0.2) + (1)(0.1) + (2)(0.3) + (3)(0.3) + (4)(0.1) = 2.0, and

σ² = (0 − 2)²(0.2) + (1 − 2)²(0.1) + (2 − 2)²(0.3) + (3 − 2)²(0.3) + (4 − 2)²(0.1) = 1.6

∴ σ²_B = 1.6 > σ²_A = 0.6, so the variance for company B is the greater.
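The two variances can also be computed directly from the definition; a small Python helper (illustrative) makes the comparison explicit:

```python
def var_discrete(pmf):
    """Var(X) = sum of (x - mu)^2 f(x) for a discrete pmf given as {x: f(x)}."""
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

var_a = var_discrete({1: 0.3, 2: 0.4, 3: 0.3})                  # company A: 0.6
var_b = var_discrete({0: 0.2, 1: 0.1, 2: 0.3, 3: 0.3, 4: 0.1})  # company B: 1.6
```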

Properties of Variance

(1) For any constants a and b:

Var(aX + b) = a² Var(X)

(2) For any random variable X:

Var(X) = E(X²) − [E(X)]²

5.5.2 Covariance

When we consider the joint distribution of two random variables, the mean, the median
and the variance provide useful information about their marginal distribution.
However, these values do not provide any information about the relationship between
the two variables or about their tendency to vary together rather than independently.
The quantity covariance enables us to measure the association between two random
variables, to determine the variance of the sum of any number of dependent random
variables and to predict the value of one random variable by using the observed value
of some other related variable.

Let X and Y be random variables having a specified joint distribution, and let E(X) = μx,
E(Y) = μy, Var(X) = σ²x and Var(Y) = σ²y. The covariance of X and Y, denoted by
Cov(X, Y), is defined as:

Cov(X, Y) = σxy = E[(X − μx)(Y − μy)]

= Σ_x Σ_y (x − μx)(y − μy)f(x, y)   if X and Y are discrete, and

= ∫∫ (x − μx)(y − μy)f(x, y) dx dy   if X and Y are continuous.

It can be shown (DeGroot, 1989) that if σ²x < ∞ and σ²y < ∞, then the expectation above will
exist and Cov(X, Y) will be finite. The value of Cov(X, Y) can be positive, negative
or zero. When X and Y are independent, the covariance is zero. The converse, however, is
not generally true: two variables may have zero covariance and still not be statistically
independent, i.e. two dependent variables can be uncorrelated.

Some properties of covariance

1. For any random variables X and Y such that σ²x < ∞ and σ²y < ∞,

Cov(X, Y) = E(XY) − E(X)E(Y).
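This property can be verified on any valid joint table. The table in the Python sketch below is illustrative only (it is not the table of Example 65, whose entries did not survive):

```python
# A small joint pmf {(x, y): f(x, y)} -- an illustrative table, chosen so that
# the entries sum to 1; Cov = E(XY) - E(X)E(Y) holds for any such table.
joint = {(0, 0): 0.2, (0, 1): 0.1,
         (1, 0): 0.1, (1, 1): 0.3,
         (2, 0): 0.1, (2, 1): 0.2}

ex  = sum(x * p for (x, y), p in joint.items())      # E(X) via the marginal of X
ey  = sum(y * p for (x, y), p in joint.items())      # E(Y) via the marginal of Y
exy = sum(x * y * p for (x, y), p in joint.items())  # E(XY)
cov = exy - ex * ey
```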

Example 65:
Two variables X and Y have a joint distribution as:

Find the covariance of X and Y

Solution:

But

Hence we must first compute the marginal density functions. For a joint distribution
f(x, y), the marginal distributions of X and Y alone are given by:

Thus:

It could have been solved from

It also could have been solved from

CHAPTER 6
Learning Objectives

Having worked through this chapter the student will be able to:
● understand point and interval estimates and how it can be applied to engineering
data.

6.0 Estimation

6.1 Introduction
The basic reason for the need to estimate population parameters from sample
information is that it is ordinarily too expensive or simply infeasible to enumerate

complete populations to obtain the required information. The cost of complete censuses
may be prohibitive in finite populations while complete enumerations are impossible in
the case of infinite populations. Hence, estimation procedures are useful in providing
the means of obtaining estimates of population parameters with desired degree of
precision. We now consider estimation, the first of the two general areas of statistical
inference. The second general area is hypothesis testing which will be examined later.
The subject of estimation is concerned with the methods by which population
characteristics are measured from sample information. The objectives are to present:
i. properties for judging how well a given sample statistic estimates the
parent population parameter.
ii. several methods for estimating these parameters.
There are basically two types of estimation: point estimation and interval estimation. In
point estimation, a single sample statistic, such as x̄, s or p̂, is calculated from the sample
to provide a best estimate of the true value of the corresponding population parameter,
such as μ, σ or p. Such a statistic is termed a point estimator.
The function or rule that is used to estimate the value of a parameter is called an estimator.
An estimate is a particular value calculated from a particular sample of observations.
On the other hand, an interval estimate consists of two numerical values defining an
interval which, with varying degrees of confidence, we feel includes the parameter being
estimated.

6.2 Properties of a Point Estimator


Unbiasedness: If the expected value or mean of all possible values of a statistic over all
possible samples is equal to the population parameter being estimated, the sample
statistic is said to be unbiased. That is, if the expected value of an estimator is equal to
the corresponding population parameter, the estimator is unbiased.

Example 66:
The sample mean is an unbiased estimator of the population mean.

Efficiency: The most efficient estimator among a group of unbiased estimators is the one
with the smallest variance. This concept refers to the sampling variability of an
estimator.

Example 67:

Let , and be unbiased estimators for . Let be a biased estimator among this

group of estimators. Assume that Which of these


estimators is most efficient?

Consistency: An estimator is consistent if as the sample size increases, the probability
increases that the estimator will approach the true value of the population parameter.
Alternatively, an estimator θ̂ of θ is consistent if it satisfies the following conditions:

i. Var(θ̂) → 0 as n → ∞
ii. θ̂ becomes unbiased as n → ∞

6.3 Interval Estimation


For most practical purposes, it would not suffice to have merely a single value estimate
of a population parameter. Any single point estimate will be either right or wrong.
Therefore, instead of obtaining only a single estimate of a population parameter, it
would certainly seem to be extremely useful, and perhaps necessary, to obtain two
estimates, say θ̂_L and θ̂_U, and to say with some confidence that the interval between θ̂_L
and θ̂_U includes the true mean μ.
Thus, an interval estimate of a population parameter is a statement of two values
between which it is estimated that the parameter lies.
We shall be discussing the construction of confidence intervals as a means of interval
estimation.

The confidence we have that a population parameter θ will fall within some confidence
interval equals 1 − α, where α is the probability that the interval does not contain θ
(i.e. the probability α is an allowance for error).

To construct a 95% confidence interval, we set α = 0.05; that is, the probability is 0.05 that
the true value will not lie within the interval.

Note that the larger the confidence interval, the smaller the probability of error for the
interval estimator.

Confidence Interval for μ (σ Known)


A confidence interval is constructed on the basis of sample information. It also depends
on the size of α. Assume the population variance σ² is known and the population is
normal; then the 100(1 − α) percent C.I. for μ is given by

P(x̄ − z_{α/2}·σ/√n ≤ μ ≤ x̄ + z_{α/2}·σ/√n) = 1 − α,

simply written as

x̄ ± z_{α/2}·σ/√n
where z_{α/2} is the Z value cutting off an area of α/2 in each of the right and left tails of the
standard normal probability distribution.

Example 68:
The yield of a chemical process is being studied. From previous experience, yield is
known to be normally distributed with known standard deviation σ. The past five days of plant operation have
resulted in the following percent yields: 91.6, 88.75, 90.8, 89.95, and 91.3. Find a 95% two-
sided confidence interval on the true mean yield.

Solution:

n = 5, x̄ = (91.6 + 88.75 + 90.8 + 89.95 + 91.3)/5 = 90.48, z_{0.025} = 1.96
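The interval arithmetic can be scripted. In the Python sketch below, σ = 3 is an assumed placeholder (the value of σ given in the original problem did not survive extraction):

```python
from math import sqrt

yields = [91.6, 88.75, 90.8, 89.95, 91.3]  # the five daily percent yields
sigma = 3.0   # assumed known population standard deviation (placeholder value)
z = 1.96      # z_{0.025} for a 95% two-sided interval

n = len(yields)
xbar = sum(yields) / n                      # 90.48
half = z * sigma / sqrt(n)
lower, upper = xbar - half, xbar + half     # the 95% CI for the mean yield
```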

Example 69:
A manufacturer produces piston rings for an automobile engine. It is known that ring
diameter is normally distributed with millimeters. A random sample of 15
rings has a mean diameter of millimeters.
(a) Construct a 99% two-sided confidence interval on the mean piston ring diameter.
(b) Construct a 95% confidence interval on the mean piston ring diameter.

Solution:

n=15, , , ,

(a)

(b)

Example 70:
ASTM Standard E23 defines standard test methods for notched bar impact testing of
metallic materials. The Charpy V-notch (CVN) technique measures impact energy and
is often used to determine whether or not a material experiences a ductile-to-brittle
transition with decreasing temperature. Ten measurements of impact energy (J) on

specimens of A238 steel cut at 60ºC are as follows: 64.1, 64.7, 64.5, 64.6, 64.5, 64.3, 64.6,
64.8, 64.2, and 64.3. Assume that impact energy is normally distributed with known σ. We
want to find a 95% CI for μ, the mean impact energy. The resulting 95% CI is

That is, based on the sample data, a range of highly plausible values for mean impact
energy for A238 steel at 60°C is .

Exercises:
1. A confidence interval estimate is desired for the gain in a circuit on a
semiconductor device. Assume that gain is normally distributed with standard
deviation .
(a) Find a 95% CI for when n =10 and .
(b) Find a 95% CI for when n = 25 and .
(c) Find a 99% CI for when n = 10 and .
(d) Find a 99% CI for when n = 25 and .
2. A civil engineer is analyzing the compressive strength of concrete. Compressive

strength is normally distributed with . A random sample of 12 specimens


has a mean compressive strength of .
(a) Construct a 95% two-sided confidence interval on mean compressive strength.
(b) Construct a 99% two-sided confidence interval on mean compressive strength.
Compare the width of this confidence interval with the width of the one found in part
(a).

Confidence Interval for μ (σ Unknown, n ≥ 30)


In practice, the standard deviation of a population, σ, is not likely to be known. When
σ is unknown and n is 30 or more, we proceed as before and estimate σ with the sample
standard deviation s. The resulting large-sample confidence interval for μ becomes

x̄ ± z_{α/2}·s/√n
Example 71:
1. A sample of 40 ten-year-old girls gave a mean weight of 71.5 pounds and a standard
deviation of 12 pounds. Assuming normality, find the
i. 90% confidence interval for μ.
ii. 95% confidence interval for μ.
iii. 99% confidence interval for μ.

2. A hospital administrator took a sample of 45 overdue accounts from which he


computed a mean of $250 and a standard deviation of $75. Assuming that the
amounts of all overdue accounts are normally distributed. Find the
i. 90% confidence interval for μ.
ii. 95% confidence interval for μ.
iii. 99% confidence interval for μ.

Confidence Interval for μ (σ Unknown, Small n)


When σ is not known and the sample size is small, the procedure for interval
estimation of the population mean is based on a probability distribution known as the
Student t-distribution. When the population variance is unknown and the sample size is
small, the correct distribution for constructing a confidence interval for μ is the t-
distribution. Here, an estimate s must be calculated from the sample to substitute for the
unknown population standard deviation. The t-distribution is used such that

t = (x̄ − μ)/(s/√n), where s² = Σ(xᵢ − x̄)²/(n − 1)

The t-distribution is based on the assumption that the population is normal. A 100(1 − α)%
CI for the population mean, with the population normal and σ unknown, is given by

x̄ ± t_{α/2}·s/√n, where t_{α/2} has n − 1 degrees of freedom

Notice that a requirement for the valid use of the t-distribution is that the sample must be
drawn from a normal distribution.

Example 72:
A sample of 25 ten-year-old boys yielded a mean weight and standard deviation of 73
and 10 pounds respectively. Assuming a normally distributed population, find 90, 95
and 99 percent confidence intervals for the mean of the population from which the
sample came.

Solution:
n = 25, x̄ = 73, s = 10
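With n = 25, x̄ = 73 and s = 10, the three intervals follow from x̄ ± t_{α/2}·s/√n. A short Python sketch, with t critical values for 24 degrees of freedom taken from standard tables:

```python
from math import sqrt

n, xbar, s = 25, 73.0, 10.0
# t_{alpha/2, 24} from standard t tables
t_crit = {"90%": 1.711, "95%": 2.064, "99%": 2.797}

cis = {level: (xbar - t * s / sqrt(n), xbar + t * s / sqrt(n))
       for level, t in t_crit.items()}
# e.g. the 95% interval is 73 ± 2.064(10/5) = (68.87, 77.13)
```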

Summary:
Confidence Interval for μ

Sample size   σ assumed known       σ unknown (estimated by s)
Large         x̄ ± z_{α/2}·σ/√n     x̄ ± z_{α/2}·s/√n
Small         x̄ ± z_{α/2}·σ/√n     x̄ ± t_{α/2}·s/√n

6.3 Confidence Interval For A Population Proportion


It is often necessary to construct confidence intervals on a population proportion. For
example, suppose that a random sample of size n has been taken from a large (possibly
infinite) population and that X (X ≤ n) observations in this sample belong to a class of
interest. Then p̂ = X/n is a point estimator of the proportion of the population p that
belongs to this class. Note that n and p are the parameters of a binomial distribution.
Furthermore, we know that the sampling distribution of p̂ is approximately normal
with mean p and variance p(1 − p)/n if p is not too close to either 0 or 1 and if n is relatively
large. Typically, to apply this approximation we require that np and n(1 − p) be greater
than or equal to 5. We will make use of the normal approximation in this regard.

Definition: If n is large, the distribution of

Z = (p̂ − p) / √(p(1 − p)/n)

is approximately standard normal.

The 100(1 − α)% CI for p is then given by

p̂ ± z_{α/2}·√(p(1 − p)/n)

Note that the quantity √(p(1 − p)/n) in the equation above is called the standard error of
the point estimator p̂. Unfortunately, the upper and lower limits of the confidence
interval obtained from this equation contain the unknown parameter p. However, a
satisfactory solution is to replace p by p̂ in the standard error, which results in

p̂ ± z_{α/2}·√(p̂(1 − p̂)/n)
This procedure depends on the adequacy of the normal approximation to the binomial.
To be reasonably conservative, this requires that np̂ and n(1 − p̂) be greater than or equal
to 5. In situations where this approximation is inappropriate, particularly in cases where
n is small, other methods must be used.

Example 73:
1. A manufacturer of electronic calculators is interested in estimating the fraction of
defective units produced. A random sample of 800 calculators contains 10
defectives. Compute a 99% confidence interval on the fraction defective.

Solution:
n = 800, x = 10, p̂ = 10/800 = 0.0125, z_{0.005} = 2.576

0.0125 ± 2.576·√((0.0125)(0.9875)/800) = 0.0125 ± 0.0101, i.e. 0.0024 ≤ p ≤ 0.0226
2. Of 1000 randomly selected cases of lung cancer, 823 resulted in death within 10
years. Construct a 95% confidence interval on the death rate from lung cancer.

Solution:
n = 1000, x = 823, p̂ = 823/1000 = 0.823, z_{0.025} = 1.96

0.823 ± 1.96·√((0.823)(0.177)/1000) = 0.823 ± 0.0237, i.e. 0.799 ≤ p ≤ 0.847
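Both proportion intervals follow the same arithmetic; a small Python sketch for the calculator example (part 1):

```python
from math import sqrt

n, x = 800, 10
p_hat = x / n                      # 0.0125, the fraction defective
z = 2.576                          # z_{0.005} for a 99% two-sided interval

half = z * sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - half, p_hat + half
# the interval is roughly 0.0024 <= p <= 0.0226
```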

Exercise
i. A random sample of 50 suspension helmets used by motorcycle riders and
automobile race-car drivers was subjected to an impact test, and on 18 of these helmets
some damage was observed. Find a 95% confidence interval on the true proportion of
helmets of this type that would show damage from this test.

ii. In a random sample of 85 automobile engine crankshaft bearings, 10 have a


surface finish that is rougher than the specifications allow. Construct the 95% confidence
interval for the proportion of bearings in the population that exceeds the roughness
specification.

iii. A survey was conducted to study the dental health practices and attitudes of a
certain urban adult population. Of 300 adults interviewed, 123 said that they regularly
had a dental checkup twice a year. Obtain a 95% CI for p, based on these data.

Confidence Interval for the Difference Between Two Population Means (Variances
Known)
There are instances where we are interested in estimating the difference between two
population means. Here, from each of the populations a sample is drawn and, from the
data of each, the sample means x̄1 and x̄2 respectively are computed. The estimator
x̄1 − x̄2 yields an unbiased estimate of μ1 − μ2, the difference between the population means.
The quantity

Z = [(x̄1 − x̄2) − (μ1 − μ2)] / √(σ1²/n1 + σ2²/n2)

has a N(0, 1) distribution. The confidence interval for μ1 − μ2 is given by

(x̄1 − x̄2) ± z_{α/2}·√(σ1²/n1 + σ2²/n2)

for large sample sizes n1 and n2 respectively.

Example 74:
i. Tensile strength tests were performed on two different grades of aluminum
spars used in manufacturing the wing of a commercial transport aircraft. From past
experience with the spar manufacturing process and the testing procedure, the
standard deviations of tensile strengths are assumed to be known. The data obtained
are as follows. If μ1 and μ2 denote
the true mean tensile strengths for the two grades of spars, find a 90% confidence
interval on the difference in mean strength μ1 − μ2.

ii. Two machines are used for filling plastic bottles with a net volume of 16.0
ounces. The fill volume can be assumed normal, with standard deviation
and ounces. A member of the quality engineering staff suspects that both
machines fill to the same mean net volume, whether or not this volume is 16.0 ounces.
A random sample of 10 bottles is taken from the output of each machine.
Machine 1 Machine 2
16.03 16.01 16.02 16.03
16.04 15.96 15.97 16.04
16.05 15.98 15.96 16.02

16.05 16.02 16.01 16.01
16.02 15.99 15.99 16.00

Find a 95% confidence interval for the difference in means.

iii. Two different formulations of an oxygenated motor fuel are being tested to study
their road octane numbers. The variance of road octane number for formulation 1 is
, and for formulation 2 it is . Two random samples of size and
are tested, and the mean road octane numbers observed are and
. Assume normality. Construct a 95% confidence interval on the difference in
mean road octane number.

Confidence Interval For The Difference Between Two Population Proportions


The magnitude of the difference between two population proportions is often of interest.
An unbiased point estimator of the difference in two population proportions is provided by the
difference in the sample proportions, p̂1 − p̂2. When n1 and n2 are large and the
population proportions are not too close to 0 or 1, the central limit theorem applies to

Z = [(p̂1 − p̂2) − (p1 − p2)] / √(p1(1 − p1)/n1 + p2(1 − p2)/n2),

where Z is a standard normal random variable, and thus normal distribution theory may
be employed to obtain confidence intervals.
A confidence interval for p1 − p2 is given by

(p̂1 − p̂2) ± z_{α/2}·√(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2)
Example 75:
i. Two different types of injection-molding machines are used to form plastic
parts. A part is considered defective if it has excessive shrinkage or is discolored.
Two random samples, each of size 300, are selected, and 15 defective parts are found
in the sample from machine 1 while 8 defective parts are found in the sample from
machine 2. Construct a 95% confidence interval on the difference in the two fractions
defective.

ii. Two hundred patients suffering from a certain disease were randomly divided
into two equal groups. Of the first group, who received the standard treatment, 78
recovered within three days. Out of the other 100, who were treated by a new
method, 90 recovered within three days. The physician wished to estimate the true
difference in the proportions who would recover within three days. Find a 95% CI for
p1 − p2.
iii. In a study designed to assess the side effects of two drugs, 50 animals were
given Drug A, and 50 animals were given Drug B. Of the 50 receiving Drug A, 11
showed undesirable side effects, while 8 of those receiving Drug B reacted similarly.
Find the 90 and 95 percent confidence intervals for pA − pB.

CHAPTER 7

Learning Objectives

Having worked through this chapter the student will be able to:
• Structure engineering decision-making problems as hypothesis tests.
• Test hypotheses on the mean of a normal distribution using either a Z-test or a t-
test procedure.
• Test hypotheses on the variance or standard deviation of a normal distribution.
• Test hypotheses on a population proportion.

7.0 Tests of Hypotheses and Significance

7.1 Introduction
We now discuss the subject of hypothesis testing, which as earlier noted is one of the
two basic classes of statistical inference. Testing of hypotheses involves using statistical
inference to test the validity of postulated values for population parameters. If the
hypothesis specifies the distribution completely it is called simple, otherwise it is called
composite. For example, a demographer interested in the mean age of residents in a
certain local government area might pose a simple hypothesis such as H0: μ = μ0, or he might
specify a composite hypothesis such as H1: μ > μ0 or H1: μ < μ0.
A statistical test is usually structured in terms of two mutually exclusive hypotheses,
referred to as the null hypothesis and the alternative hypothesis, denoted by H0 and H1
respectively.
Two types of error occur in hypothesis testing; these are the type I error and the type II error.
A type I error occurs if H0 is rejected when it is true. The probability of a type I error is the
conditional probability P(reject H0 | H0 is true) and is denoted by α.

Hence,

α = P(reject H0 | H0 is true) and

1 − α = P(accept H0 | H0 is true)

A type II error occurs if H0 is accepted when it is false. Its probability is denoted by the symbol
β, where

β = P(accept H0 | H0 is false) and

1 − β = P(reject H0 | H0 is false), called the power of the test

Types I and II errors can be summarised as follows:

               H0 is true           H0 is false
Accept H0      Correct decision     Type II error
Reject H0      Type I error         Correct decision

Standard format of hypothesis testing: this format involves five steps.


Step 1: State the null and alternative hypotheses.
Step 2: Determine the suitable test statistics. This involves choosing the appropriate
random variable to use in deciding to accept or reject the null hypothesis.

Unknown Parameter                            Appropriate Test Statistic

μ: σ known, population normal                Z = (x̄ − μ0)/(σ/√n)
μ: σ unknown, n 'large' (usually n ≥ 30)     Z = (x̄ − μ0)/(s/√n)
μ: σ unknown, n small, population normal     t = (x̄ − μ0)/(s/√n), with (n − 1) df
p: population normal, n large                Z = (p̂ − p0)/√(p0(1 − p0)/n)

Step 3: Determine the critical region using the cumulative distribution table for the test
statistic. The set of values that lead to the rejection of the null hypothesis is called the
critical region. A statistical test may be a one-tail or two-tail test. Whether one uses a
one- or two- tail test of significance depends upon how the alternative hypothesis is
formulated.
Type of Hypothesis        Decision Rule: H0 rejected if

Two-tail (H1: μ ≠ μ0)     Z > z_{α/2} or Z < −z_{α/2}
Right-tail (H1: μ > μ0)   Z > z_α
Left-tail (H1: μ < μ0)    Z < −z_α

Step 4: Compute the value of the test statistic based on the sample information, e.g.

z = (x̄ − μ0)/(σ/√n).

Step 5: Make a statistical decision and interpretation. H0 is rejected if the computed
value of the test statistic falls in the critical region; otherwise it is accepted.

Possible situations in testing a statistical hypothesis:

                          Hypothesis is correct    Hypothesis is incorrect
Hypothesis is accepted    Correct decision         Type II error (β)
Hypothesis is rejected    Type I error (α)         Correct decision

Type I error: we reject a hypothesis when it should be accepted; α = P(reject H0 | H0 true).
Type II error: we accept a hypothesis when it should be rejected; β = P(accept H0 | H0 false).

7.2 A Single Population Mean, μ


We shall consider testing of hypothesis about a population mean under three different
conditions:
1. When sampling is from a normally distributed population with known
variance.
2. When sampling is from a normally distributed population with unknown
variance.
3. When sampling is from a population that is not normally distributed.

Sampling From Normally Distributed Populations: Population Variance Known

Example 76:
A researcher is interested in the mean level of some enzyme in a certain population. The
data available to the researcher are the enzyme determinations made on a sample of 10
individuals from the population of interest, and the sample mean is 22. If the sample
came from a population that is normally distributed with a known variance, σ² = 45,
can the researcher conclude that the mean enzyme level in this population is different
from 25? Take α = 0.05.

Solution:
Step 1: H0: μ = 25 versus H1: μ ≠ 25

Step 2: Use Z = (x̄ − μ0)/(σ/√n), since the population is normal and σ² is known.

Step 3: Reject H0 if Z > 1.96 or Z < −1.96 (two-tailed test at α = 0.05).

Step 4: With x̄ = 22, μ0 = 25 and n = 10,

Z = (22 − 25)/√(45/10) = −3/2.12 = −1.42

Step 5: We are unable to reject the null hypothesis, since -1.42 > -1.96.
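The five steps condense to a few lines of Python; the known variance is taken as 45, the value consistent with the worked z of −1.42:

```python
from math import sqrt

xbar, mu0, n = 22.0, 25.0, 10
var = 45.0                  # known population variance (consistent with z ≈ -1.42)
z_crit = 1.96               # two-sided critical value at alpha = 0.05

z = (xbar - mu0) / sqrt(var / n)   # ≈ -1.41
reject = abs(z) > z_crit           # False: fail to reject H0
```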

Example 77:
Aircrew escape systems are powered by a solid propellant. The burning rate of this
propellant is an important product characteristic. Specifications require that the mean
burning rate must be 50 centimeters per second. We know that the standard deviation
of burning rate is centimeters per second. The experimenter decides to specify a
type I error probability or significance level of and selects a random sample of
and obtains a sample average burning rate of centimeters per second.
What conclusions should be drawn?

Solution:

Z = (x̄ − 50)/(σ/√n) = 3.25

Conclusion: Since z = 3.25 > z_{0.025} = 1.96, we reject H0: μ = 50 at the 0.05 level of significance.


We conclude that the mean burning rate differs from 50 centimeters per second, based
on a sample of 25 measurements.

7.3 Tests on the Mean of a Normal Distribution: Variance Unknown


The test statistic is

t = (x̄ − μ0)/(s/√n),

which has a t distribution with n − 1 degrees of freedom.

Example 78:

A study revealed that the upper limit of the Normal Body Temperature of males is 98.6.
The body temperatures for 25 male subjects were taken and recorded as follows: 97.8,
97.2, 97.4, 97.6, 97.8, 97.9, 98.0, 98.0, 98.0, 98.1, 98.2, 98.3, 98.3, 98.4, 98.4, 98.4, 98.5, 98.6,
98.6, 98.7, 98.8, 98.8, 98.9, 98.9 and 99.0. Test the hypothesis H0: μ = 98.6 versus
H1: μ < 98.6, using α = 0.05.

Example 79:
Nine patients suffering from the same physical handicap, but otherwise comparable
were asked to perform a certain task as part of an experiment. The average time required
to perform the task was seven minutes with a standard deviation of two minutes.
Assuming normality, can we conclude that the true mean time required to perform the
task by this type of patient is at least ten minutes?

Example 80:
The increased availability of light materials with high strength has revolutionized the
design and manufacture of golf clubs, particularly drivers. Clubs with hollow heads and
very thin faces can result in much longer tee shots, especially for players of modest skills.
This is due partly to the “spring-like effect” that the thin face imparts to the ball. Firing
a golf ball at the head of the club and measuring the ratio of the outgoing velocity of the
ball to the incoming velocity can quantify this spring-like effect. The ratio of velocities is
called the coefficient of restitution of the club. An experiment was performed in which
15 drivers produced by a particular club maker were selected at random and their
coefficients of restitution measured. In the experiment, the golf balls were fired from an
air cannon so that the incoming velocity and spin rate of the ball could be precisely
controlled. Determine if there is evidence (with α = 0.05) to support a claim that the
mean coefficient of restitution exceeds 0.82. The observations are:
0.8411 0.8191 0.8182 0.8125 0.8750
0.8580 0.8532 0.8483 0.8276 0.7983
0.8042 0.8730 0.8282 0.8359 0.8660

The sample mean and sample standard deviation are x̄ = 0.83725 and s = 0.02456.

7.4 Tests on a Population Proportion


It is often necessary to test hypotheses on a population proportion. For example, suppose
that a random sample of size n has been taken from a large population and that
observations in this sample belong to a class having a particular characteristic of interest.

Then p̂ = X/n is a point estimator of the proportion of the population p that belongs to
this class. Note that n and p are the parameters of a binomial distribution. Recall that the
sampling distribution of p̂ is approximately normal with mean p and variance
p(1 − p)/n, if p is not too close to either 0 or 1 and if n is relatively large.

In many engineering problems, we are concerned with a random variable that follows
the binomial distribution. For example, consider a production process that manufactures
items that are classified as either acceptable or defective. It is usually reasonable to model
the occurrence of defectives with the binomial distribution, where the binomial
parameter p represents the proportion of defective items produced. Consequently, many
engineering decision problems include hypothesis testing about p.
Consider testing

H0: p = p0 versus H1: p ≠ p0.

For large samples, the normal approximation to the binomial with the test statistic

Z = (p̂ − p0) / √(p0(1 − p0)/n)

may be used.
This presents the test statistic in terms of the sample proportion p̂ instead of the number
of items X in the sample that belong to the class of interest.

Example 81:
In a study designed to assess the relationship between a certain drug and a certain
anomaly in chick embryos, 50 fertilized eggs were injected with the drug on the fourth
day of incubation. On the twentieth day of incubation the embryos were examined and
in 12 the presence of the abnormality was observed. Test the null hypothesis that the
drug causes abnormalities in not more than 20 percent of eggs into which it is
introduced. Let α = 0.05.

Example 82:
A manufacturer of intraocular lenses is qualifying a new grinding machine and will
qualify the machine if the percentage of polished lenses that contain surface defects does
not exceed 2%. A random sample of 250 lenses contains six defective lenses. Formulate
and test an appropriate set of hypotheses to determine if the machine can be qualified.
Use .
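Example 82 can be worked mechanically. One reasonable formulation (assumed here, along with α = 0.05, since those details were lost) is H0: p = 0.02 against H1: p < 0.02, so the machine qualifies only if H0 is rejected in favour of a defect rate below 2%:

```python
from math import sqrt

n, x, p0 = 250, 6, 0.02
z_crit = -1.645                  # left-tail critical value at alpha = 0.05 (assumed)

p_hat = x / n                    # 0.024, the observed fraction defective
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
reject = z < z_crit              # False: H0 not rejected, machine not qualified
```

Since p̂ = 0.024 actually exceeds 0.02, the data give no evidence that the defect rate is below 2%, and under this formulation the machine cannot be qualified.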

Example 83:
A semiconductor manufacturer produces controllers used in automobile engine
applications. The customer requires that the process fallout or fraction defective at a
critical manufacturing step not exceed 0.05 and that the manufacturer demonstrate
process capability at this level of quality using . The semiconductor
manufacturer takes a random sample of 200 devices and finds that four of them are
defective. Test the null hypothesis that the process fallout does not exceed 0.05.

7.5 The Difference Between two Population Means

Hypothesis testing involving the difference between two population means is most
frequently employed to determine whether or not it is reasonable to conclude that the
two are unequal. In such cases, one or other of the following hypotheses is tested:

H0: μ1 − μ2 = 0 versus H1: μ1 − μ2 ≠ 0,
H0: μ1 − μ2 ≥ 0 versus H1: μ1 − μ2 < 0, or
H0: μ1 − μ2 ≤ 0 versus H1: μ1 − μ2 > 0.
Hypothesis Tests for a Difference in Means, Variances Known


The test statistic is

Z = [(x̄1 − x̄2) − (μ1 − μ2)] / √(σ1²/n1 + σ2²/n2)
Example 84:
In a large hospital for the treatment of the mentally retarded, a sample of 12 individuals
with mongolism yielded a mean serum uric acid value of . In a general
hospital, a sample of 15 normal individuals of the same age and sex were found to have
a mean value of . If it is reasonable to assume that the two populations
of values are normally distributed with variances equal to 1. Do these data provide
sufficient evidence to indicate a difference in mean serum uric acid levels between
normal individuals and individuals with mongolism? Let .

Solution:
H0: μ1 − μ2 = 0 against H1: μ1 − μ2 ≠ 0

z = 2.84

Reject H0 since 2.84 > 1.96; on the basis of these data, there is an indication that the means are not equal.
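The z statistic in Example 84 can be reproduced with a short script. The sample means are not legible in these notes, so the values 4.5 and 3.4 mg/100 ml used below are illustrative assumptions chosen to be consistent with the reported z = 2.84.

```python
import math

# Example 84 check. x1_bar = 4.5 and x2_bar = 3.4 mg/100 ml are ASSUMED
# illustrative values (the originals are not legible in these notes).
n1, n2 = 12, 15
var1 = var2 = 1.0          # both population variances are given as 1
x1_bar, x2_bar = 4.5, 3.4  # assumed sample means
z = (x1_bar - x2_bar) / math.sqrt(var1 / n1 + var2 / n2)
print(round(z, 2))         # z ~ 2.84 > 1.96, so reject H0
```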

Example 85:

A product developer is interested in reducing the drying time of a primer paint. Two
formulations of the paint are tested; formulation 1 is the standard chemistry, and
formulation 2 has a new drying ingredient that should reduce the drying time. From
experience, it is known that the standard deviation of drying time is 8 minutes, and this
inherent variability should be unaffected by the addition of the new ingredient. Ten
specimens are painted with formulation 1, and another 10 specimens are painted with
formulation 2; the 20 specimens are painted in random order. The two sample average
drying times are minutes and minutes, respectively. What conclusions
can the product developer draw about the effectiveness of the new ingredient, using
?

Solution:
H0: μ1 = μ2 against H1: μ1 > μ2 (the new ingredient should reduce the drying time)

z = 2.52

Conclusion: Reject H0; the new ingredient appears to reduce the drying time.

Exercises
i. Two machines are used for filling plastic bottles with a net volume of 16.0 ounces.
The fill volume can be assumed normal, with standard deviation and
ounces. A member of the quality engineering staff suspects that both
machines fill to the same mean net volume, whether or not this volume is 16.0 ounces.
A random sample of 10 bottles is taken from the output of each machine.

Machine 1 Machine 2
16.03 16.01 16.02 16.03
16.04 15.96 15.97 16.04
16.05 15.98 15.96 16.02
16.05 16.02 16.01 16.01
16.02 15.99 15.99 16.00

Do you think the engineer is correct? Use .
ii. Two different formulations of an oxygenated motor fuel are being tested to study
their road octane numbers. The variance of road octane number for formulation 1 is
, and for formulation 2 it is . Two random samples of size and
are tested, and the mean road octane numbers observed are and
. Assume normality, and if formulation 2 produces a higher road octane
number than formulation 1, the manufacturer would like to detect it. Formulate and test
an appropriate hypothesis, using .

Hypothesis Tests for a Difference in Means, Variances Unknown but Assumed Equal
We now extend the results of the previous lecture to the difference in means of the two distributions when the variances of both distributions, σ1² and σ2², are unknown. If the sample sizes n1 and n2 exceed 30, the normal distribution procedures could be used. However, when small samples are taken, we will assume that the populations are normally distributed and base our hypothesis tests on the t distribution. This nicely parallels the case of inference on the mean of a single sample with unknown variance. The normality assumption is required to develop the test procedure, but moderate departures from normality do not adversely affect the procedure. Two different situations must be treated. In the first case, we assume that the variances of the two normal distributions are unknown but equal; that is, σ1² = σ2² = σ². In the second, we assume that σ1² and σ2² are unknown and not necessarily equal.
The test statistic is

t = (x̄1 − x̄2 − Δ0) / [sp √(1/n1 + 1/n2)]

with n1 + n2 − 2 degrees of freedom, where Δ0 is the hypothesized difference, usually zero.

The two sample variances are combined to form an estimator of σ². The pooled estimator of σ² is defined as follows:

sp² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)

Example 86:
The diameter of steel rods manufactured on two different extrusion machines is being
investigated. Two random samples of sizes and are selected, and the
sample means and sample variances are , , , and ,
respectively. Assume that and that the data are drawn from a normal
distribution. Is there evidence to support the claim that the two machines produce rods
with different mean diameters? Use in arriving at this conclusion.

Example 87:
Two catalysts are being analyzed to determine how they affect the mean yield of a
chemical process. Specifically, catalyst 1 is currently in use, but catalyst 2 is acceptable.
Since catalyst 2 is cheaper, it should be adopted, providing it does not change the process
yield. A test is run in the pilot plant and results in the data shown in the following table.
Is there any difference between the mean yields? Use , and assume equal
variances.

Observation Number   Catalyst 1   Catalyst 2
1 91.50 89.19
2 94.18 90.95
3 92.18 90.46
4 95.39 93.21
5 91.79 97.19
6 89.07 97.04
7 94.72 91.07
8 89.21 92.75
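A numeric check of Example 87 using the pooled-variance t test (the stated significance level is not legible in these notes; α = 0.05 is assumed).

```python
import math

# Example 87: pooled-variance t test for the two catalysts (alpha = 0.05 assumed)
cat1 = [91.50, 94.18, 92.18, 95.39, 91.79, 89.07, 94.72, 89.21]
cat2 = [89.19, 90.95, 90.46, 93.21, 97.19, 97.04, 91.07, 92.75]

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((x - m) ** 2 for x in v) / (len(v) - 1)  # sample variance

n1, n2 = len(cat1), len(cat2)
sp2 = ((n1 - 1) * var(cat1) + (n2 - 1) * var(cat2)) / (n1 + n2 - 2)
sp = math.sqrt(sp2)
t = (mean(cat1) - mean(cat2)) / (sp * math.sqrt(1 / n1 + 1 / n2))
print(round(sp, 2), round(t, 2))   # sp ~ 2.70, t ~ -0.35
# |t| = 0.35 < t(0.025, 14) = 2.145, so there is no evidence of a
# difference between the mean yields of the two catalysts.
```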

Example 88:
Serum amylase determinations were made on a sample of 15 apparently normal subjects.
The sample yielded a mean of 96 units/100ml and a standard deviation 35 units/100ml.
Serum amylase determinations were also made on 22 hospitalized subjects. The mean
and standard deviation from this second group are 120 and 40 units/100ml.,
respectively. Would we be justified in concluding that the two population means are different? Let .
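Example 88 can be worked through numerically. This is a sketch; α = 0.05 is assumed, since the stated level is not legible in these notes.

```python
import math

# Example 88: serum amylase, pooled-variance t test (alpha = 0.05 assumed)
n1, x1_bar, s1 = 15, 96.0, 35.0    # apparently normal subjects
n2, x2_bar, s2 = 22, 120.0, 40.0   # hospitalized subjects

# pooled estimate of the common variance
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
df = n1 + n2 - 2
t = (x1_bar - x2_bar) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
print(round(sp2), df, round(t, 2))   # sp2 = 1450, df = 35, t ~ -1.88
# |t| = 1.88 < t(0.025, 35) ~ 2.03, so H0 is not rejected at the 5% level:
# the data do not justify concluding that the population means differ.
```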

Hypothesis Tests for a Difference Between Two Population Proportions


Suppose that two independent random samples of sizes n1 and n2 are taken from two populations, and let X1 and X2 represent the number of observations that belong to the class of interest in samples 1 and 2, respectively. Furthermore, suppose that the normal approximation to the binomial is applied to each population, so the estimators of the population proportions, P̂1 = X1/n1 and P̂2 = X2/n2, have approximate normal distributions.
The test statistic

Z = [(P̂1 − P̂2) − (p1 − p2)] / √[ p1(1 − p1)/n1 + p2(1 − p2)/n2 ]

is distributed approximately as standard normal and is the basis of a test for H0: p1 = p2. If the null hypothesis is true, then using the fact that p1 = p2 = p, the random variable

Z = (P̂1 − P̂2) / √[ p(1 − p)(1/n1 + 1/n2) ]

is distributed approximately N(0, 1). An estimator of the common parameter p is

p̂ = (X1 + X2) / (n1 + n2)

The test statistic for H0: p1 = p2 is then

z = (p̂1 − p̂2) / √[ p̂(1 − p̂)(1/n1 + 1/n2) ]

Example 89:
A random sample of 500 adult residents of Maricopa County found that 385 were in
favor of increasing the highway speed limit to 75 mph, while another sample of 400 adult
residents of Pima County found that 267 were in favor of the increased speed limit. Do
these data indicate that there is a difference in the support for increasing the speed limit
between the residents of the two counties? Use .

Example 90:
Out of a sample of 150, selected from patients admitted over a two-year period to a large
hospital, 129 had some type of hospitalization insurance. In a sample of 160 similarly
selected patients from a second hospital, 144 had some type of hospitalization insurance.
Test the null hypothesis that . Let .
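A sketch of the two-proportion z test applied to the numbers in Example 89 (α = 0.05 assumed, since the stated level is not legible in these notes).

```python
import math

# Example 89: support for the higher speed limit in two counties
n1, x1 = 500, 385    # Maricopa County
n2, x2 = 400, 267    # Pima County
p1_hat, p2_hat = x1 / n1, x2 / n2        # 0.77 and 0.6675
p_pool = (x1 + x2) / (n1 + n2)           # pooled estimate of the common p
z = (p1_hat - p2_hat) / math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
print(round(z, 2))                       # z ~ 3.42
# |z| = 3.42 > 1.96, so at the 5% level (assumed) there is a difference
# in support between the residents of the two counties.
```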

CHAPTER 8

Learning Objectives

Having worked through this chapter the student will be able to:
● Use simple linear regression for building empirical models of engineering and
scientific data.
● Understand how the method of least squares is used to estimate the parameters
in a linear regression model.
● Test statistical hypotheses and construct confidence intervals on regression model
parameters.
● Use the regression model to make a prediction of a future observation and
construct an appropriate prediction interval on the future observation.
● Apply the correlation model.

8.0 Regression and Correlation Analysis


8.1 Introduction
Many problems in engineering and science involve exploring the relationships between
two or more variables. Regression analysis is a statistical technique that is very useful
for these types of problems. For example, in a chemical process, suppose that the yield
of the product is related to the process-operating temperature. Regression analysis can
be used to build a model to predict yield at a given temperature level. This model can
also be used for process optimization, such as finding the level of temperature that
maximizes yield, or for process control purposes. Other examples are studying the relationship between blood pressure and age, or between the concentration of an injected drug and heart rate.
Regression analysis is concerned with the study of the dependence of one variable, the dependent variable, on one or more other variables, the independent or explanatory variables, with a view to estimating and predicting the (population) mean or average of the former (dependent) in terms of the known or fixed (in repeated sampling) values of the latter (independent).
Very often in practice, a relationship is found to exist between two (or more) variables
and one wishes to express this relationship in mathematical form by determining an
equation connecting the variables.
Correlation analysis, on the other hand, is concerned with measuring the strength of the
relationship between variables. When we compute measures of correlation from a set of
data, we are interested in the degree of the correlation between variables.

8.2 The Regression Model


In the typical regression problem, the researcher has available for analysis a sample of
observations from some real or hypothetical population. Based on the result of his
analysis of the sample data, he is interested in reaching decisions about the population

from which the sample is presumed to have been drawn. It is important that the
researcher understand the nature of the population in which he is interested.
In the simple linear regression model two variables X and Y, are of interest. The variable
X is usually referred to as the independent variable, while the other variable, Y is called
the dependent variable; and we speak of the regression of Y on X.
The following are the assumptions underlying the simple linear regression model.
i. Values of the independent variable X are fixed.
ii. The variable X is measured without error.
iii. The values of Y are normally distributed.
iv. The Y values are statistically independent.
v. The variances of the subpopulations of Y are all equal.
vi. The means of the subpopulations of Y all lie on the same straight line; this is the assumption of linearity:

μ(Y|x) = α + βx                                            (1.0)

These assumptions may be summarized by means of the following equation, which is called the simple linear regression model because it has only one independent variable or regressor:

Y = α + βx + e                                             (1.1)

where β and α (slope and intercept) are called the population regression coefficients, and e is the error term with mean zero and variance σ². The random errors corresponding to different observations are also assumed to be uncorrelated random variables.
The results of n observations of the set of random variables X and Y can be summarized
by drawing a scatter diagram. A straight line passing closely to the points may be drawn.
The main problem arises when the points do not all lie exactly on the straight line, but
simply form a cloud of points around it. Thus, it may be possible by guess work to draw
quite a number of lines each of which will appear to be able to explain the relationship
between X and Y. We shall consider finding a best fit line. Such a line will then be used
as a model relating the random variable Y with the random variable X.
Suppose that we have n pairs of observations (x1, y1), (x2, y2), …, (xn, yn). A scatter plot of the observed data suggests a candidate for the estimated regression line.
The estimates of α and β should result in a line that is (in some sense) a "best fit" to the data. The German scientist Karl Gauss proposed estimating the parameters α and β in Equation 1.1 so as to minimize the sum of the squares of the vertical deviations of the points from the fitted line. We call this criterion for estimating the regression coefficients the method of least squares. Using Equation 1.1, we may express the n observations in the sample as

yi = α + βxi + ei,   i = 1, 2, …, n

and the sum of the squares of the deviations of the observations from the true regression line is
SSE = Σ ei² = Σ (yi − α − βxi)²                             (1.2)

The least squares estimators of α and β, say a and b, must satisfy

∂SSE/∂α = −2 Σ (yi − a − bxi) = 0
∂SSE/∂β = −2 Σ (yi − a − bxi) xi = 0                        (1.3)

Simplifying these two equations yields

na + b Σ xi = Σ yi
a Σ xi + b Σ xi² = Σ xiyi                                   (1.4)

Equations 1.4 are called the least squares normal equations. The solution to the normal equations results in the least squares estimators a and b.


The least squares estimates of the intercept and slope in the simple linear regression model are

a = ȳ − b x̄                                                (1.5)

b = [Σ xiyi − (Σ xi)(Σ yi)/n] / [Σ xi² − (Σ xi)²/n]         (1.6)

Equation 1.6 can also be written as

b = Sxy / Sxx

The estimated regression line is therefore

ŷ = a + bx

Alternatively, ŷ = ȳ + b(x − x̄).

8.3 Method of Least Squares

We shall now find a and b, the estimates of α and β, so that the sum of the squares of the residuals is a minimum. The residual sum of squares is often called the Sum of Squares of Errors (SSE) about the regression line. This minimisation procedure for estimating the parameters is called the "method of least squares". Hence we shall find a and b so as to minimise

SSE = Σ (yi − a − bxi)²

Differentiating SSE with respect to a and b, we have

∂SSE/∂a = −2 Σ (yi − a − bxi)
∂SSE/∂b = −2 Σ xi(yi − a − bxi)

Setting the partial derivatives equal to zero and rearranging the terms, we obtain the equations (called the normal equations)

na + b Σ xi = Σ yi           (1)
a Σ xi + b Σ xi² = Σ xiyi    (2)

Solving for a and b from (1) and (2):

b = [n Σ xiyi − (Σ xi)(Σ yi)] / [n Σ xi² − (Σ xi)²],   a = ȳ − b x̄

Equations (1) and (2) can also be solved using matrices as:

[ n     Σxi  ] [ a ]   [ Σyi   ]
[ Σxi   Σxi² ] [ b ] = [ Σxiyi ]

Example 91:

Assuming we have the following quantities: n = 8, Σx = 140, Σy = 382, Σxy = 3870, and Σx² = 3500, find the least squares regression line of y on x.

Solution:
Substituting into the normal equations:

8a + 140b = 382
140a + 3500b = 3870

Solving gives a = 94.67 and b = −2.68.

Thus, the estimated regression line is

ŷ = 94.67 − 2.68x

It may be noted that the least-squares line passes through the point (x̄, ȳ), called the 'centroid' or centre of gravity of the data. The slope b of the regression line is independent of the origin of coordinates; it is therefore said that b is invariant under translation of axes.
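The computation in Example 91 can be verified with a few lines of code, using the closed-form least squares estimates.

```python
# Example 91 check: least squares estimates from the summary quantities
n, Sx, Sy, Sxy, Sxx = 8, 140, 382, 3870, 3500

b = (n * Sxy - Sx * Sy) / (n * Sxx - Sx**2)  # slope
a = Sy / n - b * Sx / n                      # intercept: a = y_bar - b * x_bar
print(round(a, 2), round(b, 2))              # 94.67 and -2.68
```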

Besides assuming that the regression of Y on X is a linear function having the form E(Y|X) = α + βx, we have made three further assumptions, which may be summarised as follows:

1. Normality: We have assumed that each variable yi has a normal distribution.
2. Independence: We have assumed that the variables y1, …, yn are independent.
3. Homoscedasticity: We have assumed that the variables y1, …, yn have the same variance σ². This assumption is called the assumption of homoscedasticity. In general, it is said that random variables having the same variance are homoscedastic, and random variables having different variances are heteroscedastic.

Example 92:
A team of professional mental health workers in a long-stay psychiatric hospital wished
to measure the level of response of withdrawn patients to a program of remotivation
therapy. A standard test was available for this purpose, but it was expensive and time-
consuming to administer. To overcome this obstacle, the team developed a test that was
much easier to administer. To test the usefulness of the new instrument for measuring
the level of patient response, the team decided to examine the relationship between
scores made on the new test and scores made on the standardized test. The objective was
to use the new test if it could be shown that it was a good predictor of a patient’s score
on the standardized test. The results are shown in the table below:

Patients’ Scores on Standardized Test
and New Test
Patient Number   Score on New Test (X)   Score on Standardized Test (Y)
1 50 61
2 55 61
3 60 59
4 65 71
5 70 80
6 75 76
7 80 90
8 85 106
9 90 98
10 95 100
11 100 114

Obtain the estimates of the regression coefficients.
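A short script for Example 92, applying the least squares formulas to the tabulated scores.

```python
# Example 92: regression of standardized-test score (Y) on new-test score (X)
x = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
y = [61, 61, 59, 71, 80, 76, 90, 106, 98, 100, 114]
n = len(x)

Sx, Sy = sum(x), sum(y)
Sxy = sum(xi * yi for xi, yi in zip(x, y))
Sxx = sum(xi * xi for xi in x)

b = (n * Sxy - Sx * Sy) / (n * Sxx - Sx**2)   # slope
a = Sy / n - b * Sx / n                       # intercept
print(round(a, 2), round(b, 4))               # -1.0 and 1.1236
# Estimated line: y_hat = -1.00 + 1.1236 x
```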

Example 93:
A research on “Near Surface Characteristics of Concrete: Intrinsic Permeability”,
presented data on compressive strength x and intrinsic permeability y of various

concrete mixes and cures. Summary quantities are , , ,

, , and . Assume that the two variables are


related according to the simple linear regression model.
(a) Calculate the least squares estimates of the slope and intercept.
(b) Use the equation of the fitted line to predict what permeability would be observed
when the compressive strength is .
(c) Give a point estimate of the mean permeability when compressive strength is
.

Example 94:
The following data were obtained from a study investigating the relationship between
noise exposure and hypertension.
Y:  1   0   1   2   5   1   4   6   2   3   5   4   6   8   4   5   7   9   7   6
X: 60  63  65  70  70  70  80  90  80  80  85  89  90  90  90  90  94 100 100 100

i. Fit the simple linear regression model using least squares.


ii. Find the predicted mean rise in blood pressure level associated with a sound
pressure level of 85 decibels.
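A sketch for Example 94. The X row of the data table is split across two lines in these notes; the values below pair each tens digit with the units digit beneath it (e.g. 6/0 → 60).

```python
# Example 94: blood-pressure rise (y) vs sound pressure level (x)
x = [60, 63, 65, 70, 70, 70, 80, 90, 80, 80,
     85, 89, 90, 90, 90, 90, 94, 100, 100, 100]
y = [1, 0, 1, 2, 5, 1, 4, 6, 2, 3, 5, 4, 6, 8, 4, 5, 7, 9, 7, 6]
n = len(x)

Sx, Sy = sum(x), sum(y)
Sxy = sum(xi * yi for xi, yi in zip(x, y))
Sxx = sum(xi * xi for xi in x)

b = (n * Sxy - Sx * Sy) / (n * Sxx - Sx**2)   # slope
a = Sy / n - b * Sx / n                       # intercept
pred_85 = a + b * 85                          # part (ii): predicted mean rise at 85 dB
print(round(a, 2), round(b, 4), round(pred_85, 2))
# fitted line y_hat = -10.13 + 0.1743 x; predicted rise at 85 dB ~ 4.68
```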

8.5 Correlation Analysis
Closely related but conceptually very much different from regression analysis is
correlation analysis, where the primary objective is to measure the strength or degree of
linear association between two variables. The correlation coefficient measures this
strength of (linear) association. For example, we may be interested in finding the
correlation between smoking and lung cancer; between scores on mathematics and fluid
mechanics examinations, between high school grades and college grades etc. In
regression analysis, as already noted, we are not primarily interested in such a measure.
Instead, we try to estimate the average value of one variable on the basis of the fixed
values of another variable.
The population correlation coefficient between two random variables X and Y is defined as

ρ = Cov(X, Y) / (σX σY)

where Cov(X, Y) is the covariance between the variables X and Y, and σX and σY are the standard deviations of X and Y respectively. It is possible to draw inferences about the correlation coefficient using its estimator, the sample correlation coefficient r. "r" is the correlation coefficient between "n" pairs of observations whose values are (Xi, Yi) and is given by

r = Σ(xi − x̄)(yi − ȳ) / √[ Σ(xi − x̄)² · Σ(yi − ȳ)² ] = Sxy / √(Sxx · Syy)
Properties of r:
1. It is symmetrical in nature (the two variables are treated symmetrically). That
is, there is no distinction between the dependent and independent variables.
2. Both variables are assumed to be random.
3. It can be positive or negative, the sign depending on the sign of the term in
the numerator which measures the sample co variation of the two variables.
4. It lies between the limits of −1 and +1; that is, −1 ≤ r ≤ +1.
5. If X and Y are independent, the correlation coefficient between them is zero
but if r=0 it does not mean that the two variables are independent.

Fig. 6.2 Scatter plots with various r values

Testing hypotheses about the correlation coefficient

A test of the special hypothesis ρ = 0 versus an appropriate alternative is equivalent to testing β = 0 for the simple linear regression model. In doing this, the t-distribution with n − 2 degrees of freedom is used. The test statistic is

t = b / (s/√Sxx)

which can also be written as

t = r √(n − 2) / √(1 − r²)

to test H0: ρ = 0 against an appropriate alternative.

Example 95:
Using the following data, test the hypothesis that there is no linear correlation among
the variables that generated them, at the 5% level of significance (n = 29):
SSxx = 0.11273,  SSyy = 11,807,324,786,  SSxy = 34,422.75972

Solution:

H0: ρ = 0
H1: ρ ≠ 0
α = 0.05, df = n − 2 = 27
Critical region: t < −2.052 and t > 2.052

r = SSxy / √(SSxx × SSyy) = 34,422.75972 / √(0.11273 × 11,807,324,786) = 0.9435

t = r √(n − 2) / √(1 − r²) = 0.9435 × √27 / √(1 − 0.8902) = 14.79,  with P < 0.0001

Decision:

Since t > t0.025(27), reject the hypothesis of no linear correlation.
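The blanked computations in Example 95 can be reproduced numerically; n = 29 is implied by the 27 degrees of freedom behind the stated critical value 2.052.

```python
import math

# Example 95 check (n = 29 inferred from df = 27 in the critical region)
SSxx, SSyy, SSxy = 0.11273, 11_807_324_786, 34_422.75972
n = 29

r = SSxy / math.sqrt(SSxx * SSyy)                  # sample correlation
t = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)     # test statistic for rho = 0
print(round(r, 4), round(t, 1))                    # 0.9435 and about 14.8
# t is far beyond 2.052, so the no-correlation hypothesis is rejected.
```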


More generally, if X and Y follow the bivariate normal distribution, it can be shown that the quantity

Zf = (1/2) ln[ (1 + r) / (1 − r) ]

is a random variable that follows approximately the normal distribution with mean μf = (1/2) ln[ (1 + ρ) / (1 − ρ) ] and variance equal to 1/(n − 3). The procedure is therefore to compute

z = √(n − 3) (Zf − μf)

and compare it with the critical points of the standard normal distribution.

Example 96:
Consider the immediate preceding example data, test the null hypothesis that ρ = 0.9
against the alternative that ρ > 0.9 at 5% level of significance.

Solution:
H0: ρ = 0.9
H1: ρ > 0.9
Critical region: Z > 1.645

Using r = 0.9435 from the preceding example, with n = 29:

z = √26 × [ (1/2) ln(1.9435/0.0565) − (1/2) ln(1.9/0.1) ] = 1.51

Decision: Since Z < Z0.05, we fail to reject H0; there is no evidence that the correlation coefficient is greater than 0.9.

In ordinary usage of this method, it is not necessary to evaluate the formula for Zf directly: tables of Fisher-Z values for r between 0.0 and 0.99 are available. In this case, to test H0: ρ = ρ0 versus H1: ρ ≠ ρ0, we compute

z = √(n − 3) (Zf − μf)

The critical region is Z ≤ −Zα/2 and Z ≥ Zα/2, where Zf and μf are the Fisher-Z values for r and ρ0 respectively.
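A sketch of the Fisher-Z procedure applied to Example 96, carrying over r ≈ 0.9435 and n = 29 from the Example 95 data.

```python
import math

# Example 96 via the Fisher Z transformation (r and n carried over from
# the Example 95 data: r ~ 0.9435, n = 29)
r, rho0, n = 0.9435, 0.9, 29

z_f = math.atanh(r)        # Fisher Z value for r: (1/2) ln[(1+r)/(1-r)]
mu_f = math.atanh(rho0)    # Fisher Z value for rho0
z = math.sqrt(n - 3) * (z_f - mu_f)
print(round(z, 2))         # z ~ 1.51
# z = 1.51 < 1.645, so H0: rho = 0.9 is not rejected at the 5% level.
```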

Example 97:
The following data gave X= the water content of snow on April 1 and Y= the yield from
April to July (in inches) on the Snake River watershed in Wyoming for 17 years.

x y X y
23.1 10.5 37.9 22.8
32.8 16.7 30.5 14.1
31.8 18.2 25.1 12.9
32.0 17.0 12.4 8.8
30.4 16.3 35.1 17.4
24.0 10.5 31.5 14.9
39.5 23.1 21.1 10.5
24.2 12.4 27.6 16.1
52.5 24.9

Estimate the correlation between X and Y.

Example 98:
Two methods of measuring cardiac output were compared in 10 experimental animals
with the following results
Cardiac Output (l/min)
Method I   Method II
   x           y
0.8 0.5
1.0 1.2
1.3 1.1
1.4 1.3
1.5 1.1
1.4 1.8
2.0 1.6
2.4 2.0
2.7 2.4
3.0 2.8

Compute the sample correlation coefficient.
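A numeric answer for Example 98 using the sample correlation formula.

```python
import math

# Example 98: correlation between the two methods of measuring cardiac output
x = [0.8, 1.0, 1.3, 1.4, 1.5, 1.4, 2.0, 2.4, 2.7, 3.0]
y = [0.5, 1.2, 1.1, 1.3, 1.1, 1.8, 1.6, 2.0, 2.4, 2.8]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
Sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
Sxx = sum((xi - x_bar) ** 2 for xi in x)
Syy = sum((yi - y_bar) ** 2 for yi in y)

r = Sxy / math.sqrt(Sxx * Syy)
print(round(r, 2))   # r ~ 0.93: a strong positive linear association
```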

Example 99:
A group of eight athletes ran a 400 metres race twice. The times in seconds were recorded
as follows for each athlete.

Runner
1st Trial 2nd Trial
x Y
48.4 48.0
51.2 54.3
48.6 49.4
49.5 48.4
51.6 54.0
49.3 47.2
50.8 51.8
49.7 50.3

Calculate the correlation coefficient between these two trials.

