Probability Theory

Lecture 1

Descriptive Statistics
Population & Sample
Descriptive measures quartiles
Percentiles & box plots
Lecture 1
● Statistics
● Descriptive Statistics
● Statistical Inference
● Population vs Sample
● Frequency Distributions
● Cumulative Distributions
● Sample Mean
● Sample Median
● Deviations from Mean
● Variance
● Standard Deviation
● Quartiles
● Percentiles
● Box Plots
Statistics

What is statistics?
Statistics is the study and manipulation of data, including ways to gather, review, analyze,
and draw conclusions from data.

Why study statistics?

Answers provided by statistical analysis can provide the basis for making better decisions
and choices of actions. Statistical reasoning and methods can help you become efficient at
obtaining information and making useful conclusions.
Descriptive Statistics

Interest in the field grew in the 18th century.

At first, descriptive statistics consisted merely of the presentation of data in tables and charts.

Nowadays, it includes the summarization of data by means of numerical descriptions and graphs.
Statistical Inference

Statistical inference is concerned with generalizations based on sample data.

When making a statistical inference, always proceed with caution.

One must decide carefully how far to go in generalizing from a given set of data.

Careful consideration must be given to determining whether such generalizations are reasonable and whether it might be wise to collect more data.
Population Vs Sample

Population
● A population is the collection of all items of interest to our study.
● A measurable characteristic of the population, such as the mean or standard deviation, is called a parameter.
● A survey of an entire population is accurate and precise, with no margin of error except human inaccuracy in responses. However, surveying everyone is not always possible.
● Example: all the students in the class form the population.

Sample
● A sample is a subset of the population. It should be representative of the population.
● A measurable characteristic of the sample is called a statistic.
● A survey of a sample gives accurate results only after factoring in the margin of error and the confidence interval.
● Example: the students who regularly attend class form a sample.
Frequency Distributions

A frequency distribution is a table that divides a set of data into a suitable number
of classes (categories), showing also the number of items belonging to each class.

The table sacrifices some of the information contained in the data.

Instead of knowing the exact value of each item, we only know that it belongs to a
certain class.
Example
Data

245 333 296 304 276 336 289 234 253 292 366 323 309 284 310 338 297 314 305 330 266 391 315 305 290 300
292 311 272 312 315 355 346 337 303 265 278 276 373 271 308 276 364 390 298 290 308 221 274 343

Class limits    Frequency
(205,245]       3
(245,285]       11
(285,325]       23
(325,365]       9
(365,405]       4
Total           50

Note that the class limits are given to as many decimal places as the original data. Had the original data been given to one
decimal place, we would have used the class limits 205.1–245.0, 245.1–285.0, …, 365.1–405.0.
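
The tally can be checked with a few lines of Python. This is a minimal sketch, assuming half-open classes (lower, upper] with the edges 205, 245, ..., 405; the names `edges`, `class_of`, and `freq` are illustrative only.

```python
from collections import Counter

# The 50 observations from the example.
data = [
    245, 333, 296, 304, 276, 336, 289, 234, 253, 292, 366, 323, 309,
    284, 310, 338, 297, 314, 305, 330, 266, 391, 315, 305, 290, 300,
    292, 311, 272, 312, 315, 355, 346, 337, 303, 265, 278, 276, 373,
    271, 308, 276, 364, 390, 298, 290, 308, 221, 274, 343,
]

# Assumed class edges: each class is the half-open interval (lo, hi].
edges = [205, 245, 285, 325, 365, 405]


def class_of(x):
    """Return the (lo, hi] class containing x, or None if x is out of range."""
    for lo, hi in zip(edges, edges[1:]):
        if lo < x <= hi:
            return (lo, hi)
    return None


freq = Counter(class_of(x) for x in data)

# Print the frequency and the running (cumulative) total for each class.
cumulative = 0
for lo, hi in zip(edges, edges[1:]):
    cumulative += freq[(lo, hi)]
    print(f"({lo},{hi}]  frequency={freq[(lo, hi)]:2d}  cumulative={cumulative:2d}")
```

The running total in the last column is the "less than or equal to" cumulative distribution.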
Class Mark and Class Interval

Class Mark: The class marks of a frequency distribution are obtained by averaging
successive class boundaries, for example (205 + 245)/2 = 225.

Class Interval: If the classes of a distribution are all of equal length, then
subtracting the lower limit of a class from its upper limit gives the class interval,
for example 245 − 205 = 40.

Class mark: 225, 265, 305, 345, 385

Class Interval: 40
Cumulative Distribution (less-than-or-equal-to variant)

Interval     Cumulative Frequency
(205,245]    3
(245,285]    14
(285,325]    37
(325,365]    46
(365,405]    50
Descriptive Measures: Sample Mean

Given n measurements (data points) x1, x2, ..., xn:

Mean: the sum of the observations divided by the sample size,

\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]

Descriptive Measures: Sample Median
When is the median preferred over the mean?

Sometimes it is preferable to use the sample median as a descriptive measure of the center, or location, of a set of data.

This is particularly true if it is desired to minimize the calculations, or to eliminate the effect of extreme (very large or very small) values.
Question

A sample of six university students responded to the question “How much time, in
minutes, did you spend on the social network site yesterday?”

100 45 60 130 30 35

Find the mean and median.

Mean: (100 + 45 + 60 + 130 + 30 + 35)/6 = 400/6 ≈ 66.67

Median: the ordered values are 30 35 45 60 100 130, so the median is (45 + 60)/2 = 52.5
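
As a quick check, the same two measures can be computed with Python's standard `statistics` module. This is only a sketch; the variable names are illustrative.

```python
from statistics import mean, median

# Minutes reported by the six students.
times = [100, 45, 60, 130, 30, 35]

sample_mean = mean(times)      # sum of observations / sample size
sample_median = median(times)  # average of the two middle ordered values (n is even)

print(round(sample_mean, 2), sample_median)
```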
Descriptive Measures: Deviations from Mean

Data: 1 2 3 4 5 Mean 3

Data: -7 -3 3 10 12 Mean 3

We observe that the dispersion of a set of data is small if the values are closely
bunched about their mean, and that it is large if the values are scattered widely
about their mean.

It would seem reasonable, therefore, to measure the variation of a set of data in terms of the amounts by which the values deviate from their mean.
Descriptive Measures: Deviations from Mean

The sum of the deviations about mean is always zero.

Because the deviations sum to zero, we need to remove their signs. Absolute
value and square are two natural choices.

If we take their absolute value, so each negative deviation is treated as positive, we would obtain a measure of variation.

However, to obtain the most common measure of variation, we square each deviation.
Descriptive Measures: Variance

The sample variance of n observations x1, x2, ..., xn is

\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]

The reason for dividing by n − 1 instead of n is that there are only n − 1 independent deviations xi − x̄.

Because their sum is always zero, the value of any particular one is always equal to the negative
of the sum of the other n − 1 deviations.

If many of the deviations are large in magnitude, either positive or negative, their squares will be
large and s^2 will be large. When all the deviations are small, s^2 will be small.
Example

The delay times (handling, setting, and positioning the tools) for cutting 6 parts on
an engine lathe are 0.6, 1.2, 0.9, 1.0, 0.6, and 0.8 minutes. Calculate s^2.
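
A minimal sketch of the calculation in Python, assuming the n − 1 divisor described above. The manual sum of squared deviations is compared against `statistics.variance`, which also uses n − 1; variable names are illustrative.

```python
from statistics import variance

delays = [0.6, 1.2, 0.9, 1.0, 0.6, 0.8]  # minutes

n = len(delays)
x_bar = sum(delays) / n                      # sample mean, 0.85
deviations = [x - x_bar for x in delays]     # these sum to zero (up to rounding)
s2_manual = sum(d ** 2 for d in deviations) / (n - 1)

s2_library = variance(delays)                # sample variance with divisor n - 1

print(round(s2_manual, 4), round(s2_library, 4))
```

The sum of squared deviations is 0.275, so s^2 = 0.275/5 = 0.055 (minute)^2.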
Descriptive Measures: Standard Deviation

Notice that the units of s^2 are not those of the original observations.

In the previous question the data are delay times in minutes, but s^2 has the unit
(minute)^2.

Consequently, we define the standard deviation of n observations x1, x2, ..., xn as the square root of their variance.

The standard deviation is by far the most generally useful measure of variation. Its
advantage over the variance is that it is expressed in the same units as the
observations.
Descriptive Measures: Quartiles

In addition to the median, which divides a set of data into halves, we can consider
other division points.

When an ordered data set is divided into quarters, the resulting division points are
called sample quartiles.

The first quartile, Q1, is a value that has one-fourth, or 25%, of the observations
below its value. The first quartile is also the sample 25th percentile P0.25.
Descriptive Measures: Percentile

More generally, the sample 100p-th percentile is a value such that at least 100p% of the
observations are at or below this value, and at least 100(1 − p)% are at or above
this value.
Question
Given the data

136 143 147 151 158 160 161 163 165 167 173 174 181 181 185 188 190 205

Obtain the quartiles and the 10th percentile.

n = 18

First quartile: 18 × 0.25 = 4.5, which is not an integer, so round up to 5.
Q1 = 5th ordered observation = 158
Number of observations at or below 158 = 5 (at least 4.5 required by the definition)
Number of observations at or above 158 = 14 (at least 13.5 required by the definition)

Second quartile: 18 × 0.5 = 9 is an integer, so we average the 9th and 10th ordered values.
Q2 = (165 + 167)/2 = 166

Third quartile: 18 × 0.75 = 13.5, round up to 14.
Q3 = 14th ordered observation = 181

10th percentile: 18 × 0.10 = 1.8, round up to 2.
P0.10 = 2nd ordered observation = 143
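
The rule used in this solution (round n·p up when it is not an integer, otherwise average the k-th and (k+1)-th ordered values) can be sketched as a small function. The name `sample_percentile` is hypothetical, and the sketch assumes 0 < p < 1. Library routines such as `numpy.percentile` use different interpolation rules by default and may give slightly different values.

```python
import math


def sample_percentile(values, p):
    """Sample 100p-th percentile by the 'round up or average' rule (0 < p < 1)."""
    xs = sorted(values)
    k = len(xs) * p
    if abs(k - round(k)) < 1e-9:
        # n*p is an integer: average the k-th and (k+1)-th ordered values.
        k = int(round(k))
        return (xs[k - 1] + xs[k]) / 2
    # Otherwise round up and take that ordered value (1-based position).
    return xs[math.ceil(k) - 1]


data = [136, 143, 147, 151, 158, 160, 161, 163, 165,
        167, 173, 174, 181, 181, 185, 188, 190, 205]

q1 = sample_percentile(data, 0.25)
q2 = sample_percentile(data, 0.50)
q3 = sample_percentile(data, 0.75)
p10 = sample_percentile(data, 0.10)
print(q1, q2, q3, p10, "IQR =", q3 - q1)
```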
Descriptive Measures: Range & Interquartile Range

The minimum and maximum observations also convey information concerning the
amount of variability present in a set of data. Together, they describe the interval
containing all of the observed values.

range = maximum − minimum

The amount of variation in the middle half of the data is described by the
interquartile range.

interquartile range = third quartile − first quartile = Q3 − Q1


Descriptive Measures: Box Plots
References

R. A. Johnson, I. Miller, J. E. Freund. Probability and Statistics for Engineers. 2000.

D. R. Anderson, D. J. Sweeney, T. A. Williams, J. D. Camm. Statistics for Business & Economics.

J. Devore. Probability and Statistics for Engineering and the Sciences.

Lecture 2
PNC
Probability
Lecture 2

● Experiment
● Sample Space
● Events
● Set Theory
● Disjoint Events
● Permutations & Combinations
● Questions
Experiment

An experiment is any activity or process whose outcome is subject to uncertainty.

Examples:

tossing a coin once or several times

selecting a card or cards from a deck


Sample Space

The sample space of an experiment, denoted by S, is the set of all possible outcomes of that experiment.

Toss a coin → S = {H, T}

Tossing two coins → S = {HH, HT, TH, TT}

Tossing three coins → S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

Tossing n coins → 2^n possible outcomes


Simple and Compound Event

An event is any collection (subset) of outcomes contained in the sample space S.


An event is simple if it consists of exactly one outcome and compound if it consists
of more than one outcome.

Coin tossed twice

Simple Event → HH

Compound Event → {HT,TH}


Set Theory

An event is just a set, so relationships and results from elementary set theory can
be used to study events.
Mutually Exclusive or Disjoint Events

Events A and B are mutually exclusive (disjoint) if they have no outcomes in common, so that the intersection of A and B contains no outcomes.
Exercise: De Morgan’s Law

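Prove the following (De Morgan's laws, where A' denotes the complement of A):

\[ (A \cup B)' = A' \cap B', \qquad (A \cap B)' = A' \cup B' \]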


Properties

Given an experiment and a sample space S, the objective of probability is to assign to each event A a number P(A), called the probability of the event A, which will give a precise measure of the chance that A will occur.
Properties
Permutation and Combination (nCk)

A permutation is an ordered arrangement of items (the order matters), whereas a combination
is an unordered selection of items (the order does not matter). The number of combinations
of k items chosen from n is written nCk.
Question
The computers of six faculty members in a certain department are to be replaced. Two
of the faculty members have selected laptop machines and the other four have chosen
desktop machines.

Suppose that only two of the setups can be done on a particular day, and the two
computers to be set up are randomly selected from the six (implying 15 equally likely
outcomes; if the computers are numbered 1, 2, . . . , 6, then one outcome consists of
computers 1 and 2, another consists of computers 1 and 3, and so on).

a. What is the probability that both selected setups are for laptop computers?
2C2/15 = 1/15

b. What is the probability that both selected setups are desktop machines?
4C2/15 = 6/15

c. What is the probability that at least one selected setup is for a desktop computer?
(15 − 1)/15 = 14/15 (the complement of part a)

d. What is the probability that at least one computer of each type is chosen for setup?
(2 × 4)/15 = 8/15
Propositions

A homeowner doing some remodeling requires the services of both a plumbing contractor
and an electrical contractor. If there are 12 plumbing contractors and 9 electrical contractors
available in the area, in how many ways can the contractors be chosen? 12 × 9 = 108
Question

A production facility employs 20 workers on the day shift, 15 workers on the swing
shift, and 10 workers on the graveyard shift. A quality control consultant is to
select 6 of these workers for in-depth interviews.

Suppose the selection is made in such a way that any particular group of 6
workers has the same chance of being selected as does any other group (drawing
6 slips without replacement from among 45).

a. How many selections result in all 6 workers coming from the day shift? What is
the probability that all 6 selected workers will be from the day shift?

Number of selections: 20C6 = 38760
Probability: 20C6 / 45C6 ≈ 0.0048

b. What is the probability that all 6 selected workers will be from the same shift?

(20C6 + 15C6 + 10C6) / 45C6 ≈ 0.0054

c. What is the probability that at least two different shifts will be represented
among the selected workers?

This is the complement of part b:
1 − (20C6 + 15C6 + 10C6) / 45C6 ≈ 0.9946

d. What is the probability that at least one of the shifts will be unrepresented in the
sample of workers?

A1 = Day shift workers unrepresented
P(A1) = 25C6 / 45C6

A2 = Swing shift workers unrepresented
P(A2) = 30C6 / 45C6

A3 = Graveyard shift workers unrepresented
P(A3) = 35C6 / 45C6

P(A1 ∩ A2) = 10C6 / 45C6
P(A1 ∩ A3) = 15C6 / 45C6
P(A2 ∩ A3) = 20C6 / 45C6
P(A1 ∩ A2 ∩ A3) = 0

By inclusion-exclusion,
P(A1 ∪ A2 ∪ A3) = (25C6 + 30C6 + 35C6 − 10C6 − 15C6 − 20C6) / 45C6 ≈ 0.2885
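
The four parts of the shift question reduce to ratios of binomial coefficients, which can be evaluated with `math.comb`. This is a sketch under the equally-likely-groups assumption stated in the question; the variable names are illustrative. Part (d) uses inclusion-exclusion over the events A1, A2, A3, including the pairwise intersections A1 ∩ A3 (only swing-shift workers chosen) and A2 ∩ A3 (only day-shift workers chosen).

```python
from math import comb

day, swing, graveyard = 20, 15, 10
k = 6
total = comb(day + swing + graveyard, k)   # 45C6 equally likely groups

# (a) all six from the day shift
p_a = comb(day, k) / total

# (b) all six from the same shift (three disjoint cases)
p_b = (comb(day, k) + comb(swing, k) + comb(graveyard, k)) / total

# (c) at least two shifts represented: complement of (b)
p_c = 1 - p_b

# (d) at least one shift unrepresented: inclusion-exclusion on A1, A2, A3
singles = comb(swing + graveyard, k) + comb(day + graveyard, k) + comb(day + swing, k)
pairs = comb(graveyard, k) + comb(swing, k) + comb(day, k)
p_d = (singles - pairs) / total            # the triple intersection is empty

print(f"a={p_a:.4f}  b={p_b:.4f}  c={p_c:.4f}  d={p_d:.4f}")
```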
Question
An academic department with five faculty members (Anderson, Box, Cox, Cramer, and
Fisher) must select two of its members to serve on a personnel review committee. Because
the work will be time-consuming, no one is anxious to serve, so it is decided that the
representatives will be selected by putting the names on identical pieces of paper and then
randomly selecting two.

a. What is the probability that both Anderson and Box will be selected?

0.1 (1 of the 10 equally likely pairs)

b. What is the probability that at least one of the two members whose name begins with C is
selected?

0.7 (7 of the 10 pairs include Cox or Cramer)

c. If the five faculty members have taught for 3, 6, 7, 10, and 14 years, respectively, at the
university, what is the probability that the two chosen representatives have a total of at least
15 years’ teaching experience there?

0.6 (6 of the 10 pairs total at least 15 years)
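
Because there are only 5C2 = 10 equally likely pairs, the three answers can be confirmed by listing every pair. A minimal sketch using `itertools.combinations`; the dictionary `years` simply pairs each name with the teaching experience given in part (c).

```python
from itertools import combinations

years = {"Anderson": 3, "Box": 6, "Cox": 7, "Cramer": 10, "Fisher": 14}

pairs = list(combinations(years, 2))   # all 10 unordered pairs of names
n = len(pairs)

# (a) the pair is exactly {Anderson, Box}
p_a = sum(set(pair) == {"Anderson", "Box"} for pair in pairs) / n

# (b) at least one selected name begins with C
p_b = sum(any(name.startswith("C") for name in pair) for pair in pairs) / n

# (c) combined teaching experience of at least 15 years
p_c = sum(years[a] + years[b] >= 15 for a, b in pairs) / n

print(p_a, p_b, p_c)
```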
Question
The three most popular options on a certain type of new car are a built-in GPS (A), a
sunroof (B), and an automatic transmission (C). If 40% of all purchasers request A,
55% request B, 70% request C, 63% request A or B, 77% request A or C, 80% request
B or C, and 85% request A or B or C, determine the probabilities of the following
events.
a. The next purchaser will request at least one of the three options. P(A ∪ B ∪ C) = 0.85
b. The next purchaser will select none of the three options. 1 − 0.85 = 0.15
c. The next purchaser will request only a built-in GPS and not either of the other two
options.
d. The next purchaser will select exactly one of these three options.
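
Parts (c) and (d) are left open above. One possible route, sketched here in whole percentages to avoid floating-point noise, uses the fact that a purchaser who wants only A lies in A ∪ B ∪ C but not in B ∪ C, so P(only A) = P(A ∪ B ∪ C) − P(B ∪ C). The variable names are illustrative and this is not necessarily the method intended in the lecture.

```python
# All quantities are percentages of purchasers, taken from the question.
A, B, C = 40, 55, 70
A_or_B, A_or_C, B_or_C = 63, 77, 80
A_or_B_or_C = 85

# Pairwise intersections from P(X or Y) = P(X) + P(Y) - P(X and Y).
A_and_B = A + B - A_or_B
A_and_C = A + C - A_or_C
B_and_C = B + C - B_or_C

# "Only X" = in the triple union but not in the union of the other two.
only_A = A_or_B_or_C - B_or_C
only_B = A_or_B_or_C - A_or_C
only_C = A_or_B_or_C - A_or_B

exactly_one = only_A + only_B + only_C

print(f"only A: {only_A}%  only B: {only_B}%  only C: {only_C}%  exactly one: {exactly_one}%")
```

Under these assumptions part (c) comes out to 0.05 and part (d) to 0.35.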

Lecture 3
Conditional Probability
Bayes Theorem
Independent Events
Questions
Index

● Conditional Probability
● Bayes Theorem
● Independent Events
● Questions
Conditional Probability

P(A ∩ B) = P(A)·P(B|A) = P(B)·P(A|B)
Question

A chain of video stores sells three different brands of DVD players. Of its DVD
player sales, 50% are brand 1 (the least expensive), 30% are brand 2, and 20%
are brand 3. Each manufacturer offers a 1-year warranty on parts and labor. It is
known that 25% of brand 1’s DVD players require warranty repair work, whereas
the corresponding percentages for brands 2 and 3 are 20% and 10%,
respectively.

1. What is the probability that a randomly selected purchaser has bought a brand
1 DVD player that will need repair while under warranty?

P(brand 1 and repair) = 0.50 × 0.25 = 0.125

2. What is the probability that a randomly selected purchaser has a DVD player
that will need repair while under warranty?

P(repair) = 0.50 × 0.25 + 0.30 × 0.20 + 0.20 × 0.10 = 0.205

3. If a customer returns to the store with a DVD player that needs warranty repair
work, what is the probability that it is a brand 1 DVD player? A brand 2 DVD
player? A brand 3 DVD player?
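
Part 3 asks for P(brand | repair), which follows from the multiplication rule and the law of total probability. A minimal sketch in Python, using only the percentages given in the question; the dictionary names are illustrative.

```python
# Prior brand shares and conditional repair rates from the question.
prior = {1: 0.50, 2: 0.30, 3: 0.20}
repair_given_brand = {1: 0.25, 2: 0.20, 3: 0.10}

# Joint probabilities P(brand and repair) by the multiplication rule.
joint = {b: prior[b] * repair_given_brand[b] for b in prior}

# Law of total probability: P(repair) sums the joint over brands.
p_repair = sum(joint.values())

# Conditional probability: P(brand | repair) = P(brand and repair) / P(repair).
posterior = {b: joint[b] / p_repair for b in prior}

for b in prior:
    print(f"brand {b}: joint={joint[b]:.3f}  P(brand | repair)={posterior[b]:.4f}")
print(f"P(repair) = {p_repair:.3f}")
```

With these inputs the posterior probabilities are roughly 0.61, 0.29, and 0.10 for brands 1, 2, and 3.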
Bayes Theorem
Question

Only 1 in 1000 adults is afflicted with a rare disease for which a diagnostic test has
been developed. The test is such that when an individual actually has the disease,
a positive result will occur 99% of the time, whereas an individual without the
disease will show a positive test result only 2% of the time. If a randomly selected
individual is tested and the result is positive, what is the probability that the
individual has the disease?
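
A sketch of the calculation, assuming the three rates stated in the question (prevalence 1 in 1000, 99% positive rate among the diseased, 2% positive rate among the healthy). Variable names are illustrative.

```python
prevalence = 0.001           # P(disease)
sensitivity = 0.99           # P(positive | disease)
false_positive_rate = 0.02   # P(positive | no disease)

# Law of total probability for a positive result.
p_positive = prevalence * sensitivity + (1 - prevalence) * false_positive_rate

# P(disease | positive) as joint probability over total probability.
p_disease_given_positive = prevalence * sensitivity / p_positive

print(f"P(positive) = {p_positive:.5f}")
print(f"P(disease | positive) = {p_disease_given_positive:.4f}")
```

Even with an accurate test, the result is only about 0.047, because the disease is so rare that most positive results come from the much larger healthy group.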
Independent Events
Question

Each day, Monday through Friday, a batch of components sent by a first supplier
arrives at a certain inspection facility. Two days a week, a batch also arrives from
a second supplier. Eighty percent of all supplier 1’s batches pass inspection, and
90% of supplier 2’s do likewise. What is the probability that, on a randomly
selected day, two batches pass inspection?
Question

Two pumps connected in parallel fail independently of one another on any given
day. The probability that only the older pump will fail is .10, and the probability that
only the newer pump will fail is .05. What is the probability that the pumping
system will fail on any given day (which happens if both pumps fail)?

Let p1 = P(older pump fails) and p2 = P(newer pump fails). Because the pumps fail independently:

P(only older pump fails) = p1(1 − p2) = 0.10
P(only newer pump fails) = p2(1 − p1) = 0.05

Subtracting the second equation from the first gives p1 − p2 = 0.05. Substituting
p1 = p2 + 0.05 into the second equation gives p2^2 − 0.95·p2 + 0.05 = 0. Taking the
smaller root (the other root, p2 ≈ 0.894, would mean both pumps fail on most days),
p2 ≈ 0.0559 and p1 ≈ 0.1059.

P(system fails) = P(both pumps fail) = p1·p2 ≈ 0.1059 × 0.0559 ≈ 0.0059
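
One way to check the pump calculation numerically is to solve the two "only one pump fails" equations exactly. Writing p1 and p2 for the individual failure probabilities, independence gives p1(1 − p2) = 0.10 and p2(1 − p1) = 0.05; eliminating p1 leaves a quadratic in p2. The sketch below assumes the smaller root is the physically sensible one, and the variable names are illustrative.

```python
import math

only_old = 0.10   # P(only the older pump fails) = p1 * (1 - p2)
only_new = 0.05   # P(only the newer pump fails) = p2 * (1 - p1)

# Subtracting the two equations: p1 - p2 = only_old - only_new.
d = only_old - only_new

# Substituting p1 = p2 + d into p2 * (1 - p1) = only_new gives
#   p2**2 - (1 - d) * p2 + only_new = 0.
disc = (1 - d) ** 2 - 4 * only_new
p2 = ((1 - d) - math.sqrt(disc)) / 2   # smaller root (assumed to be the relevant one)
p1 = p2 + d

p_both = p1 * p2                       # system fails only if both pumps fail

print(f"p1 = {p1:.4f}, p2 = {p2:.4f}, P(both fail) = {p_both:.4f}")
```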
Question

Individual A has a circle of five close friends (B, C, D, E, and F). A has heard a
certain rumor from outside the circle and has invited the five friends to a party to
circulate the rumor. To begin, A selects one of the five at random and tells the
rumor to the chosen individual. That individual then selects at random one of the
four remaining individuals and repeats the rumor. Continuing, a new individual is
selected from those not already having heard the rumor by the individual who has
just heard it, until everyone has been told.
Question

1. What is the probability that the rumor is repeated in the order B, C, D, E, and
F?

1/(5·4·3·2·1) = 1/120 ≈ 0.0083


Question

2. What is the probability that F is the third person at the party to be told the
rumor?

Orders with F in third place (? ? F ? ?): 4·3·1·2·1 = 24
Total possibilities = 5·4·3·2·1 = 120
So the answer is 24/120 = 0.2
Question

3. What is the probability that F is the last person to hear the rumor?

Orders with F in last place (? ? ? ? F): 4·3·2·1·1 = 24
Total possibilities = 5·4·3·2·1 = 120
So the answer is 24/120 = 0.2
Question

4. If at each stage the person who currently “has” the rumor does not know who
has already heard it and selects the next recipient at random from all five possible
individuals, what is the probability that F has still not heard the rumor after it has
been told ten times at the party?

Each of the ten tellings misses F with probability 4/5, so the answer is
(4·4·…·4)/(5·5·…·5) = 4^10/5^10 ≈ 0.1074
Lecture 4
Random Variables
Probability Distributions
Index

● Random Variable
● Bernoulli Random Variable
● Probability Distribution
● Parameter
● Cumulative Distribution
● Expectation
Random Variable

A random variable is a rule that assigns a numerical value to each outcome in the sample space.
Random variables are denoted by uppercase letters, such as X and Y.
The notation X(s) = x means that x is the value associated with the outcome s by
the rv X.
Example

When a student calls a university help desk for technical support, he/she will either
immediately be able to speak to someone (S, for success) or will be placed on
hold (F, for failure).
With Sample Space = {S,F}, define an rv X by
X(S) = 1 and X(F) =0
The rv X indicates whether (1) or not (0) the student can immediately speak to
someone.
Bernoulli Random Variable

Any random variable whose only possible values are 0 and 1 is called a Bernoulli
random variable.
Types of Random Variables

Discrete: A discrete variable is a variable whose value is obtained by counting.

Continuous: A continuous variable is a variable whose value is obtained by measuring.
Probability Distribution

The probability distribution or probability mass function (pmf) of a discrete rv is defined for every number x by

p(x) = P(X=x)

For Example p(0) = P(X=0)

The values of X along with their probabilities collectively specify the pmf.
Example

Six lots of components are ready to be shipped by a certain supplier. The number
of defective components in each lot is as follows:

Let X be the number of defectives in the selected lot. The three possible X values
are 0, 1, and 2.
Example

Consider whether the next person buying a computer at a certain electronics store
buys a laptop or a desktop model. Let
Question

Consider a group of five potential blood donors—a, b, c, d, and e—of whom only a
and b have type O+ blood. Five blood samples, one from each individual, will be
typed in random order until an O+ individual is identified. Let the rv Y = number of
typings necessary to identify an individual with O+ blood.
Note: Once a donor is selected he cannot be selected again.
Find pmf of Y
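
Since the five donors are typed in random order, every ordering of a, b, c, d, e is equally likely, and the pmf of Y can be found by listing all 120 orderings. This is a brute-force sketch; the names `o_positive` and `pmf` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

donors = "abcde"
o_positive = {"a", "b"}   # the only donors with type O+ blood

orders = list(permutations(donors))   # 5! = 120 equally likely typing orders
counts = Counter()
for order in orders:
    # Y = 1-based position of the first O+ donor in this typing order.
    y = next(i for i, d in enumerate(order, start=1) if d in o_positive)
    counts[y] += 1

pmf = {y: Fraction(c, len(orders)) for y, c in sorted(counts.items())}
for y, p in pmf.items():
    print(f"p({y}) = {p}")
```

The possible values are 1 to 4 only, because at most three non-O+ donors can be typed before an O+ donor appears.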
Question
Parameter of Probability Distribution

Suppose p(x) depends on a quantity that can be assigned any one of a number of
possible values, with each different value determining a different probability
distribution. Such a quantity is called a parameter of the distribution. The collection
of all probability distributions for different values of the parameter is called a family
of probability distributions.
Bernoulli distribution (Each different number α between 0 and 1 determines a
different member of the Bernoulli family of distributions.)
Question

Starting at a fixed time, we observe the gender of each student entering the
class until a boy (B) enters. Let p = P(B), assume that successive arrivals of
students are independent, and define the rv X by X = number of
students observed. Find the pmf of X.
Cumulative Distribution Function
Question

A store carries flash drives with either 1 GB, 2 GB, 4 GB, 8 GB, or 16 GB of
memory. The accompanying table gives the distribution Y = the amount of
memory in a purchased drive:

Calculate F(1), F(2), F(3), F(4), F(5), F(2.7), F(7.999)


Cumulative Distribution Function
Question
A consumer organization that evaluates new automobiles reports the number of
major defects in each car examined. Let X denote the number of major defects in
a randomly selected car of a certain type. The cdf of X is as follows:
Question

1. P(X = 2) = F(2) − F(1) = 0.39 − 0.19 = 0.20
2. P(X > 3) = 1 − P(X ≤ 3) = 1 − F(3) = 1 − 0.67 = 0.33
3. P(2 ≤ X ≤ 5) = F(5) − F(1) = 0.97 − 0.19 = 0.78
4. P(2 < X < 5) = F(4) − F(2) = 0.92 − 0.39 = 0.53
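
For an integer-valued rv, P(a ≤ X ≤ b) = F(b) − F(a − 1). The sketch below applies that identity using only the cdf values that appear in the four answers (F(1) through F(5)); the rest of the table is not reproduced here, and the helper name `prob_between` is hypothetical.

```python
# cdf values quoted in the answers; other entries of the table are omitted.
F = {1: 0.19, 2: 0.39, 3: 0.67, 4: 0.92, 5: 0.97}


def prob_between(a, b):
    """P(a <= X <= b) for integer a <= b, using F(b) - F(a - 1)."""
    return F[b] - F[a - 1]


p_eq_2 = prob_between(2, 2)        # P(X = 2)
p_gt_3 = 1 - F[3]                  # P(X > 3)
p_2_to_5 = prob_between(2, 5)      # P(2 <= X <= 5)
p_strict = prob_between(3, 4)      # P(2 < X < 5) = P(3 <= X <= 4)

print(round(p_eq_2, 2), round(p_gt_3, 2), round(p_2_to_5, 2), round(p_strict, 2))
```

Note how the strict inequalities in part 4 shift both endpoints inward before the cdf is applied.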
Expected Values
Question

Just after birth, each newborn child is rated on a scale called the Apgar scale. The
possible ratings are 0, 1, . . . , 10, with the child’s rating determined by color,
muscle tone, respiratory effort, heartbeat, and reflex irritability (the best possible
score is 10). Let X be the Apgar score of a randomly selected child born at a
certain hospital during the next year, and suppose that the pmf of X is

Calculate the Expectation.


Question

Let X, the number of interviews a student has prior to getting a job, have pmf

Calculate the expected value.


Hint: Use summation Σ
The Expected Value of a Function

A computer store has purchased three computers of a certain type at $500 apiece.
It will sell them for $1000 apiece. The manufacturer has agreed to repurchase any
computers still unsold after a specified period at $200 apiece. Let X denote the
number of computers sold, and suppose that p(0) = 0.1, p(1) = 0.2, p(2) = 0.3 and
p(3) = 0.4. Let h(X) denote the associated profit, h(X) = 800X − 900. Calculate the
expected profit.
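
The expected value of a function of X is the sum of h(x)·p(x) over the possible values of x. A minimal sketch for this question, using the pmf and profit function given above; the Python names are illustrative.

```python
# pmf of X = number of computers sold.
pmf = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}


def h(x):
    """Profit in dollars when x computers are sold: 800x - 900."""
    return 800 * x - 900


expected_sold = sum(x * p for x, p in pmf.items())
expected_profit = sum(h(x) * p for x, p in pmf.items())

print(f"E(X) = {expected_sold:.1f}, E[h(X)] = {expected_profit:.0f}")
```

Because h is linear, the same answer follows from 800·E(X) − 900 = 800·2 − 900 = 700 dollars.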
References

Probability and Statistics for Engineers, R. A. Johnson, I. Miller, J. E. Freund, 2000
Statistics for Business & Economics, D. R. Anderson, D. J. Sweeney, T. A. Williams, J. D. Camm
Probability and Statistics for Engineering and Science, J. Devore
Lecture 5
Variance
Binomial Experiment
Binomial Random Variable
Index

● Variance
● Bernoulli Random Variable
● Binomial Experiment
● Binomial Random Variable
● Binomial Tables
● Questions
Variance
Expectation and Variance

E(X + Y) = E(X) + E(Y), where X and Y are random variables
E(aX) = aE(X), where X is a random variable and a is a constant
E(a) = a, where a is a constant
Var(aX) = a² Var(X), where X is a random variable and a is a constant
Var(a) = 0, where a is a constant
Expectation and Variance

Var(X) = E[(X − μ)²]
= E[X² + μ² − 2Xμ]
= E[X²] + E[μ²] − 2E[Xμ]
= E[X²] + μ² − 2μE[X]
= E[X²] + μ² − 2μ·μ
= E[X²] − μ²
Variance
Question

Let X be a Bernoulli rv with pmf p(1) = p and p(0) = 1 − p. Calculate the following:

1) E(X)
2) E(X²)
3) V(X)
4) E(X⁷⁹)
Question

X is a Bernoulli rv with pmf p(1) = p and p(0) = 1-p.


1) E(X) = 1·p + 0·(1 − p) = p
2) E(X²) = 1²·p + 0²·(1 − p) = p
3) V(X) = E(X²) − [E(X)]² = p − p² = p(1 − p)
4) E(X⁷⁹) = 1⁷⁹·p + 0⁷⁹·(1 − p) = p
Question

Let X = the outcome when a fair die is rolled once. If before the die is rolled you
are offered either (1/3.5) dollars or h(X) = 1/X dollars, would you accept the
guaranteed amount or would you gamble?
Answer

E(h(X)) = ⅙(1 + ½ + ⅓ + ¼ + ⅕ + ⅙) = 49/120 ≈ 0.408 ≈ 1/2.45
The guaranteed amount is 1/3.5 ≈ 0.286 dollars.
Since E(h(X)) is greater than 1/3.5, the gamble has the higher expected payoff.
Note that E(1/X) is not equal to 1/E(X).
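The comparison between E(1/X) and 1/E(X) for a fair die can be verified with exact fractions. This is a small sketch; the variable names are illustrative.

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)  # each face of a fair die is equally likely

expected_payoff = sum(p * Fraction(1, x) for x in faces)  # E[1/X]
expected_face = sum(p * x for x in faces)                 # E[X] = 7/2
guaranteed = 1 / expected_face                            # 1/E[X] = 1/3.5

print(expected_payoff, float(expected_payoff))
print(guaranteed, float(guaranteed))
print("gamble" if expected_payoff > guaranteed else "take the guaranteed amount")
```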
Binomial Experiment

A binomial experiment satisfies the following conditions:
1. The experiment consists of a sequence of n smaller experiments called trials,
where n is fixed in advance of the experiment.
2. Each trial can result in one of the same two possible outcomes, which we
generically denote by success (S) and failure (F).
3. The trials are independent, so that the outcome on any particular trial does not
influence the outcome on any other trial.
4. The probability of success P(S) is constant from trial to trial; we denote this
probability by p.
Binomial Random Variable
n=4
Formulae
Binomial Tables

Even for a relatively small value of n, the computation of cumulative binomial
probabilities can be tedious.
Hence we use binomial tables.
Binomial tables give the cdf B(x; n, p) = P(X ≤ x).
Question

Suppose that 20% of all copies of a particular textbook fail a certain binding
strength test. Let X denote the number among 15 randomly selected copies that
fail the test. Then X has a binomial distribution with n=15 and p =0.2.
1. Calculate the probability that at most 8 fail the test.
2. Calculate the probability that exactly 8 fail.

The result is B(8; 15, 0.2) − B(7; 15, 0.2) = 0.999 − 0.996 = 0.003.

3. Calculate the probability that at least 8 fail.
4. Calculate the probability that between 4 and 7 fail (4 and 7 inclusive).

Refer to the table and calculate the answers.
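As a cross-check on the table lookups, the binomial cdf can be computed directly from the pmf b(x; n, p) = C(n, x) pˣ (1 − p)ⁿ⁻ˣ. The sketch below uses only the standard library; the helper names are illustrative.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for a binomial rv with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_cdf(x, n, p):
    """B(x; n, p) = P(X <= x), the quantity tabulated in binomial tables."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

n, p = 15, 0.2
at_most_8 = binom_cdf(8, n, p)
exactly_8 = binom_cdf(8, n, p) - binom_cdf(7, n, p)
at_least_8 = 1 - binom_cdf(7, n, p)
between_4_and_7 = binom_cdf(7, n, p) - binom_cdf(3, n, p)

for name, value in [("at most 8", at_most_8), ("exactly 8", exactly_8),
                    ("at least 8", at_least_8), ("4 to 7", between_4_and_7)]:
    print(f"{name}: {value:.3f}")
```

Small differences in the last digit relative to table-based answers come from the three-decimal rounding of the table entries.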


Lecture 6
Poisson Distribution
Poisson Distribution as a limit
Poisson Process
Index

● Questions on Binomial Random Variable


● Poisson Random Variable
● Poisson Distribution
● Poisson Distribution as a limit
● Comparing Poisson and Binomial Distribution values
● Mean & Variance of Poisson Random Variable
● Questions
● Poisson Process
Question

A particular type of tennis racket comes in a midsize version and an oversize
version. Sixty percent of all customers at a certain store want the oversize version.
Let X be the number among ten randomly selected customers who want the oversize
version, so X is binomial with n = 10 and p = 0.6.
a. Among ten randomly selected customers who want this type of racket, what is
the probability that at least six want the oversize version?

P(X ≥ 6) = 1 − B(5; 10, 0.6) = 1 − 0.367 = 0.633

b. Among ten randomly selected customers, what is the probability that the
number who want the oversize version is within 1 standard deviation of the mean
value?

Mean = 10(0.6) = 6
Standard deviation = √(10 × 0.6 × 0.4) = 1.55
P(4.45 ≤ X ≤ 7.55) = P(5 ≤ X ≤ 7) = B(7; 10, 0.6) − B(4; 10, 0.6) = 0.833 − 0.166 = 0.667

c. The store currently has seven rackets of each version. What is the probability
that all of the next ten customers can get the version they want from current
stock?

Everyone is served only if at most 7 customers want the oversize version and at
most 7 want the midsize version, that is, 3 ≤ X ≤ 7.
P(3 ≤ X ≤ 7) = B(7; 10, 0.6) − B(2; 10, 0.6) = 0.833 − 0.012 = 0.821
Poisson Distribution
Poisson Distribution as a legitimate pmf

● p(x; µ) = e^(−µ) µ^x / x! > 0 for x = 0, 1, 2, . . . because µ > 0
● Σ p(x; µ) = e^(−µ) Σ µ^x / x! = e^(−µ) e^(µ) = 1


The Poisson Distribution as a limit
Comparing the Poisson and Three Binomial Distributions
Mean & Variance
Mean & Variance
Question

Let X denote the number of creatures of a particular type captured in a trap during
a given time period. Suppose that X has a Poisson distribution with µ = 4.5 , so on
average traps will contain 4.5 creatures.
a. Find the probability that the trap contains exactly 5 creatures.
b. Find the probability that the trap contains at least 5 creatures.
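Both trap probabilities follow from the Poisson pmf p(x; µ) = e^(−µ) µ^x / x!. The sketch below computes them directly instead of reading a table; the helper names are illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson rv with parameter mu."""
    return exp(-mu) * mu**x / factorial(x)

def poisson_cdf(x, mu):
    """F(x; mu) = P(X <= x)."""
    return sum(poisson_pmf(k, mu) for k in range(x + 1))

mu = 4.5
exactly_5 = poisson_pmf(5, mu)
at_least_5 = 1 - poisson_cdf(4, mu)  # complement of "at most 4"

print(f"P(X = 5)  = {exactly_5:.4f}")
print(f"P(X >= 5) = {at_least_5:.4f}")
```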
Question

Let X, the number of flaws on the surface of a randomly selected boiler of a
certain type, have a Poisson distribution with parameter µ = 5. Calculate the
following:
a. P(X ≤ 8)
b. P(X = 8)
c. P(9 ≤ X)
d. P(5 ≤ X ≤8)
e. P(5 < X < 8)
Question

Let X, the number of flaws on the surface of a randomly selected boiler of a
certain type, have a Poisson distribution with parameter µ = 5. Calculate the
following:
a. P(X ≤ 8) = F(8; 5) = 0.932
b. P(X = 8) = F(8;5) - F(7;5) = 0.932 - 0.867 = 0.065
c. P(9 ≤ X) = 1- F(8;5) = 1 - 0.932 = 0.068
d. P(5 ≤ X ≤ 8) = F(8;5) - F(4;5) = 0.932 - 0.44 = 0.492
e. P(5 < X < 8) = F(7;5) - F(5;5) = 0.867 - 0.616 = 0.251
Poisson Process

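Let Pk(t) denote the probability that k events will be observed during any particular
time interval of length t. The occurrence of events over time as described is called
a Poisson process; the parameter α specifies the rate for the process.
Pk(t) = e^(−αt) (αt)^k / k!, so the number of events during an interval of length t is
a Poisson random variable with µ = αt.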
Question

The number of requests for assistance received by a towing service is a Poisson
process with rate α = 4 per hour.
a. Compute the probability that exactly ten requests are received during a
particular 2-hour period.
b. If the operators of the towing service take a 30-min break for lunch, what is the
probability that they do not miss any calls for assistance?
c. How many calls would you expect during their break?
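For a Poisson process with rate α, the count in an interval of length t is Poisson with µ = αt. A minimal sketch for the three parts of the towing question follows; the function name is illustrative.

```python
from math import exp, factorial

def poisson_process_prob(k, rate, t):
    """P(k events in an interval of length t) for a Poisson process with the given rate."""
    mu = rate * t  # expected number of events in the interval
    return exp(-mu) * mu**k / factorial(k)

rate = 4  # requests per hour

p_ten_in_two_hours = poisson_process_prob(10, rate, 2)   # part a, mu = 8
p_no_calls_in_break = poisson_process_prob(0, rate, 0.5)  # part b, mu = 2
expected_calls_in_break = rate * 0.5                      # part c

print(f"a. {p_ten_in_two_hours:.4f}")
print(f"b. {p_no_calls_in_break:.4f}")
print(f"c. {expected_calls_in_break}")
```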
Lecture 7
Continuous Random Variable
Probability Density Function
Uniform Distribution
Percentile of a Distribution
Index

● Probability Density Function


● Uniform Distribution
● Mean and Variance
● Cumulative Distribution Function
● Percentile of a Distribution
● Questions
Probability Density Function
Probability Density Function
Uniform distribution
Mean
Variance

Var(X) = E[(X − μ)²] = E[X² + μ² − 2Xμ]
= E[X²] + E[μ²] − 2E[Xμ]
= E[X²] + μ² − 2μE[X]
= E[X²] + μ² − 2μ·μ
= E[X²] − μ²
Question

Calculate mean and variance for the uniform distribution (in terms of a and b).
Question

The time X (min) for a lab assistant to prepare the equipment for a certain
experiment is believed to have a uniform distribution with A = 25 and B = 35.
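For a uniform distribution on [A, B], integrating gives mean (A + B)/2 and variance (B − A)²/12. The sketch below evaluates these for A = 25 and B = 35. The sub-questions of this slide did not survive in the text, so the interval probability at the end is only an illustrative assumption, as are the function names.

```python
def uniform_mean(a, b):
    """Mean of a uniform distribution on [a, b]."""
    return (a + b) / 2

def uniform_variance(a, b):
    """Variance of a uniform distribution on [a, b]."""
    return (b - a) ** 2 / 12

def uniform_cdf(x, a, b):
    """F(x) = P(X <= x) for a uniform distribution on [a, b]."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

A, B = 25, 35
mean = uniform_mean(A, B)
variance = uniform_variance(A, B)
# Hypothetical example: probability that preparation takes between 28 and 32 minutes
p_28_to_32 = uniform_cdf(32, A, B) - uniform_cdf(28, A, B)

print(mean, variance, p_28_to_32)
```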
Cumulative Distribution Functions
Question

Calculate cdf for uniform distribution.


Probability from F(x)
Question

Given the pdf find out the following:


a. cdf F(x)
b. P(1<=X<=1.5)
c. P(X>1)
Obtaining f(x) from F(x)
Percentile of a Distribution

η(p) is that value on the measurement axis such that 100p% of the area under the
graph of f(x) lies to the left of η(p) and 100(1-p)% lies to the right.
Question
Lecture 8
Normal Distribution
Standard Normal Distribution
Percentile
Zα Notation
Normal to Standard Normal Distribution
Index
● Normal Distribution
● Standard Normal Distribution
● Percentile
● Zα Notation
● Normal to Standard Normal Distribution
● Normal Approximation to Binomial Distribution
Normal Distribution
Normal Distribution

P(a ≤ X ≤ b) = ∫ₐᵇ (1/(σ√(2π))) e^(−(x − μ)²/(2σ²)) dx
Standard Normal Distribution
Question

Find out the standard normal probabilities:


1. P(Z ≤ 1.25)
2. P(Z > 1.25)
3. P(Z ≤ -1.25)
4. P(-3.4 ≤ Z ≤ 1.25)
Answers

1. P(Z ≤ 1.25) = Φ(1.25) = 0.8944
2. P(Z > 1.25) = 1 − P(Z ≤ 1.25) = 1 − Φ(1.25) = 1 − 0.8944 = 0.1056
3. P(Z ≤ −1.25) = Φ(−1.25) = 0.1056

Important: by symmetry, P(Z ≤ −a) = P(Z ≥ a).

4. P(−3.4 ≤ Z ≤ 1.25) = Φ(1.25) − Φ(−3.4) = 0.8944 − 0.0003 = 0.8941
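The standard normal cdf has no closed form, but it can be written with the error function as Φ(z) = ½[1 + erf(z/√2)]. The sketch below uses that identity to reproduce the table values; the function name `phi` is illustrative.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cdf, expressed through the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

p1 = phi(1.25)              # P(Z <= 1.25)
p2 = 1 - phi(1.25)          # P(Z > 1.25)
p3 = phi(-1.25)             # P(Z <= -1.25), equal to p2 by symmetry
p4 = phi(1.25) - phi(-3.4)  # P(-3.4 <= Z <= 1.25)

for label, value in [("1", p1), ("2", p2), ("3", p3), ("4", p4)]:
    print(f"{label}. {value:.4f}")
```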


Percentile
Zα Notation for Z critical values

Zα covers the upper tail area: α of the area under the z curve lies to the right of Zα.
Hence 1 − α of the area lies to the left.
Thus Zα is the 100(1 − α)th percentile of the standard normal distribution.
By symmetry, the area under the standard normal curve to the left of −Zα is also α.
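Since Zα is the 100(1 − α)th percentile, it can be obtained from the inverse cdf of the standard normal distribution. A minimal sketch using `statistics.NormalDist` from the Python standard library (the helper name `z_alpha` is illustrative):

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def z_alpha(alpha):
    """Critical value with upper-tail area alpha under the z curve."""
    return STANDARD_NORMAL.inv_cdf(1 - alpha)

for alpha in (0.10, 0.05, 0.025, 0.01):
    z = z_alpha(alpha)
    lower_tail = STANDARD_NORMAL.cdf(-z)  # by symmetry this is also alpha
    print(f"alpha = {alpha}: z_alpha = {z:.3f}, area left of -z_alpha = {lower_tail:.3f}")
```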
Zα Notation
Normal to Standard Normal
Question

The breakdown voltage of a randomly chosen diode of a particular type is known
to be normally distributed (not necessarily standard normal).
a. What is the probability that a diode’s breakdown voltage is within 1 standard
deviation of its mean value?
b. What is the probability that a diode’s breakdown voltage is within 2 standard
deviations of its mean value?
c. What is the probability that a diode’s breakdown voltage is within 3 standard
deviations of its mean value?
Solution
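Standardizing with Z = (X − μ)/σ turns "within k standard deviations of the mean" into P(−k ≤ Z ≤ k), which does not depend on μ or σ. The sketch below checks this numerically; the mean 40 and standard deviation 1.5 are hypothetical values chosen only to illustrate that the answer is unchanged.

```python
from statistics import NormalDist

def prob_within(k, mu=0.0, sigma=1.0):
    """P(mu - k*sigma <= X <= mu + k*sigma) for a normal rv X."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    standard = prob_within(k)                        # standard normal
    shifted = prob_within(k, mu=40.0, sigma=1.5)     # hypothetical diode parameters
    print(f"within {k} sd: {standard:.4f} (same for any mu, sigma: {shifted:.4f})")
```

Tables rounded to two decimals in z give 0.6826, 0.9544 and 0.9974, which agree with these values to about three decimal places.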
Normal Approximation to Binomial Distribution

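Let X be a binomial rv based on n trials with success probability p, and let q = 1 − p.
If np ≥ 10 and nq ≥ 10, then X has approximately a normal distribution with
μ = np and σ = √(npq), so that
P(X ≤ x) = B(x; n, p) ≈ Φ((x + 0.5 − np) / √(npq)).
A direct proof of this result is quite difficult.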


Question

Suppose that 25% of all students at a large public university receive financial aid.
Let X be the number of students in a random sample of size 50 who receive
financial aid. X follows a binomial distribution. Calculate the probability that at most
10 students receive aid.
p = 0.25, np = 50(0.25) = 12.5 ≥ 10, nq = 50(0.75) = 37.5 ≥ 10, so the normal
approximation can be used.
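The sketch below compares the normal approximation (with the usual continuity correction of 0.5) against the exact binomial sum for the financial aid question. It is an illustration under the stated values n = 50 and p = 0.25; the variable names are not from the lecture.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, x = 50, 0.25, 10
mu = n * p                       # 12.5
sigma = sqrt(n * p * (1 - p))    # about 3.06

# Normal approximation with continuity correction: P(X <= 10) ~ Phi((10.5 - mu) / sigma)
approx = NormalDist(mu, sigma).cdf(x + 0.5)

# Exact binomial probability, summing the pmf from 0 to 10
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

print(f"normal approximation: {approx:.4f}")
print(f"exact binomial:       {exact:.4f}")
```

A table lookup with z rounded to −0.65 gives 0.2578, slightly different from the unrounded value printed here.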
Lecture 9
Exponential Distribution
Index

● Exponential Distribution
● Mean & Variance Derivation
● Cumulative Distribution
● Memoryless Property
● Questions
Exponential & Gamma Distribution

● The density curve corresponding to any normal distribution is
bell-shaped and therefore symmetric.
● There are many practical situations in which the variable of
interest to an investigator might have a skewed distribution.
● One family of distributions that has this property is the Gamma
family.
● The exponential distribution is a special case of the Gamma
distribution.
Exponential Distribution
Expected Value and Variance
Exponential Distribution
Question

The distribution of stress range in certain bridge connections is an exponential
distribution with mean value 6 MPa.
a. Find the probability that stress range is at most 10 MPa.
b. Find the probability that stress range is between 5 MPa and 10 MPa.
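An exponential distribution with mean 6 has λ = 1/6 and cdf F(x) = 1 − e^(−λx) for x ≥ 0. A minimal sketch of both parts follows; the function name is illustrative.

```python
from math import exp

def expon_cdf(x, lam):
    """F(x) = P(X <= x) for an exponential rv with rate lam."""
    return 1 - exp(-lam * x) if x >= 0 else 0.0

lam = 1 / 6  # mean = 1/lam = 6 MPa

at_most_10 = expon_cdf(10, lam)
between_5_and_10 = expon_cdf(10, lam) - expon_cdf(5, lam)

print(f"a. P(X <= 10)      = {at_most_10:.3f}")
print(f"b. P(5 <= X <= 10) = {between_5_and_10:.3f}")
```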
Poisson Distribution & Exponential Distribution
Poisson Distribution & Exponential Distribution
Memoryless Property
Question

Let X = the time between two successive arrivals at the drive-up window of a local
bank. If X has an exponential distribution with λ = 1, compute the following:
a. The expected time between two successive arrivals
b. The standard deviation of the time between successive arrivals
c. P(X ≤ 4)
d. P(2 ≤ X ≤ 5)
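For an exponential rv the mean and standard deviation both equal 1/λ, and P(X > t) = e^(−λt). The sketch below works the four parts for λ = 1 and also checks the memoryless property P(X > s + t | X > s) = P(X > t); the values s = 2 and t = 3 are hypothetical and chosen only for the check.

```python
from math import exp, isclose

lam = 1.0

def survival(t, lam):
    """P(X > t) for an exponential rv with rate lam."""
    return exp(-lam * t)

mean = 1 / lam                                # part a
std_dev = 1 / lam                             # part b
p_c = 1 - survival(4, lam)                    # P(X <= 4)
p_d = survival(2, lam) - survival(5, lam)     # P(2 <= X <= 5)

# Memoryless property with hypothetical s and t
s, t = 2.0, 3.0
conditional = survival(s + t, lam) / survival(s, lam)  # P(X > s + t | X > s)
memoryless_holds = isclose(conditional, survival(t, lam))

print(mean, std_dev, round(p_c, 4), round(p_d, 4), memoryless_holds)
```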
Lecture 10
Gamma Function
Gamma Distribution
Incomplete Gamma Function
Index

● Gamma Function
● Gamma Distribution
● Standard Gamma Distribution
● Exponential Distribution
● Gamma Density Curves
● Mean & Variance
● Non standard gamma to standard gamma function
● Questions
Gamma Function
Gamma Function
Question

Evaluate each of the following expressions, leaving the final answer in exact
simplified form.
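The expressions for this question did not survive in the text, so the sketch below only illustrates the standard gamma function facts such exercises rely on: Γ(n) = (n − 1)! for a positive integer n, Γ(α) = (α − 1)Γ(α − 1), and Γ(1/2) = √π. The particular arguments are assumptions chosen for illustration.

```python
from math import gamma, factorial, sqrt, pi, isclose

gamma_6 = gamma(6)               # Gamma(6) = 5! = 120
gamma_half = gamma(0.5)          # Gamma(1/2) = sqrt(pi)
gamma_five_halves = gamma(2.5)   # Gamma(5/2) = (3/2)(1/2)Gamma(1/2) = (3/4)sqrt(pi)

print(gamma_6, factorial(5))
print(gamma_half, sqrt(pi))
print(gamma_five_halves, 0.75 * sqrt(pi))
print(isclose(gamma(4.2), 3.2 * gamma(3.2)))  # recursion Gamma(a) = (a - 1)Gamma(a - 1)
```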
Gamma Distribution
Standard Gamma Distribution
Exponential Distribution
Gamma Density Curves
Standard Gamma Density Curves
Properties

● For the standard pdf, when α ≤ 1, f(x;α) is strictly decreasing as x
increases from 0.
● For the standard pdf, when α > 1, f(x;α) rises from 0 at x = 0 to a maximum
and then decreases.
● β is called the scale parameter because values other than 1 either stretch or
compress the pdf in the x direction.
Mean & Variance
Incomplete Gamma Distribution
Question

Suppose the reaction time X of a randomly selected individual to a certain
stimulus has a standard gamma distribution with α = 2. Calculate the following:
a. P(3 ≤ X ≤ 5)
b. P(X > 4)
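When α is a positive integer, the incomplete gamma function has the closed form F(x; α) = 1 − e^(−x) Σ xᵏ/k! with k running from 0 to α − 1, so the table can be bypassed. The sketch below relies on that integer-α identity (it does not handle non-integer α); the function name is illustrative.

```python
from math import exp, factorial

def std_gamma_cdf(x, alpha):
    """F(x; alpha) for a standard gamma rv; valid only for positive integer alpha."""
    if x <= 0:
        return 0.0
    return 1 - exp(-x) * sum(x**k / factorial(k) for k in range(alpha))

alpha = 2
p_a = std_gamma_cdf(5, alpha) - std_gamma_cdf(3, alpha)  # P(3 <= X <= 5)
p_b = 1 - std_gamma_cdf(4, alpha)                        # P(X > 4)

print(f"a. {p_a:.3f}")
print(f"b. {p_b:.3f}")
```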
Question

Let X have a standard gamma distribution with α = 7. Evaluate the following


a. P(X ≤ 5)
b. P(X < 5)
c. P(X>8)
d. P(3 ≤ X ≤ 8)
e. P(3 < X < 8)
f. P(X<4 or X>6)
Question

Let X have a standard gamma distribution with α = 7. Evaluate the following


a. P(X ≤ 5) = 0.238
b. P(X < 5) = 0.238
c. P(X>8) = 1 - P(X ≤ 8) = 1 - 0.687 = 0.313
d. P(3 ≤ X ≤ 8) = F(8) - F(3) = 0.687 - 0.034 = 0.653
e. P(3 < X < 8) = F(8) - F(3) = 0.687 - 0.034 = 0.653
f. P(X<4 or X>6) = F(4) + 1 - F(6) = 0.111 + 1 - 0.394 = 0.717
Non-standard gamma distribution to standard gamma distribution
Question

Suppose the time spent by a randomly selected student who uses a terminal
connected to a local time-sharing computer facility has a gamma distribution with
mean 20 min and variance 80 min².
a. What are the values of α and β?
b. What is the probability that a student uses the terminal for at most 24 min?
c. What is the probability that a student spends between 20 and 40 min using the
terminal?
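For a gamma distribution, mean = αβ and variance = αβ², so β = variance/mean and α = mean/β. Probabilities then follow from P(X ≤ x) = F(x/β; α), where F is the standard gamma cdf. The sketch below uses the closed form of F for integer α, which applies here because α works out to an integer; the function names are illustrative.

```python
from math import exp, factorial

def std_gamma_cdf(x, alpha):
    """Standard gamma cdf F(x; alpha); valid only for positive integer alpha."""
    if x <= 0:
        return 0.0
    return 1 - exp(-x) * sum(x**k / factorial(k) for k in range(alpha))

def gamma_cdf(x, alpha, beta):
    """P(X <= x) for a gamma rv with shape alpha and scale beta."""
    return std_gamma_cdf(x / beta, alpha)

mean, variance = 20, 80
beta = variance / mean   # scale parameter
alpha = mean / beta      # shape parameter
shape = int(alpha)       # integer shape, so the closed form applies

p_b = gamma_cdf(24, shape, beta)                              # at most 24 min
p_c = gamma_cdf(40, shape, beta) - gamma_cdf(20, shape, beta)  # between 20 and 40 min

print(f"a. alpha = {alpha}, beta = {beta}")
print(f"b. {p_b:.3f}")
print(f"c. {p_c:.3f}")
```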
Lecture 11
Chi square distribution
Two random variables
Joint & Marginal Probability Functions
Index

● Chi square distribution


● Skewed Distribution
● Chi square curve
● Two discrete random variables
● Marginal probability mass function
● Two continuous random variables
● Marginal probability density function
● Independent random variables
● Questions
Chi square Distribution

● The chi-squared distribution is important because it is the basis for a number
of procedures in statistical inference.
● Gamma pdf is
Chi square Distribution
Skewed Distribution

● If one tail is longer than another, the distribution is skewed.


● These distributions are sometimes called asymmetric or asymmetrical
distributions as they don’t show any kind of symmetry.
● In a normal distribution, the mean and the median are the same number while
the mean and median in a skewed distribution become different numbers.
Left skewed distribution

● A left-skewed distribution has a long left tail.


● Left-skewed distributions are also called negatively-skewed distributions.
● That’s because there is a long tail in the negative direction on the number
line.
● Mean is also to the left of the peak.
● Mean is to the left of the median also.
Right skewed Distribution

● A right-skewed distribution has a long right tail.


● Right-skewed distributions are also called positively-skewed distributions.
● That’s because there is a long tail in the positive direction on the number line.
● Mean is to the right of the peak.
● Mean is to the right of the median also.
Chi square curve

● The curve is nonsymmetrical and skewed to the right.


● There is a different chi-square curve for each dof.
● The mean is located to the right of the peak.
● Mean = dof
● Variance = 2*dof
Two discrete random variables
Marginal Probability Mass Function
Question
A large insurance agency services a number of customers who have purchased
both a homeowner’s policy and an automobile policy from the agency. For each
type of policy, a deductible amount must be specified. For an automobile policy,
the choices are $100 and $250, whereas for a homeowner’s policy, the choices
are 0, $100, and $200. Suppose an individual with both types of policy is selected
at random from the agency’s files. Let X be the deductible amount on the auto
policy and Y be the deductible amount on the homeowner’s policy. Suppose the
joint pmf is given as follows:
Question
Calculate the following:
1. p(100,100)
2. P(Y ≥ 100)
3. pX(x)
4. pY(y)

Answers
1. p(100,100) = P(X = 100 and Y = 100) = 0.1
2. P(Y ≥ 100) = p(100,100) + p(250,100) + p(100,200) + p(250,200) = 0.75
(Can be computed from the pmf of Y too)
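Marginal pmfs are obtained by summing the joint pmf over the other variable. The joint table for this question did not survive in the text, so the table in the sketch below is an assumption: its entries were chosen to be consistent with the two stated answers, p(100,100) = 0.1 and P(Y ≥ 100) = 0.75, and should be replaced by the actual table.

```python
from collections import defaultdict

# ASSUMED joint pmf p(x, y): x = auto deductible, y = homeowner's deductible.
# These values are illustrative, not taken from the slide.
joint_pmf = {
    (100, 0): 0.20, (100, 100): 0.10, (100, 200): 0.20,
    (250, 0): 0.05, (250, 100): 0.15, (250, 200): 0.30,
}

def marginal(joint, index):
    """Sum the joint pmf over the other variable (index 0 gives pX, index 1 gives pY)."""
    totals = defaultdict(float)
    for pair, prob in joint.items():
        totals[pair[index]] += prob
    return dict(totals)

p_x = marginal(joint_pmf, 0)
p_y = marginal(joint_pmf, 1)
p_y_at_least_100 = sum(prob for (x, y), prob in joint_pmf.items() if y >= 100)

print("pX:", p_x)
print("pY:", p_y)
print("P(Y >= 100):", round(p_y_at_least_100, 2))
```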
Question

A service station has both self-service and full-service islands. On each island,
there is a single regular unleaded pump with two hoses. Let X denote the number
of hoses being used on the self-service island at a particular time, and let Y
denote the number of hoses on the full-service island in use at that time. The joint
pmf of X and Y appears in the accompanying tabulation.
Question
Answers

a. 0.20
b. 0.42
c. At least one hose is in use on both the self-service and the full-service islands;
the probability is 0.7
d. pX(0) = 0.16, pX(1) = 0.34, pX(2) = 0.5, 0 otherwise
pY(0) = 0.24, pY(1) = 0.38, pY(2) = 0.38, 0 otherwise
P(X ≤ 1) = 0.5
Two continuous random variables
Marginal Probability Density Function
Independent random variables
For the given pdf

1. Verify it is a legitimate pdf.


2. Find out P(0 ≤ X ≤ 0.25, 0 ≤ Y ≤ 0.25)
3. Find marginal pdf of X
4. Find marginal pdf of Y
5. Find P(0.25 ≤ Y ≤ 0.75)
Answer 1.
Answer 2.
Answer 3 & 4
Answer 5
Lecture 12
Covariance
Correlation Coefficient
Index

● Two random variables


● Expected value of a function
● Covariance
● Correlation Coefficient
● Questions
Question
Each front tire on a particular type of vehicle is supposed to be filled to a pressure of
26 psi. Suppose the actual air pressure in each tire is a random variable X for the right
tire and Y for the left tire with the joint pdf

a. What is the value of K?
b. What is the probability that both tires are underfilled?
c. Determine the (marginal) distribution of air pressure in the right tire alone.
d. Are X and Y independent rv’s?
Answer
Expected Value of a function
Covariance
Covariance

For a strong positive relationship, that is, when Y tends to increase as X increases,
Cov(X, Y) is large and positive.
For a strong negative relationship, that is, when Y tends to decrease as X increases,
Cov(X, Y) is large and negative.
If X and Y are not strongly related, the covariance will be near 0.
Covariance
Question

Given the joint pmf, calculate Cov(X,Y)


Cov(X,Y) shortcut formula

Cov(X,Y) = E[(X − μX)(Y − μY)]
= E[XY + μXμY − XμY − YμX]
= E(XY) + E(μXμY) − E(XμY) − E(YμX)
= E(XY) + μXμY − μYE(X) − μXE(Y)
= E(XY) + μXμY − μYμX − μXμY
= E(XY) − μXμY
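The definition and the shortcut formula can be compared numerically on any joint pmf. The table for the preceding question is not present in the text, so the joint pmf below is an assumed example used only to show that both formulas agree; the helper `expect` is illustrative.

```python
from math import sqrt

# ASSUMED joint pmf p(x, y), for illustration only
joint_pmf = {
    (100, 0): 0.20, (100, 100): 0.10, (100, 200): 0.20,
    (250, 0): 0.05, (250, 100): 0.15, (250, 200): 0.30,
}

def expect(g):
    """E[g(X, Y)] = sum of g(x, y) * p(x, y) over the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in joint_pmf.items())

mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)

cov_definition = expect(lambda x, y: (x - mu_x) * (y - mu_y))  # E[(X - muX)(Y - muY)]
cov_shortcut = expect(lambda x, y: x * y) - mu_x * mu_y        # E(XY) - muX * muY

sigma_x = sqrt(expect(lambda x, y: x * x) - mu_x**2)
sigma_y = sqrt(expect(lambda x, y: y * y) - mu_y**2)
rho = cov_shortcut / (sigma_x * sigma_y)  # correlation coefficient

print(cov_definition, cov_shortcut, round(rho, 3))
```

Unlike the covariance, ρ is unit-free and always lies between −1 and 1.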
Correlation Coefficient
Correlation Coefficient

● ρ is a measure of the degree of linear relationship between X and Y.
● ρ = 0 does not imply that X and Y are independent, but only that there is a
complete absence of a linear relationship.
● When ρ = 0, X and Y are said to be uncorrelated. Two variables could be
uncorrelated yet highly dependent because there is a strong nonlinear
relationship, so be careful not to conclude too much from knowing that ρ = 0.
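A small constructed example shows uncorrelated yet dependent variables: let X take the values −1, 0, 1 with equal probability and let Y = X². The example is hypothetical and not from the lecture; exact fractions avoid rounding.

```python
from fractions import Fraction

# Hypothetical pmf: X is uniform on {-1, 0, 1}, and Y = X**2 is fully determined by X
pmf_x = {-1: Fraction(1, 3), 0: Fraction(1, 3), 1: Fraction(1, 3)}

e_x = sum(x * p for x, p in pmf_x.items())            # E(X) = 0
e_y = sum(x**2 * p for x, p in pmf_x.items())         # E(Y) = E(X^2) = 2/3
e_xy = sum(x * x**2 * p for x, p in pmf_x.items())    # E(XY) = E(X^3) = 0

covariance = e_xy - e_x * e_y  # shortcut formula, equals 0 so rho = 0

# Independence would require P(X = 0, Y = 0) = P(X = 0) * P(Y = 0)
p_joint = pmf_x[0]               # Y = 0 exactly when X = 0
p_product = pmf_x[0] * pmf_x[0]  # P(X = 0) * P(Y = 0)
independent = p_joint == p_product

print(covariance, independent)
```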
Covariance vs Correlation
Lecture 13
Central Limit Theorem
Hypothesis Testing
Index

● Random Sample
● Sample mean
● Central Limit Theorem
● Hypothesis Testing
● Test Procedure
● Type of errors
● Level of Significance
● p value
● Lower Tail Test
● Questions
Random Sample
Sample mean
Central Limit Theorem
Let X1, X2, . . . , Xn be a random sample from a distribution with mean μ and
variance σ².
Then if n is sufficiently large, X̄ has approximately a normal distribution with
mean μ and variance σ²/n.
The larger the value of n, the better the approximation.
Rule of thumb: If n > 30, the Central Limit Theorem can be used.
If X1, X2, . . . , Xn are themselves normally distributed with mean μ and variance σ²,
then for any n, X̄ has exactly a normal distribution with mean μ and variance σ²/n.
Question

The amount of a particular impurity in a batch of a certain chemical product is a
random variable with mean value 4.0 g and standard deviation 1.5 g. If 50 batches
are independently prepared, what is the (approximate) probability that the sample
average amount of impurity X̄ is between 3.5 and 3.8 g?
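Since n = 50 exceeds 30, the Central Limit Theorem lets X̄ be treated as approximately normal with mean 4.0 and standard deviation 1.5/√50. A minimal sketch of the calculation using the standard library:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 4.0, 1.5, 50

# Approximate sampling distribution of the sample mean (Central Limit Theorem)
standard_error = sigma / sqrt(n)
xbar_dist = NormalDist(mu, standard_error)

prob = xbar_dist.cdf(3.8) - xbar_dist.cdf(3.5)

print(f"standard error = {standard_error:.4f}")
print(f"P(3.5 <= Xbar <= 3.8) = {prob:.4f}")
```

A table-based solution with z rounded to two decimals gives Φ(−0.94) − Φ(−2.36) = 0.1736 − 0.0091 = 0.1645, which differs slightly from the unrounded result.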
Hypothesis Testing
Hypothesis

A statistical hypothesis, or just hypothesis, is a claim or assertion either about the
value of a single parameter (population characteristic or characteristic of a
probability distribution), about the values of several parameters, or about the
form of an entire probability distribution.
Test Procedure
Type of errors
Null and Alternate Hypothesis
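For hypothesis tests involving a population mean, we let μ0 denote the
hypothesized value and we must choose one of the following three forms for the
hypothesis test.
Lower tail test: H0 : μ ≥ μ0, Ha : μ < μ0
Upper tail test: H0 : μ ≤ μ0, Ha : μ > μ0
Two-tailed test: H0 : μ = μ0, Ha : μ ≠ μ0
The equality part always appears in H0.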
Question

The manager of an automobile dealership is considering a new bonus plan
designed to increase sales volume. Currently, the mean sales volume is 14
automobiles per month. The manager wants to conduct a research study to see
whether the new bonus plan increases sales volume. To collect data on the plan,
a sample of sales personnel will be allowed to sell under the new bonus plan for a
one-month period.
a. Develop the null and alternative hypotheses most appropriate for this situation.
b. Comment on the conclusion when H0 cannot be rejected.
c. Comment on the conclusion when H0 can be rejected.
Question

H0 is the default assumption that nothing has changed. So if μ becomes greater
than 14, then it is a change, which will be part of Ha.
a. H0 : μ ≤ 14
Ha : μ > 14
b. H0 cannot be rejected when the sample gives no sufficient evidence that the new
plan increases sales.
c. H0 can be rejected when the sample gives sufficient evidence that the new plan
increases sales.
Question

Because of high production-changeover time and costs, a director of
manufacturing must convince management that a proposed manufacturing
method reduces costs before the new method can be implemented. The current
production method operates with a mean cost of $220 per hour. A research study
will measure the cost of the new method over a sample production period.
a. Develop the null and alternative hypotheses most appropriate for this study.
b. Comment on the conclusion when H0 cannot be rejected.
c. Comment on the conclusion when H0 can be rejected.
Question

H0 is the default assumption that nothing has changed. So if μ becomes less than
220, then it is a change, which will be part of Ha.
a. H0 : μ ≥ 220
Ha : μ < 220
b. H0 cannot be rejected when the sample gives no sufficient evidence that the
proposed manufacturing method reduces costs.
c. H0 can be rejected when the sample gives sufficient evidence that the proposed
manufacturing method reduces costs.
Level of Significance

● The level of significance is the probability of making a type I error when the null
hypothesis is true as an equality.
● Type I error: Reject null hypothesis when it is actually true.
● The Greek symbol α (alpha) is used to denote the level of significance, and
common choices for α are 0.05 and 0.01.
● In practice, the level of significance is already specified before testing.
● In simple terms, level of significance will define the rejection region of the
graph.
Level of Significance
● By selecting α, the person responsible for the test controls the probability of making a type I error.
● Applications of hypothesis testing that only control for the type I error are called
significance tests.
● Because of the uncertainty associated with making a type II error when
conducting significance tests, statisticians usually recommend that we use the
statement “do not reject H0” instead of “accept H0.”
Tests for Population Mean when σ known: Z test

● The σ known case corresponds to applications in which historical data and/or other
information are available that enable us to obtain a good estimate of the
population standard deviation prior to significance tests.
● Assumption: Population is normally distributed.
● In cases where it is not reasonable to assume the population is normally
distributed, these methods are still applicable if the sample size is large
enough.
One tailed tests
p value

● A p-value is a probability that provides a measure of the evidence against the
null hypothesis provided by the sample.
● Smaller p-values indicate more evidence against H0.
● The value of the test statistic is used to compute the p-value.
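For a lower tail Z test with σ known, the test statistic is z = (x̄ − μ0)/(σ/√n) and the p-value is the area under the standard normal curve to the left of z; H0 is rejected when the p-value is at most α. The sketch below illustrates this. The sample values (x̄ = 2.92, μ0 = 3, σ = 0.18, n = 36, α = 0.01) are hypothetical, and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def lower_tail_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Lower tail Z test of H0: mu >= mu0 against Ha: mu < mu0 (sigma known)."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))  # test statistic
    p_value = NormalDist().cdf(z)                # area to the left of z
    reject_h0 = p_value <= alpha
    return z, p_value, reject_h0

# Hypothetical sample values, for illustration only
z, p_value, reject_h0 = lower_tail_z_test(2.92, 3.0, 0.18, 36, alpha=0.01)

print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

For an upper tail test the p-value would instead be the area to the right of z, and for a two-tailed test twice the smaller tail area.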
Rules for hypothesis testing
Question
Lecture 14
Hypothesis Testing
Index

● Hypothesis Testing
● Lower tail Z test
● Upper tail Z test
● Two tail Z test
● t test
Hypothesis

A statistical hypothesis, or just hypothesis, is a claim or assertion either about the
value of a single parameter (population characteristic or characteristic of a
probability distribution), about the values of several parameters, or about the form
of an entire probability distribution.
Test Procedure
Type of errors
Level of Significance

The level of significance is the probability of making a type I error when the null
hypothesis is true as an equality.
Type I error: Reject null hypothesis when it is actually true.
The Greek symbol α (alpha) is used to denote the level of significance, and
common choices for α are 0.05 and 0.01.
In practice, the person responsible for the hypothesis test specifies the level of
significance.
In simple terms, level of significance will define the rejection region of the graph.
Tests for Population Mean when σ known: Z test

The σ known case corresponds to applications in which historical data and/or other
information are available that enable us to obtain a good estimate of the
population standard deviation prior to significance tests.
Assumption: Population is normally distributed.
In cases where it is not reasonable to assume the population is normally
distributed, these methods are still applicable if the sample size is large enough.
One tailed tests
p value

A p-value is a probability that provides a measure of the evidence against the null
hypothesis provided by the sample. Smaller p-values indicate more evidence
against H0.
The value of the test statistic is used to compute the p-value.
Question
Question
Question
Rules for hypothesis testing
Tests for Population Mean when σ unknown: t test

Because the σ unknown case corresponds to situations in which an estimate of
the population standard deviation cannot be developed prior to sampling, the
sample must be used to develop an estimate of both μ and σ.
The sample mean x̄ estimates μ and the sample standard deviation s estimates σ.
The test statistic is t = (x̄ − μ0) / (s / √n).
Degree of freedom

Degrees of freedom refers to the maximum number of logically independent
values, which are values that have the freedom to vary, in the data sample.
t Distribution

The t distribution, also known as Student's t distribution, is a type of probability
distribution that is similar to the normal distribution with its bell shape but has
heavier tails.
Because of the heavier tails, extreme values are more likely under a t distribution
than under the normal distribution.
The t statistic has a t distribution with n − 1 degrees of freedom.
Rules for hypothesis testing
Question

a. Compute the value of the test statistic.
b. Use the t distribution table to compute a range for the p-value.
c. At α = .05, what is your conclusion?
d. What is the rejection rule using the critical value? What is your conclusion?
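The data for this question did not survive in the text, so the sketch below works through the four steps with hypothetical values: an upper tail test of H0: μ ≤ 12 against Ha: μ > 12 with n = 25, x̄ = 14 and s = 4.32. The Python standard library has no t cdf, so the critical value 1.711 (upper tail area 0.05, 24 degrees of freedom) is taken from a t table.

```python
from math import sqrt

def t_statistic(sample_mean, mu0, s, n):
    """t = (xbar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (sample_mean - mu0) / (s / sqrt(n))

# Hypothetical sample values, for illustration only
n, sample_mean, s, mu0 = 25, 14.0, 4.32, 12.0

t = t_statistic(sample_mean, mu0, s, n)
degrees_of_freedom = n - 1

# From a t table with 24 degrees of freedom: upper tail area 0.05 corresponds to 1.711
t_critical = 1.711
reject_h0 = t >= t_critical  # rejection rule for an upper tail test at alpha = 0.05

print(f"t = {t:.2f} with {degrees_of_freedom} degrees of freedom")
print(f"reject H0 at alpha = 0.05: {reject_h0}")
```

With these values t falls between the table entries 2.064 (area 0.025) and 2.492 (area 0.01) for 24 degrees of freedom, so the p-value lies between 0.01 and 0.025.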