UCLAChapter 9
Population of size N.
N = 100
n = 10
Step 3: Assuming we are sampling from a finite population, repeat Steps 1 and 2 until all simple
random samples of size n have been obtained.
Example: A student has a large digital music library. The mean length of the songs is 258 seconds with
a standard deviation of 87 seconds.
b. What should the student expect the average song length to be for his play-list?
c. What is the standard error for the mean song length of 30 randomly selected songs?
d. Would a play-list of 50 songs have a standard error that is greater than or less than your answer to
part c? Explain.
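Parts c and d can be checked numerically. The sketch below assumes each playlist is a simple random sample from the library, so the standard error is σ/√n. The function name `standard_error` is only illustrative.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of the sample mean."""
    return sigma / math.sqrt(n)

sigma = 87  # population standard deviation of song length, in seconds

se_30 = standard_error(sigma, 30)
se_50 = standard_error(sigma, 50)

print(f"n = 30: SE = {se_30:.2f} seconds")
print(f"n = 50: SE = {se_50:.2f} seconds")
```

Because n appears in the denominator, the larger playlist has the smaller standard error.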
9.2: Central limit theorem for sample means
The Central Limit Theorem (CLT) says that, regardless of the shape of the underlying population
distribution, the sampling distribution of x̄ becomes approximately normal as the sample size, n,
increases.
• Large Sample: The rule of thumb is that if n is greater than or equal to 25, the sampling distribution
of x̄ will be approximately normal no matter the shape of the population.
n ≥ 25
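A small simulation can make the theorem concrete. The sketch below is only an illustration: it assumes a right-skewed exponential population with mean 1 and standard deviation 1 (our choice, not a data set from this chapter), draws many samples of size 25, and compares the center and spread of the sample means with µ and σ/√n.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 25              # sample size (the rule-of-thumb threshold)
num_samples = 5000  # number of simulated samples

# Assumed population: exponential with mean 1 and standard deviation 1 (right-skewed).
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(f"mean of sample means: {mean_of_means:.3f} (population mean is 1)")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma / sqrt(n) is {1 / n ** 0.5:.3f})")
```

The sample means cluster around the population mean with a spread close to σ/√n, even though the individual observations are far from normal.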
Steps to find the probability of the Sampling Distribution of x̄.
µx̄=µ
– Find the standard deviation of the sampling distribution of x̄ (σx̄ ). This is also known
as the standard error.
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
n ≥ 25
∗ If neither of the two conditions above applies, the sampling distribution is NOT considered a
normal distribution and the probability cannot be found.
$z = \dfrac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$
• Step 3: Draw a standard normal distribution and shade the appropriate side.
[Figure: standard normal curve, horizontal axis from −3 to 3]
Example: According to data from the National Health and Nutrition Exam Survey, the mean weight of
10-year-old boys is 88.3 pounds with a standard deviation of 2.06 pounds. Assume the distribution of
weights is Normal.
a. Suppose we take a random sample of 49 boys from this population. Can we find the approximate
probability that the average weight of this sample will be above 89 pounds? If so, find it. If not,
explain why.
b. Suppose we take a random sample of 16 boys from this population. Can we find the approximate
probability that the average weight of this sample will be above 89 pounds? If so, find it. If not,
explain why not.
Example: Home prices in California have a distribution that is skewed right. The mean of the home
prices is $498,000 with a standard deviation of $25,200.
a. Suppose we take a random sample of 30 homes in California. What is the probability that the mean of
this sample is greater than $510,000?
b. Suppose we take a random sample of 10 homes in California. Can we find the approximate
probability that the mean of the sample is more than $510,000? If so, find it. If not, explain why not.
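The weight and home price examples follow the same steps: check the conditions, compute the standard error, convert x̄ to a z-score, and find the area to the right. The sketch below is one way to carry that out with Python's standard library. The function name `prob_mean_above` and its `population_normal` flag are illustrative assumptions, and returning `None` is simply our way of signalling that the conditions are not met.

```python
import math
from statistics import NormalDist

def prob_mean_above(x_bar, mu, sigma, n, population_normal):
    """P(sample mean > x_bar), or None if the normal model is not justified."""
    # Condition check: normal population, or large sample (n >= 25).
    if not (population_normal or n >= 25):
        return None
    se = sigma / math.sqrt(n)        # standard error of the sample mean
    z = (x_bar - mu) / se            # z-score of the sample mean
    return 1 - NormalDist().cdf(z)   # area to the right of z

results = {
    "weights, n = 49": prob_mean_above(89, 88.3, 2.06, 49, population_normal=True),
    "weights, n = 16": prob_mean_above(89, 88.3, 2.06, 16, population_normal=True),
    "homes,   n = 30": prob_mean_above(510_000, 498_000, 25_200, 30, population_normal=False),
    "homes,   n = 10": prob_mean_above(510_000, 498_000, 25_200, 10, population_normal=False),
}

for label, p in results.items():
    print(label, "->", "cannot be found this way" if p is None else f"{p:.4f}")
```

The weights are stated to be Normal, so both sample sizes work. The home prices are skewed right, so only the sample of 30 satisfies the large-sample condition.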
The t-distribution
[Figure: density curve of the t-distribution, horizontal axis from −6 to 6]
• The area under the t-distribution curve is 1. The area under the t-distribution curve to the
right of 0 equals the area under the curve to the left of 0, which equals 1/2.
• As t increases or decreases without bound, the graph approaches, but never equals, zero.
• The area in the tails of the t-distribution is a little greater than the area in the tails of the
standard normal distribution, because we are using S as an estimate of σ, thereby introducing
further variability into the t-distribution.
• As the sample size n increases, the distribution curve of t gets closer to the standard normal
distribution. This result occurs because, as the sample size n increases, the values of S get
closer to the values of σ, by the Law of Large Numbers.
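The last two properties can be checked numerically by comparing the area to the right of 2 under several t-curves with the same area under the standard normal curve. The sketch below uses only the standard library. The helpers `t_pdf` and `t_cdf` are illustrative (they integrate the t density with Simpson's rule) and are not the tool used in this course; a statistics package would normally supply these values.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

normal_tail = 1 - NormalDist().cdf(2)
t_tails = {df: 1 - t_cdf(2, df) for df in (5, 30, 1000)}

print(f"standard normal: P(Z > 2) = {normal_tail:.4f}")
for df, tail in t_tails.items():
    print(f"df = {df:4d}:       P(T > 2) = {tail:.4f}")
```

The tail area shrinks toward the normal value as the degrees of freedom grow, which matches the statement that the t-curve approaches the standard normal curve for large n.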
9.3: Answering Questions about the Mean of a Population
There are two approaches for answering questions about a population mean:
• Confidence intervals: Used for estimating parameter values. We use a confidence interval
when the parameter is unknown.
• Hypothesis tests: Used for deciding whether a parameter’s value is one thing or another. We use
hypothesis testing when we have an idea of what the parameter can be, but we believe it is now
different from that value.
Use confidence intervals whenever you are estimating the value of a population parameter on the basis of
a random sample.
Do NOT use a confidence interval if there is no uncertainty in your estimate. If you have data for the
entire population, you do not need to find a confidence interval: the population parameter is known, so
there is no need to estimate it.
Recall: The Central Limit Theorem (CLT) says that, regardless of the shape of the underlying
population (Parent distribution), the sampling distribution (Child distribution) of x̄ becomes
approximately normal as the sample size, n, increases.
• Large Sample: The rule of thumb is that if n is greater than or equal to 25, the sampling distribution
of x̄ will be approximately normal no matter the shape of the parent population.
n ≥ 25
Constructing a (1 − α) × 100% Confidence Interval for the population mean (µ).
Steps to find the confidence interval for µ:
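• Point estimate = x̄
• Critical value: $t^* = \text{TDist}(df).\text{inverseCDF}\left(1 - \dfrac{\alpha}{2}\right)$
Degrees of freedom (df) = n − 1.
α = 1 − Confidence Level
• Standard Error = $\sigma_{\bar{x}} = \dfrac{s}{\sqrt{n}}$
• Find the margin of error: $E = t^* \cdot \sigma_{\bar{x}}$
• Confidence interval: x̄ − E < µ < x̄ + E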
The confidence level is a measure of how well the method used to produce the confidence interval
performs. For example, a 95% confidence level means that if we were to take many random samples of
the same size from the same population and build an interval from each, we expect about 95% of the
intervals to “work” (contain the population parameter) and about 5% to be “wrong” (not contain the
population parameter).
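This long-run interpretation can be illustrated by simulation. The sketch below assumes a hypothetical Normal population (mean 50, standard deviation 10, values chosen only for illustration), repeatedly draws samples of size 30, builds a 95% t-interval from each, and counts how often the interval contains the true mean. The helpers `t_pdf`, `t_cdf`, and `t_ppf` are illustrative standard-library stand-ins for a statistics package.

```python
import math
import random
import statistics

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_ppf(p, df):
    """Inverse CDF of the t-distribution, found by bisection."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

random.seed(1)  # fixed seed so the run is repeatable
mu, sigma, n, reps = 50.0, 10.0, 30, 2000  # hypothetical population and design
t_star = t_ppf(0.975, n - 1)               # critical value for 95% confidence

hits = 0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = statistics.fmean(sample)
    margin = t_star * statistics.stdev(sample) / math.sqrt(n)
    if x_bar - margin < mu < x_bar + margin:
        hits += 1  # this interval "worked"

coverage = hits / reps
print(f"t* = {t_star:.3f}, share of intervals containing mu = {coverage:.3f}")
```

The share of intervals that contain µ should land near 0.95. Any single interval either contains µ or does not; the 95% describes the method, not one particular interval.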
Example: Data on the speed (in mph) for a random sample of 30 cars traveling on a highway were
collected. The sample mean speed was 63.3 mph with a sample standard deviation of 5.23 mph.
a. Find the 95% confidence interval for the mean speed of all cars traveling on the highway.
b. Is it plausible that the mean speed of cars on the highway is 67 mph? Why or why not?
Example: A used car website wanted to estimate the mean price of a Nissan Altima. The site gathered
data on a random sample of 26 such cars and found a sample mean of $16,610 and a sample standard
deviation of $2736.
b. Verify that the conditions for a valid confidence interval are met.
c. Construct a 90% confidence interval for the mean cost of this model car based on this data.
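The two confidence interval examples above can be checked with a short script that follows the steps for constructing an interval for µ. This is a sketch under stated assumptions: `t_confidence_interval` is an illustrative name, and the helpers `t_pdf`, `t_cdf`, and `t_ppf` approximate the t-distribution with the standard library only (Simpson's rule plus bisection) in place of the inverse-CDF tool used in class.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_ppf(p, df):
    """Inverse CDF of the t-distribution, found by bisection."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def t_confidence_interval(x_bar, s, n, confidence):
    """Return (lower, upper) bounds of a t-interval for the population mean."""
    alpha = 1 - confidence
    t_star = t_ppf(1 - alpha / 2, n - 1)  # critical value with df = n - 1
    margin = t_star * s / math.sqrt(n)    # margin of error E
    return x_bar - margin, x_bar + margin

speed_low, speed_high = t_confidence_interval(63.3, 5.23, 30, 0.95)
price_low, price_high = t_confidence_interval(16610, 2736, 26, 0.90)

print(f"95% CI for mean speed: ({speed_low:.2f}, {speed_high:.2f}) mph")
print(f"Is 67 mph inside the interval? {speed_low < 67 < speed_high}")
print(f"90% CI for mean price: (${price_low:,.0f}, ${price_high:,.0f})")
```

A value is considered plausible for µ when it falls inside the interval, which is how the script answers the 67 mph question.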
9.4: Hypothesis Testing for Means
Just as with hypothesis tests for proportions, hypothesis tests can be one-sided or two-sided depending on
the research question. The choice of the alternative hypothesis determines how the p-value is calculated.
$\mu_{\bar{x}} = \mu$
$\sigma_{\bar{x}} = \dfrac{s}{\sqrt{n}}$
$t_\circ = \dfrac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$
t◦ follows Student’s t-distribution with n - 1 degrees of freedom.
Recall: The Central Limit Theorem (CLT) says that, regardless of the shape of the underlying
population (Parent distribution), the sampling distribution (Child distribution) of x̄ becomes
approximately normal as the sample size, n, increases.
• Large Sample: The rule of thumb is that if n is greater than or equal to 25, the sampling distribution
of x̄ will be approximately normal no matter the shape of the parent population.
n ≥ 25
• Prework
– Find and label the following values:
n, x̄, µ, s, and degrees of freedom (df)
– Check the conditions of the Central Limit Theorem if the distribution is not normal.
∗ Simple Random Sample: The sample is obtained by simple random sampling.
∗ Large Sample: n ≥ 25
• State the null and alternative hypotheses:
H◦ :
Ha :
• Compute the test statistic (t◦):
$\mu_{\bar{x}} = \mu$
$\sigma_{\bar{x}} = \dfrac{s}{\sqrt{n}}$
$t_\circ = \dfrac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$
• P-value method and interpretation.
∗ Left-tailed test
∗ Two-tailed test
∗ Right-tailed test
– Compare the p-value with α.
∗ P-value ≤ α
· Reject H◦
∗ P-value > α
· Fail to reject H◦
• Stating a conclusion interpreting the results of the hypothesis test:
– Once we have found the p-value and made a statistical decision about the null hypothesis (i.e. we
will reject the null or fail to reject the null), we then want to summarize our results into an
overall conclusion for our test.
Example: Susan is in charge of quality control at a small fruit juice bottling plant. Each bottle
produced is supposed to contain exactly 12 fluid ounces (fl oz) of juice. Susan decides to test this by
randomly sampling 30 filled bottles and carefully measuring the amount of juice inside each. She will
recalibrate the machinery if the average amount of juice per bottle differs from 12 fl oz at the 1%
significance level. The sample of 30 bottles has an average of 11.92 fl oz per bottle and a standard
deviation of 0.26 fl oz. Should Susan recalibrate the machinery?
Example: The board of a major credit card company requires that the mean wait time for customers for
service calls is at most 3.00 minutes. To make sure that the mean wait time is not exceeding the
requirement, an assignment manager tracks the wait times of 45 randomly selected calls. The mean wait
time was calculated to be 3.4 minutes with a standard deviation of 1.45 minutes. Is there sufficient
evidence to say that the mean wait time is longer than 3.00 minutes at a significance level of 2.5%?
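The juice and wait-time examples can both be run through the p-value method with one helper. The sketch below is illustrative only: `one_sample_t_test` and its `tail` argument are our own names, and the t-distribution is approximated with standard-library code (`t_pdf`, `t_cdf`) instead of a statistics package. The hypotheses in the comments are our reading of each problem statement.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def one_sample_t_test(x_bar, mu0, s, n, tail):
    """Return (t, p_value); tail is 'left', 'right', or 'two'."""
    se = s / math.sqrt(n)       # standard error of the sample mean
    t = (x_bar - mu0) / se      # test statistic
    df = n - 1
    if tail == "left":
        p = t_cdf(t, df)
    elif tail == "right":
        p = 1 - t_cdf(t, df)
    elif tail == "two":
        p = 2 * (1 - t_cdf(abs(t), df))
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return t, p

# Juice bottles: H0: mu = 12 vs Ha: mu != 12, alpha = 0.01
t_juice, p_juice = one_sample_t_test(11.92, 12, 0.26, 30, "two")
# Wait times: H0: mu = 3.00 vs Ha: mu > 3.00, alpha = 0.025
t_wait, p_wait = one_sample_t_test(3.4, 3.00, 1.45, 45, "right")

for name, t, p, alpha in [("juice", t_juice, p_juice, 0.01), ("wait", t_wait, p_wait, 0.025)]:
    decision = "reject H0" if p <= alpha else "fail to reject H0"
    print(f"{name}: t = {t:.3f}, p-value = {p:.4f}, alpha = {alpha} -> {decision}")
```

The decision line applies the same rule as the p-value method: reject H◦ only when the p-value is at most α.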
9.5: Comparing Two Population Means
When comparing two populations, it is important to note whether the data are two independent samples
or are paired (dependent) samples.
In paired (dependent) samples, each observation in one group is coupled or paired with one particular
observation in the other group.
Example: People chosen in a random sample are asked how many minutes they spent reading and how
many minutes they spent exercising during a certain day. Researchers wanted to know how different the
mean amounts of time were for each activity. Would this study be considered a dependent or
independent sample?
Example: A sample of men and women each had their hearing tested. Researchers wanted to know
whether, typically, men and women differed in their hearing ability. Would this study be considered a
dependent or independent sample?
Example: A random sample of married couples are asked how many minutes per day they spent
exercising. Means were compared to see if the mean exercise times for husbands and wives differed.
Would this study be considered a dependent or independent sample?
Confidence Intervals: Independent Samples
To construct the confidence interval for the difference in population means given independent samples,
check four conditions:
• The populations from which the samples are drawn are normally distributed or the sample sizes are
large (n1 ≥ 25, n2 ≥ 25);
• For each sample, the sample size is no more than 10% of the population size. (10×n < Population
of interest.)
Formula for constructing a confidence interval about the difference between the population
means
Interpreting confidence intervals for the difference of population means given independent samples is the
same as interpreting confidence intervals for the difference of population proportions.
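A common form of this interval, assuming the unpooled standard error and the conservative degrees of freedom (the smaller of n1 − 1 and n2 − 1), can be written as follows. The symbols $\bar{x}_i$, $s_i$, and $n_i$ denote the sample mean, sample standard deviation, and sample size from population i, and $t^*$ is the critical value for the chosen confidence level.

```latex
\[
(\bar{x}_1 - \bar{x}_2) \;\pm\; t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},
\qquad df = \min(n_1 - 1,\; n_2 - 1)
\]
```

If the resulting interval contains 0, a difference of zero between the two population means is plausible.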
Example: A young statistics professor decided to give a quiz in class every week. He was not sure if the
quiz should occur at the beginning of class when the students are fresh or at the end of class when
they’ve gotten warmed up with some statistical thinking. Since he was teaching two sections of the same
course that performed equally well on past quizzes, he decided to do an experiment. He randomly chose
the first class to take the quiz during the second half of the class period (Late) and the other class took
the same quiz at the beginning of their hour (Early). He put all of the grades into a data set, found the
mean and standard deviation for both groups, and recorded them in Table 1. Construct an 80% confidence
interval for the difference in mean quiz grades between the two sections.
Example: In a packing plant, a machine packs cartons with jars. It is supposed that a new machine will
pack faster on the average than the machine currently used. The times it takes each machine to pack ten
cartons are recorded. Assume both datasets to be normally distributed. The results of the machines, in
seconds, are shown in the following table.
Construct a 95% confidence interval for the difference in the mean packing time for the old and new
machines.
Inference about Two Means: Independent Samples
To test hypotheses regarding two population means, µ1 and µ2 , with unknown population standard
deviations, we can use the following steps, provided that:
• The populations from which the samples are drawn are normally distributed or the sample sizes are
large (n1 ≥ 25, n2 ≥ 25);
• For each sample, the sample size is no more than 10% of the population size.
3. $t = \dfrac{\text{Difference in sample means}}{SE_{\text{est}}}$
4. If all the conditions are met for the CLT, the test statistic approximately follows Student’s
t-distribution with the smaller of n1 − 1 or n2 − 1 degrees of freedom, where x̄i is the sample mean and
Si is the sample standard deviation from population i.
The degree of freedom for the critical value is calculated by using the smaller of n1 – 1 or n2 – 1 degrees
of freedom.
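The procedure can be sketched in code as follows. Two assumptions are made here and should be read as such: the estimated standard error is taken to be the usual unpooled form, sqrt(s1²/n1 + s2²/n2), and the summary statistics in the usage lines are hypothetical numbers invented for illustration, not data from any table in this chapter. The helper names are also illustrative.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def two_sample_t_test(x1, s1, n1, x2, s2, n2, tail):
    """Return (t, df, p_value) for H0: mu1 = mu2; tail is 'left', 'right', or 'two'."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # assumed unpooled standard error
    t = (x1 - x2) / se                           # difference in sample means over SE
    df = min(n1 - 1, n2 - 1)                     # conservative degrees of freedom
    if tail == "left":
        p = t_cdf(t, df)
    elif tail == "right":
        p = 1 - t_cdf(t, df)
    elif tail == "two":
        p = 2 * (1 - t_cdf(abs(t), df))
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return t, df, p

# Hypothetical summary statistics (illustration only).
t_stat, df, p_value = two_sample_t_test(78.0, 10.0, 30, 72.0, 12.0, 28, "two")
print(f"t = {t_stat:.3f}, df = {df}, p-value = {p_value:.4f}")
```

Using the smaller of n1 − 1 and n2 − 1 is a conservative choice: it gives slightly larger p-values than software that estimates the degrees of freedom from the data.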
Example: A statistics professor was handing out midterm grade slips on a Friday which happened to be
the day before the school’s Spring break. He noticed that there were an unusually large number of
students missing from class that day. So he collected the leftover grade slips and created the data in
Table 3 that summarized the midterm grades (out of a possible 100) for students that attended and
missed class. The professor had reason to suspect, before even looking at the data, that, in general,
students who missed class would tend to have lower mean midterm grades. Test the professor’s claim at a
significance level of 10%. You may assume that the data for both groups are reasonably symmetric and
have no strong outliers.
Example: A young statistics professor decided to give a quiz in class every week. He was not sure if the
quiz should occur at the beginning of class when the students are fresh or at the end of class when
they’ve gotten warmed up with some statistical thinking. Since he was teaching two sections of the same
course that performed equally well on past quizzes, he decided to do an experiment. He randomly chose
the first class to take the quiz during the second half of the class period (Late) and the other class took
the same quiz at the beginning of their hour (Early). He put all of the grades into a data set, found the
mean and standard deviation for both groups, and recorded them in Table 4. Test whether it makes a
difference when the quiz is given during the class, at a significance level of 20%.
Example: In a packing plant, a machine packs cartons with jars. It is supposed that a new machine will
pack faster on the average than the machine currently used. To test that hypothesis, the times it takes
each machine to pack ten cartons are recorded. Assume both datasets to be normally distributed. The
results of the machines, in seconds, are shown in the following table. Do the data provide sufficient
evidence to conclude that, on the average, the new machine packs faster? Perform the required hypothesis
test at the 5% level of significance.
Dependent Samples
• Transform the original data from two variables into a single variable that contains the difference
between the scores in Group 1 and Group 2.
• After the differences have been computed, we can apply either a confidence interval approach or a
hypothesis test approach to the differences.
Construct and Interpret Confidence Intervals for the Population Mean Difference of Matched-
Pairs Data:
Example: Do you think your pulse rate is higher when you are taking a quiz than when you are sitting
in a lecture? The data in Table 5 show pulse rates collected from 10 students in a class lecture and then
from the same students during a quiz. Assume that the data has a normal distribution.
Student      1   2   3   4   5   6   7   8   9  10
Quiz        75  52  52  80  56  90  76  71  70  66
Lecture     73  53  47  88  55  70  61  75  61  78
Difference
Using the data in the table, construct a 95% confidence interval for the difference in mean pulse rate
between students in a class lecture and taking a quiz.
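The matched-pairs interval can be computed by reducing the two rows to a single list of differences and then treating that list as one sample. In the sketch below the differences are taken as quiz minus lecture, which is our choice of direction; reversing it only flips the signs of the interval. The t helpers are illustrative standard-library approximations.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_ppf(p, df):
    """Inverse CDF of the t-distribution, found by bisection."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

quiz = [75, 52, 52, 80, 56, 90, 76, 71, 70, 66]
lecture = [73, 53, 47, 88, 55, 70, 61, 75, 61, 78]

# One difference per student (quiz minus lecture; the direction is our choice).
diffs = [q - lec for q, lec in zip(quiz, lecture)]
n = len(diffs)
d_bar = statistics.fmean(diffs)   # mean of the differences
s_d = statistics.stdev(diffs)     # sample standard deviation of the differences

t_star = t_ppf(0.975, n - 1)      # 95% confidence, df = n - 1
margin = t_star * s_d / math.sqrt(n)
ci_low, ci_high = d_bar - margin, d_bar + margin

print(f"differences: {diffs}")
print(f"mean difference = {d_bar:.2f}, s_d = {s_d:.2f}")
print(f"95% CI for the mean difference: ({ci_low:.2f}, {ci_high:.2f})")
```

Once the differences are formed, every later step is the one-sample t-interval applied to them.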
Test Hypotheses Regarding Matched-Pairs/Dependent Data
Remark: Statistical inference methods on matched-pairs data use the same methods as inference on a
single population mean, except that the differences are analyzed.
To test hypotheses regarding the mean difference of matched-pairs data, the following must be satisfied:
• The differences are normally distributed with no outliers or the sample size, n, is large (n ≥ 25).
Determine the null and alternative hypotheses. The hypotheses can be structured in one of three ways,
where µd is the population mean difference of the matched-pairs data.
$t = \dfrac{\bar{x}_{\text{difference}} - 0}{SE_{\text{difference}}}$
Example: Do you think your pulse rate is higher when you are taking a quiz than when you are sitting
in a lecture? The data in Table 6 show pulse rates collected from 10 students in a class lecture and then
from the same students during a quiz. Assume that the data has a normal distribution.
Student      1   2   3   4   5   6   7   8   9  10
Quiz        75  52  52  80  56  90  76  71  70  66
Lecture     73  53  47  88  55  70  61  75  61  78
Difference
Using the data in the table, test whether pulse rate is higher when you are taking a quiz than when you
are sitting in a lecture at a significance level of 5%.
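The matched-pairs test can be sketched the same way: form the differences, then run a one-sample t-test of the mean difference against 0. The code below assumes differences are taken as quiz minus lecture, so "higher during a quiz" becomes the right-tailed alternative Ha: µd > 0. The t helpers are illustrative standard-library approximations, not the course's tool.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

quiz = [75, 52, 52, 80, 56, 90, 76, 71, 70, 66]
lecture = [73, 53, 47, 88, 55, 70, 61, 75, 61, 78]

# Differences as quiz minus lecture, so Ha: mu_d > 0 is right-tailed.
diffs = [q - lec for q, lec in zip(quiz, lecture)]
n = len(diffs)
d_bar = statistics.fmean(diffs)
se_diff = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference

t_stat = (d_bar - 0) / se_diff
p_value = 1 - t_cdf(t_stat, n - 1)                # area to the right of t
alpha = 0.05

decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"t = {t_stat:.3f}, df = {n - 1}, p-value = {p_value:.4f} -> {decision}")
```

With these ten students the p-value is well above 5%, so the data do not provide sufficient evidence that the mean pulse rate is higher during a quiz.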
A Lot of Repetition…
• The hypothesis test for two means is very similar to the test for one mean.
• The hypothesis test for paired data is really a special case of the one-sample t-test.
• Hypothesis tests use almost the same calculations as confidence intervals and they impose the same
conditions.
Don’t Accept H◦
If the p-value is larger than the significance level, we do not reject the null hypothesis.
This is different from “accepting” the null hypothesis. Just because we do not reject the null hypothesis
does not mean that we now believe the null hypothesis is true.
If the alternative hypothesis is two-sided, a confidence interval can be used instead of a hypothesis test.
When the confidence level equals 1 − α, both approaches reach the same conclusion.
1. The standard deviation of the sampling distribution is called the
a. Unbiased estimator.
b. Standard error.
c. Standard deviation.
d. p-value.
2. What effect, if any, does the sample size have on the standard error?
3. As a rule of thumb, we can apply the Central Limit Theorem for Sample Means for population
distributions which may not be Normal if the sample size is at least
a. 10.
b. 20.
c. 25.
d. 50.
4. Which of the following is NOT a condition that must hold in order to construct a confidence
interval for the difference of two population means?
d. The populations are approximately normal or each sample size is at least 25.
5. In a study on weekly time spent in employment, researchers constructed a confidence interval for
the difference in weekly mean employment times for men (Population 1) and women (Population
2). The resulting interval was (-2.14, 1.15). Based on this interval, we can conclude
a. Men spend between 2.14 and 1.15 more hours weekly in employment than women.
b. Women spend between 2.14 and 1.15 more hours weekly in employment than men.
c. There is no difference in the mean employment times for men and women.