
UCLA Chapter 9


9.1: Sample means of random samples

Illustrating Sampling Distributions

Click on the link for an illustration of the sampling distribution.

Population of size N.

N = 100

Step 1: Obtain a simple random sample of size n.

n = 10

Step 2: Compute the sample mean.

Find the sample mean (x̄) of the sample.

Step 3: Assuming we are sampling from a finite population, repeat Steps 1 and 2 until all simple
random samples of size n have been obtained.
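
The three steps above can be sketched in Python. Listing every sample of size 10 from a population of 100 is not practical (there are about 1.7 × 10^13 of them), so this sketch assumes a tiny made-up population of N = 6 values and samples of size n = 2. The population values are hypothetical and only serve to show the idea.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population (N = 6); the values are made up for illustration.
population = [2, 4, 6, 8, 10, 12]
n = 2  # sample size

# Steps 1-3: list every possible sample of size n (without replacement)
# and compute the sample mean of each one.
sample_means = [mean(sample) for sample in combinations(population, n)]

print(len(sample_means))    # how many samples of size n exist
print(mean(population))     # population mean
print(mean(sample_means))   # mean of all the sample means
```

Under these assumptions the mean of all possible sample means matches the population mean, which is the sense in which x̄ is an unbiased estimator of µ.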

Example: A student has a large digital music library. The mean length of the songs is 258 seconds with
a standard deviation of 87 seconds.

a. Is the mean value of 258 seconds a parameter or a statistic? Explain.

b. What should the student expect the average song length to be for his play-list?

c. What is the standard error for the mean song length of 30 randomly selected songs?

d. Would a play-list of 50 songs have a standard error that is greater than or less than your answer to
part c? Explain.
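
Parts c and d depend only on the formula σ/√n for the standard error of a sample mean. A minimal sketch of that arithmetic, using the values from this example (the function name `standard_error` is made up for illustration):

```python
import math

sigma = 87  # population standard deviation of song length, in seconds


def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


se_30 = standard_error(sigma, 30)  # play-list of 30 songs
se_50 = standard_error(sigma, 50)  # play-list of 50 songs
print(round(se_30, 2), round(se_50, 2))
```

Because n sits under a square root in the denominator, the larger play-list gives the smaller standard error.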

9.2: Central limit theorem for sample means

The Central Limit Theorem (CLT) for Sample Means

[Figure: three population distributions (uniform, symmetric, and skewed). In each case the sampling
distribution will be approximately normally distributed if we take a large enough sample of size n
from the population distribution, because of the CLT.]

The Central Limit Theorem (CLT) says that, regardless of the shape of the underlying population
distribution, the sampling distribution of x̄ will follow an approximately normal
distribution as the sample size, n, increases.

Conditions for the Central Limit Theorem (CLT)

• Random Sample: The sample is obtained using simple random sampling.

• Large Sample: The rule of thumb is that if n is greater than or equal to 25, the sampling
distribution of x̄ will be approximately normal no matter the shape of the population.

n ≥ 25
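
A small simulation can make the theorem concrete. The sketch below assumes a hypothetical right-skewed population (exponential, with mean 1 and standard deviation 1) and draws many random samples of size n = 25. The population, the seed, and the number of samples are all choices made for illustration.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is repeatable

n = 25              # sample size (the rule-of-thumb minimum)
num_samples = 5000  # number of repeated samples

# Each entry is the mean of one random sample of size n drawn from the
# assumed exponential population (mean 1, standard deviation 1).
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

print(round(mean(sample_means), 3))   # expected to be near the population mean, 1
print(round(stdev(sample_means), 3))  # expected to be near 1 / sqrt(25) = 0.2
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is strongly skewed.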

Steps to find the probability of the Sampling Distribution of x̄.

• Step 1: Describe the Sampling Distribution of the Sample Means.

– Find mean of the sampling distribution of x̄:

µx̄=µ

– Find the standard deviation of the sampling distribution of x̄ (σx̄). This is also known
as the standard error.

σx̄ = σ / √n

– Check the shape/distribution of the Sampling Distribution of x̄.

∗ If the population distribution is normally distributed, then the sampling distribution
will also follow a normal distribution.

∗ If the population distribution is NOT normally distributed, the sampling distribution
will be approximately normally distributed (CLT) if the sample is random and the
sample size is large:

n ≥ 25

∗ If neither of the two cases above applies, then the sampling distribution cannot be
assumed to be normal and the probability cannot be found this way.

• Step 2: Convert the x̄ into a z-value:

z = (x̄ − µx̄) / σx̄

• Step 3: Draw a standard normal distribution and shade the appropriate side.

[Figure: standard normal curve, horizontal axis from −3 to 3.]

• Step 4: Find the probability using technology.

Example: According to data from the National Health and Nutrition Exam Survey, the mean weight of
10-year-old boys is 88.3 pounds with a standard deviation of 2.06 pounds. Assume the distribution of
weights is Normal.

a. Suppose we take a random sample of 49 boys from this population. Can we find the approximate
probability that the average weight of this sample will be above 89 pounds? If so, find it. If not,
explain why.

b. Suppose we take a random sample of 16 boys from this population. Can we find the approximate
probability that the average weight of this sample will be above 89 pounds? If so, find it. If not,
explain why not.
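
One way to check the arithmetic for this example is sketched below, following Steps 1, 2, and 4 above. Because the population of weights is assumed Normal, the sampling distribution of x̄ is Normal for both n = 49 and n = 16. The function name `prob_mean_above` is made up for this sketch; `NormalDist` comes from Python's standard `statistics` module.

```python
import math
from statistics import NormalDist

mu, sigma = 88.3, 2.06  # population mean and standard deviation (pounds)


def prob_mean_above(cutoff, n):
    """P(x-bar > cutoff) when the sampling distribution of x-bar is Normal."""
    standard_error = sigma / math.sqrt(n)  # Step 1: sigma / sqrt(n)
    z = (cutoff - mu) / standard_error     # Step 2: convert x-bar to z
    return 1 - NormalDist().cdf(z)         # Step 4: area to the right of z


p_49 = prob_mean_above(89, 49)
p_16 = prob_mean_above(89, 16)
print(round(p_49, 4), round(p_16, 4))
```

With these inputs the probabilities come out to roughly 0.009 for n = 49 and 0.087 for n = 16; the smaller sample has a larger standard error, so a sample mean above 89 pounds is less unusual.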

Example: Home prices in California have a distribution that is skewed right. The mean of the home
prices is $498,000 with a standard deviation of $25,200.

a. Suppose we take a random sample of 30 homes in California. What is the probability that the mean of
this sample is greater than $510,000?

b. Suppose we take a random sample of 10 homes in California. Can we find the approximate
probability that the mean of the sample is more than $510,000? If so, find it. If not, explain why not.

The t-distribution

Introduction to the t-Distribution (Student’s Distribution)

[Figure: density curves of the t-distribution for df = 1, df = 2, and df = 10, plotted together with the
Standard Normal(0, 1) curve; horizontal axis from −6 to 6, vertical axis from 0 to 0.4.]

Summary of Student’s t-Distribution

• The t-Distribution is used if σ is unknown

• The t-distribution is different for different degrees of freedom.

• The t-distribution is centered at 0 and is symmetric about 0.

• The area under the t-distribution curve is 1. The area under the t-distribution curve to the
right of 0 equals the area under the curve to the left of 0, which equals 1/2.

• As t increases or decreases without bound, the graph approaches, but never equals, zero.

• The area in the tails of the t-distribution is a little greater than the area in the tails of the
standard normal distribution, because we are using S as an estimate of σ, thereby introducing
further variability into the t-distribution.

• As the sample size n increases, the distribution curve of t gets closer to the standard normal
distribution. This result occurs because, as the sample size n increases, the values of S get
closer to the values of σ, by the Law of Large Numbers.
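
The last two points can be illustrated numerically. The sketch below compares the value that cuts off the upper 2.5% of a t-distribution, for several degrees of freedom, with the corresponding standard normal value. Python's standard library has no t-distribution, so the helpers `t_cdf` and `t_inverse_cdf` are illustrative assumptions of this sketch: they integrate the t density numerically (Simpson's rule) and invert by bisection, and are not meant as a reference implementation.

```python
import math
from statistics import NormalDist


def t_cdf(x, df, steps=1000):
    """P(T <= x) for Student's t with df degrees of freedom.

    Integrates the density on [0, |x|] with Simpson's rule and uses
    the symmetry of the distribution about 0.
    """
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    coef /= math.sqrt(df * math.pi)

    def pdf(u):
        return coef * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_inverse_cdf(p, df):
    """Find t with P(T <= t) = p by bisection (assumes |t| < 100)."""
    lo, hi = -100.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z_star = NormalDist().inv_cdf(0.975)
t_stars = {df: t_inverse_cdf(0.975, df) for df in (1, 2, 10, 30, 100)}
for df, t_star in t_stars.items():
    print(df, round(t_star, 3))
print("normal", round(z_star, 3))
```

The cut-off values shrink toward the normal value as df grows, which reflects the heavier tails at small df and the convergence to the standard normal curve at large df.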

9.3: Answering Questions about the Mean of a Population

There are two approaches for answering questions about a population mean:

• Confidence intervals: Used for estimating parameter values. We use the confidence interval
when the parameter is unknown.

• Hypothesis tests: Used for deciding whether a parameter’s value is one thing or another. We use
hypothesis testing when we have an idea of what the parameter’s value is, but we believe it is now
different from that value.

When to Use Confidence Intervals

Use confidence intervals whenever you are estimating the value of a population parameter on the basis of
a random sample.

Do NOT use a confidence interval if there is no uncertainty in your estimate. If you have data for the
entire population, you do not need to find a confidence interval, since the population parameter is
known and there is no need to estimate it.

Recall: The Central Limit Theorem (CLT) says that, regardless of the shape of the underlying
population (Parent distribution), the sampling distribution (Child distribution) of x̄ will
follow an approximately normal distribution as the sample size, n, increases.

Conditions for the Central Limit Theorem (CLT)

• Random Sample: The sample is obtained using simple random sampling.

• Large Sample: The rule of thumb is that if n is greater than or equal to 25, the sampling
distribution of x̄ will be approximately normal no matter the shape of the parent population.

n ≥ 25

Constructing a (1 − α) × 100% Confidence Interval for the population mean (µ).
Steps to find the confidence interval for µ:

• Find the point estimate.

– Point Estimate = x̄

• Find the critical value (tα/2 = t∗).

– t∗ = TDist(df).inverseCDF(1 − α/2)

– Degrees of Freedom (df) = n − 1

– α = 1 − Confidence Level

• Find the standard error.

– Standard Error = σx̄ = s / √n

• Find the margin of error (E).

– Margin of Error = t∗ × σx̄

• Find the confidence interval.

– Point Estimate ± Margin of Error

– x̄ − E < µ < x̄ + E

Interpretation of the Confidence Level

The confidence level is a measure of how well the method used to produce the confidence interval
performs. For example, a 95% confidence level means that if we were to take many random samples of
the same size from the same population and build an interval from each, we expect about 95% of the
intervals to “work” (contain the population parameter) and about 5% of them to be “wrong” (not
contain the population parameter).

Example: Data on the speed (in mph) for a random sample of 30 cars traveling on a highway were
collected. The sample mean speed was 63.3 mph with a sample standard deviation of 5.23 mph.

a. Find the 95% confidence interval for the mean speed of all cars traveling on the highway.

b. Is it plausible that the mean speed of cars on the highway is 67 mph? Why or why not?
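
As a hedged check of this example, the steps for constructing a confidence interval for µ can be sketched in Python with the numbers given here. The helpers `t_cdf` and `t_inverse_cdf` are assumptions of this sketch: the standard library has no t-distribution, so they integrate the t density numerically and invert it by bisection, standing in for whatever technology is used in class.

```python
import math


def t_cdf(x, df, steps=1000):
    """P(T <= x) for Student's t, via Simpson's rule on [0, |x|] and symmetry."""
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    coef /= math.sqrt(df * math.pi)

    def pdf(u):
        return coef * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_inverse_cdf(p, df):
    """Find t with P(T <= t) = p by bisection (assumes |t| < 100)."""
    lo, hi = -100.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n, x_bar, s = 30, 63.3, 5.23       # sample size, sample mean, sample sd (mph)
alpha = 1 - 0.95                   # alpha = 1 - confidence level
df = n - 1                         # degrees of freedom
t_star = t_inverse_cdf(1 - alpha / 2, df)   # critical value
standard_error = s / math.sqrt(n)           # s / sqrt(n)
margin = t_star * standard_error            # margin of error E
lower, upper = x_bar - margin, x_bar + margin

print(round(lower, 2), round(upper, 2))
print(lower <= 67 <= upper)  # is 67 mph inside the interval?
```

With these inputs the interval is roughly 61.3 to 65.3 mph, so 67 mph falls outside it.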

Example: A used car website wanted to estimate the mean price of a Nissan Altima. The site gathered
data on a random sample of 26 such cars and found a sample mean of $16,610 and a sample standard
deviation of $2736.

a. Describe the population. Is the number $16,610 an example of a parameter or a statistic?

b. Verify that the conditions for a valid confidence interval are met.

c. Construct a 90% confidence interval for the mean cost of this model car based on these data.

9.4: Hypothesis Testing for Means

Just as with hypothesis tests for proportions, hypothesis tests can be one-sided or two-sided depending on
the research question. The choice of the alternative hypothesis determines how the p-value is calculated.

Two-Tailed        Left-Tailed       Right-Tailed
H₀: µ = µ₀        H₀: µ = µ₀        H₀: µ = µ₀
Hₐ: µ ≠ µ₀        Hₐ: µ < µ₀        Hₐ: µ > µ₀

Finding the Test Statistic for a Population Mean when σ is unknown

µx̄ = µ

σx̄ = s / √n

t₀ = (x̄ − µx̄) / σx̄

t₀ follows Student’s t-distribution with n − 1 degrees of freedom.

Recall: The Central Limit Theorem (CLT) says that, regardless of the shape of the underlying
population (Parent distribution), the sampling distribution (Child distribution) of x̄ will
follow an approximately normal distribution as the sample size, n, increases.

Conditions for the Central Limit Theorem (CLT)

• Random Sample: The sample is obtained using simple random sampling.

• Large Sample: The rule of thumb is that if n is greater than or equal to 25, the sampling
distribution of x̄ will be approximately normal no matter the shape of the parent population.

n ≥ 25

• Prework

– In a sentence, describe the parameter that is being tested.

– Find and label the following values:
n, x̄, µ, s, and degrees of freedom (df)

– Check the conditions of the Central Limit Theorem if the distribution is not normal.

∗ Simple Random Sample: The sample is obtained by simple random sampling.

∗ Large Sample: n ≥ 25

• State the null and alternative hypotheses:
H₀ :
Hₐ :

• Compute the test statistic (t₀):

µx̄ = µ

σx̄ = s / √n

t₀ = (x̄ − µx̄) / σx̄

• P-value method and interpretation.

– Type of the test

∗ Left-tailed test
∗ Two-tailed test
∗ Right-tailed test

– Compare the p-value with α.

∗ P-value ≤ α
· Reject H₀
∗ P-value > α
· Fail to reject H₀

• Stating a conclusion interpreting the results of the hypothesis test:

– Once we have found the p-value and made a statistical decision about the null hypothesis
(i.e. we will reject the null or fail to reject the null), we then want to summarize our
results into an overall conclusion for our test.

Example: Susan is in charge of quality control at a small fruit juice bottling plant. Each bottle
produced is supposed to contain exactly 12 fluid ounces (fl oz) of juice. Susan decides to test this by
randomly sampling 30 filled bottles and carefully measuring the amount of juice inside each. She will
recalibrate the machinery if the average amount of juice per bottle differs from 12 fl oz at the 1%
significance level. The sample of 30 bottles has an average of 11.92 fl oz per bottle and a standard
deviation of 0.26 fl oz. Should Susan recalibrate the machinery?
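
A hedged sketch of the two-tailed test for this example follows, using the test statistic t = (x̄ − µ) / (s / √n) with n − 1 degrees of freedom. The helper `t_cdf` is an assumption of this sketch: it approximates the t-distribution's cumulative probability by numerical integration, standing in for whatever technology is used in class.

```python
import math


def t_cdf(x, df, steps=1000):
    """P(T <= x) for Student's t, via Simpson's rule on [0, |x|] and symmetry."""
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    coef /= math.sqrt(df * math.pi)

    def pdf(u):
        return coef * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


n, x_bar, s = 30, 11.92, 0.26   # sample size, sample mean, sample sd (fl oz)
mu_0 = 12                       # value of mu under the null hypothesis
alpha = 0.01                    # significance level

standard_error = s / math.sqrt(n)
t_stat = (x_bar - mu_0) / standard_error
df = n - 1

# Two-tailed test: double the area beyond |t| in one tail.
p_value = 2 * (1 - t_cdf(abs(t_stat), df))
reject_null = p_value <= alpha

print(round(t_stat, 3), round(p_value, 3), reject_null)
```

With these inputs the test statistic is about −1.69 and the two-tailed p-value is roughly 0.10, which is larger than 0.01, so the sketch fails to reject the null hypothesis.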

Example: The board of a major credit card company requires that the mean wait time for customers for
service calls is at most 3.00 minutes. To make sure that the mean wait time is not exceeding the
requirement, an assignment manager tracks the wait times of 45 randomly selected calls. The mean wait
time was calculated to be 3.4 minutes with a standard deviation of 1.45 minutes. Is there sufficient
evidence to say that the mean wait time is longer than 3.00 minutes at a significance level of 2.5%?

9.5: Comparing Two Population Means

Independent vs. Dependent Samples

When comparing two populations, it is important to note whether the data are two independent samples
or are paired (dependent) samples.

In paired (dependent) samples, each observation in one group is coupled or paired with one particular
observation in the other group. Typical cases:

• “Before and after” comparisons

• Related objects/people (twins, siblings, spouses)

Example: People chosen in a random sample are asked how many minutes they spent reading and how
many minutes they spent exercising during a certain day. Researchers wanted to know how different the
mean amounts of time were for each activity. Would this study be considered a dependent or
independent sample?

Example: A sample of men and women each had their hearing tested. Researchers wanted to know
whether, typically, men and women differed in their hearing ability. Would this study be considered a
dependent or independent sample?

Example: A random sample of married couples are asked how many minutes per day they spent
exercising. Means were compared to see if the mean exercise times for husbands and wives differed.
Would this study be considered a dependent or independent sample?

Confidence Intervals: Independent Samples

To construct the confidence interval for the difference in population means given independent samples,
check four conditions:

Central limit theorem

• The samples are obtained using simple random sampling;

• The samples are independent;

• The populations from which the samples are drawn are normally distributed or the sample sizes are
large (n1 ≥ 25, n2 ≥ 25);

• For each sample, the sample size is no more than 10% of the population size (10 × n ≤ size of the
population of interest).

Formula for constructing a confidence interval about the difference between the population
means

• Point estimate for the difference of the means = x̄1 − x̄2

• SEest = √(S1²/n1 + S2²/n2)

• Margin of Error = t∗ × SEest

• Point estimate of the difference of the means ± Margin of Error

t∗ is based on an approximate t-distribution with df as the smaller of n1 – 1 and n2 – 1. For more
accuracy, use technology.

Interpreting Confidence Intervals: Independent Samples µ1 – µ2

Interpreting confidence intervals for the difference of population means given independent samples is the
same as interpreting confidence intervals for the difference of population proportions.

1. If 0 is in the interval, there is no significant difference between µ1 and µ2.

2. If both values in the confidence interval are positive, we conclude that µ1 > µ2.

3. If both values in the confidence interval are negative, we conclude that µ1 < µ2.

Example: A young statistics professor decided to give a quiz in class every week. He was not sure if the
quiz should occur at the beginning of class when the students are fresh or at the end of class when
they’ve gotten warmed up with some statistical thinking. Since he was teaching two sections of the same
course that performed equally well on past quizzes, he decided to do an experiment. He randomly chose
the first class to take the quiz during the second half of the class period (Late) and the other class took
the same quiz at the beginning of their hour (Early). He put all of the grades into a data set, found the
mean and standard deviation for both groups, and put them in Table 1. Construct an 80% confidence
interval for the difference in mean quiz scores.

Timing of the quiz    Sample size    Mean     Standard Deviation
Late                  32             22.56    5.13
Early                 30             19.73    6.61

Table 1: Quiz Timing
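
The interval for this example can be sketched as follows, using the standard error √(S1²/n1 + S2²/n2) and the conservative df (the smaller of n1 − 1 and n2 − 1). Treating Late as group 1 and Early as group 2 is a choice made for this sketch, and the helpers `t_cdf` and `t_inverse_cdf` are illustrative stand-ins for technology, based on numerical integration and bisection.

```python
import math


def t_cdf(x, df, steps=1000):
    """P(T <= x) for Student's t, via Simpson's rule on [0, |x|] and symmetry."""
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    coef /= math.sqrt(df * math.pi)

    def pdf(u):
        return coef * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_inverse_cdf(p, df):
    """Find t with P(T <= t) = p by bisection (assumes |t| < 100)."""
    lo, hi = -100.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n1, mean1, s1 = 32, 22.56, 5.13   # Late (group 1)
n2, mean2, s2 = 30, 19.73, 6.61   # Early (group 2)
alpha = 1 - 0.80                  # 80% confidence level

difference = mean1 - mean2                       # point estimate
se_est = math.sqrt(s1**2 / n1 + s2**2 / n2)      # estimated standard error
df = min(n1 - 1, n2 - 1)                         # conservative df
t_star = t_inverse_cdf(1 - alpha / 2, df)
margin = t_star * se_est
lower, upper = difference - margin, difference + margin

print(round(lower, 2), round(upper, 2))
```

With the conservative df of 29, the interval for µLate − µEarly is roughly 0.85 to 4.81 points. Software that uses the Welch degrees of freedom may give slightly different endpoints.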

Example: In a packing plant, a machine packs cartons with jars. It is supposed that a new machine will
pack faster on the average than the machine currently used. The times it takes each machine to pack ten
cartons are recorded. Assume both datasets to be normally distributed. The results of the machines, in
seconds, are shown in the following table.

                      New machine    Old machine
Mean                  42.14          43.23
Standard deviation    0.683          0.750

Construct a 95% confidence interval for the difference in the mean package time for the old and new
machine.

Inference about Two Means: Independent Samples

To test hypotheses regarding two population means, µ1 and µ2 , with unknown population standard
deviations, we can use the following steps, provided that:

Central limit theorem (check for normality)

• The samples are obtained using simple random sampling;

• The samples are independent;

• The populations from which the samples are drawn are normally distributed or the sample sizes are
large (n1 ≥ 25, n2 ≥ 25);

• For each sample, the sample size is no more than 10% of the population size.

Sampling Distribution of the Difference of Two Means: Independent Samples with
Population Standard Deviations Unknown (Welch’s t)

Steps to find the test statistic:

1. Difference in sample means = (x̄1 − x̄2)

2. SEest = √(S1²/n1 + S2²/n2)

3. t = (Difference in sample means) / SEest

4. If all the conditions are met for the CLT, the test statistic approximately follows Student’s
t-distribution with the smaller of n1 − 1 or n2 − 1 degrees of freedom, where x̄i is the sample mean
and Si is the sample standard deviation from population i.

Two-Tailed        Right-Tailed      Left-Tailed
H₀: µ1 = µ2       H₀: µ1 = µ2       H₀: µ1 = µ2
Hₐ: µ1 ≠ µ2       Hₐ: µ1 > µ2       Hₐ: µ1 < µ2

Table 2: Determine the null and alternative hypotheses.

The degree of freedom for the critical value is calculated by using the smaller of n1 – 1 or n2 – 1 degrees
of freedom.

Example: A statistics professor was handing out midterm grade slips on a Friday which happened to be
the day before the school’s Spring break. He noticed that there were an unusually large number of
students missing from class that day. So he collected the leftover grade slips and created the data in
Table 3 that summarize the midterm grades (out of a possible 100) for students that attended and
missed class. The professor had reason to suspect, before even looking at the data, that, in general,
students who missed class would tend to have lower mean midterm grades. Test the professor’s claim at a
significance level of 10%. You may assume that the data for both groups are reasonably symmetric and
have no strong outliers.

                n     Mean    Standard Deviation
In class        15    80.9    11.07
Missed class    9     68.2    9.26
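
A hedged sketch of the test for this example follows. It treats "In class" as group 1 and "Missed class" as group 2, so the professor's suspicion corresponds to the right-tailed alternative µ1 > µ2; that labeling is a choice made for this sketch. The helper `t_cdf` is an illustrative stand-in for technology, based on numerical integration.

```python
import math


def t_cdf(x, df, steps=1000):
    """P(T <= x) for Student's t, via Simpson's rule on [0, |x|] and symmetry."""
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    coef /= math.sqrt(df * math.pi)

    def pdf(u):
        return coef * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


n1, mean1, s1 = 15, 80.9, 11.07   # in class (group 1)
n2, mean2, s2 = 9, 68.2, 9.26     # missed class (group 2)
alpha = 0.10                      # significance level

difference = mean1 - mean2
se_est = math.sqrt(s1**2 / n1 + s2**2 / n2)
t_stat = difference / se_est
df = min(n1 - 1, n2 - 1)          # conservative degrees of freedom

# Right-tailed test (Ha: mu1 > mu2): area to the right of t_stat.
p_value = 1 - t_cdf(t_stat, df)
reject_null = p_value <= alpha

print(round(t_stat, 3), round(p_value, 4), reject_null)
```

With these inputs the test statistic is about 3.02 on 8 degrees of freedom and the p-value is below 0.01, so at the 10% level the sketch rejects the null hypothesis.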

Example: A young statistics professor decided to give a quiz in class every week. He was not sure if the
quiz should occur at the beginning of class when the students are fresh or at the end of class when
they’ve gotten warmed up with some statistical thinking. Since he was teaching two sections of the same
course that performed equally well on past quizzes, he decided to do an experiment. He randomly chose
the first class to take the quiz during the second half of the class period (Late) and the other class took
the same quiz at the beginning of their hour (Early). He put all of the grades into a data set, found the
mean and standard deviation for both groups, and put them in Table 4. Test whether it makes a difference
when the quiz is given during the class, at a significance level of 20%.

Timing of the quiz    Sample size    Mean     Standard Deviation
Late                  32             22.56    5.13
Early                 30             19.73    6.61

Example: In a packing plant, a machine packs cartons with jars. It is supposed that a new machine will
pack faster on the average than the machine currently used. To test that hypothesis, the times it takes
each machine to pack ten cartons are recorded. Assume both datasets to be normally distributed. The
results of the machines, in seconds, are shown in the following table. Do the data provide sufficient
evidence to conclude that, on the average, the new machine packs faster? Perform the required hypothesis
test at the 5% level of significance.

                      New machine    Old machine
Mean                  42.14          43.23
Standard deviation    0.683          0.750

Dependent Samples

• Transform the original data from two variables into a single variable that contains the difference
between the scores in Group 1 and Group 2.

• After the differences have been computed, we can apply either a confidence interval approach or a
hypothesis test approach to the differences.

Construct and Interpret Confidence Intervals for the Population Mean Difference of Matched-
Pairs Data:

• x̄difference is the mean of the differences between the two data sets.

• Sdifference is the standard deviation of the differences.

• A (1 – α) × 100% confidence interval for µdifference is given by:

1. SEdifference = Sdifference / √n

2. Margin of Error = t∗ × SEdifference, where t∗ has n − 1 degrees of freedom

3. x̄difference ± Margin of Error

Example: Do you think your pulse rate is higher when you are taking a quiz than when you are sitting
in a lecture? The data in Table 5 show pulse rates collected from 10 students in a class lecture and then
from the same students during a quiz. Assume that the data has a normal distribution.

Student 1 2 3 4 5 6 7 8 9 10
Quiz 75 52 52 80 56 90 76 71 70 66
Lecture 73 53 47 88 55 70 61 75 61 78
Difference

Table 3: Quiz and lecture pulse rates for 10 students

Using the data in the table, construct a 95% confidence interval for the difference in mean pulse rate
between students in a class lecture and taking a quiz.
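
The matched-pairs interval for this example can be sketched as follows. Defining each difference as quiz minus lecture is a choice made for this sketch (the opposite order only flips the signs), and the helpers `t_cdf` and `t_inverse_cdf` are illustrative stand-ins for technology, based on numerical integration and bisection.

```python
import math
from statistics import mean, stdev


def t_cdf(x, df, steps=1000):
    """P(T <= x) for Student's t, via Simpson's rule on [0, |x|] and symmetry."""
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    coef /= math.sqrt(df * math.pi)

    def pdf(u):
        return coef * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_inverse_cdf(p, df):
    """Find t with P(T <= t) = p by bisection (assumes |t| < 100)."""
    lo, hi = -100.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


quiz = [75, 52, 52, 80, 56, 90, 76, 71, 70, 66]
lecture = [73, 53, 47, 88, 55, 70, 61, 75, 61, 78]

# One difference per student (quiz minus lecture).
differences = [q - l for q, l in zip(quiz, lecture)]
n = len(differences)

mean_diff = mean(differences)
se_diff = stdev(differences) / math.sqrt(n)   # S_difference / sqrt(n)
t_star = t_inverse_cdf(0.975, n - 1)          # 95% confidence, df = n - 1
margin = t_star * se_diff
lower, upper = mean_diff - margin, mean_diff + margin

print(differences)
print(round(lower, 2), round(upper, 2))
```

With these data the mean difference is 2.7 beats and the interval runs from roughly −4.4 to 9.8, so it contains 0.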

Test Hypotheses Regarding Matched-Pairs/Dependent Data


Remark: Statistical inference methods on matched-pairs data use the same methods as inference on a
single population mean, except that the differences are analyzed.

To test hypotheses regarding the mean difference of matched-pairs data, the following must be satisfied:

• The sample is obtained using simple random sampling.

• The sample data are matched pairs.

• The differences are normally distributed with no outliers or the sample size, n, is large (n ≥ 25).

Determine the null and alternative hypotheses. The hypotheses can be structured in one of three ways,
where µdifference is the population mean difference of the matched-pairs data.

Two-Tailed              Left-Tailed             Right-Tailed
H₀: µdifference = 0     H₀: µdifference = 0     H₀: µdifference = 0
Hₐ: µdifference ≠ 0     Hₐ: µdifference < 0     Hₐ: µdifference > 0

Compute the test statistic for matched-pairs data:

SEdifference = Sdifference / √n

t = (x̄difference − 0) / SEdifference

which approximately follows Student’s t-distribution with n – 1 degrees of freedom.

The values of x̄difference and Sdifference are the mean and standard deviation of the differences.

Example: Do you think your pulse rate is higher when you are taking a quiz than when you are sitting
in a lecture? The data in Table 6 show pulse rates collected from 10 students in a class lecture and then
from the same students during a quiz. Assume that the data has a normal distribution.

Student 1 2 3 4 5 6 7 8 9 10
Quiz 75 52 52 80 56 90 76 71 70 66
Lecture 73 53 47 88 55 70 61 75 61 78
Difference

Using the data in the table, test whether pulse rate is higher when you are taking a quiz than when you
are sitting in a lecture at a significance level of 5%.
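
A hedged sketch of this matched-pairs test follows. Differences are taken as quiz minus lecture, so "higher during a quiz" corresponds to the right-tailed alternative µdifference > 0; that ordering is a choice made for this sketch. The helper `t_cdf` is an illustrative stand-in for technology, based on numerical integration.

```python
import math
from statistics import mean, stdev


def t_cdf(x, df, steps=1000):
    """P(T <= x) for Student's t, via Simpson's rule on [0, |x|] and symmetry."""
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    coef /= math.sqrt(df * math.pi)

    def pdf(u):
        return coef * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


quiz = [75, 52, 52, 80, 56, 90, 76, 71, 70, 66]
lecture = [73, 53, 47, 88, 55, 70, 61, 75, 61, 78]
alpha = 0.05  # significance level

# One difference per student (quiz minus lecture).
differences = [q - l for q, l in zip(quiz, lecture)]
n = len(differences)

se_diff = stdev(differences) / math.sqrt(n)   # S_difference / sqrt(n)
t_stat = (mean(differences) - 0) / se_diff
df = n - 1

# Right-tailed test (Ha: mu_difference > 0): area to the right of t_stat.
p_value = 1 - t_cdf(t_stat, df)
reject_null = p_value <= alpha

print(round(t_stat, 3), round(p_value, 3), reject_null)
```

With these data the test statistic is about 0.86 on 9 degrees of freedom and the p-value is roughly 0.2, which is larger than 0.05, so the sketch fails to reject the null hypothesis.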

A Lot of Repetition…

• The hypothesis test for two means is very similar to the test for one mean.

• The hypothesis test for paired data is really a special case of the one-sample t-test.

• Hypothesis tests use almost the same calculations as confidence intervals and they impose the same
conditions.

Don’t Accept H₀

If the p-value is larger than the significance level, we do not reject the null hypothesis.

This is different from “accepting” the null hypothesis. Just because we do not reject the null hypothesis
does not mean that we now believe the null hypothesis is true.

Confidence Intervals and Hypothesis Tests

If the alternative hypothesis is two-sided, a confidence interval can be used instead of a hypothesis test.
When the confidence level is (1 − α) × 100% and the significance level is α, both approaches reach the
same conclusion.

1. The standard deviation of the sampling distribution is called the

a. Unbiased estimator.

b. Standard error.

c. Standard deviation.

d. p-value.

2. What effect, if any, does the sample size have on the standard error?

a. As sample size increases, the standard error decreases.

b. As sample size increases, the standard error also increases.

c. The sample size has no effect on the standard error.

3. As a rule of thumb, we can apply the Central Limit Theorem for Sample Means for population
distributions which may not be Normal if the sample size is at least

a. 10.

b. 20.

c. 25.

d. 50.

4. Which of the following is NOT a condition that must hold in order to construct a confidence
interval for the difference of two population means?

a. Both samples are taken randomly from their populations.

b. The two samples are independent of each other.

c. The population must be at least 10 times larger than the sample.

d. The populations are approximately normal or each sample size is at least 25.

5. In a study on weekly time spent in employment, researchers constructed a confidence interval for
the difference in weekly mean employment times for men (Population 1) and women (Population
2). The resulting interval was (-2.14, 1.15). Based on this interval, we can conclude

a. Men spend between 2.14 and 1.15 more hours weekly in employment than women.

b. Women spend between 2.14 and 1.15 more hours weekly in employment than men.

c. There is no difference in the mean employment times for men and women.
