
Lecture 9


- Descriptive statistics: 2 methods: visualization + numerical method

- Probability and probability distributions: Binomial, Poisson, Uniform, Normal

- Inferential statistics: 2 methods: Estimation + hypothesis testing

2 parameters:

+ mean of normal distribution, µ (normal distribution)

+ proportion, p (Binomial)

CHAPTER 8: Estimation – Confidence Interval.

Why: the population variable (score of EBBA students) is continuous.

Score: X ~ N(µ, σ^2). We want to estimate the parameter µ, the population mean.

To estimate it, we take a sample with n observations.

We can estimate µ by the sample mean, x_bar: a point estimator (descriptive statistic)

Because the point estimator does not give a level of confidence, we use a confidence interval for µ: find an interval (a,b) such that:

P(a < µ < b) = 1-α

(1-α) is the confidence level, often 95%, 90%, 99%

The interval (a,b) is called the confidence interval of µ with confidence level 1-α.

Constructing the formula:

Case 1: Confidence interval for the population mean, µ, when the population variance, σ^2, is known:

From the sample, we find the sample mean, x_bar.

The (1-α)100% CI for µ is:

x_bar ± zα/2 * (σ/sqrt(n))

Here zα/2 * (σ/sqrt(n)) = ME: the margin of error

zα/2: from the Z table: z0.025 = 1.96; z0.05 = 1.645; z0.005 = 2.58


How the CI changes if we change 1-α or the sample size n:

- Increase 1-α (n fixed): the CI is wider: more confident, less precise

- Increase n (1-α fixed): the CI is narrower: more precise
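A small Python sketch of these two effects. The values σ = 10 and n = 25 are made up for illustration and do not come from the lecture; `z_margin` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def z_margin(sigma, n, conf):
    """Margin of error z_{alpha/2} * sigma / sqrt(n) when sigma is known."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 critical value
    return z * sigma / sqrt(n)

# Illustrative (assumed) values: sigma = 10, n = 25
me_90 = z_margin(10, 25, 0.90)
me_99 = z_margin(10, 25, 0.99)     # higher confidence, same n
me_n100 = z_margin(10, 100, 0.90)  # same confidence, four times the sample

print(me_90, me_99, me_n100)
```

Raising the confidence level increases the margin of error, while quadrupling n halves it, because n enters through sqrt(n).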

Example:
Data were collected on the amount spent by 64 customers for lunch at a major Houston
restaurant. These data are contained in the file named Houston. Based upon past studies
the population standard deviation is known with σ = $6.
a. At 99% confidence, what is the margin of error?
b. Develop a 99% confidence interval estimate of the mean amount spent for lunch.

Variable: amount spent for lunch (by customers of the restaurant in Houston)

Assume this amount has a normal distribution N(µ, 6^2)

Margin of error: ME = zα/2 * (σ/sqrt(n)):

1-α = 0.99, zα/2 = z0.005 = 2.58: ME = 2.58*(6/sqrt(64)) = 2.58*(6/8) = 1.94

To find the confidence interval, we calculate the sample mean,

x_bar = 21.52 → the 99% CI for the population mean is: 21.52 +/- 1.94

(19.58, 23.46)

Interpretation: with 99% confidence, the mean amount spent for lunch is between $19.58 and $23.46.
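The same calculation as a short Python sketch. The critical value 2.58 is taken from the Z table as in the notes (the unrounded value is slightly smaller, about 2.576); `z_interval` is a hypothetical helper name.

```python
from math import sqrt

def z_interval(x_bar, sigma, n, z):
    """Return (ME, lower, upper) for x_bar ± z * sigma / sqrt(n)."""
    me = z * sigma / sqrt(n)
    return me, x_bar - me, x_bar + me

# Houston lunch example: n = 64, sigma = 6, x_bar = 21.52, z_0.005 = 2.58
me, lower, upper = z_interval(21.52, 6, 64, 2.58)
print(f"ME = {me:.3f}, CI = ({lower:.3f}, {upper:.3f})")
```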

 Sample size determination: given MEo, σ, 1-α. Sample size required


n = (zα/2)^2 * σ^2 / ME^2
Example: How many students should be taken into the sample if you want to estimate the 95% CI for their Math score with ME = 5 points? Assume σ = 3.5 points.

Solution: n = 1.96^2 * 3.5^2 / 25 = 1.88 → 2 students needed.
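A minimal Python sketch of the sample-size rule, rounding up because n must be a whole number. `required_n` is a hypothetical helper name.

```python
from math import ceil

def required_n(z, sigma, me):
    """Smallest integer n such that z * sigma / sqrt(n) <= me."""
    return ceil((z * sigma / me) ** 2)

# Math score example: 95% confidence (z = 1.96), sigma = 3.5, ME = 5
n_students = required_n(1.96, 3.5, 5)
print(n_students)
```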

Case 2: Confidence interval for the population mean, µ, when the population variance, σ^2, is unknown

When σ^2 is unknown, we replace σ with the sample standard deviation, s.

Now Z → Student's t: zα/2 → t(n-1, α/2)

The CI is:

x_bar ± t(n-1, α/2) * (s/sqrt(n))

Note that if the sample size n is large (n > 30) → t(n-1, α/2) ≈ zα/2

Example: Sales personnel for Skillings Distributors submit weekly reports listing the customer
contacts made during the week. A sample of 65 weekly reports showed a sample mean of
19.5 customer contacts per week. The sample standard deviation was 5.2. Provide 90% and
95% confidence intervals for the population mean number of weekly customer contacts for
the sales personnel.

Solution: variable X: customer contacts, X ~ N(µ, σ^2), σ^2 unknown

CI for µ:

x_bar ± t(n-1, α/2) * (s/sqrt(n))

x_bar = 19.5, s = 5.2, n = 65; for the 90% CI, t(n-1, α/2) = t(64, 0.05) = 1.67 (close to z0.05 = 1.645)

90% CI: 19.5 +/- 1.67*(5.2/sqrt(65)) → (18.42, 20.58)
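A Python sketch of the t interval. The standard library has no t table, so the critical value is passed in by hand (1.67, as used above); `t_interval` is a hypothetical helper name.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_crit):
    """Return (lower, upper) for x_bar ± t_crit * s / sqrt(n).

    t_crit is the table value t(n-1, alpha/2), supplied by the caller.
    """
    me = t_crit * s / sqrt(n)
    return x_bar - me, x_bar + me

# Skillings example: x_bar = 19.5, s = 5.2, n = 65, t(64, 0.05) = 1.67
lower, upper = t_interval(19.5, 5.2, 65, 1.67)
print(f"({lower:.2f}, {upper:.2f})")
```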

Case 3: Confidence interval for the population proportion, p

The variable of interest is qualitative (Gender, Preference…).

Take a sample of n observations, of which m elements have the qualitative value whose proportion we want to estimate.

The formula is:

pbar ± zα/2 * sqrt(pbar*(1-pbar)/n)

in which pbar is the sample proportion, pbar = m/n

Example: Estimate the proportion of students at NEU holding an IELTS certificate.

We take a sample of 150 NEU students, of whom 65 hold the certificate. Find the 95% CI for this proportion.

Solution: The 95% CI is:

pbar = 65/150, zα/2 = z0.025 = 1.96, n = 150

CI: 65/150 +/- 1.96*sqrt((65/150)*(1-65/150)/150)

ME = 0.0793 → 35.4% to 51.3%
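The IELTS example as a Python sketch, using the z value 1.96 from the table. `proportion_interval` is a hypothetical helper name.

```python
from math import sqrt

def proportion_interval(m, n, z):
    """Return (pbar, ME, lower, upper) for pbar ± z * sqrt(pbar*(1-pbar)/n)."""
    p_bar = m / n
    me = z * sqrt(p_bar * (1 - p_bar) / n)
    return p_bar, me, p_bar - me, p_bar + me

# 65 certificate holders in a sample of 150, 95% confidence
p_bar, me, lower, upper = proportion_interval(65, 150, 1.96)
print(f"ME = {me:.4f}, CI = ({lower:.3f}, {upper:.3f})")
```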

Example: To estimate the proportion of female attendees at a conference, we take a random sample of 200 attendees and find that 75 of them are female.

1. Produce the 98% confidence interval for the population proportion of females.
2. If we want an interval with a 2% margin of error, what sample size is required (the confidence level is still 98%)?

Solution: 1. The confidence interval for p (female proportion) is:

pbar ± zα/2 * sqrt(pbar*(1-pbar)/n)

pbar = 75/200 = 0.375, 1-α = 0.98 → zα/2 = z0.01 = 2.33

→ CI: 0.375 +/- 2.33*sqrt(0.375*(1-0.375)/200)

→ (0.295; 0.455)

Margin of error: ME = zα/2 * sqrt(pbar*(1-pbar)/n) = 2.33*sqrt(0.375*(1-0.375)/200) = 0.08

2. The formula for the sample size when estimating p is:

n = (zα/2)^2 * pbar*(1-pbar) / ME^2 = 2.33^2 * 0.375*(1-0.375)/(0.02^2) = 3180.996 → n = 3181

Note: If we do not have a value of p (pbar) → the maximum sample size needed is:

n = (zα/2)^2 * 0.25 / ME^2
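A Python sketch of both sample-size rules for a proportion. The default p = 0.5 gives the worst case, since p*(1-p) is largest (0.25) there. `required_n_proportion` is a hypothetical helper name.

```python
from math import ceil

def required_n_proportion(z, me, p=0.5):
    """Smallest integer n with z * sqrt(p*(1-p)/n) <= me.

    With no planning value for p, the default p = 0.5 is the worst case.
    """
    return ceil(z ** 2 * p * (1 - p) / me ** 2)

# Conference example: 98% confidence (z = 2.33), ME = 0.02
n_with_pbar = required_n_proportion(2.33, 0.02, p=0.375)
n_worst_case = required_n_proportion(2.33, 0.02)
print(n_with_pbar, n_worst_case)
```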


Hypothesis testing:

We assume a hypothesis about a parameter, then test this hypothesis.

Test for the population mean, µ, of a normal distribution, assuming σ^2 is unknown

Step 1: Specify the hypotheses: Ho: null hypothesis (=, ≥, ≤)

H1: alternative hypothesis (≠, <, >)

If we want to compare µ with µo, there are three possible types of hypothesis:

Two tail test: Ho: µ=µo; H1: µ ≠ µo

Left tail test: Ho: µ ≥ µo; H1: µ < µo

Right tail test: Ho: µ ≤ µo; H1: µ > µo

Example: - Test if the mean score in maths of Students is at least 75:

Ho: µ ≥ 75, H1: µ <75

- Test if the mean score in maths of Students is more than 75


Ho: µ ≤ 75 ; H1: µ >75

Step 2: Calculate the test statistic: use the t test (the test statistic follows a t distribution with n-1 degrees of freedom)

t = (x_bar - µo)/(s/sqrt(n))

Step 3: Rejection rule: when to reject Ho:

Given the significance level α (the probability of rejecting Ho when Ho is true):

Two tail test: Ho: µ=µo; H1: µ ≠ µo : Reject Ho if t > t(n-1, α/2) or t < -t(n-1, α/2)

Left tail test: Ho: µ ≥ µo; H1: µ < µo : Reject Ho if t < -t(n-1, α)

Right tail test: Ho: µ ≤ µo; H1: µ > µo : Reject Ho if t > t(n-1, α)

Step 4: Make conclusion

Example: To test the claim that the average GPA of NEU students is more than 3.3, we take a random sample of 30 students. The mean and standard deviation from this sample are 3.6 and 0.4. Do the test at the 5% significance level.

Solution:

Step 1: Hypotheses: Ho: µ ≤ 3.3; H1: µ > 3.3 (right tail)

Step 2: Calculate the test statistic (t value):

t = (x_bar - µo)/(s/sqrt(n))

t = (3.6-3.3)/(0.4/sqrt(30)) = 4.108

Step 3: Reject Ho if t > t(n-1, α) = t(29, 0.05) = 1.699

So we reject Ho, since 4.108 > 1.699

Step 4: At the 5% significance level, the average GPA of NEU students is more than 3.3.
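Steps 2 and 3 can be sketched in Python as follows. The critical value comes from the t table and is passed in by hand; `t_statistic` and `reject_h0` are hypothetical helper names.

```python
from math import sqrt

def t_statistic(x_bar, mu0, s, n):
    """One-sample t statistic (x_bar - mu0) / (s / sqrt(n))."""
    return (x_bar - mu0) / (s / sqrt(n))

def reject_h0(t, t_crit, tail):
    """Apply the rejection rule for a 'two', 'left' or 'right' tail test.

    t_crit is the positive table value: t(n-1, alpha/2) for a two tail
    test, t(n-1, alpha) for a one tail test.
    """
    if tail == "two":
        return abs(t) > t_crit
    if tail == "left":
        return t < -t_crit
    if tail == "right":
        return t > t_crit
    raise ValueError("tail must be 'two', 'left' or 'right'")

# GPA example: x_bar = 3.6, mu0 = 3.3, s = 0.4, n = 30, t(29, 0.05) = 1.699
t = t_statistic(3.6, 3.3, 0.4, 30)
decision = reject_h0(t, 1.699, "right")
print(f"t = {t:.3f}, reject Ho: {decision}")
```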

Example: A shareholders’ group, in lodging a protest, claimed that the mean tenure for a
chief executive officer (CEO) was at least nine years. A survey of companies reported in The Wall
Street Journal found a sample mean tenure of 7.27 years for CEOs with a standard
deviation of s = 6.38 years.
a. Formulate hypotheses that can be used to challenge the validity of the claim made by
the shareholders’ group.
b. Assume 85 companies were included in the sample. What is the p-value for your
hypothesis test?
c. At α = .01, what is your conclusion?

Solution:

Hypotheses: Ho: µ ≥ 9; H1: µ < 9

Calculate the t statistic: t = (7.27 - 9)/(6.38/sqrt(85)) = -2.5

Reject Ho if t < -t(84, 0.01) ≈ -2.37

So we reject Ho: the mean tenure of a CEO is less than 9 years.

Rule of the test using the p-value:

- If p-value ≤ α: reject Ho

- If p-value > α: do not reject Ho

In the above example: p-value = P(T < -2.5)

Example: Ex 29 on page 415:

Hypotheses: Ho: µ = 90000; H1: µ ≠ 90000

Calculate t from the data: sample mean x_bar = 85272; sample standard deviation s = 11039

t = (85272-90000)/(11039/sqrt(25)) = -2.14

Reject Ho if t > t(24, 0.025) = 2.064 or t < -t(24, 0.025) = -2.064

Reject Ho → the mean salary in Ohio differs from the national level.

The p-value = P(T < -2.14) + P(T > 2.14) = 2*P(T > 2.14) = 0.042726

Alpha = 0.05 → p-value < alpha → reject Ho

Note: the p-value in a two-tail test = 2 * the p-value in the one-tail test

One-Sample Test (Test Value = 90000)

                                                           95% Confidence Interval of the Difference
         t        df   Sig. (2-tailed)   Mean Difference   Lower        Upper
Salary   -2.141   24   .043              -4728.000         -9284.77     -171.23
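The two-tailed p-value in the Sig. (2-tailed) column can be approximated without statistical software. The sketch below integrates the t density numerically with Simpson's rule; this is an assumed approach chosen because the Python standard library has no t distribution, and it is not how the output above was produced. `t_pdf` and `t_upper_tail` are hypothetical helper names.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > |t|): 0.5 minus the Simpson integral over [0, |t|].

    steps must be even for the composite Simpson rule.
    """
    b = abs(t)
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Salary example: x_bar = 85272, mu0 = 90000, s = 11039, n = 25 (df = 24)
t = (85272 - 90000) / (11039 / sqrt(25))
p_two_tail = 2 * t_upper_tail(t, df=24)
print(f"t = {t:.3f}, two-tailed p-value = {p_two_tail:.4f}")
```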

 Test for the population proportion, p

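Two tail: Ho: p = p0; H1: p ≠ p0 : Reject Ho if z > zα/2 or z < -zα/2

Right tail: Ho: p ≤ p0; H1: p > p0 : Reject Ho if z > zα

Left tail: Ho: p ≥ p0; H1: p < p0 : Reject Ho if z < -zα

Test statistic:

z = (pbar - p0)/sqrt(p0*(1-p0)/n)

in which pbar = m/n is the sample proportion.

Rejection rule: as listed above for each type of test.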

Example: Test whether female students are the majority at NEU,

given a random sample of 300 students of whom 175 are female.

Conclude at the 5% significance level. What is the p-value?

Solution: Hypotheses:

Ho: p ≤ 0.5; H1: p > 0.5

pbar = 175/300 = 0.583

z = (0.583 - 0.5)/sqrt(0.5*0.5/300) = 2.88

Reject Ho if z > z0.05 = 1.645 → reject Ho → female students are the majority.

→ p-value = P(Z > 2.88) = 1 - P(Z < 2.88) = 1 - 0.998 = 0.002 << 0.05 → reject Ho
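A Python sketch of this right tail test, using `statistics.NormalDist` for the standard normal CDF. It works with the unrounded pbar, so the z value differs slightly in the third decimal from a hand calculation; `proportion_z` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z(m, n, p0):
    """z statistic for Ho about p0: (pbar - p0) / sqrt(p0*(1-p0)/n)."""
    p_bar = m / n
    return (p_bar - p0) / sqrt(p0 * (1 - p0) / n)

# 175 females in a sample of 300, p0 = 0.5, right tail test
z = proportion_z(175, 300, 0.5)
p_value = 1 - NormalDist().cdf(z)  # P(Z > z)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```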


Chapter 10: Test to compare two population parameters

Compare two population means:


Two variables: X1 ~ N(µ1, σ1^2); X2 ~ N(µ2, σ2^2)
Assumptions:
- Data are normal
- The two populations are independent
- The two populations have the same variance, σ1^2 = σ2^2

Compare µ1 and µ2:

Three types of hypothesis:
Two tail: Ho: µ1 = µ2; H1: µ1 ≠ µ2
Right tail: Ho: µ1 ≤ µ2; H1: µ1 > µ2
Left tail: Ho: µ1 ≥ µ2; H1: µ1 < µ2
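Two sample t test:

t = (xbar1 - xbar2)/se(xbar1 - xbar2)

in which:

se(xbar1 - xbar2) = sqrt(s^2 * (1/n1 + 1/n2))

and s^2 is the pooled sample variance:

s^2 = ((n1-1)*s1^2 + (n2-1)*s2^2)/(n1 + n2 - 2)

Rejection rule:
Two tail: Reject Ho if t > t(n1+n2-2, α/2) or t < -t(n1+n2-2, α/2)
Right tail: Reject Ho if t > t(n1+n2-2, α)
Left tail: Reject Ho if t < -t(n1+n2-2, α)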

Example:

Specific Motors of Detroit has developed a new automobile known as the M car. 12 M cars and 8 J cars (from Japan) were road tested to compare miles-per-gallon (mpg) performance. The sample statistics are:
sample mean (M) = 29.8 mpg, s1 = 2.56 mpg; sample mean (J) = 27.3 mpg, s2 = 1.81 mpg.
Test with alpha = 0.05.
Solution:
Hypotheses: Ho: µ1 = µ2; H1: µ1 ≠ µ2, where µ1 is the mean for the M car and µ2 is the mean for the J car.

s^2 = (11*2.56^2 + 7*1.81^2)/(12+8-2) = 5.279

se(xbar1-xbar2) = sqrt(5.279*(1/12+1/8)) = 1.049
t = (29.8-27.3)/1.049 = 2.38

Reject Ho if t > t(18, 0.025) = 2.101 or t < -2.101

So we reject Ho, as t = 2.38 > 2.101

Conclusion: the mean miles per gallon of the two types of cars are different.
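The pooled-variance calculation for this example as a Python sketch. The critical value 2.101 is the table value used above; `pooled_t` is a hypothetical helper name.

```python
from math import sqrt

def pooled_t(x1, s1, n1, x2, s2, n2):
    """Return (t, pooled variance, standard error), assuming equal variances."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (x1 - x2) / se, sp2, se

# M cars: mean 29.8, s1 = 2.56, n1 = 12; J cars: mean 27.3, s2 = 1.81, n2 = 8
t, sp2, se = pooled_t(29.8, 2.56, 12, 27.3, 1.81, 8)
reject = abs(t) > 2.101  # two tail test with t(18, 0.025) = 2.101
print(f"s^2 = {sp2:.3f}, se = {se:.3f}, t = {t:.2f}, reject Ho: {reject}")
```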

Example: Use the data HomePrices to compare the average prices of houses at
two points in time, 2006 and 2009: has the home price increased?

Solution:
Hypotheses: Ho: µ1 ≤ µ2; H1: µ1 > µ2 (increases)

F test for equal variances: Ho: σ1^2 = σ2^2

F = 1.531, p-value = 0.22 > 0.05 → do not reject Ho → equal variances

Test for equal means under the assumption of equal variances:

Ho: µ1 ≤ µ2; H1: µ1 > µ2
The two-tail p-value is very small; for the one-tail test the p-value is half of that, which is smaller still → reject Ho
→ The two means are different.
 Compare two population proportions, p1 and p2:

Hypothesis: Two tail: Ho: p1 = p2; H1: p1 ≠ p2

Test statistic: use the Z distribution

z = (pbar1 - pbar2)/s(pbar1 - pbar2)

in which:

s(pbar1 - pbar2) = sqrt(pbar1*(1-pbar1)/n1 + pbar2*(1-pbar2)/n2)

Example: To compare the proportion of students who hold the IELTS certificate in two universities, A and B, we take samples of 250 and 300 students respectively. Of those students, 120 and 160 respectively are IELTS holders.

Solution: Ho: p1 = p2; H1: p1 ≠ p2

Calculate the z value:

pbar1 = 120/250 = 0.48; pbar2 = 160/300 = 0.5333

s(pbar1 - pbar2) = sqrt(0.48*(1-0.48)/250 + 0.5333*(1-0.5333)/300) = 0.0428

z = (0.48 - 0.5333)/0.0428 = -1.25

Reject Ho if z > 1.96 or z < -1.96

So we do not reject Ho → there is no evidence that the proportions at the two universities differ.
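A Python sketch of this comparison, using the unpooled standard error from the formula in these notes. It carries the proportions without intermediate rounding, so the z value may differ slightly from a hand calculation with rounded proportions. `two_proportion_z` is a hypothetical helper name.

```python
from math import sqrt

def two_proportion_z(m1, n1, m2, n2):
    """z = (pbar1 - pbar2) / sqrt(pbar1*(1-pbar1)/n1 + pbar2*(1-pbar2)/n2)."""
    p1, p2 = m1 / n1, m2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se

# University A: 120 of 250; University B: 160 of 300
z = two_proportion_z(120, 250, 160, 300)
reject = abs(z) > 1.96  # two tail test with z0.025 = 1.96
print(f"z = {z:.3f}, reject Ho: {reject}")
```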
