Lecture Slides Sec 5 Power
Test result: reject null hypothesis
• If the null is true: α (Type I error)
• If the null is false: 1-β = power (correct decision)
Determinants of power
Power (1-β) depends on
δ = delta = true difference
σ = sigma = true SD or true variation
α = alpha = significance criterion
n = sample size
(Equivalently, the required n depends on δ, σ, α, and 1-β.)
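A minimal sketch of how these four quantities interact, assuming a two-sided, two-sample comparison of means with equal group sizes, a known σ, and the normal (z) approximation. The function names `power_two_sample` and `n_per_group` are illustrative, not from the course materials.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, SD 1


def power_two_sample(delta: float, sigma: float, alpha: float, n: int) -> float:
    """Approximate power of a two-sided, two-sample z test with n subjects per group."""
    z_crit = Z.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    se = sigma * sqrt(2 / n)            # SD of the difference in group means
    # The tiny chance of rejecting in the wrong direction is ignored.
    return Z.cdf(abs(delta) / se - z_crit)


def n_per_group(delta: float, sigma: float, alpha: float, power: float) -> int:
    """Smallest n per group reaching the target power under the same approximation."""
    z_a = Z.inv_cdf(1 - alpha / 2)
    z_b = Z.inv_cdf(power)              # about 0.84 for power = 0.80
    return ceil(2 * (sigma * (z_a + z_b) / delta) ** 2)


n = n_per_group(delta=5, sigma=10, alpha=0.05, power=0.80)
print("n per group:", n)
print("achieved power:", round(power_two_sample(5, 10, 0.05, n), 3))
```

With δ = 5, σ = 10, α = 0.05 and a target power of 0.80, this approximation asks for 63 subjects per group; a t-test based calculation typically gives a slightly larger n.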
Alpha versus Power
The top distribution shows the sampling distribution of a test
statistic under the assumption that delta (δ) is zero (the null
hypothesis is true).
[Figure: sampling distributions of the test statistic for δ = 0 (top, black) and δ = 3.5; horizontal axis from -3 to 7]
Areas under the curves and right of the vertical line are
α for the black curve and power for the other curves.
The power is larger for the red curve than for the blue.
Power Summary
Power increases as:
• True difference (δ) increases
• Sample size (n) increases
• α increases (less strict significance criterion)
• Patient heterogeneity (σ) decreases
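The four trends can also be checked by brute force. The sketch below assumes normally distributed outcomes, a known σ, n patients per group, and a two-sided z test; it simulates many hypothetical trials and counts how often the null hypothesis is rejected. `simulated_power` and its default settings are illustrative choices, not part of the course materials.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def simulated_power(delta, sigma, n, alpha=0.05, reps=2000, seed=1):
    """Fraction of simulated two-group trials in which a two-sided z test rejects H0."""
    rng = random.Random(seed)                     # fixed seed for reproducibility
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma * sqrt(2 / n)                      # sigma treated as known
    rejections = 0
    for _ in range(reps):
        control = [rng.gauss(0.0, sigma) for _ in range(n)]
        treated = [rng.gauss(delta, sigma) for _ in range(n)]
        z = (fmean(treated) - fmean(control)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps


for n in (10, 25, 50):
    print("n =", n, "simulated power:", simulated_power(delta=0.5, sigma=1.0, n=n))
```

Setting `delta=0` makes the rejection rate hover near α, since every rejection is then a Type I error rather than a correct detection.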
Multiple analyses
Exploratory vs confirmatory
Protein example
750 proteins are compared between two groups – 12 are significant at p < 0.05
Protein name | Atrial fib | Atherosclerosis | p value
RAS guanyl-releasing protein 2 | 33.3% | 0.0% | 0.0000
Glutathione S-transferase P | 38.9% | 100.0% | 0.0000
Selenium-binding protein 1 | 22.2% | 0.0% | 0.0000
Nucleosome assembly protein 1-like 4 | 16.7% | 0.0% | 0.0000
Integrin beta;Integrin beta-2 | 11.1% | 50.0% | 0.0000
Spectrin alpha chain, non-erythrocytic 1 | 11.1% | 0.0% | 0.0000
Pituitary tumor-transforming gene 1 protein-interacting | 11.1% | 0.0% | 0.0000
WW domain-binding protein 2 | 16.7% | 50.0% | 0.0000
Syntaxin-4 | 5.6% | 0.0% | 0.0006
CD9 antigen | 27.8% | 50.0% | 0.0013
ATP synthase-coupling factor 6, mitochondrial | 27.8% | 50.0% | 0.0013
Flotillin-1 | 77.8% | 100.0% | 0.0037
Aconitate hydratase, mitochondrial | 38.9% | 50.0% | 0.1142
Fructose-bisphosphate aldolase C | 94.4% | 100.0% | 0.4402
Alpha-adducin | 50.0% | 50.0% | 1.0000
40S ribosomal protein SA | 1.0% | 1.0% | 1.0000
Abl interactor 1 | 1.0% | 1.0% | 1.0000
Bone marrow proteoglycan;Eosinophil granule major basic | 1.0% | 1.0% | 1.0000
Tubulin alpha-4A chain | 100.0% | 100.0% | 1.0000
… (750 proteins total)
Exploratory vs confirmatory
Who killed Tweety Bird?
Did Sylvester do it?
Motivation (class discussion)
Tweety Bird is murdered by a cat who left a DNA
sample. The particular DNA profile found in the
sample is known to occur in one of every one
million cats. There is also about a 0.1% false
positive rate for this test.
Is the level of evidence (guilt) equal in these two
scenarios?
1. Sylvester only is tested and is a match.
2. A DNA database on 100,000 cats (but not all
cats), including Sylvester, is searched and
Sylvester is a match, although not necessarily
the only match. No prior belief that Sylvester
is guilty.
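To make the comparison concrete, a rough sketch of the arithmetic, assuming that an innocent cat "matches" either by sharing the profile or through a false positive (the two probabilities are simply added because both are small), and, hypothetically, that innocent cats match independently of one another.

```python
p_profile = 1e-6          # profile occurs in 1 of every 1,000,000 cats (slide)
p_false_pos = 0.001       # about 0.1% false positive rate (slide)

# Chance that an innocent cat tests as a match. Adding the two is an
# approximation that treats the two ways of matching as non-overlapping.
p_innocent_match = p_profile + p_false_pos

# Scenario 1: only Sylvester is tested.
expected_innocent_1 = 1 * p_innocent_match

# Scenario 2: a database of 100,000 cats is searched.
n_searched = 100_000
expected_innocent_2 = n_searched * p_innocent_match
p_any_innocent_2 = 1 - (1 - p_innocent_match) ** n_searched

print(f"Scenario 1: {expected_innocent_1:.6f} innocent matches expected")
print(f"Scenario 2: {expected_innocent_2:.1f} innocent matches expected, "
      f"P(at least one) = {p_any_innocent_2:.4f}")
```

Under these assumptions a search of 100,000 cats is expected to turn up about 100 innocent matches, so a match found by searching carries far less weight than a match for a single suspect tested on prior grounds.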
Motivation (class discussion)
The “disease score” ranges from 2 (good) to
12 (worst).
Scenario A: Due to prior suspicion (prior
information), only patients 19 and 47 are
measured and both have scores of 12. We
report that they are “significantly” ill.
Scenario B: The score is measured on 72
patients. Only patients 19 and 47 have
scores of 12. We report that they are
“significantly” ill.
Is the amount of “evidence” or “belief” that
patients 19 and 47 “really” are very ill (have
“true” score of 12) the same in both
scenarios? The data for patients 19 and 47
are the same in both scenarios.
Most would agree that if both patients were
retested (confirmation step) and came out
with lower scores, this would decrease the
belief that their “true” score is 12. If they
came out with 12 again, this would increase
the belief that the true score is 12.
Multiple testing
“If you torture the data long enough, it will eventually
confess”
Two different situations for a new arthritis treatment compared
with aspirin.
A. Only pain (0-10) and swelling (0-10) are measured. Both
are significantly better at p < 0.05 on the new treatment
compared to aspirin.
B. Ten different outcomes measured: pain, swelling, activities
of daily living, quality of life, sleep, walking, bending, lifting,
grinding, climbing. Only the two that are significant are
reported after all 10 are evaluated. (fraud?)
Number of independent tests | P(at least one false positive at α = 0.05)
1 | 0.0500
2 | 0.0975
3 | 0.1426
4 | 0.1855
5 | 0.2262
10 | 0.4013
20 | 0.6415
25 | 0.7226
50 | 0.9231
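The table follows from 1 − (1 − α)^k, assuming the k tests are independent and every null hypothesis is true. A short sketch that regenerates it; the function name is illustrative.

```python
ALPHA = 0.05


def prob_at_least_one_false_positive(k: int, alpha: float = ALPHA) -> float:
    """P(at least one p < alpha among k independent tests when every null is true)."""
    return 1 - (1 - alpha) ** k


for k in (1, 2, 3, 4, 5, 10, 20, 25, 50):
    print(f"{k:>2}  {prob_at_least_one_false_positive(k):.4f}")
```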
Multiple testing: what to do?
Option 1: Use the nominal alpha level for significance. Creates too many false positives.
Option 2: Use the Bonferroni criterion: declare significance if p < α/m when m tests are made. Creates too many false negatives.
Option 3: Use the Holm/Hochberg criterion, a compromise.
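A minimal sketch of Options 1 and 2, using hypothetical p values for the ten arthritis outcomes in situation B (illustrative numbers, not trial data). The helper `fwer_independent` assumes independent tests with every null true.

```python
def bonferroni(p_values, alpha=0.05):
    """Option 2: flag p values below alpha / m, where m is the number of tests."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]


def fwer_independent(m, per_test_alpha):
    """P(at least one false positive) for m independent tests with every null true."""
    return 1 - (1 - per_test_alpha) ** m


# Hypothetical p values for the ten outcomes (illustrative only).
p = [0.003, 0.04, 0.08, 0.12, 0.21, 0.33, 0.47, 0.58, 0.76, 0.91]

print("Option 1 (p < 0.05):", sum(x < 0.05 for x in p), "significant")
print("Option 2 (Bonferroni):", sum(bonferroni(p)), "significant")
print("Overall false positive rate, unadjusted:", round(fwer_independent(10, 0.05), 4))
print("Overall false positive rate, Bonferroni:", round(fwer_independent(10, 0.05 / 10), 4))
```

For the protein example, m = 750 gives α/m = 0.05/750 ≈ 0.000067, so of the 12 proteins with p < 0.05, those with p from 0.0006 to 0.0037 would no longer count as significant.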
Holm/Hochberg criterion
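Rule for m significance tests: order the p values from smallest to largest,
p(1) ≤ p(2) ≤ … ≤ p(m), and compare p(i) with α/(m − i + 1).
Keeps the overall (familywise) false positive rate at or below α. Holm's
step-down version holds for tests that are not necessarily independent;
Hochberg's step-up version assumes independent or positively dependent tests.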
[Figure: significance criterion for the i-th smallest p value (i = 1 to 5), m = 5, alpha = 0.05, comparing no adjustment, Bonferroni, and Hochberg; vertical axis: p value from 0 to 0.05]
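A sketch of the criteria in the figure and of the two procedures, assuming the standard definitions: both compare the i-th smallest p value with α/(m − i + 1), but Holm steps down from the smallest p value and stops at the first failure, while Hochberg steps up from the largest and, as a result, rejects at least as many hypotheses as Holm. The function names and the five p values are hypothetical.

```python
def criteria(m, alpha=0.05):
    """Per-rank thresholds (i, no adjustment, Bonferroni, Holm/Hochberg), as in the figure."""
    return [(i, alpha, alpha / m, alpha / (m - i + 1)) for i in range(1, m + 1)]


def _order(p_values):
    """Indices that sort p_values from smallest to largest."""
    return sorted(range(len(p_values)), key=lambda j: p_values[j])


def holm(p_values, alpha=0.05):
    """Step-down: reject from the smallest p upward; stop at the first p_(i) > alpha/(m-i+1)."""
    m = len(p_values)
    reject = [False] * m
    for i, j in enumerate(_order(p_values), start=1):
        if p_values[j] > alpha / (m - i + 1):
            break
        reject[j] = True
    return reject


def hochberg(p_values, alpha=0.05):
    """Step-up: find the largest i with p_(i) <= alpha/(m-i+1) and reject ranks 1..i."""
    m = len(p_values)
    order = _order(p_values)
    reject = [False] * m
    for i in range(m, 0, -1):                 # start from the largest p value
        if p_values[order[i - 1]] <= alpha / (m - i + 1):
            for j in order[:i]:
                reject[j] = True
            break
    return reject


for i, unadjusted, bonf, hoch in criteria(5):
    print(f"i={i}: none {unadjusted:.4f}  Bonferroni {bonf:.4f}  Holm/Hochberg {hoch:.4f}")

p = [0.020, 0.300, 0.004, 0.024, 0.011]     # hypothetical p values, m = 5
print("Bonferroni:", [x < 0.05 / 5 for x in p])
print("Holm:      ", holm(p))
print("Hochberg:  ", hochberg(p))
```

With these made-up values, Bonferroni declares one test significant, Holm two, and Hochberg four, which illustrates why Holm/Hochberg is described as a compromise.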
Truth | Not significant | Significant | Total
Null true | U | V | m0
Total | m-R | R | m