Evie McCrum-Gardner is Lecturer in Health Statistics, Schools of Nursing and Health Sciences, Health and Rehabilitation Sciences Research Institute, University of Ulster, Newtownabbey, Northern Ireland
Correspondence to: E McCrum-Gardner
Email: ee.gardner@ulster.ac.uk
Hypothesis testing
Researchers begin with a research hypothesis, for
example, that treatment A is better than treatment
B. Hypothesis testing involves expressing this as
a null hypothesis and performing the appropriate statistical test to investigate whether the null
hypothesis can be rejected. An example
of a null hypothesis is: there is no difference (in
the mean outcome measure) between treatments
A and B. The researcher wants to be able to reject
Power
Power is the probability of rejecting the null
hypothesis when the alternative hypothesis is
true. It measures the ability of a test to reject the
null hypothesis when it should be rejected. At
a given significance level, the power of the test
is increased by having a larger sample size. The
minimum acceptable power is usually considered to be 80%,
which means there is an eight in ten chance of
detecting a difference of the specified effect size if it truly exists.
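The relationship between power and sample size can be illustrated with a short calculation. The following is a minimal sketch, not the method used by any of the software packages discussed in this article: it assumes two independent groups of equal size, a common standard deviation, a two-sided test, and a normal approximation in place of the t distribution. The function name and the illustrative values (mean difference 5, standard deviation 10) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample(mean_diff, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided comparison of two independent means.

    Assumption: normal approximation with a common, known standard deviation.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)      # critical value for the chosen significance level
    se = sd * sqrt(2 / n_per_group)        # standard error of the difference in means
    shift = abs(mean_diff) / se            # true difference expressed in standard errors
    # Probability that the test statistic falls in either rejection region
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Power rises as the number of participants per group increases
for n in (20, 40, 64, 100):
    print(n, round(power_two_sample(mean_diff=5, sd=10, n_per_group=n), 3))
```

Under these assumptions, about 64 participants per group gives power close to the conventional 80% for a mean difference of half a standard deviation.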
Effect size
The effect size quantifies the difference between
two or more groups. It is a measure of the difference in the outcomes of the experimental
and control groups, i.e. a measure of the effectiveness of the treatment. Cunningham and
McCrum-Gardner (2007) provide formulae for
effect size in different situations, but sample
size software can be used, which requests the
required information (e.g. absolute mean difference and standard deviation) and performs
the calculations. The effect size can often be esti-
Table 1.
Definition of type I and type II errors
                        Null hypothesis rejected   Null hypothesis not rejected
Null hypothesis true    Type I error               Correct decision
Null hypothesis false   Correct decision           Type II error
Response rate
After the sample size has been calculated, it will
need to be increased depending on the expected
response rate. This can be estimated from previous
publications or a pilot study.
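One common way to make this adjustment is to divide the calculated sample size by the expected response rate and round up. The sketch below assumes that approach; the function name and the figures used are hypothetical and serve only as an illustration.

```python
from fractions import Fraction
from math import ceil


def inflate_for_response(n_required, response_rate):
    """Number of people to approach so that about n_required respond.

    response_rate is a proportion between 0 and 1 (e.g. 0.8 for 80%).
    Exact fractions are used to avoid floating-point rounding surprises.
    """
    rate = Fraction(str(response_rate))
    if not 0 < rate <= 1:
        raise ValueError("response_rate must be greater than 0 and at most 1")
    return ceil(Fraction(n_required) / rate)


# Hypothetical example: 100 participants needed, 80% expected to respond
print(inflate_for_response(100, 0.8))
```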
Table 2.
Website addresses for software packages
Software package   Website address
Minitab            www.minitab.com/en-GB/support/downloads/
PS                 biostat.mc.vanderbilt.edu/twiki/bin/view/Main/PowerSampleSize
GPower             www.psycho.uni-duesseldorf.de/aap/projects/gpower/
Epi-info           www.cdc.gov/epiinfo/downloads.htm
Software
There are many software packages for performing
sample size/power calculations. Four useful ones
are described in this article:
• Minitab
• PS
• GPower
• Epi-info/StatCalc
Minitab is simple to use for a beginner but has a
limited range of study designs; it necessitates purchasing a licence, although a 30-day free trial is available (see Table 2 for list of website addresses). PS,
GPower and Epi-info can be downloaded for free.
GPower is more complicated to use for a novice, but does cover a wider range of study designs.
GPower is useful if using effect sizes (small/mod-
Websites
There are many websites which can be used to
perform sample size/power calculations. Some
Figure 1. Using PS software for the sample size calculation for two independent groups
Examples
The examples below demonstrate what information is required for the sample size calculation
for a range of study designs. To aid understanding they also demonstrate the increase/decrease
in sample size when factors are modified, e.g.
increasing the mean difference, decreasing the
standard deviation, and increasing the power
level. The PS software has been used for these
examples, but alternatively, the other software packages or websites described above
could be used. More information about the statistical tests mentioned can be obtained from
McCrum-Gardner (2008).
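The effect of modifying these factors can also be sketched with the standard normal-approximation formula for two independent groups, n per group = 2(z for the significance level + z for the power)² × SD² / difference². This is an illustration under stated assumptions (equal group sizes, common standard deviation, two-sided test), not a reproduction of the PS calculations, which are based on the t distribution and may give slightly larger numbers. The function name and the input values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(mean_diff, sd, power=0.80, alpha=0.05):
    """Approximate sample size per group for comparing two independent means.

    Assumption: normal approximation, equal group sizes, two-sided test.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # quantile for the significance level
    z_beta = z.inv_cdf(power)            # quantile for the desired power
    return ceil(2 * ((z_alpha + z_beta) * sd / mean_diff) ** 2)


print(n_per_group(mean_diff=5, sd=10))              # baseline scenario
print(n_per_group(mean_diff=10, sd=10))             # larger mean difference: fewer participants
print(n_per_group(mean_diff=5, sd=5))               # smaller standard deviation: fewer participants
print(n_per_group(mean_diff=5, sd=10, power=0.90))  # higher power: more participants
```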
The independent samples t-test is used to compare sample means from two independent groups
for an interval-scale variable when the distribution
is approximately normal; the paired t-test is used
to compare two sample means for an interval-scale
variable where there is a one-to-one correspondence (or pairing) between the samples and the distribution of within-pair differences is approximately
normal; the chi-square (χ²) test is used to compare
proportions between two or more independent
groups or investigate whether there is any associa-
Key points
• Sample size estimation is a significant concern for researchers as guidelines must be adhered to for ethics committees, grant applications and publications.
• Studies may be underpowered (too few participants), or overpowered (too many participants) so it is important to achieve the correct balance.
• Relevant definitions of power, significance level, effect size etc. are provided in order to understand the process of performing sample size calculations.
• Clear examples of sample size calculations for three commonly-used study designs are provided for illustration and to aid understanding.
• A range of software packages and websites are available to perform sample size calculations and a selection of them are discussed and evaluated.
Graphical presentation
Both PS and GPower can produce graphs to explore
the relationships between power, sample size and
effect size, e.g. the difference in population means.
It is often helpful to hold one of these variables constant, and plot the other two against each other. For
example, a plot of difference in means against sample size for the two treatment groups in example 1 is
shown in Figure 2.
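The same idea can be sketched without dedicated software by holding power constant and tabulating the smallest detectable difference in means against sample size. The code below is an illustration only: it uses a normal approximation for two independent groups of equal size, and the function name, the standard deviation of 10 and the range of sample sizes are hypothetical rather than taken from example 1.

```python
from math import sqrt
from statistics import NormalDist


def detectable_difference(n_per_group, sd, power=0.80, alpha=0.05):
    """Smallest difference in means detectable with the given power.

    Assumption: normal approximation, equal group sizes, two-sided test.
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return multiplier * sd * sqrt(2 / n_per_group)


# Crude text plot: detectable difference falls as sample size per group rises
for n in range(10, 101, 10):
    d = detectable_difference(n, sd=10)
    print(f"{n:>4} {d:6.2f} " + "#" * round(d))
```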
Conclusions
The reader must be aware that the choice of the correct statistical test is vitally important at the statistical analysis stage, and this is described elsewhere
(McCrum-Gardner, 2008). This article has described
the process of sample size calculation, including
relevant definitions, and has provided examples for different study
designs for illustration. A range of
software packages and websites has also been discussed and evaluated. This information should
help researchers to perform sample
size calculations for their research projects.
Conflict of interest: none
Central Office for Research Ethics Committees (2007) Question specific guidance on NHS REC application form (2007). Available from: http://tinyurl.com/yjogwn9 (accessed 17 December 2009)
Cunningham JB, McCrum-Gardner E (2007) Power, effect and sample size using GPower: practical issues for researchers and members of research ethics committees. Evidence Based Midwifery 5(4): 132–6
Devane D, Begley C, Clarke M (2004) How many do I need? Basic principles of sample size estimation. Journal of Advanced Nursing 47(3): 297–302
Florey CV (1993) Sample size for beginners. BMJ 306: 1181–4
Lenth RV (2001) Some Practical Guidelines for Effective Sample Size Determination. The American Statistician 55: 187–93. Available from http://www.stat.uiowa.edu/techrep/tr303.pdf (accessed 17 December 2009)
McCrum-Gardner E (2008) Which is the correct statistical test to use? Brit J Oral Maxillofacial Surgery 46: 38–41
Ruperto N (2007) Is Minimal Clinically Important Difference Relevant for the Interpretation of Clinical Trials in Pediatric Rheumatic Diseases? The Journal of Rheumatology. http://tinyurl.com/yjst6ct (accessed 17 December 2009)