Bland, Martin
An Introduction to Medical Statistics, 3rd Edition
Copyright ©2000 Oxford University Press
Author
Martin Bland
Professor of Medical Statistics
St George's Hospital Medical School, London
1
Introduction
most of the Royal Colleges, except for the MRCPsych. I have indicated
by an asterisk in the subheading those sections which I think will be
required only by the postgraduate or the researcher.
When working through a textbook, it is useful to be able to check your
understanding of the material covered. Like most such books, this one
has exercises at the end of each chapter, but to ease the tedium most of
these are of the multiple choice type. There is also one long exercise,
usually involving calculations, for each chapter. In keeping with the
computer age, where laborious calculation would be necessary,
intermediate results are given to avoid this. Thus the exercises can be
completed quite quickly and the reader is advised to try them. You can
also download some of the data sets from my website
(http://www.sghms.ac.uk/depts/phs/staff/jmb). Solutions are given at the
end of the book, in full for the long exercises and as brief notes with
references to the relevant sections in the text for MCQs. Readers who
would like more numerical exercises are recommended to Osborn
(1979). For a wealth of exercises in the understanding and interpretation
of statistics in medical research, drawn from the published literature and
popular media, you should try the companion volume to this one,
Statistical Questions in Evidence-based Medicine (Bland and Peacock
2000).
Finally, a question many students of medicine ask as they struggle with
statistics: is it worth it? As Altman (1982) has argued, bad statistics leads
to bad research and bad research is unethical. Not only may it give
misleading results, which can result in good therapies being abandoned
and bad ones adopted, but it means that patients may have been
exposed to potentially harmful new treatments for no good reason.
Medicine is a rapidly changing field. In ten years' time, many of the
therapies currently prescribed and many of our ideas about the causes
and prevention of disease will be obsolete. They will be replaced by new
therapies and new theories, supported by research studies and data of
the kind described in this book, and probably presenting many of the
same problems in interpretation. The practitioner will be expected to
decide for her- or himself what to prescribe or advise based on these
studies. So a knowledge of medical statistics is one of the most useful
things any doctor could acquire during her or his training.
Authors: Bland, Martin
Title: Introduction to Medical Statistics, An, 3rd Edition
Copyright ©2000 Oxford University Press
> Table of Contents > 2 - The design of experiments
2
The design of experiments
The scanned 1978 patient did better than the unscanned 1974 patient in 31% of pairs, whereas the unscanned 1974 patient did better than the scanned 1978 patient in only 7% of pairs. However, he also compared
the survival of patients in 1978 who did not receive a C-T scan with
matched patients in 1974. These patients too showed a marked
improvement in survival from 1974 to 1978 (Table 2.1). The 1978 patients
did better in 38% of pairs and the 1974 patients in only 19% of pairs.
There was a general improvement in outcome over a fairly short period of
time. If we did not have the data on the unscanned patients from 1978 we
might be tempted to interpret these data as evidence for the
effectiveness of the C-T scanner. Historical controls like this are seldom
very convincing, and usually favour the new treatment. We need to
compare the old and new treatments concurrently.
[Table fragment. Columns: period of trial; number of children; number of deaths from TB; death rate; average number of visits to clinic during first year of follow-up; proportion of parents giving good cooperation as judged by visiting staff. First row: 1927–32, selection made by physician; remaining rows not recovered.]
[Table of random digits: 25 numbered rows, columns headed 1–4, 5–8, …, 33–36; the individual digits were scrambled in extraction and are not reproduced here.]
Subject   Random digit   Treatment
 1        3              A
 2        4              B
 3        6              B
 4        2              B
 5        9              A
 6        7              A
 7        5              A
 8        3              A
 9        2              B
10        6              B
11        9              A
12        7              A
13        9              A
14        3              A
15        9              A
16        2              B
17        3              A
18        3              A
19        2              B
20        4              B
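The listing above shows subjects allocated by random digits, with odd digits giving treatment A and even digits (including 0) giving treatment B. A minimal Python sketch of the same idea follows; the function name and the seed are purely illustrative.

    import random

    def allocate(n_subjects, seed=None):
        # Allocate subjects to treatments A and B using one random digit each:
        # odd digit -> A, even digit -> B, as in the listing above.
        rng = random.Random(seed)
        allocation = []
        for subject in range(1, n_subjects + 1):
            digit = rng.randint(0, 9)
            treatment = 'A' if digit % 2 else 'B'
            allocation.append((subject, digit, treatment))
        return allocation

    for subject, digit, treatment in allocate(20, seed=1):
        print(subject, digit, treatment)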
patients and the control group 52 cases. The condition of the patients on
admission is shown in Table 2.5. The frequency distributions of
temperature and sedimentation rate were similar for the two groups; if
anything the treated (S) group were slightly worse. However, this
difference is no greater than could have arisen by chance, which, of
course, is how it arose. The two groups are certain to be slightly different
in some characteristics, especially with a fairly small sample, and we can
take account of this in the analysis (Chapter 17).
Table 2.5. Condition of the patients on admission
                                             Group S   Group C
General condition             Fair              17        20
                              Poor              30        24
Maximum evening temperature   99–99.9           13        12
(°F)                          100–100.9         15        17
                              101+              24        19
Sedimentation rate            11–20              3         2
                              21–50             16        20
                              51+               36        29
(further rows not recovered)
Table 2.6. Survival in relation to initial condition
Maximum evening temperature      Outcome   Streptomycin group   Control group
during first observation week
98–98.9°F                        Alive           3                   4
                                 Dead            0                   0
99–99.9°F                        Alive          13                  11
                                 Dead            0                   1
100–100.9°F                      Alive          15                  12
                                 Dead            0                   5
(further rows not recovered)
After six months, 93% of the S group survived, compared to 73% of the
control group. There was a clear advantage to the streptomycin group.
The relationship of survival to initial condition is shown in Table 2.6.
Survival was more likely for patients with lower temperatures, but the
difference in survival between the S and C groups is clearly present
within each temperature category where deaths occurred.
Randomized trials are not restricted to two treatments. We can compare
several treatments. A drug trial might include the new drug, a rival drug,
and
[Table row fragment; column headings not recovered] Died: 39 (25%), 11 (22%), 0 (0%), 81 (36%)
Other methods of allocation set out to be random but can fall into this sort
of difficulty. For example, we could use physical mixing to achieve
randomization. This is quite difficult to do. As an experiment, take a deck
of cards and order them in suits from ace of clubs to king of spades. Now
shuffle them in the usual way and examine them. You will probably see
many runs of several cards which remain together in order. Cards must
be shuffled very thoroughly indeed before the ordering ceases to be
apparent. The physical randomization method can be applied to an
experiment by marking equal numbers on slips of paper with the names
of the treatments, sealing them into envelopes and shuffling them. The
treatment for a subject is decided by withdrawing an envelope. This
method was used in another study of anticoagulant therapy by Carleton
et al. (1960). These authors reported that in the latter stages of the trial
some of the clinicians involved had attempted to read the contents of the
envelopes by holding them up to the light, in order to allocate patients to
their own preferred treatment.
Interfering with the randomization can actually be built into the allocation
procedure, with equally disastrous results. In the Lanarkshire Milk
Experiment, discussed by Student (1931), 10000 school children
received three quarters of a pint of milk per day and 10000 children acted
as controls. The children were weighed and measured at the beginning
and end of the six-month experiment. The object was to see whether the
milk improved the growth of children. The allocation to the ‘milk’ or control
group was done as follows:
The teachers selected the two classes of pupils, those getting milk and
those acting as controls, in two different ways. In certain cases they
selected them by ballot and in others on an alphabetical system. In any
particular school where there was any group to which these methods had
given an undue proportion of well-fed or ill-nourished children, others
were substituted to obtain a more level selection.
The result of this was that the control group had a markedly greater
average height and weight at the start of the experiment than did the milk
group. Student interpreted this as follows:
Presumably this discrimination in height and weight was not made
deliberately, but it would seem probable that the teachers, swayed by the
very human feeling that the poorer children needed the milk more than
the comparatively well to do, must have unconsciously made too large a
substitution for the ill-nourished among the (milk group) and too few
among the controls and that this unconscious selection affected
Table 2.8 [extract]. Paralytic polio
Study group                   Number in group   Number of cases   Rate per 100 000
Randomized control:
  Vaccinated                      200 745              33               16
Observed control:
  Vaccinated 2nd grade            221 998              38               17
  Unvaccinated 2nd grade          123 605              43               35
(further rows not recovered)
In most diseases, the effect of volunteer bias is opposite to this. Poor
conditions are related both to refusal to participate and to high risk,
whereas volunteers tend to be low risk. The effect of volunteer bias is
then to produce an apparent difference in favour of the treatment. We
can see that comparisons between volunteers and other groups can
never be reliable indicators of treatment effects.
[Table fragment; column headings not recovered]
1    71   29   42
3     8    1    7
4    14    7    7
5    23   16    7
6    34   25    9
7    79   65   14
8    60   41   19
9     2    0    2
10    3    0    3
11   17   15    2
12    7    2    5
changes in the brain. Mind and body are intimately connected, and
unless the psychological effect is actually part of the treatment we usually
try to eliminate such factors from treatment comparisons. This is
particularly important when we are dealing with subjective assessments,
such as of pain or well-being.
Fig. 2.1. Pain relief in relation to drug and to colour of placebo (after
Huskisson 1974)
the single dose drug may receive a daily placebo and those on the daily
dose a single placebo at the start.
Placebos are not always possible or ethical. In the MRC trial of streptomycin, where the treatment involved several injections a day for
several months, it was not regarded as ethical to do the same with an
inert saline solution and no placebo was given. In the Salk vaccine trial,
the inert saline injections were placebos. It could be argued that paralytic
polio is not likely to respond to psychological influences, but how could
we be really sure of this? The certain knowledge that a child had been
vaccinated may have altered the risk of exposure to infection as parents
allowed the child to go swimming, for example. Finally, the use of a
placebo may also reduce the risk of assessment bias as we shall see in
§2.9.
Radiological assessment       S Group        C Group
Considerable improvement      28 (51%)        4 (8%)
No material change             2 (4%)         3 (6%)
Deaths                         4 (7%)        14 (27%)
(further rows not recovered)
and the treatment given, until half the animals had been treated. The
treated animals were put into smaller cages, five to a cage, which were
placed together in a constant environment chamber. The control mice
were in cages also placed together in the constant environment chamber.
When the data were analysed, it was discovered that the mean initial weight was greater in the treated animals than in the control group. In a
weight gain experiment this could be quite important! Perhaps larger
animals were easier to pick up, and so were selected first. What that
experimenter should have done was to place the mice in the boxes, give
each box a place in the constant environment chamber, then allocate the
boxes to treatment or control at random. We would then have two groups
which were comparable, both in initial values and in any environmental
differences which may exist in the constant environment chamber.
3
Sampling and observational studies
3.2 Censuses
One simple question we can ask about any group of interest is how many
members it has. For example, we need to know how many people live in
a country and how many of them are in various age and sex categories,
in order to monitor the changing pattern of disease and to plan medical
services. We can obtain it by a census. In a census, the whole of a
defined population is counted. In the United Kingdom, as in many
developed countries, a population census is held every ten years. This is
done by dividing the entire country into small areas called enumeration
districts, usually containing between 100 and 200 households. It is the
responsibility of an enumerator to identify every household in the district
and ensure that a census form is completed, listing all members of the
household and providing a few simple pieces of information. Even though
completion of the census form is compelled by law, and enormous effort
goes into ensuring that every household is included, there are
undoubtedly some who are missed. The final data, though extremely
useful, are not totally reliable.
The medical profession takes part in a massive, continuing census of
deaths, by providing death certificates for each death which occurs,
including not only the name of the deceased and cause of death, but also
details of age, sex, place of residence and occupation. Census methods
are not restricted to national populations. They can be used for more
specific administrative purposes too. For example, we might want to
know how many patients are in a particular hospital at a particular time,
how many of them are in different diagnostic groups, in different age/sex
groups, and so on. We can then use this information together with
estimates of the death and discharge rates to estimate how many beds
these patients will occupy at various times in the future (Bewley et al.
1975, 1981).
3.3 Sampling
A census of a single hospital can only give us reliable information about
that hospital. We cannot easily generalize our results to hospitals in
general. If we want to obtain information about the hospitals of the United
Kingdom, two courses are open to us: we can study every hospital, or we
can take a representative sample of hospitals and use that to draw
conclusions about hospitals as a whole.
views of the population. This is called quota sampling. In the same way
we could try to choose a sample of rats by choosing given numbers of
each weight, age, sex, etc. There are difficulties with this approach. First,
it is rarely possible to think of all the relevant classifications. Second, it is
still difficult to avoid bias within the classifications, by picking interviewees
who look friendly, or rats which are easy to catch. Third, we can only get
an idea of the reliability of findings by repeatedly doing the same type of
survey, and of the representativeness of the sample by knowing the true
population values (which we can actually do in the case of elections), or
by comparing the results with a sample which does not have these
drawbacks. Quota sampling can be quite effective when similar surveys
are made repeatedly as in opinion polls or market research. It is less
useful for medical problems, where we are continually asking new
questions. We need a method where bias is avoided and where we can
estimate the reliability of the sample from the sample itself. As in §2.2, we
use a random method: random sampling.
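As a concrete illustration, here is a minimal Python sketch of drawing a simple random sample without replacement from a list of population members; the population of 500 numbered subjects and the sample size of 25 are invented for the example.

    import random

    population = list(range(1, 501))        # a sampling frame of 500 numbered subjects
    rng = random.Random(20)                 # seed fixed only to make the example reproducible
    sample = rng.sample(population, 25)     # simple random sample of 25, without replacement
    print(sorted(sample))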
We can also carry out sampling without a list of the population itself,
provided we have a list of some larger units which contain all the
members of the population. For example, we can obtain a random
sample of school children in an area by starting with a list of schools,
which is quite easy to come by. We then draw a simple random sample of
schools and all the children within our chosen schools form the sample of
children. This is called a cluster sample, because we take a sample of
clusters of individuals. Another example would be sampling from any
age/sex group in the general population by taking a sample of addresses
and then taking everyone at the chosen addresses who matched our
criteria.
Sometimes it is desirable to divide the population into different strata, for
example into age and sex groups, and take random samples within
these. This is rather like quota sampling, except that within the strata we
choose at random. If the different strata have different values of the
quantity we are measuring, this stratified random sampling can
increase our precision considerably. There are many complicated
sampling schemes for use in different situations. For example, in a study
of cigarette smoking and respiratory disease in Derbyshire
schoolchildren, we drew a random sample of schools, stratified by school
type (single-sex/mixed, selective/non-selective, etc.). Some schools
which took children to age 13 then fed into the same 14+ school were
combined into one sampling unit. Our sample of children was all children
in the chosen schools who were in their first secondary school year
(Banks et al. 1978). We thus had a stratified random cluster sample.
These sampling methods affect the estimate obtained. Stratification
improves the precision, cluster sampling worsens it. The sampling
scheme should be taken into account in the analysis (Cochran 1977, Kish
1994). Often it is ignored, as was done by Banks et al. (1978) (that is, by
me), but it should not be and results may be reported as being more
precise than they really are.
In §2.3 I looked at the difficulties which can arise using methods of
allocation which appear random but do not use random numbers. In
sampling, two such methods are often suggested by researchers. One is
to take every tenth subject from the list, or whatever fraction is required.
The other is to use the last digit of some reference number, such as the
hospital number, and take as the sample subjects where this is, say, 3 or
4. These sampling methods are systematic or quasi-random. It is not
usually obvious why they should not give ‘random’ samples, and it may
be that in many cases they would be just as good as random sampling.
They are certainly easier. To use them, we must be very sure that there is
no pattern to the list which could produce an unrepresentative group. If it
is possible, random sampling seems safer.
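A minimal sketch of the 'every tenth subject' scheme, with a randomly chosen starting point; the sampling frame is invented for illustration, and the warning above about patterned lists applies to the coded version just as it does to the manual one.

    import random

    def systematic_sample(frame, step=10, seed=None):
        # Quasi-random (systematic) sampling: start at a random position within
        # the first `step` subjects, then take every `step`-th subject after that.
        start = random.Random(seed).randrange(step)
        return frame[start::step]

    frame = [f"subject_{i:03d}" for i in range(1, 201)]   # invented frame of 200 subjects
    print(systematic_sample(frame, step=10, seed=3))      # a 1-in-10 systematic sample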
Volunteer bias can be as serious a problem in sampling studies as it is in
trials (§2.4). If we can only obtain data from a subset of our random
sample, then this subset will not be a random sample of the population.
Its members will be self selected. It is often very difficult to get data from
every member of a sample. The proportion for whom data is obtained is
called the response rate and in a sample survey of the general
population is likely to be between
70% and 80%. The possibility that those lost from the sample are
different in some way must be considered. For example, they may tend to
be ill, which can be a serious problem in disease prevalence studies. In
the school study of Banks et al. (1978), the response rate was 80%, most
of those lost being absent from school on the day. Now, some of these
absentees were ill and some were truants. Our sample may thus lead us
to underestimate the prevalence of respiratory symptoms, by omitting
sufferers with current acute disease, and the prevalence of cigarette
smoking by omitting those who have gone for a quick smoke behind the
bike sheds.
One of the most famous sampling disasters, the Literary Digest poll of
1936, illustrates these dangers (Bryson 1976). This was a poll of voting
intentions in the 1936 US presidential election, fought by Roosevelt and
Landon. The sample was a complex one. In some cities every registered
voter was included, in others one in two, and for the whole of Chicago
one in three. Ten million sample ballots were mailed to prospective
voters, but only 2.3 million, less than a quarter, were returned. Still, two
million is a lot of Americans, and these predicted a 60% vote to Landon.
In fact, Roosevelt won with 62% of the vote. The response was so poor
that the sample was most unlikely to be representative of the population,
no matter how carefully the original sample was drawn. Two million
Americans can be wrong! It is not the mere size of the sample, but its
representativeness which is important. Provided the sample is truly
representative, 2000 voters is all you need to estimate voting intentions
to within 2%, which is enough for election prediction if they tell the truth
and do not change their minds (see §18E).
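A rough check of the '2000 voters to within 2%' figure, using the standard error of a sample proportion; the methods behind this come later in the book, the 1.96 multiplier is the usual 95% value, and p = 0.5 is the worst case.

    import math

    p, n = 0.5, 2000                      # worst-case proportion and sample size
    se = math.sqrt(p * (1 - p) / n)       # standard error of a sample proportion
    print(round(se, 4))                   # about 0.0112
    print(round(1.96 * se, 4))            # 95% margin of error, about 0.022, i.e. roughly 2%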
whose parents are not smokers, and for those whose parents are
smokers. As Figure 3.1 shows, this relationship in fact persisted and
there was no reason to suppose that a third causal factor was at work.
Fig. 3.1. Prevalence of self-reported morning cough in Derbyshire
schoolboys, by their own and their parents' cigarette smoking (Bland
et al. 1978)
[Table fragment: smoking among lung cancer patients and controls; the table was originally split by sex (Males/Females), but only one set of rows was recovered]
                        Non-smokers    Smokers        Total
Lung cancer patients    19 (31.7%)     41 (68.3%)     60
Control patients        32 (53.3%)     28 (46.7%)     60
difficult to interpret. The evidence from such studies can be useful, but
data from other types of investigation must be considered, too, before
any firm conclusions are drawn.
The case-control design is used clinically to investigate the natural history
of disease by comparing patients with healthy subjects or patients with
another disease. For example, Kiely et al. (1995) were interested in
lymphatic function in inflammatory arthritis. We compared arthritis
patients (the cases) with healthy volunteers (the controls). Lymphatic flow
was measured in the arms of these subjects and the groups compared.
We found that lymphatic drainage was less in the cases than in the
control group, but this was only so for arms which were swollen
(oedematous).
[Table header fragment: Age (years): 16–34, 35–54, 55+, Total; rows not recovered]
Often the easiest and best method, if not the only method, of obtaining
data about people is to ask them. When we do it, we must be very careful
to ensure that questions are straightforward, unambiguous and in
language the respondents will understand. If we do not do this then
disaster is likely to follow.
[Table fragment; column headings not recovered]
0      3    42
1–3   11     3
4–5    5     1
6–7   10     1
factors in the week before the onset of illness. Controls were asked the
same questions about the corresponding week for their matched cases. If
a control or member of his or her family had had diarrhoea lasting more
than 3 days in the week before or during the illness of the respective
case, or had spent any nights during that week away from home, another
control was found. Evidence of bird attack included the pecking or tearing
off of milk bottle tops. A history of bird attack was defined as a previous
attack at that house.
Fifty-five people with Campylobacter infection resident in the area were
reported during the study period. Of these, 19 were excluded and 4 could
not be interviewed, leaving 32 cases and 64 matched controls. There
was no difference in milk consumption between cases and controls, but
more cases than controls reported doorstep delivery of bottled milk,
previous milk bottle attack by birds, milk bottle attack by birds in the index
week, and handling or drinking milk from an attacked bottle (Table 3.4).
Cases reported bird attacks more frequently than controls (Table 3.5).
Controls were more likely to have protected their milk bottles from attack
or to have discarded milk from attacked bottles. Almost all subjects
whose milk bottles had been attacked mentioned that magpies and
jackdaws were common in their area, though only 3 had actually
witnessed attacks and none reported bird droppings near bottles.
None of the other factors investigated (handling raw chicken; eating
chicken bought raw; eating chicken, beef or ham bought cooked; eating
out; attending barbecue; cat or dog in the house; contact with other cats
or dogs; and contact with farm animals) were significantly more common
in controls than cases. Bottle attacks seemed to have ceased when the
study was carried out, and no milk could be obtained for analysis.
1. What problems were there in selecting cases?
4
Summarizing data
we count the number of patients having each diagnosis. The results are
shown in Table 4.1. The count of individuals having a particular quality is
called the frequency of that quality. For example, the frequency of
schizophrenia is 474. The proportion of individuals having the quality is
called the relative frequency or proportional frequency. The relative
frequency of schizophrenia is 474/1467 = 0.32 or 32%. The set of
frequencies of all the possible categories is called the frequency
distribution of the variable.
Table 4.1 [extract]
Diagnosis        Frequency
Schizophrenia    474
Subnormality      58
Alcoholism        57
Total           1467
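A minimal sketch of the frequency and relative frequency calculation using the counts above; only the listed diagnoses are included, while the total of 1467 covers all categories.

    frequencies = {"Schizophrenia": 474, "Subnormality": 58, "Alcoholism": 57}
    total = 1467    # total number of patients over all diagnoses

    for diagnosis, count in frequencies.items():
        relative = count / total
        print(f"{diagnosis}: frequency {count}, relative frequency {relative:.2f} ({relative:.0%})")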
[Table header fragment: Discharge | Frequency | Relative frequency | Cumulative frequency; rows not recovered]

Parity   Frequency   Relative frequency   Cumulative   Relative cumulative
                     (per cent)           frequency    frequency (per cent)
0        59          47.2                 59           47.2
1        44          35.2                 103          82.4
(further rows not recovered)

FEV1     Frequency   Relative frequency (per cent)
2.0       0            0.0
2.5       3            5.3
3.0       9           15.8
3.5      14           24.6
4.0      15           26.3
4.5      10           17.5
5.0       6           10.5
5.5       0            0.0
Total    57          100.0
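Relative and cumulative frequencies follow directly from the counts; a minimal sketch using the FEV1 distribution above.

    # FEV1 frequency distribution (interval lower limits and counts) from the table above
    fev1 = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5]
    freq = [0, 3, 9, 14, 15, 10, 6, 0]
    n = sum(freq)    # 57 subjects

    cumulative = 0
    for lower, f in zip(fev1, freq):
        cumulative += f
        print(f"{lower:3.1f}  freq {f:2d}  rel {100 * f / n:5.1f}%  "
              f"cum {cumulative:2d}  rel cum {100 * cumulative / n:5.1f}%")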
Table 4.6. Tally system for finding the frequency distribution of FEV1 [extract]
FEV1     Tally        Frequency
2.0                    0
2.5      ///           3
5.0      ///// /       6
5.5                    0
Total                 57
(intermediate rows not recovered)
observations will all be at one end of the interval. Making the starting
point of the interval as a fraction rather than an integer gives a slightly
better picture (Figure 4.5). This can also be helpful for continuous data
when there is a lot of digit preference (§15.2). For example, where most
observations are recorded as integers or as something point five, starting
the interval at something point seven five can give a more accurate
picture.
Fig. 4.5. Histograms of parity (Table 4.3) using integer and fractional
cut-off points for the intervals
[Table 4.7 header fragment: Age group | Relative frequency (per cent) | Relative frequency per year (per cent); rows not recovered]
Figure 4.4 shows a histogram for the same distribution as Figure 4.3, with
frequency per unit FEV1 (or frequency density) shown on the vertical
axis. The distributions appear identical and we may well wonder whether
it matters which method we choose. We see that it does matter when we
consider a frequency distribution with unequal intervals, as in Table 4.7. If
we plot the histogram using the heights of the rectangles to represent
relative frequency in the interval we get the left-hand histogram in Figure
4.6, whereas if we use the relative frequency per year we get the right-
hand histogram. These histograms tell different stories. The left-hand
histogram in Figure 4.6 suggests that the most common age for accident
victims is between 15 and 44 years, whereas the right-hand histogram
suggests it is between 0 and 4. The right-hand histogram is correct, the
left-hand histogram being distorted by the unequal class intervals. It is
therefore preferable in general to use the frequency per unit (frequency
density) rather than per class interval when plotting a histogram. The
frequency for a particular interval is then represented by the area of the
rectangle on that interval. Only when the class intervals are all equal can
the frequency for the class interval be represented by the height of the rectangle.
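A minimal sketch of the frequency density calculation for unequal intervals; the age groups and counts here are invented for illustration and are not the data of Table 4.7.

    # (lower limit, upper limit, frequency) for each age group -- invented numbers
    age_groups = [(0, 5, 25), (5, 15, 30), (15, 45, 90), (45, 65, 40)]

    for lower, upper, f in age_groups:
        width = upper - lower          # width of the interval in years
        density = f / width            # frequency per year of age
        print(f"{lower:2d}-{upper:<2d}  frequency {f:3d}  per year {density:4.1f}")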
mode of the distribution and Figure 4.3 has one such point. It is
unimodal. Figure 4.9 shows a very different shape. Here there are two
distinct modes, one near 5 and the other near 8.5. This distribution is
bimodal. We must be careful to distinguish between the unevenness in
the histogram which results from using a small sample to represent a
large population and that which results from genuine bimodality in the
data. The trough between 6 and 7 in Figure 4.9 is very marked and might
represent a genuine bimodality. In this case we have children, some of
whom have a condition which raises the cholesterol level and some of
whom do not. We actually have two separate populations represented
with some overlap between them. However, almost all distributions
encountered in medical statistics are unimodal.
Figure 4.10 differs from Figure 4.3 in a different way. The distribution of
serum triglyceride is skew, that is, the distance from the central value to
the extreme is much greater on one side than it is on the other. The parts
of the histogram near the extremes are called the tails of the distribution.
If the tails are equal the distribution is symmetrical, as in Figure 4.3. If
the tail on the right is longer than the tail on the left as in Figure 4.10, the
distribution is skew to the right or positively skew. If the tail on the left
is longer, the distribution is skew to the left or negatively skew. This is
unusual, but Figure 4.11 shows an example. The negative skewness
comes about because babies can be born alive at any gestational age
from about 20 weeks, but soon after 40 weeks the baby will have to be
born. Pregnancies will not be allowed to go on for more than 44 weeks;
the birth would be induced artificially. Most distributions encountered in
medical work are symmetrical or skew to the right, for reasons we shall
discuss later (§7.4).
Table 4.8. Serum triglyceride measurements in cord blood (table not recovered in this extract)
or a stem and leaf plot. For the FEV1 data the median is 4.1, the 29th
value in Table 4.4. If we have an even number of points, we choose a
value midway between the two central values.
Fig. 4.11. Gestational age at birth for 1749 deliveries at St. George's
Hospital
For the median, for example, the 0.5 quantile, i = q(n+1) = 0.5 × (57+1) =
29, the 29th observation as before.
Other quantiles which are particularly useful are the quartiles of the
distribution. The quartiles divide the distribution into four equal parts,
called fourths. The second quartile is the median. For the FEV1 data the
first and third quartiles are 3.54 and 4.53. For the first quartile, i = 0.25 ×
58 = 14.5. The quartile is between the 14th and 15th observations, which
are both 3.54. For the third quartile, i = 0.75 × 58 = 43.5, so the quartile
lies between the 43rd and 44th observations, which are 4.50 and 4.56.
The quantile is given by 4.50 + (4.56 - 4.50) × (43.5 - 43) = 4.53. We
often divide the distribution at 99 centiles or percentiles. The median is
thus the 50th centile. For the 20th centile of FEV1, i = 0.2 × 58 = 11.6, so
the quantile is between the 11th and 12th observation, 3.42 and 3.48,
and can be estimated by 3.42 + (3.48 - 3.42) × (11.6 - 11) = 3.46. We can
estimate these easily from Figure 4.2 by finding the position of the
quantile on the vertical axis, e.g. 0.2 for
the 20th centile or 0.5 for the median, drawing a horizontal line to
intersect the cumulative frequency polygon, and reading the quantile off
the horizontal axis.
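A minimal sketch of the i = q(n + 1) rule with linear interpolation, as used in the calculations above; the data here are a small invented sample, not the 57 FEV1 measurements.

    def quantile(sorted_values, q):
        # Estimate the q-quantile using i = q(n + 1), interpolating between
        # the observations on either side of position i.
        n = len(sorted_values)
        i = q * (n + 1)
        lower = int(i)
        if lower < 1:
            return sorted_values[0]
        if lower >= n:
            return sorted_values[-1]
        below = sorted_values[lower - 1]      # observations are numbered from 1
        above = sorted_values[lower]
        return below + (above - below) * (i - lower)

    data = sorted([3.2, 3.4, 3.5, 3.7, 3.9, 4.0, 4.1, 4.3, 4.5, 4.8, 5.1])
    print(quantile(data, 0.5), quantile(data, 0.25), quantile(data, 0.75))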
Fig. 4.12. Box and whisker plots for FEV1 and for serum triglyceride
The summation sign is an upper case Greek letter, sigma, the Greek S. When it is obvious that we are adding the values of xi for all values of i, which runs from 1 to n, we abbreviate this to ∑xi or simply to ∑x. The mean of the xi is denoted by x̄ ('x bar'), and x̄ = ∑xi/n.
The sum of the 57 FEV1s is 231.51 and hence the mean is 231.51/57 =
4.06. This is very close to the median, 4.1, so the median is within 1% of
the mean. This is not so for the triglyceride data. The median triglyceride
(Table 4.8) is 0.46 but the mean is 0.51, which is higher. The median is
10% away from the mean. If the distribution is symmetrical the sample
mean and median will be about the same, but in a skew distribution they
will not. If the distribution is skew to the right, as for serum triglyceride,
the mean will be greater, if it is skew to the left the median will be greater.
This is because the values in the tails affect the mean but not the
median.
The sample mean has much nicer mathematical properties than the
median and is thus more useful for the comparison methods described
later. The median is a very useful descriptive statistic, but not much used
for other purposes.
Table 4.9. Deviations from the mean
xi      xi − x̄    (xi − x̄)²
2       −2          4
3       −1          1
9        5         25
5        1          1
4        0          0
0       −4         16
6        2          4
3       −1          1
4        0          0
Total   36   0     52
The most commonly used measures of dispersion are the variance and
standard deviation. We start by calculating the difference between each
observation and the sample mean, called the deviations from the
mean, Table 4.9. If the data are widely scattered, many of the observations xi will be far from the mean x̄ and so many deviations xi − x̄ will be large. If the data are narrowly scattered, very few observations will be far from the mean and so few deviations xi − x̄ will be large. We need some kind of average deviation to measure the scatter. If we add all the deviations together, we get zero, because ∑(xi − x̄) = ∑xi − ∑x̄ = ∑xi − nx̄ and nx̄ = ∑xi. Instead we square the deviations and then add them, as shown in Table 4.9. This removes the effect of sign; we are only measuring the size of the deviation, not the direction. This gives us ∑(xi − x̄)², in the example equal to 52, called the sum of squares about the mean, usually abbreviated to sum of squares.
Clearly, the sum of squares will depend on the number of observations as
well as the scatter. We want to find some kind of average squared
deviation. This leads to a difficulty. Although we want an average squared
deviation, we divide the sum of squares by n - 1, not n. This is not the
obvious thing to do and puzzles many people.
We have already said that ∑(xi − x̄)² is called the sum of squares. The quantity n − 1 is called the degrees of freedom of the variance estimate (§7A). We have:
variance = sum of squares / degrees of freedom = ∑(xi − x̄)²/(n − 1)
We shall usually denote the variance by s². In the example, the sum of squares is 52 and there are 9 observations, giving 8 degrees of freedom. Hence s² = 52/8 = 6.5.
The formula ∑(xi − x̄)² gives us a rather tedious calculation. There is another formula for the sum of squares, which makes the calculation easier to carry out. This is simply an algebraic manipulation of the first form and gives exactly the same answer. We thus have two formulae for variance:
s² = ∑(xi − x̄)²/(n − 1) = (∑xi² − (∑xi)²/n)/(n − 1)
The algebra is quite simple and is given in §4B. For example, using the second formula for the nine observations, we have:
s² = (196 − 36²/9)/(9 − 1) = (196 − 144)/8 = 52/8 = 6.5
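A minimal sketch checking the two variance formulae on the nine observations of Table 4.9; both routes give a sum of squares of 52 and a variance of 6.5.

    data = [2, 3, 9, 5, 4, 0, 6, 3, 4]     # the nine observations of Table 4.9
    n = len(data)
    mean = sum(data) / n

    ss_deviations = sum((x - mean) ** 2 for x in data)            # first formula
    ss_shortcut = sum(x * x for x in data) - sum(data) ** 2 / n   # second formula

    variance = ss_deviations / (n - 1)
    print(ss_deviations, ss_shortcut, variance)    # 52.0 52.0 6.5
    print(variance ** 0.5)                         # standard deviation, about 2.55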
As Figure 4.14 also shows, this is true for the highly skew triglyceride data,
too. In this case, however, the outlying observations are all in one tail of
the distribution. In general, we expect roughly 2/3 of observations to lie
within one standard deviation of the mean and 95% to lie within two
standard deviations of the mean.
Fig. 4.14. Histograms of FEV1 and triglyceride with mean and
standard deviation
Table 4.10 [extract]. Part of the population of 100 random digits:
9 1 0 7 5 6 9 5 8 8 1 0 5
1 8 8 8 5 2 4 8 3 1 6 5 5
2 8 1 8 5 8 4 0 1 9 2 1 6
1 9 7 9 7 2 7 7 0 8 1 6 3
7 0 2 8 8 7 2 5 4 1 8 6 8
Appendices
4A Appendix: The divisor for the variance
The variance is found by dividing the sum of squares about the sample
mean by n - 1, not by n. This is because we want the scatter about the
population mean, and the scatter about the sample mean is always less.
The sample mean is ‘closer’ to the data points than is the population
mean. We shall try a little sampling experiment to show this. Table 4.10
shows a set of 100 random digits which we shall take as the population to
be sampled. They have mean 4.74 and the sum of squares about the
mean is 811.24. Hence the average squared difference from the mean is
8.1124. We can take samples of size two at random from this population
using a pair of decimal dice, which will enable us to choose any digit
numbered from 00 to 99. The first pair chosen was 5 and 6 which has
mean 5.5. The sum of squares about the population mean 4.74 is (5 -
4.74)2 + (6 - 4.74)2 = 1.655. The sum of squares about the sample mean
is (5 - 5.5)2 + (6 - 5.5)2 = 0.5.
The sum of squares about the population mean is greater than the sum
of squares about the sample mean, and this will always be so. Table 4.11
shows this for 20 such samples of size two. The average sum of squares
about the population mean is 13.6, and about the sample mean it is 5.7.
Hence dividing by the sample size (n = 2) we have mean square
differences of 6.8 about the population mean and 2.9 about the sample
mean. Compare this to 8.1 for the population as a whole. We see that the
mean square difference about the population mean is quite close to 8.1, while that about the sample mean is much less. However, if we divide the sum of squares about the sample mean by n - 1, i.e. 1, instead of n we have 5.7, which is not much different to the 6.8 mean square difference about the population mean.
Table 4.11. Sums of squares about the population mean (4.74) and about the sample mean, for 20 samples of size two (the two sample values are given first)
5 6 1.655 0.5
8 8 21.255 0.0
6 1 15.575 12.5
9 3 21.175 18.0
5 5 0.135 0.0
7 7 10.215 0.0
1 7 19.095 18.0
9 8 28.775 0.5
3 3 6.055 0.0
5 1 14.055 8.0
8 3 13.655 12.5
5 7 5.175 2.0
5 2 5.575 4.5
5 7 5.175 2.0
8 8 21.255 0.0
3 2 10.535 0.5
0 4 23.015 8.0
9 3 21.175 18.0
5 2 7.575 4.5
6 9 19.735 4.5
Table 4.12. Average variance estimates from repeated sampling
Number in sample, n    Divisor n    Divisor n − 1
 2                     4.5          9.1
 3                     5.4          8.1
 4                     5.9          7.9
 5                     6.2          7.7
10                     7.2          8.0
Table 4.12 shows the results of a similar experiment with more samples
being taken. The table shows the two average variance estimates using n
and n - 1 as the divisor of the sum of squares, for sample sizes 2, 3, 4, 5
and 10. We see that the sum of squares about the sample mean divided
by n increases steadily with sample size, but if we divide it by n - 1
instead of n the estimate does not change as the sample size increases.
The sum of squares about the sample mean is proportional to n - 1.
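A minimal simulation in the spirit of this experiment; it draws its own random population and samples with replacement, so the figures will not reproduce Tables 4.11 and 4.12 exactly, but the divisor n − 1 column should stay close to the population mean square while the divisor n column falls short for small samples.

    import random

    rng = random.Random(42)
    population = [rng.randint(0, 9) for _ in range(100)]    # 100 random digits
    pop_mean = sum(population) / len(population)
    pop_msq = sum((x - pop_mean) ** 2 for x in population) / len(population)
    print(f"population mean square difference: {pop_msq:.2f}")

    for n in (2, 3, 4, 5, 10):
        est_n = est_n1 = 0.0
        reps = 10000
        for _ in range(reps):
            sample = [rng.choice(population) for _ in range(n)]
            m = sum(sample) / n
            ss = sum((x - m) ** 2 for x in sample)
            est_n += ss / n
            est_n1 += ss / (n - 1)
        print(f"n = {n:2d}  divisor n: {est_n / reps:.2f}  divisor n - 1: {est_n1 / reps:.2f}")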
5. For the first column only, i.e. for 4.7, 4.2, 3.9, and 3.4, calculate the standard deviation using the deviations from the mean formula.
First calculate the sum of the observations and the sum of the
observations squared. Hence calculate the sum of squares about
the mean. Is this the same as that found in 4 above? Hence
calculate the variance and the standard deviation.
7. Use the following summations for the whole sample: ∑xi = 162.2,
∑xi² = 676.74. Calculate the mean of the sample, the sum of squares
about the mean, the degrees of freedom for this sum of squares,
and hence estimate the variance and standard deviation.
5
Presenting data
figures we get 0.001 10, because the last digit is 6 and so the 9 which
precedes it is rounded up to 10. Note that significant figures are not the
same as decimal places. The number 0.001 10 is given to 5 decimal
places, the number of digits after the decimal point. When rounding to the
nearest digit, we leave the last significant digit, 9 in this case, if what
follows it is less than 5, and increase by one if what follows is greater
than 5. When we have exactly 5, I would always round up, i.e. 1.5 goes
to 2. This means that 0, 1, 2, 3, 4 go down and 5, 6, 7, 8, 9 go up, which
seems unbiased. Some writers take the view that 5 should go up half the
time and down half the time, since it is exactly midway between the
preceding digit and that digit plus one. Various methods are suggested
for doing this but I do not recommend them myself. In any case, it is
usually a mistake to round to so few significant figures that this matters.
How many significant figures we need depends on the use to which the
number is to be put and on how accurate it is anyway. For example, if we
have a sample of 10 sublingual temperatures measured to the nearest
half degree, there is little point in quoting the mean to more than 3
significant figures. What we should not do is to round numbers to a few
significant figures before we have completed our calculations. In the lung
cancer mortality rate example, suppose we round the numerator and
denominator to two significant figures. We have 27 000/24 000 000 =
0.001 125 and the answer is only correct to two figures. This can spread
through calculations causing errors to build up. We always try to retain
several more significant figures than we required for the final answer.
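A minimal sketch of rounding to significant figures rather than decimal places; the rate value is an invented stand-in for the unrounded mortality rate discussed above. Note that Python's round() sends exact halves to the even digit, slightly different from the 'always round 5 up' rule preferred in the text.

    import math

    def round_sig(x, figures):
        # Round x to a number of significant figures (not decimal places).
        if x == 0:
            return 0.0
        exponent = math.floor(math.log10(abs(x)))
        return round(x, figures - 1 - exponent)

    rate = 0.0010964                 # invented stand-in for an unrounded mortality rate
    print(round_sig(rate, 3))        # 0.0011 as printed; three significant figures is 0.00110
    print(round(rate, 3))            # 0.001: three decimal places lose most of the detail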
Consider Table 5.1. This shows mortality data in terms of the exact
numbers of deaths in one year. The table is taken from a much larger
table (OPCS 1991) which shows the numbers dying from every cause of
death in the International Classification of Diseases (ICD), which gives
numerical codes to many hundreds of causes of death. The full table,
which also gives deaths by age group, covers 70 A4 pages. Table 5.1
shows deaths for broad groups of diseases called ICD chapters. This
table is not a good way to present these data if we want to get an
understanding of the frequency distribution of cause of death, and the
differences between causes in men and women. This is even more true
of the 70 page original. This is not the purpose of the table, of course. It
is a source of data, a reference document from which users extract
information for their own purposes. Let us see how Table 5.1 can be
simplified. First, we can reduce the number of significant figures. Let us
be extreme and reduce the data to one significant figure (Table 5.2). This
makes comparisons rather easier, but it is still not obvious which are the
most important causes of death. We can improve this by re-ordering the
table to put the most frequent cause, diseases of the circulatory system,
first (Table 5.3). We can also combine a lot of the smaller categories into
an ‘others’ group. I did this arbitrarily, by combining all those accounting
for less than 2% of the total. Now it is clear at a glance that the most
important causes of death in England and Wales are diseases of the
circulatory system, neoplasms and diseases of the respiratory system,
and that these dwarf all the others. Of course, mortality is not the only
indicator of the importance of a disease. ICD chapter XIII, diseases of the
musculo-skeletal
system and connective tissues, are easily seen from Table 5.2 to be only
minor causes of death, but this group includes arthritis and rheumatism,
the most important illness in its effects on daily activity.
Table 5.1 [extract]. Number of deaths by ICD chapter, England and Wales
ICD chapter and type of disease                                   Males    Females
II   Neoplasms (cancers)                                          75 172   69 948
X    Genitourinary system                                          3 616    4 156
XI   Complications of pregnancy, childbirth and the puerperium         0       56

Table 5.2 [extract]. The same data rounded to one significant figure
ICD chapter and type of disease                                   Males    Females
II   Neoplasms (cancers)                                          80 000   70 000
X    Genitourinary system                                          4 000    4 000
XI   Complications of pregnancy, childbirth and the puerperium         0       60

Table 5.3 [extract]
                                                                  Males    Females
Others                                                            20 000   20 000
qualitative data, the pie chart or pie diagram. This shows the relative
frequency for each category by dividing a circle into sectors, the angles of
which are proportional to the relative frequency. We thus multiply each
relative frequency by 360, to give the corresponding angle in degrees.
Table 5.4 shows the calculation for drawing a pie chart to represent the
distribution of cause of death for females, using the data of Tables 5.1
and 5.3. (The total degrees are 361 rather than 360 because of rounding
errors in the calculations.) The resulting pie chart is shown in Figure 5.1.
This diagram is said to resemble a pie cut into pieces for serving, hence
the name.
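A minimal sketch of the angle calculation for a pie chart; the causes and counts are invented for illustration rather than the Table 5.4 figures.

    causes = {"Circulatory system": 110000, "Neoplasms": 70000,
              "Respiratory system": 30000, "Others": 40000}    # invented counts
    total = sum(causes.values())

    for cause, count in causes.items():
        relative = count / total
        angle = relative * 360          # sector angle in degrees
        print(f"{cause:20s} {relative:6.1%} {angle:7.1f} degrees")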
Table 5.5 [extract]. Mortality from cancer of the oesophagus, England and Wales, 1960–1969
Year    Rate    Year    Rate
1960    5.1     1965    5.4
1961    5.0     1966    5.4
1962    5.2     1967    5.6
1963    5.2     1968    5.8
1964    5.2     1969    6.0
Fig. 5.2. Bar chart showing the relationship between mortality due to
cancer of the oesophagus and year, England and Wales, 1960–1969
There are many uses for bar charts. As in Figure 5.2, they can be used to
show the relationship between two variables, one being quantitative and
the other either qualitative or a quantitative variable which is grouped, as
is time in years. The values of the first variable are shown by the heights
of bars, one bar for each category of the second variable.
Bar charts can be used to represent relationships between more than two
variables. Figure 5.3 shows the relationship between children's reports of
breathlessness and cigarette smoking by themselves and their parents.
We can see quickly that the prevalence of the symptom increases both
with the child's smoking and with that of their parents. In the published
paper reporting these respiratory symptom data (Bland et al. 1978) the
bar chart was not used; the data were given in the form of tables. It was
thus available for other researchers to compare to their own or to carry
out calculations upon. The bar chart was used to present the results
during a conference, where the most important thing was to convey an
outline of the analysis quickly.
Bar charts can also be used to show frequencies. For example, Figure
5.4(a) shows the relative frequency distributions of causes of death
among men and women, Figure 5.4(b) shows the frequency distribution
of cause of death among
men. Figure 5.4(b) looks very much like a histogram. The distinction
between these two terms is not clear. Most statisticians would describe
Figures 4.3, 4.4, and 4.6 as histograms, and Figures 5.2 and 5.3 as bar
charts, but I have seen books which actually reverse this terminology and
others which reserve the term ‘histogram’ for a frequency density graph,
like Figures 4.4 and 4.6.
Fig. 5.3. Bar chart showing the relationship between the prevalence
of self-reported breathlessness among schoolchildren and two
possible causative factors
present these data also. The vertical axis represents albumin and we
choose two arbitrary points on the horizontal axis to represent the
groups.
Alcoholics (first six values in each row)          Controls (last four values in each row)
15 28 39 41 44 48 34 41 43 45
16 29 39 43 45 48 39 42 43 45
17 32 39 43 45 49 39 42 43 45
18 37 40 44 46 51 40 42 43 45
20 38 40 44 46 51 41 42 44 45
21 38 40 44 46 52 41 42 44 45
28 38 41 44 47 41 42 44 45
Fig. 5.6. Scatter diagrams showing the data of Table 5.7
representing the data of Table 5.5. This chart appears to show a very
rapid increase in mortality, compared to the gradual increase shown in
Figure 5.2. Yet both show the same data. Figure 5.9 omits most of the
vertical scale, and instead stretches that small part of the scale where the
change takes place. Even when we are aware of this, it is difficult to look
at this graph and not think that it shows a large increase in mortality. It
helps if we visualize the baseline as being somewhere near the bottom of
the page.
Fig. 5.9. Bar chart with zero omitted on the vertical scale
There is no zero on the horizontal axis in Figures 5.2 and 5.9, either.
There are two reasons for this. There is no practical ‘zero time’ on the
calendar; we use an arbitrary zero. Also, there is an unstated assumption
that mortality rates vary with time and not the other way round.
The zero is omitted in Figure 5.5. This is almost always done in scatter
diagrams, yet if we are to gauge the importance of the relationship
between vital capacity and height by the relative change in vital capacity
over the height range we need the zero on the vital capacity scale. The
origin is often omitted on scatter diagrams because we are usually
concerned with the existence of a relationship and the distributions
followed by the observations, rather than its magnitude. We estimate the
latter in a different way, described in Chapter 11.
Line graphs are particularly at risk of undergoing the sort of distortion of
missing zero described in §5.8. Many computer programs resist drawing
bar charts like Figure 5.9, but will produce a line graph with a truncated
scale as the default. Figure 5.10 shows a line graph with a truncated
scale, corresponding to Figure 5.9. Just as there, the message of the
graph is a dramatic increase in mortality, which the data themselves do
not really support. We can make this even more dramatic by stretching
the vertical scale and compressing the horizontal scale. The effect is now
really impressive and looks much more likely than Figure 5.7 to attract
research funds, Nobel prizes and interviews on television. Huff (1954)
aptly names such horrors ‘gee whiz’ graphs. They are even more
dramatic if we omit the scales altogether and show only the soaring line.
Fig. 5.10. Line graphs with a missing zero and with a stretched
vertical and compressed horizontal scale, a ‘gee whiz’ graph
Fig. 5.11. Figure 5.1 with three-dimensional effects
This is not to say that authors who show only part of the scale are
deliberately trying to mislead. There are often good arguments against
graphs with vast areas of boring blank paper. In Figure 5.5, we are not
interested in vital capacities near zero and can feel quite justified in
excluding them. In Figure 5.10 we certainly are interested in zero
mortality; it is surely what we are aiming for. The point is that graphs can
so easily mislead the unwary reader, so let the reader beware.
The advent of powerful personal computers led to an increase in the
ability to produce complicated graphics. Simple charts, such as Figure
5.1, are informative but not visually exciting. One way of decorating such
graphs is to make them appear three-dimensional. Figure 5.11 shows the
effect. The angles are no longer proportional to the numbers which they
represent. The areas are, but because they are different shapes it is
difficult to compare them. This defeats the primary object of conveying
information quickly and accurately. Another approach to decorating
diagrams is to turn them into pictures. In a pictogram the bars of
Hence we get a straight line relationship between log mortality and time t:
log(mortality after t years) = t × log(constant) + log(mortality at start)
When the constant proportion changes, the slope of the straight line
formed by plotting log(mortality) against time changes and there is a very
obvious kink in the line.
Log scales are very useful analytic tools. However, a graph on a log scale
can be very misleading if the reader does not allow for the nature of the
scale. The log scale in Figure 5.12 shows the increased rate of reduction
in mortality associated with the anti-TB measures quite plainly, but it
gives the impression that these measures were important in the decline
of TB. This is not so. If we look at the corresponding point on the natural
scale, we can see that all these measures did was to accelerate a decline
which had been going on for a long time (see Radical Statistics Health
Group 1976).
Appendices
5A Appendix: Logarithms
Logarithms are not simply a method of calculation dating from before the
computer age, but a set of fundamental mathematical functions. Because
of their special properties they are much used in statistics. We shall start
with logarithms (or logs for short) to base 10, the common logarithms
used in calculations. The log to base 10 of a number x is y where
x = 10y
We write y = log10(x). Thus for example log10(10) = 1, log10(100) = 2,
log10(1 000) = 3, log10(10 000) = 4, and so on. If we multiply two
numbers, the log of the product is the sum of their logs:
log(xy) = log(x) + log(y)
For example,
100 × 1 000 = 10² × 10³ = 10²⁺³ = 10⁵ = 100 000
Or in log terms:
log10(100 × 1 000) = log10(100) + log10(1 000) = 2 + 3 = 5
Hence, 100 × 1 000 = 10⁵ = 100 000. This means that any multiplicative
relationship of the form
y = a × b × c × d
can be made additive by a log transformation:
log(y) = log(a) + log(b) + log(c) + log(d)
This is the process underlying the fit to the Lognormal Distribution
described in §7.4.
There is no need to use 10 as the base for logarithms. We can use any
number. The log of a number x to base b can be found from the log to
base a by a simple calculation: logb(x) = loga(x)/loga(b).
Ten is convenient for arithmetic using log tables, but for other purposes it
is less so. For example, the gradient, slope or differential of the curve y =
log10(x) is log10(e)/x, where e = 2.718 281… is a constant which does not
depend on the base of the logarithm. This leads to awkward constants
spreading through formulae. To keep this to a minimum we use logs to
the base e, called natural or Napierian logarithms after the mathematician
John Napier. This is the logarithm usually produced by LOG(X) functions
in computer languages.
Figure 5.13 shows the log curve for three different bases, 2, e and 10.
The curves all go through the point (1,0), i.e. log(1) = 0. As x approaches
0, log(x) becomes a larger and larger negative number, tending towards
minus infinity as x tends to zero. There are no logs of negative numbers.
As x increases from 1, the curve becomes flatter and flatter. Though
log(x) continues to increase, it does so more and more slowly. The
curves all go through (base, 1) i.e. log(base) = 1. The curve for log to the
base 2 goes through (2,1), (4,2), (8,3) because 2¹ = 2, 2² = 4, 2³ = 8. We
can see that the effect of replacing data by their logs will be to stretch out
the scale at the lower end and contract it at the upper.
We often work with logarithms of data rather than the data themselves.
This may have several advantages. Multiplicative relationships may
become additive, curves may become straight lines and skew
distributions may become symmetrical.
We transform back to the natural scale using the antilogarithm or
antilog. If y = log10(x), x = 10^y is the antilog of y. If z = loge(x), x = e^z or x = exp(z) is the antilog of z. If your computer program does not transform back, most calculators have e^x and 10^x functions for this purpose.
Fig. 5.13. Logarithmic curves to three different bases
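A minimal sketch of these rules using Python's math module: base 10 and natural logs, change of base, antilogs, and a multiplicative relationship made additive. The numbers are arbitrary.

    import math

    x = 250.0
    log10_x = math.log10(x)                 # log to base 10
    ln_x = math.log(x)                      # natural (Napierian) log, base e
    log2_x = math.log(x) / math.log(2)      # change of base: log2(x) = loge(x)/loge(2)
    print(log10_x, ln_x, log2_x)

    print(10 ** log10_x, math.exp(ln_x))    # antilogs recover x (up to rounding error)

    a, b = 4.0, 25.0                        # a multiplicative relationship becomes additive
    print(math.log10(a * b), math.log10(a) + math.log10(b))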
[Table fragment; column headings not recovered]
2 22 17 13 6 22
3 21 21 14 10 26
4 22 17 15 13 12
5 24 22 16 19 33
6 15 23 17 13 19
7 23 20 18 17 21
8 21 16 19 10 28
9 18 24 20 16 19
10 21 21 21 24 13
11 17 20 22 15 29
2. Table 2.8 shows the paralytic polio rates for several groups of
children. Construct a bar chart for the results from the randomized
control areas.
6
Probability
6.1 Probability
We use data from a sample to draw conclusions about the population
from which it is drawn. For example, in a clinical trial we might observe
that a sample of patients given a new treatment respond better than
patients given an old treatment. We want to know whether an
improvement would be seen in the whole population of patients, and if so
how big it might be. The theory of probability enables us to link samples
and populations, and to draw conclusions about populations from
samples. We shall start the discussion of probability with some simple
randomizing devices, such as coins and dice, but the relevance to
medical problems should soon become apparent.
We first ask what exactly is meant by ‘probability’. In this book I shall take
the frequency definition: the probability that an event will happen under
given circumstances may be defined as the proportion of repetitions of
those circumstances in which the event would occur in the long run. For
example, if we toss a coin it comes down either heads or tails. Before we
toss it, we have no way of knowing which will happen, but we do know
that it will either be heads or tails. After we have tossed it, of course, we
know exactly what the outcome is. If we carry on tossing our coin, we
should get several heads and several tails. If we go on doing this for long
enough, then we would expect to get as many heads as we do tails. So
the probability of a head being thrown is half, because in the long run a
head should occur on half of the throws. The number of heads which
might arise in several tosses of the coin is called a random variable, that
is, a variable which can take more than one value with given probabilities.
In the same way, a thrown die can show six faces, numbered one to six,
with equal probability. We can investigate random variables such as the
number of sixes in a given number of throws, the number of throws
before the first six, and so on. There is another, broader definition of
probability which leads to a different approach to statistics, the Bayesian
school (Bland and Altman 1998), but it is beyond the scope of this book.
The frequency definition of probability also applies to continuous
measurement, such as human height. For example, suppose the median
height in a population of women is 168 cm. Then half the women are
above 168 cm in height. If we choose women at random (i.e. without the
characteristics of the woman influencing the choice) then in the long run
half the women chosen will have
heights above 168 cm. The probability of a woman having height above
168 cm is one half. Similarly, if 1/10 of the women have height greater
than 180 cm, a woman chosen at random will have height greater than
180 cm with probability 1/10. In the same way we can find the probability
of height being between any given values. When we measure a
continuous quantity we are always limited by the method of
measurement, and so when we say a woman's height is 170 cm we
mean that it is between, say, 169.5 and 170.5 cm, depending on the
accuracy with which we measure. So what we are interested in is the
probability of the random variable taking values between certain limits
rather than particular values.
What happens if we toss two coins at once? We now have four possible
events: a head and a head, a head and a tail, a tail and a head, a tail and
a tail. Clearly, these are equally likely and each has probability 1/4. Let Y
be the number of heads. Y has three possible values: 0, 1, and 2. Y = 0
only when we get a tail and a tail and has probability 1/4. Similarly, Y = 2
only when we get a head and a head, so has probability 1/4. However, Y
= 1 either when we get a head and tail, or when we have a tail and a
head, and so has probability 1/4 + 1/4 = 1/2. We can write this probability
distribution as:
PROB(Y = 0) = 1/4
PROB(Y = 1) = 1/2
PROB(Y = 2) = 1/4
The probability distribution of Y is shown in Figure 6.1(b).
We can apply this to the number of heads in tosses of two coins. The
number of heads will be from a Binomial distribution with p = 0.5 and n =
2. Hence the probability of two heads (r = 2) is:
PROB(two heads) = 2!/(2! × 0!) × 0.5^2 × (1 - 0.5)^0 = 0.25
This is what was found for two coins in §6.3. We can use this distribution
whenever we have a series of trials with two possible outcomes. If we
treat a group of patients, the number who recover is from a Binomial
distribution. If we measure the blood pressure of a group of people, the
number classified as hypertensive is from a Binomial distribution.
Figure 6.3 shows the Binomial distribution for p = 0.3 and increasing
values of n. The distribution becomes more symmetrical as n increases.
It is converging to the Normal distribution, described in the next chapter.
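The Binomial probabilities are easy to compute directly; a short sketch in Python using only the standard library, which reproduces the two-coin result PROB(r = 2) = 0.25 and then tabulates the distribution for p = 0.3 and n = 10 (values chosen only to echo the figures):

    from math import comb

    def binomial_prob(r, n, p):
        """PROB(r successes in n trials), each trial having probability p."""
        return comb(n, r) * p**r * (1 - p)**(n - r)

    print(binomial_prob(2, 2, 0.5))        # 0.25, the two-coin result
    for r in range(11):                    # Binomial distribution for p = 0.3, n = 10
        print(r, round(binomial_prob(r, 10, 0.3), 4))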
6.5 Mean and variance
The number of different probabilities in a Binomial distribution can be
very large and unwieldy. When n is large, we usually need to summarize
these probabilities in some way. Just as a frequency distribution can be
described by its mean and variance, so can a probability distribution and
its associated random variable.
The mean is the average value of the random variable in the long run. It
is also called the expected value or expectation and the expectation of
a random variable X is usually denoted by E(X). For example, consider
the number of heads in tosses of two coins. We get 0 heads in 1/4 of
pairs of coins, i.e. with probability 1/4. We get 1 head in 1/2 of pairs of
coins, and 2 heads in 1/4 of pairs. The average value we should get in
the long run is found by multiplying each value by the proportion of pairs
in which it occurs and adding: E(Y) = 0 × 1/4 + 1 × 1/2 + 2 × 1/4 = 1.
as its mean and variance suffice. The mean of the Poisson distribution for
the number of events per unit time is simply the rate, µ. The variance of
the Poisson distribution is also equal to µ. Thus the Poisson is a family of
distributions, like the Binomial, but with only one parameter, µ. This
distribution is important, because deaths from many diseases can be
treated as occurring randomly and independently in the population. Thus,
for example, the number of deaths from lung cancer in one year among
people in an occupational group, such as coal miners, will be an
observation from a Poisson distribution, and we can use this to make
comparisons between mortality rates (§16.3).
Figure 6.4 shows the Poisson distribution for four different means. You
will see that as the mean increases the Poisson distribution looks rather
like the Binomial distribution in Figure 6.3. We shall discuss this similarity
further in the next chapter.
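The Poisson probabilities can also be computed directly; a sketch in Python, where the mean of 4 is arbitrary and chosen only to resemble the larger means in Figure 6.4:

    from math import exp, factorial

    def poisson_prob(r, mu):
        """PROB(r events) when events occur randomly at mean rate mu per unit time."""
        return exp(-mu) * mu**r / factorial(r)

    mu = 4.0
    probs = [poisson_prob(r, mu) for r in range(30)]
    mean = sum(r * p for r, p in enumerate(probs))
    variance = sum((r - mean)**2 * p for r, p in enumerate(probs))
    print(round(mean, 3), round(variance, 3))   # both close to mu = 4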
the same. For example, Table 6.1 shows the relationship between two
diseases, hay fever and eczema in a large group of children. The
probability that in this group a child with hay fever will have eczema also
is
PROB(eczema | hay fever) = 141/1 069 = 0.13
the proportion of children with hay fever who have eczema also. This is
clearly much less than the probability that a child with eczema will have
hay fever,
PROB(hay fever | eczema) = 141/561 = 0.25
the proportion of children with eczema who have hay fever also.
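The two conditional probabilities can be checked directly from the frequencies quoted above (141 children with both conditions, 1 069 with hay fever, 561 with eczema); a minimal sketch in Python:

    both = 141            # children with both hay fever and eczema
    hay_fever = 1069      # children with hay fever
    eczema = 561          # children with eczema

    print(round(both / hay_fever, 2))   # PROB(eczema | hay fever) = 0.13
    print(round(both / eczema, 2))      # PROB(hay fever | eczema) = 0.25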
For those who never knew, or have forgotten, the theory of combinations,
it goes like this. First, we look at the number of permutations, i.e. ways of
arranging a set of objects. Suppose we have n objects. How many ways
can we order them? The first object can be chosen n ways, i.e. any
object. For each first object there are n - 1 possible second objects, so
there are n × (n - 1) possible first and second permutations. There are
now only n - 2 choices for the third object, n - 3 choices for the fourth,
and so on, until there is only one choice for the last. Hence, there are n ×
(n - 1) × (n - 2) × … × 2 × 1 permutations of n objects. We call this
number the factorial of n and write it ‘n!’.
Now we want to know how many ways there are of choosing r objects
from n objects. Having made a choice of r objects, we can order those in
r! ways. We can also order the n - r not chosen in (n - r)! ways. So the
objects can be ordered in r!(n - r)! ways without altering the objects
chosen. For example, say we choose the first two from three objects, A,
B and C. Then if these are A and B, two permutations give this choice,
ABC and BAC. This is, of course, 2! × 1! = 2 permutations. Each
combination of r things accounts for r!(n - r)! of the n! permutations
possible, so there are n!/(r!(n - r)!) ways of choosing r objects from n objects.
For this to give 1 when r = n (there is only one way to choose all n objects), we must have n!/(n! × 0!) = 1, and so 0! = 1.
and so we find E(xᵢ²) = σ² + µ², and so E(Σxᵢ²) = n(σ² + µ²), being the sum
of n numbers all of which are σ² + µ². We now find the value of E((Σxᵢ)²).
We need
So
Age in years, x    Number surviving, lx
0                  1 000
10                   959
20                   952
30                   938
40                   920
50                   876
60                   758
70                   524
80                   211
90                    22
100                    0
2. What is the probability that this individual will die before age 10?
Which property of probability does this depend on?
View Answer
3. What are the probabilities that the individual will survive to ages
10, 20, 30, 40, 50, 60, 70, 80, 90, 100? Is this set of probabilities a
probability distribution?
View Answer
7. What is the probability that a man dies in his second decade? You
can use the fact that PROB(death in 2nd) + PROB(survives to 3rd) =
PROB(survives to 2nd).
View Answer
8. For each decade, what is the probability that a given man will die
in that decade? This is a probability distribution—why? Sketch the
distribution.
View Answer
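For readers who want to check their working by computer, here is a sketch in Python; the dictionary simply restates the lx column of the life table above:

    lx = {0: 1000, 10: 959, 20: 952, 30: 938, 40: 920,
          50: 876, 60: 758, 70: 524, 80: 211, 90: 22, 100: 0}

    # Probability of surviving to each age (out of the 1 000 born)
    for age in sorted(lx):
        print(age, lx[age] / lx[0])

    # Probability of dying within each decade of life
    ages = sorted(lx)
    for start, end in zip(ages, ages[1:]):
        print(f"{start}-{end}: {(lx[start] - lx[end]) / lx[0]:.3f}")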
7
The Normal distribution
count the number of values which fall within certain limits (§4.2). We
can represent this as a histogram such as Figure 7.1 (§4.3). One way of
presenting the histogram is as relative frequency density, the proportion
of observations in the interval per unit of X (§4.3). Thus, when the interval
size is 5, the relative frequency density is the relative frequency divided
by 5 (Figure 7.1). The relative frequency in an interval is now represented
by the width of the interval multiplied by the density, which gives the area
of the rectangle. Thus, the relative frequency between any two points can
be found from the area under the histogram between the points. For
example, to estimate the relative frequency between 10 and 20 in Figure
7.1 we have the density from 10 to 15 as 0.05 and between 15 and 20 as
0.03. Hence the relative frequency is
0.05 × (15 - 10) + 0.03 × (20 - 15) = 0.25 + 0.15 = 0.40
If we take a larger sample we can use smaller intervals. We get a
smoother looking histogram, as in Figure 7.2, and as we take larger and
larger samples, and so smaller and smaller intervals, we get a shape
very close to a smooth curve (Figure 7.3). As the sample size
approaches that of the population, which we can assume to be very
large, this curve becomes the relative frequency density of the whole
population. Thus we can find the proportion of observations between any
two limits by finding the area under the curve, as indicated in Figure 7.3.
If we know the equation of this curve, we can find the area under it.
(Mathematically we do this by integration, but we do not need to know
how to integrate to use or to understand practical statistics—all the
integrals we need have been done and tabulated.) Now, if we choose an
individual at random, the probability that X lies between any given limits
is equal to the proportion of individuals who fall between these limits.
Hence, the relative frequency distribution for the whole population gives
us the probability distribution of the variable. We call this curve the
probability density function.
Fig. 7.3. Relative frequency density or probability density function,
showing the probability of an observation between 10 and 20
height, blood pressure, serum cholesterol, etc., do not arise from simple
probability situations. As a result, we do not know the probability
distribution for these measurements on theoretical grounds. As we shall
see, we can often find a standard distribution whose mathematical
properties are known, which fits observed data well and which enables us
to draw conclusions about them. Further, as sample size increases the
distribution of certain statistics calculated from the data, such as the
mean, becomes independent of the distribution of the observations
themselves and follows one particular distribution form, the Normal
distribution. We shall devote the remainder of this chapter to a study of
this distribution.
Fig. 7.5. Binomial distributions for p = 0.3 and six different values of
n, with corresponding Normal distribution curves
and variance provided n is large enough. Figure 7.5 shows the Binomial
distributions of Figure 6.3 with the corresponding Normal distribution
curves. From n = 10 onwards the two distributions are very close.
Generally, if both np and n(1 - p) exceed 5 the approximation of the
Binomial to the Normal distribution is quite good enough for most
practical purposes. See §8.4 for an application. The Poisson distribution
has the same property, as Figure 6.4 suggests.
distribution lying between given limits. The areas under the curve can be
found numerically, however, and these have been calculated and
tabulated. Table 7.1 shows the area under the probability density curve
for different values of the Normal distribution. To be more precise, for a
value z the table shows the area under the curve to the left of z, i.e. from
minus infinity to z (Figure 7.8). Thus Φ(z) is the probability that a value
chosen at random from the Standard Normal distribution will be less than
z. Φ is the Greek capital ‘phi’. Note that half this table is not strictly
necessary. We need only the half for positive z as Φ(-z) + Φ(z) = 1. This
arises from the symmetry of the distribution. To find the probability of z
lying between two values a and b, where b > a, we find Φ(b) - Φ(a). To
find the probability of z being greater than a we find 1 - Φ(a). These
formulae are all examples of the additive law of probability. Table 7.1
gives only a few values of z, and much more extensive ones are
available (Lindley and Miller 1955, Pearson and Hartley 1970). Good
statistical computer programs will calculate these values when they are
needed.
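Φ(z) can also be computed without tables; a sketch in Python using the error function from the standard library, reproducing the kind of look-up described above:

    from math import erf, sqrt

    def phi(z):
        """Standard Normal distribution function, PROB(Z < z)."""
        return 0.5 * (1 + erf(z / sqrt(2)))

    print(round(phi(1.96), 4))              # about 0.975
    print(round(phi(1.0) - phi(-1.0), 4))   # PROB(-1 < z < 1), about 0.68
    print(round(1 - phi(1.64), 4))          # PROB(z > 1.64), about 0.05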
One-sided              Two-sided
P1 (%)      z          P2 (%)      z
50          0.00
25          0.67       50          0.67
10          1.28       20          1.28
5           1.64       10          1.64
1           2.33       2           2.33
Fig. 7.8. One- and two-sided percentage points (5%) of the Standard
Normal distribution
On the curve with mean 0 nearly all the probability is between -3 and +3.
For the curve with mean 1 it is between -2 and +4, i.e. between the mean
-3 and the mean +3. The probability of being a given number of units
from the mean is the same for both distributions, as is also shown by the
5% points.
Fig. 7.9. Normal distributions with different means and with different
variances, showing two-sided 5% points
more complex formulae. It does not make much difference which is used.
We find from a table of the Normal distribution the values of z which
correspond to Φ(z) = 0.5/n, 1.5/n, etc. (Table 7.1 lacks detail for practical
work, but will do for illustration.) For 5 points, for example, we have Φ(z)
= 0.1, 0.3, 0.5, 0.7, and 0.9, and z = -1.3, -0.5, 0, 0.5, and 1.3. These are
the points of the Standard Normal distribution which correspond to the
observed data. Now, if the observed data come from a Normal
distribution of mean µ and variance σ2, the observed point should equal
σz + µ, where z is the corresponding point of the Standard Normal
distribution. If we plot the Standard Normal points against the observed
values we should get something close to a straight line. We can write the
equation of this line as σz + µ = x, where x is the observed variable and z
the corresponding quantile of the Standard Normal distribution. We can
rewrite this as z = (x - µ)/σ,
which goes through the point defined by (µ, 0) and has slope 1/σ (see
§11.1). If the data are not from a Normal distribution we will not get a
straight line, but a curve of some sort. Because we plot the quantiles of
the observed frequency
Table 7.3. Vitamin D levels for 26 healthy men, in order of magnitude:
14  17  20  21  22  24  25  26  26  26  27  30  31
31  32  35  42  43  46  48  52  54  54  63  67  83
Fig. 7.12. Blood vitamin D levels and log10 vitamin D for 26 normal
men, with Normal plots
Table 7.3 shows vitamin D levels measured in the blood of 26 healthy men.
The calculation of the Normal plot is shown in Table 7.4. Note that the
Φ(z) = (i - 0.5)/26 and z are symmetrical, the second half being the first
half with opposite sign. The value of the Standard Normal deviate, z, can
be found by interpolation in Table 7.1, by using a fuller table, or by
computer. Figure 7.12 shows the histogram and the Normal plot for these
data. The distribution is skew and the Normal plot shows a pronounced
curve. Figure 7.12 also shows the vitamin D data after log transformation.
It is quite easy to produce the Normal plot, as the corresponding
Standard Normal deviate, z, is unchanged. We only need to log the
observations and plot again. The Normal plot for the transformed data
conforms very well to the theoretical line, suggesting that the distribution
of log vitamin D level is close to the Normal.
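The quantile calculation behind a Normal plot is easy to reproduce; a sketch in Python, with a small made-up sample (not the vitamin D data) and a crude inverse of Φ found by bisection rather than taken from a published table:

    from math import erf, sqrt

    def phi(z):
        return 0.5 * (1 + erf(z / sqrt(2)))

    def phi_inv(p, lo=-10.0, hi=10.0):
        """Crude inverse of phi, by bisection."""
        for _ in range(100):
            mid = (lo + hi) / 2
            if phi(mid) < p:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    data = sorted([4.7, 5.2, 3.9, 6.1, 5.0, 4.4, 5.8, 4.9])   # illustrative only
    n = len(data)
    for i, x in enumerate(data, start=1):
        p = (i - 0.5) / n                 # expected cumulative probability
        print(x, round(phi_inv(p), 2))    # plot x against this Standard Normal deviate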
A single bend in the Normal plot indicates skewness. A double curve
indicates that both tails of the distribution are different from the Normal,
usually being too long, and many curves may indicate that the distribution
is bimodal (Figure 7.13). When the sample is small, of course, there will
be some random fluctuations.
There are several different ways to display the Normal plot. Some
programs plot the data distribution on the vertical axis and the theoretical
Normal distribution on the horizontal axis, which reverses the direction of
the curve. Some
plot the theoretical Normal distribution with mean x̄, the
sample mean, and standard deviation s, the sample standard deviation.
This is done by calculating x̄ + sz. Figure 7.14(a) shows
both these features, the Normal plot drawn by the program Stata's
‘qnorm’ command. The straight line is the line of equality. This plot is
identical to the second plot in Figure 7.12, except for the change of scale
and switching of the axes. A slight variation is the standardized Normal
probability plot or p-p plot, where we standardize the observations to
zero mean and standard deviation one, y = (x - x̄)/s, and
plot the cumulative Normal
Appendices
7A Appendix: Chi-squared, t, and F
Less mathematically inclined readers can skip this section, but those who
persevere should find that applications like chi-squared tests (Chapter
13) appear much more logical.
Many probability distributions can be derived for functions of Normal
variables which arise in statistical analysis. Three of these are particularly
important: the Chi-squared, t and F distributions. These have many
applications, some of which we shall discuss in later chapters.
The Chi-squared distribution is defined as follows. Suppose Z is a
Standard Normal variable, so having mean 0 and variance 1. Then the
variable formed by Z² follows the Chi-squared distribution with 1 degree
of freedom. If we have n such independent Standard Normal variables,
Z₁, Z₂, …, Zₙ, then the variable defined by
χ² = Z₁² + Z₂² + … + Zₙ²
is defined to be the Chi-squared distribution with n degrees of
freedom. χ is the Greek letter ‘chi’, pronounced ‘ki’ as in ‘kite’. The
distribution curves for several different numbers of degrees of freedom
are shown in Figure 7.15. The mathematical description of this curve is
rather complicated, but we do not need to go into this.
Some properties of the Chi-squared distribution are easy to deduce. As
the distribution is the sum of n independent identically distributed random
variables it tends to the Normal as n increases, from the central limit
theorem (§7.2). The convergence is slow, however, (Figure 7.15) and the
square root of chi-squared converges much more quickly. The expected
value of Z² is the variance of Z, the expected value of Z being 0, and so
E(Z²) = 1. The expected value of chi-squared with n degrees of freedom
is thus n: E(χ²) = E(Z₁²) + E(Z₂²) + … + E(Zₙ²) = n.
37. When a Normal plot is drawn with the Standard Normal deviate
on the y axis:
(a) a straight line indicates that observations are from a Normal
Distribution;
(b) a curve with decreasing slope indicates positive skewness;
(c) an ‘S’ shaped curve (or ogive) indicates long tails;
(d) a vertical line will occur if all observations are equal;
(e) if there is a straight line its slope depends on the standard
deviation.
View Answer
2. Construct a Normal plot for the data. This is quite easy as they
are ordered already. Find (i - 0.5)/n for i = 1 to 40 and obtain the
corresponding cumulative Normal probabilities from Table 7.1. Now
plot these probabilities against the corresponding blood glucose.
View Answer
3. Does the plot appear to give a straight line? Do the data follow a
Normal distribution?
View Answer
8
Estimation
9 1 0 7 5 6 9 5 8 8 1 0 5
1 8 8 8 5 2 4 8 3 1 6 5 5
2 8 1 8 5 8 4 0 1 9 2 1 6
1 9 7 9 7 2 7 7 0 8 1 6 3
7 0 2 8 8 7 2 5 4 1 8 6 8
Fig. 8.1. Distribution of the population of Table 8.1
Sample 6 7 7 1 5 5
4 8 9 8 2 5
6 1 2 8 9 7
1 8 7 4 5 8
Sample 7 7 2 8 3 4
8 3 5 0 7 8
7 8 0 7 4 7
2 7 8 7 8 7
Fig. 8.2. Distribution of the population of Table 8.1 and of the sample
of the means of Table 8.2
from the standard error. What we do is find limits which are likely to
include the population mean, and say that we estimate the population
mean to lie somewhere in the interval (the set of all possible values)
between these limits. This is called an interval estimate.
Fig. 8.4. Sampling distribution of the mean of 4 observations from a
Standard Normal distribution
say that the population value lies within the interval. We just don't know
which 95%. We express this by saying that we are 95% confident that the
mean lies between these limits.
Fig. 8.5. Mean and 95% confidence interval for 20 random samples
of 100 observations from the Standard Normal distribution
In the FEV1 example, the sampling distribution of the mean is Normal
and its standard deviation is well estimated because the sample is large.
This is not always true and although it is usually possible to calculate
confidence intervals for an estimate they are not all quite as simple as
that for the mean estimated from a large sample. We shall look at the
mean estimated from a small sample in §10.2.
There is no necessity for the confidence interval to have a probability of
95%. For example, we can also calculate 99% confidence limits. The
upper 0.5% point of the Standard Normal distribution is 2.58 (Table 7.2),
so the probability of a Standard Normal deviate being above 2.58 or
below -2.58 is 1% and the probability of being within these limits is 99%.
The 99% confidence limits for the mean FEV1 are therefore, 4.062 - 2.58
× 0.089 and 4.062 + 2.58 × 0.089, i.e. 3.8 and 4.3 litres. These give a
wider interval than the 95% limits, as we would expect since we are more
confident that the mean will be included. The probability we choose for a
confidence interval is thus a compromise between the desire to include
the estimated population value and the desire to avoid parts of scale
where there is a low probability that the mean will be found. For most
purposes, 95% confidence intervals have been found to be satisfactory.
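The FEV1 intervals above are simply mean ± multiplier × standard error; a sketch in Python using the figures quoted in the text:

    mean, se = 4.062, 0.089

    for label, z in [("95%", 1.96), ("99%", 2.58)]:
        lower, upper = mean - z * se, mean + z * se
        print(f"{label} CI: {lower:.2f} to {upper:.2f} litres")
    # 95%: about 3.9 to 4.2 litres; 99%: about 3.8 to 4.3 litres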
Standard error is not the only way in which we can calculate confidence
intervals, although at present it is the one used for most problems. In
§8.8 I describe a different approach based on the exact probabilities of a
distribution, which requires no large sample assumption. In §8.9 I
describe a large sample method which uses the Binomial distribution
directly. There are others, which I shall omit because they are rarely
used.
The standard error of the proportion is only of use if the sample is large
enough for the Normal approximation to apply. A rough guide to this is
that np and n(1 - p) should both exceed 5. This is usually the case when
we are concerned with straightforward estimation. If we try to use the
method for smaller samples, we may get absurd results. For example, in
a study of the prevalence of HIV in ex-prisoners (Turnbull et al. 1992), of
29 women who did not inject drugs one was HIV positive. The authors
reported this to be 3.4%, with a 95% confidence interval -3.1% to 9.9%.
The lower limit of -3.1%, obtained from the observed proportion minus
1.96 standard errors, is impossible. As Newcombe (1992) pointed out,
the correct 95% confidence interval can be obtained from the exact
probabilities of the Binomial distribution and is 0.1% to 17.8% (§8.8).
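The exact limits can be reproduced from the Binomial probabilities themselves; a sketch in Python, where the bisection search over candidate proportions is my own shortcut rather than anything taken from the text, giving Clopper-Pearson style limits of about 0.1% to 17.8% for 1 positive result out of 29:

    from math import comb

    def binom_cdf(k, n, p):
        """PROB(X <= k) for a Binomial(n, p) variable."""
        return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(k + 1))

    def bisect(f, lo, hi):
        """Find p in (lo, hi) with f(p) = 0, assuming f(lo) and f(hi) differ in sign."""
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(lo) * f(mid) <= 0:
                hi = mid
            else:
                lo = mid
        return (lo + hi) / 2

    x, n, alpha = 1, 29, 0.05
    lower = bisect(lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2, 1e-9, 1 - 1e-9)
    upper = bisect(lambda p: binom_cdf(x, n, p) - alpha / 2, 1e-9, 1 - 1e-9)
    print(f"exact 95% CI: {100*lower:.1f}% to {100*upper:.1f}%")   # about 0.1% to 17.8%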
                 Bronchitis at 5
Cough at 14      Yes       No        Total
Yes              26        44        70
No               247       1002      1249
Total            273       1046      1319
The 95% confidence interval for the difference is 0.05317 - 1.96 × 0.0188
to 0.05317 + 1.96 × 0.0188 = 0.016 to 0.090. Although the difference is
not very precisely estimated, the confidence interval does not include
zero and gives us clear evidence that children with bronchitis reported in
infancy are more likely than others to be reported to have respiratory
symptoms in later life. The data on lung function in §8.5 gives us some
reason to suppose that this is not entirely due to response bias (§3.9). As
in §8.4, the confidence interval must be estimated
The standard error is the square root of this. (This formula is often written
in terms of frequencies, but I think this version is clearer.) For the
example the log ratio is loge(2.26385) = 0.81707 and the standard error is
The 95% confidence interval for the log ratio is therefore 0.81707 - 1.96 ×
0.23784 to 0.81707 + 1.96 × 0.23784 = 0.35089 to 1.28324. The 95%
confidence interval for the ratio of proportions itself is the antilog of this:
e^0.35089 to e^1.28324 = 1.42 to 3.61. Thus we estimate that the proportion of
children reported to cough during the day or at night among those with a
history of bronchitis is between 1.4 and 3.6 times the proportion among
those without a history of bronchitis.
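The whole calculation can be strung together as follows; a sketch in Python, where the standard error is written out from the usual large-sample expression for the log of a ratio of two independent proportions, which is what the omitted formula above amounts to:

    from math import log, exp, sqrt

    r1, n1 = 26, 273      # cough at 14 among children with bronchitis before 5
    r2, n2 = 44, 1046     # cough at 14 among children without bronchitis before 5

    ratio = (r1 / n1) / (r2 / n2)
    # usual large-sample standard error of the log ratio of two proportions
    se_log = sqrt(1/r1 - 1/n1 + 1/r2 - 1/n2)

    log_lo = log(ratio) - 1.96 * se_log
    log_hi = log(ratio) + 1.96 * se_log
    print(round(ratio, 2), round(exp(log_lo), 2), round(exp(log_hi), 2))
    # ratio about 2.26, 95% confidence interval about 1.42 to 3.61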
The proportion of individuals in a population who develop a disease or
symptom is equal to the probability that any given individual will develop
the disease, called the risk of an individual developing a disease. Thus in
Table 8.3 the risk
that a child with bronchitis before age 5 will cough at age 14 is 26/273 =
0.095 24, and the risk for a child without bronchitis before age 5 is
44/1046 = 0.04207. To compare risks for people with and without a
particular risk factor, we look at the ratio of the risk with the factor to the
risk without the factor, the relative risk. The relative risk of cough at age
14 for bronchitis before 5 is thus 2.26. To estimate the relative risk
directly, we need a cohort study (§3.7) as in Table 8.3. We estimate
relative risk for a case-control study in a different way (§13.7).
In the unusual situation when the samples are paired, either matched or
two observations on the same subject, we use a different method (§13.9).
Unless the observed proportion is zero or one, these values are never
included in the exact confidence interval. The population proportion of
successes cannot be zero if we have observed a success in the sample.
It cannot be one if we have observed a failure.
The 95% confidence interval is thus from the 22nd to the 36th
observation, 3.75 to 4.30 litres from Table 4.4. Compare this to the 95%
confidence interval for the mean, 3.9 to 4.2 litres, which is completely
included in the interval for the median. This method of estimating
percentiles is relatively imprecise. Another example is given in §15.5.
8.10 What is the correct confidence interval?
A confidence interval only estimates errors due to sampling. It does not
allow for any bias in the sample and gives us an estimate for the
population of which our data can be considered a random sample. As
discussed in §3.5, it is often not clear what this population is, and we rely
far more on the estimation of differences than absolute values. This is
particularly true in clinical trials. We start with patients in one locality,
exclude some, allow refusals, and the patients cannot be regarded as a
random sample of patients in general. However, we then randomize into
two groups which are then two samples from the same population, and
only the treatment differs between them. Thus the difference is the thing
we want the confidence interval for, not for either group separately. Yet
researchers often ignore the direct comparison in favour of estimation
using each group separately.
For example, Salvesen et al. (1992) reported follow-up of two
randomized controlled trials of routine ultrasonography screening during
pregnancy. At ages 8 to 9 years, children of women who had taken part
in these trials were followed up. A subgroup of children underwent
specific tests for dyslexia. The test results classified 21 of the 309
screened children (7%, 95% confidence interval 3-10%) and 26 of the
294 controls (9%, 95% confidence interval 4–12%) as dyslexic. Much
more useful would be a confidence interval for the difference between
prevalences (-6.3 to 2.2 percentage points) or their ratio (0.44 to 1.34),
because we could then compare the groups directly.
39. The 95% confidence limits for the mean estimated from a set of
observations
(a) are limits between which, in the long run, 95% of observations
fall;
(b) are a way of measuring the precision of the estimate of the
mean;
(c) are limits within which the sample mean falls with probability
0.95;
(d) are limits which would include the population mean for 95% of
possible samples;
(e) are a way of measuring the variability of a set of observations.
View Answer
Number      Mean      Standard deviation
3. Find the standard error of the mean plasma magnesium for each
group.
View Answer
9
Significance tests
Number of attacks while on     Difference                 Sign of
Placebo      Pronethalol       placebo - pronethalol      difference
71           29                42                         +
8            1                 7                          +
14           7                 7                          +
23           16                7                          +
34           25                9                          +
79           65                14                         +
60           41                19                         +
2            0                 2                          +
3            0                 3                          +
17           15                2                          +
7            2                 5                          +
Thus, we would have observed a very unlikely event if the null hypothesis
were true. This means that the data are not consistent with the null
hypothesis, and we can conclude that there is strong evidence in favour
of a difference between the treatments. (Since this was a double blind
randomized trial, it is reasonable to suppose that this was caused by the
activity of the drug.)
5. Conclude that the data are consistent or inconsistent with the null
hypothesis.
We shall deal with several different significance tests in this and
subsequent chapters. We shall see that they all follow this pattern.
If the data are not consistent with the null hypothesis, the difference is
said to be statistically significant. If the data do not support the null
hypothesis, it is sometimes said that we reject the null hypothesis, and if
the data are consistent with the null hypothesis it is said that we accept it.
Such an ‘all or nothing’ decision making approach is seldom appropriate
in medical research. It is preferable to think of the significance test
probability as an index of the strength of evidence against the null
hypothesis. The term ‘accept the null hypothesis’ is also misleading
because it implies that we have concluded that the null hypothesis is true,
which we should not do. We cannot prove statistically that something,
such as a treatment effect, does not exist. It is better to say that we have
not rejected or have failed to reject the null hypothesis.
The probability of such an extreme value of the test statistic occurring if
the null hypothesis were true is often called the P value. It is not the
probability that the null hypothesis is true. This is a common
misconception. The null hypothesis is either true or it is not; it is not
random and has no probability. I suspect that many researchers have
managed to use significance tests quite effectively despite holding this
incorrect view.
We can use this confidence interval to carry out a significance test of the
null hypothesis that the difference between the means is zero, i.e. the
alternative hypothesis is that µ1 and µ2 are not equal. If the confidence
interval includes zero, then the probability of getting such extreme data if
the null hypothesis were true is greater than 0.05 (i.e. 1 - 0.95). If the
confidence interval excludes zero, then the probability of such extreme
data under the null hypothesis is less
than 0.05 and the difference is significant. Another way of doing the same
thing is to note that the observed difference divided by its standard error,
(x̄₁ - x̄₂)/SE(x̄₁ - x̄₂), would be an observation from the Standard Normal
distribution if the null hypothesis were true.
This is the test statistic, and if it lies between -1.96 and +1.96 then the
probability of such an extreme value is greater than 0.05 and the
difference is not significant. If the test statistic is greater than 1.96 or less
than -1.96, there is a less than 0.05 probability of such data arising if the
null hypothesis were true, and the data are not consistent with the null
hypothesis; the difference is significant at the 0.05 or 5% level. This is the
large sample Normal test or z test for two means.
For an example, in a study of respiratory symptoms in schoolchildren
(§8.5), we wanted to know whether children reported by their parents to
have respiratory symptoms had worse lung function than children who
were not reported to have symptoms. Ninety-two children were reported
to have cough during the day or at night, and their mean PEFR was
294.8 litre/min with standard deviation 57.1 litre/min; 1643 children were
reported not to have the symptom, and their mean PEFR was 313.6
litre/min with standard deviation 55.2 litre/min. We thus have two large
samples, and can apply the Normal test. We have
The difference between the two groups is x̄₁ - x̄₂ = 294.8 - 313.6 = -18.8.
The standard error of the difference is
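A sketch of this test in Python, using the summary figures just given; the standard error formula sqrt(s1²/n1 + s2²/n2) is the usual large-sample one (§8.5), so the exact z and P values should be read as illustrative:

    from math import sqrt, erf

    m1, s1, n1 = 294.8, 57.1, 92      # children reported to have cough
    m2, s2, n2 = 313.6, 55.2, 1643    # children without the symptom

    se = sqrt(s1**2 / n1 + s2**2 / n2)
    z = (m1 - m2) / se

    def phi(x):                        # Standard Normal distribution function
        return 0.5 * (1 + erf(x / sqrt(2)))

    p = 2 * phi(-abs(z))               # two-sided P value
    print(round(se, 2), round(z, 2), round(p, 4))   # roughly 6.11, -3.08, 0.002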
Note that the standard error used here is not the same as that found in
§8.6. It is only correct if the null hypothesis is true. The formula of §8.6
should be used for finding the confidence interval. Thus the standard
error used for testing is not identical to that used for estimation, as was
the case for the comparison of two means. It is possible for the test to be
significant and the confidence interval to include zero. This property is
possessed by several related tests and confidence intervals.
This is a large sample method, and is equivalent to the chi-squared test
for a 2 by 2 table (§13.1,2). How small the sample can be and methods
for small samples are discussed in §13.3-6.
Note that we do not need a different test for the ratio of two proportions,
as the null hypothesis that the ratio in the population is one is the same
as the null hypothesis that the difference in the population is zero.
For the comparison of PEFR in children with and without phlegm (§9.7),
for
example, suppose that the population means were in fact µ1 = 310 and
µ2 = 295 litre/min, and each population had standard deviation 55
litre/min. The sample sizes were n1 = 1708 and n2 = 27, so the standard
error of the difference would be
From Table 7.1, Φ(0.55) is between 0.691 and 0.726, about 0.71. The
power of the test would be 1 - 0.71 = 0.29. If these were the population
means and standard deviation, our test would have had a poor chance of
detecting the difference in means, even though it existed. The test would
have low power. Figure 9.3 shows how the power of this test changes
with the difference between population means. As the difference gets
larger, the power increases, getting closer and closer to 1. The power is
not zero even when the population difference is zero, because there is
always the possibility of a significant difference, even when the null
hypothesis is true. 1 - power = β, the probability of a Type II or beta error
(§9.4), when the population difference is 15 litre/min.
Fig. 9.3. Power curve for a comparison of two means from samples
of size 1708 and 27
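The power calculation sketched above can be reproduced as follows; a sketch in Python, where the approximation power ≈ 1 - Φ(1.96 - difference/SE) ignores the tiny probability of a significant result in the wrong direction, the same simplification made in the text:

    from math import sqrt, erf

    def phi(x):
        return 0.5 * (1 + erf(x / sqrt(2)))

    sd, n1, n2 = 55, 1708, 27
    diff = 310 - 295                          # assumed population difference, litre/min
    se = sqrt(sd**2 / n1 + sd**2 / n2)        # standard error of the difference
    power = 1 - phi(1.96 - diff / se)
    print(round(se, 2), round(power, 2))      # about 10.67 and 0.29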
0.05 that one of the k tests will have a P value less than α if the null
hypotheses are true. Thus, if in a clinical trial we compare two treatments
within 5 subsets of patients, the treatments will be significantly different at
the 0.05 level if there is a P value less than 0.01 within any of the
subsets. This is the Bonferroni method. Note that they are not significant
at the 0.01 level, but at only the 0.05 level. The k tests together test the
composite null hypothesis that there is no treatment effect on any
variable.
We can do the same thing by multiplying the observed P value from the
significance tests by the number of tests, k, any kP which exceeds one
being ignored. Then if any kP is less than 0.05, the two treatments are
significantly different at the 0.05 level.
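A sketch in Python of both forms of the Bonferroni adjustment just described, using made-up P values for five subset comparisons:

    p_values = [0.04, 0.30, 0.55, 0.008, 0.70]   # made-up P values from k = 5 tests
    k = len(p_values)

    # Either compare each P value with 0.05/k ...
    print([p for p in p_values if p < 0.05 / k])    # only 0.008 is below 0.05/5 = 0.01

    # ... or multiply each P value by k, ignoring any kP above one
    adjusted = [min(k * p, 1.0) for p in p_values]
    print(adjusted)         # [0.2, 1.0, 1.0, 0.04, 1.0]; only 0.04 is below 0.05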
For example, Williams et al. (1992) randomly allocated elderly patients
discharged from hospital to two groups. The intervention group received
timetabled visits by health visitor assistants; the control patients were
not visited unless there was perceived need. Soon after discharge
and after one year, patients were assessed for physical health, disability,
and mental state using questionnaire scales. There were no significant
differences overall between the intervention and control groups, but
among women aged 75–79 living alone the control group showed
significantly greater deterioration in physical score than did the
intervention group (P = 0.04), and among men over 80 years the control
group showed significantly greater deterioration in disability score than
did the intervention group (P = 0.03). The authors stated that ‘Two small
sub-groups of patients were possibly shown to have benefited from the
intervention…. These benefits, however, have to be treated with caution,
and may be due to chance factors.’ Subjects were cross-classified by age
groups, whether living alone, and sex, so there were at least eight
subgroups, if not more. Thus even if we consider the three scales
separately, only a P value less than 0.05/8 = 0.006 would provide
evidence of a treatment effect. Alternatively, the true P values are 8 ×
0.04 = 0.32 and 8 × 0.03 = 0.24.
A similar problem arises if we have multiple outcome measurements. For
example, Newnham et al. (1993) randomized pregnant women to receive
a series of Doppler ultrasound blood flow measurements or to control.
They found a significantly higher proportion of birthweights below the
10th and 3rd centiles (P = 0.006 and P = 0.02). These were only two of
many comparisons, however, and one would suspect that there may be
some spurious significant differences among so many. At least 35 were
reported in the paper, though only these two were reported in the
abstract. (Birthweight was not the intended outcome variable for the trial.)
These tests are not independent, because they are all on the same
subjects, using variables which may not be independent. The proportions
of birthweights below the 10th and 3rd centiles are clearly not
independent, for example. The probability that two correlated variables
both give non-significant differences when the null hypothesis is true is
greater than (1 - α)², because if the first test is not significant, the second
now has a probability greater than 1 - α of being not significant also.
(Similarly, the probability that both are significant exceeds α², and the
probability that only one is significant is reduced.)
45. When comparing the means of two large samples using the
Normal test:
(a) the null hypothesis is that the sample means are equal;
(b) the null hypothesis is that the means are not significantly
different;
(c) standard error of the difference is the sum of the standard
errors of the means;
(d) the standard errors of the means must be equal;
(e) the test statistic is the ratio of the difference to its standard
error.
View Answer
Several papers soon appeared in which this study was repeated, with
variations. None was identical in design to James' study and none
appeared to support his findings. Mayberry et al. (1978) interviewed 100
patients with Crohn's disease, mean duration nine years. They obtained
100 controls, matched for age and sex, from patients and their relatives
attending a fracture clinic. Cases and controls were interviewed about
their current breakfast habits (Table 9.3). The only significant difference
was an excess of fruit juice drinking in controls. Cornflakes were eaten by
29 cases compared to 22 controls, which was not significant. In this study
there was no particular tendency for cases to report more foods than
controls. The authors also asked cases whether they knew of an
association between food (unspecified) and Crohn's disease. The
association with cornflakes was reported by 29, and 12 of these had
stopped eating them, having previously eaten them regularly. In their 29
matched controls, 3 were past cornflakes eaters. Of the 71 Crohn's
patients who were unaware of the association, 21 had discontinued
eating cornflakes compared to 10 of their 71 controls. The authors
remarked ‘seemingly patients with Crohn's disease had significantly
reduced their consumption of cornflakes compared with controls,
irrespective of whether they were aware of the possible association’.
1. Are the cases and controls comparable in either of these studies?
View Answer
Foods at breakfast                        Crohn's patients (n = 100)    Controls (n = 100)
Bread                                     91                            86
Toast                                     59                            64
Egg                                       31                            37
Porridge                                  20                            18
Weetabix, shreddies or shredded wheat     21                            19
Cornflakes                                29                            22
Special K                                 4                             7
Rice krispies                             6                             6
Sugar puffs                               3                             1
Bran or all bran                          13                            12
Muesli                                    3                             10
Any cereal                                55                            55
4. In the study of Mayberry et al. how many Crohn's cases and how
many controls had ever been regular eaters of cornflakes? How
does this compare with James' findings?
View Answer
6. For the data of Table 9.2, calculate the percentage of cases and
controls who said that they ate the various cereals. Now divide the
proportion of cases who said that they had eaten the cereal by the
proportion of controls who reported eating it. This tells us, roughly,
how much more likely cases were to report the cereal than were
controls. Do you think eating cornflakes is particularly important?
View Answer
10
Comparing the means of small samples
between the worse foot (in terms of ulceration, not capillaries) and the
better foot for the ulcerated patients. The first step is to find the
differences (worse – better). We then find the mean difference and its
standard error, as described in §8.2. These are in the last column of
Table 10.2.
Capillary density (capillaries/mm²)
Controls                             Ulcerated patients
Foot 1   Foot 2   Average            Worse    Better   Average
19       16       17.5               9        ?        9.0
25       30       27.5               11       ?        11.0
25       29       27.0               15       10       12.5
26       33       29.5               16       21       18.5
26       28       27.0               18       18       18.0
30       28       29.0               18       18       18.0
33       36       34.5               19       26       22.5
33       29       31.0               20       ?        20.0
34       37       35.5               20       20       20.0
34       33       33.5               20       33       26.5
34       37       35.5               20       26       23.0
34       ?        34.0               21       15       18.0
35       38       36.5               22       23       22.5
36       40       38.0               22       ?        22.0
39       41       40.0               23       23       23.0
40       39       39.5               25       30       27.5
41       39       40.0               26       31       28.5
41       39       40.0               27       26       26.5
56       48       52.0               27       ?        27.0
                                     35       23       29.0
                                     47       42       44.5
                                     ?        24       24.0
                                     ?        28       28.0
Number   19                          Number   23
Where one foot is missing, the average is the single observation. ? = missing data.
To find the 95% confidence interval for the mean difference we must
suppose that the differences follow a Normal distribution. To calculate the
interval, we first require the relevant point of the t distribution from Table
10.1. There are 16 non-missing differences and hence n - 1 = 15 degrees
of freedom associated with s2. We want a probability of 0.95 of being
closer to zero than t, so we go to Table 10.1 with probability = 1 - 0.95 =
0.05. Using the 15 d.f. row, we get t = 2.13. Hence the difference
between a sample mean and the population mean is less than 2.13
standard errors for 95% of samples, and the 95% confidence interval is
-0.81 - 2.13 × 1.51 to -0.81 + 2.13 × 1.51 = -4.03 to +2.41
capillaries/mm2.
On the basis of these data, the capillary density could be less in the
worse affected foot by as much as 4.03 capillaries/mm2, or greater by as
much as 2.41 capillaries/mm2. In the large sample case, we would use
the Normal distribution instead of the t distribution, putting 1.96 instead of
2.13. We would not then need the differences themselves to follow a
Normal distribution.
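These figures can be checked quickly by computer; a minimal sketch in Python, taking the quoted summary values as given (the t value 2.13 is read from Table 10.1 rather than computed):

    mean_diff = -0.81        # mean of the 16 worse-minus-better differences
    se_diff = 1.51           # its standard error
    t_15 = 2.13              # 5% two-sided point of t with 15 degrees of freedom

    lower = mean_diff - t_15 * se_diff
    upper = mean_diff + t_15 * se_diff
    print(round(lower, 2), round(upper, 2))   # about -4.03 and 2.41 capillaries/mm2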
Fig. 10.3. Normal plot for differences and plot of difference against
average for the data of Table 10.2, ulcerated patients
We can also use the t distribution to test the null hypothesis that in the
population the mean difference is zero. If the null hypothesis were true,
and the differences follow a Normal distribution, the test statistic
mean/standard error would be from a t distribution with n - 1 degrees of
freedom. This is because the null hypothesis is that the mean difference
µ = 0, hence the numerator x̄ - µ = x̄.
We have the usual ‘estimate over standard error’ formula. For the
example, we have
If we go to the 15 degrees of freedom row of Table 10.1, we find that the
probability of such an extreme value arising is greater than 0.10, the 0.10
point of the distribution being 1.75. Using a computer we would find P =
0.6. The data are consistent with the null hypothesis and we have failed
to demonstrate the existence of a difference. Note that the confidence
interval is more informative than the significance test.
We could also use the sign test to test the null hypothesis of no
difference. This gives us 5 positives out of 12 differences (4 differences,
being zero, give no useful information) which gives a two sided
probability of 0.8, a little larger than that given by the t test. Provided the
assumption of a Normal distribution is true, the t test is preferred because
it is the most powerful test, and so most likely to detect differences
should they exist.
The validity of the paired t method described above depends on the
assumption that the differences are from a Normal distribution. We can
check the assumption of a Normal distribution by a Normal plot (§7.5).
Figure 10.3 shows a Normal plot for the differences. The points lie close
to the expected line, suggesting that there is little deviation from the
Normal.
Another plot which is a useful check here is the difference against the
subject mean (Figure 10.3). If the difference depends on magnitude, then
we should be careful of drawing any conclusion about the mean
difference. We may want to investigate this further, perhaps by
transforming the data (§10.4). In this case the difference between the two
feet does not appear to be related to the level of capillary density and we
need not be concerned about this.
The differences may look like a fairly good fit to the Normal even when
the measurements themselves do not. There are two reasons for this: the
subtraction removes variability between subjects, leaving the
measurement error which is more likely to be Normal, and the two
measurement errors are then added by the differencing, producing the
tendency of sums to the Normal seen in the Central Limit theorem (§7.3).
The assumption of a Normal distribution for the one sample case is quite
likely to be met. I discuss this further in §10.5.
Fig. 10.4. Scatter plot against group and Normal plot for the patient
averages of Table 10.2
For a practical example, Table 10.2 shows the average capillary density
over both feet (if present) for normal control subjects as well as ulcer
patients. We shall estimate the difference between the ulcerated patients
and controls. We can check the assumptions of Normal distribution and
uniform variance. From Table 10.2 the variances appear remarkably
similar, 53.12 and 53.47. Figure 10.4 shows that there appears to be a
shift of mean only. The Normal plot combines the two groups by taking the
differences between each observation and its group mean, called the
residuals. This has a slight kink at the end but no pronounced curve,
suggesting that there is little deviation from the Normal. I therefore feel
quite happy that the assumptions of the two-sample t method are met.
First we find the common variance estimate, s2. The sums of squares
about the two sample means are 956.13 and 1176.32. This gives the
combined sum of squares about the sample means to be 956.13 +
1176.32 = 2132.45. The combined degrees of freedom are n1 + n2 - 2 =
19 + 23 - 2 = 40. Hence s2 = 2132.45/40 = 53.31. The standard error of
the difference between means is
The value of the t distribution for the 95% confidence interval is found
from the 0.05 column and 40 degrees of freedom row of Table 10.1,
giving t0.05 = 2.02. The difference between means (control – ulcerated) is
34.08 - 22.59 = 11.49. Hence the 95% confidence interval is 11.49 - 2.02
× 2.26 to 11.49 + 2.02 × 2.26, giving 6.92 to 16.06 capillaries/mm2.
Hence there is clearly a difference in capillary density between normal
controls and ulcerated patients.
To test the null hypothesis that in the population the control - ulcerated
difference is zero, the test statistic is difference over standard error,
11.49/2.26 = 5.08. If the null hypothesis were true, this would be an
observation from the t distribution with 40 degrees of freedom. From
Table 10.1, the probability of such an extreme value is less than 0.001.
Hence the data are not consistent with the null hypothesis and we can
conclude that there is strong evidence of a difference in the populations
which these patients represent.
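The two-sample t calculation above can be written out step by step; a sketch in Python using the sums of squares quoted in the text (the t value 2.02 again comes from Table 10.1):

    from math import sqrt

    n1, mean1, ss1 = 19, 34.08, 956.13     # controls: n, mean, sum of squares about mean
    n2, mean2, ss2 = 23, 22.59, 1176.32    # ulcerated patients

    df = n1 + n2 - 2
    s2 = (ss1 + ss2) / df                  # common variance estimate
    se = sqrt(s2 * (1 / n1 + 1 / n2))      # standard error of the difference
    diff = mean1 - mean2
    t = diff / se
    print(round(s2, 2), round(se, 2), round(t, 2))                # about 53.31, 2.26, 5.08
    print(round(diff - 2.02 * se, 2), round(diff + 2.02 * se, 2)) # about 6.92 to 16.06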
unless b = 1, when we use the log. (I shall resist the temptation to prove
this, though I can. Any book on mathematical statistics will do it.) Thus, if
the standard deviation is proportional to the square root of the mean (i.e.
variance proportional to mean), e.g. Poisson variance (§6.7), b = 0.5, 1 -
b = 0.5, and we use a square root transformation. If the standard
deviation is proportional to the mean we log. If the standard deviation is
proportional to the square of the mean we have b = 2, 1 - b = -1, and we
use the reciprocal. Another, rarely seen transformation is used when
observations are Binomial proportions. Here the standard deviation
increases as the proportion goes from 0.0 to 0.5, then decreases as the
proportion goes from 0.5 to 1.0. This is the arcsine square root
transformation. Whether it works depends on how much other variation
there is. It has now been largely superseded by logistic regression
(§17.8).
When we have several groups we can plot log(s) against log(x̄), then
draw a line through the points. The slope of the line is b
(see Healy 1968). Trial and error, however, combined with scatter plots,
histograms, and Normal plots, usually suffice.
Table 10.3 shows some data from a study of anthropometry and
diagnosis in patients with intestinal disease (Maugdal et al. 1985). We
were interested in differences in anthropometrical measurements
between patients with different diagnoses, and here we have the biceps
skinfold measurements for 20 patients with Crohn's disease and 9
patients with coeliac disease. The data have been put into order of
magnitude and it is fairly obvious that the distribution is skewed to the
right. Figure 10.5 shows this clearly. I have subtracted the group mean
from each observation, giving what is called the within-group residuals,
and then found both the frequency distribution and Normal plot. The
distribution is clearly skew, and this is reflected in the Normal plot, which
shows a pronounced curvature.
Fig. 10.6. Scatter plot, histogram, and Normal plot for the biceps
skinfold data, after square root, log, and reciprocal transformations
millimetres. How could they be, for they do not contain zero yet the
difference is not significant? They are in fact the 95% confidence limits
for the ratio of the Crohn's disease geometric mean to the coeliac
disease geometric mean (§7.4). If there were no difference, of course,
the expected value of this ratio would be one, not zero, and so lies within
the limits. The reason is that when we take the difference between the
logarithms of two numbers, we get the logarithm of their ratio, not of their
difference (§5A).
Transformation     Two-sample t test, 27 d.f.     95% confidence interval for          Variance ratio
                   t         P                    difference on transformed scale      (larger/smaller)
None, raw data     1.28      0.21                 -0.71 to 3.07 mm                     1.52
Because the log transformation is the only one which gives useful
confidence intervals, I would use it unless it were clearly inadequate for
the data, and another transformation clearly superior. When this happens
we are reduced to a significance test only, with no meaningful estimate.
Malabsorption patients
Fig. 10.7. Calculation of the area under the curve for one subject
Fig. 10.8. Normal plots for area under the curve and log area for the
data of Table 10.5
where m and n are the degrees of freedom (§7A). For Normal data the
distribution of a sample variance s² from n observations is that of
σ²χ²ₙ₋₁/(n - 1), and when we divide one estimate of variance by another to give the F
ratio, the σ² cancels out. Like other distributions derived from the Normal,
the F distribution cannot be integrated and so we must use a table.
Because it has two degrees of freedom, the table is cumbersome,
covering several pages, and I shall omit it. Most F methods are done
using computer programs which calculate the probability directly. The
table is usually only given as the upper percentage points.
To test the null hypothesis, we divide the larger variance by the smaller.
For the skinfold data of §10.4, the variances are 5.860 with 19 degrees of
freedom for the Crohn's patients and 3.860 with 8 degrees of freedom for
the coeliacs, giving F = 5.860/3.860 = 1.52. The probability of this being
exceeded by the F distribution with 19 and 8 degrees of freedom is 0.3,
the 5% point of the distribution being 3.16, so there is no evidence from
these data that the variance of skinfold differs between patients with
Crohn's disease and coeliac disease.
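A sketch of the variance ratio test in Python; the tail probability is obtained here by crude numerical integration of the F density rather than from a table or a statistics package, so treat it as approximate:

    from math import exp, log, lgamma

    def f_density(x, d1, d2):
        """Probability density of the F distribution with d1 and d2 degrees of freedom."""
        lbeta = lgamma(d1 / 2) + lgamma(d2 / 2) - lgamma((d1 + d2) / 2)
        return exp((d1 / 2) * log(d1 * x) + (d2 / 2) * log(d2)
                   - ((d1 + d2) / 2) * log(d1 * x + d2) - log(x) - lbeta)

    def f_tail(f, d1, d2, upper=500.0, steps=100000):
        """PROB(F > f), by midpoint-rule integration of the density from f to upper."""
        h = (upper - f) / steps
        return sum(f_density(f + (i + 0.5) * h, d1, d2) * h for i in range(steps))

    f = 5.860 / 3.860                      # larger variance over smaller
    print(round(f, 2), round(f_tail(f, 19, 8), 2))   # about 1.52 and 0.3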
HIV % HIV
Diarrhoea %lactulose
status Mannitol status
6 4 7 3
7 5 9 5
8 6 10 6
8 6 11 6
9 6 11 6
11 8 13 8
If the null hypothesis were true, the expected value of this ratio would be
1.0.
Source of variation    Degrees of freedom    Sum of squares    Mean square    Variance ratio (F)
Total                  23                    139.958

Source of variation    Degrees of freedom    Sum of squares    Mean square    Variance ratio (F)
Total                  58                    1559.036

Source of variation    Degrees of freedom    Sum of squares    Mean square    Variance ratio (F)
Total                  41                    3506.57
from which. There are a number of ways of doing this, called multiple
comparisons procedures. These are mostly designed to give only one
type I error (§9.3) per 20 analyses when the null hypothesis is true, as
opposed to doing t tests for each pair of groups, which gives one error
per 20 comparisons when the null hypothesis is true. I shall not go into
details, but look at a couple of examples. There are several tests which
can be used when the numbers in each group are the same, Tukey's
Honestly Significant Difference, the Newman-Keuls sequential procedure
(both called Studentized range tests), Duncan's multiple range test, etc.
The one you use will depend on which computer program you have. The
results of the Newman-Keuls sequential procedure for the data of Table
10.8 are shown in Table 10.13. Group 1 is significantly different from
groups 2 and 4, and group 3 from groups 2 and 4. At the 1% level, the
only significant differences are between group 3 and groups 2 and 4.
Fig. 10.10. Plots of the lactulose data on the natural scale and after
square root and log transformation
Source of variation    Degrees of freedom    Sum of squares    Mean square    Variance ratio (F)
HIV status             3                     0.42870           0.14290        2.78
Total                  58                    3.25441
more limited, Gabriel's test can be used with unequal-sized groups. For
the root transformed lactulose data, the results of Gabriel's test are
shown in Table 10.14. This shows that the AIDS subjects are significantly
different from the asymptomatic HIV+ patients and from the HIV- controls.
For the mannitol data, most multiple comparison procedures will give no
significant differences because they are designed to give only one type I
error per analysis of variance. When the F test is not significant, no group
comparisons will be either.
        1   2   3              1   2   3
2       S                      N
3       N   S                  N   S
4       S   N   S              N   N   S

        AIDS  ARC  HIV+            AIDS  ARC  HIV+
ARC     N                          N
HIV+    S     N                    N     N
HIV-    S     N    N               N     N
for is to estimate some variances. There are two different variances in the
data. One is between measurements on the same person, the within-
subject variance which we shall denote by σ²w. In this example the
within subject variance is the measurement error, and we shall assume it
is the same for everyone. The other is the variance between the subjects'
true or average pulse rates, about which the individual measurements for
a subject are distributed. This is the average of all possible
measurements for that subject, not the average of the two measurements
we actually have. This variance is the between-subjects variance and
we shall denote it by σ²b. A single measurement observed from a single
individual is the sum of the subject's true pulse rate and the
measurement error. Such measurements therefore have variance σ²b +
σ²w. We can estimate both these variances from the anova table.
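In practice the two variances are recovered from the one-way analysis of variance mean squares: the within-subjects mean square estimates σ²w directly, and (between-subjects mean square minus within-subjects mean square) divided by the number of measurements per subject estimates σ²b. A sketch in Python using the mean squares quoted a little further on (54.74 and 14.37, with two measurements per subject):

    msb = 54.74      # between-subjects mean square from the anova table
    msw = 14.37      # within-subjects (residual) mean square
    m = 2            # measurements per subject

    s2w = msw
    s2b = (msb - msw) / m
    print(s2w, round(s2b, 2))    # 14.37 and 20.19, as in the text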
Subject   1st   2nd       Subject   1st   2nd
1         46    42        16        34    36
2         50    42        17        30    36
3         39    37        18        35    45
4         40    54        19        32    34
5         41    46        20        44    46
6         35    35        21        39    42
7         31    44        22        34    37
8         43    35        23        36    38
9         47    45        24        33    34
10        48    36        25        34    35
11        32    46        26        51    48
12        36    34        27        31    30
13        37    30        28        30    31
14        34    36        29        42    43
15        38    36        30        39    35
Source of variation    Degrees of freedom    Sum of squares    Mean square    Variance ratio (F)
Total                  89                    3 054.99
For the example, s²w = 14.37 and s²b = (54.74 - 14.37)/2 = 20.19. Thus
the
Intervention group                                      Control group
Number of requests              Percentage              Number of requests
Total        Conforming         conforming              Total
20           20                 100                     7
7            7                  100                     37
16           15                 94                      38
31           28                 90                      28
20           18                 90                      20
24           21                 88                      19
7            6                  86                      9
6            5                  83                      25
30           25                 83                      120
66           53                 80                      89
5            4                  80                      22
43           33                 77                      76
43           32                 74                      21
23           16                 70                      127
64           44                 69                      22
6            4                  67                      34
18           10                 56                      10
Mean                            81.6
SD                              11.9
Appendices
10A Appendix: The ratio mean/standard
error
As if by magic, we have our sample mean over its standard error. I shall
not bother to go into this detail for the other similar ratios which we shall
encounter. Any quantity which follows a Normal distribution with mean
zero (such as [x with bar above] - µ), divided by its standard error, will
follow a t distribution provided the standard error is based on one sum of
squares and hence is related to the Chi-squared distribution.
51. Which of the following conditions must be met for a valid t test
between the means of two samples:
(a) the numbers of observations must be the same in the two
groups;
(b) the standard deviations must be approximately the same in the
two groups;
(c) the means must be approximately equal in the two groups;
(d) the observations must be from approximately Normal
distributions;
(e) the samples must be small.
View Answer
6. Calculate the 95% confidence interval for the log difference and
transform back to the original scale. What does this mean and how
does it compare to that based on the untransformed data?
View Answer
11
Regression and correlation
11.2 Regression
Regression is a method of estimating the numerical relationship
between
Returning to Figure 11.3, the equation of the line which minimizes the
sum of squared deviations from the line in the outcome variable is found
quite easily (§11A). The solution is
b = Σ(xᵢ - x̄)(yᵢ - ȳ) / Σ(xᵢ - x̄)²   and   a = ȳ - b x̄,
where b is the slope and a the intercept.
We do not need the sum of squares for Y yet, but we shall later.
first and the first measurement on the second. The regression equations
are 2nd pulse = 17.3 + 0.572 × 1st pulse and 1st pulse = 14.9 + 0.598 ×
2nd pulse. Each regression coefficient is less than one. This means that
for subjects with any given first pulse measurement, the predicted second
pulse measurement will be closer to the mean than the first
measurement, and for any given second pulse measurement, the
predicted first measurement will be closer to the mean than the second
measurement. This is regression towards the mean (§11.2). Regression
towards the mean is a purely statistical phenomenon, produced by the
selection of the given value of the predictor and the imperfect relationship
between the variables. Regression towards the mean may manifest itself
in many ways. For example, suppose we measure the blood pressure of
an unselected group of people and then select subjects with high blood
pressure, e.g. diastolic >95 mm Hg. If we then measure the selected
group again, the mean diastolic pressure for the selected group will be
less on the second occasion than on the first, without any intervention or
treatment. The apparent fall is caused by the initial selection.
If we are to estimate the variation about the line, we must assume that it
is the same all the way along the line, i.e. that the variance is uniform.
This is the same as for the two-sample t method (§10.3) and analysis of
variance (§10.9). For the
FEV1 data the sum of squares due to the regression is 0.0743892² × 576.352 = 3.18937 and the sum of squares about the regression is
9.43868 - 3.18937 = 6.24931. There are 20 - 2 = 18 degrees of freedom,
so the variance about the regression is s² = 6.2493/18 = 0.34718. The
standard error of b is given by
From Table 10.1 this has two-tailed probability of less than 0.01. The
computer tells us that the probability is about 0.007. Hence the data are
inconsistent with the null hypothesis and the data provide fairly good
evidence that a relationship exists. If the sample were much larger, we
could dispense with the t distribution and use the Standard Normal
distribution in its place.
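As a rough sketch (not from the original text), the slope, its standard error and the t test can be computed in Python as follows; the x and y values here are invented, not the FEV1 data:

import math

x = [160, 165, 170, 172, 175, 178, 180, 183, 185, 190]   # illustrative heights (cm)
y = [2.6, 2.9, 3.1, 3.3, 3.4, 3.7, 3.6, 4.0, 4.1, 4.4]   # illustrative FEV1-like values

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)                       # sum of squares for X
syy = sum((yi - ybar) ** 2 for yi in y)                       # sum of squares for Y
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))  # sum of products

b = sxy / sxx                       # least squares slope
a = ybar - b * xbar                 # intercept
s2 = (syy - b * b * sxx) / (n - 2)  # variance about the regression, n - 2 d.f.
se_b = math.sqrt(s2 / sxx)          # standard error of the slope
t = b / se_b                        # refer to the t distribution with n - 2 d.f.
print(b, a, se_b, t)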
and allow for differences in mean height between the groups. We may
wish to do this to compare patients with respiratory disease on different
therapies, or to compare subjects exposed to different environmental
factors, such as air pollution, cigarette smoking, etc.
Fig. 11.6. Confidence intervals for the regression estimate
We need not go into the algebraic details of this. It is very similar to that
in §11C. For x = 177 we have
This gives a 95% confidence interval of 3.98 - 2.10 × 0.138 to 3.98 + 2.10
× 0.138 giving from 3.69 to 4.27 litres. Here 3.98 is the estimate and 2.10
is the 5% point of the t distribution with n - 2 = 18 degrees of freedom.
The standard error is a minimum at x = x̄, and increases as we move away from x̄ in either direction. It can be
useful to plot the standard error and 95% confidence interval about the
line on the scatter diagram. Figure 11.6 shows this for the FEV1 data.
Notice that the lines diverge considerably as we reach the extremes of
the data. It is very dangerous to extrapolate beyond the data. Not only do
the standard errors become very wide, but we often have no reason to
suppose that the straight line relationship would persist.
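A minimal Python sketch of the confidence interval for the regression estimate at a chosen x follows. It uses the usual standard error s√(1/n + (x0 - x̄)²/Sxx), which is consistent with the narrowing of the interval at x̄ described above; the function and its names are illustrative, not taken from the text:

import math

def regression_estimate_ci(x, y, x0, t_crit):
    """Fitted value a + b*x0 with a confidence interval based on
    SE = s * sqrt(1/n + (x0 - xbar)^2 / Sxx)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    b = sxy / sxx
    a = ybar - b * xbar
    s2 = (syy - b * b * sxx) / (n - 2)     # residual variance
    estimate = a + b * x0
    se = math.sqrt(s2 * (1 / n + (x0 - xbar) ** 2 / sxx))
    return estimate, estimate - t_crit * se, estimate + t_crit * se

# e.g. with 20 points, use the 5% point of t on 18 d.f., about 2.10:
# regression_estimate_ci(heights, fev1, 177, 2.10)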
The intercept a, the predicted value of Y when X = 0, is a special case of
this. Clearly, we cannot actually have a medical student of height zero
and with FEV1 of -9.19 litres. Figure 11.6 also shows the confidence
interval for the regression estimate with a much smaller scale, to show
the intercept. The confidence interval is very wide at height = 0, and this
does not take account of
For a student with a height of 177 cm, the predicted FEV1 is 3.98 litres,
with standard error 0.61 litres. Figure 11.7 shows the precision of the
prediction of a further observation. As we might expect, the 95%
confidence intervals include all but one of the 20 observations. This is
only going to be a useful prediction when the residual variance s² is
small.
We can also use the regression equation of Y on X to predict X from Y.
This is much less accurate than predicting Y from X. The standard errors
are
Figure 11.9 shows something else, however. One point stands out as
having a rather larger residual than the others. This may be an outlier, a
point which may well come from a different population. It is often difficult
to know what to do with such data. At least we have been warned to
double check this point for transcription errors. It is all too easy to
transpose adjoining digits when transferring data from one medium to
another. This may have been the case here, as an FEV1 of 4.53, rather
than the 5.43 recorded, would have been more in line with the rest of the
data. If this happened at the point of recording, there is not much we can
do about it. We could try to measure the subject again, or exclude him
and see whether this makes any difference. I think that, on the whole, we
should work with all the data unless there are very good reasons for not
doing so. I have retained this case here.
11.9 Correlation
The regression method tells us something about the nature of the
relationship between two variables, how one changes with the other, but
it does not tell us how close that relationship is. To do this we need a
different coefficient, the correlation coefficient. The correlation coefficient
is based on the sum of products about the mean of the two variables, so I
shall start by considering the properties of the sum of products and why it
is a good indicator of the closeness of the relationship.
Figure 11.11 shows the scatter diagram of Figure 11.1 with two new axes
drawn through the mean point. The distances of the points from these
axes represent the deviations from the mean. In the top right section of
Figure 11.11, the deviations from the mean of both variables, FEV1 and
height, are positive. Hence, their products will be positive. In the bottom
left section, the deviations from the mean of the two variables will both be
negative. Again, their product will be positive. In the top left section of
Figure 11.11, the deviations of FEV1 from its mean will be positive, and
the deviation of height from its mean will be negative. The product of
these will be negative. In the bottom right section, the product will again
be negative. So in Figure 11.11 nearly all these products will be positive,
and their sum will be positive. We say that there is a positive correlation
between the two variables; as one increases so does the other. If one
variable decreased as the other increased, we would have a scatter
diagram where most of the points lay in the top left and bottom right
sections. In this
case the sum of the products would be negative and there would be a
negative correlation between the variables. When the two variables are
not related, we have a scatter diagram with roughly the same number of
points in each of the sections. In this case, there are as many positive as
negative products, and the sum is zero. There is zero correlation or no
correlation. The variables are said to be uncorrelated.
Fig. 11.11. Scatter diagram with axes through the mean point
The value of the sum of products depends on the units in which the two
variables are measured. We can find a dimensionless coefficient if we
divide the sum of products by the square roots of the sums of squares of
X and Y. This gives us the product moment correlation coefficient, or
the correlation coefficient for short, usually denoted by r.
If the n pairs of observations are denoted by (xi,yi), then r is given by
r = Σ(xi - x̄)(yi - ȳ) / √{Σ(xi - x̄)² Σ(yi - ȳ)²}
For the FEV1 and height we have
The effect of dividing the sum of products by the root sum of squares of
deviations of each variable is to make the correlation coefficient lie
between -1.0 and +1.0. When all the points lie exactly on a straight line
such that Y increases as X increases, r = 1. This can be shown by putting
a + bxi in place of yi in the equation for r; everything cancels out leaving r
= 1. When all the points lie exactly on a straight line with negative slope, r
= -1. When there is no relationship at all, r = 0, because the sum of
products is zero. The correlation coefficient describes the closeness of
the linear relationship between two variables. It does not matter which
variable we take to be Y and which to be X. There is no choice of
predictor and outcome variable, as there is in regression.
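As a short illustration (code and names are mine, not the book's), the coefficient can be computed directly from its definition in Python:

import math

def correlation(x, y):
    """Product moment correlation r: the sum of products about the means
    divided by the square root of the product of the two sums of squares."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)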
Fig. 11.12. Data where the correlation coefficient may be misleading
[Table: 5% and 1% points of the correlation coefficient r for sample size n; n = number of observations.]
Even when X and Y are both Normally distributed, r does not itself
approach a Normal distribution until the sample size is in the thousands.
Furthermore, its distribution is rather sensitive to deviations from the
Normal in X and Y. However, if both variables are from Normal
distributions, Fisher's z transformation gives a Normally distributed
variable whose mean and variance are known in terms of the population
correlation coefficient which we wish to estimate. From this a confidence
interval can be found. Fisher's z transformation is z = ½ loge{(1 + r)/(1 - r)}.
x y x y x y
47 51 49 52 51 46
46 53 50 56 46 48
50 57 42 46 46 47
52 54 48 52 45 55
46 55 60 53 52 49
36 53 47 49 54 61
47 54 51 52 48 53
46 57 57 50 47 48
36 61 49 50 47 50
44 57 49 49 54 44
We only have four subjects and only four points. By using the repeated
data, we are not increasing the number of subjects, but the statistical
calculation is done as if we have, and so the number of degrees of
freedom for the significance test is incorrectly increased and a spurious
significant correlation produced.
There are two simple ways to approach this type of data, and which is
chosen depends on the question being asked. If we want to know
whether subjects with a high value of X tend to have a high value of Y
also, we use the subject means and find the correlation between them. If
we have different numbers of observations for each subject, we can use
a weighted analysis, weighted by the number of observations for the
subject. If we want to know whether changes in one variable in the same
subject are parallelled by changes in the other, we need to use multiple
regression, taking subjects out as a factor (§17.1, §17.6). In either
replicates the order in which they were made is not important. Another is
in the design of cluster-randomized trials where the group is the cluster
and may have hundreds of observations within it (§18.8).
Appendices
11A Appendix: The least squares estimates
This gives us
VAR(yi) is the same for all yi, say VAR(yi) = s2. Hence
3. Find the standard error for the difference between the slopes,
which are independent. Calculate a 95% confidence interval for the
difference.
View Answer
4. Use the standard error to test the null hypothesis that the slopes
are the same in the population from which these data come.
View Answer
Fig. 11.16. PEFR and height for female and male medical students
[Table: height (cm) and PEFR (litres/min) for female and male medical students (fragment); 43 females, 58 males.]
12
Methods based on rank order
Group A: 7 4 9 17
Group B: 11 6 21 14
We want to know whether there is any evidence that A and B are drawn
from populations with different levels of the variable. The null hypothesis
is that there is no tendency for members of one population to exceed
members of the other. The alternative is that there is such a tendency, in
one direction or the other. First we arrange the observations in ascending
order, i.e. we rank them:
4 6 7 9 11 14 17 21
A B A A B B A B
We now choose one group, say A. For each A, we count how many Bs
precede it. For the first A, 4, no Bs precede. For the second, 7, one B
precedes, for the third A, 9, one B, for the fourth, 17, three Bs. We add
these numbers of preceding Bs together to give U = 0 + 1 + 1 + 3 = 5.
Now, if U is very small, nearly all the As are less than nearly all the Bs. If
U is large, nearly all As are greater than nearly all Bs. Moderate values of
U mean that As and Bs are mixed. The minimum U is 0, when all Bs
exceed all As, and maximum U is n1 × n2 when all As exceed all Bs. The
magnitude of U has a meaning, because U/n1n2 is an estimate of the
probability that an observation drawn at random from population A would
exceed an observation drawn at random from population B.
There is another possible U, which we will call U′, obtained by counting
the number of As before each B, rather than the number of Bs before
each A. This would be 1 + 3 + 3 + 4 = 11. The two possible values of U
and U′ are related by U + U′ = n1n2. So we subtract U′ from n1n2 to give 4
× 4 - 11 = 5.
If we know the distribution of U under the null hypothesis that the
samples come from the same population, we can say with what
probability these data could have arisen if there were no difference. We
can carry out the test of significance. The distribution of U under the null
hypothesis can be found easily. The two sets of four observations can be
arranged in 70 different ways, from AAAABBBB to BBBBAAAA (8!/4!4! =
70, §6A). Under the null hypothesis these arrangements are all equally
likely and, hence, have probability 1/70. Each has its value of U, from 0 to
16, and by counting the number of arrangements which give each value
of U we can find the probability of that value. For example, U = 0 only
arises from the order AAAABBBB and so has probability 1/70 = 0.014. U
= 1 only arises from AAABABBB and so has probability 1/70 = 0.014
also. U = 2 can arise in two ways: AAABBABB and AABAABBB. It has
probability 2/70 = 0.029. The full set of probabilities is shown in Table
12.1.
We apply this to the example. For groups A and B, U = 5 and the
probability of this is 0.071. As we did for the sign test (§9.2) we consider
the probability of more extreme values of U, U = 5 or less, which is 0.071
+ 0.071 + 0.043 + 0.029 + 0.014 + 0.014 = 0.242.
This gives a one sided test. For a two-sided test, we must consider the
probabilities of a difference as extreme in the opposite direction. We can
see from Table 12.1 that the distribution of U is symmetrical, so the
probability of an equally extreme value in the opposite direction is also
0.242, hence the two-sided probability is 0.242 + 0.242 = 0.484. Thus the
observed difference would have been quite probable if the null
hypothesis were true and the two samples could have come from the
same population.
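As a rough sketch (the functions and names are mine, not the book's), the counting of U and the enumeration of all arrangements can be done in Python for small samples without ties:

from itertools import combinations

def mann_whitney_u(a, b):
    """U = number of (A, B) pairs in which the B observation is smaller,
    i.e. the count of Bs preceding each A in the joint ordering (no ties)."""
    return sum(1 for x in a for y in b if y < x)

def exact_two_sided_p(a, b):
    """Exact two-sided P by listing every assignment of the pooled values to
    groups of these sizes, as in the text; practical only for small samples."""
    pooled = a + b
    n1, n2 = len(a), len(b)
    u_obs = mann_whitney_u(a, b)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n1):
        grp_a = [pooled[i] for i in idx]
        grp_b = [pooled[i] for i in range(len(pooled)) if i not in idx]
        u = mann_whitney_u(grp_a, grp_b)
        total += 1
        if min(u, n1 * n2 - u) <= min(u_obs, n1 * n2 - u_obs):
            extreme += 1          # as or more extreme in either direction
    return extreme / total

print(mann_whitney_u([7, 4, 9, 17], [11, 6, 21, 14]))    # 5, as in the text
print(exact_two_sided_p([7, 4, 9, 17], [11, 6, 21, 14])) # about 0.48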
[Table 12.1 (fragment): the probability of each value of U for two samples of four; for example, U = 5 and U = 11 each have probability 0.071.]
[Table 12.2: critical values of the Mann–Whitney U statistic for small samples; the column headings are not reproduced in this fragment.]
We can now turn to the practical analysis of some real data. Consider the
biceps skinfold thickness data of Table 10.4, reproduced as Table 12.3.
We will analyse these using the Mann-Whitney U test. Denote the
Crohn's disease group by A and the coeliac group by B. The joint order is
as follows:
skinfold 1.8 1.8 2.0 2.0 2.0 2.2 2.4 2.5 2.8 2.8
group A B B B B A A A A A
rank 1.5 1.5 4 4 4 6 7 8 9.5 9.5
r1 r2 r3 r4
skinfold 3.0 3.2 3.6 3.8 3.8 4.0 4.2 4.2 4.4 4.8
group B A A A B A A B A A
rank 11 12 13 14.5 14.5 16 17.5 17.5 19 20
r5 r6 r7
skinfold 5.4 5.6 6.0 6.2 6.6 7.0 7.6 10.0 10.4
group B A A A A A B A A
rank 21 22 23 24 25 26 27 28 29
r8 r9
We denote the ranks of the B group by r1,r2,…,rn1. The number of As
preceding the first B must be r1 - 1, since there are no Bs before it and it
is the r1th observation. The number of As preceding the second B is r2 -
2, since it is the r2th observation, and one preceding observation is a B.
Similarly, the number preceding the third B is r3 - 3, and the number
preceding the ith B is ri - i. Hence we have:
That is, we add together the ranks of all the n1 observations, subtract
n1(n1 + 1)/2 and we have U. For the example, we have
From Table 7.1 this gives two-sided probability = 0.15, similar to that
found by the two sample t test (§10.3).
Neither Table 12.2 nor the above formula for the standard deviation of U
takes ties into account; both assume the data can be fully ranked. Their
use for data with ties is an approximation. For small samples we must
accept this. For the Normal approximation, ties can be allowed for using
the following formula for the standard deviation of U when the null
hypothesis is true:
The Mann-Whitney U test is a non-parametric analogue of the two
sample t test. The advantage over the t test is that the only assumption
about the distribution of the data is that the observations can be ranked,
whereas for the t test we must assume the data are from Normal
distributions with uniform variance. There are disadvantages. For data
which are Normally distributed, the U test is less powerful than the t test,
i.e. the t test, when valid, can detect
[Table: frequency distributions of the counts in the two groups (values 0 upwards); medians 0 and 0, 75th centiles 1 and 3.]
[Table 12.5: paired counts for each patient, their differences, and the ranks of the differences; sum of ranks for the positive differences = 67.]
First, we rank the differences by their absolute values, i.e. ignoring the
sign. As in §12.2, tied observations are given the average of their ranks.
We now sum the ranks of the positive differences, 67, and the ranks of
the negative differences, 11 (Table 12.5). If the null hypothesis were true
and there was no difference, we would expect the rank sums for positive
and negative differences to be about the same, equal to 39 (their
average). The test statistic is the lesser of these sums, T. The smaller T
is, the lower the probability of the data arising by chance.
The distribution of T when the null hypothesis is true can be found by
enumerating all the possibilities, as described for the Mann–Whitney U
statistic. Table 12.6 gives the 5% and 1% points for this distribution, for
sample size n up to 25. For the example, n = 12 and so the difference
would be significant at the 5% level if T were less than or equal to 14. We
have T = 11, so the data are not consistent with the null hypothesis. The
data support the view that there is a real tendency for patients to have
fewer attacks while on the active treatment.
From Table 12.6, we can see that the probability that T ≤ 11 lies between
0.05 and 0.01. This is greater than the probability given by the sign test,
which was 0.006 (§9.2). Usually we would expect greater power, and
hence lower probabilities when the null hypothesis is false, when we use
more of the information. In this case, the greater probability reflects the
fact that the one negative difference, -25, is large. Examination of the
original data shows that this individual had very large numbers of attacks
on both treatments, and it seems possible that he may belong to a
different population from the other eleven.
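As an illustration (not part of the original text), the ranking and the statistic T can be computed in Python; the differences used below are invented so that the rank sums match those quoted above (67 and 11):

def wilcoxon_t(differences):
    """Wilcoxon matched-pairs statistic: drop zero differences, rank the rest
    by absolute value (average ranks for ties), sum the ranks of the positive
    and of the negative differences, and take the smaller sum as T."""
    d = [x for x in differences if x != 0]
    ordered = sorted(d, key=abs)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        rank_of[abs(ordered[i])] = (i + 1 + j) / 2   # average of ranks i+1 .. j
        i = j
    positive = sum(rank_of[abs(x)] for x in d if x > 0)
    negative = sum(rank_of[abs(x)] for x in d if x < 0)
    return min(positive, negative), positive, negative

print(wilcoxon_t([2, 2, 3, 5, 7, 7, 7, 9, 14, 19, 42, -25]))  # (11, 67, 11)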
Like Table 12.2, Table 12.6 is based on the assumption that the
differences can be fully ranked and there are no ties. Ties may occur in
two ways in this
test. Firstly, ties may occur in the ranking sense. In the example we had
two differences of +2 and three of +7. These were ranked equally: 1.5
and 1.5, and 6, 6 and 6. When ties are present between negative and
positive differences, Table 12.6 only approximates to the distribution of T.
n 5% 1% n 5% 1%
5 - - 16 30 19
6 1 - 17 35 23
7 2 - 18 40 28
8 4 0 19 46 32
9 6 2 20 52 37
10 8 3 21 59 43
11 11 5 22 66 49
12 14 7 23 73 55
13 17 10 24 81 61
14 21 13 25 90 68
15 25 16
Ties may also occur between the paired observations, where the
observed difference is zero. In the same way as for the sign test, we omit
zero differences (§9.2). Table 12.6 is used with n as the number of non-
zero differences only, not the total number of differences. This seems odd,
in that a lot of zero differences would appear to support the null
hypothesis. For example, if in Table 12.5 we had another dozen patients
with zero differences, the calculation and conclusion would be the same.
However, the mean difference would be smaller and the Wilcoxon test
tells us nothing about the size of the difference, only its existence. This
illustrates the danger of allowing significance tests to outweigh all other
ways of looking at the data.
From Table 7.1 this gives a two-tailed probability of 0.028, similar to that
obtained from Table 12.6.
We have three possible tests for paired data, the Wilcoxon, sign and
paired t methods. If the differences are Normally distributed, the t test is
the most powerful test. The Wilcoxon test is almost as powerful, however,
and in practice the difference is not great except for small samples. Like
the Mann–Whitney U test, the Wilcoxon is useless for very small
samples. The sign test is similar in power to the Wilcoxon for very small
samples, but as the sample size increases the Wilcoxon test becomes
much more powerful. This might be expected since the Wilcoxon test
uses more of the information. The Wilcoxon test uses the magnitude of
the differences, and hence requires interval data. This means that, as for
t methods, we will get different results if we transform the data. For truly
ordinal data we should use the sign test. The paired t method also gives
a confidence interval for the difference. The Wilcoxon test is purely a test
of significance, but a confidence interval for the median difference can be
found using the Binomial method described in §8.9.
n 5% 1%
4 - -
5 1.00 -
6 0.89 1.00
7 0.82 0.96
8 0.79 0.93
9 0.70 0.83
10 0.68 0.81
We have ignored the problem of ties in the above. We treat observations
with the same value as described in §12.2. We give them the average of
the ranks they would have if they were separable and apply the rank
correlation formula as described above. In this case the distribution of
Table 12.8 is only approximate.
There are several ways of calculating this coefficient, resulting in
formulae which appear quite different, though they give the same result
(see Siegel 1956).
observe whether the subjects are ordered in the same way by the two
variables, a concordant pair, ordered in opposite ways, a discordant
pair, or equal for one of the variables and so not ordered at all, a tied
pair. Kendall's τ is the proportion of concordant pairs minus the
proportion of discordant pairs. τ will be one if the rankings are identical,
as all pairs will be ordered in the same way, and minus one if the
rankings are exactly opposite, as all pairs will be ordered in the opposite
way.
We shall denote the number of concordant pairs (ordered the same way)
by nc, the number of discordant pairs (ordered in opposite ways) by nd,
and the difference, nc - nd, by S. There are n(n - 1)/2 pairs altogether, so τ = S/{n(n - 1)/2}. When there are ties, tied pairs cannot contribute to S. If we denote the number of observations sharing a particular value of X by t and the number sharing a particular value of Y by u, then the number of pairs which can contribute to S is n(n - 1)/2 - Σt(t - 1)/2 for X and n(n - 1)/2 - Σu(u - 1)/2 for Y. We now define τb by
τb = S / √[{n(n - 1)/2 - Σt(t - 1)/2}{n(n - 1)/2 - Σu(u - 1)/2}]
Note that if there are no ties, Σt(t - 1)/2 = 0 = Σu(u - 1)/2. When the rankings are
identical τb = 1, no matter how many ties there are. Kendall (1970) also
discusses two other ways of dealing with ties, obtaining coefficients τa
and τc, but their use is restricted.
We often want to test the null hypothesis that there is no relationship
between the two variables in the population from which our sample was
drawn. As usual, we are concerned with the probability of S being as or
more extreme (i.e. far from zero) than the observed value. Table 12.9
was calculated in the same way as Tables 12.1 and 12.2. It shows the
probability of being as extreme as the observed value of S for n up to 10.
For convenience, S is tabulated rather than τ. When ties are present this
is only an approximation.
When the sample size is greater than 10, S has an approximately Normal
distribution under the null hypothesis, with mean zero. If there are no ties,
the variance is VAR(S) = n(n - 1)(2n + 5)/18.
When there are ties, the variance formula is very complicated (Kendall
1970). I shall omit it, as in practice these calculations will be done using
computers anyway. If there are not many ties it will not make much
difference if the simple form is used.
For the example, S = 40, n = 17 and there are no ties, so the Standard
Normal variate is
From Table 7.1 of the Normal distribution we find that the two-sided
probability of a value as extreme as this is 0.06 × 2 = 0.12, which is very
similar to that found using Spearman's ρ. The product moment
correlation, r, gives r = 0.30, P = 0.24, but of course the non-Normal
distributions of the variables make this P invalid.
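As a rough sketch (code and names are mine, not the book's), S, τ and the large-sample Normal test can be computed in Python for data without ties; the book's worked value may differ slightly if a continuity correction was applied:

import math

def kendall_s_test(x, y):
    """S = concordant minus discordant pairs, tau = S/{n(n-1)/2}, and the
    large-sample Normal test with var(S) = n(n-1)(2n+5)/18 (no allowance
    for ties and no continuity correction in this sketch)."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            if prod > 0:
                s += 1            # concordant pair
            elif prod < 0:
                s -= 1            # discordant pair
    tau = s / (n * (n - 1) / 2)
    z = s / math.sqrt(n * (n - 1) * (2 * n + 5) / 18)
    return tau, s, z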
Why have two different rank correlation coefficients? Spearman's ρ is
older than Kendall's τ, and can be thought of as a simple analogue of the
product moment correlation coefficient, Pearson's r. τ is a part of a more
general and consistent system of ranking methods, and has a direct
interpretation, as the difference between the proportions of concordant
and discordant pairs. In general,
n 5% 1%
4 - -
5 10 -
6 13 15
7 15 19
8 18 22
9 20 26
10 23 29
make the observed value of the statistic closer to its expected value by
half of the interval between adjacent discrete values. This is a continuity
correction.
66. Ten men with angina were given an active drug and a placebo on
alternate days in random order. Patients were tested using the time
in minutes for which they could exercise until angina or fatigue
stopped them. The existence of an active drug effect could be
examined by:
(a) paired t test;
(b) Mann-Whitney U test;
(c) sign test;
(d) Wilcoxon matched pairs test;
(e) Spearman's ρ.
View Answer
13
The analysis of cross-tabulations
the second row, first column and 1344 × 258/1443 = 240.3 in the second
row, second column. We calculate the expected frequency for each row
and column combination, or cell. The 10 cells of Table 13.1 give us the
expected frequencies shown in Table 13.2. Notice that the row and
column totals are the same as in Table 13.1. In general, the expected
frequency for a cell of the contingency table is found by
It does not matter which variable is the row and which the column.
Other 3 36 39
Total 99 1344 1443
As will be explained in §13A, the distribution of this test statistic when the
null hypothesis is true and the sample is large enough is the Chi-squared
distribution (§7A) with (r - 1)(c - 1) degrees of freedom, where r is the
number of rows and
c is the number of columns. I shall discuss what is meant by ‘large
enough’ in §13.3. We are treating the row and column totals as fixed and
only considering the distribution of tables with these totals. The test is
said to be conditional on these totals. We can prove that we lose very
little information by doing this and we get a simple test.
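As an illustration (the function and its names are mine, not the book's), the expected frequencies and the chi-squared statistic for a contingency table can be computed in Python:

def chi_squared_table(observed):
    """Chi-squared for an r x c table: expected frequency for each cell is
    row total x column total / grand total; the statistic is the sum of
    (O - E)^2 / E, with (r - 1)(c - 1) degrees of freedom."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df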
Bronchitis   No bronchitis   Total
Now the null hypothesis ‘no association between cough and bronchitis’ is
the same as the null hypothesis ‘no difference between the proportions
with cough in the bronchitis and no bronchitis groups’. If there were a
difference, the variables would be associated. Thus we have tested the
same null hypothesis in two different ways. In fact these tests are exactly
equivalent. If we take the Normal deviate from §9.8, which was 3.49, and
square it, we get 12.2, the chi-squared value. The method of §9.8 and
§8.6 has the advantage that it can also give us a confidence interval for
the size of the difference, which the chi-squared method does not. Note
that the chi-squared test corresponds to the two-sided z test, even
though only the upper tail of the chi-squared distribution is used.
chi-squared test which does not satisfy the criterion is always open to the
charge that its validity is in doubt.
Radiological assessment   Streptomycin: Observed   Streptomycin: Expected   Control: Observed
Improvement               13                       8.4                      5
Deterioration             2                        4.2                      7
Death                     0                        2.3                      5
Total                     15                       15                       17
If the criterion is not satisfied we can usually combine or delete rows and
columns to give bigger expected values. Of course, this cannot be done
for 2 by 2 tables, which we consider in more detail below. For example,
Table 13.6 shows data from the MRC streptomycin trial (§2.2), the results
of radiological assessment for a subgroup of patients defined by a
prognostic variable. We want to know whether there is evidence of a
streptomycin effect within this subgroup, so we want to test the null
hypothesis of no effect using a chi-squared test. There are 4 out of 6
expected values less than 5, so the test on this table would not be valid.
We can combine the rows so as to raise the expected values. Since the
small expected frequencies are in the ‘deterioration’ and ‘death’ rows, it
makes sense to combine these to give a ‘deterioration or death’ row. The
expected values are then all greater than 5 and we can do the chi-
squared test with 1 degree of freedom. This editing must be done with
regard to the meaning of the various categories. In Table 13.6, there
would be no point in combining rows 1 and 3 to give a new category of
‘considerable improvement or death’ to be compared to the remainder, as
the comparison would be absurd. The new table is shown in Table 13.7.
We have
Under the null hypothesis this is from a Chi-squared distribution with one
degree of freedom, and from Table 13.3 we can see that the probability of
getting a value as extreme as 10.8 is less than 1%. We have data
inconsistent with the null hypothesis and we can conclude that the
evidence suggests a treatment effect in this subgroup.
If the table does not meet the criterion even after reduction to a 2 by 2
table, we can apply either a continuity correction to improve the
approximation to the Chi-squared distribution (§13.5), or an exact test
based on a discrete distribution (§13.4).
Radiological assessment    Streptomycin: Observed   Streptomycin: Expected   Control: Observed
Improvement                13                       8.4                      5
Deterioration or death     2                        6.6                      12
Total                      15                       15.0                     17
Treatment A 3 1 4
Treatment B 2 2 4
Total 5 3 8
i.   S D T
     A 4 0 4
     B 1 3 4
     T 5 3 8
ii.  S D T
     A 3 1 4
     B 2 2 4
     T 5 3 8
iii. S D T
     A 2 2 4
     B 3 1 4
     T 5 3 8
iv.  S D T
     A 1 3 4
     B 4 0 4
     T 5 3 8
(See §6A for the meaning of n!.) We can calculate this for each possible
table and so find the probability for the observed table and for each more
extreme one. For the example:
is, give rather larger probabilities than they should, though this is a matter
of debate. My own opinion is that Yates' correction and Fisher's exact test
should be used. If we must err, it seems better to err on the side of
caution.
Total
a b a + b
c d c + d
Total a + c b + d a + b + c + d
This can vary from minus infinity to plus infinity and thus is very useful in
fitting regression type models (§17.8). The logit is zero when p = 1/2 and
the logit of 1 - p is minus the logit of p:
Consider Table 13.4. The probability of cough for children with a history
of bronchitis is 26/273 = 0.09524. The odds of cough for children with a
history of bronchitis is 26/247 = 0.10526. The probability of cough for
children without a history of bronchitis is 44/1046 = 0.04207. The odds of
cough for children without a history of bronchitis is 44/1002 = 0.04391.
One way to compare children with and without bronchitis is to find the
ratio of the proportions of children with cough in the two groups (the
relative risk, §8.6). Another is to find the odds ratio, the ratio of the odds
of cough in children with bronchitis and children without bronchitis. This is
(26/247)/(44/1002) = 0.10526/0.04391 = 2.39718. Thus the odds of
cough in children with a history of bronchitis is 2.39718 times the odds of cough in children without a history of bronchitis.
If we denote the frequencies in the table by a, b, c, and d, as in Table
13.10, the odds ratio is given by
We can estimate the standard error and confidence interval using the log of the odds ratio (§13C). The standard error of the log odds ratio is:
SE{loge(OR)} = √(1/a + 1/b + 1/c + 1/d)
Hence we can find the 95% confidence interval. For Table 13.4, the log odds ratio is loge(2.39718) = 0.87429, with standard error √(1/26 + 1/247 + 1/44 + 1/1002) = 0.25736.
Provided the sample is large enough, we can assume that the log odds
ratio comes from a Normal distribution and hence the approximate 95%
confidence interval is
0.87429 - 1.96 × 0.25736 to 0.87429 + 1.96 × 0.25736 = 0.36986 to
1.37872
To get a confidence interval for the odds ratio itself we must antilog: e^0.36986 = 1.45 to e^1.37872 = 3.97.
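As a short sketch in Python (the function and its names are mine, not the book's), the odds ratio and its confidence interval on the log scale can be computed as follows:

import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio (a/b)/(c/d) with a 95% confidence interval found on the
    log scale using SE(log OR) = sqrt(1/a + 1/b + 1/c + 1/d)."""
    log_or = math.log((a / b) / (c / d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))

print(odds_ratio_ci(26, 247, 44, 1002))   # about 2.40 (1.45 to 3.97), as above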
The odds ratio can be used to estimate the relative risk in a case-control
study. The calculation of relative risk in §8.6 depended on the fact that we
could estimate the risks. We could do this because we had a prospective
study and so knew how many of the risk group developed the symptom.
This cannot be done if we start with the outcome, in this case cough at
age 14, and try to work back to the risk factor, bronchitis, as in a case–
control study.
Table 13.11 shows data from a case–control study of smoking and lung
cancer (see §3.8). We start with a group of cases, patients with lung
cancer and a group of controls, here hospital patients without cancer. We
cannot calculate risks (the column totals would be meaningless and have
been omitted), but we can still estimate the relative risk.
Suppose the prevalence of lung cancer is p, a small number, and the
table is as Table 13.10. Then we can estimate the probability of both
having lung cancer and being a smoker by pa/(a + b), because a/(a + b)
is the conditional probability of smoking in lung cancer patients (§6.8).
Similarly, the probability of being a smoker without lung cancer is (1-
p)c/(c + d). The probability of being a smoker is therefore pa/(a + b) + (1 -
p)c/(c + d), the probability of being a smoker with lung cancer plus the
probability of being a smoker without lung cancer. Because p is much
smaller than 1 - p, the first term can be ignored and
Thus the risk of lung cancer in smokers is about 14 times that of non-
smokers. This is a surprising result from a table with so few non-smokers,
but a direct estimate from the cohort study (Table 3.1) is 0.90/0.07 = 12.9,
which is very similar. The log odds ratio is 2.64210 and its standard error
is
Boy's smoking:   Non-smoker   Occasional   Regular
Note that it does not matter which variable is X and which is Y. The sums
of squares and products are easy to work out. For example, for the
column variable, X, we have 1303 individuals with X=1, 1372 with X=2
and 172 with X=3. For our data we have
Similarly, Σy²i = 9165 and Σyi = 4953. The chi-squared statistic for trend is 59.47.
If the null hypothesis is true, this is an observation from the Chi-squared distribution with 1 degree of freedom. The value 59.47 is highly unlikely from this distribution and the trend is significant.
There are several points to note about this method. The choice of values
for X and Y is arbitrary. By putting X = 1, 2 or 3 we assumed that the
difference between non-smokers and occasional smokers is the same as
that between occasional smokers and smokers. This need not be so and
a different choice of X would give a different chi-squared for trend
statistic. The choice is not critical, however. For example, putting X = 1, 2
or 4, so making regular smokers more different from occasional smokers
than occasional smokers are from non-smokers, we get X² for trend to be
64.22. The fit to the data is rather better, but the conclusions are
unchanged.
The trend may be significant even if the overall contingency table chi-
squared is not. This is because the test for trend has greater power for
detecting trends than has the ordinary chi-squared test. On the other
hand, if we had an association where those who were occasional
smokers had far more symptoms than either non-smokers or regular
smokers, the trend test would not detect it. If the hypothesis we wish to
test involves the order of the categories, we should use the trend test, if it
does not we should use the contingency table test of §13.1. Note that the
trend test statistic is always less than the overall chi-squared statistic.
The distribution of the trend chi-squared statistic depends on a large
sample regression model, not on the theory given in §13A. The table
does not have to meet Cochran's rule (§13.3) for the trend test to be
valid. As long as there are at least 30 observations the approximation
should be valid.
Some computer programs offer a slightly different test, the Mantel–Haenszel trend test (not to be confused with the Mantel–Haenszel method for combining 2 by 2 tables, §17.11). This is almost identical to
the method described here. As an alternative to the chi-squared test for
trend, we could calculate Kendall's rank correlation coefficient, τb,
between X and Y (§12.5). For Table 13.12 we get τb = -0.136 with
standard error 0.018. We get a χ² statistic with 1 degree of freedom by (τb/SE(τb))² = 57.09. This is very similar to the X² for trend value of 59.47.
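As a rough sketch in Python (the function and names are mine), a trend statistic of this kind can be computed from the table and the chosen scores; this uses the linear-by-linear (Mantel–Haenszel type) form (n - 1)r², the 'almost identical' variant mentioned above, rather than the book's exact arithmetic:

def trend_chi_squared(table, row_scores, col_scores):
    """Linear-by-linear trend statistic (n - 1)*r^2, where r is the
    correlation between the row and column scores over all individuals in
    the table; refer to chi-squared with 1 degree of freedom."""
    n = sx = sy = sxx = syy = sxy = 0
    for i, row in enumerate(table):
        for j, freq in enumerate(row):
            xs, ys = col_scores[j], row_scores[i]
            n += freq
            sx += freq * xs
            sy += freq * ys
            sxx += freq * xs * xs
            syy += freq * ys * ys
            sxy += freq * xs * ys
    num = sxy - sx * sy / n
    den = (sxx - sx * sx / n) * (syy - sy * sy / n)
    return (n - 1) * num * num / den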
to improve our analysis by taking into account the fact that this is the
same sample. We might expect, for instance, that symptoms on the two
occasions will be related.
This can be referred to Table 13.3 with one degree of freedom and is
clearly highly significant. There was a difference between the two ages.
As there was no change in any of the other symptoms studied, we
thought that this was possibly due to an epidemic of upper respiratory
tract infection just before the second questionnaire.
There is a continuity correction, again due to Yates. If the observed
frequency fyn increases by 1, fny decreases by 1 and fyn - fny increases by
2. Thus half the difference between adjacent possible values is 1 and we
make the observed difference nearer to the expected difference (zero) by
1. Thus the continuity corrected test statistic is
where |fyn - fny| is the absolute value, without sign. For Table 13.13:
There is very little difference because the expected values are so large
but if the expected values are small, say less than 20, the correction is
advisable. For small samples, we can also take fny as an observation
from the Binomial distribution with p = ½ and n = fyn + fny and proceed as
for the sign test (§9.2).
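As a short illustration (code and names are mine, not the book's), the continuity-corrected statistic can be computed in Python from the two discordant frequencies alone:

def mcnemar_chi_squared(f_yn, f_ny, continuity=True):
    """McNemar's test for a paired 2x2 table, using only the two discordant
    frequencies; with the continuity correction the statistic is
    (|f_yn - f_ny| - 1)^2 / (f_yn + f_ny), on 1 degree of freedom."""
    diff = abs(f_yn - f_ny)
    if continuity:
        diff = max(diff - 1, 0)
    return diff ** 2 / (f_yn + f_ny)

# With the discordant frequencies 144 and 256 quoted below for Table 13.13:
print(mcnemar_chi_squared(144, 256))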
We can find a confidence interval for the difference between the
proportions. The estimated difference is p1 - p2 = (fyn - fny)/n. We
rearrange this:
was the same. When the categories are not ordered, as in Table 13.1, there
is a test due to Stuart (1955), described by Maxwell (1970). The test is
difficult to do and the situation is very unusual, so I shall omit details. My
free program Clinstat will do it (§1.3).
Table 13.14. Parity of 125 women attending
antenatal clinics at St. George's Hospital,
with the calculation of the chi-squared
goodness of fit test
We can also find an odds ratio for the matched table, called the
conditional odds ratio. Like McNemar's method, it uses the frequencies
in the off diagonal only. The estimate is very simple: fyn/fny. Thus for Table
13.13 the odds of having severe colds at age 12 is 144/256 = 0.56 times
that at age 14. This example is not very interesting, but the method is
particularly useful in matched case–control studies, where it provides an
estimate of the relative risk. A confidence interval is provided in the same
way as for the difference between proportions. We can estimate p =
fyn/(fyn + fny) and then the odds ratio is given by p/(1 - p). For the
example, p = 144/400 = 0.36 and turning p back to the odds ratio p/(1 - p)
= 0.36/(1 - 0.36) = 0.56 as before. The 95% confidence interval for p is
0.313 to 0.407, as above. Hence the 95% confidence interval for the
conditional odds ratio is 0.31/(1 - 0.31) = 0.45 to 0.41/(1 - 0.41) = 0.69.
00.01–02.00  21     12.01–14.00  34
02.01–04.00  16     14.01–16.00  59
04.01–06.00  22     16.01–18.00  44
06.01–08.00  104    18.01–20.00  51
08.01–10.00  95     20.01–22.00  32
10.01–12.00  66     22.01–24.00  10
Appendices
13A Appendix: Why the chi-squared test works
a given cell of the table would be from a Poisson distribution and the set
of Poisson variables corresponding to the cell frequency would be
independent of one another. Our table is one set of samples from these
Poisson distributions. However, we do not know the expected values of
these distributions under the null hypothesis; we only know their
expected values if the table has the row and column totals we observed.
We can only consider the subset of outcomes of these variables which
has the observed row and column totals. The test is said to be
conditional on these row and column totals.
f11 f12 r1
f21 f22 r2
Total c1 c2 n
The mean and variance of a Poisson variable are equal (§6.7). If the null
hypothesis is true, the means of these variables will be equal to the
expected frequency calculated in §13.1. Thus O, the observed cell
frequency, is from a Poisson distribution with mean E, the expected cell
frequency, and standard deviation √E. Provided E is large enough, this
Poisson distribution will be approximately Normal. Hence (O - E)/√E is
from a Normal distribution mean 0 and variance 1. Hence if we find
can happen in
If an event happens a times and does not happen b times, the log odds is
loge(a/b) = loge(a) - loge(b). The frequencies a and b are from
independent Poisson distributions with means estimated by a and b
respectively. Hence their variances are estimated by 1/a and 1/b
respectively. The variance of the log odds is given by
The log odds ratio is the difference between the log odds:
The variance of the log odds ratio is the sum of the variances of the log
odds and for table 2 we have
[Exercise table: parents' reports (0, 1–3, 4–5, 6–7) cross-tabulated with the child's report (yes/no); fragment.]
2. How many admissions were there during the heatwave and in the
corresponding period of 1982? Would this be sufficient evidence to
conclude that heatwaves produce an increase in admissions?
View Answer
3. We can use the periods before and after the heatwave weeks as
controls for changes in other factors between the years. Divide the
years into three periods, before, during, and after the heatwave and
set up a two-way table showing numbers of admissions by period
and year.
View Answer
4. We can use this table to test for a heatwave effect. State the null
hypothesis and calculate the frequencies expected if the null
hypothesis were true.
View Answer
14
Choosing the statistical method
Ratio scales
The ratio of two quantities has a meaning, so we can say that one
observation is twice another. Human height is a ratio scale. Ratio scales
Interval scales
The interval or distance between points on the scale has precise
meaning, a change of one unit at one scale point is the same as a
change of one unit at another. For example, temperature in °C is an
interval scale, though not a ratio scale because the zero is arbitrary. We
can add and subtract on an interval scale. All ratio scales are also
interval scales. Interval scales allow us to calculate means and
variances, and to find standard errors and confidence intervals for these.
Ordinal scale
The scale enables us to order the subjects, from that with the lowest
value to that with the highest. Any ties which cannot be ordered are
assumed to be because the measurement is not sufficiently precise. A
typical example would be an anxiety score calculated from a
questionnaire. A person scoring 10 is more anxious than a person scoring 8, but not necessarily by the same amount as a person scoring 4 is more anxious than a person scoring 2.
Nominal scale
We can group subjects into categories which need not be ordered in any
way. Eye colour is measured on a nominal scale.
Dichotomous scales
Subjects are grouped into only two categories, for example: survived or
died. This is a special case of the nominal scale.
Clearly these classes are not mutually exclusive, and an interval scale is
also ordinal. Sometimes it is useful to apply methods appropriate to a
lower level of measurement, ignoring some of the information. The
combination of the type of comparison and the scale of measurement
should direct us to the appropriate method.
(§13.1) will test the null hypothesis that there is no relationship between
group and variable, but takes no account of the ordering. This is done by
using the chi-squared test for trend, which takes the ordering into account
and provides a much more powerful test (§13.8).
Type of data   Size of sample                                 Method
…              Small, at least one expected frequency < 5     Chi-squared test with Yates' correction (§13.5), Fisher's exact test (§13.4)
Nominal data. Set the data out as a two way table as described above.
The chi-squared test for a two way table is the appropriate test (§13.1).
The condition for validity of the test, that at least 80% of the expected
frequencies should be greater than 5, must be met by combining or
deleting categories as appropriate (§13.3). If the table reduces to a 2 by 2
table without the condition being met, use Fisher's exact test.
Dichotomous data. For large samples, either present the data as two
proportions and use the Normal approximation to find the confidence
interval for the difference (§8.6), or set the data up as a 2 by 2 table and
do a chi-squared test (§13.1). These are equivalent methods. An odds
ratio can also be calculated (§13.7). If the sample is small, the fit to the
Chi-squared distribution can be improved by using Yates' correction
(§13.5). Alternatively, use Fisher's exact test (§13.4).
[Table: choice of method for relating two variables, classified by the scale of each (interval Normal, interval non-Normal, ordinal, nominal ordered, nominal, dichotomous); the cells include rank correlation (§12.4, §12.5), analysis of variance (§10.9), the t test (§10.3) and the Normal test (§8.5, §9.7).]
[Exercise table: counts by score (0, 1, 2, 3) in two groups of patients; totals 32, 14 and 46.]
[Table 14.6 column headings: visual acuity and contrast sensitivity test, before and after, for each case.]
6. Table 14.6 shows some data from a pre- and post-treatment study
of cataract patients. The second number in the visual acuity score
represents the size of letter which can be read at a distance of six
metres, so high numbers represent poor vision. For the contrast
sensitivity test, which is a measurement, high numbers represent
good vision. What methods could be used to test the difference in
visual acuity and in the contrast sensitivity test pre- and post-
operation? What method could be used to investigate the
relationship between visual acuity and the contrast sensitivity test
post-operation?
View Answer
[Exercise table: asthma or wheeze reported, by mother's age at the child's birth (15–19, 20–29, 30+).]
15
Clinical measurement
[Table: pairs of PEFR measurements (litres/min) for each subject; fragment, subject 9: 650 and 638. Analysis of variance total row: 33 degrees of freedom, sum of squares 445581.5.]
Source of variation   Degrees of freedom   Sum of squares   Mean square   Variance ratio (F)
Total                 33                   3.160104
We should check to see whether the error does depend on the value of
the measurement, usually being larger for larger values. We can do this
by plotting a scatter diagram of the absolute value of the difference (i.e.
ignoring the sign) and the mean of the two observations (Figure 15.1).
For the PEFR data, there is no obvious relationship. We can check this
by calculating a correlation (§11.9) or rank correlation coefficient (§12.4,
§12.5). For Figure 15.1 we have τ = 0.17, P = 0.3, so there is little to
suggest that the measurement error is related to the size of the PEFR.
Hence the coefficient of variation is not as appropriate as the within
subjects standard deviation as a representation of the measurement
error. For most medical measurements, the standard deviation is either
independent of or proportional to the measurement and so one of these
two approaches can be used.
Fig. 15.1. Absolute difference versus sum for 17 pairs of Wright
Peak Flow Meter measurements
data by using the average of each pair first, but this introduces an extra
stage in the calculation. Bland and Altman (1986) give details.
Subject number   Wright meter PEFR (litres/min)   Mini meter PEFR (litres/min)   Difference (Wright - mini)
3     516   520   -4
4     434   428   6
7     413   364   49
8     442   380   62
9     650   658   -8
12    656   626   30
13    267   260   7
14    478   477   1
16    423   350   73
Total                -36
Mean                 -2.1
S.d.                 38.8
The first step in the analysis is to plot the data as a scatter diagram
(Figure 15.2). If we draw the line of equality, along which the two
measurements would be exactly equal, this gives us an idea of the extent
to which the two methods agree. This is not the best way of looking at
data of this type, because much of the graph is empty space and the
interesting information is clustered along the line. A better approach is to
plot the difference between the methods against the sum or average. The
sign of the difference is important, as there is a possibility that one
method may give higher values than the other and this may be related to
the true value we are trying to measure. This plot is also shown in Figure
15.2.
Two methods of measurement agree if the difference between
observations on the same subject using both methods is small enough
for us to use the methods interchangeably. How small this difference has
to be depends on the measurement and the use to which it is to be put. It
is a clinical, not a statistical, decision. We quantify the differences by
estimating the bias, which is the mean difference, and the limits within
which most differences will lie. We estimate these limits from the mean
and standard deviation of the differences. If we are to estimate these
quantities, we want them to be the same for high values and for low
values of the measurement. We can check this from the plot. There is no
clear evidence of a relationship between difference and mean in Figure
15.4, and we can check this by a test of significance using the correlation
coefficient. We get r = 0.19, P = 0.5.
The mean difference is close to zero, so there is little evidence of overall
bias.
On the basis of these data we would not conclude that the two methods
are comparable or that the mini meter could reliably replace the Wright
peak flow meter. As remarked in §10.2, this meter had received
considerable wear.
When there is a relationship between the difference and the mean, we
can try to remove it by a transformation. This is usually accomplished by
the logarithm, and leads to an interpretation of the limits similar to that
described in §15.2. Bland and Altman (1986, 1999) give details.
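As a minimal sketch (the function and names are mine, not the book's), the bias and 95% limits of agreement can be computed in Python, taking the limits as the mean difference plus or minus 1.96 standard deviations of the differences:

import math

def limits_of_agreement(method_a, method_b):
    """Mean difference (bias) and 95% limits of agreement, mean difference
    plus or minus 1.96 standard deviations of the paired differences."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
    return mean, mean - 1.96 * sd, mean + 1.96 * sd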
cases of the disease, and the second is not, so this is clearly a poor
index.
[Table 15.6 (fragment): numbers of subjects with and without the disease classified by the test result.]
Fig. 15.5. Scatter diagram and ROC curve for the data of Table
15.6
The area under the ROC curve is often quoted (here it is 0.9753). It
estimates the probability that a member of one population chosen at
random will exceed a member of the other population, in the same way
as does U/n1n2 in the Mann–Whitney U test (§12.2). It can be useful in
comparing different tests. In this study another blood test gave us an
area under the ROC curve = 0.9825, suggesting that the test may be
slightly better than CK.
We can also estimate the positive predictive value or PPV, the
probability that a subject who is test positive will be a true positive (i.e.
has the disease and is correctly classified), and the negative predictive
value or NPV, the probability that a subject who is test negative will be a
true negative (i.e. does not have the disease and is correctly classified).
These depend on the prevalence of the condition, pprev, as well as the sensitivity, psens, and the specificity, pspec. If the sample is a single group of people, we know the prevalence and can estimate PPV and NPV for this population directly as simple proportions. If we started with a sample of cases and a sample of controls, we do not know the prevalence, but we can estimate PPV and NPV for a population with any given prevalence. As described in §6.8, psens is the conditional probability of a positive test given the disease, so the probability of being both test positive and disease positive is psens × pprev. Similarly, the probability of being both test positive and disease negative is (1 - pspec) × (1 - pprev). The probability of being test positive is the sum of these (§6.2): psens × pprev + (1 - pspec) × (1 - pprev), and the PPV is
PPV = psens × pprev / {psens × pprev + (1 - pspec) × (1 - pprev)}
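As a short sketch in Python (the function and its names are mine, not the book's), the predictive values for a chosen prevalence follow directly from this conditional probability argument:

def predictive_values(p_sens, p_spec, p_prev):
    """PPV and NPV for a given prevalence, from sensitivity and specificity."""
    pos_disease = p_sens * p_prev                 # test positive and diseased
    pos_well = (1 - p_spec) * (1 - p_prev)        # test positive, not diseased
    neg_well = p_spec * (1 - p_prev)              # test negative, not diseased
    neg_disease = (1 - p_sens) * p_prev           # test negative but diseased
    ppv = pos_disease / (pos_disease + pos_well)
    npv = neg_well / (neg_well + neg_disease)
    return ppv, npv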
Hence, provided Normal assumptions hold, the standard error of the limit
of the reference interval is
Survival time in completed years:
Alive:  <1  <1  1  1  4  5  6  8  10  10  17
Deaths: <1  2  6  6  7  9  9  11  14
Table 15.7 shows some survival data, for patients with parathyroid
cancer. The survival times are recorded in completed years. A patient
who survived for 6 years and then died can be taken as having lived for 6
years and then died in the seventh. In the first year from diagnosis, one
patient died, two patients were observed for only part of this year, and 17
survived into the next year. The subjects who have only been observed
for part of the year are censored, also called lost to follow-up or
withdrawn from follow-up. (These are rather misleading names, often
wrongly interpreted as meaning that these subjects have dropped out of
the study. This may be the case, but most of these subjects are simply
still alive and their further survival is unknown.) There is no information
about the survival of these subjects after the first year, because it has not
happened yet. These patients are only at risk of dying for part of the year
and we cannot say that 1 out of 20 died as they may yet contribute
another death in the first year. We can say that such patients will
contribute half a year of risk, on average, so the number of patient years
at risk in the first year is 18 (17 who survived and 1 who died) plus 2
halves for those withdrawn from follow-up, giving 19 altogether. We get
an estimate of the probability of dying in the first year of 1/19, and an
estimated probability of surviving of 1 - 1/19. We can do this for each
year until the limits of the data are reached. We thus trace the survival of
these patients estimating the probability of death or survival at each year
and the cumulative probability of survival to each year. This set of
probabilities is called a life table.
To carry out the calculation, we first set out for each year, x, the number
alive at the start, nx, the number withdrawn during the year, wx, and the
number at risk, rx, and the number dying, dx (Table 15.8). Thus in year 1
the number at the start is 20, the number withdrawn is 2, the number at
risk r1 = n1 - 1/2w1 = 20 - 1/2 × 2 = 19 and the number of deaths is 1. As
there were 2 withdrawals and 1 death the number at the start of year 2 is
17. For each year we calculate the probability of dying in that year for
patients who have reached the beginning of it, qx = dx/rx, and hence the
probability of surviving to the next year, px = 1 - qx. Finally we calculate
the cumulative survival probability.
Year (x)   Number at start (nx)   Withdrawn during year (wx)   At risk (rx)   Deaths (dx)   Prob. of death (qx)
1 20 2 19 1 0.0526
2 17 2 16 0 0
3 15 0 15 1 0.0667
4 14 0 14 0 0
5 14 1 13.5 0 0
6 13 1 12.5 0 0
7 12 1 11.5 2 0.1739
8 9 0 9 1 0.1111
9 8 1 7.5 0 0
10 7 0 7 2 0.2857
11 5 2 4 0 0
12 3 0 3 1 0.3333
13 2 0 2 0 0
14 2 0 2 0 0
15 2 0 2 1 0.5000
16 1 0 1 0 0
17 1 0 1 0 0
18 1 1 0.5 0 0
For the first year, this is the probability of surviving that year, P1 = p1. For
the second year, it is the probability of surviving up to the start of the
second year, P1, times the probability of surviving that year, p2, to give P2
= p2P1. The probability of surviving for 3 years is similarly P3 = p3P2, and
so on. From this life table we can estimate the five year survival rate, a
useful measure of prognosis in cancer. For the parathyroid cancer, the
five year survival rate is 0.8842, or 88%. We can see that the prognosis
for this cancer is quite good. If we know the exact time of death or
withdrawal for each subject, then instead of using fixed time intervals we
use x as the exact time, with a row of the table for each time when either
an endpoint or a withdrawal occurs. Then rx = nx and we can omit the rx =
nx - 1/2wx step.
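As an illustration (code and names are mine, not the book's), the life table calculation can be written out in Python from the yearly counts of withdrawals and deaths:

def life_table(n_start, withdrawals, deaths):
    """Life table from yearly counts: at risk r = n - w/2, probability of
    death q = d/r, survival p = 1 - q, and cumulative survival multiplied
    up year by year."""
    rows, cumulative, n = [], 1.0, n_start
    for w, d in zip(withdrawals, deaths):
        r = n - w / 2
        q = d / r if r > 0 else 0.0
        cumulative *= 1 - q
        rows.append((n, w, r, d, q, cumulative))
        n = n - w - d
    return rows

# First five years of Table 15.8 (withdrawals and deaths per year):
print(round(life_table(20, [2, 2, 0, 0, 1], [1, 0, 1, 0, 0])[-1][-1], 4))  # 0.8842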
We can draw a graph of the cumulative survival probability, the survival
curve. This is usually drawn in steps, with abrupt changes in probability
(Figure 15.6). This convention emphasizes the relatively poor estimation
at the long survival end of the curve, where the small numbers at risk
produced large steps. When the exact times of death and censoring are
known, this is called a Kaplan-Meier survival curve. The times at which
observations are censored may be marked by small vertical lines above
the survival curve (Figure 15.7), and the number remaining at risk may be
written at suitable intervals below the time axis.
The standard error and confidence interval for the survival probabilities
can be found (see Armitage and Berry 1994). These are useful for
estimates such as five year survival rate. They do not provide a good
method for comparing
survival curves, as they do not include all the data, only using those up to
the chosen time. Survival curves start off together at 100% survival,
possibly diverge, but eventually come together at zero survival. Thus the
comparison would depend on the time chosen. Survival curves can be
compared by several significance tests, of which the best known is the
logrank test. This is a non-parametric test which makes use of the full
survival data without making any assumption about the shape of the
survival curve.
[Table: patient-level times and yes/no indicators for the two groups compared in the survival analysis; the column structure of this fragment is unclear.]
Time n1 d1 w1 n2 d2 w2 pd e1
(n = number at risk, d = recurrences, w = withdrawals, for groups 1 and 2; pd = probability of recurrence at that time; e1 = expected recurrences in group 1)
3 65 0 1 79 0 2 0.000 0.000
4 64 0 0 77 0 1 0.000 0.000
5 64 0 1 76 0 0 0.000 0.000
6 63 0 1 76 5 5 0.036 2.266
7 62 0 0 66 2 1 0.016 0.969
8 62 1 1 63 2 2 0.024 1.488
9 60 0 1 59 1 1 0.008 0.504
10 59 0 0 57 1 0 0.009 0.509
11 59 2 1 56 1 5 0.026 1.539
12 56 2 2 50 3 3 0.047 2.642
13 52 0 4 44 1 1 0.010 0.542
14 48 0 2 42 0 1 0.000 0.000
16 46 0 1 41 2 0 0.023 1.057
17 45 1 1 39 0 3 0.012 0.536
18 43 1 0 36 1 1 0.025 1.089
19 42 0 1 34 1 1 0.013 0.553
20 41 0 3 32 0 0 0.000 0.000
21 38 0 0 32 0 3 0.000 0.000
22 38 0 2 29 0 0 0.000 0.000
23 36 0 1 29 0 0 0.000 0.000
24 35 0 2 29 1 1 0.016 0.547
25 33 0 2 27 1 0 0.017 0.550
26 31 1 1 26 0 1 0.018 0.544
28 29 1 1 25 0 0 0.019 0.537
29 27 1 1 25 1 1 0.038 1.038
30 25 0 1 23 2 1 0.042 1.042
31 24 0 2 20 0 1 0.000 0.000
32 22 0 2 19 1 1 0.024 0.537
33 20 0 1 17 0 0 0.000 0.000
34 19 0 3 17 0 1 0.000 0.000
35 16 0 1 16 0 0 0.000 0.000
36 15 0 2 16 0 3 0.000 0.000
37 13 0 1 13 0 3 0.000 0.000
38 12 0 2 10 1 0 0.045 0.545
40 10 0 1 9 0 0 0.000 0.000
41 9 0 2 9 0 0 0.000 0.000
42 7 0 1 9 0 3 0.000 0.000
43 6 1 0 6 0 0 0.083 0.500
44 5 0 0 4 0 2 0.000 0.000
45 5 0 1 4 0 0 0.000 0.000
47 4 0 0 4 0 1 0.000 0.000
48 4 0 2 3 0 0 0.000 0.000
53 2 0 0 3 0 1 0.000 0.000
60 2 1 0 2 0 0 0.250 0.500
61 1 0 1 2 0 0 0.000 0.000
65 0 0 0 2 0 1 0.000 0.000
70 0 0 0 1 0 1 0.000 0.000
Total: d1 = 12, d2 = 27, e1 = 20.032
We can test the null hypothesis that the risk of recurrence in any month is
equal for the two populations by a chi-squared test. The expected number of
recurrences in group 2 is e2 = 12 + 27 - 20.032 = 18.968, and
χ² = (d1 - e1)²/e1 + (d2 - e2)²/e2 = (12 - 20.032)²/20.032 + (27 - 18.968)²/18.968 = 6.6
There is one constraint, that the two expected frequencies add to the sum of the
observed (i.e. the total number of recurrences), so we lose one degree of
freedom, giving 2 - 1 = 1 degree of freedom. From Table 13.3, this has a
probability of 0.01.
Some texts describe this test differently, saying that under the null
hypothesis d1 is from a Normal distribution with mean e1 and variance
e1e2/(e1 + e2). This is algebraically identical to the chi-squared method,
but only works for two groups.
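A minimal sketch in Python of this chi-squared calculation, using the totals from the table above (12 and 27 observed recurrences, 20.032 expected in group 1), with e2 obtained by subtraction:

# Logrank test from observed and expected recurrences in two groups.
o1, o2 = 12, 27        # observed recurrences in groups 1 and 2
e1 = 20.032            # expected recurrences in group 1 (sum of the e1 column)
e2 = (o1 + o2) - e1    # expected recurrences in group 2

chi_squared = (o1 - e1) ** 2 / e1 + (o2 - e2) ** 2 / e2
print(round(chi_squared, 2))   # 6.62, on 1 degree of freedom, P = 0.01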
The logrank test is non-parametric, because we make no assumptions
about either the distribution of survival time or any difference in
recurrence rates. It requires the survival or censoring times to be exact. A
similar method for grouped data, such as that in Table 15.8, is given by Mantel
(1966).
The logrank test is a test of significance and, of course, an estimate of
the difference is preferable if we can get one. The logrank test calculation
can be used to give us one: the hazard ratio. This is the ratio of the risk
of death in group 1 to the risk of death in group 2. For this to make sense,
we have to assume that this ratio is the same at all times, otherwise there
could not be a single estimate. (Compare the paired t method, §10.2.)
The risk of death is the number of deaths divided by the population at
risk, but the population keeps changing due to censoring. However, the
populations at risk in the two groups are proportional to the numbers of
expected deaths, e1 and e2. We can thus calculate the hazard ratio as
(d1/e1)/(d2/e2), which for the gallstone recurrence data is
(12/20.032)/(27/18.968) = 0.42.
this type and could be turned into an expert system for statistical
analysis.
Although there have been some impressive achievements in the field of
computer diagnosis, it has to date made little progress towards
acceptance in routine medical practice. As computers become more
familiar to clinicians, more common in their surgeries and more powerful
in terms of data storage and processing speed, we may expect computer
aided diagnosis to become as well established as computer aided
statistical analysis is today.
16
Mortality statistics and population
structure
The terms ‘death rate’ and ‘mortality rate’ are used interchangeably. We
calculate the crude mortality rate for a population as:
crude mortality rate = 1000 × (number of deaths during the period)/(number in the population × length of the period)
If the period is in years, this gives the crude mortality rate as deaths per
1000 population per year.
The crude mortality rate is so called because no allowance is made for
the age distribution of the population, and comparisons between
populations with different age structures can be misleading. For example, in 1901 the crude
mortality rate among adult males (aged over 15 years) in England and
Wales was 15.7 per 1000 per year, and in 1981 it was 14.8 per 1000 per
year. It seems strange that with all the improvements in medicine,
housing and nutrition between these times there has been so little
improvement in the crude mortality rate. To see why we must look at the
age-specific mortality rates, the mortality rates within narrow age
groups. Age-specific mortality rates are usually calculated for one, five or
ten year age groups. In 1901 the age specific mortality rate for men aged
15 to 19 was 3.5 deaths per 1000 per year, whereas in 1981 it was only
0.8. As Table 16.1 shows, the age specific mortality rate in 1901 was
greater than that in 1981 for every age group. However in 1901 there was
a much greater proportion of the population in the younger age groups,
where mortality was low, than there was in 1981. Correspondingly, there
was a smaller proportion of the 1901 population than the 1981 population
in the higher mortality older age groups. Although mortality was lower at
any given age in 1981, the greater proportion of older people meant that
there were almost as many deaths as in 1901.
To eliminate the effects of different age structures in the populations
which we want to compare, we can look at the age-specific death rates.
But if we are comparing several populations, this is a rather cumbersome
procedure, and it is often more convenient to calculate a single summary
figure from the age-specific
rates. There are many ways of doing this, of which three are frequently
used: the direct and indirect methods of age standardization and the life
table.
Direct age standardization:
Age group (years) | Standard proportion in age group (a) | Observed mortality rate per 1000 (b) | a × b
Sum of a × b = 7.2623
I shall take as an example the deaths due to cirrhosis of the liver among
male qualified medical practitioners in England and Wales, recorded
around the 1971 census. There were 14 deaths among 43570 doctors
aged below 65, a crude mortality rate of 14/43570 = 321 per million,
compared to 1423 out of 15247980 adult males (aged 15–64), or 93 per
million. The mortality among doctors appears high, but the medical
population may be older than the population of men as a whole, as it will
contain relatively few below the age of 25. Also the actual number of
deaths among doctors is small and any difference not explained by the
age effect may be due to chance. The indirect method enables us to test
this. Table 16.3 shows the age-specific mortality rates for cirrhosis of the
liver among all men aged 15 to 65, and the number of men estimated in
each ten-year-age group, for all men and for doctors. We can see that
the two age distributions do appear to be different.
The calculation of the expected number of deaths is similar to the direct
method, but different populations and rates are used. For each age
group, we take the number in the observed population, and multiply it by
the standard age specific mortality rate, which would be the probability of
dying if mortality in the observed population were the same as that in the
standard population. This gives us the number we would expect to die in
this age group in the observed population. We add these over the age
groups and obtain the expected number of deaths. The calculation is set
out in Table 16.4.
The expected number of deaths is 4.4965, which is considerably less
than the 14 observed. We usually express the result of the calculation as
the ratio of observed to expected deaths, called the standardized
mortality ratio or SMR. Thus the SMR for cirrhosis among doctors is
14/4.4965 = 3.11.
We usually multiply the SMR by 100 to get rid of the decimal point, and
report the SMR as 311. If we do not adjust for age at all, the ratio of the
crude death rates is 3.44, compared to the age adjusted figure of 3.11, so
the adjustment has made some, but not much, difference in this case.
Table 16.4 (indirect standardization):
Age group (years) | Standard mortality rate (a) | Observed population, number of doctors (b) | a × b
Total of a × b = 4.4965
We can calculate a confidence interval for the SMR quite easily. Denote
the observed deaths by O and expected by E. It is reasonable to suppose
that the deaths are independent of one another and happening randomly
in time, so the observed number of deaths is from a Poisson distribution
(§6.7). The standard deviation of this Poisson distribution is the square
root of its mean and so can be estimated by the square root of the
observed deaths, √O. The expected number is calculated from a very
much larger sample and is so well estimated it can be treated as a
constant, so the standard deviation of 100 × O/E, which is the standard
error of the SMR, is estimated by 100 × √O/E. Provided the number of
deaths is large enough, say more than 10, an approximate 95%
confidence interval is given by
SMR - 1.96 × 100√O/E to SMR + 1.96 × 100√O/E = 311 - 1.96 × 83.2 to 311 + 1.96 × 83.2 = 148 to 474
The confidence interval clearly excludes 100 and the high mortality
cannot be ascribed to chance.
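A minimal sketch in Python of the SMR and its approximate confidence interval, using the observed and expected deaths quoted above:

from math import sqrt

# Standardized mortality ratio and approximate 95% confidence interval.
# The expected deaths come from Table 16.4 (sum of standard rate x number of doctors).
observed, expected = 14, 4.4965

smr = 100 * observed / expected                  # about 311
se = 100 * sqrt(observed) / expected             # SE of the SMR, treating E as a constant
lower, upper = smr - 1.96 * se, smr + 1.96 * se  # approximately 148 to 474
print(round(smr), round(lower), round(upper))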
For small observed frequencies tables based on the exact probabilities of
the Poisson distribution are available (Pearson and Hartley 1970). The
calculations are easily done by computer and my free program Clinstat
(§1.3) does them. There is also an exact method for comparing two
SMRs, which Clinstat does. For the cirrhosis data the exact 95%
confidence interval is 170 to 522.
Life tables are calculated separately for males and females,
because the mortality of the two sexes is very different. Age specific
death rates are higher in males than females at every age. Between
census years life tables are still produced but are only published in an
abridged form, giving lx at five year intervals only after age five (Table
16.6).
The final column in Tables 16.5 and 16.6 is the expected life,
expectation of life or life expectancy, ex. This is the average life still to
be lived by those reaching age x. We have already calculated this as the
expected value of the probability distribution of year of death (§6E). We
can do the calculation in a number of other ways. For example, if we add
lx+1, lx+2, lx+3, etc. we will get the total number of years to be lived,
because the lx+1 who survive to x + 1 will have added lx+1 years to the
total, the lx+2 of these who survive from x + 1 to x + 2 will add a further
lx+2 years, and so on. If we divide this sum by lx we get the average
number of whole years to be lived. If we then remember that people do
not die on their birthdays, but are scattered throughout the year, we can add
a half to allow for the average of half a year lived in the year of death. We
thus get
ex = 1/2 + (lx+1 + lx+2 + lx+3 + …)/lx
i.e. summing the li from age x + 1 to the end of the life table.
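A minimal sketch in Python of this calculation; the lx values here are made-up illustrative numbers running to the last age of the table, not an official life table:

# Expectation of life at age x from the survivors column lx of a life table:
# ex = 1/2 + (l[x+1] + l[x+2] + ...) / l[x].
l = [1000, 960, 930, 905, 850, 700, 400, 100, 0]   # survivors to ages 0, 1, 2, ... (illustrative)

def expectation_of_life(l, x):
    """Average further years of life for those who reach age x."""
    return 0.5 + sum(l[x + 1:]) / l[x]

print(round(expectation_of_life(l, 0), 1))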
If many people die in early life, with high age-specific death rates for
children, this has a great effect on expectation of life at birth. Table 16.7
shows expectation of life at selected ages from four English Life Tables
(Office for National Statistics 1997). In 1991, for example, expectation of
life at birth for males was 74 years, compared to only 40 years in 1841,
an improvement of 34 years. However expectation of life at age 45 in
1991 was 31 years compared to 23 years in 1841, an improvement of
only 8 years. At age 65, male expectation of life was 11
years in 1841 and 14 years in 1991, an even smaller change. Hence the
change in life expectancy at birth was due to changes in mortality in early
life, not late life.
Table 16.6. Abridged Life Table 1988–90, England and Wales
(columns: x, then lx and ex for males, then lx and ex for females; body not reproduced here)

Table 16.7. Expectation of life (years) at selected ages, four English Life Tables from 1841 (first column) to 1991 (last column)
Birth: Males 40 … … 74; Females 42 52 72 79
15 yrs: Males 43 47 54 59; Females 44 50 59 65
45 yrs: Males 23 23 27 31; Females 24 26 31 36
65 yrs: Males 11 11 12 14; Females 12 12 14 18
90. In 1971, the SMR for cirrhosis of the liver for men was 773 for
publicans and innkeepers and 25 for window cleaners, both being
significantly different from 100 (Donnan and Haskey 1977). We can
conclude that:
(a) publicans are more than 7 times as likely as the average
person to die from cirrhosis of the liver;
(b) the high SMR for publicans may be because they tend to be
found in the older age groups;
(c) being a publican causes cirrhosis of the liver;
(d) window cleaning protects men from cirrhosis of the liver;
(e) window cleaners are at high risk of cirrhosis of the liver.
View Answer
91. The age and sex structure of a population may be described by:
(a) a life table;
(b) a correlation coefficient;
(c) a standardized mortality ratio;
(d) a population pyramid;
(e) a bar chart.
View Answer
92. The following statistics are adjusted to allow for the age
distribution of the population:
(a) age-standardized mortality rate;
(b) fertility rate;
(c) perinatal mortality rate;
(d) crude mortality rate;
(e) expectation of life at birth.
View Answer
Age group (years) | Great Britain | Scotland
17
Multifactorial methods
Fig. 17.1. Muscle strength (MVC) against height
We can show the strengths of the linear relationships between all three
variables by their correlation matrix. This is a tabular display of the
correlation coefficients between each pair of variables, matrix being used
in its mathematical sense as a rectangular array of numbers. The
correlation matrix for the data of Table 17.1 is shown in Table 17.2. The
coefficients of the main diagonal are all 1.0, because they show the
correlation of the variable with itself, and the correlation matrix is
symmetrical about this diagonal. Because of this symmetry many
computer programs print only the part of the matrix below the diagonal.
Inspection of Table 17.2 shows that older men were shorter and weaker
than younger men, that taller men were stronger than shorter men, and
that the magnitudes of all three relationships were similar. Reference to
Table 11.2 with 41 - 2 = 39 degrees of freedom shows that all three
correlations are significant.
We could fit a regression line of the form MVC = a + b × age, from which
we could predict the mean MVC for any given age. However, MVC would
still vary with height. To investigate the effect of both age and height, we
can use multiple regression to fit a regression equation of the form
MVC = b0 + b1 × height + b2 × age
The coefficients are calculated by a least squares procedure, exactly the
same in principle as for simple regression. In practice, this is always done
using a computer program. For the data of Table 17.1, the multiple
regression equation is
MVC = -466 + 5.40 × height - 3.08 × age
From this, we would estimate the mean MVC of men with any given age
and height, in the population of which these are a sample.
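A minimal sketch in Python of fitting such an equation by least squares; the three arrays below are placeholders standing in for the 41 observations of Table 17.1, which are not reproduced here.

import numpy as np

# Least squares fit of MVC = b0 + b1*height + b2*age.
mvc    = np.array([430.0, 380.0, 470.0, 350.0, 400.0])   # outcome, placeholder values
height = np.array([169.0, 172.0, 180.0, 165.0, 175.0])   # predictor 1 (cm), placeholder values
age    = np.array([43.0, 50.0, 38.0, 60.0, 45.0])        # predictor 2 (years), placeholder values

# Design matrix with a column of ones for the intercept b0.
X = np.column_stack([np.ones_like(height), height, age])
coef, residuals, rank, sv = np.linalg.lstsq(X, mvc, rcond=None)
b0, b1, b2 = coef
print(b0, b1, b2)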
There are a number of assumptions implicit here. One is that the
relationship between MVC and height is the same at each age, that is,
that there is no interaction between height and age. Another is that the
relationship between MVC and height is linear, that is of the form MVC =
a + b × height. Multiple regression analysis enables us to test both of
these assumptions.
Multiple regression is not limited to two predictor variables. We can have
any number, although the more variables we have the more difficult it
becomes to interpret the regression. We must, however, have more
points than variables. The degrees of freedom for the residual
variance are n - 1 - q if q variables are fitted, and this should be large
enough for satisfactory estimation of confidence intervals and tests of
significance. This will become clear after the next section.
Source of variation | Degrees of freedom | Sum of squares | Mean square | Variance ratio (F)
Total | 19 | 9.438 68

Source of variation | Degrees of freedom | Sum of squares | Mean square | Variance ratio (F)
Total | 40 | 503 344
Note that the square root of the variance ratio is 3.03, the value of t found
in §11.5. The two tests are equivalent. Note also that the regression sum
of squares divided by the total sum of squares = 3.189 37/9.438 68 =
0.337 9 is the square of the correlation coefficient, r = 0.58 (§11.5,
§11.10). This ratio, sum of squares due to regression over total sum of
squares, is the proportion of the variability accounted for by the
regression. The percentage variability accounted for or explained by the
regression is 100 times this, i.e. 34%.
Returning to the MVC data, we can test the significance of the regression
of MVC on height and age together by analysis of variance. If we fit the
regression model in §17.1, the regression sum of squares has two
degrees of freedom, because we have fitted two regression coefficients.
The analysis of variance for the MVC regression is shown in Table 17.4.
The regression is significant; it is unlikely that this association could have
arisen by chance if the null hypothesis were true. The proportion of
variability accounted for, denoted by R2, is 131 495/503 344 = 0.26. The
square root of this is called the multiple correlation coefficient, R. R2 must
lie between 0 and 1, and as no meaning can be given to the direction of
correlation in the multivariate case, R is also taken as positive. The larger
R is, the more closely correlated with the outcome variable the set of
predictor variables are. When R = 1 the variables are perfectly correlated
in the sense that the outcome variable is a linear combination of the
others. When the outcome variable is not linearly related to any of the
predictor variables, R will be small, but not zero.
We may wish to know whether both or only one of our variables leads to
the association. To do this, we can calculate a standard error for each
regression coefficient (Table 17.5). This will be done automatically by the
regression program. We can use this to test each coefficient separately
by a t test. We can
also find a confidence interval for each, using t standard errors on either
side of the estimate. For the example, both age and height have P = 0.04
and we can conclude that both age and height are independently
associated with MVC.
Predictor variable | Coefficient | Standard error | t ratio | P

Source of variation | Degrees of freedom | Sum of squares | Mean square | Variance ratio (F)
Height alone | 1 | 88 511 | 88 511 | 9.05
Age given height | 1 | 42 984 | 42 984 | 4.39
be exactly zero, but only within the limits of random variation. We can fit
such a model just as we fitted the first one. We get
Source of variation | Degrees of freedom | Sum of squares | Mean square | Variance ratio (F)
Height × age | 1 | 71 224 | 71 224 | 8.77
Total | 40 | 503 344
Source of variation | Degrees of freedom | Sum of squares | Mean square | Variance ratio (F)
Linear | 1 | 88 522 | 88 522 | 7.03
Residual | 38 | 414 241 | 12 584
Total | 40 | 503 344
Source of variation | Degrees of freedom | Sum of squares | Mean square | Variance ratio (F)
Residual | 55 | 1 510.024 | 27.455
Total | 58 | 1 559.035

Source of variation | Degrees of freedom | Sum of squares | Mean square | Variance ratio (F)
Residual | 54 | 1 424.155 | 26.373
Total | 58 | 1 559.035
[Rows from the TNF data table, in which some of the measured TNF values are negative.]
This does not mean that the cells were sucking TNF in from their
environment; it was an artifact of the assay method and represents
measurement error.
The subject means are shown in Figure 17.5(a). This suggests several
things: there is a strong donor effect (donor 6 is always high, donor 3 is
always low, for example), MTB and FAT each increase TNF, both
together have a greater effect than either individually, the distribution of
TNF is highly skew, the variance of TNF varies greatly from group to
group, and increases with the mean. As the mean for MTB and FAT
combined is much greater than the sum of their
individual means, the researcher thought there was synergy, i.e. that
MTB and FAT worked together, the presence of one enhancing the effect
of the other. She was seeking statistical support for this conclusion (Jan
Davies, personal communication).
The lack of interaction between the effects shows that the data are consistent
with this model, this view of what is happening. The lack of interaction
can be seen quite clearly in Figure 17.5(b), as the mean for MTB and FAT
looks very similar to the sum of the means for MTB alone and FAT alone.
Table 17.13. Estimated effects on TNF of MTB, FAT and their interaction
Effect (log scale) | 95% Confidence interval | Ratio effect (natural scale) | 95% Confidence interval
The logit can take any value from minus infinity, when p = 0, to plus
infinity, when p = 1. We can fit regression models to the logit which are
very similar to the ordinary multiple regression and analysis of variance
models found for data from a Normal distribution. We assume that
relationships are linear on the logistic scale:
logit(p) = log(p/(1 - p)) = b0 + b1x1 + b2x2 + … + bpxp
Coef. | Std. Err. | z | P | 95% Confidence interval
When giving birth, women who have had a previous caesarian section
usually have a trial of scar, that is, they attempt a natural labour with
vaginal delivery and only have another caesarian if this is deemed
necessary. Several factors may increase the risk of a caesarian, and in
this study the factor of interest was obesity, as measured by the body
mass index or BMI, defined as weight/height2. The distribution of BMI is
shown in Figure 17.6 (data of Andreas Papadopoulos). For caesarians
the mean BMI was 26.4 kg/m2 and for vaginal deliveries the mean was
24.9 kg/m2. Two other variables had a strong relationship with a
subsequent caesarian. Women who had had a previous vaginal delivery
(PVD) were less likely to need a caesarian, odds ratio = 0.18, 95%
confidence interval 0.10 to 0.32. Women whose labour was induced had
an increased risk of a caesarian, odds ratio = 2.11, 95% confidence
interval 1.44 to 3.08. All these relationships were highly significant. The
question to be answered was whether the relationship between BMI and
caesarian section remained when the effects of induction and previous
deliveries were allowed for.
The results of the logistic regression are shown in Table 17.14. We have
the coefficients for the equation predicting the log odds of a caesarian:
log(o) = -3.700 0 + 0.088 3 × BMI + 0.647 1 × induction - 1.796 3 × PVD
where induction and PVD are 1 if present, 0 if not. Thus for a woman who
had BMI = 25 kg/m2, had not been induced and had had a previous vaginal
delivery, the log odds of a caesarian would be -3.700 0 + 0.088 3 × 25 - 1.796 3 = -3.29,
giving odds of 0.037 and a probability of about 0.04.
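A minimal sketch in Python of turning this fitted equation into a predicted probability; the coefficients are those of Table 17.14 as quoted above.

from math import exp

# Predicted probability of a caesarian from the fitted logistic regression.
def prob_caesarian(bmi, induced, pvd):
    log_odds = -3.7000 + 0.0883 * bmi + 0.6471 * induced - 1.7963 * pvd
    odds = exp(log_odds)
    return odds / (1 + odds)

# BMI 25 kg/m2, not induced, previous vaginal delivery:
print(round(prob_caesarian(25, 0, 1), 3))   # about 0.04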
Odds ratio | P | 95% Confidence interval
h(t) = h0(t) × exp(b1x1 + b2x2 + … + bpxp)
where x1,…, xp are the predictor variables and b1,…, bp are the
coefficients which we estimate from the data. This is Cox's proportional
hazards model. Cox regression enables us to estimate the values of b1,
…, bp which best predict the observed survival. There is no constant term
b0, its place being taken by the baseline hazard function h0(t).
Table 15.7 shows the time to recurrence of gallstones, or the time for
which patients are known to have been gallstone-free, following
dissolution by bile acid treatment or lithotripsy, with the number of
previous gallstones, their maximum diameter, and the time required for
their dissolution. The difference between patients with a single and with
multiple previous gallstones was tested using the logrank test (§15.6).
Cox regression enables us to look at continuous predictor variables, such
as diameter of gallstone, and to examine several predictor variables at
once. Table 17.16 shows the result of the Cox regression. We can carry
out an approximate test of significance by dividing the coefficient by its
standard error; if the null hypothesis that the coefficient would be
zero in the population is true, this ratio follows a Standard Normal distribution.
The chi-squared statistic tests the relationship between the time to
recurrence and the three variables together. The maximum diameter has
no significant relationship to time to recurrence, so we can try a model
without it (Table 17.17). As the change in overall chi-squared shows,
removing diameter has had very little effect.
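For readers who want to reproduce such an analysis, here is a minimal sketch using the Python package lifelines (an assumption about available software, not something used in this book); the data frame is a placeholder standing in for the gallstone data of Table 15.7.

import pandas as pd
from lifelines import CoxPHFitter   # third-party survival analysis package

# Cox proportional hazards regression of time to recurrence (months) on
# two predictor variables; 1 = recurrence observed, 0 = censored.
df = pd.DataFrame({
    "time":       [6, 8, 12, 16, 20, 24, 30, 36, 42, 48],
    "recurrence": [1, 1, 0, 1, 0, 1, 0, 1, 0, 0],
    "multiple":   [1, 1, 0, 1, 0, 0, 1, 1, 0, 0],
    "diameter":   [10, 6, 13, 8, 15, 12, 9, 7, 20, 14],
})

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="recurrence")
cph.print_summary()   # coefficients are log hazard ratios; exp(coef) gives the hazard ratio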
The coefficients in Table 17.17 are the log hazard ratios. The coefficient
for
Variable | Coef. | Std. Err. | z | P | 95% Conf. interval

Variable | Coef. | Std. Err. | z | P | 95% Conf. interval
studies. A simple literature search is not enough. Not all studies which
have been started are published; studies which produce significant
differences are more likely to be published than those which do not (e.g.
Pocock and Hughes 1990; Easterbrook et al. 1991). Within a study,
results which are significant may be emphasized and parts of the data
which produce no differences may be ignored by the investigators as
uninteresting. Publication of unfavourable results may be discouraged by
the sponsors of research. Researchers who are not native English
speakers may feel that publication in the English language literature is
more prestigious as it will reach a wider audience, and so try there first,
only publishing in their own language if they cannot publish in English.
The English language literature may thus contain more positive results
than do other literatures. The phenomenon by which significant and
positive results are more likely to be reported, and reported more
prominently, than non-significant and negative ones is called publication
bias. Thus we must not only trawl the published literature for studies, but
also use personal knowledge, our own and that of others, to locate all the
unpublished studies. Only then should we carry out the meta-analysis.
When we have all the studies which meet the definition, we combine
them to get a common estimate of the effect of the treatment or risk
factor. We regard the studies as providing several observations of the
same population value. There are two stages in meta-analysis. First we
check that the studies do provide estimates of the same thing. Second,
we calculate the common estimate and its confidence interval. To do this
we may have the original data from all the studies, which we can
combine into one large data file with study as one of the variables, or we
may only have summary statistics obtained from publications.
If the outcome measure is continuous, such as mean fall in blood
pressure, we can check that subjects are from the same population by
analysis of variance, with treatment or risk factor, study, and interaction
between them in the model. Multiple regression can also be used,
remembering that study is a categorical variable and dummy variables
are required. We test the treatment times study interaction in the usual
way. If the interaction is significant this indicates that the treatment effect
is not the same in all studies, and so we cannot combine the studies. It is
the interaction which is important. It does not matter much if the mean
blood pressure varies from study to study. What matters is whether the
effect of the treatment on blood pressure varies more than we would
expect. We may want to examine the studies to see whether any
characteristic of the studies explains this variation. This might be a
feature of the subjects, the treatment or the data collection. If there is no
interaction, then the data are consistent with the treatment or risk factor
effect being constant. This is called a fixed effects model (see §10.12).
We can drop the interaction term from the model and the treatment or
risk factor effect is then the estimate we want. Its standard error and
confidence interval are found as described in §17.2. If there is an
interaction, we cannot estimate a single treatment effect. We can think of
the studies as a random sample of the possible trials and estimate the
mean treatment effect for this population. This is called the random
effects model (§10.12). The
confidence interval is usually much wider than that found using the fixed
effect model.
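One common way to combine the study estimates under a fixed effects model is inverse variance weighting of the log odds ratios, sketched below in Python; the three (log odds ratio, standard error) pairs are placeholders, not the data of any of the trials discussed here.

from math import log, exp, sqrt

# Fixed effects (inverse variance) pooling of log odds ratios.
studies = [
    (log(0.70), 0.15),   # (log odds ratio, standard error) for study 1
    (log(0.85), 0.20),   # study 2
    (log(0.60), 0.30),   # study 3
]

weights = [1 / se ** 2 for _, se in studies]
pooled = sum(w * lo for (lo, _), w in zip(studies, weights)) / sum(weights)
pooled_se = sqrt(1 / sum(weights))

lower, upper = exp(pooled - 1.96 * pooled_se), exp(pooled + 1.96 * pooled_se)
print(round(exp(pooled), 2), round(lower, 2), round(upper, 2))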
Study | Dose regime | Vitamin A: Deaths, Number | Controls: Deaths, Number
The odds ratios and their confidence intervals are shown in Figure 17.7.
The confidence interval is indicated by a line, the point estimate of the
odds ratio by a circle. In this picture the most important trial appears to be
study 2, with the widest confidence interval. In fact, it is the study with the
least effect on the whole estimate, because it is the study where the odds
ratio is least well estimated. In the second picture, the odds ratio is
indicated by the middle of a square. The area of the square is
proportional to the number of subjects in the study. This now makes
study 2 appear relatively unimportant, and makes the overall estimate
stand out.
There are many variants on this style of graph, which is sometimes called
a forest diagram. The graph is often shown with the studies on the
vertical axis
and the odds ratio or difference in mean on the horizontal axis (Figure
17.8). The combined estimate of the effect may be shown as a lozenge
or diamond shape and for odds ratios a logarithmic scale is often
employed, as in Figure 17.8.
Source of variation | Degrees of freedom | Sum of squares | Mean square | Variance ratio (F)
Total | 37 | 603.586
Variable | Coef. | Std. Err. | z = coef/se | P | 95% Conf. interval
95. Table 17.21 shows the logistic regression of vein graft failure on
some potential explanatory variables. From this analysis:
(a) patients with high white cell counts were more likely to have
graft failure;
(b) the log odds of graft failure for a diabetic is between 0.389 less
and 2.435 greater than that for a non-diabetic;
(c) grafts were more likely to fail in female subjects, though this is
not significant;
(d) there were four types of graft;
(e) any relationship between white cell count and graft failure may
be due to smokers having higher white cell counts.
View Answer
Fig. 17.9. Oral and forehead temperature measurements made in a
group of pyrexic patients
Variable | Coef. | Std. err. | coef/se | P
2. Figure 17.11 shows residual plots for the analysis of Table 17.24.
Are there any features of the data which might make the analysis
invalid?
View Answer
Source of variation | Degrees of freedom | Sum of squares | Mean square | Variance ratio (F)
Total | 39 | 328.976

Source of variation | Degrees of freedom | Sum of squares | Mean square | Variance ratio (F)
Total | 39 | 328.976
18
Determination of sample size
The 95% confidence limits would be, roughly, p ± 0.009. For example, if
the estimate were 0.02, the 95% confidence limits would be 0.011 to
0.029. If this accuracy were sufficient we could proceed.
These estimates of sample size are based on the assumption that the
sample is large enough to use the Normal distribution. If a very small
sample is indicated it will be inadequate and other methods must be used
which are beyond the scope of this book.
Let us assume that we are trying to detect a difference such that d will be
greater than 0. The first alternative is then extremely unlikely and can be
ignored. Thus we must have, for a significant difference: d/SE(d) > uα so
d > uαSE(d). The critical value which d must exceed is uαSE(d).
Now, d is a random variable, and for some samples it will be greater than
its mean, µd, for some it will be less than its mean. d is an observation
from a Normal distribution with mean µd and variance SE(d)2. We want d
to exceed the critical value with probability P, the chosen power of the
test. The value of the Standard Normal distribution which is exceeded
with probability P is -u2(1-P) (see Figure 18.1). (1 - P) is often represented
as β (beta). This is the probability of failing to obtain a significant
difference when the null hypothesis is false and the population difference
is µd. It is the probability of a Type II error (§9.4). The value which d
exceeds with probability P is therefore the mean minus u2(1-P) standard
deviations: µd - u2(1-P)SE(d). Hence for significance this must exceed the
critical value, uαSE(d). This gives
µd - u2(1-P)SE(d) = uαSE(d)
Putting the correct standard error formula into this will yield the required
sample size. We can rearrange it as
µd² = (uα + u2(1-P))² SE(d)²
This is the condition which must be met if we are to have a probability P of
obtaining a significant difference.
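For the comparison of two means with equal groups of size n and standard deviation σ, SE(d)² = 2σ²/n, and the condition can be rearranged to give n directly. A minimal sketch in Python, assuming α = 0.05 (uα = 1.96) and power P = 0.90 (u2(1-P) = 1.28):

from math import ceil

# Sample size per group to detect a difference mu_d between two means,
# from mu_d**2 = (u_alpha + u_beta)**2 * 2 * sigma**2 / n.
def n_per_group(difference, sd, u_alpha=1.96, u_beta=1.28):
    return ceil((u_alpha + u_beta) ** 2 * 2 * sd ** 2 / difference ** 2)

# To detect a difference of half a standard deviation:
print(n_per_group(difference=0.5, sd=1.0))   # 84 per group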
10 3.33
20 2.36
50 1.49
100 1.05
200 0.75
500 0.47
1000 0.33
p2 | n
0.90 | 39
0.80 | 105
0.70 | 473
0.65 | 1964

Table 18.6. n2 for different values of n1 (columns) and p2 (rows)
n1: 50 | 100 | 200 | 500 | 1000 | 2000
p2 = 0.06: . | . | . | . | . | .
p2 = 0.07: . | . | . | . | . | 4500
p2 = 0.20: 134 | 96 | 84 | 78 | 76 | 76
Table 18.6 shows n2 for different n1 and p2. For some values of n1 we get
a negative value of n2. This means that no value of n2 is large enough. It
is clear
that when the proportions themselves are small, the detection of small
differences requires very large samples indeed.
and we can estimate n, ρ or P given the other two. Table 18.7 shows the
sample size required to detect a correlation coefficient with a power of P
= 0.9 and a significance level α = 0.05.
where n1 and n2 are the numbers of clusters in the two groups. For most
trials n1 = n2 = n, so
Hence, using the general method of §18.3, we can calculate the required
number of clusters by
19
Solutions to exercises
Some of the multiple choice questions are quite hard. If you score +1 for
a correct answer, -1 for an incorrect answer, and 0 for a part which you
omitted, I would regard 40% as the pass level, 50% as good, 60% as
very good, and 70% as excellent. These questions are hard to set and
some may be ambiguous, so you will not score 100%.
Solution to Exercise 2E
1. It was hoped that women in the KYM group would be more satisfied
with their care. The knowledge that they would receive continuity of care
was an important part of the treatment, and so the lack of blindness is
essential. More difficult is that KYM women were given a choice and so
may have felt more committed to whichever scheme, KYM or standard,
they had chosen, than did the control group. We must accept this
element of patient control as part of the treatment.
2. The study should be (and was) analysed by intention to treat (§2.5). As
often happens, the refusers did worse than did the acceptors of KYM,
and worse than
the control group. When we compare all those allocated to KYM with
those allocated to control, there is very little difference (Table 19.1).
Method of delivery | Allocated to KYM: % (n) | Allocated to control: % (n)
Solution to Exercise 3E
1. Many cases of infection may be unreported, but there is not much that
could be done about that. Many organisms produce similar symptoms,
hence the
Solution to Exercise 4E
1. The stem and leaf plot is shown in Figure 19.1:
2. Minimum = 2.2, maximum = 6.0. The median is the average of the 20th
and 21st ordered observations, since the number of observations is even.
These are both 4.0, so the median is 4.0. The first quartile is between the
10th and 11th, which are both 3.6. The third quartile is between the 30th
and 31st observations, which are 4.5 and 4.6. We have q = 0.75, i = 0.75
× 41 = 30.75, and the quartile is given by 4.5 + (4.6 - 4.5) × 0.75 = 4.575
(§4.5). The box and whisker plot is shown in Figure 19.2.
Fig. 19.3. Histogram of blood glucose
3. The frequency distribution is derived easily from the stem and leaf plot:
Interval Frequency
2.0–2.4 1
2.5–2.9 1
3.0–3.4 6
3.5–3.9 10
4.0–4.4 11
4.5–4.9 8
5.0–5.4 2
5.5–5.9 0
6.0–6.4 1
Total 40
4. The histogram is shown in Figure 19.3. The distribution is symmetrical.
5. The mean is given by x̄ = ∑xi/n = 162.2/40 = 4.055.
6. As before, the sum is ∑xi = 162.2. The sum of squares about the mean
is then given by ∑xi² - (∑xi)²/n.
9. For the limits, x̄ - 2s = 4.055 - 2 × 0.698 = 2.659, x̄ - s = 4.055 - 0.698 = 3.357,
x̄ = 4.055, x̄ + s = 4.055 + 0.698 = 4.753, and x̄ + 2s = 4.055 + 2 × 0.698 = 5.451.
Figure 19.3 shows the mean and standard deviation marked on the histogram.
The majority of points fall within one standard deviation of the mean and
nearly all within two standard deviations of the mean. Because the distribution
is symmetrical, it extends just beyond the x̄ ± 2s points on either side.
Category | Frequency | Relative frequency | Angle (degrees)
Alcoholism | 57 | 0.038 85 | 14
Solution to Exercise 5E
1. This is the frequency distribution of a qualitative variable, so a pie chart
can be used to display it. The calculations are set out in Table 19.2.
Notice that we have lost one degree through rounding errors. We could
work to fractions of a degree, but the eye is unlikely to spot the
difference. The pie chart is shown in Figure 19.5.
2. See Figure 19.6.
3. There are several possibilities. In the original paper, Doll and Hill used
a separate bar chart for each disease, similar to Figure 19.7.
4. Line graphs can be used here, as we have simple time series (Figure
19.8). For an explanation of the difference between years, see §13E.
28. TTTTF. The probability of clinical disease is 0.5 × 0.5 = 0.25. The
probability of carrier status = probability that father passes the gene and
mother does not + probability that mother passes the gene and father
does not = 0.5 × 0.5 + 0.5 ×0.5 = 0.5. Probability of not inheriting the
gene = 0.5 × 0.5 = 0.25. Probability of not having clinical disease = 1 -
0.25 = 0.75. Successive children are independent, so the probabilities for
the second child are unaffected by the first (§6.2).
29. FTTFT. §6.3,4. The expected number is one (§6.6). The spins are
independent (§6.2). At least one tail means one tail (PROB = 0.5) or two
tails (PROB = 0.25). These are mutually exclusive, so the probability of at
least one tail is 0.5 + 0.25 = 0.75.
Table 19.3. Probability of surviving to different
ages
Survive to age | Probability | Survive to age | Probability
10 0.959 60 0.758
20 0.952 70 0.524
30 0.938 80 0.211
40 0.920 90 0.022
Solution to Exercise 6E
1. Probability of survival to age 10. This illustrates the frequency
definition of probability. 959 out of 1000 survive, so the probability is
959/1000 = 0.959.
2. Survival and death are mutually exclusive, exhaustive events, so
PROB(survives) + PROB(dies) = 1. Hence PROB(dies) = 1 - 0.959 =
0.041.
3. These are the number surviving divided by 1000 (Table 19.3). The
events are not mutually exclusive, e.g. a man cannot survive to age 20 if
he does not survive to age 10. This does not form a probability
distribution.
4. The expected age at death is found by multiplying the mid-point of each ten
year age interval by the probability of dying in that interval (obtained by
subtracting successive survival probabilities) and adding:
5 × 0.041 = 0.205
15 × 0.007 = 0.105
25 × 0.014 = 0.350
35 × 0.018 = 0.630
45 × 0.044 = 1.980
55 × 0.118 = 6.490
65 × 0.234 = 15.210
75 × 0.313 = 23.475
85 × 0.189 = 16.065
95 × 0.022 = 2.090
Total 66.600
Fig. 19.10. Histogram of the blood glucose data with the
corresponding Normal distribution curve, and Normal plot
35. FTTFF. §4.6, §7.3. The sample size should not affect the mean. The
relative sizes of mean, median and standard deviation depend on the
shape of the frequency distribution.
36. TFTTF. §7.2, §7.3. Adding, subtracting or multiplying by a constant,
or adding or subtracting an independent Normal variable gives a Normal
distribution. X2 follows a very skew Chi-squared distribution with one
degree of freedom and X/Y follows a t distribution with one degree of
freedom (§7A).
37. TTTTT. A gentle slope indicates that observations are far apart, a
steep slope that there are many observations close together. Hence
gentle-steep-gentle (‘S’ shaped) indicates long tails (§7.5).
Solution to Exercise 7E
1. The box and whisker plot shows a very slight degree of skewness, the
lower whisker being shorter than the upper and the lower half of the box
smaller than the upper. From the histogram it appears that the tails are a
little longer than the Normal curve of Figure 7.10 would suggest. Figure
19.10 shows the Normal distribution with the same mean and variance
superimposed on the histogram, which also indicates this.
2. We have n = 40. For i = 1 to 40 we want to calculate (i - 0.5)/n = (2i -
1)/2n. This gives us a probability. We use Table 7.1 to find the value of
the Normal distribution corresponding to this probability. For example, for
i = 1 we have (2 × 1 - 1)/(2 × 40) = 0.0125, and the corresponding value of
the Normal distribution is -2.24.
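A minimal sketch in Python of these Normal plot points, using scipy for the Normal quantiles in place of Table 7.1:

from scipy.stats import norm

n = 40
for i in range(1, 4):                  # first three points only
    p = (2 * i - 1) / (2 * n)          # (i - 0.5)/n
    print(i, p, round(norm.ppf(p), 2))
# i = 1 gives p = 0.0125 and a Normal quantile of -2.24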
39. FTFTF. §8.3. The sample mean is always in the middle of the limits.
Solution to Exercise 8E
1. The interval will be 1.96 standard deviations less than and greater than
the mean. The lower limit is 0.810 - 1.96 × 0.057 = 0.698 mmol/litre. The
upper limit is 0.810 + 1.96 × 0.057 = 0.922 mmol/litre.
2. For the diabetics, the mean is 0.719 and the standard deviation 0.068,
so the lower limit of 0.698 will be (0.698 - 0.719)/0.068 = -0.309 standard
deviations from the mean. From Table 7.1, the probability of being below
this is 0.38, so the probability of being above is 1 - 0.38 = 0.62. Thus the
probability that an insulin-dependent diabetic would be within the
reference interval would be 0.62 or 62%. This is the proportion we
require.
4. The 95% confidence interval is the mean ± 1.96 standard errors. For
the controls, 0.810 - 1.96 × 0.00482 to 0.810 + 1.96 × 0.00482 gives us
0.801 to 0.819 mmol/litre. This is much narrower than the interval of part
1. This is because the confidence interval tells us how far the sample
mean might be from the population mean. The 95% reference interval
tells us how far an individual observation might be from the population
mean.
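The two intervals can be checked with a few lines of Python, using the control mean, standard deviation, and standard error quoted above:

# Reference interval (mean +/- 1.96 SD) versus confidence interval for the mean (mean +/- 1.96 SE).
mean, sd, se = 0.810, 0.057, 0.00482

reference_interval  = (mean - 1.96 * sd, mean + 1.96 * sd)   # about 0.698 to 0.922 mmol/litre
confidence_interval = (mean - 1.96 * se, mean + 1.96 * se)   # about 0.801 to 0.819 mmol/litre
print(reference_interval, confidence_interval)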
5. The groups are independent, so the standard error of the difference
between means is given by the square root of the sum of the squares of the
two standard errors.
46. TTFTT. §9.2. It is quite possible for either to be higher and deviations
in either direction are important (§9.5). n = 16 because the subject giving
the same reading on both gives no information about the difference and
is excluded from the test. The order should be random, as in a cross-over
trial (§2.6).
47. FFFFT. The trial is small and the difference may be due to chance,
but there may also be a large treatment effect. We must do a bigger trial
to increase the power (§9.9). Adding cases would completely invalidate
the test. If the null hypothesis is true, the test will give a ‘significant’ result
one in 20 times. If we keep adding cases and doing many tests we have
a very high chance of getting a ‘significant’ result on one of them, even
though there is no treatment effect (§9.10).
48. TFTTF. Large sample methods depend on estimates of variance
obtained
from the data. This estimate gets closer to the population value as the
sample size increases (§9.7, §9.8). The chance of an error of the first
kind is the significance level set in advance, say 5%. The larger the
sample the more likely we are to detect a difference should one exist
(§9.9). The null hypothesis depends on the phenomena we are
investigating, not on the sample size.
49. FTFFT. We cannot conclude causation in an observational study
(§3.6,7,8), but we can conclude that there is evidence of a difference
(§9.6). 0.001 is the probability of getting so large a difference if the null
hypothesis were true (§9.3).
Solution to Exercise 9E
1. Both control groups are drawn from populations which were easy to
get to, one being hospital patients without gastro-intestinal symptoms, the
other being fracture patients and their relatives. Both are matched for age
and sex; Mayberry et al. (1978) also matched for social class and marital
status. Apart from the matching factors, we have no way of knowing
whether cases and controls are comparable, or any way of knowing
whether controls are representative of the general population. This is
usual in case control studies and is a major problem with this design.
2. There are two obvious sources of bias: interviews were not blind and
information is being recalled by the subject. The latter is particularly a
problem for data about the past. In James' study subjects were asked
what they used to eat several years in the past. For the cases this was
before a definite event, onset of Crohn's disease, for the controls it was
not, the time being time of onset of the disease in the matched case.
3. The question in James' study was ‘what did you eat in the past?’,
that in Mayberry et al. (1978) was ‘what do you eat now?’
4. Of the 100 patients with Crohn's disease, 29 were current eaters of
cornflakes. Of 29 cases who knew of the cornflakes association, 12 were
ex-eaters of cornflakes, and among the other 71 cases 21 were ex-eaters
of cornflakes, giving a total of 33 past but not present eaters of
cornflakes. Combining these with the 29 current consumers, we get 62
cases who had at some time been regular eaters of cornflakes. If we
carry out the same calculation for the controls, we obtain 3 + 10 = 13 past
eaters and with 22 current eaters this gives 35 sometime regular
cornflakes eaters. Cases were more likely than controls to have eaten
cornflakes regularly at some time, the proportion of cases reporting
having eaten cornflakes being almost twice as great as for controls.
Compare this to James' data, where 17/68 = 25% of controls and 23/34 =
68% of cases, 2.7 times as many, had eaten cornflakes regularly. The
results are similar.
5. The relationship between Crohn's disease and reported consumption
of cornflakes had a much smaller probability for the significance test and
hence stronger evidence that a relationship existed. Also, only one case
had never eaten cornflakes (it was also the most popular cereal among
controls).
6. Of the Crohn's cases, 67.6% (i.e. 23/34) reported having eaten
cornflakes regularly compared to 25.0% of controls. Thus cases were
67.6/25.0 = 2.7 times
56. FTTFT. §10.9. Sums of squares and degrees of freedom add up,
mean squares do not. Three groups gives two degrees of freedom. We
can have any sizes of groups.
For females,
For males,
2. For the standard error, we first need the variances about the line:
For females:
For males:
As for the sign test, the zero is omitted. Sum of ranks for positive
differences is T = 3.5 + 5 + 9 + 12 = 29.5. From Table 12.5 the 5% point
for n = 15 is 25, which T exceeds, so the difference is not significant at
the 5% level. The three tests give similar answers.
3. Using the log transformed differences in Table 19.7, we still have 4
positives, 11 negatives and 1 zero, with a sign test probability of 0.11848.
The transformation does not alter the direction of the changes and so
does not affect the sign test.
4. For the Wilcoxon matched pairs test on the log compliance:
Year | Before heatwave | During heatwave | After heatwave | Total
2. There were 178 admissions during the heatwave in 1983 and 110 in
the corresponding weeks of 1982. We could test the null hypothesis that
these came from distributions with the same admission rate and we
would get a significant difference. This would not be convincing, however.
It could be due to other factors, such as the closure of another hospital
with resulting changes in catchment area.
3. The cross-tabulation is shown in Table 19.8.
4. The null hypothesis is that there is no association between year and
period, in other words that the distribution of admissions between the
periods will be the same for each year. The expected values are shown in
Table 19.9.
5. The chi-squared statistic is given by summing (observed - expected)²/expected over the six cells of the table.
7. We want to test for the relationship between two variables, which are
both presented as categorical (Table 14.3). We use a chi-squared test for
a contingency table, χ2 = 38.1, d.f. = 6, P < 0.001. One possibility is that
some other variable, such as the mother's smoking or poverty, is related
to both maternal age and asthma. Another is that there is a cohort effect.
All the age 14–19 mothers were born during the second world war, and
some common historical experience may have produced the asthma in
their children.
8. The serial measurements of thyroid hormone could be summarized
using the area under the curve (§10.7). The oxygen dependence is tricky.
The babies who died had the worst outcome, but if we took their survival
time as the time they were oxygen dependent, we would be treating them
as if they had a good outcome. We must also allow for the babies who
went home on oxygen having a long but unknown oxygen dependence.
My solution was to assign an arbitrary large number of days, larger than
any for the babies sent home without oxygen, to the babies sent home on
oxygen. I assigned an even larger number of days to
the babies who died. I then used Kendall's tau b (§12.5) to assess the
relationship with thyroid hormone AUC. Kendall's rank correlation was
chosen in preference to Spearman's because of the large number of ties
which the arbitrary assignment of large numbers produced.
9. This is a comparison of two independent samples, so we use Table
14.1. The variable is interval and the samples are small. We could either
use the two sample t method (10.3) or the Mann–Whitney U test (§12.2).
The groups have similar variances, but the distribution shows a slight
negative skewness. As the two sample t method is fairly robust to
deviations from the Normal distribution and as I wanted a confidence
interval for the difference I chose this option. I did not think that the slight
skewness was sufficient to cause any problems.
By the two sample t method we get the difference between the means,
immobile - mobile, to be 7.06, standard error = 5.74, t = 1.23, P = 0.23,
95% confidence interval = -4.54 to 18.66 hours. By the Mann-Whitney,
we get U = 178.5, z = -1.06, P = 0.29. The two methods give very similar
results and lead to similar conclusions, as we expect them to do when
both methods are valid.
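Both analyses are easy to run in Python with scipy; the two lists below are placeholders, not the immobile and mobile hours of the exercise data.

from scipy import stats

# Two independent samples compared by the two sample t method and the Mann-Whitney U test.
immobile = [30.0, 25.5, 41.0, 18.0, 36.5, 29.0]
mobile   = [22.0, 19.5, 27.0, 31.0, 16.5, 24.0]

t, p_t = stats.ttest_ind(immobile, mobile)       # equal variances assumed by default
u, p_u = stats.mannwhitneyu(immobile, mobile)
print(round(t, 2), round(p_t, 2), u, round(p_u, 2))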
For the 95% confidence interval we take 1.96 standard errors on either
side of the limit, 1.96 × 0.0083439 = 0.016. The 95% confidence interval
for the lower reference limit is 0.696 - 0.016 to 0.696 + 0.016 = 0.680 to
0.712 or 0.68 to 0.71 mmol/litre. The confidence interval for the upper
limit is 0.924 - 0.016 to 0.696 + 0.016 = 0.908 to 0.940 or 0.91 to 0.94
mmol/litre. The reference interval is well estimated as far as sampling
errors are concerned.
6. Plasma magnesium did indeed increase with age. The variability did
not. This would mean that for older people the lower limit would be too
low and the upper limit too high, as the few above this would all be
elderly. We could simply estimate the reference interval separately at
different ages. We could do this using separate means but a common
estimate of variance, obtained by one-way analysis of variance (§10.9).
Or we could use the regression of magnesium on
age to get a formula which would predict the reference interval for any
age. The method chosen would depend on the nature of the relationship.
Age group | Great Britain ASMRs: per million per year, per thousand per 13 years | Scotland population (thousands) | Scotland expected deaths
References
Altman, D.G. (1982). Statistics and ethics in medical research. In
Statistics in Practice (ed. S.M. Gore and D.G. Altman). British Medical
Association, London.
Anderson, H.R., Bland, J.M., Patel, S., and Peckham, C. (1986). The
natural history of asthma in childhood. Journal of Epidemiology and
Community Health, 40, 121–9.
Anon (1997). All trials must have informed consent. British Medical
Journal, 314, 1134–5.
Appleby, L. (1991). Suicide during pregnancy and in the first postnatal
year. British Medical Journal, 302, 137–40.
Balfour, R.P. (1991). Birds, milk and campylobacter. Lancet, 337, 176.
Ballard, R.A., Ballard, P.C., Creasy, R.K., Padbury, J., Polk, D.H.,
Bracken, M., Maya, F.R., and Gross, I. (1992). Respiratory disease in
very-low-birthweight infants after prenatal thyrotropin releasing
hormone and glucocorticoid. Lancet, 339, 510–5.
Banks, M.H., Bewley, B.R., Bland, J.M., Dean, J.R., and Pollard, V.M.
(1978). A long term study of smoking by secondary schoolchildren.
Archives of Disease in Childhood, 53, 12–19.
Bewley, T.H., Bland, J.M., Ilo, M., Walch, E., and Willington, G. (1975).
Census of mental hospital patients and life expectancy of those
unlikely to be discharged. British Medical Journal, 4, 671–5.
Bewley, T.H., Bland, J.M., Mechen, D., and Walch, E. (1981). ‘New
chronic’ patients. British Medical Journal, 283, 1161–4.
Bland, J.M. and Altman, D.G. (1998). Statistics Notes. Bayesians and
frequentists. British Medical Journal, 317, 1151.
Bland, J.M., Bewley, B.R., Banks, M.H., and Pollard, V.M. (1975).
Schoolchildren's beliefs about smoking and disease. Health
Education Journal, 34, 71–8.
Bland, J.M., Bewley, B.R., Pollard, V., and Banks, M.H. (1978). Effect
of children's and parents' smoking on respiratory symptoms. Archives
of Disease in Childhood, 53, 100–5.
Bland, J.M., Bewley, B.R., and Banks, M.H. (1979). Cigarette smoking
and children's respiratory symptoms: validity of questionnaire method.
Revue d'Epidemiologie et Santé Publique, 27, 69–76.
Bland, J.M., Mutoka, C., and Hutt, M.S.R. (1977). Kaposi's sarcoma in
Tanzania. East African Journal of Medical Research, 4, 47–53.
Bland, J.M. and Peacock, J.L. (2000). Statistical Questions in
Evidence-Based Medicine, University Press, Oxford.
Brooke, O.G., Anderson, H.R., Bland, J.M., Peacock, J., and Stewart,
M. (1989). The influence on birthweight of smoking, alcohol, caffeine,
psychosocial and socio-economic factors. British Medical Journal,
298, 795–801.
Burr, M.L., St. Leger, A.S., and Neale, E. (1976). Anti-mite measures
in mite-sensitive adult asthma: a controlled trial. Lancet, i, 333–5.
Cook, R.J. and Sackett, D.L. (1995). The number needed to treat: a
clinically useful measure of treatment effect. British Medical Journal,
310, 452–4.
Cox, D.R. (1972). Regression models and life tables. Journal of the
Royal Statistical Society Series B, 34, 187–220.
Curtis, M.J., Bland, J.M., and Ring, P.A. (1992). The Ring total knee
replacement—a comparison of survivorship. Journal of the Royal
Society of Medicine, 85, 208–10.
Dennis, M., O'Rourke, S., Slattery, J., Staniforth, T., and Warlow, C.
(1997). Evaluation of a stroke family care worker: results of a
randomised controlled trial. British Medical Journal, 314, 1071–11.
Doll, R. and Hill, A.B. (1950). Smoking and carcinoma of the lung.
British Medical Journal, ii, 739–48.
Doll, R. and Hill, A.B. (1956). Lung cancer and other causes of death
in relation to smoking: a second report on the mortality of British
doctors. British Medical Journal, ii, 1071–81.
Donnan, S.P.B. and Haskey, J. (1977). Alcoholism and cirrhosis of the
liver. Population Trends, 7, 18–24.
Finney, D.J., Latscha, R., Bennett, B.M., and Hsu, P. (1963). Tables
for Testing Significance in a 2 × 2 Contingency Table, Cambridge
University Press, London.
Hart, P.D. and Sutherland, I. (1977). BCG and vole bacillus in the
prevention of tuberculosis in adolescence and early adult life. British
Medical Journal, 2, 293–5.
Hickish, T., Colston, K., Bland, J.M., and Maxwell, J.D. (1989).
Vitamin D deficiency and muscle strength in male alcoholics. Clinical
Science, 77, 171–6.
Kaste, M., Kuurne, T., Vilkki, J., Katevuo, K., Sainio, K., and Meurala,
H. (1982). Is chronic brain damage in boxing a hazard of the past?
Lancet, ii, 1186–8.
Kerry, S.M. and Bland, J.M. (1998). Statistics Notes: Analysis of a trial
randomized in clusters. British Medical Journal, 316, 54.
Lee, K.L., McNeer, J.F., Starmer, F.C., Harris, P.J., and Rosati, R.A.
(1980). Clinical judgements and statistics: lessons from a simulated
randomized trial in coronary artery disease. Circulation, 61, 508–15.
Lemeshow, S., Hosmer, D.W., Klar, J., and Lwanga, S.K. (1990).
Adequacy of Sample Size in Health Studies, John Wiley and Sons,
Chichester.
Leonard, J.V., Whitelaw, A.G.L., Wolff, O.H., Lloyd, J.K., and Slack, S.
(1977). Diagnosing familial hypercholesterolaemia in childhood by
measuring serum cholesterol. British Medical Journal, 1, 1566–8.
Lucas, A., Morley, R., Cole, T.J., Lister, G., and Leeson-Payne, C.
(1992). Breast milk and subsequent intelligence quotient in children
born preterm. Lancet, 339, 510–5.
Mantel, N. (1966). Evaluation of survival data and two new rank order
statistics arising in its consideration. Cancer Chemotherapy Reports,
50, 163–70.
Mather, H.M., Nisbet, J.A., Burton, G.H., Poston, G.J., Bland, J.M.,
Bailey, P.A., and Pilkington, T.R.E. (1979). Hypomagnesaemia in
diabetes. Clinica Chimica Acta, 95, 235–42.
Meier, P. (1977). The biggest health experiment ever: the 1954 field
trial of the Salk poliomyelitis vaccine. In Statistics: A Guide to the
Biological and Health Sciences (ed. J.M. Tanur, et al.). Holden-Day,
San Francisco.
Mitchell, E.A., Bland, J.M., and Thompson, J.M.D. (1994). Risk factors
for readmission to hospital for asthma. Thorax, 49, 33–36.
Newnham, J.P., Evans, S.F., Con, A.M., Stanley, F.J., and Landau, L.I.
(1993). Effects of frequent ultrasound during pregnancy: a
randomized controlled trial. Lancet, 342, 887–91.
Peduzzi, P., Concato, J., Kemper, E., Holford, T.R., and Feinstein,
A.R. (1996). A simulation study of the number of events per variable
in logistic regression analysis. Journal of Clinical Epidemiology, 49,
1373–9.
Pritchard, B.N.C., Dickinson, C.J., Alleyne, G.A.O., Hurst, P., Hill, I.D.,
Rosenheim, M.L., and Laurence, D.R. (1963). Report of a clinical trial
from Medical Unit and MRC Statistical Unit, University College
Hospital Medical School, London. British Medical Journal, 2, 1226–7.
Rodin, D.A., Bano, G., Bland, J.M., Taylor, K., and Nussey, S.S.
(1998). Polycystic ovaries and associated metabolic abnormalities in
Indian subcontinent Asian women. Clinical Endocrinology, 49, 91–9.
Samuels, P., Bussel, J.B., Braitman, L.E., Tomaski, A., Druzin, M.L.,
Mennuti, M.T., and Cines, D.B. (1990). Estimation of the risk of
thrombocytopenia in the offspring of pregnant women with presumed
immune thrombocytopenic purpura. New England Journal of
Medicine, 323, 229–35.
Schulz, K.F., Chalmers, I., Hayes, R.J., and Altman, D.G. (1995). Bias
due to non-concealment of randomization and non-double-blinding.
Journal of the American Medical Association, 273, 408–12.
Snowdon, C., Garcia, J., and Elbourne, D.R. (1997). Making sense of
randomisation: Responses of parents of critically ill babies to random
allocation of treatment in a clinical trial. Social Science and Medicine,
45, 1337–55.
Southern, J.P., Smith, R.M.M., and Palmer, S.R. (1990). Bird attack
on milk bottles: possible mode of transmission of Campylobacter
jejuni to man. Lancet, 336, 1425–7.
Williams, E.I., Greenwell, J., and Groom, L.M. (1992). The care of
people over 75 years old after discharge from hospital: an evaluation
of timetabled visiting by Health Visitor Assistants. Journal of Public
Health Medicine, 14, 138–44.
Wroe, S.J., Sandercock, P., Bamford, J., Dennis, M., Slattery, J., and
Warlow, C. (1992). Diurnal variation in incidence of stroke:
Oxfordshire community stroke project. British Medical Journal, 304,
155–7.
Zelen, M. (1979). A new design for clinical trials. New England Journal
of Medicine, 300, 1242–5.
A
abridged life table 200–1
absolute difference 271–2
absolute value 239
accepting null hypothesis 140
accidents 53
acute myocardial infarction 277
addition rule 88
adjusted odds ratio 323
admissions to hospital 86 255–6 354 356 370–1
age 53 56–7 267 308–14 316 373
age, gestational 56–7
age in life table see life table
age-specific mortality rate 295–6 299–300 302 307 376–7
age-standardized mortality rate 74 296 302
age-standardized mortality ratio 297–9 303 307 376–7
agreement 272–5
AIDS 58 77–8 169–71 172 174–8 317–8
alpha spending 152
albumin 76–7
alcoholics 76–7 308–17
allocation to treatment 6–13 15 20–1 23
alterations to 11–13 21
alternate 6–7 11
alternate dates 11–12
by general practice 21 23
by ward 21
cheating in 12–13
known in advance 11
in clusters 21–2 179–81 344–6
minimization 13
non-random 11–13 21–2
physical randomization 12
random 7–11 15 17 20–1 25
systematic 11–12
using envelopes 12
using hospital number 11
alpha error 140
alternate allocation 6–7 11
alternative hypothesis 137 139–42
ambiguous questions 40–1
analgesics 15 18
analysis of covariance 321
analysis of variance 172–9 261–2 267–8 318–21
assumptions 173 175–6
balanced 318
in estimation of measurement error 271
fixed effects 177
Friedman 321
Kruskal–Wallis 217 261–2
in meta-analysis 327
multi-way 318–21
one-way 172–9 261–2
random effects 177–9
in regression 310–15 315
two-way 318
using ranks 217 261–2 321
angina pectoris 15–16 138–9 218–20
animal experiments 5 16–17 20–1 33
anticoagulant therapy 11–12 19 142
antidiuretic hormone 196–7
antilogarithm 83
appropriate confidence intervals for comparison 134
appropriate significance tests for comparison 142–3
anxiety 18 143 210
ARC 58 172 174–7
arcsine square root transformation 165
area under the curve 104–5 109–11 169–71 278 373–4
probability 104–5 109–11
serial data 169–71 373–4
ROC curve 278
arithmetic mean 59
arterial oxygen tension 183–4
arthritis 15 18 37 40
Asian women 35
assessment 19–20
ascertainment bias 38
association 230–2
asthma 21 265 267 332 372 373
atrophy of spinal cord 37
attack rate 303
attribute 47
AUC see area under the curve
average see mean
AVP 196–7
AZT (zidovudine) 77–8 169–71
B
babies 267 373–4
back-transformation 166–7 271
backwards regression 326
bar chart 73–5 354–6
bar notation 59
Bartlett's test 172
base of logarithm 82–4
baseline 79
baseline hazard 324
BASIC 107
Bayesian probability 87
Bayes' theorem 289
BCG vaccine 6–7 11 17 33 81
beta error 140 337
between groups sum of squares 174
between cluster variance 345–6
between subjects variance 178–9 204
bias 6 11–14 17–20 28 39–42 283–4 327 350 363
in allocation 11–13
ascertainment 38
in assessment 19–20
publication 327
in question wording 40–2
recall 39 350 363
in reporting 17–19
response 17–19
in sampling 28 31
volunteer 6 13–14 32
biceps skinfold 165–7 213–15 339
bimodal distribution 54–5
binary variable see dichotomous variable
Binomial distribution 89–91 94 103 106–8 110 128 130–1 132–3 180
and Normal distribution 91 106–8
mean and variance 94
probability 90–1
in sign test 138–9 247
biological variation 269
birds 45–6 255 350
birth rate 303 305
birthweight 150
blind assessment 19–20
blocks 9
blood pressure 19 28 117 191 268–9
BMI see body mass index
body mass index (BMI) 322–3
Bonferroni method 148–51
box and whisker plot 58 66 351 359
boxers 264
boxes 93–4
breast cancer 37 216–17
breast feeding 153
breathlessness 74–5
British Standards Institution 270
bronchitis 130–2 146 233–4
C
Campylobacter jejuni 44–6 255 350
C-T scanner 5–6 68
caesarian section 25 349
calculation error 70
calibration 194
cancer 23 32–9 41 69–74 216–17 241–3
breast 37 216–17
cervical cancer 23
lung 32 35–9 68–70 241–3 299
oesophagus 74 78–80
parathyroid 282–4
cancer registry 39
capillary density 159–64 174
cards 7 12 50
carry-over effect 15
case-control study 37–40 45–6 153–5 241–3 248 323 349–50 362–3
case fatality rate 303
case report 33–4
case series 33–4
cataracts 266 373
categorical data 47–8 373 see nominal data
cats 350
cause of death 70–3 75
cell of table 230
censored observations 281 308 324–5
census 27 47–8 86 294
decennial 27 294
hospital 27 47–8 86
local 27
national 27 294
years 294 299
centile 57–8 279–81
central limit theorem 107–8
cervical cancer 23
cervical smear 275
cervical cytology 22
chart
bar see bar chart
pie see pie chart
cheating in allocation 12–13
Chi-squared distribution 118–20 232–3
and sample variance 119–20 132
contingency tables 231–3 249–51
degrees of freedom 118–19 231–2 251
table 233
chi-squared test 230–6 238–40 243–51 249–51 258–9 261–2 371
372 373
contingency table 230–6 238–40 243–7 249–51 258–9 261–2 371
372 373
continuity correction 238–40 247 259 261
degrees of freedom 231–2 251
goodness of fit 248–9
logrank test 287–8
sample size 341
trend 243–5 259 261–2
validity 234–6 239–40 245
children see schoolchildren
choice of statistical method 257–67
cholesterol 55 326 345
cigarette smoking see smoking
cirrhosis 297–9 306 317
class interval 49–50
class variable 317
clinical trials 5–25 32–3 326–30
allocation to treatment 6–15 20–1 23
assessment 19–20
combining results from 326–30
cluster randomized 21–2 179–81 205 344–6 380
consent of subjects 22–4
cross-over 15–16 341
double blind 19–20
double masked see double blind
ethics 19 22–4
grouped sequential 152
informed consent 22–4
intention to treat 14–15 23 348 372
meta-analysis 326–30
placebo effect 17–19
randomized 7–11
sample size 336–42 344–6 347
selection of subjects 16–17
sequential 151–2
volunteer bias 13–14
Clinstat computer program 3 9 30 93 248 298
cluster randomization 21–2 179–81 205 344–6 380
cluster sampling 31 344–6
Cochran, W.G. 230
coefficient of variation 271
coefficients in regression 189 191–2 310–12 314 317 322–3 325
Cox 325
and interaction 314
logistic 322–3
multiple 310–12 314 317
simple linear 189 191–2
coeliac disease 34 165–7 213–15 339
cohort study 36–7 350
cohort, hypothetical in life table 299
coins 7 28 87–92
colds 69 241–3
colon transit time 267
combinations 97–8
combining data from different studies 326–30
common cold see colds
common estimate 326–30
common odds ratio 328–30
common proportion 145–7
common variance 162–4 173
comparison
multiple see multiple comparisons
of means 128–9 143–5 162–4 170–6 338–41 347 361 379–80
of methods of measurement 269–73
of proportions 130–2 145–7 233–4 245–7 259 341–3 347 372 379
of regression lines 208–9 367–8
of two groups 128–32 143–7 162–4 211–17 233–4 254 255–7 338–43 344–6 347 361 372 379–80
of variances 172 260
within one group 159–62 217–20 245–7 257 260–1 341
compliance 183–4 228–9 363–7 369–70
computer 2 8–9 30 107 166 174 201 238 288–90 298 308 310
318
diagnosis 288–90
random number generation 8–9 107
program for confidence interval of proportion 132
programs for sampling 30
statistical analysis 2 174 201 298 308 310 318
conception 142
conditional logistic regression 323
conditional odds ratio 248
conditional probability 96–7
conditional test 250
confidence interval 126–34
appropriate for comparison 134
centile 133 280–1
correlation coefficient 200–1
difference between two means 128–9 136 162–4 361
difference between two proportions 130–1 243
difference between two regression coefficients 208–9 368
hazard ratio 288 325
mean 126–7 136 159–60 335–6 361
median 133
number needed to treat 290–1
odds ratio 241–3 248
percentile 133 280–1
predicted value in regression 194–5
proportion 128 132–3 336
quantile 133 280–1
ratio of two proportions 131–2
reference interval 280–1 290 375 378
regression coefficient 191–2
regression estimate 192–4
and sample size 335–6
or significance test 142 145 227
SMR 298–9 307 376–7
sensitivity 276
survival probability 283
transformed data 166–7
using rank order 216 220
confidence limits 126–34
confounding 34–5
consent of research subjects 22–4
conservative methods 15
constraint 118–19 250–1
contingency table 230 330
D
death 27 70–3 96 101–2 281
death certificate 27 294
death rate see mortality rate
decennial census 27 294
decimal dice 8
decimal places 70 268
decimal point 70
decimal system 69–70
decision tree 289–90
Declaration of Helsinki 22
degrees of freedom 61 67 118–20 153–4 159 169 171–2 191 231–2
251 288 309 311 319 331
analysis of variance 173–5
Chi-squared distribution 118–20
chi-squared test 231–2 251
F distribution 120
F test 171 173–5
goodness of fit test 248–9
logrank test 288
regression 191 310 313
sample size calculations 335
t distribution 120 157–8
t method 157–8 160–4
variance estimate 61 67 94–5 119 352–3
delivery 25 230–1 322–3 349
demography 299
denominator 68–9
dependent variable 187
depressive symptoms 18
Derbyshire 128
design effect 344–6 380
detection, below limit of 281
deviation from assumptions 161–2 164 167–8 175–6 196–7
deviations from mean 61 352
deviations from regression line 187–8
dexamethasone 290–1
diabetes 135–6 360–1
diagnosis 47–8 86 275–9 288–90 317
diagnostic test 136 275–9 361
diagrams 72–82 85–6
bar see bar chart
pie see pie chart
scatter see scatter diagram
diarrhoea 172 318
diastolic blood pressure see blood pressure
dice 7–8 87–9 122
dichotomous variable 258–62 308 317 321–3 325 328
difference against mean plot 161–2 184 271–5 364–5 367
differences 129–30 138–9 159–62 184 217–20 271–5 341 364–5
369–70
differences between two groups 128–31 136 143–7 162–7 211–17
258–9 338–43 344–6 347 362–3
digit preference 269
direct standardization 296
discharge from hospital 48
discrete data 47 49
discriminant analysis 289
distribution
Binomial see Binomial distribution
Chi-squared see Chi-squared distribution
cumulative frequency see cumulative frequency distribution
F see F distribution
frequency see frequency distribution
Normal see Normal distribution
Poisson see Poisson distribution
probability see probability distribution
Rectangular see Rectangular distribution
t see t distribution
Uniform see Uniform distribution
distribution-free methods 210
diurnal variation 249
DNA 97
doctors 36 68 86 297–9 356
Doppler ultrasound 150
dot plot 77
double blind 19–20
double dummy 18
double masked see double blind
double placebo see double dummy
drug 69
dummy treatment see placebo
dummy variables 317 328
Duncan's multiple range test 176
E
e, mathematical constant 83–4 95
ecological fallacy 42–3
ecological studies 42–3
eczema 97
election 28 32 41
electoral roll 30 32
embryos 333–4 378
enumeration district 27
envelopes 12
enzyme concentration 347 379–80
epidemiological studies 32 34–40 42–3 45–6 326
equality, line of 273–4
error 70 140 187 192 269–72 337
alpha 140
beta 140 337
calculation 70
first kind 140
measurement 269–72
second kind 140 337
term in regression model 187 192
type I 140
type II 140 337
estimate 61 122–36 326–30
estimation 122–36 335–6
ethical approval 32
ethics 4 19 22–4 32
evidence-based practice 1
expectation 92–4
of a distribution 92–3
of Binomial distribution 94
of Chi-squared distribution 118
of life 102 300–2 305 357–8
of sum of squares 60–4 98–9 119
expected frequency 230–31 26 250
expected number of deaths 297–9
expected value see expectation, expected frequency
experimental unit 21–2 180
experiments 5–25
animal 5 16–17 20–1 33
clinical see clinical trials
design of 5–25
factorial 10–11
laboratory 5 16–17 20–1
expert system 288–90
ex-prisoners 128
F
F distribution 118 120 334
F test 171 173–5 311 313–15 317–18 320 334 378
face-lifts 23
factor 317–18
factorial 90 97
factorial experiment 10–11
false negative 277–9
false positive 277–9
family of distributions 90 96
Farr, William 1
FAT see fixed activated T-cells
fat absorption 78 169–71
fatality rate 303
feet, ulcerated 159–64 174
fertility 142 302–3
fertility rate 303
FEV1 49–54 57–60 62–3 125–7 133 185–6 188–95 197–9 201 279–80 310–11 335–6
fever tree 26
Fisher 1
Fisher's exact test 236–40 251–2 259 262
Fisher's z transformation 201 343
five figure summary 58
five year survival rate 283
fixed activated T-cells (FAT) 318–21
fixed effects 177–9 328
follow-up, lost to or withdrawn from 282
foot ulcers 159–64 174
forced expiratory volume see FEV1
forest diagram 330
forward regression 326
fourths 57
frequency 48–56 68–9 230–1 250
cumulative 48–51
density 52–4 104–5
distribution 48–56 66–7 103–5 351–2 354
expected 230–1 250
per unit 52–4
polygon 54
and probability 87 103–5
proportion 68
relative 48–50 53–4 104–5
tally system 50 54
in tables 71 230–1
G
G.P. 41
Gabriel's test 177
gallstones 284–8 324–5
Galton 186
gastric pH 265–6 372–3
Gaussian distribution see Normal distribution
gee whiz graph 79–80
geometric mean 113 167 320
geriatric admissions 86 255–6 354 356 370–1
gestational age 196–7
glucose 35 66–7 121–2 351–3 359–60
glue sniffing see volatile substance abuse
goodness of fit test 248–9
Gossett see Student
gradient 185–6
graft failure 331
graphs 72–82 85–6
group comparison see comparisons
grouped sequential trials 152
grouping of data 167
guidelines 179–81
H
harmonic mean 113
hay fever 97
hazard ratio 288 324–25
health 40–1
health centre 220–1
health promotion 347
healthy population 279 292–3
heart transplants 264
heatwave 86 255–6 356 370–1
height 75–6 87–8 93–4 112 159 185–6 188–95 197–9 201 208–9
308–17 367–9
Helsinki, Declaration of 22
heteroscedasticity 175
heterogeneity test 249 328–9
Hill, Bradford 1
histogram 50–7 67 72 75 103–4 267 303–4 352 354 356 359
historical controls 6
HIV 58 128 172 174–7
holes 93–4
homogeneity of odds ratios 328–9
homogeneity of variance see uniform variance
homoscedasticity 175
hospital admissions 86 255–6 356 370–1
hospital census 27 47–8 85
hospital controls 38–9
house-dust mite 265 372
housing tenure 230–1 317
Huff 79 81
human immunodeficiency virus see HIV
hypercholesterolaemia 55
hypertension 43 91 265 372
hypocalcaemia 34
hypothesis, alternative see alternative hypothesis
hypothesis, null see null hypothesis
I
ICC see intra-class correlation
ICD see International Classification of Disease
ileostomy 265 372
incidence 303
independent events 88 357
independent groups 128–32 143–7 162–4 172–7 211–17
independent random variables 93–4
independent trials 90
independent variable in regression 187
India 17 33
indirect standardization 296–9
induction of labour 322–3
infant mortality rate 303
infinity (∞) 291
inflammatory arthritis 40
informed consent 22–3
instrumental delivery 25 349
intention to treat 14–15 348–9 372
interaction 310 313–14 320–1 327–9 334 378
intercept 185–6
International Classification of Disease 70–72
inter-pupil distance 331
interquartile range 60
interval, class 49
interval estimate 126
interval scale 210 217 258–62 373
intra-class correlation coefficient 179 204–5 272 380
intra-cluster correlation coefficient 272 380
J
jittering in scatter diagrams 77
K
Kaplan-Meier survival curve 283
Kaposi's sarcoma 69 220–1
Kendall's rank correlation coefficient 222–6 245 261–2 373 374
continuity correction 226
in contingency tables 245
τ 222
table 225
tau 222
ties 223–4
compared to Spearman's 224–5
Kendall's test for two groups 217
Kent 245–7
Know Your Midwife trial 25 348–9
knowledge based system 289–90
Korotkov sounds 268–9
Kruskal-Wallis test 217 261–2
L
labour 322–3 348–9
laboratory experiment 5 16 20–1
lactulose 172 175–7
Lanarkshire milk experiment 12
laparoscopy 142
large sample 126 128–32 143–7 168–9 258–60 335–6
least squares 187–90 205–6 310
left censored data 281
Levene test 172
life expectancy 102 300–2 305 357–8
life table 101–2 282–3 296 299–302
limits of agreement 274–5
line graph 77–80 354 356
line of equality 273–4
linear constraint 118–19 243–5 250–1
linear regression see regression, multiple regression
linear relationship 185–209 243–5
linear trend in contingency table 243–5
Literary Digest 31
lithotrypsy 284
log see logarithm, logarithmic
log hazard 324–5
log-linear model 330
log odds 240 252–3 321–3
log odds ratio 241–2 252–3 323
logarithm 82–4 131
base of 82–4
logarithm of proportion 131
logarithm of ratio 131
logarithmic scale 81–2
logarithmic transformation 113–14 116 164–7 175–6 184
and coefficient of variation 271
and confidence interval 167
geometric mean 113 167
to equal variance 164–7 175–6 196–7 271
to Normal distribution 113–14 116 164–7 175–6 184 360 364–5
372
standard deviation 113–14
variance of 131 248
logistic regression 289 321–3 326 328–9 330
conditional 323
multinomial 330
ordinal 330
logit transformation 235 248–9 321–3
Lognormal distribution 83 113
logrank test 284 287–9 325
longitudinal study 36–7
loss to follow-up 282
Louis, Pierre-Charles-Alexandre 1
lung cancer 32 35–9 68–70 96 242–3 299
lung function see FEV1, PEFR, mean transit time, vital capacity
lymphatic drainage 40
M
magnesium 135–6 292–3 360–1 375–6
malaria 26
mannitol 58 172 174–7 317–18
Mann–Whitney U test 164 211–17 225–7 258–9 259 278 373–4
and two-sample t method 211 215–17
continuity correction 225–6
Normal approximation 215 225–6
and ROC curve 278
table 212
tables of 217
ties 213 215
Mantel's method for survival data 288
Mantel-Haenszel
method for combining 2 by 2 tables 328
method for trend 245
marginal totals 230–1
matched samples 159–62 217–20 245–7 260 341 363–7 369–70
matching 39 45–6
maternal age 267 373
maternal mortality rate 303
maternity care 25
mathematics 2
matrix 309
maximum 58 65 169 345
maximum voluntary contraction 308–16
McNemar's test 245–7 260
mean transit time 265 368
mean 59–60 67
arithmetic 59
comparison of two 128–9 143–5 162–4 338–41 361 378–9
confidence interval for 126–7 132 335 361
deviations from 60
geometric 113 167
harmonic 113
of population 126–7 335–6
of probability distribution 92–4 105–6
of a sample 56–8 65–6 352–3
sample size 335–6 338–41
sampling distribution of 122–5
standard error of 126–7 136 156 335 361
sum of squares about 60–65
measurement 268–9
measurement error 269–72
measurement methods 272–5
median 56–9 133 216–7 220 351
confidence interval for 133 220
Medical Research Council 9
mercury 34
meta-analysis 326–30
methods of measurement 269–73
mice 21 33 333–4 378
midwives 25 342–3
mild hypertension 265 368
milk 12–13 45–6 255 349–50
mini Wright peak flow meter see peak flow meter
minimization 13
minimum 58 66 351
misleading graphs 78–81
missing denominator 69
missing zero 79–80
mites 265 372
MLn 3
MLwiN 3
mode 55
modulus 239
Monte Carlo methods 238
mortality 15 36 70–6 86 294–6 302–3 347 356 357–8 376–7
mortality rate 36 294–6 302–3
age-specific 295–6 299–300 302 307 376–7
age-standardized 296 302
crude 294–5 302
infant 303 305
neonatal 303
perinatal 303
mosquitos 26
MTB see Mycobacterium tuberculosis
MTT see mean transit time
multifactorial methods 308–34
multi-level modelling 3
multinomial logistic regression 330
multiple comparisons 175–7
multiple regression 308–18 333–4
analysis of variance for 310–15
and analysis of variance 318
assumptions 310 315–16
backward 326
class variable 317–18
coefficients 310–12 314 378
computer programs 308 310 318
correlated predictor variables 312
degrees of freedom 310 312
dichotomous predictor 317
dummy variables 317–18
F test 311 313 317
factor 317–18
forward 326
interaction 310 313–14 333–4 378
least squares 310
linear 310 314
in meta-analysis 327
non-linear 310 314–15 378
Normal assumption 315–16
outcome variable 308
polynomial 314–15
predictor variable 308 312–13 316–18
quadratic term 315 316 378
qualitative predictors 316–18
R² 311
reference class 317
residual variance 310
residuals 315–16 333–4 378
significance tests 310–13
standard errors 311–12
stepwise 326
sum of squares 310 313–14 378
t tests 310–12 317
transformations 316
uniform variance 316
variance ratio 311
variation explained 311
multiple significance tests 148–52 169
multiplicative rule 88 90 92–4 96
multi-way analysis of variance 318–21
multi-way contingency tables 330
muscle strength 308–16
mutually exclusive events 88 90 357
Mycobacterium tuberculosis (MTB) 318–21
myocardial infarction 277 347 379
N
Napier 83
natural history 26 33
natural logarithm 83
natural scale 81–2
nausea and vomiting 290–1
Nazi death camps 22
negative predictive value 278–9
neonatal mortality rate 303
New York 6–7 10
Newman-Keuls test 176
Nightingale, Florence 1
nitrite 265 372–3
NNH see number needed to harm
NNT see number needed to treat
nodes in breast cancer 216–17
nominal scale 210 258–62
non-parametric methods 210 226–7
non-significant 140–1 142–3 149
none detectable 281
Normal curve 106–9
normal delivery 25 349
Normal distribution 91 101–20
and Binomial 91 106–8
in confidence intervals 126–7 258–60 262 373
in correlation 200–1
derived distributions 118–20
independence of sample mean and variance 119–20
as limit 106–8
and normal range 279–81 293
of observations 112–18 156 210 258–62 359–60
and reference interval 279–81 293 375 378
in regression 187 192 194 315–16
in significance tests 143–7 258–60 262 368
standard error of sample standard deviation 132
in t method 156–8
tables 109–10
Normal plot 114–19 121–2 161 163 165–7 170–3 175–6 180–1 267
359–60
Normal probability paper 114
normal range see reference interval
null hypothesis 137 139–42
number needed to harm 290
number needed to treat 290–1
Nuremberg trials 22
nuisance variable 320
O
observational studies 5 26–46
observed and expected frequencies 230–1
occupation 96
odds 240 321–3
odds ratio 240–2 248 252–3 259 323 328–9
oesophageal cancer 74 77–80
Office for National Statistics 294
on treatment analysis 15
one-sided percentage point 110
one-sided test 141–2 237
one-tailed test 141–2 237
opinion poll 29 32 41 347 378–9
ordered nominal scale 258–62
ordinal logistic regression 330
ordinal scale 210 220 258–62 373
outcome variable 187 190 308 321
outliers 58 196 378
overview 326–30
oxygen dependence 267 373–4
P
pa(O2) 183–4
pain 15–16 18
pain relief score 18
paired data 129–30 138–9 159–62 167–8 217–20 245–7 260 341
363–7 369–70 372
in large sample 129–30
McNemar's test see McNemar's test
sample size 341
sign test see sign test
t method see t methods
Wilcoxon test see Wilcoxon test
parameter 90
parametric methods 210 226–7
parathyroid cancer 282–4
parity 49 52–3 248–9
passive smoking 34–5
PCO see polycystic ovary disease
peak expiratory flow rate see PEFR
peak flow meter 269–75
peak value 169
Pearson's correlation coefficient see correlation coefficient
PEFR 54 128–9 144–5 147–8 208–9 265 269–75 363–4 368
percentage 68 71
percentage point 109–10 347 378
percentile 57 279–81
perinatal mortality rate 303
permutation 97–8
pH 265 372–3
phlegm 145 147–8
phosphomycin 69
physical mixing 12
pictogram 80–1
pie chart 72–3 80 354–5
pie diagram see pie chart
pilot study 335 339 341
Pitman's test 260
placebo 17–20 22
point estimate 125
Poisson distribution 95–6 108 165 248–50 252 298–9
Poisson heterogeneity test 249
Poisson regression 330
poliomyelitis 13–14 19 68 86 355
polycystic ovary disease 35
polygon see frequency polygon
polynomial regression 314–15
population 27–34 36 39 87 335–6
census 27 294
estimate 294
mean 126–7 335–6
national 27 294
projection 302
pyramid 303–5
restricted 33
standard deviation 124–5
statistical usage 28
variance 124–5
positive predictive value 278
power 147–8 337–46
p–p plot 117–18
precision 268–9
predictor variable 187 190 308 312–13 316–18 321 323 324
pregnancy 25 49 348–9
premature babies 267
presenting data 68–86
presenting tables 71–2
prevalence 35 90 278–9 303
probability 87–122
addition rule 88
conditional 96–7
density function 104–6
distribution 88–9 92–4 103–6 357–8
of dying 101–2 299–300 357–8
multiplication rule 88 96
paper 114
in significance tests 137–9
of survival 101–2 357–8
that null hypothesis is true 140
product moment correlation coefficient see correlation coefficient
pronethalol 15–16 138–9 217–20
proportion 68–9 71 128 130–3 165 321–3
arcsine square root transformation 165
confidence interval for 128 132–3 336
denominator 69
difference between two 130–1 145–7 233–4 245–7 341–3 347
as outcome variable 321–3
ratio of two 131–2 147
sample size 336 341–3 347
standard error 128 336
in tables 71
of variability explained 191 200
proportional frequency 48
proportional hazards model 324–5
prosecutor's fallacy 97
prospective study 36–7
protocol 268
pseudo-random 8
publication bias 327
pulmonary tuberculosis see tuberculosis
pulse rate 178–9 190–1 204
P value 1 139–41
P value spending 152
pyramid, population 303–5
Q
q–q plot see quantile–quantile plot
quadratic term 315 316 378
qualitative data 47 258–62 316–18
quantile 56–8 116–18 133 279–81
confidence interval 133 280–1
quantile-quantile plot 116–18
quantitative data 47 49
quartile 57–8 66 351
quasi-random sampling 31
questionnaires 36 40–2
quota sampling 28–29
R
r, correlation coefficient 198–9
r² 199–200 311
rS, Spearman rank correlation 220
R, multiple correlation coefficient 311
R² 311
radiological appearance 20
RAGE 23
random allocation 7–11 15 17 20–3 25
by general practice 21 23
by ward 21
in clusters 21–2 344–6
random blood glucose 66–7
random effects 177–9 328
random numbers 8 10 29–30
random sampling 9 29–32 38 90
random variable 87–118
addition of a constant 93
difference between two 94
expected value of 92–4
mean of 92–4
multiplied by a constant 92
sum of two 92–3
variance of 92–4
randomization see random allocation
randomized consent 23
randomizing devices 7–8 87 90
range 59–60 279
interquartile 59–60
normal see reference interval
reference see reference interval
rank 211 213–14 218 221 223
rank correlation 220–6 261–2 373 374
choice of 226 261–2
Kendall's 222–6 261–2 373 374
Spearman's 220–2 226 261–2 374
rank order 211 213–14 221
rank sum test 210–20
one sample see Wilcoxon
two sample see Mann Whitney
rate 68–9 71
age specific mortality 295–6 299–300 302 307
age standardized mortality 296 302
attack 303
birth 303 305
case fatality 303
crude mortality 294–5 302
denominator 69
fertility 303
five year survival 283
incidence 303
infant mortality 303 305
maternal mortality 303
mortality 294–6 302–3
multiplier 68 295
neonatal mortality 303
perinatal mortality 303
prevalence 303
response 31–2
stillbirth 303
survival 283
ratio
odds see odds ratio
of proportions 131–2 147
scale 257–8
standardized mortality see standardized mortality ratio
rats 20
raw data 167
recall bias 39 350 363
receiver operating characteristic curve see ROC curve
reciprocal transformation 165–7
Rectangular distribution 107–8
reference class 317
reference interval 33 136 279–81 293 361 375 378
confidence interval 280–1 293 375 378
by direct estimation 280–1
sample size 347 378
using Normal distribution 279–80 293 361 375 378
using transformation 280
refusing treatment 13–15 25
register of deaths 27
regression 185–9 199–200 205–7 208–9 261–2 308–18 312–30
333–4
analysis of variance for 310–15
assumptions 187 191–2 194–5 196–7
backward 326
coefficient 189 191–2
comparing two lines 208–9 367–8
confidence interval 192
in contingency table 234–5
and correlation 199–200
Cox 324–5
dependent variable 187
deviations from 187
deviations from assumptions 196–7
equation 189
error term 187 192
estimate 192–3
explanatory variable 187
forward 326
gradient 185–6
independent variable 187
intercept 185–6
least squares 187–90 205–6
line 187
linear 189
logistic 321–3 326 328–9
multinomial logistic 330
multiple see multiple regression
ordinal logistic 330
outcome variable 187 190
outliers 196
perpendicular distance from line 187–8
Poisson 330
polynomial see polynomial regression
prediction 192–4
predictor variable 187 190
proportional hazards 324–5
residual sum of squares 191
residual variance 191
residuals 194–6
significance test 192
simple linear 189
slope 185–6
standard error 191–4
stepwise 326
sum of products 189
sum of squares about 191–2 310
sum of squares due to 191–2
towards the mean 186–7 191
variability explained 191 200
variance about line 191–2 205–6
X on Y 190–1
rejecting null hypothesis 140–1
relationship between variables 33 73–8 185–209 220–6 230–45 257
261–2 308–34
relative frequency 48–50 53 103–5
relative risk 132 241–3 248 323
reliability 272
repeatability 33 269–72
repeated observations 169–71 202–3
repeated significance tests 151–2 169
replicates 177
representative sample 28–32 34
residual mean square 174 270
residual standard deviation 191–2 270
residual sum of squares 174 310 312
residual variance 173 310
residuals 165–6 175–6 267 315–16 333–4
about regression line 194–6 315–16
plots of 162–4 173–4 194–6 315–16 333–4 378
within groups 165–6 175–6
respiratory disease 32 34–5
respiratory symptoms 32 34–5 41 125–9 142–7 233–4 240–1 243–7
254
response bias 17–19
response rate 31–2
response variable see outcome variable
retrospective study 39
rheumatoid arthritis 37
Richter scale 114
risk 131–2
risk factor 39 326–7 350
RND (X) 107
robustness to deviations from assumptions 167–9
ROC curve 277–8
S
s², symbol for variance 61
saline 13–14
Salk vaccine 13–14 17 19 68 355
salt 43
sample 87
large 127–31 168–9 258–60 262 335–6
mean see mean
size see size of sample
small 130–1 132–3 156–69 227 258–60 262 344
variance see variance
sampling 27–34
in clinical studies 32–4 293 375
cluster 31
distribution 122–5 127
in epidemiological studies 32 34–9
experiment 63–4 122–5
frame 29
multi-stage 30
quasi-random 31
quota 29
random 29–31
simple random 29–30
stratified 31
systematic 31
scanner 5–6
scatter diagram 75–7 185–6
scattergram see scatter diagram
schoolchildren 12–13 17 22 31 34–5 41 43 128–32 143–7 233–4
240–1 243–7 254
schools 22 31 34
screening 15 22 81 216–7 265 275–9
selection of subjects 16–17 32–3 37–9
in case control studies 37–9
in clinical trials 16–17
self 31–2
self selection 31–2
semen analysis 183
semi-parametric 325
sensitivity 276–8
sequential analysis 151–2
sequential trials 151–2
serial measurements 169–71
sex 71–2
sign test 138–9 161 210 217 219–20 228 246–7 260 369–70 372
373
signed-rank test see Wilcoxon
significance and importance 142–3
significance and publication 327
significance level 140–1 147
significance tests 137–55
multiple 148–52 169
and sample size 336–8
in subsets 149–50
inferior to confidence intervals 142 145
significant difference 140
significant digits see significant figures
significant figures 69–72 268–9
size of sample 32 147–8 335–47
accuracy of estimation 344
in cluster randomization 344–6
correlation coefficient 343–4
and estimation 335–6
paired samples 341
reference interval 347 378
and significance tests 147–8 336–8
single mean 335–6
single proportion 336 378–9
two means 338–41 379–80
two proportions 341–3 379
skew distribution 56 59 67 112–14 116–17 165 167–8 360
skinfold thickness 165–7 213–15 335
slope 185–6
small samples 156–67 227 258–60
smoking 22 26 31–2 34–9 41 67 74–5 241–3 356
SMR 297–9 303 307 376–7
Snow, John 1
sodium 116–17
somites 333–4 378
South East London Screening Study 15
Spearman's rank correlation coefficient 220–2 226 261–2 373
table 219
ties 219
specificity 276–8
spinal cord atrophy 37
square root transformation 165–7 175–7
squares, sum of see sum of squares
standard age specific mortality rates 297–8
standard deviation 60 62–4 67 92–4 119–21
degrees of freedom for 63–4 67 119
of differences 159–62
of population 123–4
of probability distribution 92–4 105
of sample 62–4 67 119 353
of sampling distribution 123–4
and transformation 113–14
and standard error 126
standard error of 132
within subjects 269–70
standard error 122–5
and confidence intervals 126–7
centile 280
correlation coefficient 201 343
difference between two means 128–9 136 338–41 361 379–80
difference between two proportions 130–1 145–7 341–3 379
difference between two regression coefficients 208 367–8
different in significance test and confidence interval 147
log hazard ratio 325
log odds ratio 241–2 252–3
logistic regression coefficient 322
mean 123–5 136 335–6 361
percentile 280
predicted value in regression 192–4
proportion 128 336 378–9
quantile 280
ratio of two proportions 131–2
reference interval 280 370–1 378
regression coefficient 191–2 311–12 317
regression estimate 192–3
SMR 298–9 377
standard deviation 132
survival rate 283–4 341
T
t distribution 120 156–9
degrees of freedom 120 153–4 157–8
and Normal distribution 120 156–8
shape of 157
table 158
t methods 114 156–69
assumptions 161–8 184 365–7
confidence intervals 159–63 164 167
deviations from assumptions 161–2 164 167–8
difference between means in matched sample 159–61 184 260 363–7 370 372
difference between means in two samples 162–7 258–9 262
one sample 159–62 184 260 363–7 370 372
paired 159–62 167–8 184 217 220 260 363–7 370 372
regression coefficient 191–2 310–12 317
single mean 159–62 176
two sample 162–7 217 258–9 262 317 373–4
unpaired same as two sample
table of probability distribution
Chi-squared 233
correlation coefficient 200
Kendall's τ 225
Mann–Whitney U 212
Normal 109–10
Spearman's ρ 222
t 158
Wilcoxon matched pairs 219
table of sample size for correlation coefficient 344
tables of random numbers 8–9 29–30
tables, presentation of 71–2
tables, two way 230–48
tails of distributions 56 359–60
tally system 49–50 54
Tanzania 69 220–4
TB see tuberculosis
telephone survey 42
temperature 10 70 86 210 255–6 332
test, diagnostic 136 275–9 361
test, significance see significance test
test statistic 136 337
three dimensional effect in graphs 80
thrombosis 11–12 36 345
thyroid hormone 267 373–4
ties in rank tests 213 215 218–19 222–4
ties in sign test 138
time 324–5
time series 77–8 169–71 354 356
time to peak 169
time, survival see survival time
TNF see tumour necrosis factor
total sum of squares 174
transformations 112–14 163–7 320
arcsine square root 165
and confidence intervals 167
Fisher's z 201 343
logarithmic 112–14 116 163–7 170–1 175–6 184 320 364–7 369–70
logit 240 252–3
to Normal distribution 112–14 116 164–7 175–6 184
reciprocal 113 165–7
and significant figures 269
square root 165–7 175–7
to uniform variance 163–7 168 175–6 196–7 271
treated group 5–7
treatment 5–7 326–7
treatment guidelines 179–81
trend in contingency tables 243–5
chi-squared test 243–5
Kendall's τb 245
Mantel–Haenszel 245
trial, clinical see clinical trial
trial of scar 322–3
triglyceride 55–6 58–59 63 112–13 280–2
trisomy-16 333–4 378
true difference 147
true negative 278
true positive 278
tuberculosis 6–7 9–10 17 81–2 290
Tukey 54 58
Tukey's Honestly Significant Difference 176
tumour growth 20
tumour necrosis factor (TNF) 318–21
Tuskegee Study 22
twins 204
two-sample t test see t methods
two-sample trial 16
two-sided percentage point 110
two-sided test 141–2
two-tailed test 141–2
type I error 140
type II error 140 337
U
ulcerated feet 159–64 174
ultrasonography 134
unemployment 42
Uniform distribution 107–8 249
uniform variance 159 162–4 167–8 175–6 187 191 196–7 316 319–20
unimodal distribution 55
unit of analysis 21–2 179–81
urinary infection 69
urinary nitrite 265 372–3
V
vaccine 6–7 11 13–14 17 19
validity of chi-squared test 234–6 239–40 245
variability 59–64 269
variability explained by regression 191 200
variable 47
categorical 47
continuous 47 49
dependent 187
dichotomous 259–62
discrete 47 49
explanatory 187
independent 187
nominal 210 259–62
nuisance 320
ordinal 210 259–62
outcome 187 190 308 321
predictor 187 190 308 312–13 316–18 321 323 324
qualitative 47 316–18
quantitative 47
random see random variable
variance 59–64 67
about regression line 191–2 205–6
analysis of see analysis of variance
between clusters 345–6
between subjects 178–9 204
common 162–4 170 173
comparison in paired data 260
comparison of several 172
comparison of two 171 260
degrees of freedom for 61 63–4 352–3
estimate 59–64 124–5
of logarithm 131 252
population 123–4
of probability distribution 91–4 105
of random variable 91–4
ratio 120 311
residual 192 205–6 310
sample 59–64 67 94 98–9 119 352–3
uniform 162 163–7 168 174–6 187 196–7 316
within clusters 345–6
within subjects 178–9 204 269–72
variation, coefficient of 271
visual acuity 266 373
vital capacity 75–6
vital statistics 302–3
vitamin A 328–9
vitamin D 115–16
volatile substance abuse 42 307 376–7
volunteer bias 6 13–14 32
volunteers 5–6 13–14 16–17
VSA see volatile substance abuse
W
Wandsworth Health District 86 255–6 356
website 3 4
weight gain 20–1
wheeze 267
whooping cough 265 373
Wilcoxon test 217–20 260 369–70 373
matched pairs 217–20 260 369–70 373
one sample 217–20 260 369–70 373
signed rank 217–20 260 369–70 373
table 219
ties 218–19
two sample 217 see Mann-Whitney
withdrawn from follow-up 282
within cluster variance 345–6
within group residuals see residuals
within groups sum of squares 173
within groups variance 173
within subjects variance 178–9 204
within subjects variation 178–9 269–72
Woolf's test 328
Wright peak flow meter see peak flow meter
X
x̄, symbol for mean 59
X-ray 19–20 81 179–81
Y
Yates' correction 238–40 247 259 261
Z
z test 143–7 234 258–9 262
z transformation 201 343
zero, missing 78–80
zidovudine see AZT
% symbol 71
! (symbol for factorial) 90 97
∞ (symbol for infinity) 291
| (symbol for given) 96
| (symbol for absolute value) 239
α (symbol for alpha) 140
β (symbol for beta) 140
χ (symbol for chi) 118–19
µ (symbol for mu) 92–3
φ (symbol for phi) 108
Φ (symbol for Phi) 109
ρ (symbol for rho) 220–2
Σ (symbol for summation) 57
σ (symbol for sigma) 92–3
τ (symbol for tau) 222–5