
LECTURE NOTE ON EPIDEMIOLOGY 3


Lesson 1: Introduction to Epidemiology

Section 1: Definition of Epidemiology


Students of journalism are taught that a good news story, whether it be about a
bank robbery, dramatic rescue, or presidential candidate’s speech, must include
the 5 W’s: what, who, where, when and why (sometimes cited as why/how). The
5 W’s are the essential components of a news story because if any of the five are
missing, the story is incomplete.

The same is true in characterizing epidemiologic events, whether it be an
outbreak of norovirus among cruise ship passengers or the use of mammograms
to detect early breast cancer. The difference is that epidemiologists tend to use
synonyms for the 5 W’s: diagnosis or health event (what), person (who), place
(where), time (when), and causes, risk factors, and modes of transmission
(why/how).

The word epidemiology comes from the Greek words epi, meaning on or
upon, demos, meaning people, and logos, meaning the study of. In other words,
the word epidemiology has its roots in the study of what befalls a population.
Many definitions have been proposed, but the following definition captures the
underlying principles and public health spirit of epidemiology:

Epidemiology is the study of the distribution and determinants of health-related
states or events in specified populations, and the application of this study to
the control of health problems (1).

Key terms in this definition reflect some of the important principles of
epidemiology.

Study

Epidemiology is a scientific discipline with sound methods of scientific inquiry at
its foundation. Epidemiology is data-driven and relies on a systematic and
unbiased approach to the collection, analysis, and interpretation of data. Basic
epidemiologic methods tend to rely on careful observation and use of valid
comparison groups to assess whether what was observed, such as the number
of cases of disease in a particular area during a particular time period or the
frequency of an exposure among persons with disease, differs from what might
be expected. However, epidemiology also draws on methods from other scientific
fields, including biostatistics and informatics, and on the biologic, economic,
social, and behavioral sciences.

In fact, epidemiology is often described as the basic science of public health, and
for good reason. First, epidemiology is a quantitative discipline that relies on a
working knowledge of probability, statistics, and sound research methods.
Second, epidemiology is a method of causal reasoning based on developing and
testing hypotheses grounded in such scientific fields as biology, behavioral
sciences, physics, and ergonomics to explain health-related behaviors, states,
and events. However, epidemiology is not just a research activity but an integral
component of public health, providing the foundation for directing practical and
appropriate public health action based on this science and causal reasoning.(2)

Distribution

Epidemiology is concerned with the frequency and pattern of health events in a
population:

Frequency refers not only to the number of health events such as the number of
cases of meningitis or diabetes in a population, but also to the relationship of that
number to the size of the population. The resulting rate allows epidemiologists to
compare disease occurrence across different populations.
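
To make the point about counts versus rates concrete, here is a minimal Python sketch that converts case counts into rates per 100,000 population. The town names and figures are hypothetical, invented only for illustration, and `rate_per` is an illustrative helper rather than a standard function.

```python
def rate_per(cases: int, population: int, per: int = 100_000) -> float:
    """Return the number of cases per `per` persons in the population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * per


# Hypothetical towns (cases, population): the larger town has more cases.
towns = {
    "Town A": (40, 200_000),
    "Town B": (15, 25_000),
}

for name, (cases, population) in towns.items():
    print(f"{name}: {cases} cases, {rate_per(cases, population):.1f} per 100,000")
```

In this made-up example Town A reports more cases, but Town B's rate is three times higher, which is exactly the difference a raw count would hide.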

Pattern refers to the occurrence of health-related events by time, place, and
person. Time patterns may be annual, seasonal, weekly, daily, hourly, weekday
versus weekend, or any other breakdown of time that may influence disease or
injury occurrence. Place patterns include geographic variation, urban/rural
differences, and location of work sites or schools. Personal characteristics
include demographic factors which may be related to risk of illness, injury, or
disability such as age, sex, marital status, and socioeconomic status, as well as
behaviors and environmental exposures.

Characterizing health events by time, place, and person is the work of
descriptive epidemiology, discussed in more detail later in this lesson.
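
As a rough illustration of descriptive epidemiology, the Python sketch below tallies a small line list (one record per case) by time, place, and person. The records, the field names `onset_week`, `district`, and `age_group`, and all values are hypothetical.

```python
from collections import Counter

# Hypothetical line list: one record per case (all values invented).
cases = [
    {"onset_week": 31, "district": "North", "age_group": "0-4"},
    {"onset_week": 31, "district": "North", "age_group": "5-17"},
    {"onset_week": 32, "district": "North", "age_group": "0-4"},
    {"onset_week": 32, "district": "South", "age_group": "18-64"},
    {"onset_week": 33, "district": "North", "age_group": "0-4"},
]

# Count cases by time (week of onset), place (district), and person (age group).
by_time = Counter(case["onset_week"] for case in cases)
by_place = Counter(case["district"] for case in cases)
by_person = Counter(case["age_group"] for case in cases)

print("By week:", dict(sorted(by_time.items())))
print("By district:", by_place.most_common())
print("By age group:", by_person.most_common())
```

These are counts only; to compare groups fairly, each count would still need to be divided by the size of the corresponding population to give a rate.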

Determinants
Determinant: any factor, whether event, characteristic, or other definable entity,
that brings about a change in a health condition or other defined characteristic.
Epidemiology is also used to search for determinants, which are the causes and
other factors that influence the occurrence of disease and other health-related
events. Epidemiologists assume that illness does not occur randomly in a
population, but happens only when the right accumulation of risk factors or
determinants exists in an individual. To search for these determinants,
epidemiologists use analytic epidemiology or epidemiologic studies to provide the
“Why” and “How” of such events. They assess whether groups with different
rates of disease differ in their demographic characteristics, genetic or
immunologic make-up, behaviors, environmental exposures, or other so-called
potential risk factors. Ideally, the findings provide sufficient evidence to direct
prompt and effective public health control and prevention measures.

Health-related states or events

Epidemiology was originally focused exclusively on epidemics of communicable
diseases (3) but was subsequently expanded to address endemic communicable
diseases and non-communicable infectious diseases. By the middle of the 20th
Century, additional epidemiologic methods had been developed and applied to
chronic diseases, injuries, birth defects, maternal-child health, occupational
health, and environmental health. Then epidemiologists began to look at
behaviors related to health and well-being, such as amount of exercise and seat
belt use. Now, with the recent explosion in molecular methods, epidemiologists
can make important strides in examining genetic markers of disease risk. Indeed,
the term health-related states or events may be seen as anything that affects the
well-being of a population. Nonetheless, many epidemiologists still use the term
“disease” as shorthand for the wide range of health-related states and events
that are studied.

Specified populations

Although epidemiologists and direct health-care providers (clinicians) are both
concerned with occurrence and control of disease, they differ greatly in how they
view “the patient.” The clinician is concerned about the health of an individual;
the epidemiologist is concerned about the collective health of the people in a
community or population. In other words, the clinician’s “patient” is the individual;
the epidemiologist’s “patient” is the community. Therefore, the clinician and the
epidemiologist have different responsibilities when faced with a person with
illness. For example, when a patient with diarrheal disease presents, both are
interested in establishing the correct diagnosis. However, while the clinician
usually focuses on treating and caring for the individual, the epidemiologist
focuses on identifying the exposure or source that caused the illness; the number
of other persons who may have been similarly exposed; the potential for further
spread in the community; and interventions to prevent additional cases or
recurrences.

Application

Epidemiology is not just “the study of” health in a population; it also involves
applying the knowledge gained by the studies to community-based practice. Like
the practice of medicine, the practice of epidemiology is both a science and an
art. To make the proper diagnosis and prescribe appropriate treatment for a
patient, the clinician combines medical (scientific) knowledge with experience,
clinical judgment, and understanding of the patient. Similarly, the epidemiologist
uses the scientific methods of descriptive and analytic epidemiology as well as
experience, epidemiologic judgment, and understanding of local conditions in
“diagnosing” the health of a community and proposing appropriate, practical, and
acceptable public health interventions to control and prevent disease in the
community.

Summary

Epidemiology is the study (scientific, systematic, data-driven) of the distribution
(frequency, pattern) and determinants (causes, risk factors) of health-related
states and events (not just diseases) in specified populations (patient is
community, individuals viewed collectively), and the application of (since
epidemiology is a discipline within public health) this study to the control of health
problems.

Exercise 1.1

Below are three key terms taken from the definition of epidemiology, followed by
a list of activities that an epidemiologist might perform. Match the term to the
activity that best describes it. You should match only one term per activity.
1. Distribution
2. Determinants
3. Application

1. ____ Compare food histories between persons with Staphylococcus food
poisoning and those without
2. ____ Compare frequency of brain cancer among anatomists with frequency
in general population
3. ____ Mark on a map the residences of all children born with birth defects
within 2 miles of a hazardous waste site
4. ____ Graph the number of cases of congenital syphilis by year for the
country
5. ____ Recommend that close contacts of a child recently reported with
meningococcal meningitis receive Rifampin
6. ____ Tabulate the frequency of clinical signs, symptoms, and laboratory
findings among children with chickenpox in Cincinnati, Ohio

Section 2: Historical Evolution of Epidemiology


Although epidemiology as a discipline has blossomed since World War II,
epidemiologic thinking has been traced from Hippocrates through John Graunt,
William Farr, John Snow, and others. The contributions of some of these early
and more recent thinkers are described below.(5)

Circa 400 B.C.

Epidemiology’s roots are nearly 2,500 years old.


Hippocrates attempted to explain disease occurrence from a rational rather than
a supernatural viewpoint. In his essay entitled “On Airs, Waters, and Places,”
Hippocrates suggested that environmental and host factors such as behaviors
might influence the development of disease.

1662

Another early contributor to epidemiology was John Graunt, a London
haberdasher and councilman who published a landmark analysis of mortality
data in 1662. This publication was the first to quantify patterns of birth, death, and
disease occurrence, noting disparities between males and females, high infant
mortality, urban/rural differences, and seasonal variations.(5)

1800

William Farr built upon Graunt’s work by systematically collecting and analyzing
Britain’s mortality statistics. Farr, considered the father of modern vital statistics
and surveillance, developed many of the basic practices used today in vital
statistics and disease classification. He concentrated his efforts on collecting vital
statistics, assembling and evaluating those data, and reporting to responsible
health authorities and the general public.(4)

1854

In the mid-1800s, an anesthesiologist named John Snow was conducting a series
of investigations in London that warrant his being considered the “father of field
epidemiology.” Decades before the germ theory of disease gained acceptance, Snow
conducted studies of cholera outbreaks both to discover the cause of disease
and to prevent its recurrence. Because his work illustrates the classic sequence
from descriptive epidemiology to hypothesis generation to hypothesis testing
(analytic epidemiology) to application, two of his investigations will be described
in detail.

Snow conducted one of his now famous studies in 1854 when an epidemic of
cholera erupted in the Golden Square of London.(5) He began his investigation
by determining where in this area persons with cholera lived and worked. He
marked each residence on a map of the area, as shown in Figure 1.1. Today, this
type of map, showing the geographic distribution of cases, is called a spot map.

Figure 1.1 Spot map of deaths from cholera in Golden Square area, London,
1854 (redrawn from original)
Source: Snow J. Snow on cholera. London: Humphrey Milford: Oxford University Press; 1936.

Because Snow believed that water was a source of infection for cholera, he
marked the location of water pumps on his spot map, then looked for a
relationship between the distribution of households with cases of cholera and the
location of pumps. He noticed that more case households clustered around
Pump A, the Broad Street pump, than around Pump B or C. When he questioned
residents who lived in the Golden Square area, he was told that they avoided
Pump B because it was grossly contaminated, and that Pump C was located too
inconveniently for most of them. From this information, Snow concluded that the
Broad Street pump (Pump A) was the primary source of water and the most likely
source of infection for most persons with cholera in the Golden Square area. He
noted with curiosity, however, that no cases of cholera had occurred in a two-
block area just to the east of the Broad Street pump. Upon investigating, Snow
found a brewery located there with a deep well on the premises. Brewery
workers got their water from this well, and also received a daily portion of malt
liquor. Access to these uncontaminated rations could explain why none of the
brewery’s employees contracted cholera.
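
Snow's reasoning about clustering around a pump can be mimicked, very loosely, by assigning each case household to its nearest pump. In the Python sketch below the coordinates are invented grid values, not Snow's data, and straight-line distance is an assumption: it ignores streets, and it cannot capture what residents told Snow about avoiding Pump B or finding Pump C inconvenient.

```python
import math
from collections import Counter

# Hypothetical map coordinates in arbitrary grid units (NOT Snow's actual data).
pumps = {"A": (0.0, 0.0), "B": (5.0, 4.0), "C": (-6.0, 3.0)}
case_households = [
    (0.5, 0.2), (-0.8, 1.0), (1.2, -0.4),
    (0.1, 0.9), (4.6, 3.5), (-1.5, -0.5),
]


def nearest_pump(point: tuple[float, float]) -> str:
    """Return the label of the pump closest to `point` by straight-line distance."""
    return min(pumps, key=lambda label: math.dist(point, pumps[label]))


# Count how many case households lie closest to each pump.
counts = Counter(nearest_pump(house) for house in case_households)
print(counts.most_common())
```

A tally like this only suggests an association; as the next paragraph describes, Snow still had to learn where persons with cholera actually obtained their water.
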
To confirm that the Broad Street pump was the source of the epidemic, Snow
gathered information on where persons with cholera had obtained their water.
Consumption of water from the Broad Street pump was the one common factor
among the cholera patients. After Snow presented his findings to municipal
officials, the handle of the pump was removed and the outbreak ended. The site
of the pump is now marked by a plaque mounted on the wall outside of the
appropriately named John Snow Pub.

Figure 1.2 John Snow Pub, London

Source: The John Snow Society [Internet]. London: [updated 2005 Oct 14; cited 2006 Feb 6]. Available
from: http://johnsnowsociety.org.

Snow’s second investigation reexamined data from the 1854 cholera outbreak in
London. During a cholera epidemic a few years earlier, Snow had noted that
districts with the highest death rates were serviced by two water companies: the
Lambeth Company and the Southwark and Vauxhall Company. At that time, both
companies obtained water from the Thames River at intake points that were
downstream from London and thus susceptible to contamination from London
sewage, which was discharged directly into the Thames. To avoid contamination
by London sewage, in 1852 the Lambeth Company moved its intake water works
to a site on the Thames well upstream from London. Over a 7-week period during
the summer of 1854, Snow compared cholera mortality among districts that
received water from one or the other or both water companies. The results are
shown in Table 1.1.
Table 1.1 Mortality from Cholera in the Districts of London Supplied by the
Southwark and Vauxhall and the Lambeth Companies, July 9–August 26,
1854

Districts with Water           Population       Number of Deaths   Cholera Death Rate
Supplied By                    (1851 Census)    from Cholera       per 1,000 Population
Southwark and Vauxhall Only    167,654          844                5.0
Lambeth Only                   19,133           18                 0.9
Both Companies                 300,149          652                2.2

Source: Snow J. Snow on cholera. London: Humphrey Milford: Oxford University Press; 1936.

The data in Table 1.1 show that the cholera death rate was more than 5 times
higher in districts served only by the Southwark and Vauxhall Company (intake
downstream from London) than in those served only by the Lambeth Company
(intake upstream from London). Interestingly, the mortality rate in districts
supplied by both companies fell between the rates for districts served exclusively
by either company. These data were consistent with the hypothesis that water
obtained from the Thames below London was a source of cholera. Alternatively,
the populations supplied by the two companies may have differed on other
factors that affected their risk of cholera.
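
The rates in Table 1.1 can be rechecked from the counts. The Python sketch below uses only the figures printed in the table and expresses each rate per 1,000 population for the 7-week period; because the denominators come from the 1851 census, the rates are approximations.

```python
# Figures from Table 1.1: supplier -> (population, deaths from cholera).
table_1_1 = {
    "Southwark and Vauxhall only": (167_654, 844),
    "Lambeth only": (19_133, 18),
    "Both companies": (300_149, 652),
}

# Death rate per 1,000 population over the 7-week period.
rates = {
    supplier: deaths / population * 1_000
    for supplier, (population, deaths) in table_1_1.items()
}

for supplier, rate in rates.items():
    print(f"{supplier}: {rate:.1f} deaths per 1,000")

# Ratio of the downstream-intake rate to the upstream-intake rate.
ratio = rates["Southwark and Vauxhall only"] / rates["Lambeth only"]
print(f"Rate ratio (Southwark and Vauxhall vs. Lambeth): {ratio:.1f}")
```

Computed from unrounded rates the ratio is about 5.4 (the rounded table values, 5.0 and 0.9, give about 5.6); either way, the rate in districts served only by the Southwark and Vauxhall Company was more than 5 times higher.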

To test his water supply hypothesis, Snow focused on the districts served by both
companies, because the households within a district were generally comparable
except for the water supply company. In these districts, Snow identified the water
supply company for every house in which a death from cholera had occurred
during the 7-week period. Table 1.2 shows his findings.

Table 1.2 Mortality from Cholera in London Related to the Water Supply of
Individual Houses in Districts Served by Both the Southwark and Vauxhall
Company and the Lambeth Company, July 9–August 26, 1854

Water Supply of                Population       Number of Deaths   Cholera Death Rate
Individual House               (1851 Census)    from Cholera       per 1,000 Population
Southwark and Vauxhall Only    98,862           419                4.2
Lambeth Only                   154,615          80                 0.5


Source: Snow J. Snow on cholera. London: Humphrey Milford: Oxford University Press; 1936.

This study, demonstrating a higher death rate from cholera among households
served by the Southwark and Vauxhall Company in the mixed districts, added
support to Snow’s hypothesis. It also established the sequence of steps used by
current-day epidemiologists to investigate outbreaks of disease. Based on a
characterization of the cases and population at risk by time, place, and person,
Snow developed a testable hypothesis. He then tested his hypothesis with a
more rigorously designed study, ensuring that the groups to be compared were
comparable. After this study, efforts to control the epidemic were directed at
changing the location of the water intake of the Southwark and Vauxhall
Company to avoid sources of contamination. Thus, without knowing that a
microorganism causes cholera, Snow demonstrated through epidemiologic studies
that water could serve as a vehicle for transmitting cholera and that
epidemiologic information could be used to direct prompt and appropriate public
health action.
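
The comparison in Table 1.2 can be summarized as a single measure of association. The Python sketch below treats each death rate as a risk over the 7-week period, computes the risk ratio, and attaches an approximate 95% confidence interval using one common large-sample method on the log scale. The interval is a modern addition for illustration, not something Snow calculated.

```python
import math

# Figures from Table 1.2: houses in districts served by both companies.
exposed_deaths, exposed_pop = 419, 98_862      # Southwark and Vauxhall only
unexposed_deaths, unexposed_pop = 80, 154_615  # Lambeth only

risk_exposed = exposed_deaths / exposed_pop
risk_unexposed = unexposed_deaths / unexposed_pop
risk_ratio = risk_exposed / risk_unexposed

# Standard error of ln(risk ratio), large-sample approximation.
se_log_rr = math.sqrt(
    1 / exposed_deaths - 1 / exposed_pop + 1 / unexposed_deaths - 1 / unexposed_pop
)
z = 1.96  # normal quantile for a two-sided 95% interval
lower = math.exp(math.log(risk_ratio) - z * se_log_rr)
upper = math.exp(math.log(risk_ratio) + z * se_log_rr)

print(f"Death rate, Southwark and Vauxhall: {risk_exposed * 1_000:.1f} per 1,000")
print(f"Death rate, Lambeth: {risk_unexposed * 1_000:.1f} per 1,000")
print(f"Risk ratio: {risk_ratio:.1f} (95% CI {lower:.1f} to {upper:.1f})")
```

A lower confidence limit well above 1 suggests that chance alone is an unlikely explanation for the difference, although the interval says nothing about bias or confounding.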

19th and 20th centuries

In the mid- and late-1800s, epidemiological methods began to be applied in the
investigation of disease occurrence. At that time, most investigators focused on
acute infectious diseases. In the 1930s and 1940s, epidemiologists extended their
methods to noninfectious diseases. The period since World War II has seen an
explosion in the development of research methods and the theoretical
underpinnings of epidemiology. Epidemiology has been applied to the entire
range of health-related outcomes, behaviors, and even knowledge and attitudes.
The studies by Doll and Hill linking lung cancer to smoking (6) and the study of
cardiovascular disease among residents of Framingham, Massachusetts (7) are
two examples of how pioneering researchers have applied epidemiologic
methods to chronic disease since World War II. During the 1960s and early 1970s,
health workers applied epidemiologic methods to eradicate naturally occurring
smallpox worldwide.(8) This was an achievement in applied epidemiology of
unprecedented proportions.

In the 1980s, epidemiology was extended to the study of injuries and violence.
In the 1990s, the related fields of molecular and genetic epidemiology (expansion
of epidemiology to look at specific pathways, molecules and genes that influence
risk of developing disease) took root. Meanwhile, infectious diseases continued
to challenge epidemiologists as new infectious agents emerged (Ebola virus,
Human Immunodeficiency virus (HIV)/Acquired Immunodeficiency Syndrome
(AIDS)), were identified (Legionella, Severe Acute Respiratory Syndrome
(SARS)), or changed (drug-resistant Mycobacterium tuberculosis, Avian
influenza). Beginning in the 1990s and accelerating after the terrorist attacks of
September 11, 2001, epidemiologists have had to consider not only natural
transmission of infectious organisms but also deliberate spread through biologic
warfare and bioterrorism.

Today, public health workers throughout the world accept and use epidemiology
regularly to characterize the health of their communities and to solve day-to-day
problems, large and small.

Section 3: Uses
Epidemiology and the information generated by epidemiologic methods have
been used in many ways.(9) Some common uses are described below.

Assessing the community’s health

Public health officials responsible for policy development, implementation, and
evaluation use epidemiologic information as a factual framework for decision
making. To assess the health of a population or community, relevant sources of
data must be identified and analyzed by person, place, and time (descriptive
epidemiology).

• What are the actual and potential health problems in the community?
• Where are they occurring?
• Which populations are at increased risk?
• Which problems have declined over time?
• Which ones are increasing or have the potential to increase?
• How do these patterns relate to the level and distribution of public health
services available?

More detailed data may need to be collected and analyzed to determine whether
health services are available, accessible, effective, and efficient. For example,
public health officials used epidemiologic data and methods to identify baselines,
to set health goals for the nation in 2000 and 2010, and to monitor progress
toward these goals.(10, 11, 12)

Making individual decisions

Many individuals may not realize that they use epidemiologic information to make
daily decisions affecting their health. When persons decide to quit smoking, climb
the stairs rather than wait for an elevator, eat a salad rather than a cheeseburger
with fries for lunch, or use a condom, they may be influenced, consciously or
unconsciously, by epidemiologists’ assessment of risk. Since World War II,
epidemiologists have provided information related to all those decisions. In the
1950s, epidemiologists reported the increased risk of lung cancer among
smokers. In the 1970s, epidemiologists documented the role of exercise and
proper diet in reducing the risk of heart disease. In the mid-1980s,
epidemiologists identified the increased risk of HIV infection associated with
certain sexual and drug-related behaviors. These and hundreds of other
epidemiologic findings are directly relevant to the choices people make every day,
choices that affect their health over a lifetime.

Completing the clinical picture

When investigating a disease outbreak, epidemiologists rely on health-care
providers and laboratorians to establish the proper diagnosis of individual
patients. But epidemiologists also contribute to physicians’ understanding of the
clinical picture and natural history of disease. For example, in late 1989, a
physician saw three patients with unexplained eosinophilia (an increase in the
number of a specific type of white blood cell called an eosinophil) and myalgias
(severe muscle pains). Although the physician could not make a definitive
diagnosis, he notified public health authorities. Within weeks, epidemiologists
had identified enough other cases to characterize the spectrum and course of the
illness that came to be known as eosinophilia-myalgia syndrome.(13) More
recently, epidemiologists, clinicians, and researchers around the world have
collaborated to characterize SARS, a disease caused by a new type of
coronavirus that emerged in China in late 2002.(14) Epidemiology has also been
instrumental in characterizing many non-acute diseases, such as the numerous
conditions associated with cigarette smoking — from pulmonary and heart
disease to lip, throat, and lung cancer.

Searching for causes

Much epidemiologic research is devoted to searching for causal factors that
influence one’s risk of disease. Ideally, the goal is to identify a cause so that
appropriate public health action might be taken. One can argue that
epidemiology can never prove a causal relationship between an exposure and a
disease, since much of epidemiology is based on ecologic reasoning.
Nevertheless, epidemiology often provides enough information to support
effective action. Examples date from the removal of the handle from the Broad St.
pump following John Snow’s investigation of cholera in the Golden Square area
of London in 1854 (5) to the withdrawal of a vaccine against rotavirus in 1999
after epidemiologists found that it increased the risk of intussusception, a
potentially life-threatening condition.(15) Just as often, epidemiology and
laboratory science converge to provide the evidence needed to establish
causation. For example, epidemiologists were able to identify a variety of risk
factors during an outbreak of pneumonia among persons attending the American
Legion Convention in Philadelphia in 1976, even though the Legionnaires’
bacillus was not identified in the laboratory from lung tissue of a person who had
died from Legionnaires’ disease until almost 6 months later.(16)

Exercise 1.2

In August 1999, epidemiologists learned of a cluster of cases of encephalitis
caused by West Nile virus infection among residents of Queens, New York. West
Nile virus infection, transmitted by mosquitoes, had never before been identified
in North America.
Describe how this information might be used for each of the following:

1. Assessing the community’s health
2. Making decisions about individual patients
3. Documenting the clinical picture of the illness
4. Searching for causes to prevent future outbreaks

Section 4: Core Epidemiologic Functions


In the mid-1980s, five major tasks of epidemiology in public health practice were
identified: public health surveillance, field investigation, analytic studies,
evaluation, and linkages. (17) A sixth task, policy development, was recently
added. These tasks are described below.

Public health surveillance

Public health surveillance is the ongoing, systematic collection, analysis,
interpretation, and dissemination of health data to help guide public health
decision making and action. Surveillance is equivalent to monitoring the pulse of
the community. The purpose of public health surveillance, which is sometimes
called “information for action,” (18) is to portray the ongoing patterns of disease
occurrence and disease potential so that investigation, control, and prevention
measures can be applied efficiently and effectively. This is accomplished through
the systematic collection and evaluation of morbidity and mortality reports and
other relevant health information, and the dissemination of these data and their
interpretation to those involved in disease control and public health decision
making.

Figure 1.3. Surveillance Cycle


Morbidity and mortality reports are common sources of surveillance data for local
and state health departments. These reports generally are submitted by health-
care providers, infection control practitioners, or laboratories that are required to
notify the health department of any patient with a reportable disease such as
pertussis, meningococcal meningitis, or AIDS. Other sources of health-related
data that are used for surveillance include reports from investigations of
individual cases and disease clusters, public health program data such as
immunization coverage in a community, disease registries, and health surveys.

Most often, surveillance relies on simple systems to collect a limited amount of
information about each case. Although not every case of disease is reported,
health officials regularly review the case reports they do receive and look for
patterns among them. These practices have proven invaluable in detecting
problems, evaluating programs, and guiding public health action.
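
One simple way to look for patterns in routine reports is to compare the latest weekly count with recent history. The Python sketch below flags a week whose count exceeds the baseline mean by more than two standard deviations; the counts are hypothetical, `exceeds_threshold` is an illustrative helper, and the mean-plus-two-SD rule is only an example choice, not a method prescribed in this lesson. Because not every case is reported, a flag signals a change in reports that still has to be investigated.

```python
import statistics

# Hypothetical weekly counts of reports for one reportable disease.
baseline_weeks = [4, 6, 5, 3, 7, 5, 4, 6]  # earlier weeks used as the baseline
current_week = 14


def exceeds_threshold(baseline: list[int], observed: int, k: float = 2.0) -> bool:
    """Return True if `observed` exceeds the baseline mean by more than k sample SDs."""
    mean = statistics.mean(baseline)
    sd = statistics.stdev(baseline)
    return observed > mean + k * sd


if exceeds_threshold(baseline_weeks, current_week):
    print(f"Week count {current_week} is unusually high; review the reports.")
else:
    print(f"Week count {current_week} is within the expected range.")
```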

While public health surveillance traditionally has focused on communicable
diseases, surveillance systems now exist that target injuries, chronic diseases,
genetic and birth defects, occupational and potentially environmentally-related
diseases, and health behaviors. Since September 11, 2001, a variety of systems
that rely on electronic reporting have been developed, including those that report
daily emergency department visits, sales of over-the-counter medicines, and
worker absenteeism.(19, 20) Because epidemiologists are likely to be called upon
to design and use these and other new surveillance systems, an epidemiologist’s
core competencies must include design of data collection instruments, data
management, descriptive methods and graphing, interpretation of data, and
scientific writing and presentation.

Field investigation

As noted above, surveillance provides information for action. One of the first
actions that results from a surveillance case report or report of a cluster is
investigation by the public health department. The investigation may be as limited
as a phone call to the health-care provider to confirm or clarify the circumstances
of the reported case, or it may involve a field investigation requiring the
coordinated efforts of dozens of people to characterize the extent of an epidemic
and to identify its cause.

The objectives of such investigations also vary. Investigations often lead to the
identification of additional unreported or unrecognized ill persons who might
otherwise continue to spread infection to others. For example, one of the
hallmarks of investigations of persons with sexually transmitted disease is the
identification of sexual partners or contacts of patients. When interviewed, many
of these contacts are found to be infected without knowing it, and are given
treatment they did not realize they needed. Identification and treatment of these
contacts prevents further spread.

For some diseases, investigations may identify a source or vehicle of infection
that can be controlled or eliminated. For example, the investigation of a case
of Escherichia coli O157:H7 infection usually focuses on trying to identify the
vehicle, often ground beef but sometimes something more unusual such as fruit
juice. By identifying the vehicle, investigators may be able to determine how
many other persons might have already been exposed and how many continue
to be at risk. When a commercial product turns out to be the culprit, public
announcements and recalling the product may prevent many additional cases.

Symbol of EIS

Occasionally, the objective of an investigation may simply be to learn more about
the natural history, clinical spectrum, descriptive epidemiology, and risk factors of
the disease before determining what disease intervention methods might be
appropriate. Early investigations of the epidemic of SARS in 2003 were needed to
establish a case definition based on the clinical presentation, and to characterize
the populations at risk by time, place, and person. As more was learned about
the epidemiology of the disease and communicability of the virus, appropriate
recommendations regarding isolation and quarantine were issued.(21)

Field investigations of the type described above are sometimes referred to as
“shoe leather epidemiology,” conjuring up images of dedicated, if haggard,
epidemiologists beating the pavement in search of additional cases and clues
regarding source and mode of transmission. This approach is commemorated in
the symbol of the Epidemic Intelligence Service (EIS), CDC’s training program for
disease detectives — a shoe with a hole in the sole.

Analytic studies

Surveillance and field investigations are usually sufficient to identify causes,
modes of transmission, and appropriate control and prevention measures. But
sometimes analytic studies employing more rigorous methods are needed. Often
the methods are used in combination — with surveillance and field investigations
providing clues or hypotheses about causes and modes of transmission, and
analytic studies evaluating the credibility of those hypotheses.

Clusters or outbreaks of disease frequently are investigated initially with
descriptive epidemiology. The descriptive approach involves the study of disease
incidence and distribution by time, place, and person. It includes the calculation
of rates and identification of parts of the population at higher risk than others.
Occasionally, when the association between exposure and disease is quite
strong, the investigation may stop when descriptive epidemiology is complete
and control measures may be implemented immediately. John Snow’s 1854
investigation of cholera is an example. More frequently, descriptive studies, like
case investigations, generate hypotheses that can be tested with analytic studies.
While some field investigations are conducted in response to acute health
problems such as outbreaks, many others are planned studies.

The hallmark of an analytic epidemiologic study is the use of a valid comparison
group. Epidemiologists must be skilled in all aspects of such studies, including
design, conduct, analysis, interpretation, and communication of findings.

• Design includes determining the appropriate research strategy and study
design, writing justifications and protocols, calculating sample sizes,
deciding on criteria for subject selection (e.g., developing case definitions),
choosing an appropriate comparison group, and designing questionnaires.
• Conduct involves securing appropriate clearances and approvals,
adhering to appropriate ethical principles, abstracting records, tracking
down and interviewing subjects, collecting and handling specimens, and
managing the data.
• Analysis begins with describing the characteristics of the subjects. It
progresses to calculation of rates, creation of comparative tables (e.g.,
two-by-two tables), and computation of measures of association (e.g., risk
ratios or odds ratios), tests of significance (e.g., chi-square test),
confidence intervals, and the like. Many epidemiologic studies require
more advanced analytic techniques such as stratified analysis, regression,
and modeling.
• Finally, interpretation involves putting the study findings into perspective,
identifying the key take-home messages, and making sound
recommendations. Doing so requires that the epidemiologist be
knowledgeable about the subject matter and the strengths and
weaknesses of the study.
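The analysis step can be illustrated with a minimal Python sketch that computes a risk ratio, an odds ratio, and a Pearson chi-square statistic (without continuity correction) from a two-by-two table. The class name TwoByTwo, the cell layout a/b/c/d, and the counts in the example are hypothetical, chosen only for illustration; a real study would also report confidence intervals and would normally rely on established statistical software.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Hypothetical two-by-two table layout:

                  ill   not ill
    exposed        a       b
    unexposed      c       d
    """
    a: int
    b: int
    c: int
    d: int

    def risk_ratio(self) -> float:
        # Risk (attack rate) among the exposed divided by risk among the unexposed.
        risk_exposed = self.a / (self.a + self.b)
        risk_unexposed = self.c / (self.c + self.d)
        return risk_exposed / risk_unexposed

    def odds_ratio(self) -> float:
        # Cross-product ratio: (a * d) / (b * c).
        return (self.a * self.d) / (self.b * self.c)

    def chi_square(self) -> float:
        # Pearson chi-square, no continuity correction:
        # N(ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)]
        n = self.a + self.b + self.c + self.d
        numerator = n * (self.a * self.d - self.b * self.c) ** 2
        denominator = ((self.a + self.b) * (self.c + self.d)
                       * (self.a + self.c) * (self.b + self.d))
        return numerator / denominator


# Hypothetical counts: 30 of 50 exposed and 10 of 50 unexposed became ill.
table = TwoByTwo(a=30, b=20, c=10, d=40)
print(round(table.risk_ratio(), 2))
print(round(table.odds_ratio(), 2))
print(round(table.chi_square(), 2))
```

With these made-up counts the risk ratio is 3.0 (60% versus 20%), the odds ratio is 6.0, and the chi-square statistic is about 16.7.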

Evaluation

Epidemiologists, who are accustomed to using systematic and quantitative
approaches, have come to play an important role in evaluation of public health
services and other activities. Evaluation is the process of determining, as
systematically and objectively as possible, the relevance, effectiveness,
efficiency, and impact of activities with respect to established goals.(22)

• Effectiveness refers to the ability of a program to produce the intended or
expected results in the field; effectiveness differs from efficacy, which is
the ability to produce results under ideal conditions.
• Efficiency refers to the ability of the program to produce the intended
results with a minimum expenditure of time and resources.

The evaluation itself may focus on plans (formative evaluation), operations
(process evaluation), impact (summative evaluation), or outcomes — or any
combination of these. Evaluation of an immunization program, for example, might
assess the efficiency of the operations, the proportion of the target population
immunized, and the apparent impact of the program on the incidence of vaccine-
preventable diseases. Similarly, evaluation of a surveillance system might
address operations and attributes of the system, its ability to detect cases or
outbreaks, and its usefulness.(23)

Linkages

Epidemiologists working in public health settings rarely act in isolation. In fact,
field epidemiology is often said to be a “team sport.” During an investigation an
epidemiologist usually participates as either a member or the leader of a
multidisciplinary team. Other team members may be laboratorians, sanitarians,
infection control personnel, nurses or other clinical staff, and, increasingly,
computer information specialists. Many outbreaks cross geographical and
jurisdictional lines, so co-investigators may be from local, state, or federal levels
of government, academic institutions, clinical facilities, or the private sector. To
promote current and future collaboration, epidemiologists need to maintain
relationships with staff of other agencies and institutions. Mechanisms for
sustaining such linkages include official memoranda of understanding, sharing of
published or on-line information for public health audiences and outside partners,
and informal networking that takes place at professional meetings.

Policy development

The definition of epidemiology ends with the following phrase: “…and the
application of this study to the control of health problems.” While some
academically minded epidemiologists have stated that epidemiologists should
stick to research and not get involved in policy development or even make
recommendations, (24) public health epidemiologists do not have this luxury.
Indeed, epidemiologists who understand a problem and the population in which it
occurs are often in a uniquely qualified position to recommend appropriate
interventions. As a result, epidemiologists working in public health regularly
provide input, testimony, and recommendations regarding disease control
strategies, reportable disease regulations, and health-care policy.

Exercise 1.3

Match the appropriate core function to each of the statements below.

A. Public health surveillance
B. Field investigation
C. Analytic studies
D. Evaluation
E. Linkages
F. Policy development

____ 1. Reviewing reports of test results for Chlamydia trachomatis from
public health clinics
____ 2. Meeting with directors of family planning clinics and college health
clinics to discuss Chlamydia testing and reporting
____ 3. Developing guidelines/criteria about which patients coming to the
clinic should be screened (tested) for Chlamydia infection
____ 4. Interviewing persons infected with Chlamydia to identify their sex
partners
____ 5. Conducting an analysis of patient flow at the public health clinic to
determine waiting times for clinic patients
____ 6. Comparing persons with symptomatic versus asymptomatic Chlamydia
infection to identify predictors

Section 5: The Epidemiologic Approach


As with all scientific endeavors, the practice of epidemiology relies on a
systematic approach. In very simple terms, the epidemiologist:

• Counts cases or health events, and describes them in terms of time, place,
and person;
• Divides the number of cases by an appropriate denominator to calculate
rates; and
• Compares these rates over time or for different groups of people.

An epidemiologist:

• Counts
• Divides
• Compares
Before counting cases, however, the epidemiologist must decide what a case is.
This is done by developing a case definition. Then, using this case definition, the
epidemiologist finds and collects information about the case-patients. The
epidemiologist then performs descriptive epidemiology by characterizing the
cases collectively according to time, place, and person. To calculate the disease
rate, the epidemiologist divides the number of cases by the size of the population.
Finally, to determine whether this rate is greater than what one would normally
expect, and if so to identify factors contributing to this increase, the
epidemiologist compares the rate from this population to the rate in an
appropriate comparison group, using analytic epidemiology techniques. These
epidemiologic actions are described in more detail below. Subsequent tasks,
such as reporting the results and recommending how they can be used for public
health action, are just as important, but are beyond the scope of this lesson.
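The count, divide, compare sequence can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the line list of groups, the population sizes, and the group labels "A" and "B" are all hypothetical, and each listed person is assumed to have already met the case definition.

```python
from collections import Counter

# Hypothetical case-patients who met the case definition; only the group is kept.
cases = ["A", "A", "A", "B", "A", "B", "A", "A"]

# Hypothetical population sizes (denominators) for each group.
population = {"A": 20_000, "B": 40_000}

# 1. Count: number of cases in each group.
counts = Counter(cases)

# 2. Divide: rate per 100,000 population (multiply first to limit rounding error).
rates = {group: counts[group] * 100_000 / population[group] for group in population}

# 3. Compare: rate in group A relative to the rate in comparison group B.
rate_ratio = rates["A"] / rates["B"]

print(counts, rates, rate_ratio)
```

Here group A has 6 cases (30 per 100,000) and group B has 2 (5 per 100,000), so the rate in A is six times the rate in B even though A is the smaller group.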

Defining a case

Before counting cases, the epidemiologist must decide what to count, that is,
what to call a case. For that, the epidemiologist uses a case definition. A case
definition is a set of standard criteria for classifying whether a person has a
particular disease, syndrome, or other health condition. Some case definitions,
particularly those used for national surveillance, have been developed and
adopted as national standards that ensure comparability. Use of an agreed-upon
standard case definition ensures that every case is equivalent, regardless of
when or where it occurred, or who identified it. Furthermore, the number of cases
or rate of disease identified in one time or place can be compared with the
number or rate from another time or place. For example, with a standard case
definition, health officials could compare the number of cases of listeriosis that
occurred in Forsyth County, North Carolina in 2000 with the number that occurred
there in 1999. Or they could compare the rate of listeriosis in Forsyth County in
2000 with the national rate in that same year. When everyone uses the same
standard case definition and a difference is observed, the difference is likely to
be real rather than the result of variation in how cases are classified.

To ensure that all health departments in the United States use the same case
definitions for surveillance, the Council of State and Territorial Epidemiologists
(CSTE), CDC, and other interested parties have adopted standard case
definitions for the notifiable infectious diseases.(25) These definitions are revised
as needed. In 1999, to address the need for common definitions and methods for
state-level chronic disease surveillance, CSTE, the Association of State and
Territorial Chronic Disease Program Directors, and CDC adopted standard
definitions for 73 chronic disease indicators.(29)

Other case definitions, particularly those used in local outbreak investigations,
are often tailored to the local situation. For example, a case definition developed
for an outbreak of viral illness might require laboratory confirmation where such
laboratory services are available, but likely would not if such services were not
readily available.

Components of a case definition for outbreak investigations

A case definition consists of clinical criteria and, sometimes, limitations on time,
place, and person. The clinical criteria usually include confirmatory laboratory
tests, if available, or combinations of symptoms (subjective complaints), signs
(objective physical findings), and other findings. Case definitions used during
outbreak investigations are more likely to specify limits on time, place, and/or
person than those used for surveillance. Contrast the case definition used for
surveillance of listeriosis (see box below) with the case definition used during an
investigation of a listeriosis outbreak in North Carolina in 2000.(25, 26)

Both the national surveillance case definition and the outbreak case definition
require a clinically compatible illness and laboratory confirmation of Listeria
monocytogenes from a normally sterile site, but the outbreak case definition adds
restrictions on time and place, reflecting the scope of the outbreak.

Listeriosis — Surveillance Case Definition

Clinical description
Infection caused by Listeria monocytogenes, which may produce any of several
clinical syndromes, including stillbirth, listeriosis of the newborn, meningitis,
bacteremia, or localized infections

Laboratory criteria for diagnosis
Isolation of L. monocytogenes from a normally sterile site (e.g., blood or
cerebrospinal fluid or, less commonly, joint, pleural, or pericardial fluid)

Case classification
Confirmed: a clinically compatible case that is laboratory confirmed
Source: Centers for Disease Control and Prevention. Case definitions for
infectious conditions under public health surveillance. MMWR Recommendations
and Reports 1997:46(RR-10):49-50.

Listeriosis — Outbreak Investigation

Case definition
Clinically compatible illness with L. monocytogenes isolated
• From a normally sterile site
• In a resident of Winston-Salem, North Carolina
• With onset between October 24, 2000 and January 4, 2001
Source: MacDonald P, Boggs J, Whitwam R, Beatty M, Hunter S, MacCormack N, et al. Listeria-associated birth complications linked with
homemade Mexican-style cheese, North Carolina, October 2000 [abstract]. 50th Annual Epidemic Intelligence Service Conference; 2001 Apr
23–27; Atlanta, GA.

Many case definitions, such as that shown for listeriosis, require laboratory
confirmation. This is not always necessary, however; in fact, some diseases have
no distinctive laboratory findings. Kawasaki syndrome, for example, is a
childhood illness with fever and rash that has no known cause and no specifically
distinctive laboratory findings. Notice that its case definition (see box below) is
based on the presence of fever, at least four of five specified clinical findings, and
the lack of a more reasonable explanation.

Kawasaki Syndrome — Case Definition

Clinical description
A febrile illness of greater than or equal to 5 days’ duration, with at least four of
the five following physical findings and no other more reasonable explanation for
the observed clinical findings:
• Bilateral conjunctival injection
• Oral changes (erythema of lips or oropharynx, strawberry tongue, or
fissuring of the lips)
• Peripheral extremity changes (edema, erythema, or generalized or
periungual desquamation)
• Rash
• Cervical lymphadenopathy (at least one lymph node greater than or equal
to 1.5 cm in diameter)
Laboratory criteria for diagnosis
None

Case classification
Confirmed: a case that meets the clinical case definition

Comment: If fever disappears after intravenous gamma globulin therapy is
started, fever may be of less than 5 days’ duration, and the clinical case definition
may still be met.
Source: Centers for Disease Control and Prevention. Case definitions for infectious conditions under public health surveillance. MMWR
Recommendations and Reports 1990:39(RR-13):18.
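Because the Kawasaki syndrome definition is purely clinical, it translates directly into a rule. The Python function below is a minimal sketch under stated assumptions: the string labels for the five findings, the function name, and the fever_resolved_after_ivig flag (a simplified encoding of the Comment about intravenous gamma globulin) are hypothetical, not part of any official system.

```python
# Hypothetical labels for the five physical findings listed in the case definition.
KAWASAKI_FINDINGS = frozenset({
    "bilateral_conjunctival_injection",
    "oral_changes",
    "peripheral_extremity_changes",
    "rash",
    "cervical_lymphadenopathy",
})


def meets_kawasaki_definition(
    fever_days: int,
    findings: set,
    other_explanation: bool,
    fever_resolved_after_ivig: bool = False,
) -> bool:
    """Return True if the clinical case definition is met (no laboratory criteria apply)."""
    # Fever must last >= 5 days, unless it disappeared after intravenous
    # gamma globulin (IVIG) therapy was started, per the Comment.
    fever_ok = fever_days >= 5 or (fever_days > 0 and fever_resolved_after_ivig)
    # Count only the five specified findings; any other entries are ignored.
    n_findings = len(KAWASAKI_FINDINGS & set(findings))
    return fever_ok and n_findings >= 4 and not other_explanation


patient = {"bilateral_conjunctival_injection", "oral_changes", "rash",
           "cervical_lymphadenopathy"}
print(meets_kawasaki_definition(6, patient, other_explanation=False))  # True
print(meets_kawasaki_definition(3, patient, other_explanation=False))  # False
```

The "no other more reasonable explanation" criterion cannot be computed from data; in this sketch it is simply a judgment passed in by the investigator.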

Criteria in case definitions

A case definition may have several sets of criteria, depending on how certain the
diagnosis is. For example, during an investigation of a possible case or outbreak
of measles, a person with a fever and rash might be classified as having a
suspected, probable, or confirmed case of measles, depending on what evidence
of measles is present (see box below).

Measles (Rubeola) — 1996 Case Definition

Clinical description
An illness characterized by all the following:
• A generalized rash lasting greater than or equal to 3 days
• A temperature greater than or equal to 101.0°F (greater than or equal to
38.3°C)
• Cough, coryza, or conjunctivitis

Laboratory criteria for diagnosis
• Positive serologic test for measles immunoglobulin M antibody, or
• Significant rise in measles antibody level by any standard serologic assay,
or
• Isolation of measles virus from a clinical specimen

Case classification
Suspected:
Any febrile illness accompanied by rash

Probable:

A case that meets the clinical case definition, has noncontributory or no serologic
or virologic testing, and is not epidemiologically linked to a confirmed case

Confirmed:

A case that is laboratory confirmed or that meets the clinical case definition and
is epidemiologically linked to a confirmed case. (A laboratory-confirmed case
does not need to meet the clinical case definition.)

Comment: Confirmed cases should be reported to the National Notifiable Diseases
Surveillance System. An imported case has its source outside the country or
state. Rash onset occurs within 18 days after entering the jurisdiction, and illness
cannot be linked to local transmission. Imported cases should be classified as:
• International. A case that is imported from another country
• Out-of-State. A case that is imported from another state in the United
States. The possibility that a patient was exposed within his or her state of
residence should be excluded; therefore, the patient either must have been
out of state continuously for the entire period of possible exposure (at least
7-18 days before onset of rash) or have had one of the following types of
exposure while out of state: a) face-to-face contact with a person who had
either a probable or confirmed case or b) attendance in the same
institution as a person who had a case of measles (e.g., in a school,
classroom, or day care center).
An indigenous case is defined as a case of measles that is not imported. Cases
that are linked to imported cases should be classified as indigenous if the
exposure to the imported case occurred in the reporting state. Any case that
cannot be proved to be imported should be classified as indigenous.
Source: Centers for Disease Control and Prevention. Case definitions for infectious conditions under public health surveillance. MMWR
Recommendations and Reports 1997:46(RR-10):23–24.

A case might be classified as suspected or probable while waiting for the
laboratory results to become available. Once the laboratory provides the report,
the case can be reclassified as either confirmed or “not a case,” depending on
the laboratory results. In the midst of a large outbreak of a disease caused by a
known agent, some cases may be permanently classified as suspected or
probable because officials may feel that running laboratory tests on every patient
with a consistent clinical picture and a history of exposure (e.g., chickenpox) is
unnecessary and even wasteful. Case definitions should not rely on laboratory
culture results alone, since organisms are sometimes present without causing
disease.
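The suspected/probable/confirmed hierarchy in the 1996 measles definition, and the reclassification that follows a laboratory report, can be sketched in Python. This is a simplified illustration, not an official algorithm: the record layout and names are hypothetical, the imported/indigenous classification is ignored, the fever threshold for "suspected" is left to the has_fever flag, and a negative laboratory result is assumed to reclassify the person as "not a case," following the paragraph above.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class MeaslesReport:
    """Hypothetical record layout for one person with fever and/or rash."""
    has_fever: bool
    rash_days: int              # duration of generalized rash in days (0 = no rash)
    max_temp_f: float           # highest recorded temperature in °F
    cough_coryza_or_conj: bool  # cough, coryza, or conjunctivitis present
    lab_result: Optional[str]   # "positive", "negative", or None (pending/not done)
    epi_linked: bool            # epidemiologically linked to a confirmed case


def meets_clinical_definition(r: MeaslesReport) -> bool:
    # All three clinical criteria must be present.
    return r.rash_days >= 3 and r.max_temp_f >= 101.0 and r.cough_coryza_or_conj


def classify_measles(r: MeaslesReport) -> str:
    if r.lab_result == "positive":
        # A laboratory-confirmed case need not meet the clinical definition.
        return "confirmed"
    if r.lab_result == "negative":
        # Simplifying assumption: a negative result reclassifies the person.
        return "not a case"
    if meets_clinical_definition(r):
        return "confirmed" if r.epi_linked else "probable"
    if r.has_fever and r.rash_days > 0:
        return "suspected"
    return "not a case"


pending = MeaslesReport(True, 4, 102.2, True, None, False)
print(classify_measles(pending))                                   # probable
print(classify_measles(replace(pending, lab_result="positive")))   # confirmed
```

Using dataclasses.replace to produce an updated record mirrors how a case is reclassified once the laboratory report arrives, while the original report is kept unchanged.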

Modifying case definitions

Case definitions can also change over time as more information is obtained. The
first case definition for SARS, based on clinical symptoms and either contact with
a case or travel to an area with SARS transmission, was published in CDC’s
Morbidity and Mortality Weekly Report (MMWR) on March 21, 2003 (see box
below).(27) Two weeks later it was modified slightly. On April 29, after a novel
coronavirus was determined to be the causative agent, an interim surveillance
case definition was published that included laboratory criteria for evidence of
infection with the SARS-associated coronavirus. By June, the case definition had
changed several more times. In anticipation of a new wave of cases in 2004, a
revised and much more complex case definition was published in December
2003.(28)

CDC Preliminary Case Definition for Severe Acute Respiratory Syndrome (SARS) — March 21, 2003

Suspected case
Respiratory illness of unknown etiology with onset since February 1, 2003, and
the following criteria:
• Documented temperature > 100.4°F (>38.0°C)
• One or more symptoms with respiratory illness (e.g., cough, shortness of
breath, difficulty breathing, or radiographic findings of pneumonia or acute
respiratory distress syndrome)
• Close contact* within 10 days of onset of symptoms with a person under
investigation for or suspected of having SARS or travel within 10 days of
onset of symptoms to an area with documented transmission of SARS as
defined by the World Health Organization (WHO)
* Defined as having cared for, having lived with, or having had direct contact with
respiratory secretions and/or body fluids of a person suspected of having SARS.
Source: Centers for Disease Control and Prevention. Outbreak of severe acute respiratory syndrome–worldwide, 2003. MMWR 2003:52:226–8.

Variation in case definitions

Case definitions may also vary according to the purpose for classifying the
occurrences of a disease. For example, health officials need to know as soon as
possible if anyone has symptoms of plague or anthrax so that they can begin
planning what actions to take. For such rare but potentially severe communicable
diseases, for which it is important to identify every possible case, health officials
use a sensitive case definition. A sensitive case definition is one that is broad or
“loose,” in the hope of capturing most or all of the true cases. For example, the
case definition for a suspected case of rubella (German measles) is “any
generalized rash illness of acute onset.” (25) This definition is quite broad, and
would include not only all cases of rubella, but also measles, chickenpox, and
rashes due to other causes such as drug allergies. So while the advantage of a
sensitive case definition is that it includes most or all of the true cases, the
disadvantage is that it sometimes includes other illnesses as well.

On the other hand, an investigator studying the causes of a disease outbreak
usually wants to be certain that any person included in a study really had the
disease. That investigator will prefer a specific or “strict” case definition. For
instance, in an outbreak of Salmonella Agona infection, the investigators would
be more likely to identify the source of the infection if they included only persons
who were confirmed to have been infected with that organism, rather than
including anyone with acute diarrhea, because some persons may have had
diarrhea from a different cause. In this setting, the only disadvantages of a strict
case definition are the requirement that everyone with symptoms be tested and
an underestimation of the total number of cases if some people with
salmonellosis are not tested.
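The trade-off between a sensitive ("loose") and a specific ("strict") case definition can be shown by applying both to the same line list. This is a minimal sketch: the five records, their field names, and the two predicate functions are hypothetical, with None standing for "not tested."

```python
# Hypothetical line list from a diarrheal outbreak; None means the person was not tested.
records = [
    {"id": 1, "acute_diarrhea": True,  "lab_agona": True},
    {"id": 2, "acute_diarrhea": True,  "lab_agona": False},  # tested; diarrhea from another cause
    {"id": 3, "acute_diarrhea": True,  "lab_agona": None},   # symptomatic but never tested
    {"id": 4, "acute_diarrhea": False, "lab_agona": None},
    {"id": 5, "acute_diarrhea": True,  "lab_agona": True},
]


def loose_definition(r: dict) -> bool:
    # Sensitive: anyone with acute diarrhea counts as a case.
    return bool(r["acute_diarrhea"])


def strict_definition(r: dict) -> bool:
    # Specific: only laboratory-confirmed Salmonella Agona infection counts.
    return r["lab_agona"] is True


loose_cases = [r["id"] for r in records if loose_definition(r)]
strict_cases = [r["id"] for r in records if strict_definition(r)]
print(loose_cases, strict_cases)
```

The loose definition captures persons 1, 2, 3, and 5, including person 2, whose diarrhea had another cause. The strict definition keeps only persons 1 and 5; it excludes person 2 correctly but also drops person 3, who may have been a true case that was never tested, which is the undercounting described above.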

Exercise 1.4

Investigators of an outbreak of trichinosis used a case definition with the
following categories:

Clinical Criteria
Confirmed case:
Signs and symptoms plus laboratory confirmation

Probable case:

Acute onset of at least three of the following four features: myalgia, fever, facial
edema, or eosinophil count greater than 500/mm³

Possible case:

Acute onset of two of the four features plus a physician diagnosis of trichinosis

Suspect case:

Unexplained eosinophilia

Not a case:

Failure to fulfill the criteria for a confirmed, probable, possible, or suspect case

Time:

Onset after October 1, 2006

Place:

Metropolitan Atlanta

Person:

Any

Using this case definition, assign the appropriate classification to each of the
persons included in the line listing below. Use the highest-ranking classification
possible. (All were residents of Atlanta with acute onset of symptoms in
November.)

ID#  Last Name   Myalgias  Fever  Facial Edema  Eosinophil Count  Physician Diagnosis   Laboratory Confirmation
1    Anderson    yes       yes    no            495               trichinosis           yes
2    Buffington  yes       yes    yes           pending           possible trichinosis  pending
3    Callahan    yes       yes    no            1,100             possible trichinosis  pending
4    Doll        yes       yes    no            2,050             EMS*                  pending
5    Ehrlich     no        yes    no            600               trichinosis           not done

* EMS = eosinophilia-myalgia syndrome

Consider the preliminary case definition for SARS presented above (page 1–26 in
the original book). Explain how the case definition might address the purposes
listed below.

1. Diagnosing and caring for individual patients
2. Tracking the occurrence of disease
3. Doing research to identify the cause of the disease
4. Deciding who should be quarantined (quarantine is the separation or
restriction of movement of persons who are not ill but are believed to have
been exposed to infection, to prevent further transmission)

Using counts and rates

As noted, one of the basic tasks in public health is identifying and counting cases.
These counts, usually derived from case reports submitted by health-care
workers and laboratories to the health department, allow public health officials to
determine the extent and patterns of disease occurrence by time, place, and
person. They may also indicate clusters or outbreaks of disease in the
community.

Counts are also valuable for health planning. For example, a health official might
use counts (i.e., numbers) to plan how many infection control isolation units or
doses of vaccine may be needed.

Rate:
the number of cases
divided by
the size of the population per unit of time
However, simple counts do not provide all the information a health department
needs. For some purposes, the counts must be put into context, based on the
population in which they arose. Rates are measures that relate the numbers of
cases during a certain period of time (usually per year) to the size of the
population in which they occurred. For example, 42,745 new cases of AIDS were
reported in the United States in 2002.(30) This number, divided by the estimated
2002 population, results in a rate of 15.3 cases per 100,000 population. Rates are
particularly useful for comparing the frequency of disease in different locations
whose populations differ in size. For example, in 2003, Pennsylvania had over
twelve times as many births (140,660) as its neighboring state, Delaware (11,264).
However, Pennsylvania has roughly 15 times the population of Delaware, so a
fairer way to compare the two states is to calculate rates. In fact, the birth rate was greater
in Delaware (13.8 per 1,000 women aged 15–44 years) than in Pennsylvania (11.4
per 1,000 women aged 15–44 years).(31)
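The rate calculation, and the reason Delaware can have the higher birth rate despite far fewer births, can be checked with a short Python sketch. The rate() helper and its per parameter are hypothetical names; the births and rates are the figures quoted in the text. Because each denominator equals births divided by the rate, the ratio of the two denominators can be recovered from the quoted numbers.

```python
def rate(cases: float, population: float, per: int = 100_000) -> float:
    """Cases per `per` persons in the population over the period of interest."""
    return cases * per / population


# Births in 2003, as quoted in the text.
births_pa, births_de = 140_660, 11_264
# Birth rates per 1,000 women aged 15-44 years, as quoted in the text.
rate_pa, rate_de = 11.4, 13.8

# Ratio of raw counts: "over twelve times as many births."
count_ratio = births_pa / births_de

# Denominator = births / rate, so the ratio of denominators is
# (births_pa / births_de) * (rate_de / rate_pa).
implied_denominator_ratio = count_ratio * (rate_de / rate_pa)

print(round(count_ratio, 1), round(implied_denominator_ratio, 1))
print(rate(30, 200_000))  # hypothetical: 30 cases in 200,000 persons
```

Working backward this way, Pennsylvania has about 12.5 times as many births but about 15 times as many women aged 15–44 years, so its rate is lower. The last line is a made-up example showing that 30 cases in a population of 200,000 is a rate of 15 per 100,000.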

Rates are also useful for comparing disease occurrence during different periods
of time. For example, 19.5 cases of chickenpox per 100,000 were reported in 2001
compared with 135.8 cases per 100,000 in 1991. In addition, rates of disease
among different subgroups can be compared to identify those at increased risk of
disease. These so-called high-risk groups can be further assessed and targeted
for special intervention. High-risk groups can also be studied to identify risk
factors that cause them to have increased risk of disease. While some risk
factors such as age and family history of breast cancer may not be modifiable,
others, such as smoking and unsafe sexual practices, are. Individuals can use
knowledge of the modifiable risk factors to guide decisions about behaviors that
influence their health.
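Comparing rates across time periods or subgroups usually comes down to a ratio or a percent change. The helper names below are hypothetical; the chickenpox rates (per 100,000) are the ones quoted in the text. This is a minimal sketch, not a full trend analysis.

```python
def rate_ratio(rate_a: float, rate_b: float) -> float:
    """How many times higher rate_a is than rate_b."""
    return rate_a / rate_b


def percent_change(old: float, new: float) -> float:
    """Percent change from the old rate to the new rate (negative = decline)."""
    return (new - old) / old * 100


# Reported chickenpox rates per 100,000, as quoted in the text.
chickenpox_1991, chickenpox_2001 = 135.8, 19.5

print(round(rate_ratio(chickenpox_1991, chickenpox_2001), 1))
print(round(percent_change(chickenpox_1991, chickenpox_2001), 1))
```

By these figures the 1991 rate was roughly seven times the 2001 rate, a decline of about 86%. The same two functions apply when comparing a high-risk subgroup with the rest of the population.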

Section 6: Descriptive Epidemiology


The 5W’s of descriptive epidemiology:
What = health issue of concern
Who = person
Where = place
When = time
Why/how = causes, risk factors, modes of transmission
As noted earlier, every novice newspaper reporter is taught that a story is
incomplete if it does not describe the what, who, where, when, and why/how of a
situation, whether it be a space shuttle launch or a house fire. Epidemiologists
strive for similar comprehensiveness in characterizing an epidemiologic event,
whether it be a pandemic of influenza or a local increase in all-terrain vehicle
crashes. However, epidemiologists tend to use synonyms for the five W’s listed
above: case definition, person, place, time, and causes/risk factors/modes of
transmission. Descriptive epidemiology covers time, place, and person.

Compiling and analyzing data by time, place, and person is desirable for several
reasons.

• First, by looking at the data carefully, the epidemiologist becomes very
familiar with the data. He or she can see what the data can or cannot
reveal based on the variables available, their limitations (for example, the
number of records with missing information for each important variable),
and their eccentricities (for example, the cases range in age from 2 months to
6 years, plus one 17-year-old).
• Second, the epidemiologist learns the extent and pattern of the public
health problem being investigated — which months, which neighborhoods,
and which groups of people have the most and least cases.
• Third, the epidemiologist creates a detailed description of the health of a
population that can be easily communicated with tables, graphs, and maps.
• Fourth, the epidemiologist can identify areas or groups within the
population that have high rates of disease. This information in turn
provides important clues to the causes of the disease, and these clues can
be turned into testable hypotheses.
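Getting familiar with the data often starts with two quick checks: how many records are missing each variable, and what range the values span. The Python sketch below runs both on a hypothetical line list (the field names and values are invented for illustration), with None marking missing information.

```python
# Hypothetical line list; None marks missing information.
line_list = [
    {"age_years": 0.17, "sex": "F", "county": "North"},
    {"age_years": 3.0,  "sex": None, "county": "North"},
    {"age_years": 6.0,  "sex": "M", "county": None},
    {"age_years": 17.0, "sex": "M", "county": "South"},
    {"age_years": None, "sex": "F", "county": "South"},
]


def missing_by_variable(rows: list) -> dict:
    """Count the records with a missing (None) value for each variable."""
    variables = {var for row in rows for var in row}
    return {var: sum(1 for row in rows if row.get(var) is None)
            for var in sorted(variables)}


missing = missing_by_variable(line_list)

# Sorted non-missing ages make outliers easy to spot by eye.
ages = sorted(r["age_years"] for r in line_list if r["age_years"] is not None)

print(missing)
print(ages)
```

In this made-up list each variable is missing once, and the sorted ages show the same kind of eccentricity mentioned above: young children plus a single 17-year-old, a record worth verifying before analysis.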

Time

The occurrence of disease changes over time. Some of these changes occur
regularly, while others are unpredictable. Two diseases that occur during the
same season each year include influenza (winter) and West Nile virus infection
(August–September). In contrast, diseases such as hepatitis B and salmonellosis
can occur at any time. For diseases that occur seasonally, health officials can
anticipate their occurrence and implement control and prevention measures,
such as an influenza vaccination campaign or mosquito spraying. For diseases
that occur sporadically, investigators can conduct studies to identify the causes
and modes of spread, and then develop appropriately targeted actions to control
or prevent further occurrence of the disease.

In either situation, displaying the patterns of disease occurrence by time is critical
for monitoring disease occurrence in the community and for assessing whether
the public health interventions made a difference.

Time data are usually displayed with a two-dimensional graph. The vertical or y-
axis usually shows the number or rate of cases; the horizontal or x-axis shows
the time periods such as years, months, or days. The number or rate of cases is
plotted over time. Graphs of disease occurrence over time are usually plotted as
line graphs (Figure 1.4) or histograms (Figure 1.5).
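Before a histogram can be drawn, cases must be tallied by time period, including periods with no cases so that gaps show up on the x-axis. The Python sketch below uses hypothetical onset dates, tallies them by month, and prints a simple text histogram as a stand-in for a plotted graph.

```python
from collections import Counter
from datetime import date

# Hypothetical symptom-onset dates for reported cases.
onsets = [date(2004, 1, 5), date(2004, 1, 20), date(2004, 3, 3),
          date(2004, 3, 14), date(2004, 3, 28), date(2004, 4, 9)]

# y-axis values: number of cases in each (year, month).
by_month = Counter((d.year, d.month) for d in onsets)

# Walk every month from the first to the last onset so that months
# with zero cases still appear on the x-axis.
year, month = min(by_month)
last = max(by_month)
rows = []
while (year, month) <= last:
    rows.append((f"{year}-{month:02d}", by_month[(year, month)]))
    month += 1
    if month > 12:
        year, month = year + 1, 1

for label, n in rows:
    print(f"{label} {'#' * n} ({n})")
```

Choosing months as the unit is an assumption of this example; for a foodborne outbreak the same loop would be run over days or hours instead.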

Figure 1.4 Reported Cases of Salmonellosis per 100,000 Population, by
Year — United States, 1972–2002

Source: Centers for Disease Control and Prevention. Summary of notifiable diseases–United States, 2002. Published April 30, 2004, for
MMWR 2002;51(No. 53): p. 59.

Figure 1.5 Number of Intussusception Reports After the Rhesus Rotavirus
Vaccine-tetravalent (RRV-TV) by Vaccination Date — United States,
September 1998–December 1999

Source: Zhou W, Pool V, Iskander JK, English-Bullard R, Ball R, Wise RP, et al. In: Surveillance Summaries, January 24, 2003. MMWR
2003;52(No. SS-1):1–26.
Sometimes a graph shows the timing of events that are related to disease trends
being displayed. For example, the graph may indicate the period of exposure or
the date control measures were implemented. Studying a graph that notes the
period of exposure may lead to insights into what may have caused illness.
Studying a graph that notes the timing of control measures shows what impact, if
any, the measures may have had on disease occurrence.

As noted above, time is plotted along the x-axis. Depending on the disease, the
time scale may be as broad as years or decades, or as brief as days or even
hours of the day. For some conditions — many chronic diseases, for example —
epidemiologists tend to be interested in long-term trends or patterns in the
number of cases or the rate. For other conditions, such as foodborne outbreaks,
the relevant time scale is likely to be days or hours. Some of the common types
of time-related graphs are further described below. These and other graphs are
described in more detail in Lesson 4.

Secular (long-term) trends. Graphing the annual cases or rate of a disease
over a period of years shows long-term or secular trends in the occurrence of the
disease (Figure 1.4). Health officials use these graphs to assess the prevailing
direction of disease occurrence (increasing, decreasing, or essentially flat), help
them evaluate programs or make policy decisions, infer what caused an increase
or decrease in the occurrence of a disease (particularly if the graph indicates
when related events took place), and use past trends as a predictor of future
incidence of disease.
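One simple way to summarize the prevailing direction of a secular trend is the slope of a least-squares line through the annual rates. The Python sketch below is an illustration only: the function names, the tolerance threshold for calling a trend "essentially flat," and the example rates are all hypothetical, and a formal analysis would also consider random variation and changes in reporting.

```python
def ols_slope(xs: list, ys: list) -> float:
    """Slope of the ordinary least-squares line through (x, y) points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def trend_direction(years: list, rates: list, tolerance: float = 0.05) -> str:
    """Classify the trend; `tolerance` is a hypothetical cutoff in rate units per year."""
    slope = ols_slope(years, rates)
    if slope > tolerance:
        return "increasing"
    if slope < -tolerance:
        return "decreasing"
    return "essentially flat"


years = [2000, 2001, 2002, 2003, 2004]
print(trend_direction(years, [10, 12, 14, 16, 18]))
print(trend_direction(years, [18, 16, 14, 12, 10]))
print(trend_direction(years, [5, 5, 5, 5, 5]))
```

A fitted slope is only a summary; the graph itself should still be inspected, since abrupt changes (for example, after a new vaccine or a change in case definition) are not captured by a single straight line.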

Seasonality. Disease occurrence can be graphed by week or month over the
course of a year or more to show its seasonal pattern, if any. Some diseases
such as influenza and West Nile infection are known to have characteristic
seasonal distributions. Seasonal patterns may suggest hypotheses about how
the infection is transmitted, what behavioral factors increase risk, and other
possible contributors to the disease or condition. Figure 1.6 shows the seasonal
patterns of rubella, influenza, and rotavirus. All three diseases display consistent
seasonal distributions, but each disease peaks in different months — rubella in
March to June, influenza in November to March, and rotavirus in February to
April. The rubella graph is striking for the epidemic that occurred in 1963 (rubella
vaccine was not available until 1969), but this epidemic nonetheless followed the
seasonal pattern.
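A seasonal pattern can be summarized by pooling counts for the same calendar month across several years and looking for the peak. This Python sketch uses hypothetical monthly counts keyed by (year, month); the function name is also hypothetical.

```python
import calendar
from collections import defaultdict

# Hypothetical monthly case counts keyed by (year, month).
monthly = {(2001, 1): 40, (2001, 2): 55, (2001, 7): 3,
           (2002, 1): 38, (2002, 2): 60, (2002, 7): 5}


def totals_by_calendar_month(counts: dict) -> dict:
    """Sum counts for the same calendar month across all years."""
    totals = defaultdict(int)
    for (year, month), n in counts.items():
        totals[month] += n
    return dict(totals)


totals = totals_by_calendar_month(monthly)
peak_month = max(totals, key=totals.get)
print(totals, calendar.month_name[peak_month])
```

Simple pooling lets a single epidemic year, such as the 1963 rubella epidemic, dominate the totals. Normalizing each year (for example, dividing by that year's total) before pooling is one way to keep the seasonal shape visible.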

Figure 1.6 Seasonal Pattern of Rubella, Influenza and Rotavirus

Source: Dowell SF. Seasonal Variation in Host Susceptibility and Cycles of Certain Infectious Diseases. Emerg Infect Dis. 2001;7:369–74.
Day of week and time of day. For some conditions, displaying data by day of
the week or time of day may be informative. Analysis at these shorter time
periods is particularly appropriate for conditions related to occupational or
environmental exposures that tend to occur at regularly scheduled intervals. In
Figure 1.7, farm tractor fatalities are displayed by day of the week.(32) Note that
the number of farm tractor fatalities on Sundays was about half the number on
the other days. The pattern of farm tractor fatalities by hour, as displayed in
Figure 1.8, peaked at 11:00 a.m., dipped at noon, and peaked again at 4:00 p.m.
These patterns may suggest hypotheses and possible explanations that could be
evaluated with further study. Figure 1.9 shows the hourly number of survivors and
rescuers presenting to local hospitals in New York following the attack on the
World Trade Center on September 11, 2001.
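Tallies like those behind Figures 1.7 and 1.8 can be sketched with Python's standard `datetime` and `collections` modules. This assumes each event is recorded as a `datetime`; the function names and the event times are made up for illustration.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def by_day_of_week(event_times):
    """Count events per weekday; datetime.weekday() returns 0 for Monday."""
    counts = Counter(t.weekday() for t in event_times)
    return {WEEKDAYS[i]: counts.get(i, 0) for i in range(7)}


def by_hour_of_day(event_times):
    """Count events per clock hour, 0 (midnight to 00:59) through 23."""
    counts = Counter(t.hour for t in event_times)
    return [counts.get(hour, 0) for hour in range(24)]


# Made-up event times: January 1, 2024 was a Monday and January 7 a Sunday.
events = [datetime(2024, 1, 1, 11, 5), datetime(2024, 1, 1, 16, 40),
          datetime(2024, 1, 7, 11, 30)]
print(by_day_of_week(events))
print(by_hour_of_day(events)[11], by_hour_of_day(events)[16])
```

Including days and hours with zero events matters: a dip such as the Sunday drop in Figure 1.7 is only visible if every category is reported.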

Figure 1.7 Farm Tractor Deaths by Day of Week


Figure 1.8 Farm Tractor Deaths by Hour of Day

Source: Goodman RA, Smith JD, Sikes RK, Rogers DL, Mickey JL. Fatalities associated with farm tractor injuries: an epidemiologic study.
Public Health Rep 1985;100:329–33.

Figure 1.9 World Trade Center Survivors and Rescuers

Source: Centers for Disease Control and Prevention. Rapid Assessment of Injuries Among Survivors of the Terrorist Attack on the World
Trade Center — New York City, September 2001. MMWR 2002;51:1–5.
Epidemic period. To show the time course of a disease outbreak or epidemic,
epidemiologists use a graph called an epidemic curve. As with the other graphs
presented so far, an epidemic curve’s y-axis shows the number of cases, while
the x-axis shows time as either date of symptom onset or date of diagnosis.
Depending on the incubation period (the length of time between exposure and
onset of symptoms) and routes of transmission, the scale on the x-axis can be as
broad as weeks (for a very prolonged epidemic) or as narrow as minutes (e.g.,
for food poisoning by chemicals that cause symptoms within minutes).
Conventionally, the data are displayed as a histogram (which is similar to a bar
chart but has no gaps between adjacent columns). Sometimes each case is
displayed as a square, as in Figure 1.10. The shape and other features of an
epidemic curve can suggest hypotheses about the time and source of exposure,
the mode of transmission, and the causative agent. Epidemic curves are
discussed in more detail in Lessons 4 and 6.
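The counting behind an epidemic curve can be sketched as follows: cases are sorted into equal-width time intervals, and the interval width plays the role of the x-axis scale discussed above. The function name `epi_curve` is hypothetical, and the text bars printed at the end only stand in for a real histogram.

```python
from collections import Counter
from datetime import datetime, timedelta


def epi_curve(onsets, start, bin_width):
    """Count cases per time interval for an epidemic curve.

    onsets: datetimes of symptom onset (or diagnosis), all on or after `start`.
    start: left edge of the first interval.
    bin_width: a timedelta, e.g. timedelta(hours=6) or timedelta(days=1),
        chosen to suit the incubation period of the disease.
    Returns (interval_start, count) pairs for every interval from `start`
    through the last case, so intervals with no cases appear as zero counts.
    """
    # timedelta // timedelta gives the whole number of intervals elapsed.
    counts = Counter((t - start) // bin_width for t in onsets)
    n_bins = max(counts) + 1 if counts else 0
    return [(start + i * bin_width, counts.get(i, 0)) for i in range(n_bins)]


# Made-up onset times for illustration only.
onsets = [datetime(2024, 2, 13, 10), datetime(2024, 2, 13, 20),
          datetime(2024, 2, 15, 8)]
for interval_start, n in epi_curve(onsets, datetime(2024, 2, 13), timedelta(days=1)):
    print(interval_start.date(), "#" * n)
```

Changing only `bin_width` (for example, from one day to six hours) reshapes the curve without changing the data, which is why the interval should be chosen with the incubation period in mind.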
Figure 1.10 Cases of Salmonella Enteritidis — Chicago, February 13–21, by
Date and Time of Symptom Onset

Source: Cortese M, Gerber S, Jones E, Fernandez J. A Salmonella Enteritidis outbreak in Chicago. Presented at the Eastern Regional
Epidemic Intelligence Service Conference, March 23, 2000, Boston, Massachusetts.

Place

Describing the occurrence of disease by place provides insight into the
geographic extent of the problem and its geographic variation. Characterization
by place refers not only to place of residence but to any geographic location
relevant to disease occurrence. Such locations include place of diagnosis or
report, birthplace, site of employment, school district, hospital unit, or recent
travel destinations. The unit may be as large as a continent or country or as small
as a street address, hospital wing, or operating room. Sometimes place refers
not to a specific location at all but to a place category such as urban or rural,
domestic or foreign, and institutional or noninstitutional.

Consider the data in Tables 1.3 and 1.4. Table 1.3 displays SARS data by state
of residence and reflects where a person with possible SARS is likely to be
quarantined and treated.(33) In contrast, Table 1.4 displays the same data by
where the possible SARS patients had traveled, and reflects where transmission
may have occurred.

Table 1.3 Reported Cases of SARS through November 3, 2004 — United States, by Case Definition Category and State of Residence

Location           Total   Suspect   Probable   Confirmed
Alaska                 1         1          0           0
California            29        22          5           2
Colorado               2         2          0           0
Florida                8         6          2           0
Georgia                3         3          0           0
Hawaii                 1         1          0           0
Illinois               8         7          1           0
Kansas                 1         1          0           0
Kentucky               6         4          2           0
Maryland               2         2          0           0
Massachusetts          8         8          0           0
Minnesota              1         1          0           0
Mississippi            1         0          1           0
Missouri               3         3          0           0
Nevada                 3         3          0           0
New Jersey             2         1          0           1
New Mexico             1         0          0           1
New York              29        23          6           0
North Carolina         4         3          0           1
Ohio                   2         2          0           0
Pennsylvania           6         5          0           1
Rhode Island           1         1          0           0
South Carolina         3         3          0           0
Tennessee              1         1          0           0
Texas                  5         5          0           0
Utah                   7         6          0           1
Vermont                1         1          0           0
Virginia               3         2          0           1
Washington            12        11          1           0
West Virginia          1         1          0           0
Wisconsin              2         1          1           0
Puerto Rico            1         1          0           0
Total                158       131         19           8

Adapted from: Centers for Disease Control and Prevention. Severe Acute Respiratory Syndrome (SARS) Report of Cases in the United
States; Available from: http://cdc.gov/od/oc/media/presskits/sars/cases.htm.

Table 1.4 Reported Cases of SARS through November 3, 2004 — United States, by High-Risk Area Visited

Area                              Count*   Percent
Hong Kong City, China                 45        28
Toronto, Canada                       35        22
Guangdong Province, China             34        22
Beijing City, China                   25        16
Shanghai City, China                  23        15
Singapore                             15         9
China, mainland                       15         9
Taiwan                                10         6
Anhui Province, China                  4         3
Hanoi, Vietnam                         4         3
Chongqing City, China                  3         2
Guizhou Province, China                2         1
Macao City, China                      2         1
Tianjin City, China                    2         1
Jilin Province, China                  2         1
Xinjiang Province, China               1         1
Zhejiang Province, China               1         1
Guangxi Province, China                1         1
Shanxi Province, China                 1         1
Liaoning Province, China               1         1
Hunan Province, China                  1         1
Sichuan Province, China                1         1
Hubei Province, China                  1         1
Jiangxi Province, China                1         1
Fujian Province, China                 1         1
Jiangsu Province, China                1         1
Yunnan Province, China                 0         0
Hebei Province, China                  0         0
Qinghai Province, China                0         0
Tibet (Xizang) Province, China         0         0
Hainan Province, China                 0         0
Henan Province, China                  0         0
Gansu Province, China                  0         0
Shandong Province, China               0         0

* The 158 reported case-patients visited a total of 232 areas. A case-patient who visited more than one area is counted in each, so counts total more than 158; percentages are of the 158 case-patients.


Data Source: Heymann DL, Rodier G. Global Surveillance, National Surveillance, and SARS. Emerg Infect Dis. 2004;10:173–175.
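Table 1.4 counts a case-patient once in every high-risk area visited, which is why its counts total 232 rather than 158. The sketch below, using made-up travel histories and a hypothetical function name, shows that tallying logic, with each percent computed out of all case-patients (the convention the percents in Table 1.4 appear to follow; for example, 45/158 is about 28%).

```python
from collections import Counter


def areas_visited_summary(visits_by_patient):
    """Count and percent of case-patients who visited each area.

    visits_by_patient: one collection of area names per case-patient.
    Each patient is counted at most once per area, but once in every area
    visited, so counts can total more than the number of patients and the
    percents (each out of all patients) can total more than 100.
    Python's round() sends exact halves to the nearest even integer.
    """
    n_patients = len(visits_by_patient)
    counts = Counter(area for areas in visits_by_patient for area in set(areas))
    return {area: (count, round(100 * count / n_patients))
            for area, count in counts.most_common()}


# Made-up travel histories for four case-patients.
patients = [["Hong Kong", "Guangdong"], ["Hong Kong"],
            ["Toronto"], ["Hong Kong", "Toronto", "Hong Kong"]]
print(areas_visited_summary(patients))
```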

Although place data can be shown in a table such as Table 1.3 or Table 1.4, a
map provides a more striking visual display of place data. On a map, different
numbers or rates of disease can be depicted using different shadings, colors, or
line patterns, as in Figure 1.11.

Figure 1.11 Mortality Rates for Asbestosis, by State — United States, 1968–
1981 and 1982–2000

Source: Centers for Disease Control and Prevention. Changing patterns of pneumoconiosis mortality–United States, 1968–2000. MMWR
2004;53:627–32.
Another type of map for place data is a spot map, such as Figure 1.12. Spot
maps generally are used for clusters or outbreaks with a limited number of cases.
A dot or X is placed on the location that is most relevant to the disease of interest,
usually where each victim lived or worked, just as John Snow did in his spot map
of the Golden Square area of London (Figure 1.1). If known, sites that are
relevant, such as probable locations of exposure (water pumps in Figure 1.1), are
usually noted on the map.

Figure 1.12 Spot Map of Giardia Cases

Analyzing data by place can identify communities at increased risk of disease.
Even if the data cannot reveal why these people have an increased risk, they can
help generate hypotheses to test with additional studies. For example, is a
community at increased risk because of characteristics of the people in the
community such as genetic susceptibility, lack of immunity, risky behaviors, or
exposure to local toxins or contaminated food? Can the increased risk,
particularly of a communicable disease, be attributed to characteristics of the
causative agent such as a particularly virulent strain, hospitable breeding sites, or
availability of the vector that transmits the organism to humans? Or can the
increased risk be attributed to the environment that brings the agent and the host
together, such as crowding in urban areas that increases the risk of disease
transmission from person to person, or more homes being built in wooded areas
close to deer that carry ticks infected with the organism that causes Lyme
disease? (More techniques for graphic presentation are discussed in Lesson 4.)

Person

“Person” attributes include age, sex, ethnicity/race, and socioeconomic status.

Because personal characteristics may affect illness, organization and analysis of
data by “person” may use inherent characteristics of people (for example, age,
sex, race), biologic characteristics (immune status), acquired characteristics
(marital status), activities (occupation, leisure activities, use of
medications/tobacco/drugs), or the conditions under which they live
(socioeconomic status, access to medical care). Age and sex are included in
almost all data sets and are the two most commonly analyzed “person”
characteristics. However, depending on the disease and the data available,
analyses of other person variables are usually necessary. Epidemiologists
typically begin the analysis of person data by looking at each variable
separately. Sometimes, two variables such as age and sex can be examined
simultaneously. Person data are usually displayed in tables or graphs.

Age. Age is probably the single most important “person” attribute, because
almost every health-related event varies with age. Factors that also vary with
age include susceptibility, opportunity for exposure, latency or incubation period
of the disease, and physiologic response (which affects, among other things,
disease development).

When analyzing data by age, epidemiologists try to use age groups that are
narrow enough to detect any age-related patterns that may be present in the data.
For some diseases, particularly chronic diseases, 10-year age groups may be
adequate. For other diseases, 10-year and even 5-year age groups conceal
important variations in disease occurrence by age. Consider the graph of
pertussis occurrence by standard 5-year age groups shown in Figure 1.13a. The
highest rate is clearly among children 4 years old and younger. But is the rate
equally high in all children within that age group, or do some children have higher
rates than others?

Figure 1.13a Pertussis by 5-Year Age Groups


Figure 1.13b Pertussis by <1, 4-Year, Then 5-Year Age Groups

To answer this question, different age groups are needed. Examine Figure 1.13b,
which shows the same data but displays the rate of pertussis for children under 1
year of age separately. Clearly, infants account for most of the high rate among
0–4-year-olds. Public health efforts should thus be focused on children less than
1 year of age, rather than on the entire 5-year age group.
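Regrouping the same case ages with different boundaries, as Figures 1.13a and 1.13b do, can be sketched as below. The function and its group labels are hypothetical. It returns counts only; turning them into rates like those in the figures would require the population of each age group as a denominator.

```python
from bisect import bisect_right
from collections import Counter


def age_group_counts(ages, lower_bounds):
    """Count cases per age group.

    ages: ages in years (0 or older).
    lower_bounds: sorted starting ages of each group, beginning at 0, e.g.
        [0, 5, 10] for 5-year groups or [0, 1, 5, 10] to split out infants.
        The last group is open-ended.
    """
    labels = []
    for i, low in enumerate(lower_bounds):
        if i + 1 == len(lower_bounds):
            labels.append(f"{low}+")
        elif low == 0 and lower_bounds[i + 1] == 1:
            labels.append("<1")
        else:
            labels.append(f"{low}-{lower_bounds[i + 1] - 1}")
    # bisect_right finds the last group whose lower bound is <= age.
    counts = Counter(bisect_right(lower_bounds, age) - 1 for age in ages)
    return {label: counts.get(i, 0) for i, label in enumerate(labels)}


# Made-up case ages (in completed years), for illustration only.
ages = [0, 0, 0, 0, 2, 3, 7, 12]
print(age_group_counts(ages, [0, 5, 10]))     # 5-year groups
print(age_group_counts(ages, [0, 1, 5, 10]))  # infants split out
```

With the second set of boundaries, the "0-4" group splits into "<1" and "1-4", showing where within that group the cases actually fall.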

Sex. Males have higher rates of illness and death than do females for many
diseases. For some diseases, this sex-related difference is because of genetic,
hormonal, anatomic, or other inherent differences between the sexes. These
inherent differences affect susceptibility or physiologic responses. For example,
premenopausal women have a lower risk of heart disease than men of the same
age. This difference has been attributed to higher estrogen levels in women. On
the other hand, the sex-related differences in the occurrence of many diseases
reflect differences in opportunity or levels of exposure. For example, Figure 1.14
shows the differences in lung cancer rates over time among men and women.(34)
The difference noted in earlier years has been attributed to the higher prevalence
of smoking among men in the past. Unfortunately, prevalence of smoking among
women now equals that among men, and lung cancer rates in women have been
climbing as a result.(35)

Figure 1.14 Lung Cancer Rates — United States, 1930–1999


Data Source: American Cancer Society [Internet]. Atlanta: The American Cancer Society, Inc. Available
from: http://cancer.org/docroot/PRO/content/PRO_1_1_Cancer_Statistics_2005_Presentation.asp.
Ethnic and racial groups. Sometimes epidemiologists are interested in
analyzing person data by biologic, cultural, or social groupings such as race,
nationality, religion, or tribes and other geographically or socially isolated
groups. Differences in racial, ethnic, or other group variables
may reflect differences in susceptibility or exposure, or differences in other
factors that influence the risk of disease, such as socioeconomic status and
access to health care. In Figure 1.15, infant mortality rates for 2002 are shown by
race and Hispanic origin of the mother.

Figure 1.15 Infant Mortality Rates for 2002, by Race and Ethnicity of Mother

Source: Centers for Disease Control and Prevention. QuickStats: Infant mortality rates*, by selected racial/ethnic populations — United
States, 2002, MMWR 2005;54(05):126.
Socioeconomic status. Socioeconomic status is difficult to quantify. It is made
up of many variables such as occupation, family income, educational
achievement, census tract of residence, living conditions, and social standing. The
variables that are easiest to measure may not accurately reflect the overall
concept. Nevertheless, epidemiologists commonly use occupation, family income,
and educational achievement, while recognizing that these variables do not
measure socioeconomic status precisely.

The frequency of many adverse health conditions increases with decreasing
socioeconomic status. For example, tuberculosis is more common among
persons in lower socioeconomic strata. Infant mortality and time lost from work
due to disability are both associated with lower income. These patterns may
reflect more harmful exposures, lower resistance, and less access to health care.
Or they may in part reflect an interdependent relationship that is impossible to
untangle: Does low socioeconomic status contribute to disability, or does
disability contribute to lower socioeconomic status, or both? What accounts for
the disproportionate prevalence of diabetes and asthma in lower socioeconomic
areas? (36, 37)
A few adverse health conditions occur more frequently among persons of higher
socioeconomic status. Gout was known as the “disease of kings” because of its
association with consumption of rich foods. Other conditions associated with
higher socioeconomic status include breast cancer, Kawasaki syndrome, chronic
fatigue syndrome, and tennis elbow. Differences in exposure account for at least
some if not most of the differences in the frequency of these conditions.

Exercise 1.6

Using the data in Tables 1.5 and 1.6, describe the death rate patterns for the
“Unusual Event.” For example, how do death rates vary between men and
women overall, among the different socioeconomic classes, among men and
women in different socioeconomic classes, and among adults and children in
different socioeconomic classes? Can you guess what type of situation might
result in such death rate patterns?

Table 1.5 Deaths and Death Rates for an Unusual Event, by Sex and Socioeconomic Status

                                       Socioeconomic Status
Sex          Measure                 High    Middle     Low
Males        Persons at risk          179       173     499
             Deaths                   120       148     441
             Death rate (%)          67.0      85.5    88.4
Females      Persons at risk          143       107     212
             Deaths                     9        13     132
             Death rate (%)           6.3      12.1    62.3
Both sexes   Persons at risk          322       280     711
             Deaths                   129       161     573
             Death rate (%)          40.1      57.5    80.6


Table 1.6 Deaths and Death Rates for an Unusual Event, by Age and Socioeconomic Status

                                  Socioeconomic Status
Age Group    Measure               High/Middle     Low
Adults       Persons at risk               566     664
             Deaths                        287     545
             Death rate (%)               50.7    82.1
Children     Persons at risk                36      47
             Deaths                          3      28
             Death rate (%)                8.3    59.6
All Ages     Persons at risk               602     711
             Deaths                        290     573
             Death rate (%)               48.2    80.6
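Each death rate in Tables 1.5 and 1.6 is the number of deaths divided by the number of persons at risk, multiplied by 100. The sketch below (with a hypothetical function name) reproduces the high-socioeconomic-status rates from Table 1.5 and shows why a combined "Both sexes" rate comes from pooled counts rather than from averaging the two sex-specific rates.

```python
def death_rate_percent(deaths, persons_at_risk):
    """Death rate (%) = deaths / persons at risk x 100, rounded to 1 decimal."""
    if persons_at_risk <= 0:
        raise ValueError("persons_at_risk must be positive")
    return round(100 * deaths / persons_at_risk, 1)


# (deaths, persons at risk) from Table 1.5, high socioeconomic status only.
males_high = (120, 179)
females_high = (9, 143)

# The combined rate uses summed deaths and summed persons at risk; averaging
# the two sex-specific rates would ignore the different group sizes.
both_high = (males_high[0] + females_high[0], males_high[1] + females_high[1])

for label, (d, n) in [("Males", males_high), ("Females", females_high),
                      ("Both sexes", both_high)]:
    print(f"{label:<11}{death_rate_percent(d, n):>6.1f}")
```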


References (This Section)


32. Goodman RA, Smith JD, Sikes RK, Rogers DL, Mickey JL. Fatalities
associated with farm tractor injuries: an epidemiologic study. Public Health
Rep 1985;100:329–33.
33. Heymann DL, Rodier G. Global surveillance, national surveillance, and
SARS. Emerg Infect Dis. 2004;10:173–5.
34. American Cancer Society [Internet]. Atlanta: The American Cancer Society,
Inc. Available from:
http://www.cancer.org/Research/CancerFactsFigures/cancer-facts-figures-2005/.
35. Centers for Disease Control and Prevention. Current trends. Lung cancer
and breast cancer trends among women–Texas. MMWR
1984;33(MM19):266.
36. Liao Y, Tucker P, Okoro CA, Giles WH, Mokdad AH, Harris VB, et al.
REACH 2010 surveillance for health status in minority communities —
United States, 2001–2002. MMWR 2004;53:1–36.


Image Description

Figure 1.4
Description: A line graph shows a dramatic peak indicating an outbreak caused
by contaminated pasteurized milk in Illinois.

Figure 1.5
Description: A histogram shows the number of reported cases of
intussusception by month.

Figure 1.6
Description: Three line graphs show the number of reported cases of rubella,
influenza, and rotavirus by month and year, comparing the frequency, duration,
and severity of each.

Figure 1.7
Description: A histogram compares the number of tractor deaths by day of the
week. Differences by day are easily seen.

Figure 1.8
Description: A histogram compares the number of tractor deaths by hour of the
day. Differences by hour are easily seen.

Figure 1.9
Description: A histogram with different colored bars shows the number of
World Trade Center nonrescuer survivors and rescuers treated in hospitals. A
dramatic increase and decrease in the number of survivors compared with
rescuers can be seen within a few hours after the attack.

Figure 1.10
Description: A histogram shows each case as a square, stacked into columns,
displaying the number of cases by date and time after a party.

Figure 1.11
Description: Two rate distribution maps show an increase in the age-adjusted
mortality rate for asbestosis in almost all states over time.

Figure 1.12
Description: A map shows the geographic location of primary cases.

Figure 1.13a
Description: A bar chart shows pertussis in 5-year age groups. The majority of
cases occur in children aged 0–4 years.

Figure 1.13b
Description: A bar chart shows the same data as Figure 1.13a displayed with
different age groups. The majority of pertussis cases occur in children younger
than 1 year of age.

Figure 1.14
Description: A line graph with two lines shows lung cancer deaths in men and in
women. Lung cancer deaths are higher in men than in women but have been
decreasing slightly in men since the early 1990s.

Figure 1.15
Description: A bar chart shows infant mortality rates by race/ethnicity as
separate bars. Differences by race and ethnicity are easily seen.

Section 7: Analytic Epidemiology


As noted earlier, descriptive epidemiology can identify patterns among cases and
in populations by time, place and person. From these observations,
epidemiologists develop hypotheses about the causes of these patterns and
about the factors that increase risk of disease. In other words, epidemiologists
can use descriptive epidemiology to generate hypotheses, but only rarely to test
those hypotheses. For that, epidemiologists must turn to analytic epidemiology.

Key feature of analytic epidemiology = Comparison group

The key feature of analytic epidemiology is a comparison group. Consider a large
outbreak of hepatitis A that occurred in Pennsylvania in 2003.(38) Investigators
found that almost all of the case-patients had eaten at a particular restaurant during
the 2–6 weeks (i.e., the typical incubation period for hepatitis A) before onset of
illness. While the investigators were able to narrow down their hypotheses to the
restaurant and were able to exclude the food preparers and servers as the
source, they did not know which particular food may have been contaminated.
The investigators asked the case-patients which restaurant foods they had eaten,
but that only indicated which foods were popular. The investigators, therefore,
also enrolled and interviewed a comparison or control group — a group of
persons who had eaten at the restaurant during the same period but who did not
get sick. Of 133 items on the restaurant’s menu, the most striking difference
between the case and control groups was in the proportion that ate salsa (94% of
case-patients ate, compared with 39% of controls). Further investigation of the
ingredients in the salsa implicated green onions as the source of infection.
Shortly thereafter, the Food and Drug Administration issued an advisory to the
public about green onions and risk of hepatitis A. This action was in direct
response to the convincing results of the analytic epidemiology, which compared
the exposure history of case-patients with that of an appropriate comparison
group.

When investigators find that persons with a particular characteristic are more
likely than those without the characteristic to contract a disease, the
characteristic is said to be associated with the disease. The characteristic may
be a:
• Demographic factor such as age, race, or sex;
• Constitutional factor such as blood group or immune status;
• Behavior or act such as smoking or having eaten salsa; or
• Circumstance such as living near a toxic waste site.

Identifying factors associated with disease helps health officials appropriately
target public health prevention and control activities. It also guides additional
research into the causes of disease.

Thus, analytic epidemiology is concerned with the search for causes and effects,
or the why and the how. Epidemiologists use analytic epidemiology to quantify
the association between exposures and outcomes and to test hypotheses about
causal relationships. It has been said that epidemiology by itself can never prove
that a particular exposure caused a particular outcome. Often, however,
epidemiology provides sufficient evidence to take appropriate control and
prevention measures.

Epidemiologic studies fall into two categories: experimental and observational.

Experimental studies

In an experimental study, the investigator determines through a controlled
process the exposure for each individual (clinical trial) or community (community
trial), and then tracks the individuals or communities over time to detect the
effects of the exposure. For example, in a clinical trial of a new vaccine, the
investigator may randomly assign some of the participants to receive the new
vaccine, while others receive a placebo shot. The investigator then tracks all
participants, observes who gets the disease that the new vaccine is intended to
prevent, and compares the two groups (new vaccine vs. placebo) to see whether
the vaccine group has a lower rate of disease. Similarly, in a trial to prevent onset
of diabetes among high-risk individuals, investigators randomly assigned
enrollees to one of three groups: placebo, an anti-diabetic drug, or lifestyle
intervention. At the end of the follow-up period, investigators found the lowest
incidence of diabetes in the lifestyle intervention group, the next lowest in the
anti-diabetic drug group, and the highest in the placebo group.(39)
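Random assignment, the step that makes a study experimental, can be sketched with Python's `random` module. This is simple equal allocation under hypothetical names; it is not the specific randomization scheme of the diabetes trial cited above, and real trials often use blocked or stratified randomization with concealed allocation.

```python
import random


def randomize(participant_ids, arms, seed=None):
    """Randomly assign participants to study arms in near-equal numbers.

    arms: a list of arm names. Shuffles the participants, then deals them
    to the arms in turn, so arm sizes differ by at most one.
    `seed` only makes the example reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    assignment = {arm: [] for arm in arms}
    for i, pid in enumerate(shuffled):
        assignment[arms[i % len(arms)]].append(pid)
    return assignment


groups = randomize(range(1, 13), ["placebo", "drug", "lifestyle"], seed=1)
for arm, members in groups.items():
    print(arm, sorted(members))
```

Because chance alone decides each participant's group, differences between groups in outcomes can be attributed to the assigned exposure rather than to how participants chose or were chosen.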

Observational studies

In an observational study, the epidemiologist simply observes the exposure and
disease status of each study participant. John Snow’s studies of cholera in
London were observational studies. The two most common types of
observational studies are cohort studies and case-control studies; a third type is
cross-sectional studies.
Cohort study. A cohort study is similar in concept to the experimental study. In a
cohort study the epidemiologist records whether each study participant is
exposed or not, and then tracks the participants to see if they develop the
disease of interest. Note that this differs from an experimental study because, in
a cohort study, the investigator observes rather than determines the participants’
exposure status. After a period of time, the investigator compares the disease
rate in the exposed group with the disease rate in the unexposed group. The
unexposed group serves as the comparison group, providing an estimate of the
baseline or expected amount of disease occurrence in the community. If the
disease rate is substantively different in the exposed group compared to the
unexposed group, the exposure is said to be associated with illness.
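The comparison at the heart of a cohort study can be sketched in a few lines. The function names are hypothetical, and summarizing the comparison as a ratio of the two rates is one common choice assumed here; the text above only requires that the rates be compared.

```python
def disease_rate(ill, total):
    """Proportion of a group that developed the disease during follow-up."""
    return ill / total


def compare_cohort(exposed_ill, exposed_total, unexposed_ill, unexposed_total):
    """Compare disease rates in exposed and unexposed groups.

    The unexposed rate serves as the baseline (expected) rate. The ratio of
    the two rates, often called a risk ratio, is about 1 when the exposure is
    not associated with illness. Requires at least one ill unexposed person.
    """
    exposed = disease_rate(exposed_ill, exposed_total)
    unexposed = disease_rate(unexposed_ill, unexposed_total)
    # Cross-multiplied form of exposed / unexposed, to avoid extra rounding.
    ratio = (exposed_ill * unexposed_total) / (exposed_total * unexposed_ill)
    return exposed, unexposed, ratio


# Made-up counts: 30 of 100 exposed and 10 of 100 unexposed became ill.
print(compare_cohort(30, 100, 10, 100))
```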

The length of follow-up varies considerably. In an attempt to respond quickly to a
public health concern such as an outbreak, public health departments tend to
conduct relatively brief studies. On the other hand, research and academic
organizations are more likely to conduct studies of cancer, cardiovascular
disease, and other chronic diseases, which may last for years and even decades.
The Framingham study is a well-known cohort study that has followed over 5,000
residents of Framingham, Massachusetts, since the early 1950s to establish the
rates and risk factors for heart disease.(7) The Nurses’ Health Study and the
Nurses’ Health Study II are cohort studies established in 1976 and 1989,
respectively, that have followed over 100,000 nurses each and have provided
useful information on oral contraceptives, diet, and lifestyle risk factors.(40)
These studies are sometimes called follow-up or prospective cohort studies,
because participants are enrolled as the study begins and are then followed
prospectively over time to identify occurrence of the outcomes of interest.

An alternative type of cohort study is a retrospective cohort study. In this type of
study both the exposure and the outcomes have already occurred. Just as in a
prospective cohort study, the investigator calculates and compares rates of
disease in the exposed and unexposed groups. Retrospective cohort studies are
commonly used in investigations of disease in groups of easily identified people
such as workers at a particular factory or attendees at a wedding. For example, a
retrospective cohort study was used to determine the source of infection of
cyclosporiasis, a parasitic disease that caused an outbreak among members of a
residential facility in Pennsylvania in 2004.(41) The investigation implicated
consumption of snow peas as the vehicle of the cyclosporiasis outbreak.

Case-control study. In a case-control study, investigators start by enrolling a
group of people with disease (at CDC such persons are called case-patients
rather than cases, because case refers to occurrence of disease, not a person).
As a comparison group, the investigator then enrolls a group of people without
disease (controls). Investigators then compare previous exposures between the
two groups. The control group provides an estimate of the baseline or expected
amount of exposure in that population. If the amount of exposure among the
case group is substantially higher than the amount expected based on the
control group, then illness is said to be associated with that exposure. The
study of hepatitis A traced to green onions, described above, is an example of a
case-control study. The key in a case-control study is to identify an appropriate
control group, comparable to the case group in most respects, in order to provide
a reasonable estimate of the baseline or expected exposure.
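The comparison described above, observed exposure among case-patients versus the exposure expected from the control group, can be sketched as follows. The function name is hypothetical, and this is only a descriptive comparison; formal analyses of case-control data use measures such as the odds ratio along with statistical tests.

```python
def observed_vs_expected_exposure(cases_exposed, cases_total,
                                  controls_exposed, controls_total):
    """Compare exposed case-patients with the number expected from controls.

    The controls' exposure proportion estimates the baseline; applied to the
    case group, it gives the number of exposed case-patients expected if the
    exposure were unrelated to illness.
    """
    expected = controls_exposed * cases_total / controls_total
    return {
        "cases_pct_exposed": 100 * cases_exposed / cases_total,
        "controls_pct_exposed": 100 * controls_exposed / controls_total,
        "cases_exposed_observed": cases_exposed,
        "cases_exposed_expected": expected,
    }


# Hypothetical counts chosen only to match the percentages quoted for the
# hepatitis A investigation (94% vs. 39%); they are not the study's data.
print(observed_vs_expected_exposure(47, 50, 39, 100))
```

In this hypothetical case, about 19 or 20 exposed case-patients would be expected if salsa were unrelated to illness, far fewer than the 47 observed.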

Cross-sectional study. In this third type of observational study, a sample of
persons from a population is enrolled and their exposures and health outcomes
are measured simultaneously. The cross-sectional study tends to assess the
presence (prevalence) of the health outcome at that point of time without regard
to duration. For example, in a cross-sectional study of diabetes, some of the
enrollees with diabetes may have lived with their diabetes for many years, while
others may have been recently diagnosed.

From an analytic viewpoint the cross-sectional study is weaker than either a
cohort or a case-control study because a cross-sectional study usually cannot
disentangle risk factors for occurrence of disease (incidence) from risk factors for
survival with the disease. (Incidence and prevalence are discussed in more detail
in Lesson 3.) On the other hand, a cross-sectional study is a perfectly fine tool for
descriptive epidemiology purposes. Cross-sectional studies are used routinely to
document the prevalence in a community of health behaviors (prevalence of
smoking), health states (prevalence of vaccination against measles), and health
outcomes, particularly chronic conditions (hypertension, diabetes).

In summary, the purpose of an analytic study in epidemiology is to identify and
quantify the relationship between an exposure and a health outcome. The
hallmark of such a study is the presence of at least two groups, one of which
serves as a comparison group. In an experimental study, the investigator
determines the exposure for the study subjects; in an observational study, the
subjects are exposed under more natural conditions. In an observational cohort
study, subjects are enrolled or grouped on the basis of their exposure, then are
followed to document occurrence of disease. Differences in disease rates
between the exposed and unexposed groups lead investigators to conclude that
exposure is associated with disease. In an observational case-control study,
subjects are enrolled according to whether they have the disease or not, then are
questioned or tested to determine their prior exposure. Differences in exposure
prevalence between the case and control groups allow investigators to conclude
that the exposure is associated with the disease. Cross-sectional studies
measure exposure and disease status at the same time, and are better suited to
descriptive epidemiology than causation.

Exercise 1.7

Classify each of the following studies as:

1. Experimental
2. Observational cohort
3. Observational case-control
4. Observational cross-sectional
5. Not an analytical or epidemiologic study

1. ____ A representative sample of residents was telephoned and asked how
much they exercise each week and whether they currently have (or have ever
been diagnosed with) heart disease.

2. ____ Occurrence of cancer was identified between April 1991 and July 2002
for 50,000 troops who served in the first Gulf War (ended April 1991) and
50,000 troops who served elsewhere during the same period.

3. ____ Persons diagnosed with new-onset Lyme disease were asked how often
they walk through woods, use insect repellent, wear short sleeves and pants,
etc. Twice as many patients without Lyme disease from the same physician’s
practice were asked the same questions, and the responses in the two groups
were compared.

4. ____ Subjects were children enrolled in a health maintenance organization.
At 2 months, each child was randomly given one of two types of a new vaccine
against rotavirus infection. Parents were called by a nurse two weeks later and
asked whether the children had experienced any of a list of side effects.

Section 8: Concepts of Disease Occurrence


A critical premise of epidemiology is that disease and other health events do not occur randomly
in a population, but are more likely to occur in some members of the population than others
because of risk factors that may not be distributed randomly in the population. As noted earlier,
one important use of epidemiology is to identify the factors that place some members at greater
risk than others.

Causation

A number of models of disease causation have been proposed. Among the simplest of these is
the epidemiologic triad or triangle, the traditional model for infectious disease. The triad consists
of an external agent, a susceptible host, and an environment that brings the host and agent
together. In this model, disease results from the interaction between the agent and the susceptible
host in an environment that supports transmission of the agent from a source to that host. Two
ways of depicting this model are shown in Figure 1.16.

Agent, host, and environmental factors interrelate in a variety of complex ways to produce
disease. Different diseases require different balances and interactions of these three components.
Development of appropriate, practical, and effective public health measures to control or prevent
disease usually requires assessment of all three components and their interactions.

Figure 1.16 Epidemiologic Triad


Agent originally referred to an infectious microorganism or pathogen: a virus, bacterium,
parasite, or other microbe. Generally, the agent must be present for disease to occur; however,
presence of that agent alone is not always sufficient to cause disease. A variety of factors
influence whether exposure to an organism will result in disease, including the organism’s
pathogenicity (ability to cause disease) and dose.

Over time, the concept of agent has been broadened to include chemical and physical causes of
disease or injury. These include chemical contaminants (such as the L-tryptophan contaminant
responsible for eosinophilia-myalgia syndrome), as well as physical forces (such as repetitive
mechanical forces associated with carpal tunnel syndrome). While the epidemiologic triad serves
as a useful model for many diseases, it has proven inadequate for cardiovascular disease, cancer,
and other diseases that appear to have multiple contributing causes without a single necessary
one.

Host refers to the human who can get the disease. A variety of factors intrinsic to the host,
sometimes called risk factors, can influence an individual’s exposure, susceptibility, or response
to a causative agent. Opportunities for exposure are often influenced by behaviors such as sexual
practices, hygiene, and other personal choices as well as by age and sex. Susceptibility and
response to an agent are influenced by factors such as genetic composition, nutritional and
immunologic status, anatomic structure, presence of disease or medications, and psychological
makeup.

Environment refers to extrinsic factors that affect the agent and the opportunity for exposure.
Environmental factors include physical factors such as geology and climate, biologic factors
such as insects that transmit the agent, and socioeconomic factors such as crowding, sanitation,
and the availability of health services.

Component causes and causal pies

Because the agent-host-environment model did not work well for many non-infectious diseases,
several other models that attempt to account for the multifactorial nature of causation have been
proposed. One such model was proposed by Rothman in 1976, and has come to be known as the
Causal Pies.(42) This model is illustrated in Figure 1.17. An individual factor that contributes to
cause disease is shown as a piece of a pie. After all the pieces of a pie fall into place, the pie is
complete — and disease occurs. The individual factors are called component causes. The
complete pie, which might be considered a causal pathway, is called a sufficient cause. A
disease may have more than one sufficient cause, with each sufficient cause being composed of
several component causes that may or may not overlap. A component that appears in every pie or
pathway is called a necessary cause, because without it, disease does not occur. Note in Figure
1.17 that component cause A is a necessary cause because it appears in every pie.
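
The pie model can be expressed with sets. Each sufficient cause is a set of component causes, disease occurs when every component of at least one set is present, and a necessary cause is a component shared by every set. The sketch below assumes three hypothetical pies. The letters are illustrative and are not read off Figure 1.17, and the function names are made up for this example.

```python
from functools import reduce

# Hypothetical sufficient causes ("pies"), each a set of component causes.
# The letters are illustrative; they are not taken from Figure 1.17.
SUFFICIENT_CAUSES = {
    "I": {"A", "B", "C", "D"},
    "II": {"A", "B", "E", "F"},
    "III": {"A", "C", "F", "G"},
}


def disease_occurs(pies, present):
    """Disease occurs if every component of at least one pie is present."""
    return any(pie <= present for pie in pies.values())


def necessary_causes(pies):
    """Components that appear in every sufficient cause."""
    return reduce(set.intersection, pies.values())


def component_causes(pies):
    """Components that appear in at least one sufficient cause."""
    return set().union(*pies.values())


print(sorted(necessary_causes(SUFFICIENT_CAUSES)))  # ['A']
print(disease_occurs(SUFFICIENT_CAUSES, {"A", "B", "E", "F"}))  # True: pie II is complete
print(disease_occurs(SUFFICIENT_CAUSES, {"B", "C", "D", "E", "F", "G"}))  # False: A is missing
```

In this model, disease cannot occur without component A, whatever other factors are present. That is what "necessary" means here.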

Figure 1.17 Rothman’s Causal Pies

Source: Rothman KJ. Causes. Am J Epidemiol 1976;104:587–592.
The component causes may include intrinsic host factors as well as the agent and the
environmental factors of the agent-host-environment triad. A single component cause is rarely a
sufficient cause by itself. For example, even exposure to a highly infectious agent such as
measles virus does not invariably result in measles disease. Host susceptibility and other host
factors also may play a role.
At the other extreme, an agent that is usually harmless in healthy persons may cause devastating
disease under different conditions. Pneumocystis carinii is an organism that harmlessly colonizes
the respiratory tract of some healthy persons, but can cause potentially lethal pneumonia in
persons whose immune systems have been weakened by human immunodeficiency virus (HIV).
Presence of Pneumocystis carinii organisms is therefore a necessary but not sufficient cause of
pneumocystis pneumonia. In Figure 1.17, it would be represented by component cause A.

As the model indicates, a particular disease may result from a variety of different sufficient
causes or pathways. For example, lung cancer may result from a sufficient cause that includes
smoking as a component cause. Smoking is not a sufficient cause by itself, however, because not
all smokers develop lung cancer. Neither is smoking a necessary cause, because a small fraction
of lung cancer victims have never smoked. Suppose Component Cause B is smoking and
Component Cause C is asbestos. Sufficient Cause I includes both smoking (B) and asbestos (C).
Sufficient Cause II includes smoking without asbestos, and Sufficient Cause III includes asbestos
without smoking. But because lung cancer can develop in persons who have never been exposed
to either smoking or asbestos, a proper model for lung cancer would have to show at least one
more Sufficient Cause Pie that does not include either component B or component C.

Note that public health action does not depend on the identification of every component cause.
Disease prevention can be accomplished by blocking any single component of a sufficient cause,
at least through that pathway. For example, elimination of smoking (component B) would
prevent lung cancer from sufficient causes I and II, although some lung cancer would still occur
through sufficient cause III.
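
The effect of blocking a single component can be checked with the same set representation. In this sketch, B stands for smoking and C for asbestos, as in the text. U1 to U4 are hypothetical placeholders for the unspecified other components each pathway would need. Pie IV is the extra pathway, without smoking or asbestos, that the text says a complete lung cancer model would require.

```python
# Hypothetical lung cancer pies: B = smoking, C = asbestos (as in the text).
# U1-U4 are placeholder "other" component causes; they are assumptions.
LUNG_CANCER_PIES = {
    "I": {"B", "C", "U1"},
    "II": {"B", "U2"},
    "III": {"C", "U3"},
    "IV": {"U4"},  # pathway involving neither smoking nor asbestos
}


def pathways_remaining(pies, blocked):
    """Names of sufficient causes that contain none of the blocked components."""
    return sorted(name for name, pie in pies.items() if not (pie & blocked))


print(pathways_remaining(LUNG_CANCER_PIES, {"B"}))       # ['III', 'IV']
print(pathways_remaining(LUNG_CANCER_PIES, {"B", "C"}))  # ['IV']
```

Eliminating smoking closes pathways I and II. Lung cancer can still arise through III and IV, which is the point that prevention works pathway by pathway.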

Exercise 1.8

Read the Anthrax Fact Sheet on the following 2 pages, then answer the questions below.

1. Describe the causation of anthrax in terms of agent, host, and environment.

a. Agent:
b. Host:
c. Environment:

2. For each of the following risk factors and health outcomes, identify whether the risk
factor is a necessary cause, a sufficient cause, or a component cause of the outcome.

Risk Factor / Health Outcome

_____ 1. Hypertension / Stroke
_____ 2. Treponema pallidum / Syphilis
_____ 3. Type A personality / Heart disease
_____ 4. Skin contact with a strong acid / Burn

Anthrax Fact Sheet

What is anthrax?
Anthrax is an acute infectious disease that usually occurs in animals such as
livestock, but can also affect humans. Human anthrax comes in three forms,
depending on the route of infection: cutaneous (skin) anthrax, inhalation anthrax,
and intestinal anthrax. Symptoms usually occur within 7 days after exposure.

Cutaneous: Most (about 95%) anthrax infections occur when the bacterium
enters a cut or abrasion on the skin after handling infected livestock or
contaminated animal products. Skin infection begins as a raised itchy
bump that resembles an insect bite but within 1–2 days develops into a
vesicle and then a painless ulcer, usually 1–3 cm in diameter, with a
characteristic black necrotic (dying) area in the center. Lymph glands in the
adjacent area may swell. About 20% of untreated cases of cutaneous
anthrax will result in death. Deaths are rare with appropriate antimicrobial
therapy.
Inhalation: Initial symptoms are like cold or flu symptoms and can include a sore
throat, mild fever, and muscle aches. After several days, the symptoms
may progress to cough, chest discomfort, severe breathing problems and
shock. Inhalation anthrax is often fatal. Eleven of the 22 mail-related cases in
2001 were inhalation anthrax; 5 (45%) of these 11 patients died.
Intestinal: Initial signs of nausea, loss of appetite, vomiting, and fever are
followed by abdominal pain, vomiting of blood, and severe diarrhea.
Intestinal anthrax results in death in 25% to 60% of cases.

While most human cases of anthrax result from contact with infected animals or
contaminated animal products, anthrax also can be used as a biologic weapon.
In 1979, dozens of residents of Sverdlovsk in the former Soviet Union are thought
to have died of inhalation anthrax after an unintentional release of an aerosol
from a biologic weapons facility. In 2001, 22 cases of anthrax occurred in the
United States from letters containing anthrax spores that were mailed to
members of Congress, television networks, and newspaper companies.

What causes anthrax?


Anthrax is caused by the bacterium Bacillus anthracis. The anthrax bacterium
forms a protective shell called a spore. B. anthracis spores are found naturally in
soil, and can survive for many years.

How is anthrax diagnosed?


Anthrax is diagnosed by isolating B. anthracis from the blood, skin lesions, or
respiratory secretions or by measuring specific antibodies in the blood of persons
with suspected cases.

Is there a treatment for anthrax?


Antibiotics are used to treat all three types of anthrax. Treatment should be
initiated early because the disease is more likely to be fatal if treatment is
delayed or not given at all.

How common is anthrax and where is it found?


Anthrax is most common in agricultural regions of South and Central America,
Southern and Eastern Europe, Asia, Africa, the Caribbean, and the Middle East,
where it occurs in animals. When anthrax affects humans, it is usually the result
of an occupational exposure to infected animals or their products. Naturally
occurring anthrax is rare in the United States (28 reported cases between 1971
and 2000), but 22 mail-related cases were identified in 2001.

Anthrax occurs most commonly in wild and domestic lower vertebrates (cattle,
sheep, goats, camels, antelopes, and other herbivores), but it can also occur in
humans when they are exposed to infected animals or tissue from infected
animals.

How is anthrax transmitted?


Anthrax can infect a person in three ways: by anthrax spores entering through a
break in the skin, by inhaling anthrax spores, or by eating contaminated,
undercooked meat. Anthrax is not spread from person to person. The skin
(“cutaneous”) form of anthrax is usually the result of contact with infected
livestock, wild animals, or contaminated animal products such as carcasses,
hides, hair, wool, meat, or bone meal. The inhalation form is from breathing in
spores from the same sources. Anthrax can also be spread as a bioterrorist
agent.

Who has an increased risk of being exposed to anthrax?


Susceptibility to anthrax is universal. Most naturally occurring anthrax affects
people whose work brings them into contact with livestock or products from
livestock. Such occupations include veterinarians, animal handlers, abattoir
workers, and laboratorians. Inhalation anthrax was once called Woolsorter’s
Disease because workers who inhaled spores from contaminated wool before it
was cleaned developed the disease. Soldiers and other potential targets of
bioterrorist anthrax attacks might also be considered at increased risk.

Is there a way to prevent infection?


In countries where anthrax is common and vaccination levels of animal herds are
low, humans should avoid contact with livestock and animal products and avoid
eating meat that has not been properly slaughtered and cooked. Also, an anthrax
vaccine has been licensed for use in humans. It is reported to be 93% effective in
protecting against anthrax. It is used by veterinarians, laboratorians, soldiers, and
others who may be at increased risk of exposure, but is not available to the
general public at this time.

For a person who has been exposed to anthrax but is not yet sick, antibiotics
combined with anthrax vaccine are used to prevent illness.

Section 9: Natural History and Spectrum of Disease


Natural history of disease refers to the progression of a disease process in an
individual over time, in the absence of treatment. For example, untreated
infection with HIV causes a spectrum of clinical problems beginning at the time of
seroconversion (primary HIV) and terminating with AIDS and usually death. It is
now recognized that it may take 10 years or more for AIDS to develop after
seroconversion.(43) Many, if not most, diseases have a characteristic natural
history, although the time frame and specific manifestations of disease may vary
from individual to individual and are influenced by preventive and therapeutic
measures.

Figure 1.18 Natural History of Disease Timeline

Source: Centers for Disease Control and Prevention. Principles of epidemiology, 2nd ed. Atlanta: U.S. Department of Health and Human Services; 1992.

The process begins with the appropriate exposure to or accumulation of factors
sufficient for the disease process to begin in a susceptible host. For an infectious
disease, the exposure is a microorganism. For cancer, the exposure may be a
factor that initiates the process, such as asbestos fibers or components in
tobacco smoke (for lung cancer), or one that promotes the process, such as
estrogen (for endometrial cancer).

After the disease process has been triggered, pathological changes then occur
without the individual being aware of them. This stage of subclinical disease,
extending from the time of exposure to onset of disease symptoms, is usually
called the incubation period for infectious diseases, and the latency period for
chronic diseases. During this stage, disease is said to be asymptomatic (no
symptoms) or inapparent. This period may be as brief as seconds for
hypersensitivity and toxic reactions to as long as decades for certain chronic
diseases. Even for a single disease, the characteristic incubation period has a
range. For example, the incubation period for hepatitis A ranges from about 2 to 7
weeks (14–50 days). The latency period for leukemia to become evident among survivors of
the atomic bomb blast in Hiroshima ranged from 2 to 12 years, peaking at 6–7
years.(44) Incubation periods of selected exposures and diseases varying from
minutes to decades are displayed in Table 1.7.
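
Because the incubation period is a range rather than a fixed interval, a single exposure translates into a window of possible onset dates. The sketch below applies the 14–50 day range for hepatitis A from Table 1.7 to a hypothetical exposure date. The function name and the date are illustrative assumptions.

```python
from datetime import date, timedelta


def onset_window(exposure, min_days, max_days):
    """Earliest and latest expected symptom onset after a single exposure."""
    if min_days > max_days:
        raise ValueError("min_days must not exceed max_days")
    return exposure + timedelta(days=min_days), exposure + timedelta(days=max_days)


# Hypothetical exposure date; 14-50 days is the hepatitis A range in Table 1.7.
earliest, latest = onset_window(date(2024, 3, 1), 14, 50)
print(earliest, latest)  # 2024-03-15 2024-04-20
```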

Table 1.7 Incubation Periods of Selected Exposures and Diseases

Exposure | Clinical Effect | Incubation/Latency Period
Saxitoxin and similar toxins from shellfish | Paralytic shellfish poisoning (tingling, numbness around lips and fingertips, giddiness, incoherent speech, respiratory paralysis, sometimes death) | few minutes–…
Organophosphorus ingestion | Nausea, vomiting, cramps, headache, nervousness, blurred vision, chest pain, confusion, twitching, convulsions | few minutes–…
Salmonella | Diarrhea, often with fever and cramps | usually 6–4…
SARS-associated corona virus | Severe Acute Respiratory Syndrome (SARS) | 3–10 days, usually … days
Varicella-zoster virus | Chickenpox | 10–21 days, usually …16 days
Treponema pallidum | Syphilis | 10–90 days, … weeks
Hepatitis A virus | Hepatitis | 14–50 days, … weeks
Hepatitis B virus | Hepatitis | 50–180 days, …3 months
Human immunodeficiency virus | AIDS | <1 to 15+ years
Atomic bomb radiation (Japan) | Leukemia | 2–12 years
Radiation (Japan, Chernobyl) | Thyroid cancer | 3–20+ years
Radium (watch dial painters) | Bone cancer | 8–40 years
Although disease is not apparent during the incubation period, some pathologic
changes may be detectable with laboratory, radiographic, or other screening
methods. Most screening programs attempt to identify the disease process
during this phase of its natural history, since intervention at this early stage is
likely to be more effective than treatment given after the disease has progressed
and become symptomatic.

The onset of symptoms marks the transition from subclinical to clinical disease.
Most diagnoses are made during the stage of clinical disease. In some people,
however, the disease process may never progress to clinically apparent illness.
In others, the disease process may result in illness that ranges from mild to
severe or fatal. This range is called the spectrum of disease. Ultimately, the
disease process ends either in recovery, disability or death.

For an infectious agent, infectivity refers to the proportion of exposed persons
who become infected. Pathogenicity refers to the proportion of infected
individuals who develop clinically apparent disease. Virulence refers to the
proportion of clinically apparent cases that are severe or fatal.
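
Each of the three measures is a proportion with a different denominator, so they can be computed in sequence from nested counts. The counts below are hypothetical and serve only to show which number is divided by which. The function name is also illustrative.

```python
def transmission_measures(exposed, infected, clinical, severe_or_fatal):
    """Infectivity, pathogenicity, and virulence as defined in the text.

    Each count is a subset of the one before it; clinical cases must be > 0.
    """
    if not (exposed >= infected >= clinical > 0 and 0 <= severe_or_fatal <= clinical):
        raise ValueError("counts must be nested: exposed >= infected >= clinical > 0")
    return {
        "infectivity": infected / exposed,        # infected among exposed
        "pathogenicity": clinical / infected,     # clinically apparent among infected
        "virulence": severe_or_fatal / clinical,  # severe or fatal among clinical cases
    }


# Hypothetical counts for one exposed group.
for name, value in transmission_measures(200, 120, 30, 3).items():
    print(f"{name}: {value:.0%}")
# infectivity: 60%
# pathogenicity: 25%
# virulence: 10%
```

In this made-up group, most infections remain inapparent (pathogenicity of 25%). A pattern like this produces the "tip of the iceberg" effect described below.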

Because the spectrum of disease can include asymptomatic and mild cases, the
cases of illness diagnosed by clinicians in the community often represent only the
tip of the iceberg. Many additional cases may be too early to diagnose or may
never progress to the clinical stage. Unfortunately, persons with inapparent or
undiagnosed infections may nonetheless be able to transmit infection to others.
Such persons who are infectious but have subclinical disease are called carriers.
Frequently, carriers are persons with incubating disease or inapparent infection.
Persons with measles, hepatitis A, and several other diseases become infectious
a few days before the onset of symptoms. However, carriers may also be persons
who appear to have recovered from their clinical illness but remain infectious,
such as chronic carriers of hepatitis B virus, or persons who never exhibited
symptoms. The challenge to public health workers is that these carriers, unaware
that they are infected and infectious to others, are sometimes more likely to
unwittingly spread infection than are people with obvious illness.

Section 10: Chain of Infection


As described above, the traditional epidemiologic triad model holds that
infectious diseases result from the interaction of agent, host, and environment.
More specifically, transmission occurs when the agent leaves its reservoir or
host through a portal of exit, is conveyed by some mode of transmission, and
enters through an appropriate portal of entry to infect a susceptible host. This
sequence is sometimes called the chain of infection.
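
The chain can be treated as a sequence of links that must all be present for transmission to occur. This explains why breaking any single link is enough to stop it. The sketch below is a deliberately simplified all-or-nothing model. The link names follow the text, and the function names are hypothetical.

```python
# Links in the order described in the text.
CHAIN_OF_INFECTION = (
    "reservoir",
    "portal of exit",
    "mode of transmission",
    "portal of entry",
    "susceptible host",
)


def first_broken_link(intact_links):
    """Return the first missing link in the chain, or None if the chain is complete."""
    for link in CHAIN_OF_INFECTION:
        if link not in intact_links:
            return link
    return None


def transmission_possible(intact_links):
    """Transmission requires every link to be intact."""
    return first_broken_link(intact_links) is None


all_links = set(CHAIN_OF_INFECTION)
print(transmission_possible(all_links))  # True
# Example: the exposed person is already immune, so there is no susceptible host.
print(first_broken_link(all_links - {"susceptible host"}))  # susceptible host
```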

Figure 1.19 Chain of Infection

Source: Centers for Disease Control and Prevention. Principles of epidemiology, 2nd ed. Atlanta: U.S. Department of Health and Human Services; 1992.

Reservoir

The reservoir of an infectious agent is the habitat in which the agent normally
lives, grows, and multiplies. Reservoirs include humans, animals, and the
environment. The reservoir may or may not be the source from which an agent is
transferred to a host. For example, the reservoir of Clostridium botulinum is soil,
but the source of most botulism infections is improperly canned food containing C.
botulinum spores.

Human reservoirs. Many common infectious diseases have human reservoirs.
Diseases that are transmitted from person to person without intermediaries
include the sexually transmitted diseases, measles, mumps, streptococcal
infection, and many respiratory pathogens. Because humans were the only
reservoir for the smallpox virus, naturally occurring smallpox was eradicated after
the last human case was identified and isolated.(8)

Human reservoirs may or may not show the effects of illness. As noted earlier, a
carrier is a person with inapparent infection who is capable of transmitting the
pathogen to others. Asymptomatic or passive or healthy carriers are those who
never experience symptoms despite being infected. Incubatory carriers are those
who can transmit the agent during the incubation period before clinical illness
begins. Convalescent carriers are those who have recovered from their illness
but remain capable of transmitting to others. Chronic carriers are those who
continue to harbor a pathogen such as hepatitis B virus or Salmonella Typhi, the
causative agent of typhoid fever, for months or even years after their initial
infection. One notorious carrier is Mary Mallon, or Typhoid Mary, who was an
asymptomatic chronic carrier of Salmonella Typhi. As a cook in New York City
and New Jersey in the early 1900s, she unintentionally infected dozens of people
until she was placed in isolation on an island in the East River, where she died 23
years later.(45)

Carriers commonly transmit disease because they do not realize they are
infected, and consequently take no special precautions to prevent transmission.
Symptomatic persons who are aware of their illness, on the other hand, may be
less likely to transmit infection because they are either too sick to be out and
about, take precautions to reduce transmission, or receive treatment that limits
the disease.

Animal reservoirs. Humans are also subject to diseases that have animal
reservoirs. Many of these diseases are transmitted from animal to animal, with
humans as incidental hosts. The term zoonosis refers to an infectious disease
that is transmissible under natural conditions from vertebrate animals to humans.
Long recognized zoonotic diseases include brucellosis (cows and pigs), anthrax
(sheep), plague (rodents), trichinellosis/trichinosis (swine), tularemia (rabbits),
and rabies (bats, raccoons, dogs, and other mammals). Zoonoses newly
emergent in North America include West Nile encephalitis (birds), and
monkeypox (prairie dogs). Many newly recognized infectious diseases in humans,
including HIV/AIDS, Ebola infection and SARS, are thought to have emerged
from animal hosts, although those hosts have not yet been identified.

Environmental reservoirs. Plants, soil, and water in the environment are also
reservoirs for some infectious agents. Many fungal agents, such as those that
cause histoplasmosis, live and multiply in the soil. Outbreaks of Legionnaires’
disease are often traced to water supplies in cooling towers and evaporative
condensers, reservoirs for the causative organism Legionella pneumophila.

Portal of exit

Portal of exit is the path by which a pathogen leaves its host. The portal of exit
usually corresponds to the site where the pathogen is localized. For example,
influenza viruses and Mycobacterium tuberculosis exit the respiratory tract,
schistosomes through urine, cholera vibrios in feces, Sarcoptes scabiei in
scabies skin lesions, and enterovirus 70, a cause of hemorrhagic conjunctivitis, in
conjunctival secretions. Some bloodborne agents can exit by crossing the
placenta from mother to fetus (rubella, syphilis, toxoplasmosis), while others exit
through cuts or needles in the skin (hepatitis B) or blood-sucking arthropods
(malaria).

Modes of transmission

An infectious agent may be transmitted from its natural reservoir to a susceptible
host in different ways. There are different classifications for modes of
transmission. Here is one classification:

• Direct
  o Direct contact
  o Droplet spread
• Indirect
  o Airborne
  o Vehicleborne
  o Vectorborne (mechanical or biologic)

In direct transmission, an infectious agent is transferred from a reservoir to a
susceptible host by direct contact or droplet spread.

Direct contact occurs through skin-to-skin contact, kissing, and sexual
intercourse. Direct contact also refers to contact with soil or vegetation harboring
infectious organisms. Thus, infectious mononucleosis (“kissing disease”) and
gonorrhea are spread from person to person by direct contact. Hookworm is
spread by direct contact with contaminated soil.

Droplet spread refers to spray with relatively large, short-range aerosols
produced by sneezing, coughing, or even talking. Droplet spread is classified as
direct because transmission is by direct spray over a few feet, before the droplets
fall to the ground. Pertussis and meningococcal infection are examples of
diseases transmitted from an infectious patient to a susceptible host by droplet
spread.

Indirect transmission refers to the transfer of an infectious agent from a
reservoir to a host by suspended air particles, inanimate objects (vehicles), or
animate intermediaries (vectors).

Airborne transmission occurs when infectious agents are carried by dust or
droplet nuclei suspended in air. Airborne dust includes material that has settled
on surfaces and become resuspended by air currents as well as infectious
particles blown from the soil by the wind. Droplet nuclei are dried residue of less
than 5 microns in size. In contrast to droplets that fall to the ground within a few
feet, droplet nuclei may remain suspended in the air for long periods of time and
may be blown over great distances. Measles, for example, has occurred in
children who came into a physician’s office after a child with measles had left,
because the measles virus remained suspended in the air.(46)

Vehicles that may indirectly transmit an infectious agent include food, water,
biologic products (blood), and fomites (inanimate objects such as handkerchiefs,
bedding, or surgical scalpels). A vehicle may passively carry a pathogen — as
food or water may carry hepatitis A virus. Alternatively, the vehicle may provide
an environment in which the agent grows, multiplies, or produces toxin — as
improperly canned foods provide an environment that supports production of
botulinum toxin by Clostridium botulinum.

Vectors such as mosquitoes, fleas, and ticks may carry an infectious agent
through purely mechanical means or may support growth or changes in the agent.
Examples of mechanical transmission are flies carrying Shigella on their
appendages and fleas carrying Yersinia pestis, the causative agent of plague, in
their gut. In contrast, in biologic transmission, the causative agent of malaria or
guinea worm disease undergoes maturation in an intermediate host before it can
be transmitted to humans (Figure 1.20).
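
The classification of transmission modes can be written as a small lookup table. The example diseases are the ones named in the preceding paragraphs. The structure and the helper functions are illustrative only. Real pathogens can spread by more than one mode, so `cited_mode` returns only the mode under which the text cites each disease.

```python
# Mode of transmission -> (direct or indirect, example diseases named in the text).
TRANSMISSION_MODES = {
    "direct contact": ("direct", ["infectious mononucleosis", "gonorrhea", "hookworm"]),
    "droplet spread": ("direct", ["pertussis", "meningococcal infection"]),
    "airborne": ("indirect", ["measles"]),
    "vehicleborne": ("indirect", ["hepatitis A", "botulism"]),
    "vectorborne": ("indirect", ["shigellosis", "plague", "malaria", "guinea worm disease"]),
}


def modes_in_category(category):
    """List the modes classified as 'direct' or 'indirect'."""
    return [mode for mode, (cat, _) in TRANSMISSION_MODES.items() if cat == category]


def cited_mode(disease):
    """Return the mode under which the text cites a disease, or None if not cited."""
    for mode, (_, examples) in TRANSMISSION_MODES.items():
        if disease in examples:
            return mode
    return None


print(modes_in_category("direct"))  # ['direct contact', 'droplet spread']
print(cited_mode("pertussis"))  # droplet spread
```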

Portal of entry

The portal of entry refers to the manner in which a pathogen enters a susceptible
host. The portal of entry must provide access to tissues in which the pathogen
can multiply or a toxin can act. Often, infectious agents use the same portal to
enter a new host that they used to exit the source host. For example, influenza
virus exits the respiratory tract of the source host and enters the respiratory tract
of the new host. In contrast, many pathogens that cause gastroenteritis follow a
so-called “fecal-oral” route because they exit the source host in feces, are carried
on inadequately washed hands to a vehicle such as food, water, or utensil, and
enter a new host through the mouth. Other portals of entry include the skin
(hookworm), mucous membranes (syphilis), and blood (hepatitis B, human
immunodeficiency virus).

Figure 1.20 Complex Life Cycle of Dracunculus medinensis (Guinea worm)

Source: Centers for Disease Control and Prevention. Principles of epidemiology, 2nd ed. Atlanta: U.S. Department of Health and Human Services; 1992.

Host
The final link in the chain of infection is a susceptible host. Susceptibility of a host
depends on genetic or constitutional factors, specific immunity, and nonspecific
factors that affect an individual’s ability to resist infection or to limit pathogenicity.
An individual’s genetic makeup may either increase or decrease susceptibility.
For example, persons with sickle cell trait seem to be at least partially protected
from a particular type of malaria. Specific immunity refers to protective antibodies
that are directed against a specific agent. Such antibodies may develop in
response to infection, vaccine, or toxoid (toxin that has been deactivated but
retains its capacity to stimulate production of toxin antibodies) or may be
acquired by transplacental transfer from mother to fetus or by injection of
antitoxin or immune globulin. Nonspecific factors that defend against infection
include the skin, mucous membranes, gastric acidity, cilia in the respiratory tract,
the cough reflex, and nonspecific immune response. Factors that may increase
susceptibility to infection by disrupting host defenses include malnutrition,
alcoholism, and disease or therapy that impairs the nonspecific immune
response.

Implications for public health

Knowledge of the portals of exit and entry and modes of transmission provides a
basis for determining appropriate control measures. In general, control measures
are usually directed against the segment in the infection chain that is most
susceptible to intervention, unless practical issues dictate otherwise.

Interventions are directed at:

• Controlling or eliminating the agent at the source of transmission
• Protecting portals of entry
• Increasing the host’s defenses

For some diseases, the most appropriate intervention may be directed at
controlling or eliminating the agent at its source. A patient sick with a
communicable disease may be treated with antibiotics to eliminate the infection.
An asymptomatic but infected person may be treated both to clear the infection
and to reduce the risk of transmission to others. In the community, soil may be
decontaminated or covered to prevent escape of the agent.

Some interventions are directed at the mode of transmission. Interruption of
direct transmission may be accomplished by isolation of someone with infection,
or counseling persons to avoid the specific type of contact associated with
transmission. Vehicleborne transmission may be interrupted by elimination or
decontamination of the vehicle. To prevent fecal-oral transmission, efforts often
focus on rearranging the environment to reduce the risk of contamination in the
future and on changing behaviors, such as promoting handwashing. For airborne
diseases, strategies may be directed at modifying ventilation or air pressure, and
filtering or treating the air. To interrupt vectorborne transmission, measures may
be directed toward controlling the vector population, such as spraying to reduce
the mosquito population.

Some strategies that protect portals of entry are simple and effective. For
example, bed nets are used to protect sleeping persons from being bitten by
mosquitoes that may transmit malaria. A dentist’s mask and gloves are intended
to protect the dentist from a patient’s blood, secretions, and droplets, as well as to
protect the patient from the dentist. Wearing of long pants and sleeves and use
of insect repellent are recommended to reduce the risk of Lyme disease and
West Nile virus infection, which are transmitted by the bite of ticks and
mosquitoes, respectively.

Some interventions aim to increase a host’s defenses. Vaccinations promote
development of specific antibodies that protect against infection. On the other
hand, prophylactic use of antimalarial drugs, recommended for visitors to
malaria-endemic areas, does not prevent exposure through mosquito bites, but
does prevent infection from taking root.

Finally, some interventions attempt to prevent a pathogen from encountering a
susceptible host. The concept of herd immunity suggests that if a high enough
proportion of individuals in a population are resistant to an agent, then those few
who are susceptible will be protected by the resistant majority, since the
pathogen will be unlikely to “find” those few susceptible individuals. The degree
of herd immunity necessary to prevent or interrupt an outbreak varies by disease.
In theory, herd immunity means that not everyone in a community needs to be
resistant (immune) to prevent disease spread and occurrence of an outbreak. In
practice, herd immunity has not prevented outbreaks of measles and rubella in
populations with immunization levels as high as 85% to 90%. One problem is that,
in highly immunized populations, the relatively few susceptible persons are often
clustered in subgroups defined by socioeconomic or cultural factors. If the
pathogen is introduced into one of these subgroups, an outbreak may occur.
Exercise 1.9

Information about dengue fever is provided on the following pages. After studying
this information, outline the chain of infection by identifying the reservoir(s),
portal(s) of exit, mode(s) of transmission, portal(s) of entry, and factors in host
susceptibility.

1. Reservoirs:

2. Portals of exit:

3. Modes of transmission:

4. Portals of entry:

5. Factors in host susceptibility:



Dengue Fact Sheet

What is dengue?
Dengue is an acute infectious disease that comes in two forms: dengue and
dengue hemorrhagic fever. The principal symptoms of dengue are high fever,
severe headache, backache, joint pains, nausea and vomiting, eye pain, and
rash. Generally, younger children have a milder illness than older children and
adults.

Dengue hemorrhagic fever is a more severe form of dengue. It is characterized
by a fever that lasts from 2 to 7 days, with general signs and symptoms that could
occur with many other illnesses (e.g., nausea, vomiting, abdominal pain, and
headache). This stage is followed by hemorrhagic manifestations, tendency to
bruise easily or other types of skin hemorrhages, bleeding nose or gums, and
possibly internal bleeding. The smallest blood vessels (capillaries) become
excessively permeable (“leaky”), allowing the fluid component to escape from the
blood vessels. This may lead to failure of the circulatory system and shock,
followed by death, if circulatory failure is not corrected. Although the average
case-fatality rate is about 5%, with good medical management, mortality can be
less than 1%.
What causes dengue?
Dengue and dengue hemorrhagic fever are caused by any one of four closely
related flaviviruses, designated DEN-1, DEN-2, DEN-3, or DEN-4.

How is dengue diagnosed?


Diagnosis of dengue infection requires laboratory confirmation, either by isolating
the virus from serum within 5 days after onset of symptoms, or by detecting
convalescent-phase specific antibodies obtained at least 6 days after onset of
symptoms.

What is the treatment for dengue or dengue hemorrhagic fever?


There is no specific medication for treatment of a dengue infection. Persons who
think they have dengue should use analgesics (pain relievers) with
acetaminophen and avoid those containing aspirin. They should also rest, drink
plenty of fluids, and consult a physician. Persons with dengue hemorrhagic fever
can be effectively treated by fluid replacement therapy if an early clinical
diagnosis is made, but hospitalization is often required.

How common is dengue and where is it found?


Dengue is endemic in many tropical countries in Asia and Latin America, most
countries in Africa, and much of the Caribbean, including Puerto Rico. Cases
have occurred sporadically in Texas. Epidemics occur periodically. Globally, an
estimated 50 to 100 million cases of dengue and several hundred thousand cases
of dengue hemorrhagic fever occur each year, depending on epidemic activity.
Between 100 and 200 suspected cases are introduced into the United States
each year by travelers.

How is dengue transmitted?


Dengue is transmitted to people by the bite of an Aedes mosquito that is infected
with a dengue virus. The mosquito becomes infected with dengue virus when it
bites a person who has dengue or dengue hemorrhagic fever (DHF) and after about a week can transmit the
virus while biting a healthy person. Monkeys may serve as a reservoir in some
parts of Asia and Africa. Dengue cannot be spread directly from person to person.

Who has an increased risk of being exposed to dengue?


Susceptibility to dengue is universal. Residents of or visitors to tropical urban
areas and other areas where dengue is endemic are at highest risk of becoming
infected. While a person who survives a bout of dengue caused by one serotype
develops lifelong immunity to that serotype, there is no cross-protection against
the three other serotypes.

What can be done to reduce the risk of acquiring dengue?


There is no vaccine for preventing dengue. The best preventive measure for
residents living in areas infested with Aedes aegypti is to eliminate the places
where the mosquito lays her eggs, primarily artificial containers that hold water.

Items that collect rainwater or are used to store water (for example, plastic
containers, 55-gallon drums, buckets, or used automobile tires) should be
covered or properly discarded. Pet and animal watering containers and vases
with fresh flowers should be emptied and scoured at least once a week. This will
eliminate the mosquito eggs and larvae and reduce the number of mosquitoes
present in these areas.

For travelers to areas with dengue, as well as people living in areas with dengue,
the risk of being bitten by mosquitoes indoors is reduced by the use of air
conditioning or windows and doors that are screened. Proper application of
mosquito repellents containing 20% to 30% DEET as the active ingredient on
exposed skin and clothing decreases the risk of being bitten by mosquitoes. The
risk of dengue infection for international travelers appears to be small, unless an
epidemic is in progress.

Can epidemics of dengue hemorrhagic fever be prevented?


The emphasis for dengue prevention is on sustainable, community-based,
integrated mosquito control, with limited reliance on insecticides (chemical
larvicides and adulticides). Preventing epidemic disease requires a coordinated
community effort to increase awareness about dengue/DHF, how to recognize it,
and how to control the mosquito that transmits it. Residents are responsible for
keeping their yards and patios free of sites where mosquitoes can be produced.
Source: Centers for Disease Control and Prevention [Internet]. Dengue Fever. [updated 2005 Aug 22]. Available
from https://www.cdc.gov/ncidod/dvbid/dengue/index.htm.

References (This Section)


45. Leavitt JW. Typhoid Mary: captive to the public’s health. Boston: Beacon
Press; 1996.
46. Remington PL, Hall WN, Davis IH, Herald A, Gunn RA. Airborne
transmission of measles in a physician’s office. JAMA 1985;253:1575–7.


Image Description

Figure 1.19
Description: The chain of infection has 3 main parts: a reservoir (such as a human)
harboring an agent (such as an amoeba); a mode of transmission, which can include
direct contact, droplets, a vector such as a mosquito, a vehicle such as food, or
the airborne route; and a susceptible host, which has multiple portals of entry
such as the mouth or a syringe.

Figure 1.20
Description: The agent, Dracunculus medinensis, develops in an intermediate
host (a freshwater copepod). Humans acquire the infection by ingesting infected
copepods in drinking water.

An infected individual enters the water. When a blister (caused by an adult female
worm) comes into contact with water, it rapidly becomes an ulcer through which
the adult female worm releases first-stage larvae. The larvae are ingested by
copepods.

Within 10 to 14 days, larvae ingested by the copepods develop into infective
third-stage larvae. A susceptible individual consumes water containing infected
copepods. Infected individuals remain symptom-free for 10 to 14 months, during
which the ingested third-stage larvae mature into adult worms.

The adult female worm provokes the formation of a painful blister in the skin of
the infected individual. The infected individual approaches a water source
containing noninfected copepods (“water fleas” or “Cyclops”), and the cycle
starts over.

Section 11: Epidemic Disease Occurrence


Level of disease

The amount of a particular disease that is usually present in a community is
referred to as the baseline or endemic level of the disease. This level is not
necessarily the desired level, which may in fact be zero, but rather is the
observed level. In the absence of intervention and assuming that the level is not
high enough to deplete the pool of susceptible persons, the disease may
continue to occur at this level indefinitely. Thus, the baseline level is often
regarded as the expected level of the disease.

While some diseases are so rare in a given population that a single case
warrants an epidemiologic investigation (e.g., rabies, plague, polio), other
diseases occur more commonly so that only deviations from the norm warrant
investigation. Sporadic refers to a disease that occurs infrequently and
irregularly. Endemic refers to the constant presence and/or usual prevalence of
a disease or infectious agent in a population within a geographic
area. Hyperendemic refers to persistent, high levels of disease occurrence.

Occasionally, the amount of disease in a community rises above the expected
level. Epidemic refers to an increase, often sudden, in the number of cases of a
disease above what is normally expected in that population in that
area. Outbreak carries the same definition as epidemic, but is often used for a
more limited geographic area. Cluster refers to an aggregation of cases grouped
in place and time that are suspected to be greater than the number expected,
even though the expected number may not be known. Pandemic refers to an
epidemic that has spread over several countries or continents, usually affecting a
large number of people.

Epidemics occur when an agent and susceptible hosts are present in adequate
numbers, and the agent can be effectively conveyed from a source to the
susceptible hosts. More specifically, an epidemic may result from:
• A recent increase in amount or virulence of the agent,
• The recent introduction of the agent into a setting where it has not been before,
• An enhanced mode of transmission so that more susceptible persons are exposed,
• A change in the susceptibility of the host response to the agent, and/or
• Factors that increase host exposure or involve introduction through new
  portals of entry.(47)

The previous description of epidemics presumes only infectious agents, but
noninfectious diseases such as diabetes and obesity exist in epidemic proportions in
the U.S.(51, 52)

Exercise 1.10

For each of the following situations, identify whether it reflects:

1. Sporadic disease
2. Endemic disease
3. Hyperendemic disease
4. Pandemic disease
5. Epidemic disease

1. ____ 22 cases of legionellosis occurred within 3 weeks among residents of
a particular neighborhood (usually 0 or 1 per year)
2. ____ Average annual incidence was 364 cases of pulmonary tuberculosis
per 100,000 population in one area, compared with national average of 134
cases per 100,000 population
3. ____ Over 20 million people worldwide died from influenza in 1918–1919
4. ____ Single case of histoplasmosis was diagnosed in a community
5. ____ About 60 cases of gonorrhea are usually reported in this region per
week, slightly less than the national average


Epidemic Patterns

Epidemics can be classified according to their manner of spread through a
population:
• Common-source
  o Point
  o Continuous
  o Intermittent
• Propagated
• Mixed
• Other

A common-source outbreak is one in which a group of persons are all exposed to
an infectious agent or a toxin from the same source.

If the group is exposed over a relatively brief period, so that everyone who
becomes ill does so within one incubation period, then the common-source
outbreak is further classified as a point-source outbreak. The epidemic of
leukemia cases in Hiroshima following the atomic bomb blast and the epidemic of
hepatitis A among patrons of the Pennsylvania restaurant who ate green onions
each had a point source of exposure.(38, 44) If the number of cases during an
epidemic were plotted over time, the resulting graph, called an epidemic curve,
would typically have a steep upslope and a more gradual downslope (a so-called
“log-normal distribution”).

Figure 1.21 Hepatitis A Cases by Date of Onset, November–December, 1978

Source: Centers for Disease Control and Prevention. Unpublished data; 1979.
In some common-source outbreaks, case-patients may have been exposed over
a period of days, weeks, or longer. In a continuous common-source outbreak,
the range of exposures and range of incubation periods tend to flatten and widen
the peaks of the epidemic curve (Figure 1.22). The epidemic curve of
an intermittent common-source outbreak often has a pattern reflecting the
intermittent nature of the exposure.

Figure 1.22 Diarrheal Illness in City Residents by Date of Onset and Character of Stool, December 1989–January 1990

Source: Centers for Disease Control and Prevention. Unpublished data; 1990.
A propagated outbreak results from transmission from one person to another.
Usually, transmission is by direct person-to-person contact, as with syphilis.
Transmission may also be vehicleborne (e.g., transmission of hepatitis B or HIV
by sharing needles) or vectorborne (e.g., transmission of yellow fever by
mosquitoes). In propagated outbreaks, cases occur over more than one
incubation period. In Figure 1.23, note the peaks occurring about 11 days apart,
consistent with the incubation period for measles. The epidemic usually wanes
after a few generations, either because the number of susceptible persons falls
below some critical level required to sustain transmission, or because
intervention measures become effective.

Figure 1.23 Measles Cases by Date of Onset, October 15, 1970–January 16, 1971

Source: Centers for Disease Control and Prevention. Measles outbreak—Aberdeen, S.D. MMWR 1971;20:26.
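As a rough illustration of the point-source idea above, comparing how widely case onsets are spread with the incubation period can be written as a small check. This is a minimal sketch under stated assumptions: `onset_span_pattern` is a hypothetical helper, its inputs are a single onset span and a single maximum incubation period in days, and the example numbers are invented. It cannot separate a continuous or intermittent common source from propagated spread; the shape of the epidemic curve is still needed for that.

```python
def onset_span_pattern(onset_span_days: float, max_incubation_days: float) -> str:
    """Rough check of an epidemic curve against the incubation period.

    Hypothetical heuristic: if every onset falls within one incubation
    period, the outbreak is consistent with a point source. A longer span
    fits a continuous or intermittent common source OR propagated spread;
    the span alone cannot tell those apart.
    """
    if onset_span_days < 0 or max_incubation_days <= 0:
        raise ValueError("span must be non-negative and incubation positive")
    if onset_span_days <= max_incubation_days:
        return "consistent with point source"
    return "spans more than one incubation period"


# Hypothetical numbers, not taken from the figures above.
print(onset_span_pattern(1.5, 2))  # consistent with point source
print(onset_span_pattern(33, 12))  # spans more than one incubation period
```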
Some epidemics have features of both common-source epidemics and
propagated epidemics. The pattern of a common-source outbreak followed by
secondary person-to-person spread is not uncommon. These are called mixed
epidemics. For example, a common-source epidemic of shigellosis occurred
among a group of 3,000 women attending a national music festival (Figure 1.24).
Many developed symptoms after returning home. Over the next few weeks,
several state health departments detected subsequent generations
of Shigella cases propagated by person-to-person transmission from festival
attendees.(48)

Figure 1.24 Shigella Cases at a Music Festival by Day of Onset, August 1988

Adapted from: Lee LA, Ostroff SM, McGee HB, Johnson DR, Downes FP, Cameron DN, et al. An outbreak of shigellosis at an outdoor music
festival. Am J Epidemiol 1991;133:608–15.
Finally, some epidemics are neither common-source in the usual sense nor
propagated from person to person. Outbreaks of zoonotic or vectorborne disease
may result from sufficient prevalence of infection in host species, sufficient
presence of vectors, and sufficient human-vector interaction. Examples (Figures
1.25 and 1.26) include the epidemic of Lyme disease that emerged in the
northeastern United States in the late 1980s (spread from deer to humans by deer
ticks) and the outbreak of West Nile encephalitis in the Queens section of New
York City in 1999 (spread from birds to humans by mosquitoes).(49, 50)

Figure 1.25 Number of Reported Cases of Lyme Disease by Year, United States, 1992–2003.

Data Source: Centers for Disease Control and Prevention. Summary of notifiable diseases — United States, 2003. Published April 22, 2005,
for MMWR 2003;52(No. 54):9,17,71–72.

Figure 1.26 Number of Reported Cases of West Nile Encephalitis, New York City, 1999

Source: Centers for Disease Control and Prevention. Outbreak of West Nile-Like Viral Encephalitis — New York, 1999. MMWR
1999;48(38):845–9.

Exercise 1.11

For each of the following situations, identify the type of epidemic spread with
which it is most consistent.

1. Point source
2. Intermittent or continuous common source
3. Propagated

1. ____ 21 cases of shigellosis among children and workers at a day
care center over a period of 6 weeks, no external source identified
(incubation period for shigellosis is usually 1–3 days)
2. ____ 36 cases of giardiasis over 6 weeks traced to occasional use of
a supplementary reservoir (incubation period for giardiasis is 3–25 days or
more, usually 7–10 days)
3. ____ 43 cases of norovirus infection over 2 days traced to the ice
machine on a cruise ship (incubation period for norovirus is usually 24–48
hours)

2 Measures of Disease Frequency


Learning Objectives
After reading this chapter, you will be able to do the following:

1. Define and calculate prevalence
2. Classify individuals as either at risk of disease or not
3. Define and calculate incidence proportion
4. Construct intervals of person-time at risk for a given population
5. Define and calculate incidence rate
6. Differentiate between incidence and prevalence, and explain the mathematical
relationship between them

In public health, we often want to quantify disease—how many people are affected by
this health outcome? At first glance, this might seem like a simple question, but once
you consider the many applications of quantifying disease, the complexities become
apparent. In this chapter, you will learn about 3 measures of disease
frequency: counts, prevalence, and incidence.

Counts (a.k.a. Frequencies)


Sometimes, particularly for extremely rare conditions, we only need to know how
many people are sick. How many cases of disease X or health behavior Y were there?
A count is just a number—there are no fractions, numerators, or denominators, and
the units are always “people.”
During the 2017/2018 academic year, for instance, an outbreak of meningococcal
meningitis at Oregon State University (OSU) was quantified by counts: 6 students got
sick.[i]
From surveillance data (see chapter 3), we know that the expected number of cases of
meningococcal meningitis in a given year is zero. Therefore, 6 cases constitute a level
well above what is expected and would be termed an epidemic (see chapter 3).
For rare conditions like this one, simply knowing how many cases there are is
sufficient for a proper public health response. Since we normally expect none,
officials at OSU and the local health department just needed to know that there were 6
students with meningococcal meningitis in order to mount a response (in this case,
requiring students 25 years old and younger to be vaccinated).[i]
Similarly, we could look at animal rabies cases observed in Oregon over a 10-year
period:
Figure 2-1

Source: https://www.oregon.gov/oha/PH/…/2015-Rabies.pdf

The above picture is useful from a public health infrastructure perspective[1]; if you
work at the health department in Josephine County, then you might want to keep a
few doses of rabies vaccine on hand (since cases of rabies in animals are often
discovered because the infected animal bit a human, who must then be vaccinated).
However, if you work at the health department in Wallowa County, which has had no
recorded cases of animal rabies in 10 years, then maybe your resources would be
better spent on things other than vaccine doses that will likely expire before they are
used (assuming you could quickly get doses of the vaccine from the state or
neighboring counties if they ever became necessary).
Counts are less useful if we want to compare 2 populations. For instance, 1,000 cases
of flu in Ashland, New Hampshire, versus 100,000 cases of flu in New York City—we
cannot compare these 2 figures at a glance, because the denominators (i.e., the number
of people living in each city) are so different.

Incidence and Prevalence


There are 2 commonly used measures of disease frequency that incorporate
denominator information: one is a measure of existing disease (prevalence), and the
other is a measure of new disease (incidence). Incidence is used to study causes of
disease, whereas prevalence is used more for resource allocation.

Prevalence
Prevalence is a proportion, meaning that everyone who appears in the numerator must
also appear in the denominator. This also means that prevalence ranges from zero (no
one has the disease) to one (everyone has the disease), and it is usually expressed as a
percent.
Prevalence gives us a snapshot of the population-level disease burden at a given time.
The formula for prevalence is
$$\text{Prevalence} = \frac{\#\ \text{cases present in the population at a specified time}}{\#\ \text{people in the population at that time}}$$

Looking at the formula for prevalence, you can see that everyone in the numerator is
also in the denominator. Like counts, prevalence is used for resource-planning
purposes. Consider the following question a public health authority might be faced
with: How much money should our county health department spend on health
education about smoking versus on physical activity? One metric for deciding might
be which behavior is more prevalent in the local population.[2]
The “at a specified time” part of the prevalence definition could refer either to a
specific date (e.g., what was the prevalence of flu in Newport, OR on January 22,
2018?) or to a time point in people’s lives (e.g., what is the prevalence of
breastfeeding at 6 weeks postpartum?).
The numerator for prevalence is all current cases. Whether the cases were diagnosed
yesterday or 20 years ago doesn’t matter; both would appear in the numerator. Thus
prevalence is affected both by the rate at which new cases occur (the incidence, see
below) and by how long people typically live with disease. Prevalence is therefore
less useful for conditions such as a cold or the flu (where people recover quickly)
because once they recover, they are no longer a prevalent case. At any given point in
time, most people don’t have the flu, and so it would seem like the disease burden is
quite low based on point prevalence, or prevalence calculated on a specific date. In
such instances, we sometimes calculate period prevalence instead, which is just
prevalence of disease over the course of a longer time frame: for example, what was
the prevalence of flu in Newport, OR during the entire 2017-2018 flu season? The
numerator here would be all of the cases that occurred at any time during those
months (counting only the first instance if anyone was unlucky enough to have
influenza twice), and the denominator would be everyone who lived in Newport
during those same months.
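The period-prevalence rule of counting each person only once, even if they were ill twice, can be sketched in Python. The function name `period_prevalence` and the resident IDs are hypothetical; the sketch assumes you have one record per illness episode and a list of everyone who lived in the area during the period.

```python
def period_prevalence(illness_episodes, residents):
    """Share of residents who were a case at any time during the period.

    illness_episodes: person IDs, one entry per episode, so someone who was
    ill twice appears twice. Converting to a set counts each person once.
    residents: everyone who lived in the area during the same period.
    """
    population = set(residents)
    # Keep only cases who are also in the denominator (prevalence is a proportion).
    ever_ill = set(illness_episodes) & population
    return len(ever_ill) / len(population)


# Hypothetical season: 10 residents; resident "C" had influenza twice.
residents = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]
episodes = ["C", "F", "C", "J"]
print(period_prevalence(episodes, residents))  # 0.3, i.e., 3 people / 10
```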

Fig. 2-2

As another example, in Figure 2-2, the prevalence of being light orange is 4/12 = 0.33
= 33%. Note that prevalence does not have units (though providing the specified
time is often appropriate and never wrong).

Prevalence Examples
Example 1 (Hypothetical Data)
If there are 5,000 students who live in the dorms at Oregon State University (OSU),
and during winter term 2018, 400 of them had the flu at some point, then the
prevalence of flu was
400/5,000 = 0.08 = 8.0% of students living in the dorms during winter term 2018 had the
flu at some point
The above is an example of a period prevalence, since we were calculating it over a
time period longer than one day. It is also an example of the specified time being
calendar time—for everyone involved, the specified time was the 2018 winter term.
Example 2 (Based on Known Birth, Infant Death, and Breastfeeding Rates in
Oregon)
In 2012, 48,972 babies were born in Oregon. At 14 weeks postpartum, 33,399 of them
were being breastfed, and 146 had died. What is the prevalence of breastfeeding at 14
weeks postpartum?
Here we need to subtract the 146 infants who died before 14 weeks from the
denominator, as they are no longer part of the population:
33,399/(48,972-146) = 0.684 = 68.4% of infants born in Oregon in 2012 were being
breastfed at 14 weeks postpartum
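A minimal sketch of the prevalence formula, reproducing the calculations for Figure 2-2 and Examples 1 and 2. The `removed` parameter is an assumption of this sketch, not standard notation; it stands for people who left the population before the specified time, such as the 146 infants who died before 14 weeks.

```python
def prevalence(cases: int, population: int, removed: int = 0) -> float:
    """Cases present / people in the population at the specified time.

    `removed` (a name chosen for this sketch) counts people who left the
    population before the specified time, e.g., infants who died before
    14 weeks; they are subtracted from the denominator.
    """
    denominator = population - removed
    if not 0 <= cases <= denominator:
        raise ValueError("every case must also be in the denominator")
    return cases / denominator


print(f"{prevalence(4, 12):.0%}")                        # Figure 2-2: 33%
print(f"{prevalence(400, 5_000):.1%}")                   # Example 1: 8.0%
print(f"{prevalence(33_399, 48_972, removed=146):.1%}")  # Example 2: 68.4%
```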

The above is an example of the specified time being a particular point in someone’s
life: the day on which a given baby turns 14 weeks old varies depending on the day he
or she was born. This is not a period prevalence, because everyone was assessed on
one day—we have just spread those days out throughout the year.
Example 3 (Based on National Estimates)
You can also reverse the calculations to establish the number of people with a disease,
given the prevalence and population size. In a report on bone health by the Centers for
Disease Control and Prevention (CDC),[v] the authors reported that the prevalence of
osteoporosis among men aged 65 and older was 5.6%, and the prevalence among
women aged 65 and older was 24.8%. According to data from the US Census
Bureau,[vi] as of July 1, 2017, there were an estimated 22,564,684 men and 28,293,995
women aged 65 and older living in the US. Applying the prevalence, we can estimate
that:
22,564,684 × 0.056 = 1,263,622 men aged 65 or older
and
28,293,995 × 0.248 = 7,016,911 women aged 65 or older
currently have osteoporosis in the US.
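Reversing the calculation, as in Example 3, is a single multiplication. This sketch rounds to whole people, which is an assumption about how the figures above were rounded; `estimated_cases` is a hypothetical name.

```python
def estimated_cases(population: int, prevalence: float) -> int:
    """Reverse the prevalence formula: cases = population x prevalence, rounded."""
    return round(population * prevalence)


men = estimated_cases(22_564_684, 0.056)
women = estimated_cases(28_293_995, 0.248)
print(f"{men:,} men and {women:,} women aged 65 or older")
# 1,263,622 men and 7,016,911 women aged 65 or older
```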

Incidence
Incidence is a tricky word in epidemiology, because while it is always a measure of
new cases, there are 2 possible denominators and at least a half-dozen words that all
refer to this same thing. Yikes!
The numerator for incidence is always the number of new cases of a disease observed
over some time period. This means that, to study incidence, you must (1) follow
people for some length of time (the length varies according to the disease—a few
hours or days for a foodborne illness versus a few decades for some cancers) and (2)
start with a population at risk—that is, people who are at risk of developing the disease
(at risk of becoming a case). Usually, at a minimum, we therefore exclude people who
already have the disease—such people cannot become an incident case because they
are already a prevalent case. We also exclude anyone not capable of getting the
disease, either because they are immune or because they lack the proper organs (e.g.,
biological females cannot get testicular cancer). Furthermore, because you are
establishing the number of new cases, it is always necessary to include time-based
units when reporting an incidence.
One way of calculating incidence is to include in the denominator the number of
people who were at risk of getting the condition at the start of your follow-up time
period. This calculation yields the incidence proportion. It’s also called, depending on
which source you’re reading, the cumulative incidence, or the risk.[vii]
$$\text{Incidence Proportion} = \frac{\#\ \text{new cases observed during some time period}}{\#\ \text{people at risk at the start of the time period}}$$

The incidence proportion is interpreted as the average risk (chance) of developing the
disease over some time period.

Incidence examples
Example 1 (Hypothetical Data)
If there are 25 students in a particular class, and one person came to class on Monday
of the first week already sick with the flu (this person is a prevalent case—they are
already sick, so are not at risk), and 2 more people got the flu on Wednesday of that
same week, then what was the incidence of flu during Week 1?
Our numerator would be the number of new cases of flu—here, 2. The denominator is
the population at risk, so we must subtract out the student who already has the flu
because they are not at risk. So the denominator is 24.
The incidence of flu in that class during Week 1 was thus:
2/24 = 0.083 = 8.3 per 100 per week
Although prevalence is usually expressed as a percent, for incidence we use “per 100,”
“per 1,000,” “per 10,000,” and so on. The precise power of 10 is not standardized; just
choose one that gives you a whole(ish) number of people: 8.3 per 100 is the same as 83
per 1,000.
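Example 1 can be sketched as follows, assuming the only person not at risk is the one prevalent case on Monday. The helper name `incidence_proportion` is illustrative. Multiplying by 100 or 1,000 only changes the reporting base; the time period (here, one week) still has to be stated alongside the number.

```python
def incidence_proportion(new_cases: int, at_risk_at_start: int) -> float:
    """New cases during the period / people at risk at the start of the period."""
    return new_cases / at_risk_at_start


class_size = 25
prevalent_at_start = 1                     # already sick on Monday, so not at risk
at_risk = class_size - prevalent_at_start  # 24
risk = incidence_proportion(2, at_risk)    # 2 new cases on Wednesday

print(f"{risk * 100:.1f} per 100 per week")      # 8.3 per 100 per week
print(f"{risk * 1_000:.0f} per 1,000 per week")  # 83 per 1,000 per week
```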
Additionally, it is vital that you specify the time period over which you observed
incidence, because interpretation (e.g., how much of a problem this particular disease
is) varies widely depending on time. For instance, 2 cases of breast cancer per 100
women in one week are very different from 2 cases per 100 women in 20 years. The
former might warrant public health intervention, while the latter almost certainly
would not.
Example 2 (Hypothetical Data)
If we want to compare the incidences between 2 populations, it is important to express
them in the same power of 10 (e.g., both must be “per 100” or “per 1,000”) and also to
convert them to cover the same time frame.[3]
If City A has an incidence of norovirus of 25/1,000 per month and City B has an
incidence of norovirus of 500/10,000 per year, we cannot compare them. We must first
convert one of the denominators and one of the time frames so that they are
comparable.
Here we’ll convert the numbers for City A. First, the denominator needs to be 10,000,
not 1,000. If we multiply the incidence for City A by 10/10,[4] the incidence in City A is
now 250/10,000 per month.
Then we need to adjust the time frame—here by multiplying by 12 (as there are 12
months in a year). The incidence in City A then becomes:
250 × 12 = 3,000/10,000 per year
Compared to City B’s incidence of 500/10,000 per year, the incidence is higher in City
A.
Some of you will have spotted a potentially questionable assumption made above: that
the incidence in City A—which was measured only over one month—is constant for
the entire year. This may or may not actually be true, and in real life, you would have
to do a little digging to determine whether it was likely true or not before declaring
that norovirus was more common in City A. What if the 1-month data point was from
an anomalous month, when City A had a huge norovirus outbreak, for instance? This
is not uncommon, because like flu, norovirus is seasonal.
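The conversion for City A can be sketched with exact fractions, so no floating-point rounding creeps into the comparison. `rescale` is a hypothetical helper, and like the text it assumes the incidence stays constant over the new time frame, which may not hold for a seasonal disease such as norovirus.

```python
from fractions import Fraction


def rescale(cases, per, months, target_per, target_months):
    """Re-express `cases` per `per` people over `months` months as cases per
    `target_per` people over `target_months` months.

    Assumes incidence is constant over time (see the caveat above).
    """
    return Fraction(cases, per) * Fraction(target_months, months) * target_per


city_a = rescale(25, 1_000, 1, 10_000, 12)     # 25/1,000 per month
city_b = rescale(500, 10_000, 12, 10_000, 12)  # 500/10,000 per year

print(f"City A: {city_a}/10,000 per year")  # City A: 3000/10,000 per year
print(f"City B: {city_b}/10,000 per year")  # City B: 500/10,000 per year
print("Higher in City A" if city_a > city_b else "Not higher in City A")
```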

We have thus far been looking at incidences with a relatively straightforward
population calculation: for example, the number of students living in a particular
dorm at a particular time. The other kind of incidence is the incidence rate. Some
epidemiology texts will call this the incidence density.[vii] Importantly, the numerator is
still the number of new cases observed over a given period of time. But the
denominator is now the sum of the person-time at risk.
The need for this “other” kind of incidence stems from the fact that populations are
not static: some people are born, others die, people move in and out. Thus if you
quantify the population at risk at the start of your observation window, you are at best
only approximating the population, particularly if you follow people for a year or
more. Instead, we could look at each person in the population and determine how long
they were at risk. Figure 2-3 shows a hypothetical population with 10 people, all of
whom were at risk at the start of a 1-year follow-up observation period:
Figure 2-3

Person 1 enrolled in the study January 1 and was followed through December 31
without developing the disease of interest (they may have been diagnosed with other
things, but if they have not contracted the disease we’re studying, then they’re still at
risk). Person 1 contributed 12 person-months at risk.
Person 2 enrolled January 1 and developed the disease at the end of August. Person 2
contributed 8 person-months at risk. Person 2 is still alive after August but can no
longer contribute person-time at risk because now they are a prevalent case.
Person 3 didn’t enroll until February 1 and was then followed for the rest of the year
without developing the disease of interest. Person 3 contributed 11 person-months at
risk.
Person 4 enrolled January 1 and developed the disease at the end of September. Person
4 contributed 9 person-months at risk.
Person 5 enrolled July 1 and did not develop the disease during follow-up. Person 5
contributed 6 person-months at risk.
Person 6 enrolled January 1 and was lost to follow-up at the end of April (this person
could have moved away, stopped returning calls, or maybe died of something else—
these are called competing risks). Person 6 contributed 4 person-months at risk. We can
still count these months, because during that time, Person 6 was in the study, and was
still at risk—not knowing the outcome (if they moved, etc.) or having their follow-up
terminated because of death from a competing risk does not negate the fact that we
observed Person 6 for 4 months while they were at risk.
Person 7 enrolled January 1 and was followed through December 31 without
developing the disease under investigation. Person 7 contributed 12 person-months at
risk.
Person 8 enrolled March 1 and developed the disease at the end of June. Person 8
contributed 4 person-months at risk.
Person 9 enrolled in the study January 1 and was followed through December 31
without incident. Person 9 contributed 12 person-months at risk.
Person 10 enrolled in the study April 1 and was followed through December 31 without
incident. Person 10 contributed 9 person-months at risk.
To calculate the incidence rate, then, our numerator is still the number of new cases
we observed during the follow-up time—here, there were 3 new cases (persons 2, 4,
and 8). The denominator is now the sum, in months, of the person-time at risk
contributed by all participants.
Calculating Incidence Rate from Data in Figure 2-3
First, sum the total person-time at risk:
12 + 8 + 11 + 9 + 6 + 4 + 12 + 4 + 12 + 9 = 87 person-months at risk (PMAR)
Then calculate the incidence rate:
3/87 PMAR = 0.0345 per person-month (PM)
That looks a little ugly, so let’s move the decimal place:
3.45 per 100 person-months
We could instead express this in terms of years by multiplying our original rate by 12
(because there are 12 months in a year):
(0.0345 per PM)(12) = 0.414 per person-year
Finally, we could rescale so that the numerator represents at least one whole person:
4.14 per 10 person-years
In other words, in Figure 2-3, we observed 4.14 new cases of disease for every
10 person-years at risk.
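As a quick check on the arithmetic above, here is a minimal Python sketch. The person-months and case status for each person are taken from Figure 2-3; the helper name `incidence_rate` is hypothetical, chosen only for illustration. Because the code does not round the per-person-month rate before converting, the per-person-year value comes out as 0.4138 rather than 0.414, which still rounds to 4.14 per 10 person-years.

```python
# Person-months at risk contributed by Persons 1-10 in Figure 2-3.
person_months = [12, 8, 11, 9, 6, 4, 12, 4, 12, 9]
# True if that person became a new case during follow-up (Persons 2, 4, and 8).
became_case = [False, True, False, True, False, False, False, True, False, False]


def incidence_rate(person_time, cases):
    """Hypothetical helper: new cases divided by total person-time at risk."""
    return sum(cases) / sum(person_time)


total_pm = sum(person_months)                             # 87 person-months at risk
rate_per_pm = incidence_rate(person_months, became_case)  # 3/87 per person-month
rate_per_100_pm = rate_per_pm * 100                       # per 100 person-months
rate_per_py = rate_per_pm * 12                            # 12 months in a year
rate_per_10_py = rate_per_py * 10                         # per 10 person-years

print(f"Total person-time: {total_pm} person-months")
print(f"{rate_per_100_pm:.2f} per 100 person-months")
print(f"{rate_per_10_py:.2f} per 10 person-years")
```

Summing a list of True/False values counts the True entries, so `sum(became_case)` gives the 3 new cases.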

The strengths of the person-time approach are that it allows a more nuanced view of
the population at risk and is more realistic: not everyone enrolls in a study on exactly
day one, some people experience competing risks or are lost to follow-up, and
sometimes a case pops up almost immediately, so that person contributes very little
person-time to the denominator (whereas with incidence proportion, they would add a
full person).
Limitations of the person-time approach are that it is more complex, and it does not
distinguish between 100 people followed for one month (totaling 100 PM), 10 people
followed for 10 months (totaling 100 PM), and one person followed for 100 months
(still totaling 100 PM). Additionally, loss to follow-up is probably not random (this
could also affect incidence proportion if people drop out because they’re feeling
poorly but before they are recorded as an incident case). It is thus useful to state the
time period over which people were eligible to be followed (in Figure 2-3, one year).
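To make this limitation concrete, the sketch below uses hypothetical numbers (not from Figure 2-3): three follow-up patterns that each total 100 person-months and each observe 1 new case. The helper `incidence_rate` is a made-up name for illustration.

```python
def incidence_rate(cases, person_time):
    """Hypothetical helper: new cases divided by total person-time at risk."""
    return cases / person_time


# Hypothetical follow-up patterns, each totaling 100 person-months (PM),
# and each observing 1 new case.
scenarios = {
    "100 people followed 1 month each": 100 * 1,
    "10 people followed 10 months each": 10 * 10,
    "1 person followed 100 months": 1 * 100,
}
new_cases = 1

rates = {label: incidence_rate(new_cases, pm) for label, pm in scenarios.items()}
for label, rate in rates.items():
    print(f"{label}: {rate * 100:.1f} per 100 PM")
```

All three patterns yield the same rate, so the rate by itself cannot tell them apart; stating the follow-up period supplies the missing context.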
Table 2-1 compares the two types of incidence.

Table 2-1: Comparison of incidence proportion and incidence rate

               Incidence Proportion                         Incidence Rate
Numerator      new cases over a period of time              new cases over a period of time
Denominator    number of people at risk at the start        sum of person-time at risk
You must       define the time frame                        report the person-time units
A.K.A.         cumulative incidence; risk; absolute risk    incidence density
Range          0 to 1 (it’s a proportion)[5]                0 to infinity

Prior Person-Time at Risk


What about all that time before our study started? If we enroll a bunch of 50-year-olds
who are at risk of heart disease and follow them, recording the person-time, why
not also count the person-time at risk from before study entry? Each person would
yield an extra 50 years of person-time at risk!
We can’t do this because we are missing all of the prevalent cases. Some proportion
of people develop heart disease before age 50 and would thus not be eligible for our
study. Without data on those people (their cases and their person-years at risk before
developing heart disease), our incidence would be artificially low: we would add 50
person-years at risk per enrolled person to the denominator, while the numerator
omits every case that occurred before age 50.
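A small numerical illustration may help. All of the numbers below are hypothetical and chosen only for easy arithmetic: a birth cohort of 100 people in which 10 develop heart disease at age 45, and the remaining 90 enroll at age 50 and are followed for 10 years. The sketch compares the rate we would get with complete data from birth to the rate we get by crediting each enrollee with 50 prior years at risk.

```python
# All numbers are hypothetical, chosen for easy arithmetic.
# Birth cohort of 100 people who are disease-free at birth.
early_cases, age_at_early_dx = 10, 45     # develop heart disease before the study
enrolled, age_at_entry = 90, 50           # disease-free at 50, so eligible to enroll
study_cases, years_to_study_dx = 9, 5     # become cases 5 years into follow-up
noncases, years_followed = 81, 10         # stay disease-free for all 10 years

# Person-years at risk observed during the study itself (9*5 + 81*10 = 855).
study_person_years = study_cases * years_to_study_dx + noncases * years_followed

# Complete data from birth: every case and every year at risk is counted.
complete_rate = (early_cases + study_cases) / (
    early_cases * age_at_early_dx
    + enrolled * age_at_entry
    + study_person_years
)

# Flawed approach: credit each enrollee with 50 prior years at risk, but the
# 10 early cases never enrolled, so they are missing from the numerator.
flawed_rate = study_cases / (enrolled * age_at_entry + study_person_years)

print(f"Complete data:      {complete_rate * 1000:.2f} per 1,000 person-years")
print(f"Prior time added:   {flawed_rate * 1000:.2f} per 1,000 person-years")
```

With these made-up numbers, crediting the prior 50 years gives roughly half the rate that complete data would show.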

Uses of Incidence and Prevalence


As stated above, incidence is used to study the causes of disease. Prevalence is less
useful for this because the disease has already happened; we thus have no way of
knowing whether the disease or the exposure happened first (necessary for
establishing causality). For instance, obesity is associated with lower levels of
physical activity—one possible scenario is that lower levels of physical activity lead
to obesity, secondary to an energy imbalance. However, another equally possible
scenario is that obesity came first and the person subsequently reduced their amount
of physical activity, possibly secondary to joint pain. Studying prevalent obesity cases
does not allow us to distinguish between these scenarios.
Incidence, on the other hand, can easily be used to study potential causes of disease.
When studying incidence, we know that everyone is disease-free at baseline, since we
study only the population at risk. Therefore, any exposures assessed at the beginning
came before disease onset by definition.
Prevalence is more useful as a way of assessing the disease burden in a particular
community, perhaps for purposes of resource allocation. For instance, state health
departments in the Northeast and upper-Midwest spend a portion of their budgets
on Lyme disease prevention education (e.g., billboards about tucking your pants into
your socks) because Lyme disease is quite prevalent in those regions:

Figure 2-4
Source: https://www.cdc.gov/…/maps.html
However, the prevalence of Lyme disease in Colorado is extremely low; health
departments in Colorado would do well to spend their money elsewhere. Prevalence
data are also useful for health care administrators: if you know that 80% of your
nursing home residents have dementia in some form, then this has implications for
staffing, standard operating procedures, and potentially even for the layout and design
of the space (pictorial signs on the walls to indicate the purposes of rooms, for
instance).

Relationship between Incidence and Prevalence


As mentioned above, prevalence is affected by both the incidence (how many new
cases pop up) and the disease’s duration. If people live longer with a disease, then
they remain prevalent cases for longer. Thus
$$\text{Prevalence} \approx \text{Incidence} \times \text{average duration}$$
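The sketch below applies this approximation to a hypothetical disease (the incidence and durations are made up). It also shows one common, more exact form of the relationship, which assumes a steady-state population with constant incidence, duration, and population size: the prevalence odds, P/(1 − P), equal the incidence rate among people at risk multiplied by the average duration. When prevalence is low, P/(1 − P) is close to P, which is why the simpler approximation works. The function names are hypothetical.

```python
def approx_prevalence(incidence, mean_duration):
    """Rule of thumb from the text: prevalence ≈ incidence × average duration."""
    return incidence * mean_duration


def steady_state_prevalence(incidence, mean_duration):
    """Assumes a steady-state population: P / (1 - P) = I × D, solved for P."""
    prevalence_odds = incidence * mean_duration
    return prevalence_odds / (1 + prevalence_odds)


# Hypothetical disease: 2 new cases per 1,000 person-years at risk.
incidence = 0.002
for mean_duration in (10, 20):  # average years lived with the disease
    approx = approx_prevalence(incidence, mean_duration)
    exact = steady_state_prevalence(incidence, mean_duration)
    print(f"D = {mean_duration} y: approx {approx:.4f}, steady state {exact:.4f}")
```

Holding incidence fixed while doubling the average duration roughly doubles prevalence, the same pattern as longer survival with a stable number of new cases.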

Here is an example:

Figure 2-5
Source: CDC Fact Sheet HIV in the United States, July 2010
Figure 2-5 shows the prevalence (blue line) and incidence (red dotted line) of HIV. In
the early 1980s at the beginning of this epidemic, before we knew what caused AIDS
and before we knew that condom use, screening blood donations, and universal
precautions by health care personnel could prevent the spread of the virus, the
incidence kept going up. More people got infected, and then they in turn infected
others. However, we also could not treat HIV initially, and so people would die of
AIDS within a few years. The early rise in prevalence is thus attributable solely to the
rising incidence. Then we discovered how to prevent new cases: thus, the incidence
went down, and while the prevalence took a couple of years to catch up, it eventually
leveled off. In 1996, access to highly active antiretroviral therapy (HAART)
became common,[viii] and it was now possible for people to “live with HIV.” The
increasing prevalence, starting in the late 1990s, is thus due entirely to an increase in
patient survival, or the average duration of illness (you can see that the incidence is
steady at that time).
Prevalence therefore comprises 2 characteristics of a disease within a population: the
incidence and the average survival time. A change in either one of these components
would lead to a change in prevalence; thus, when a change in prevalence is observed,
the smart public health professional pauses to consider whether the change is due to a
change in the number of new cases (incidence) or to a change in available treatments
(and thus survival). One can see that a public health department’s response to each of
these scenarios would be different.

Summary
This chapter discusses 3 measures of disease frequency: counts, which are used for
extremely rare conditions; prevalence, which considers new and existing cases and is
used for resource allocation; and incidence, which considers only new cases and is
used to study disease etiology.
Incidence can further be broken down into incidence proportion (which uses the
number of people at risk as the denominator) and the incidence rate (which uses the
sum of the person-time at risk as the denominator). Prevalence is approximately equal
to the incidence multiplied by the average survival time after diagnosis.
3 Surveillance
KELLY JOHNSON AND MARIT L. BOVBJERG

Learning Objectives
After reading this chapter, you will be able to do the following:

1. Define epidemic and explain that word’s relationship to epidemiology
2. Define surveillance and explain how surveillance relates to epidemiology overall
3. Describe some common surveillance systems and methods used in the US
4. Explain the rationale behind notifiable condition reporting, and how this pertains to epidemics,
epidemiology, and surveillance

The root word for epidemiology is epidemic. An epidemic is “an increase, often sudden, in the
number of cases of disease above what is normally expected in that population in that area.”i

How do we know how much is “expected”? Surveillance!

Public health surveillance is defined by the World Health Organization (WHO) as

the continuous, systematic collection, analysis, and interpretation of health related data needed
for the planning, implementation, and evaluation of public health practice. Such surveillance can
(1) serve as an early warning system for impending public health emergencies, (2) document the
impact of an intervention, or track progress towards specified goals, and (3) monitor and clarify
the epidemiology of health problems, to allow priorities to be set and to inform public health
policy and strategies.ii

Surveillance activities can be either passive or active. In passive surveillance, the health
department passively receives reports of suspected injury or illness. Think of this as waiting for
disease reports to come to you. Many routine surveillance activities are passive—for instance,
systems keeping track of communicable diseases, cancer, and injuries. Epidemiologists collect
case reports that are sent to them by health care providers, laboratories, schools, or other entities
that are required by law to report this information. In active surveillance, on the other hand,
epidemiologists actively seek out cases of disease. For example, during an outbreak of
salmonellosis associated with a specific source (say, a restaurant), epidemiologists may contact
health care providers in the area and ask each for a list of patients seen with symptoms consistent
with salmonellosis. These patients are then contacted to see if they were exposed to the suspected
source (here, the restaurant). National surveys, such as the National Health and Nutrition
Examination Survey (NHANES),iii are also considered active surveillance. The benefit of active
surveillance is that it generally results in more complete data, while passive surveillance relies on
others (who have numerous duties other than disease reporting) to report cases. The downside to
active surveillance is that it is more resource-intensive, with increased personnel and financial
requirements.iv
Some surveillance activities can be further characterized as population-based. The goal of
population-based surveillance is to find every case that occurs within a population, and it is
usually part of a mandated effort to collect cases of a specific condition of interest. An example
of a condition for which we do population-based surveillance is cancer. Cancer registries aim to
capture every case of cancer that occurs in the population the registry covers. This allows
clinicians and public health professionals to monitor for trends in diagnoses that might signify a
concerning change in the environment and/or for trends in survival that might follow
improvements in treatment.

When reporting a specific condition is not required by law, public health officials must estimate
incidence and prevalence in other ways. Sentinel surveillance involves case reporting from a
limited number of hand-picked reporting sites. This type of surveillance is conducted when high-
quality data are needed, passive systems are unable to provide these data, and resources are too
scarce for complete, population-based active surveillance.v For example, annual influenza virus
surveillance collects positive influenza specimens from a variety of selected sites each year
for genotyping. From these specimens, we are able to determine which strains of the influenza
virus are circulating each year and thus which strains to include in the annual influenza vaccine.
Surveys can also be used to conduct surveillance on a representative sample of the population.

Notifiable Conditions
There is a list of conditions—mostly infectious diseases, but a few chronic diseases and injuries
also make the list—that must be reported to the Centers for Disease Control and Prevention
(CDC) whenever they are encountered by clinicians or health department officials. For example,
say a patient presents to a primary care clinic complaining of high fever, cough, and watery eyes
followed by a full-body rash. The nurse practitioner who sees the patient diagnoses measles. This
clinic must then report the measles case to the local health department, who in turn reports it to
the state health department, who in turn reports it to the CDC. This reporting ideally happens
quickly, in a matter of days (or within hours for a potentially major threat).

The list of nationally notifiable conditions is reviewed every year or so and revised according to
current public health threats and priorities. For instance, Zika virus and its associated congenital
conditions were added to the list in 2016. (The 2020 list of notifiable conditions can be
found here.)

Some of the conditions on the list are extremely rare (human rabies, plague) or have even been
eradicated (smallpox). However, they remain on the notifiable conditions list because in these
cases, our expected level (also called the endemic level) is 0, and these conditions are dangerous
enough that even one suspected case would be cause for an immediate public health intervention.

Each condition on the list has an associated set of case criteria, so after available evidence for a
given patient (including laboratory data, symptoms, relevant exposures, and physician diagnoses)
is collected, it is compared against the case criteria to either confirm or rule out a given case
report. These case criteria are in place to make sure that all epidemiologists are evaluating case
reports consistently. For example, the current case criteria for Lyme disease, last revised in 2017,
can be found here.

For most of the conditions, the reporting criteria specify that these are new cases so that
incidence can be calculated from these data. However, there are exceptions to this—hepatitis C is
challenging to identify in its initial stage because few patients exhibit symptoms,vi resulting in a
large number of hepatitis C infections that are identified during laboratory testing for something
unrelated or once symptoms of liver damage occur. For conditions like this, the CDC requests
notification of any newly diagnosed cases—regardless of whether they are also a new onset case.

The CDC publishes weekly data tabulating all reported cases of the notifiable
conditions.[1] Figure 3-1 shows a screenshot of the notifiable conditions tables for meningitis, in
Oregon, during the last few months of 2017.vii

Figure 3-1

These cases were part of the epidemic of meningococcal meningitis that occurred at
Oregon State University (OSU) during the 2017/2018 academic year—the expected
number of cases of meningitis is 0, and the university, after consultation with the local
and state health departments, took action (requiring students age 25 and younger to be
vaccinated before they could register for classesviii) after only 6 cases were reported
over the course of several weeks. The complete data tables for the notifiable
conditions can be found here.

Notifiable Conditions and Privacy


At this point, most people are familiar with the Health Insurance Portability and
Accountability Act (HIPAA) Privacy Rule (45 CFR 164.512[b]). This rule states that
your health care provider cannot disclose details about your health or the care that you
received (collectively called protected health information) without your permission,
with some exceptions for insurance and payments, coordinating care with other
providers, and so on. Indeed, most clinics require that you acknowledge annually that
they have informed you of their HIPAA-compliant privacy policies. Many people are
unaware, however, that public health functions such as notifiable condition reporting
are exempt from the HIPAA Privacy Rule’s permission requirement. Indeed, the US Department of Health and
Human Services states on its website, “The HIPAA Privacy Rule recognizes the
legitimate need for public health authorities and others responsible for ensuring public
health and safety to have access to protected health information to carry out their
public health mission…Accordingly, the Rule permits covered entities to disclose
protected health information without authorization for specified public health
purposes.”
Source: https://www.hhs.gov/hipaa/for-professionals/special-topics/public-health/index.html

Cancer Registries
Cancer is a notifiable condition and is worth its own mention, because generally
cancer reporting requirements are more extensive than those for other conditions.
Depending on the state, a physician who diagnoses a type of cancer (other than non-
melanoma skin cancers) must report extensive information to the health department,
potentially including the type of tumor, the stage at which it was
diagnosed, histology information, treatments given, and the eventual outcome (death,
recurrence, etc.). Since cancer cases are reported upon diagnosis,[2] this can also be a
potential source of incidence data, with the same caveats discussed above for hepatitis
C (i.e., that sometimes a diagnosis occurs quite late in the disease process). Cancer
registries are somewhat unique compared to other notifiable conditions data because
patients are followed over time. Several states contribute their cancer registry data to
the Surveillance, Epidemiology, and End Results (SEER) database,ix which is
available for both surveillance and research purposes.
Vital Statistics
Birth and death certificates (together called vital statistics)x constitute another
ongoing surveillance system. Local hospitals report births and deaths up to their state
health departments, who in turn report to the CDC. By keeping track of the health of
newborns and childbearing women, as well as causes of death for everyone, public
health officials can spot potential emerging trends that would warrant intervention.
Annual reports summarizing all births and deaths that occurred in the US, as well as
any notable changes from previous years, can be found on the CDC’s website here.

Survey-Based Surveillance Systems


The US conducts numerous surveillance activities that involve direct data collection
from individual residents, usually via questionnaires, although NHANES includes
physical exam and laboratory data as well. Other examples of survey-based surveillance include the
Behavioral Risk Factor Surveillance System (BRFSS),xi which is a telephone-based
survey of adults, who are asked to self-report their health and health behaviors; and
the Pregnancy Risk Assessment Monitoring System (PRAMS),xii which is a paper-
based survey of women who have recently given birth, who report on health and
health care utilization for themselves and their newborn(s). Data from these surveys
are used by public health professionals to monitor trends over time—for instance, the
map on seat belt use shown in chapter 1 was made using BRFSS data—but they
contain only data from prevalent cases. On the plus side, the survey data are freely
available to students and researchers, and numerous articles are published each year
using these datasets.

Conclusions
Surveillance activities allow epidemiologists and other public health professionals to
monitor the “usual” levels of disease in a population. The ultimate goal of surveillance
is to notice potential public health threats early so that a proper response can be
mounted before a public health crisis ensues. The US has numerous surveillance
systems operating at any one time, and much of the benefit from these systems
develops as data are compared over time.
Appendix 1: How to Read an Epidemiologic Study
Key Takeaways
A standard epidemiology paper (not counting the abstract; more on this later) has 4
parts:

• Introduction
• Methods
• Results
• Discussion

Usually these are labelled, but not always. Sometimes they have different labels (e.g.,
“background” instead of “introduction”). Even without labels, epidemiology papers
are almost always organized in this order. Details about each section are discussed
below.

Introduction
The INTRODUCTION usually consists of three things:

• What we already know about a topic
  ◦ A very select summary of what we know! It is important to remember that intros are
    NOT exhaustive literature reviews. Furthermore, which things are included is entirely
    at the authors’ discretion (with some input from peer reviewers and editors), which
    means that you do see the occasional biased/incomplete introduction.
• What we don’t know about the topic (i.e., what is the gap in the literature?)
• What this study will do to address that gap
  ◦ Usually concluding with “our study question was…” or “our objective here was…”

The introduction is where you will find answers to questions like “What is the public
health or clinical problem this study is trying to address?” and “What was their
research question?”

Methods
The METHODS is just that—a description of the methods used for this study. Ideally,
the methods section will describe:

• How they got their sample from the target population/what dataset was used
  ◦ Including inclusion/exclusion criteria, with rationales as appropriate
• What is the study design (including design-specific relevant details, such as how
  participants were randomized, if it’s a randomized controlled trial)
  ◦ Occasionally, if a study has been done using a well-known dataset (e.g., the NHANES
    data described in Chapter 3), the methods section will just direct the reader to other
    publications in which these methods are described in detail, rather than re-printing all
    of the information
• What was the exposure, how and when was it measured, and how was it
  operationalized in the analysis
  ◦ e.g., did they ask people their ages, but then dichotomize into “old” (>65) vs. “young”
    (65 and younger)?
• What was the outcome, how and when was it measured, and how was it
  operationalized in the analysis
  ◦ e.g., what was the case definition used for diagnosis? Were cases identified via clinics,
    or self-report, or some other method?
• What confounders and/or effect modifiers were included, how they were chosen, how
  they were measured, and how they were operationalized in the analysis
  ◦ Collectively these are referred to as “covariables”
  ◦ Any variables listed under “adjusted for” or “included in the model” are being treated
    as confounders
  ◦ Any variables listed as “interactions” or “stratified by” are being treated as effect
    modifiers
• The statistical methods used

The methods section also should include a sentence about ethics/IRB approval, and
informed consent, if applicable.
As a beginning epidemiology student, do not be concerned if you do not understand
everything in the methods section! This is particularly true if the study included
laboratory assays (e.g., to measure blood lead levels), but also pertains to the statistical
methods. Papers must include enough detail in the methods section so that other
scientists can evaluate, and potentially replicate, the work—which means they are
written for other epidemiologists who are publishing papers, all of whom potentially
have many years’ worth of training in relevant methods.
Your task as a first-time epidemiology student (or as an end-user of epidemiologic
research who has some – but not a lot of – training in the field) is to read the methods
carefully enough to spot any potential sources of bias, given your level of
understanding. For instance, after reading this book, my hope is that you could spot
egregious selection bias by reading the authors’ methods and thinking through “who
did they get, who did they miss?”. However, I would not expect that you would be
able to spot a bias introduced because the authors violated one of the assumptions of
the statistical model that they used. Sometimes I read papers myself where I don’t
quite follow the methods, particularly for laboratory-based measurements. In those
cases, I just trust the peer review process—several pairs of eyes were on any given
study before mine, so probably the methods are kosher. If it seems like maybe there’s
a problem, I ask one of my laboratory (or statistics, or clinical, depending on what my
question is) colleagues about it.
Bottom line: read the methods carefully, but if there are parts you don’t quite follow,
don’t worry about it unless you are going to cite that work yourself. In that case, ask
around and find someone who can help you interpret the methods.

Results
The RESULTS section contains…results. What did they find? This section is usually
very dense in terms of numbers. There will be odds ratios, risk ratios, confidence
intervals, p-values, etc. Usually the results section begins with a discussion of the
sample that was in the study, and this often further includes a table of demographics
and relevant risk factors (usually “Table 1”). This table is a good place to get a feel
for who was in the study (and therefore who was not). Then usually the authors will
discuss the MAIN results: what did they find pertaining to their primary research
question? They will not say what the results mean in this section (that’s for the
“discussion,” below)—this section just presents the numbers. Expect to go back and
forth between the text and the tables and the figures several times—most journals
expressly ask authors not to duplicate results (meaning, if the results are presented in a
table, don’t repeat them in the text). Thus, to understand all of the results yourself,
you will need to read both the text and the tables/figures. The last few paragraphs of
the results section are used to present subgroup analyses, or bias/sensitivity analyses.
As you read results sections, think about what you read in the methods section. Do
you believe these results, given the methods used?

Discussion
The DISCUSSION section is the authors’ opinions about what the results mean. It
usually begins with a summary of the main findings, and then compares these findings
to other published findings on the same or similar topics. It should include a
limitations (or strengths and limitations) sub-section, in which the authors provide a
very frank picture of what limitations their study had (where there might have been
bias, etc.). If, when reading the methods and results sections, you thought of a
potential bias that is NOT discussed here…pause. Perhaps this study is not the best
source, then? All limitations should be acknowledged. (Corollary: no study is
perfect! All have limitations. We can try to minimize, but need to ‘fess up to the ones
that remain.) The discussion section usually concludes with some kind of
recommendation, either for policy, or further research. Again, this is the authors’
opinion. Indeed, you are welcome to disagree entirely with any or all of a given
discussion section—discussion sections are opinion, not fact.

Abstract
Finally, there is the ABSTRACT. Usually found at the beginning of the paper, often
in its own separate box—this is a brief summary of the entire thing. Sometimes these
same subheadings will be in the abstract, other times not. WARNING: You cannot
understand a paper just by reading the abstract. Often only one or two main
results are presented in the abstract, and the methods are quite sparse, as abstracts are
limited usually to a few hundred words. Never cite a paper if all you have read is the
abstract. This will come back to haunt you.

A few other details


Below are a few more points for if/when you are looking for papers yourself.
Consider these things as you determine whether it’s worth reading and/or citing a
paper that you have found.

• Just under the paper’s title is a list of the authors, their affiliations, and (usually) their
  credentials. Are these the kinds of people who need to be on this study? For instance,
  if a study on appropriate treatment for congestive heart failure does not include a
  cardiologist as an author, maybe that’s a problem. If a study includes fancy statistical
  methods beyond basic logistic or linear regression, but the author list includes only
  clinicians, and no one with specialized statistical training, maybe that’s a problem.
• Somewhere, usually on either the first or last page, or sometimes between the
  conclusion and the reference list, is a note about funding and other conflicts of interest.
  These are often illuminating. For instance, I know of a study casting doubt on the
  benefits of breastfeeding that was funded by the International Formula Council.[i]
• Many journals will list dates: the date the article was submitted, the date it was
  received in revised form (meaning, it was received, sent out for review, the reviews
  were sent back to the authors, who then made the changes), and the date it was
  accepted. If the initial submission date and the acceptance date are not at least six
  weeks (and more realistically six months) apart, then it’s possible that the peer review
  process was circumvented in some way. This happens. Sad but true.

4 Introduction to 2 x 2 Tables, Epidemiologic Study Design, and Measures of Association
Learning Objectives
After reading this chapter, you will be able to do the following:

1. Interpret data found in a 2 x 2 table
2. Compare and contrast the 4 most common types of epidemiologic studies: cohort studies,
randomized controlled trials, case-control studies, and cross-sectional studies
3. Calculate and interpret relative measures of association (risk ratios, rate ratios, odds ratios)
4. Explain which measures are preferred for which study designs and why
5. Discuss the differences between absolute and relative measures of association

In epidemiology, we are often concerned with the degree to which a particular exposure might
cause (or prevent) a particular disease. As detailed later in chapter 10, it is difficult to claim
causal effects from a single epidemiologic study; therefore, we say instead that exposures and
diseases are (or are not) statistically associated. This means that the exposure
is disproportionately distributed between individuals with and without the disease. The degree
to which exposures and health outcomes are associated is conveyed through a measure of
association. Which measure of association to choose depends on whether you are working
with incidence or prevalence data, which in turn depends on the type of study design used. This
chapter will therefore provide a brief outline of common epidemiologic study designs interwoven
with a discussion of the appropriate measure(s) of association for each. In chapter 9, we will
return to study designs for a more in-depth discussion of their strengths and weaknesses.

Necessary First Step: 2 x 2 Notation


Before getting into study designs and measures of association, it is important to understand the
notation used in epidemiology to convey exposure and disease data: the 2 x 2 table. A 2 x 2 table
(or two-by-two table) is a compact summary of data for 2 variables from a study—namely, the
exposure and the health outcome. Say we do a 10-person study on smoking and hypertension,
and collect the following data, where Y indicates yes and N indicates no:

Table 4-1

Participant #   Smoker?   Hypertension?
1               Y         Y
2               Y         N
3               Y         Y
4               Y         Y
5               N         N
6               N         Y
7               N         N
8               N         N
9               N         Y
10              N         N

You can see that we have 4 smokers, 6 nonsmokers, 5 individuals with hypertension, and 5
without. In this example, smoking is the exposure and hypertension is the health outcome, so we
say that the 4 smokers are “exposed” (E+), the 6 nonsmokers are “unexposed” (E−), the 5 people
with hypertension are “diseased” (D+), and the 5 people without hypertension are “nondiseased”
(D−). This information can be organized into a 2 × 2 table:
Table 4-2

      D+   D-
E+    3    1
E-    2    4

The 2 × 2 table summarizes the information from the longer table above so that you can quickly
see that 3 individuals were both exposed and diseased (persons 1, 3, and 4); one individual was
exposed but not diseased (person 2); two individuals were unexposed but diseased (persons 6 and
9); and the remaining 4 individuals were neither exposed nor diseased (persons 5, 7, 8, and 10).
Though it does not really matter whether exposure or disease is placed on the left or across the
top of a 2 × 2 table, the convention in epidemiology is to have exposure on the left and disease
across the top.
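As an illustration of how the 2 × 2 table is built from participant-level data, here is a minimal Python sketch that tallies the ten participants of Table 4-1 into the four cells of Table 4-2. The names `participants` and `two_by_two` are hypothetical, chosen for this example only; they are not part of any epidemiology package.

```python
from collections import Counter

# Participant data from Table 4-1: (participant number, smoker?, hypertension?)
participants = [
    (1, "Y", "Y"), (2, "Y", "N"), (3, "Y", "Y"), (4, "Y", "Y"), (5, "N", "N"),
    (6, "N", "Y"), (7, "N", "N"), (8, "N", "N"), (9, "N", "Y"), (10, "N", "N"),
]

def two_by_two(rows):
    """Count participants into the four cells of a 2 x 2 table.

    Exposure (smoking) defines the rows and disease (hypertension) the columns,
    following the epidemiologic convention. Returns a dict keyed by
    (exposure label, disease label).
    """
    counts = Counter(
        ("E+" if smoker == "Y" else "E-", "D+" if htn == "Y" else "D-")
        for _, smoker, htn in rows
    )
    # Include every cell explicitly, even one that has zero participants
    return {(e, d): counts.get((e, d), 0) for e in ("E+", "E-") for d in ("D+", "D-")}

table = two_by_two(participants)
for exposure in ("E+", "E-"):
    print(exposure, table[(exposure, "D+")], table[(exposure, "D-")])
# E+ 3 1
# E- 2 4
```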

When discussing 2 x 2 tables, epidemiologists use the following shorthand to refer to specific
cells:

Table 4-3

      D+   D-
E+    A    B
E-    C    D

It is often helpful to calculate the margin totals for a 2 x 2 table:

Table 4-4

        D+   D-   Total
E+      3    1    4
E-      2    4    6
Total   5    5    10

Or:

Table 4-5

        D+     D-     Total
E+      A      B      A+B
E-      C      D      C+D
Total   A+C    B+D    A+B+C+D

The margin totals are sometimes helpful when calculating various measures of association (and
to check yourself against the original data).
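The margin totals are simple sums of the A/B/C/D cells, so they are easy to compute and use as a self-check. The sketch below is a hypothetical helper (`margin_totals` is our own name), assuming the cell layout of Table 4-3.

```python
def margin_totals(a, b, c, d):
    """Row, column, and grand totals of a 2 x 2 table (Table 4-5 layout).

    A = exposed & diseased, B = exposed & nondiseased,
    C = unexposed & diseased, D = unexposed & nondiseased.
    """
    return {
        "exposed": a + b,        # A+B
        "unexposed": c + d,      # C+D
        "diseased": a + c,       # A+C
        "nondiseased": b + d,    # B+D
        "total": a + b + c + d,  # A+B+C+D
    }

totals = margin_totals(3, 1, 2, 4)
# Check against the raw data: 4 smokers, 6 nonsmokers, 5 with and 5 without hypertension
assert (totals["exposed"], totals["unexposed"]) == (4, 6)
assert (totals["diseased"], totals["nondiseased"]) == (5, 5)
print(totals["total"])  # 10
```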
Continuous versus Categorical Variables
Continuous variables are things such as age or height, where the possible values for a given
person are infinite, or close to it. Categorical variables are things such as religion or favorite
color, where there is a discrete list of possible answers. Dichotomous variables are a special case
of categorical variable where there are only 2 possible answers. It is possible to dichotomize a
continuous variable—if you have an “age” variable, you could split it into “old” and “young.”
However, it is not always advisable to do this because a lot of information is lost. Furthermore,
how does one decide where to dichotomize? Does “old” start at 40, or 65? Epidemiologists
usually prefer to leave continuous variables continuous to avoid having to make these judgment
calls.

Nonetheless, having dichotomous variables (a person is either exposed or not, either diseased or
not) makes the math much easier to understand. For the purposes of this book, then, we will
assume that all exposure and disease data can be meaningfully dichotomized and placed into 2×2
tables.
Studies That Use Incidence Data

Cohorts
There are 4 types of epidemiologic studies that will be covered in this book,[1] two of which
collect incidence data: prospective cohort studies and randomized controlled trials. Since
these study designs use incidence data, we instantly know 3 things about these study types. One,
we are looking for new cases of disease. Two, there is thus some longitudinal follow-up that
must occur to allow for these new cases to develop. Three, we must start with those who were at
risk (i.e., without the disease or health outcome) as our baseline.

The procedure for a prospective cohort study (hereafter referred to as just a “cohort study,”
though see the inset box on retrospective cohort studies later in this chapter) begins with
the target population, which contains both diseased and non-diseased individuals:

Figure 4-1

As discussed in chapter 1, we rarely conduct studies on entire populations because it is usually
not logistically feasible to study everyone in a large population. Therefore we draw
a sample and perform the study with the individuals in the sample. For a cohort study, since we
will be calculating incidence, we must start with individuals who are at risk of the outcome. We
thus draw a non-diseased sample from the target population:
Figure 4-2

The next step is to assess the exposure status of the individuals in our sample and determine
whether they are exposed or not:

Figure 4-3

After assessing which participants were exposed, our 2 x 2 table (using the 10-person
smoking/HTN data example from above) would look like this:

Table 4-6

        D+   D-   Total
E+      0    4    4
E-      0    6    6
Total   0    10   10

By definition, at the beginning of a cohort study, everyone is still at risk of developing the
disease, and therefore there are no individuals in the D+ column. In this hypothetical example,
based on the data above, we will observe 5 cases of incident hypertension as the study
progresses–but at the beginning, none of these cases have yet occurred.

We then follow the participants in our study for some length of time and observe incident cases
as they arise.

Figure 4-4

As mentioned in chapter 2, the length of follow-up varies depending on the disease process in
question. For a research question regarding childhood exposure and late-onset cancer, the length
of follow-up would be decades. For an infectious disease outbreak, the length of follow-up might
be a matter of days or even hours, depending on the incubation period of the particular disease.
Assuming we are calculating incidence proportions (which use the number of people at risk in
the denominator) in our cohort, our 2 × 2 table at the end of the smoking/HTN study would look
like this:

Table 4-7

        D+   D-   Total
E+      3    1    4
E-      2    4    6
Total   5    5    10

It is important to recognize that when epidemiologists talk about a 2 × 2 table from a cohort
study, they mean the 2 × 2 table at the end of the study—the 2 × 2 table from the beginning was
much less interesting, as the D+ column was empty!

From this 2 × 2 table, we can calculate a number of useful measures, detailed below.
Calculating the Risk Ratio from the Hypothetical Smoking/Hypertension Cohort Study
We can start by calculating the overall incidence of disease in our sample (assume that our
smoking/HTN study included 10 years of follow-up):
$$\text{Incidence proportion} = \frac{\text{number of new cases}}{\text{population at risk at baseline}} = \frac{5}{10} = 50 \text{ cases per 100 people in 10 years}$$
Using ABCD notation for a 2 x 2 table, the formula for the overall incidence proportion is:
$$\frac{A+C}{A+B+C+D}$$
We can also calculate the incidence only among exposed individuals:
$$I_{E+} = \frac{A}{A+B} = \frac{3}{4} = 75 \text{ per 100 in 10 years}$$

Likewise, we can calculate the incidence only among unexposed individuals:
$$I_{E-} = \frac{C}{C+D} = \frac{2}{6} = 33 \text{ per 100 in 10 years}$$

Recall that our original goal with the cohort study was to see whether exposure is associated with
disease. We thus need to compare the IE+ to the IE-. The most common way of doing this is to
calculate their ratio:
$$\text{Risk Ratio} = \frac{I_{E+}}{I_{E-}} = \frac{75 \text{ per 100 in 10 years}}{33 \text{ per 100 in 10 years}} = 2.27$$
(This uses the rounded incidence of 33 per 100 in the unexposed; dividing the unrounded
proportions, (3/4)/(2/6), gives 2.25.)
Using ABCD notation, the formula for RR is:
$$RR = \frac{A/(A+B)}{C/(C+D)}$$

Note that risk ratios (RR) have no units, because the time-dependent units for the 2 incidences
cancel out.
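The same calculation can be written as a short Python sketch. It is an illustration only (`risk_ratio` is a hypothetical helper), and it uses exact fractions, so it reports the unrounded RR of 2.25 rather than the 2.27 obtained by first rounding the unexposed incidence to 33 per 100.

```python
from fractions import Fraction

def risk_ratio(a, b, c, d):
    """Risk ratio from a cohort 2 x 2 table built on incidence proportions.

    I(E+) = A / (A + B), I(E-) = C / (C + D), RR = I(E+) / I(E-).
    Fraction keeps every step exact, so rounding happens only when printing.
    """
    incidence_exposed = Fraction(a, a + b)
    incidence_unexposed = Fraction(c, c + d)
    return incidence_exposed, incidence_unexposed, incidence_exposed / incidence_unexposed

ie_pos, ie_neg, rr = risk_ratio(3, 1, 2, 4)
print(f"I(E+) = {float(ie_pos) * 100:.0f} per 100 in 10 years")  # 75
print(f"I(E-) = {float(ie_neg) * 100:.0f} per 100 in 10 years")  # 33
print(f"RR = {float(rr):.2f}")  # 2.25
```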

If the RR is greater than 1, it means that we observed more disease in the exposed group than in
the unexposed group. Likewise, if the RR is less than 1, it means that we observed less disease in
the exposed group than in the unexposed group. If we assume causality, an exposure with an RR
< 1 is preventing disease, and an exposure with an RR > 1 is causing disease. The null value for
a risk ratio is 1.0, which would mean that there was no observed association between exposure
and disease. You can see how this would be the case—if the incidence was identical in the
exposed and unexposed groups, then the RR would be 1, since x divided by x is 1.

Because the null value is 1.0, one must be careful if using the words higher or lower when
interpreting RRs. For instance, an RR of 2.0 means that the disease is twice as common, or
twice as high, in the exposed compared to the unexposed—not that it is 2 times more common,
or 2 times higher, which would be an RR of 3.0 (since the null value is 1, not 0). If you do not see
the distinction between these, don’t sweat it—just memorize and use the template sentence
below, and your interpretation will be correct.

The correct interpretation of an RR is:


“The risk of [disease] was [RR] times as high in [exposed] compared to [unexposed]
over [x] days/months/years.”

Using our smoking/HTN example:


“The risk of hypertension was 2.27 times as high in smokers compared to nonsmokers over 10
years.”

The key phrase is times as high; with it, the template sentence works regardless of whether the
RR is above or below 1. For an RR of 0.5, saying “0.5 times as high” means that you multiply the
risk in the unexposed by 0.5 to get the risk in the exposed, yielding a lower incidence in the
exposed—as one expects with an RR < 1.
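Because the template sentence is mechanical, it can be filled in programmatically. The sketch below is a hypothetical helper (`interpret_ratio` is our own name), assuming the caller passes the measure as "risk", "rate", or "odds" and omits the time period when the study design involves no follow-up.

```python
def interpret_ratio(measure, value, disease, exposed, unexposed, period=None):
    """Fill in the template sentence for a risk, rate, or odds ratio.

    `period` (e.g. "10 years") is left out for designs without follow-up time,
    such as cross-sectional studies.
    """
    verb = "were" if measure == "odds" else "was"  # "odds" takes a plural verb
    sentence = (f"The {measure} of {disease} {verb} {value} times as high "
                f"in {exposed} compared to {unexposed}")
    if period:
        sentence += f" over {period}"
    return sentence + "."

print(interpret_ratio("risk", 2.27, "hypertension", "smokers", "nonsmokers", "10 years"))
# The risk of hypertension was 2.27 times as high in smokers compared to nonsmokers over 10 years.
```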

If our cohort study instead used a person-time approach, the 2 x 2 table at the end of the study
would have a column for sum of the person-time at risk (PTAR):

Table 4-8

        D+   D-   Total   Σ PTAR
E+      3    1    4       27.3 PY
E-      2    4    6       52.9 PY
Total   5    5    10      80.2 PY
Calculating the Rate Ratio from the Hypothetical Smoking/Hypertension Cohort Study
Using a person-time denominator, the incidence rate for the overall study is:
$$I = \frac{5 \text{ new cases}}{80.2 \text{ PY}} = 6.2 \text{ per 100 person-years}$$
Likewise, the incidence rate among exposed persons is:
$$I_{E+} = \frac{3}{27.3 \text{ PY}} = 11.0 \text{ per 100 person-years}$$
And the incidence rate among unexposed persons is:
$$I_{E-} = \frac{2}{52.9 \text{ PY}} = 3.8 \text{ per 100 person-years}$$
We again take the ratio of incidence in the exposed to incidence in the unexposed, this time
calculating a rate ratio (also abbreviated RR):
$$RR = \frac{I_{E+}}{I_{E-}} = \frac{11.0 \text{ per 100 PY}}{3.8 \text{ per 100 PY}} = 2.9$$
As when using incidence proportions, the units cancel out, and we are left with just a number.
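A minimal Python sketch of the person-time calculation, using the person-years from Table 4-8 (27.3 PY for smokers, 52.9 PY for nonsmokers). The helper name `incidence_rate` and its `per` parameter are our own, for illustration only.

```python
def incidence_rate(cases, person_years, per=100):
    """Incidence rate expressed per `per` person-years."""
    return cases / person_years * per

# Person-time at risk from Table 4-8
rate_exposed = incidence_rate(3, 27.3)    # smokers
rate_unexposed = incidence_rate(2, 52.9)  # nonsmokers
rate_overall = incidence_rate(5, 27.3 + 52.9)

# The "per 100 person-years" units cancel in the ratio
rate_ratio = rate_exposed / rate_unexposed
print(f"{rate_exposed:.1f} {rate_unexposed:.1f} {rate_overall:.1f} {rate_ratio:.1f}")
# 11.0 3.8 6.2 2.9
```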

The interpretation is the same as it would be for the risk ratio; one just needs to substitute the
word rate for the word risk:
The rate of hypertension was 2.9 times as high in smokers compared to non-smokers, over 10
years.

Notice that the interpretation sentence still includes the duration of the study, even though some
individuals (the 5 who developed hypertension) were censored before that time. This is because
knowing how long people were followed for (and thus given time to develop disease) is still
important when interpreting the findings. As discussed in chapter 2, 100 years of person-time can
be accumulated in any number of different ways; knowing that the duration of the study was 10
years (rather than 1 year or 50 years) might make a difference in terms of how (or if) one applies
the findings in practice.
“Relative Risk”
Both the risk ratio and the rate ratio are abbreviated RR. This abbreviation (and the risk ratio
and/or rate ratio) is often referred to by epidemiologists as relative risk. This is an example of
inconsistent lexicon in the field of epidemiology; in this book, I use risk ratio and rate
ratio separately (rather than relative risk as an umbrella term) because it is helpful, in my opinion,
to distinguish between studies using the population at risk vs. those using a person-time at risk
approach. Regardless, a measure of association called RR is always calculated as incidence in the
exposed divided by incidence in the unexposed.
Retrospective Cohort Studies
Throughout this book, I will focus on prospective cohort studies. One can also conduct
a retrospective cohort study, mentioned here because public health and clinical practitioners will
encounter retrospective cohort studies in the literature. In theory, a retrospective cohort study is
conducted exactly like a prospective cohort study: one begins with a non-diseased sample from
the target population, determines who was exposed, and “follows” the sample
for x days/months/years, looking for incident cases of disease. The difference is that, for a
retrospective cohort study, all this has already happened, and one reconstructs this information
using existing records. The most common way to do retrospective cohort studies is by using
employment records (which often have job descriptions useful for surmising exposure—for
instance, the floor manager was probably exposed to whatever chemicals were on the factory
floor, whereas human resource officers probably were not), medical records, or other
administrative datasets (e.g., military records).

Continuing with our smoking/HTN 10-year cohort example, one might do a retrospective cohort
using medical records as follows:

• Go back to all the records from 10 years ago and determine who already had hypertension
(these people are not at risk and are therefore not eligible) or otherwise does not meet the
sample inclusion criteria
• Determine, among those at risk 10 years ago, which individuals were smokers
• Determine which members of the sample then developed hypertension during the intervening
10 years

Retrospective cohorts are analyzed just like prospective cohorts—that is, by calculating rate
ratios or risk ratios. However, for beginning epidemiology students, retrospective cohorts are
often confused with case-control studies; therefore we will focus exclusively on prospective
cohorts for the remainder of this book. (Indeed, occasionally even seasoned scientists are
confused about the difference!)[i]

Randomized Controlled Trials


The procedure for a randomized controlled trial (RCT) is exactly the same as the procedure for a
prospective cohort, with one exception: instead of allowing participants to self-select into
“exposed” and “unexposed” groups, the investigator in an RCT randomly assigns some
participants (usually half) to “exposed” and the other half to “unexposed.” In other words,
exposure status is determined entirely by chance. This is the type of study required by the Food
and Drug Administration for approval of new drugs: half of the participants in the study are
randomly assigned to the new drug and half to the old drug (or to a placebo, if the drug is
intended to treat something previously untreatable). The diagram for an RCT is as follows:
Figure 4-5

Note that the only difference between an RCT and a prospective cohort is the first box: instead of
measuring existing exposures, we now tell people whether they will be exposed or not. We are
still measuring incident disease, and we are therefore still calculating either the risk ratio or the
rate ratio.
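The one step that distinguishes an RCT, random assignment of exposure, can be sketched in a few lines of Python. This is a simple shuffle-and-split illustration under our own assumptions (`randomize` is a hypothetical helper); real trials commonly use more elaborate schemes such as block or stratified randomization.

```python
import random

def randomize(participant_ids, seed=None):
    """Randomly split participants into 'exposed' and 'unexposed' arms of (nearly) equal size.

    Shuffle-and-split is one simple way to randomize; with an odd number of
    participants the 'unexposed' arm receives the extra person.
    """
    rng = random.Random(seed)  # a seeded generator makes the allocation reproducible
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {"exposed": ids[:half], "unexposed": ids[half:]}

arms = randomize(range(1, 11), seed=2024)
print(sorted(arms["exposed"]), sorted(arms["unexposed"]))
```

Because exposure is assigned by chance rather than chosen by participants, the analysis afterward is identical to the cohort case: compute incidence in each arm and take the risk ratio or rate ratio.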

Observational versus Experimental Studies


Cohort studies are a subclass of observational studies, meaning the researcher is
merely observing what happens in real life—people in the study self-select into being
exposed or not depending on their personal preferences and life circumstances. The
researcher then measures and records a given person’s level of exposure. Cross-
sectional and case-control studies are also observational. Randomized controlled trials,
on the other hand, are experimental studies—the researcher is conducting an
experiment that involves telling people whether they will be exposed to a condition or
not (e.g., to a new drug).

Studies That Use Prevalence Data


Following participants while waiting for incident cases of disease is expensive and
time-consuming. Often, epidemiologists need a faster (and cheaper) answer to their
question about a particular exposure/disease combination. One might instead take
advantage of prevalent cases of disease, which by definition have already occurred
and therefore require no wait. There are 2 such designs that I will cover: cross-
sectional studies and case-control studies. For both of these, since we are not using
incident cases, we cannot calculate the RR, because we have no data on incidence. We
instead calculate the odds ratio (OR).
Cross-sectional
Cross-sectional studies are often referred to as snapshot or prevalence studies: one
takes a “snapshot” at a particular point in time, determining who is exposed and who
is diseased simultaneously. The following is a visual:

Figure 4-6

Note that the sample is now no longer composed entirely of those at risk because we
are using prevalent cases—thus by definition, some proportion of the sample will be
diseased at baseline. As mentioned, we cannot calculate the RR in this scenario, so
instead we calculate the OR.
Calculating the Odds Ratio from the Hypothetical Smoking/Hypertension Cross-Sectional Study
The formula for OR for a cross-sectional study is:
$$OR = \frac{\text{odds of disease in the exposed group}}{\text{odds of disease in the unexposed group}}$$
The odds of an event are defined statistically as the number of people who experienced
an event divided by the number of people who did not experience it. Using 2 × 2
notation, the formula for OR is:
$$OR = \frac{A/B}{C/D} = \frac{AD}{BC}$$
For our smoking/HTN example, if we assume those data came from a cross-sectional
study, the OR would be:
$$OR = \frac{3/1}{2/4} = \frac{3 \times 4}{2 \times 1} = 6.0$$
Again there are no units.
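A minimal Python sketch of the odds ratio, computed both as a ratio of odds and in the cross-product form AD/BC to show that they agree. `odds_ratio` is a hypothetical helper for illustration.

```python
from fractions import Fraction

def odds_ratio(a, b, c, d):
    """Odds ratio from a 2 x 2 table: (A/B) / (C/D), which simplifies to AD / BC.

    A zero in cell B, C, or D makes the OR undefined; Fraction raises
    ZeroDivisionError in that case.
    """
    odds_exposed = Fraction(a, b)    # odds of disease among the exposed
    odds_unexposed = Fraction(c, d)  # odds of disease among the unexposed
    return odds_exposed / odds_unexposed

or_htn = odds_ratio(3, 1, 2, 4)
assert or_htn == Fraction(3 * 4, 2 * 1)  # cross-product form AD/BC gives the same answer
print(float(or_htn))  # 6.0
```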

The interpretation of an OR is the same as that of an RR, with the word odds substituted for risk:
The odds of hypertension were 6.0 times as high in smokers compared to nonsmokers.

Note that we now no longer mention time, as these data came from a cross-sectional
study, which does not involve time. As with interpretation of RRs, ORs greater than 1
mean the exposure is more common among diseased, and ORs less than 1 mean the
exposure is less common among diseased. The null value is again 1.0.
For 2 x 2 tables from cross-sectional studies, one can additionally calculate the
overall prevalence of disease as
$$\text{Prevalence} = \frac{A+C}{A+B+C+D}$$

Finally, some authors will refer to the OR in a cross-sectional study as the prevalence
odds ratio—presumably, just as a reminder that cross-sectional studies are conducted
on prevalent cases. The calculation of such a measure is exactly the same as the OR as
presented above.
OR versus RR
As you can see from the (hypothetical) example data in this chapter, the OR will
always be further from the null value than the RR (unless there is no association, in
which case both equal 1.0). The more common the disease, the larger this gap becomes.
If the disease has a prevalence of about 5% or less, then the OR does provide a close
approximation of the RR; however, as the disease in question becomes more common
(as in this example, with a hypertension prevalence of 50%), the OR deviates further
and further from the RR.
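The divergence between the OR and the RR can be shown numerically. The Python sketch below first uses the chapter's smoking/HTN table, then a set of hypothetical cohorts (invented for illustration, not real data) with 1,000 exposed and 1,000 unexposed people in which the exposed always have exactly twice the risk, while the outcome becomes progressively more common. `rr_and_or` is a hypothetical helper.

```python
from fractions import Fraction

def rr_and_or(a, b, c, d):
    """Return (risk ratio, odds ratio) computed from the same 2 x 2 table."""
    risk_ratio = Fraction(a, a + b) / Fraction(c, c + d)
    odds_ratio = Fraction(a * d, b * c)
    return float(risk_ratio), float(odds_ratio)

# The chapter's smoking/HTN data (unrounded RR)
print(rr_and_or(3, 1, 2, 4))  # (2.25, 6.0)

# Hypothetical cohorts: the true RR is always 2, but the outcome grows more common
for cases_unexposed in (10, 50, 200):
    cases_exposed = 2 * cases_unexposed
    rr, or_ = rr_and_or(cases_exposed, 1000 - cases_exposed,
                        cases_unexposed, 1000 - cases_unexposed)
    print(f"risk in unexposed {cases_unexposed / 1000:.0%}: RR = {rr:.2f}, OR = {or_:.2f}")
```

With a 1% risk in the unexposed the OR is 2.02, close to the RR of 2.00; at 5% it is 2.11, and at 20% it has drifted to 2.67.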
Occasionally, you will see a cohort study (or very rarely, an RCT) that reports the OR
instead of the RR. Technically this is not correct, because cohorts and RCTs use
incident cases, so the best choice for a measure of association is the RR. However,
one common statistical modeling technique—logistic regression—automatically
calculates ORs. While it is possible to back-calculate the RR from these numbers,
often investigators do not bother and instead just report the OR. This is troublesome
for a couple of reasons: first, it is easier for human brains to interpret risks as opposed
to odds, and therefore risks should be used when possible; and second, cohort studies
and RCTs almost always have relatively common outcomes (see chapter 9), thus
reporting the OR makes it seem as if the exposure is a bigger problem (or a better
solution, if OR < 1) than it “really” is.

Case-Control
The final type of epidemiologic study that is commonly used is the case-control study.
It also begins with prevalent cases and thus is faster and cheaper than longitudinal
(prospective cohort or RCT) designs. To conduct a case-control study, one first draws
a sample of diseased individuals (cases):

Figure 4-7

Then a sample of nondiseased individuals (controls):


Figure 4-8

First and foremost, note that both cases and controls come from the same underlying
population. This is extremely important, lest a researcher conduct a biased case-
control study (see chapter 9 for more on this). After sampling cases and controls, one
measures exposures at some point in the past. This might be yesterday (for a
foodborne illness) or decades ago (for osteoporosis):
Figure 4-9

Again, we cannot calculate incidence because we are using prevalent cases, so instead
we calculate the OR in the same manner as above. The interpretation is identical, but
now we must refer to the time period because we explicitly looked at past exposure
data:
The odds of hypertension are 6.0 times as high in people who were smokers 10 years
ago, compared to people who were nonsmokers 10 years ago.

Note, however, that one cannot calculate the overall sample prevalence using a 2 × 2
table from a case-control study, because we artificially set the prevalence in our
sample (usually at 50%) by deliberately choosing individuals who were diseased for
our cases.
Exposure OR versus Disease OR
Technically, for a case-control study, one calculates the exposure OR rather than
the disease OR (which is presented under cross-sectional studies). In other words,
since in case-control studies we begin with disease, we are calculating the odds of
being exposed among those who are diseased compared to the odds of being exposed
among those who are not diseased:
$$OR_{\text{exposure}} = \frac{A/C}{B/D} = \frac{AD}{BC}$$
The disease odds ratio, you will remember, calculates the odds of being diseased
among those who are exposed, compared to the odds of being diseased among those
who are unexposed:
$$OR_{\text{disease}} = \frac{A/B}{C/D} = \frac{AD}{BC}$$
In advanced epidemiology classes, one is expected to appreciate the nuances of this
difference and to articulate the rationale behind it. However, since both the exposure
and the disease odds ratios simplify to the same final equation, here we will not
differentiate between them. The interpretation is the same: an OR > 1 means that
disease is more common in the exposed group (or exposure is more common in the
diseased group—same thing), and an OR < 1 means that disease is less common in the
exposed group (or exposure is less common in the diseased group—again, same thing).
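The claim that both odds ratios reduce to AD/BC can be checked directly. In this Python sketch the hypothetical function names describe which odds are being compared (odds of exposure among cases versus controls, and odds of disease among exposed versus unexposed), so the check does not depend on which label one attaches to each.

```python
from fractions import Fraction

def ratio_of_exposure_odds(a, b, c, d):
    """Odds of exposure among cases (A/C) divided by odds of exposure among controls (B/D)."""
    return Fraction(a, c) / Fraction(b, d)

def ratio_of_disease_odds(a, b, c, d):
    """Odds of disease among exposed (A/B) divided by odds of disease among unexposed (C/D)."""
    return Fraction(a, b) / Fraction(c, d)

# Both simplify to AD/BC, so they agree for any table with nonzero cells
assert ratio_of_exposure_odds(3, 1, 2, 4) == ratio_of_disease_odds(3, 1, 2, 4) == Fraction(3 * 4, 1 * 2)
print(float(ratio_of_exposure_odds(3, 1, 2, 4)))  # 6.0
```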
Risk Difference
RR and OR are known as relative or ratio measures of association for obvious reasons.
These measures can be misleading, however, if the absolute risks (incidences) are
small.[2] For example, if a cohort study was done, and investigators observed an
incidence in the exposed of 1 per 1,000,000 in 20 years and an incidence in the
unexposed of 2 per 1,000,000 in 20 years, the RR would be 0.5: there is a 50%
reduction in disease in the exposed group. Break out the public health intervention!
However, this ratio measure masks an important truth: the absolute difference in risk
is tiny, only 1 per 1,000,000 in 20 years.
To address this issue, epidemiologists sometimes calculate the risk difference instead:
$$RD = I_{E+} - I_{E-}$$
Unfortunately, this absolute measure of association is not often seen in the literature,
perhaps because interpretation implies causation more explicitly or because it is more
difficult to control for confounding variables (see chapter 7) when calculating
difference measures.
Regardless, in our smoking/HTN example, the RD is:
$$RD = I_{E+} - I_{E-} = 75 \text{ per 100 in 10 years} - 33 \text{ per 100 in 10 years} = 42 \text{ per 100 in 10 years}$$
Note that the RD has the same units as incidence, since units do not cancel when
subtracting. The interpretation is as follows:
Over 10 years, the excess number of cases of HTN attributable to smoking is 42 per 100
smokers; the remaining 33 per 100 would have occurred anyway.
You can see how this interpretation assigns a more explicitly causal role to the
exposure.
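A minimal Python sketch of the risk difference for the smoking/HTN cohort. `risk_difference` is a hypothetical helper; because it works from the unrounded incidences (3/4 and 2/6), it gives 41.7 per 100 in 10 years, which the text rounds to 42.

```python
from fractions import Fraction

def risk_difference(a, b, c, d):
    """Risk difference I(E+) - I(E-); it keeps the units of incidence (here, per 10 years)."""
    return Fraction(a, a + b) - Fraction(c, c + d)

rd = risk_difference(3, 1, 2, 4)
print(f"RD = {float(rd) * 100:.1f} per 100 in 10 years")  # RD = 41.7 per 100 in 10 years
```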
More common (but still not nearly as common as the ratio measures) are a pair of
measures derived from the RD: the attributable risk (AR) and the number needed to
treat/number needed to harm (NNT/NNH).
The AR is calculated as RD/IE+. Here,
AR = 42 per 100 in 10 years / 75 per 100 in 10 years = 56%
Interpretation:
56% of cases of hypertension among smokers can be attributed to smoking, and the rest would have happened anyway.
Again this implies causality; furthermore, because diseases all have more than one
cause (see chapter 10), the ARs for each possible cause will sum to well over 100%,
making this measure less useful.
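The attributable risk follows directly from the two incidences. In this Python sketch (`attributable_risk` is a hypothetical helper) the exact value is 5/9, which rounds to the 56% given above.

```python
from fractions import Fraction

def attributable_risk(a, b, c, d):
    """Attributable risk among the exposed: (I(E+) - I(E-)) / I(E+), i.e. RD / I(E+)."""
    incidence_exposed = Fraction(a, a + b)
    incidence_unexposed = Fraction(c, c + d)
    return (incidence_exposed - incidence_unexposed) / incidence_exposed

print(f"{float(attributable_risk(3, 1, 2, 4)):.0%}")  # 56%
```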
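Finally, calculating NNT/NNH (both of which are similar, with the former being for
preventive exposures and the latter for harmful ones) is simple:
$$\text{NNT (or NNH)} = \frac{1}{|RD|}$$
The absolute value ensures that a protective exposure, whose RD is negative, still yields a
positive number.
In our example,
$$NNH = \frac{1}{42 \text{ per 100 in 10 years}} = \frac{1}{0.42 \text{ per 10 years}} = 2.4$$
Interpretation:
Over 10 years, for every 2.4 people who smoke (rather than not smoking), 1 additional case of
hypertension will occur.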
For a protective exposure, the NNT (commonly used in clinical circles) is interpreted
as the number you need to treat in order to prevent one case of a bad outcome. For
harmful exposures, as in our smoking/HTN example, it is the number needed to be
exposed to cause one bad outcome. For many drugs in common use, the NNTs are in
the hundreds or even thousands.[iii][iv]
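A minimal Python sketch of NNT/NNH, assuming a cohort 2 x 2 table as input. The helper `number_needed` is hypothetical. It labels the result NNH when the RD is positive (harmful exposure) and NNT when the RD is negative (protective exposure).

```python
from fractions import Fraction

def number_needed(a, b, c, d):
    """Return ("NNH" or "NNT", 1 / |RD|) for a cohort 2 x 2 table.

    A positive risk difference means the exposure is harmful (number needed to
    harm); a negative one means it is protective (number needed to treat).
    """
    rd = Fraction(a, a + b) - Fraction(c, c + d)
    if rd == 0:
        raise ValueError("RD is 0, so NNT/NNH is infinite (no difference between groups)")
    label = "NNH" if rd > 0 else "NNT"
    return label, 1 / abs(rd)

label, n = number_needed(3, 1, 2, 4)
print(label, float(n))  # NNH 2.4
```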

Conclusions
Epidemiologic data are often summarized in 2 × 2 tables. There are 2 main measures
of association commonly used in epidemiology: the risk ratio/rate ratio (relative risk)
and the odds ratio. The former is calculated for study designs that collect data on
incidence: cohorts and RCTs. The latter is calculated for study designs that use
prevalent cases: cross-sectional studies and case-control studies. Absolute measures of
association (e.g., risk difference) are not seen as often in epidemiologic literature, but
it is nonetheless always important to keep the absolute risks (incidences) in mind
when interpreting results.
Below is a table summarizing the concepts from this chapter:

Study Design | Methods Summary | Incident or Prevalent Cases? | Preferred Measure of Association
Cohort | Start with a nondiseased sample, determine exposure, follow over time | Incident | Risk ratio or rate ratio
RCT | Start with a nondiseased sample, assign exposure, follow over time | Incident | Risk ratio or rate ratio
Case-Control | Start with diseased (cases), recruit comparable nondiseased (controls), look at previous exposures | Prevalent | Odds ratio
Cross-sectional | From a sample, assess both exposure status and disease status simultaneously | Prevalent | Odds ratio

Public Health Surveillance
Section 2: Purpose and Characteristics of Public Health Surveillance
Public health surveillance provides and interprets data to facilitate the prevention
and control of disease. To achieve this purpose, surveillance for a disease or
other health problem should have clear objectives. These objectives should
include a clear description of how data that are collected, consolidated, and
analyzed for surveillance will be used to prevent or control the disease. For
example, the objective of surveillance for tuberculosis might be to identify
persons with active disease to ensure that their disease is adequately treated.
For such an objective, data collection should be sufficiently frequent, timely, and
complete to allow effective treatment. Alternatively, the objective might be to
determine whether control measures for tuberculosis are effective. To meet this
objective, one might track the temporal trend of tuberculosis, and data might not
need to be collected as quickly or as frequently. Surveillance for a health problem
can have more than one objective.

After the objectives for surveillance have been determined, critical characteristics
of surveillance are usually apparent, including:

• Timeliness, to implement effective control measures;
• Representativeness, to provide an accurate picture of the temporal trend of
the disease;
• Sensitivity, to allow identification of individual persons with disease to
facilitate treatment, quarantine, or other appropriate control measures; and
• Specificity, to exclude persons not having disease.
Other characteristics of well-conducted surveillance are described in Appendix A.
The importance of each of these characteristics can vary according to the
purpose of surveillance, the disease under surveillance, and the planned use of
surveillance data (see Table 5.7 in Appendix A). To establish the objectives of
surveillance for a particular disease in a specific setting and to select an
appropriate method of conducting surveillance for that disease, asking and
answering the following questions will be helpful.
• What is the health-related event under surveillance? What is its case
definition?
• What is the purpose and what are the objectives of surveillance?
• What are the planned uses of the surveillance data?
• What is the legal authority for any data collection?
• Where is the organizational home of the surveillance?
• Is the system integrated with other surveillance and health information
systems?
• What is the population under surveillance?
• What is the frequency of data collection (weekly, monthly, annually)?
• What data are collected and how? Would a sentinel approach
or sampling be more effective?
• What are the data sources? What approach is used to obtain data?
• During what period should surveillance be conducted? Does it need to be
continuous, or can it be intermittent or short-term?
• How are the data processed and managed? How are they routed,
transferred, stored? Does the system comply with applicable standards for
data formats and coding schemes? How is confidentiality maintained?
• How are the data analyzed? By whom? How often? How thoroughly?
• How is the information disseminated? How often are reports distributed?
To whom? Does it get to all those who need to know, including the medical
and public health communities and policymakers? (9, 10)

Importance of Surveillance and Detection in Public Health Initiatives
The World Health Organization (WHO) defines public health surveillance as the “continuous,
systematic collection, analysis and interpretation of health-related data needed for the planning,
implementation, and evaluation of public health practice.”(1) The aggregation of quality health-
related data is essential to the success of all public health initiatives. Without correct and current
data, diseases are misunderstood, health programs do not accomplish their goals, and resources
are incorrectly allocated. Functioning surveillance systems are necessary for the success of global
health initiatives. In developing countries, however, surveillance systems that collect useful and
representative data are often non-existent and hard to create. The failure of surveillance systems in
developing countries is often due to limited available resources, lack of knowledgeable staff,
disorganization, and poor infrastructure for finding and reporting cases.(2) Stronger public health
surveillance systems in developing countries will allow public health officials to more accurately
describe and assess the state of health problems. Reliable data can improve health promotion
programs, and help policy makers and investors allocate resources effectively.(3)

Data and Defining the Health Problem


A health problem must be well defined before it can be solved. Surveillance systems generate data
that help public health officials understand existing and emerging infectious and non-infectious
diseases. Without a proper understanding of the health problem (etiology, distribution, and
mechanism of infection), it will be difficult to ameliorate the health issue. Continued data collection is
needed to monitor new diseases that threaten global health security (like the Ebola virus) and the
changes in distribution and virulence of well-known diseases (like the influenza virus). Information
collected on novel diseases includes characteristics such as the type of pathogen involved, symptoms
caused, the infected population, and the morbidity and mortality rates.(4) Without surveillance, public
health officials would be stabbing blindly at health problems, which is a waste of precious resources.
Understanding the pathogen involved helps scientists understand where and how to intervene.

Data and Health Programming


Once data generated from surveillance systems are compiled and analyzed, scientists can draw a
picture of the health problem and begin to develop public health interventions. Evidence-based
practice in public health depends on current and correct data.(5) After a program is created and
implemented, continued surveillance is needed to evaluate it. Program evaluation, in turn, allows
leaders to modify the program to make it more effective.(6)

The Smallpox Eradication Program (1966 to 1978) eradicated smallpox through education and mass
vaccination programs across the globe. An effective, integrated surveillance system was crucial to
the program's success. As new cases were reported, ring vaccination (vaccinating the contacts of each
case and the contacts of those contacts) was implemented to prevent the spread of the illness to
non-immunized people.(7) Without a network that
allowed for quick detection and action, smallpox might never have been eradicated. Surveillance is
vital in control, elimination, and eradication initiatives, but it is often where programs fail.
Governments and organizations should understand the importance of surveillance in disease and
epidemic control, because in many cases when surveillance fails, the program fails, as
demonstrated in a case study below.(8)
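The logic of ring vaccination can be sketched as a search outward from each newly reported case through a list of known contacts. The Python below is a simplified illustration, not a reconstruction of how the program's field teams actually worked. It assumes that contacts are recorded as a dictionary mapping each person to the people they met, that the ring covers contacts and contacts of contacts (two steps out), and that people already immune are skipped. The function name and all identifiers in the example are hypothetical.

```python
from collections import deque


def vaccination_ring(case, contacts, immune, depth=2):
    """Return people within `depth` contact steps of `case` who still need vaccination.

    case     -- identifier of the newly reported case (hypothetical)
    contacts -- dict mapping each person to the people they were in contact with
    immune   -- set of people already vaccinated or recovered; they are not added
                to the ring, but their own contacts are still traced
    depth    -- 1 = direct contacts only, 2 = contacts and contacts of contacts
    """
    seen = {case}
    queue = deque([(case, 0)])  # breadth-first search: (person, steps from the case)
    ring = set()
    while queue:
        person, steps = queue.popleft()
        if steps == depth:
            continue  # do not trace beyond the outer edge of the ring
        for other in contacts.get(person, []):
            if other in seen:
                continue
            seen.add(other)
            queue.append((other, steps + 1))
            if other not in immune:
                ring.add(other)
    return ring


if __name__ == "__main__":
    # Hypothetical contact list around one reported case.
    contacts = {
        "case_A": ["B", "C"],
        "B": ["case_A", "D"],
        "C": ["case_A", "E", "F"],
        "D": ["B", "G"],
        "E": ["C"],
        "F": ["C"],
        "G": ["D"],
    }
    immune = {"E"}  # E was vaccinated earlier
    print(sorted(vaccination_ring("case_A", contacts, immune)))
    # ['B', 'C', 'D', 'F']  (G is three steps away; E is already immune)
```

Breadth-first search is a natural fit here because it visits people in order of their distance from the case, so a single depth limit defines the edge of the ring.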

Data, Public Policy, and Funding


Surveillance systems that generate specific data on diseases and geographic areas are imperative
because they help measure the relative importance of a health event.(9) Facts about disease
distribution and determinants that come from surveillance help politicians and organizations make
more informed decisions about where, when, and how to spend money and time to achieve the
best results. Without quality public health data, interventions may be misguided and
wasteful.(10) Numbers and statistics should be the basis on which funders and politicians make
their decisions, so confidence in the surveillance system that generated these numbers is crucial to
resource allocation. Should more money go toward controlling a disease that kills one million
people per year, or one that kills one thousand? Such questions cannot even be posed without
surveillance and data collection.

The main types of information collected by surveillance systems to measure the relative importance
of a disease are:(11)
• Incidence and prevalence
• Severity (case fatality rate)
• Mortality rate
• Productivity loss
• Premature mortality (years of potential life lost, YPLL)
• Medical care costs
• Preventability of disease
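Several of the measures listed above are simple ratios or sums. The Python sketch below computes them from one set of surveillance counts. It is a minimal illustration under stated assumptions, not a standard tool: the counts are invented, the function names are hypothetical, rates are expressed per 100,000 people, all deaths are assumed to occur among the new cases, and the YPLL reference age of 75 is an assumption (some analyses use 65). Productivity loss, medical care costs, and preventability require economic or intervention data and are left out.

```python
def incidence_proportion(new_cases: int, population_at_risk: int, per: int = 100_000) -> float:
    """New cases during a period per `per` persons at risk at the start of the period."""
    return new_cases * per / population_at_risk


def prevalence(existing_cases: int, population: int, per: int = 100_000) -> float:
    """All cases (new and pre-existing) present at a point in time per `per` persons."""
    return existing_cases * per / population


def case_fatality_rate(deaths: int, cases: int) -> float:
    """Percentage of people with the disease who die of it (a measure of severity)."""
    return deaths * 100 / cases


def mortality_rate(deaths: int, midyear_population: int, per: int = 100_000) -> float:
    """Deaths from the disease during the period per `per` persons in the whole population."""
    return deaths * per / midyear_population


def ypll(ages_at_death: list[float], reference_age: float = 75) -> float:
    """Years of potential life lost: sum of (reference_age - age) over deaths before reference_age."""
    return sum(reference_age - age for age in ages_at_death if age < reference_age)


if __name__ == "__main__":
    # Hypothetical one-year counts for a single district.
    population = 50_000        # mid-year population
    at_risk = 48_000           # people free of the disease at the start of the year
    new_cases = 240            # cases with onset during the year
    existing_cases = 600       # all cases present on a survey day
    ages_at_death = [1, 4, 30, 58, 70, 82]  # assumed to be deaths among the new cases
    deaths = len(ages_at_death)

    print(f"Incidence: {incidence_proportion(new_cases, at_risk):.1f} per 100,000")   # 500.0
    print(f"Prevalence: {prevalence(existing_cases, population):.1f} per 100,000")    # 1200.0
    print(f"Case fatality rate: {case_fatality_rate(deaths, new_cases):.1f}%")        # 2.5%
    print(f"Mortality rate: {mortality_rate(deaths, population):.1f} per 100,000")    # 12.0
    print(f"YPLL (before age 75): {ypll(ages_at_death)}")                             # 212
```

Placing two diseases side by side on these figures (for example, one with high incidence but a low case fatality rate against one with the reverse) is the kind of comparison the funding question above depends on.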
