BIOSTAT Chapter2
Frequency Distributions
After collecting data, the first task for a
researcher is to organize and simplify the
data so that it is possible to get a general
overview of the results. This is the goal of
descriptive statistical techniques.
One method for simplifying and organizing data
is to construct a frequency distribution.
FREQUENCY DISTRIBUTIONS
(CONT.)
A frequency distribution is a table that organizes data values into
classes or intervals, along with the number of values that fall in
each class (the frequency, f).
Grouped
Age of Voters    Frequency, f
18-30            202
31-42            508
43-54            620
55-66            413
67-78            158
78-90            32

Peas per pod     Freq, f
[rows not recoverable from the extracted layout; surviving values: 25, 38, 217, 1462, 932, 15, 1, 18, 12]
Grouped Frequency
Distribution
Key Concepts:
83  84  62  62  43  72  48  46  59  93
64  59  32  54  45  55  45  76  72  40
51  51  72  83  49  62  85  74  40  49
65  38  55  77  63  38  43  63  69
To construct a frequency distribution of the given raw data,
we first find the highest serum level and prepare a column of
these levels, beginning with the highest value and ending with
the lowest. Since the highest serum level is 93 and the lowest
is 32, we have:
Serum Level   Tally     Serum Level   Tally
93            I         62            III
85            I         59            II
84            I         55            II
83            II        54            I
79            I         51            II
77            I         49            II
76            I         48            I
74            I         46            I
72            III       45            II
69            I         43            II
65            I         40            II
64            I         38            II
63            II        32            I
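The tallying procedure described above can be sketched in Python. This is only an illustration: it uses the 39 serum levels that survive in the raw-data listing, and `ungrouped_distribution` is a hypothetical helper name, not something the slides define.

```python
from collections import Counter

# Serum levels as they appear in the raw-data listing (39 values survive there).
serum = [
    83, 84, 62, 62, 43, 72, 48, 46, 59, 93,
    64, 59, 32, 54, 45, 55, 45, 76, 72, 40,
    51, 51, 72, 83, 49, 62, 85, 74, 40, 49,
    65, 38, 55, 77, 63, 38, 43, 63, 69,
]

def ungrouped_distribution(values):
    """Return (value, frequency) pairs ordered from highest to lowest value."""
    counts = Counter(values)
    return [(v, counts[v]) for v in sorted(counts, reverse=True)]

table = ungrouped_distribution(serum)
for value, f in table:
    # One tally mark per occurrence, followed by the numeric frequency.
    print(f"{value:>3}  {'I' * f:<4} {f}")
```

Each printed row pairs a serum level with its tally marks, starting at the highest level and ending at the lowest, as in the column described above.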
FREQUENCY
Class intervals: 10-19, 20-29, 30-39, 40-49, 50-59
Steps in Constructing FDT:
K = 1 + 3.322 log n
where K = approximate number of classes
n = number of observations
(log is the base-10 logarithm)
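As a rough check of the formula, the sketch below evaluates K for the 40 serum levels, reading n as the number of observations. The class-width step (range divided by K, rounded up) is a common convention assumed here; the slides do not state it. The function names are hypothetical.

```python
import math

def sturges_classes(n):
    """Approximate number of classes K for n observations: K = 1 + 3.322 log10(n)."""
    return 1 + 3.322 * math.log10(n)

def class_width(lowest, highest, k):
    """Range divided by the number of classes, rounded up (assumed convention)."""
    return math.ceil((highest - lowest) / k)

k = sturges_classes(40)          # n = 40 serum levels
width = class_width(32, 93, k)   # lowest level 32, highest level 93
print(round(k, 2), width)
```

With n = 40 the formula suggests six or seven classes, and a width of 10 starting at 32 gives the seven classes 32-41 through 92-101.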
Classes    Tally
32-41      IIII
42-51      IIII-IIII
52-61      IIII
62-71      IIII-III
72-81      IIII-III
82-91      IIII
92-101     I
[Frequency column as extracted: 4, 8, 8, 5, 1; its alignment with the classes is not recoverable]
N=40
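A minimal sketch of the grouping step, assuming classes of width 10 that start at the lowest serum level (32). It counts only the 39 values that survive in the raw-data listing, so its totals need not match the N=40 reported on the slide. `grouped_distribution` is a hypothetical name.

```python
from collections import Counter

# Serum levels as they appear in the raw-data listing (39 values survive there).
serum = [
    83, 84, 62, 62, 43, 72, 48, 46, 59, 93,
    64, 59, 32, 54, 45, 55, 45, 76, 72, 40,
    51, 51, 72, 83, 49, 62, 85, 74, 40, 49,
    65, 38, 55, 77, 63, 38, 43, 63, 69,
]

def grouped_distribution(values, start, width, k):
    """Count the values falling in k consecutive classes of equal width."""
    classes = [(start + i * width, start + (i + 1) * width - 1) for i in range(k)]
    # Integer division maps each value to the index of its class.
    counts = Counter((v - start) // width for v in values)
    return [(lo, hi, counts.get(i, 0)) for i, (lo, hi) in enumerate(classes)]

table = grouped_distribution(serum, start=32, width=10, k=7)
for lo, hi, f in table:
    print(f"{lo}-{hi}: {f}")
```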
Frequency Distribution Graphs
In a frequency distribution graph, the score
categories (X values) are listed on the X axis
and the frequencies are listed on the Y axis.
When the score categories consist of
numerical scores from an interval or ratio
scale, the graph should be either a
histogram or a polygon.
Histograms
In a histogram, a bar is centered above each
score (or class interval) so that the height of
the bar corresponds to the frequency and the
width extends to the real limits, so that
adjacent bars touch.
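The sketch below illustrates why histogram bars touch. It assumes whole-number scores, so the real limits lie half a unit beyond the stated class limits; `real_limits` is a hypothetical helper.

```python
def real_limits(lower, upper, unit=1):
    """Real limits lie half a measurement unit beyond the stated class limits."""
    return lower - unit / 2, upper + unit / 2

classes = [(32, 41), (42, 51), (52, 61)]
bars = [real_limits(lo, hi) for lo, hi in classes]

# Adjacent bars touch: each upper real limit equals the next lower real limit.
touching = all(bars[i][1] == bars[i + 1][0] for i in range(len(bars) - 1))
print(bars, touching)
```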
Polygons
In a polygon, a dot is centered above each
score so that the height of the dot
corresponds to the frequency. The dots are
then connected by straight lines.
An
additional line is drawn at each end to bring
the graph back to a zero frequency.
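A small sketch of the points a polygon connects, assuming equal-width classes of whole numbers. The classes and frequencies below are hypothetical, chosen only to show the zero-frequency point added at each end.

```python
def polygon_points(classes, frequencies):
    """Return (x, y) vertices: class midpoints plus a zero-frequency point at each end."""
    width = classes[0][1] - classes[0][0] + 1   # assumes equal-width classes
    mids = [(lo + hi) / 2 for lo, hi in classes]
    xs = [mids[0] - width] + mids + [mids[-1] + width]
    ys = [0] + list(frequencies) + [0]
    return list(zip(xs, ys))

# Hypothetical class intervals and frequencies, for illustration only.
points = polygon_points([(10, 19), (20, 29), (30, 39)], [2, 5, 3])
print(points)
```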
Bar graphs
When the score categories (X values) are
measurements from a nominal or an
ordinal scale, the graph should be a bar
graph.
A bar graph is just like a histogram
except that gaps or spaces are left
between adjacent bars.
Relative frequency
Many populations are so large that it is
impossible to know the exact number of
individuals (frequency) for any specific
category.
In these situations, population distributions
can be shown using relative frequency
instead of the absolute number of individuals
for each category.
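Relative frequency is each category's frequency divided by the total. The sketch below assumes the frequency read beside each class in the age-of-voters table earlier in these slides; `relative_frequencies` is a hypothetical name.

```python
def relative_frequencies(freqs):
    """Divide each frequency by the total so the results sum to 1."""
    total = sum(freqs.values())
    return {category: f / total for category, f in freqs.items()}

# Frequencies as read from the age-of-voters table (an assumption about its layout).
voters = {"18-30": 202, "31-42": 508, "43-54": 620,
          "55-66": 413, "67-78": 158, "78-90": 32}

rel = relative_frequencies(voters)
for age, r in rel.items():
    print(f"{age}: {r:.3f}")
```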
Smooth curve
If the scores in the population are measured on
an interval or ratio scale, it is customary to
present the distribution as a smooth curve
rather than a jagged histogram or polygon.
The smooth curve emphasizes the fact that the
distribution is not showing the exact frequency
for each category.
Frequency distribution graphs
Frequency distribution graphs are useful
because they show the entire set of scores.
At a glance, you can determine the highest
score, the lowest score, and where the scores
are centered.
The graph also shows whether the scores are
clustered together or scattered over a wide
range.
Shape
A graph shows the shape of the distribution.
A distribution is symmetrical if the left side
of the graph is (roughly) a mirror image of the
right side.
One example of a symmetrical distribution is
the bell-shaped normal distribution.
On the other hand, distributions are skewed
when scores pile up on one side of the
distribution, leaving a "tail" of a few extreme
values on the other side.
Positively and Negatively Skewed Distributions
In a positively skewed distribution, the
scores tend to pile up on the left side of
the distribution with the tail tapering off to
the right.
In a negatively skewed distribution, the
scores tend to pile up on the right side and
the tail points to the left.
Percentiles, Percentile Ranks, and Interpolation
To find percentiles and percentile ranks, two new
columns are placed in the frequency distribution table:
One is for cumulative frequency (cf) and the other is for
cumulative percentage (c%).
Each cumulative percentage identifies the percentile
rank for the upper real limit of the corresponding score
or class interval. When scores or percentages do not
correspond to upper real limits or cumulative
percentages, you must use interpolation to determine
the corresponding ranks and percentiles. Interpolation
is a mathematical process based on the assumption that
the scores and the percentages change in a regular,
linear fashion as you move through an interval from one
end to the other.
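The linear assumption can be written as two small functions. This is a sketch under stated assumptions: the interval (real limits 4.5 to 9.5, cumulative percentages 10% and 60%) is hypothetical, and the function names are illustrative only.

```python
def percentile_rank(score, lower, upper, pct_lower, pct_upper):
    """Cumulative percentage for a score inside an interval with real limits lower..upper."""
    fraction = (score - lower) / (upper - lower)        # how far into the interval
    return pct_lower + fraction * (pct_upper - pct_lower)

def percentile(pct, lower, upper, pct_lower, pct_upper):
    """Score corresponding to a cumulative percentage inside the same interval."""
    fraction = (pct - pct_lower) / (pct_upper - pct_lower)
    return lower + fraction * (upper - lower)

# Hypothetical interval: real limits 4.5 to 9.5, cumulative percentages 10% and 60%.
rank = percentile_rank(7.0, 4.5, 9.5, 10.0, 60.0)
score = percentile(35.0, 4.5, 9.5, 10.0, 60.0)
print(rank, score)
```

A score halfway through the interval receives a rank halfway between the two cumulative percentages, and the second function inverts the first.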
Stem-and-Leaf Displays
A stem-and-leaf display provides a very
efficient method for obtaining and
displaying a frequency distribution.
Each score is divided into a stem
consisting of the first digit or digits, and a
leaf consisting of the final digit.
Then you go through the list of scores,
one at a time, and write the leaf for each
score beside its stem.
The resulting display provides an
organized picture of the entire distribution.
The number of leaves beside each stem
corresponds to the frequency, and the
individual leaves identify the individual
scores.
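A minimal sketch of the procedure, assuming non-negative whole-number scores so that the stem is everything but the last digit. The sample scores are the first ten serum levels from the raw data earlier in these slides; `stem_and_leaf` is a hypothetical name.

```python
from collections import defaultdict

def stem_and_leaf(scores):
    """Map each stem (all digits but the last) to its leaves (last digits), in list order."""
    display = defaultdict(list)
    for s in scores:                 # go through the scores one at a time
        display[s // 10].append(s % 10)
    return dict(display)

scores = [83, 84, 62, 62, 43, 72, 48, 46, 59, 93]   # first ten serum levels
display = stem_and_leaf(scores)
for stem in sorted(display):
    print(stem, "|", "".join(str(leaf) for leaf in display[stem]))
```

The number of leaves printed beside a stem is the frequency for that stem.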
Descriptive Statistics
Sample Illustration:
Which Group is Smarter?
Class A--IQs of 13 Students
102  115
128  109
131   89
 98  106
140  119
 93   97
110

Class B--IQs of 13 Students
127  162
131  103
 96  111
 80  109
 93   87
120  105
109
Descriptive Statistics
Which group is smarter now?
Class A--Average IQ
110.54
Class B--Average IQ
110.23
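The two averages can be checked with a short sketch. The split of the 26 listed IQs into Class A and Class B is an assumption: it is the split that reproduces the averages stated above.

```python
from statistics import mean

# Assumed assignment of the 26 listed IQs to the two classes.
class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]
class_b = [127, 162, 131, 103, 96, 111, 80, 109, 93, 87, 120, 105, 109]

avg_a = round(mean(class_a), 2)
avg_b = round(mean(class_b), 2)
print(avg_a, avg_b)
```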
Other Graphs
Split each data value at the same place value to form a
stem and a leaf (aim for 5 to 20 stems).
1. Split stems
2. Back-to-back stem plots
Dot Plots
Dot plot: a graph in which each data value is
plotted as a point (dot) along a scale of values.
Figure 2-5
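A text-only sketch of a dot plot, assuming whole-number data values. It prints the scale vertically, with one dot per occurrence of each value; `dot_plot` and the sample values are hypothetical.

```python
from collections import Counter

def dot_plot(values):
    """Return one text line per scale value, with a dot for every occurrence."""
    counts = Counter(values)
    lines = []
    for v in range(min(values), max(values) + 1):
        lines.append(f"{v:>3} | " + "." * counts[v])
    return lines

plot = dot_plot([1, 2, 2, 4])   # hypothetical data values
print("\n".join(plot))
```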
Time Series (Paired data)
A time series is a data set composed of quantitative entries
taken at regular intervals over a period of time,
e.g., the amount of precipitation measured
each day for one month.
[Figure: quantitative data plotted against time]
Time-Series Graph
Number of Screens at Drive-In Movie Theaters
Figure 2-8
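Pareto Chart
A vertical bar graph in which the
height of each bar represents
frequency or relative frequency.
The bars are arranged in order of decreasing
height, with the tallest bar at the left.
[Figure: axes labeled Categories and Frequency]
Pie Chart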
Marital Status    Frequency    Relative frequency
Never Married     55.3         55.3 / 219.7 ≈ 0.25 or 25%
Married           127.7        127.7 / 219.7
Widowed           13.9         13.9 / 219.7
Divorced          22.8         22.8 / 219.7
Total: 219.7
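The remaining relative frequencies can be computed the same way. The sketch below also converts each one to a central angle (relative frequency times 360 degrees), which is the usual way to size pie-chart slices but is not stated in the slides.

```python
# Marital status frequencies as listed in the table.
marital = {"Never Married": 55.3, "Married": 127.7, "Widowed": 13.9, "Divorced": 22.8}

total = sum(marital.values())

# For each category: (relative frequency, central angle in degrees).
slices = {status: (f / total, 360 * f / total) for status, f in marital.items()}

for status, (rel, angle) in slices.items():
    print(f"{status}: {rel:.2f} ({angle:.0f} degrees)")
```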