
LECTURE #2: STATISTICS

Type of Sampling Techniques

a. Probability Sampling - Samples are chosen in such a way that each member of the population has a
known though not necessarily equal chance of being included in the sample. It is also called unbiased
sampling.

Probability sampling is defined as a sampling technique in which the researcher chooses samples
from a larger population using a method based on the theory of probability. For a participant to be
considered as a probability sample, he/she must be selected using a random selection.

The most critical requirement of probability sampling is that everyone in your population has a
known, nonzero chance of getting selected. In simple random sampling that chance is also equal: with a
population of 100 people, every person has odds of 1 in 100 of getting selected. Probability sampling
gives you the best chance to create a sample that is truly representative of the population.

Probability sampling uses statistical theory to randomly select a small group of people (sample)
from an existing large population and then use their responses to estimate the responses of the overall
population.

Example of probability sampling

Let us take an example to understand this sampling technique. The population of the US alone is
330 million; it is practically impossible to send a survey to every individual to gather information.
Probability sampling lets you collect data from a much smaller sample and still draw conclusions about
the whole population.

For example, an organization has 500,000 employees sitting at different geographic locations.
The organization wishes to make certain amendments in its human resource policy, but before they roll
out the change, they want to know if the employees will be happy with the change or not. However, it’s a
tedious task to reach out to all 500,000 employees. This is where probability sampling comes in handy. A
sample from the larger population i.e., from 500,000 employees, is chosen. This sample will represent the
population. Deploy a survey now to the sample. From the responses received, management will now be
able to know whether employees in that organization are happy or not about the amendment.

1. Simple Random Sampling - All members of the population have an equal chance of being included in
the sample.

Simple random sampling, as the name suggests, is an entirely random method of selecting the
sample. This sampling method is as easy as assigning numbers to the individuals (sample) and then
randomly choosing from those numbers through an automated process. Finally, the numbers that are
chosen are the members that are included in the sample.
There are two ways in which researchers choose the samples in this method of sampling: The
lottery system and using number generating software/ random number table. This sampling technique
usually works around a large population and has its fair share of advantages and disadvantages.
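The lottery/random-number approach described above can be sketched in Python; the population of 100 numbered members below is hypothetical, and the fixed seed is only there to make the sketch reproducible:

```python
import random

# Hypothetical population: 100 members identified by ID numbers 1..100.
population = list(range(1, 101))

random.seed(42)  # fixed seed so the sketch is reproducible

# Lottery-style draw without replacement: every member has the same
# chance of ending up in the sample.
sample = random.sample(population, k=10)
print(sample)
```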

2. Systematic Sampling - It selects every kth member of the population with starting point determined at
random.

Systematic sampling is when you choose every “kth” individual to be a part of the sample. For
example, you can select every 5th person to be in the sample. Systematic sampling is an extended
implementation of the same probability technique, in which each member of the group is selected at
regular intervals to form a sample. There is an equal opportunity for every member of a population to be
selected using this sampling technique.
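A minimal sketch of this rule, assuming a hypothetical population of 50 numbered members and k = 5; the random starting point within the first k members is what gives every member an equal chance:

```python
import random

def systematic_sample(population, k):
    """Select every k-th member, starting from a random point in the first k."""
    start = random.randrange(k)
    return population[start::k]

random.seed(1)
members = list(range(1, 51))          # hypothetical population of 50 members
sample = systematic_sample(members, 5)  # every 5th person
print(sample)
```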

3. Stratified Sampling/Random quota sampling - This is used when the population can be subdivided
into several smaller groups or strata.

Stratified random sampling involves a method where the researcher divides a more extensive
population into smaller groups that usually don’t overlap but represent the entire population. While
sampling, organize these groups and then draw a sample from each group separately.

A standard method is to arrange or classify by sex, age, ethnicity, and similar attributes: the
researcher splits subjects into mutually exclusive groups and then uses simple random sampling to choose
members from each group.

Members of these groups should be distinct so that every member of every group gets an equal opportunity
to be selected using simple probability. This sampling method is also called “random quota sampling.”
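A sketch of proportionate stratified sampling under assumed strata: two non-overlapping sex groups of 60 and 40 hypothetical members, with a 10% simple random sample drawn from each stratum separately:

```python
import random

# Hypothetical population stratified by a non-overlapping attribute (sex).
strata = {
    "male":   [f"M{i}" for i in range(1, 61)],   # 60 members
    "female": [f"F{i}" for i in range(1, 41)],   # 40 members
}

random.seed(7)
# Draw a proportionate (10%) simple random sample from each stratum separately.
sample = []
for name, members in strata.items():
    n = round(len(members) * 0.10)
    sample.extend(random.sample(members, n))

print(sample)  # 6 males + 4 females
```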

4. Cluster Sampling/Random cluster sampling - This is sometimes called area sampling. It is usually
used when the population is very large. In this technique, groups or clusters instead of individuals are
randomly selected.

Random cluster sampling is a way to select participants randomly that are spread out
geographically. For example, if you wanted to choose 100 participants from the entire population of the
U.S., it is likely impossible to get a complete list of everyone. Instead, the researcher randomly selects
areas (i.e., cities or counties) and randomly selects from within those boundaries.

Cluster sampling usually analyzes a particular population in which the sample consists of more
than a few elements, for example, city, family, university, etc. Researchers then select the clusters by
dividing the population into various smaller sections.
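A sketch of two-stage cluster sampling with hypothetical cities as the clusters; surveying everyone inside each chosen cluster is one common variant of the second stage:

```python
import random

# Hypothetical clusters: cities, each with its own residents.
clusters = {
    "Springfield": ["S1", "S2", "S3", "S4"],
    "Riverton":    ["R1", "R2", "R3"],
    "Lakeview":    ["L1", "L2", "L3", "L4", "L5"],
    "Hillcrest":   ["H1", "H2"],
}

random.seed(3)
# Stage 1: randomly select whole clusters rather than individuals.
chosen = random.sample(list(clusters), k=2)
# Stage 2 (one common variant): survey everyone inside the chosen clusters.
sample = [person for city in chosen for person in clusters[city]]
print(chosen, sample)
```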

b. Non-Probability Sampling - Each member of the population does not have a known chance of being
included in the sample. Instead, personal judgement plays a very important role in the selection. It is also
called biased sampling.
Non-probability sampling is defined as a sampling technique in which the researcher selects
samples based on the subjective judgment of the researcher rather than random selection. It is a less
stringent method. This sampling method depends heavily on the expertise of the researchers. It is carried
out by observation, and researchers use it widely for qualitative research.

Non-probability sampling is a sampling method in which, unlike probability sampling, not all
members of the population have a known chance of participating in the study. Non-probability sampling
is most useful for exploratory studies like a pilot survey (deploying a survey to a smaller sample compared
to the pre-determined sample size). Researchers use this method in studies where it is impossible to draw
random probability samples due to time or cost considerations.

1. Quota Sampling - It is similar to stratified sampling. The difference is that in stratified sampling
the members of the sample are selected randomly, while in quota sampling they are not.

Hypothetically, consider a researcher who wants to study the career goals of male and female
employees in an organization. There are 500 employees in the organization, also known as the
population. To understand the population better, the researcher needs only a sample, not the entire
population. Further, the researcher is interested in particular strata within the population. This is
where quota sampling helps in dividing the population into strata or groups.

2. Purposive Sampling/Judgemental Sampling - Involves choosing the respondents on the basis of
predetermined criteria set by the researcher.

In the judgmental sampling method, researchers select the samples based purely on the
researcher’s knowledge and credibility. In other words, researchers choose only those people who
they deem fit to participate in the research study. Judgmental or purposive sampling is not a
scientific method of sampling, and the downside to this sampling technique is that the
preconceived notions of a researcher can influence the results. Thus, this research technique
involves a high amount of ambiguity.

3. Convenience Sampling/Accidental Sampling - The researcher uses subjects that are readily
available to form the samples.

Convenience sampling is a non-probability sampling technique where samples are
selected from the population only because they are conveniently available to the researcher.
Researchers choose these samples just because they are easy to recruit, without considering
whether the sample represents the entire population.

Ideally, in research, it is good to test a sample that represents the population. But in some
research, the population is too large to examine in its entirety. This is one of the reasons why
researchers rely on convenience sampling, the most common non-probability sampling method,
because of its speed, cost-effectiveness, and the easy availability of the sample.
Non-probability sampling examples
Here are three simple examples of non-probability sampling to understand the subject better.

1. An example of convenience sampling would be using student volunteers known to the
researcher. Researchers can send the survey to students belonging to a particular school,
college, or university, who then act as the sample.
2. In an organization, for studying the career goals of 500 employees, technically, the sample
selected should have proportionate numbers of males and females, which means there should
be 250 males and 250 females. Since this is unlikely, the researcher selects the groups or strata
using quota sampling.
3. Researchers also use this type of sampling to conduct research involving a particular illness in
patients or a rare disease. Researchers can ask subjects to refer other subjects suffering from
the same ailment, forming a subjective sample to carry out the study.

When to use non-probability sampling?

● Use this type of sampling to indicate if a particular trait or characteristic exists in a population.
● Researchers widely use the non-probability sampling method when they aim at conducting
qualitative research, pilot studies, or exploratory research.

● Researchers use it when they have limited time to conduct research or have budget constraints.
● When the researcher needs to observe whether a particular issue requires in-depth analysis, this
method is applied.

● Use it when you do not intend to generate results that will generalize the entire population.

Advantages of non-probability sampling


Here are the advantages of using the non-probability technique

● Non-probability sampling techniques are a more conducive and practical method for
researchers deploying surveys in the real world. Although statisticians prefer probability
sampling because it yields quantitative data, non-probability sampling, if done correctly, can
produce results of similar if not the same quality.
● Getting responses using non-probability sampling is faster and more cost-effective than
probability sampling because the sample is known to the researcher. The respondents respond
quickly compared to randomly selected people, as they have a high motivation level to
participate.

Difference between non-probability sampling and probability sampling:


Non-probability sampling | Probability sampling
Sample selection is based on the subjective judgment of the researcher. | The sample is selected at random.
Not everyone has an equal chance to participate. | Everyone in the population has an equal chance of getting selected.
The researcher does not consider sampling bias. | Used when sampling bias has to be reduced.
Useful when the population has similar traits. | Useful when the population is diverse.
The sample does not accurately represent the population. | Used to create an accurate sample.
Finding respondents is easy. | Finding the right respondents is not easy.

2. Data Gathering Techniques

A. Direct or interview method

Direct Personal Interviews – Methods of Primary Data Collection.

A face-to-face contact is made with the informants (persons from whom the information is to be
obtained) under this method of collecting data. The interviewer asks them questions pertaining to the
survey and collects the desired information. Thus, if a person wants to collect data about the working
conditions of the workers of the BHEL, Trichy, he would go to the factory, contact the workers and
obtain the desired information. The information collected in this manner is first hand and also original in
character.
An interview is generally a qualitative research technique which involves asking open-ended
questions to converse with respondents and elicit data about a subject. The interviewer in most
cases is the subject matter expert who intends to understand respondent opinions in a well-planned and
executed series of questions and answers. Interviews are similar to focus groups and surveys when it
comes to garnering information from the target market but are entirely different in their operation: focus
groups are restricted to a small group of 6-10 individuals, whereas surveys are quantitative in nature.
Interviews are conducted with a sample from a population, and the key characteristic they exhibit is their
conversational tone.

Fundamental Types of Interviews in Research

● Structured Interviews

Structured interviews are defined as research tools that are extremely rigid in their operations and
allow very little or no scope of prompting the participants to obtain and analyze results. It is thus also
known as a standardized interview and is significantly quantitative in its approach. Questions in this
interview are pre-decided according to the required detail of information.

Structured interviews are extensively used in survey research with the intention of maintaining
uniformity throughout all the interview sessions. They can be closed-ended as well as open-ended –
according to the type of target population. Closed-ended questions can be included to understand user
preferences from a collection of answer options whereas open-ended can be included to gain details about
a particular section in the interview.

● Semi-Structured Interviews

Semi-structured interviews offer a considerable amount of leeway to the researcher to probe the
respondents along with maintaining basic interview structure. Even if it is a guided conversation between
researchers and interviewees – an appreciable flexibility is offered to the researchers. A researcher can be
assured that multiple interview rounds will not be required in the presence of structure in this type of
research interview. Keeping the structure in mind, the researcher can follow any idea or take creative
advantage of the entire interview. Additional respondent probing is often necessary to garner
information for a research study. Semi-structured interviews are best applied when the researcher has
limited time to conduct research yet requires detailed information about the topic.

● Unstructured Interviews
Also called in-depth interviews, unstructured interviews are usually described as conversations
held with a purpose in mind – to gather data about the research study. These interviews have the least
number of questions as they lean more towards a normal conversation but with an underlying subject.

The main objective of most researchers using unstructured interviews is to build a bond with the
respondents due to which there are high chances that the respondents will be 100% truthful with their
answers. There are no guidelines for the researchers to follow and so, they can approach the participants
in any ethical manner to gain as much information as they possibly can for their research topic.

Methods of Research Interviews:

There are three methods to conduct research interviews, each of which is distinct in its
application and can be used according to the requirements of the research study.

● Personal Interviews

Personal interviews are one of the most used types of interviews, where the questions are asked
directly to the respondent in person. For this, a researcher can use an online survey as a guide to take
note of the answers. A researcher can design his/her survey in such a way that they take notes of the
comments or points of view that stand out from the interviewee.

● Telephonic Interviews

Telephonic interviews are widely used and easy to combine with online surveys to carry out
research effectively.

● Email or Web Page Interviews

Online research keeps growing because consumers are migrating to a more virtual world, and it is
best for every researcher to adapt to this change. The increase in people with Internet access has made
interviews via email or web page stand out among the types of interviews most used today, and for
these nothing works better than an online survey. More and more consumers are turning to online
shopping, which makes them a great niche for carrying out an interview that will generate information
for correct decision making.

B. Indirect or questionnaire method

A questionnaire is a research instrument that consists of a set of questions or other types of
prompts that aims to collect information from a respondent. A research questionnaire is typically a mix of
closed-ended questions and open-ended questions. Open-ended, long-form questions offer the respondent
the ability to elaborate on their thoughts. Research questionnaires were developed in 1838 by the
Statistical Society of London. The data collected from a data collection questionnaire can be both
qualitative as well as quantitative in nature. A questionnaire may or may not be delivered in the form of a
survey, but a survey always consists of a questionnaire.

Questionnaire Examples

The best way to understand how questionnaires work is to see the types of questionnaires
available. Some examples of a questionnaire are:

● Customer Satisfaction Questionnaire: This type of research can be used in any situation where
there’s an interaction between a customer and an organization. For example, you might send a
customer satisfaction survey after someone eats at your restaurant. You can use the study to
determine if your staff is offering excellent customer service and a positive overall experience.
● Product Use Satisfaction Questionnaire: You can use this template to better understand your
product’s usage trends and similar products. This also allows you to collect customer preferences
about the types of products they enjoy or want to see on the market.
● Company Communications Evaluation Questionnaire: Unlike the other examples, a company
communications evaluation looks at internal and external communications. It can be used to
check if the policies of the organization are being enforced across the board, both with employees
and clients.

The above survey questions are typically easy to use, understand, and execute. Additionally, the
standardized answers of a survey questionnaire, as opposed to a person-to-person conversation, make it
easier to compile usable data. The most significant limitation of a data collection questionnaire is that
respondents must be able to read all of the questions and respond to them. For example, if you send an
invitation through email asking respondents to complete the questions on social media, a target
respondent who doesn’t have the right social media profiles can’t answer your questions.

C. Registration method

A register is a depository of information on fishing vessels, companies, gear, licenses or
individual fishers. It can be used to obtain complete enumeration through a legal requirement. Registers
are implemented when there is a need for accurate knowledge of the size and type of the fishing fleet and
for closer monitoring of fishing activities to ensure compliance with fishery regulations. They may also
incorporate information related to fiscal purposes (e.g. issuance or renewal of fishing licenses). Although
registers are usually implemented for purposes other than to collect data, they can be very useful in the
design and implementation of a statistical system, provided that the data they contain are reliable, timely
and complete.

Registration data types


In most countries, vessels, especially commercial fishing vessels, and chartered or contract
fishing vessels are registered with the fisheries authorities. Data on vessel type, size, gear type, country of
origin, fish holding capacity, number of fishers and engine horsepower should be made available for the
registry.

Companies dealing with fisheries agencies are registered for various purposes. These companies
may not only include fishing companies, but also other types of companies involved in processing and
marketing fishery products. Data, such as the number of vessels, gear type and vessel size of registered
fishing companies, should be recorded during such registration. Processing companies should provide
basic data on the type of processing, type of raw material, capacity of processing, and even the source of
material.

Fishing vessels and fishing gears may often be required to hold a valid fishing licence. Unlike
vessel registers, licences tend to be issued for access to specific fisheries over a set period of time.
Because licences may have to be periodically renewed, they can be a useful way to update information on
vessel and gear characteristics.

Registry design

A registry must not only capture new records, but be able to indicate that a particular record is
inactive (e.g. a company has ceased operations) or record changes in operations (e.g. a company's
processing capacity has increased). If licences must be renewed each year, data collected from licensing is
particularly useful, as records are updated on an annual basis.

Registry data also contain criteria for the classification of fishing units into strata. These
classifications are usually based on assumptions and a priori knowledge regarding differences in catch
rates, species composition and species selectivity.
In general, vessel registers are complex systems requiring well-established administrative
procedures supported by effective data communications, data storage and processing components. As
such, they predominantly deal with only certain types and size of fishing units, most often belonging to
industrial and semi-industrial fleets. Small-scale and subsistence fisheries involving large numbers of
fishing units are often not part of a register system or, if registered, are not easily traced so as to allow
validation or updating.

D. Experimental method

An experiment is a data collection method where you as a researcher change some variables and
observe their effect on other variables. The variables that you manipulate are referred to as independent
while the variables that change as a result of manipulation are dependent variables. Imagine a
manufacturer is testing the effect of drug strength on the number of bacteria in the body. The company
decides to test drug strength at 10mg, 20mg and 40mg. In this example, drug strength is the independent
variable while number of bacteria is the dependent variable. The drug administered is the treatment, while
10mg, 20mg and 40mg are the levels of the treatment.
The greatest advantage of using an experiment is that you can explore causal relationships that an
observational study cannot. Additionally, experimental research can be adapted to different fields like
medical research, agriculture, sociology, and psychology. Nevertheless, experiments have the
disadvantage of being expensive and requiring a lot of time.

Recall that data can be collected in two main ways: (1) through sample surveys or (2) through
designed experiments. While sample surveys lead to observational studies, designed experiments enable
researchers to control variables, leading to additional conclusions.

A designed experiment is a controlled study whose purpose is to control as many factors as
possible to isolate the effects of a particular factor. Designed experiments must be carefully set up to
achieve their purposes.

The variables in a designed experiment that are controlled are called the explanatory variables or
are sometimes called the factors. Factors have values that can be changed by the researcher and are
considered as possible causes. Examples of factors are:

● The dosage of a drug in a medical experiment
● The type of teaching method in an education experiment
● One drug by itself compared with that drug used in conjunction with another

The designed experiment analyzes the effects of the factors on the response variable. Response
variables are not part of a controlled environment and have values that are measured by the
researcher.

Examples of response variables are:

● The blood pressures of the patients
● The test scores for a class
● The sizes of a cancerous tumor for patients

3. Types of Questionnaire

There are roughly two types of questionnaires, structured and unstructured. A mixture of the two
is the quasi-structured questionnaire, which is used mostly in social science research.

A. Structured Questionnaire

Structured questionnaires include pre-coded questions with well-defined skipping patterns to
follow the sequence of questions. Most quantitative data collection operations use structured
questionnaires. Fewer discrepancies, ease of administration, consistency in answers and ease of data
management are advantages of such structured questionnaires.

B. Unstructured Questionnaire
Unstructured questionnaires include open-ended and vague opinion-type questions. The
questions may not be in the format of interrogative sentences, and the moderator or the enumerator has to
elaborate the sense of the question. Focus group discussions use such questionnaires.

Not all questions can easily be pre-coded with all possible answer alternatives. The answer
alternatives of some questions in standard questionnaires are therefore left as ‘Others (please specify)’. A
common and pragmatic practice is that most of the questions are structured; however, it is convenient to
have some unstructured questions whose answers are not feasible to enumerate completely. Such a type
of questionnaire is called a quasi-structured questionnaire.

4. Measure of Central Tendency (Grouped and ungrouped data)

In real-world applications, you can use tables and graphs of various kinds to show information
and to extract information from data that can lead to analyses and predictions. Graphs allow you to
communicate a message from data.

Measures of central tendency are a key way to discuss and communicate with graphs. The term
central tendency refers to the middle, or typical, value of a set of data, which is most commonly measured
by using the three m's: mean, median, and mode. The mean, median, and mode are known as the
measures of central tendency. In this lesson, you will explore these three concepts.

Data can be classified in various forms. One way to distinguish between data is in terms of
grouped and ungrouped data.

● Ungrouped data - when the data has not been placed in any categories and no
aggregation/summarization has taken place on the data then it is known as ungrouped data.
Ungrouped data is also known as raw data.
● Grouped data - when raw data have been grouped in different classes then it is said to be
grouped data.

For example, consider the following :

Height of students:
(171,161,155,155,183,191,185,170,172,177,183,190,139,149,150,150,152,158,159,174,178,179,
190,170,143,165,167,187,169,182,163,149,174,174,177,181,170,182,170,145,143): This is
raw/ungrouped data.

The following table shows the grouped data from the above-mentioned raw data.
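Since the grouped table itself did not survive in these notes, here is a sketch that builds one from the raw heights above; the choice of class width 10 (130-139, 140-149, ...) is an assumption:

```python
from collections import Counter

heights = [171, 161, 155, 155, 183, 191, 185, 170, 172, 177, 183, 190, 139,
           149, 150, 150, 152, 158, 159, 174, 178, 179, 190, 170, 143, 165,
           167, 187, 169, 182, 163, 149, 174, 174, 177, 181, 170, 182, 170,
           145, 143]

# Group the raw values into classes of width 10 by their lower class limit.
width = 10
counts = Counter((h // width) * width for h in heights)

print("Class    | Frequency")
for lower in sorted(counts):
    print(f"{lower}-{lower + width - 1}  | {counts[lower]}")
print("Total    |", sum(counts.values()))
```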
Central Tendencies

As the names suggest, central tendencies have something to do with the center. Central tendency
is the central location in a probability distribution. There are many measures for central tendencies like
mean, mode, median, interquartile range, percentiles, geometric mean, harmonic mean, etc. The most
common measures of central tendencies used are discussed below.

A. Mean

MEAN: Also known as the arithmetic average. It is calculated by the summation of all
values divided by the number of values.

e.g., the mean of 15, 11, 14, 3, 21, 17, 22, 16, 19, 16, 5, 7, 9, 20, 4 is 199/15 = 13.26667.
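The arithmetic can be checked directly:

```python
from statistics import mean

values = [15, 11, 14, 3, 21, 17, 22, 16, 19, 16, 5, 7, 9, 20, 4]

# Arithmetic average: sum of all values divided by the number of values.
print(sum(values) / len(values))   # ≈ 13.2667
print(mean(values))                # same result via the standard library
```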

B. Mode

MODE: The most frequently occurring item/value in a data set is called the mode. A data
set is bimodal when there is a tie between two values for the highest frequency, and
multimodal when more than two values occur with the same highest frequency.

e.g. 7, 11, 14.25, 15, 15, 15, 15, 15, 19, 19, 29, 81. The mode is 15.
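Python's statistics module covers both cases:

```python
from statistics import mode, multimode

data = [7, 11, 14.25, 15, 15, 15, 15, 15, 19, 19, 29, 81]
print(mode(data))        # 15, the most frequent value

# multimode returns every value tied for the highest frequency,
# which covers the bimodal and multimodal cases described above.
print(multimode([1, 1, 2, 2, 3]))  # [1, 2] -- a bimodal data set
```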

C. Median

MEDIAN: The median of a dataset is described as the middlemost value in the ordered
arrangement of the values in the dataset.

NOTE: For an odd number of the dataset, the median is the middle value. For an even
number of the dataset, the median is the average of the two middle values.

e.g. 15, 11, 14, 3, 21, 17, 22, 16, 19, 16, 5, 7, 9, 20, 4
Let’s arrange this data in ascending order:
3, 4, 5, 7, 9, 11, 14, 15, 16, 16, 17, 19, 20, 21, 22. Here n = 15, so the median is the
(n + 1)/2 = (15 + 1)/2 = 8th value, which is 15.

Advantage of the median: it is not influenced by larger values; it remains immune to outliers.


“The data must be at least ordinal for the median to be meaningful”
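Reproducing the median rule on the example data above (n = 15, odd, so the median is the 8th ordered value):

```python
from statistics import median

values = [15, 11, 14, 3, 21, 17, 22, 16, 19, 16, 5, 7, 9, 20, 4]

ordered = sorted(values)
print(ordered)   # [3, 4, 5, 7, 9, 11, 14, 15, 16, 16, 17, 19, 20, 21, 22]

# n = 15 (odd), so the median is the (n + 1) / 2 = 8th ordered value.
n = len(ordered)
print(ordered[(n + 1) // 2 - 1])   # 15
print(median(values))              # 15, same result via the standard library
```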

1. Quartile:

The values which divide an array (a set of data arranged in ascending or descending order) into
four equal parts are called Quartiles. The first, second and third quartiles are denoted by Q1, Q2,Q3
respectively. The first and third quartiles are also called the lower and upper quartiles respectively. The
second quartile represents the median, the middle value.

Quartiles for Ungrouped Data:

Quartiles for ungrouped data are calculated by the following formulae:

Q1 = value of the (n + 1)/4 th item
Q2 = value of the 2(n + 1)/4 th item
Q3 = value of the 3(n + 1)/4 th item

Here, n = 20.

i. Q1 = value of the (20 + 1)/4 = 5.25th item.

The value of the 5th item is 36 and that of the 6th item is 37. Thus, the first quartile is a value 0.25th of
the way between 36 and 37, which is 36.25. Therefore, Q1 = 36.25. Similarly,

ii. Q2 = value of the 2(20 + 1)/4 = 10.5th item.

The value of the 10th item is 54 and that of the 11th item is 55. Thus the second quartile is 0.5th of the
way between 54 and 55. Since the difference between 54 and 55 is 1, the second quartile is 54 + 1(0.5) =
54.5. Hence, Q2 = 54.5. Likewise,

iii. Q3 = value of the 3(20 + 1)/4 = 15.75th item.

The value of the 15th item is 68 and that of the 16th item is 70. Thus the third quartile is a value 0.75th of
the way between 68 and 70. As the difference between 68 and 70 is 2, the third quartile will be 68 +
2(0.75) = 69.5. Therefore, Q3 = 69.5.
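One common position rule consistent with the worked values above (the 5.25th, 10.5th and 15.75th items for n = 20) is Qi = value of the i(n + 1)/4 th ordered item, with linear interpolation between the two neighbouring items. A sketch on a hypothetical nine-value data set:

```python
def quartile(data, i):
    """Quartile Q_i of ungrouped data, using the i*(n + 1)/4 position rule
    and linear interpolation between the two neighbouring ordered items."""
    ordered = sorted(data)
    pos = i * (len(ordered) + 1) / 4    # 1-based position in the ordered array
    lower = int(pos)                    # item just below the position
    frac = pos - lower                  # fractional part for interpolation
    if frac == 0:
        return ordered[lower - 1]
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])

# Hypothetical data set to illustrate the rule (n = 9).
data = [3, 7, 8, 5, 12, 14, 21, 13, 18]
print(quartile(data, 1), quartile(data, 2), quartile(data, 3))  # 6.0 12 16.0
```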

Quartiles for Grouped Data:

The quartiles may be determined from grouped data in the same way as the median, except that in place
of n/2 we use n/4 (for Q1) or 3n/4 (for Q3). For calculating quartiles from grouped data we form a
cumulative frequency column. Quartiles for grouped data are calculated from the following formulae:

Q1 = l + (h/f)(n/4 − C.F)
Q3 = l + (h/f)(3n/4 − C.F)
Q2 = Median.

Where,
l = lower class boundary of the class containing Q1 or Q3, i.e. the class corresponding to the
cumulative frequency in which n/4 or 3n/4 lies
h = class interval size of the class containing Q1 or Q3
f = frequency of the class containing Q1 or Q3
n = number of values, or the total frequency
C.F = cumulative frequency of the class preceding the class containing Q1 or Q3

i. The first quartile Q1 is the value of the n/4th, or the 30th, item from the lower end. From Table 18
we see that the cumulative frequency of the third class is 22 and that of the fourth class is 50. Thus Q1
lies in the fourth class, i.e. 140 – 149.
ii. The third quartile Q3 is the value of the 3n/4th, or the 90th, item from the lower end. The
cumulative frequency of the fifth class is 75 and that of the sixth class is 93. Thus, Q3 lies in the sixth
class, i.e. 160 – 169.

Conclusion
From these results we conclude that 25% of the students weigh 142.36 pounds or less and 75% of the
students weigh 167.83 pounds or less.
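The grouped-data formula Q = l + (h/f)(in/4 − C.F) reproduces these values; the class boundaries, frequencies and n = 120 below are inferred from the cumulative frequencies quoted in the example and are therefore assumptions:

```python
def grouped_quartile(i, l, h, f, n, cf):
    """Q_i = l + (h / f) * (i*n/4 - C.F), the grouped-data quartile formula."""
    return l + (h / f) * (i * n / 4 - cf)

# Values inferred from the worked example (weights in pounds, n = 120):
# Q1 lies in class 140-149 (boundaries 139.5-149.5), f = 50 - 22 = 28,
# preceding cumulative frequency C.F = 22.
q1 = grouped_quartile(1, l=139.5, h=10, f=28, n=120, cf=22)
# Q3 lies in class 160-169 (boundaries 159.5-169.5), f = 93 - 75 = 18,
# preceding cumulative frequency C.F = 75.
q3 = grouped_quartile(3, l=159.5, h=10, f=18, n=120, cf=75)

print(round(q1, 2), round(q3, 2))   # 142.36 167.83
```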

2. Deciles:

The values which divide an array into ten equal parts are called deciles. The first, second, …, ninth
deciles are denoted by D1, D2, …, D9 respectively. The fifth decile (D5) corresponds to the median. The
second, fourth, sixth and eighth deciles, which collectively divide the data into five equal parts, are called
quintiles.

Deciles for Ungrouped Data:
Deciles for ungrouped data are calculated from the following formula:

Dj = value of the j(n + 1)/10 th item, for j = 1, 2, …, 9

Deciles for Grouped Data

Deciles for grouped data can be calculated from the following formula:

Dj = l + (h/f)(jn/10 − C.F), for j = 1, 2, …, 9

Where,
l = lower class boundary of the class containing Dj, i.e. the class corresponding to the
cumulative frequency in which jn/10 lies
h = class interval size of the class containing Dj
f = frequency of the class containing Dj
n = number of values, or the total frequency
C.F = cumulative frequency of the class preceding the class containing Dj
Conclusion:
From D4 = 148.79, D7 = 164.5 and D9 = 182.83 we conclude that 40% of the students weigh 148.79 pounds or less,
70% weigh 164.5 pounds or less and 90% weigh 182.83 pounds or less.

3. Percentiles:

The values which divide an array into one hundred equal parts are called percentiles. The first, second,
…, ninety-ninth percentiles are denoted by P1, P2, …, P99. The 50th percentile (P50) corresponds to the
median, the 25th percentile (P25) corresponds to the first quartile, and the 75th percentile (P75)
corresponds to the third quartile.
Percentiles for Ungrouped Data:
Percentiles for ungrouped data can be calculated from the following formula;

Pj = value of the j(n + 1)/100 th item, j = 1, 2, …, 99

Percentiles for Grouped Data:

Percentiles can also be calculated for grouped data with the help of the following formula;

Pj = l + (h/f)(jn/100 − C.F), j = 1, 2, …, 99

Where,
l = lower class boundary of the class containing Pj, i.e. the class corresponding to the
cumulative frequency in which jn/100 (e.g. 35n/100 or 99n/100) lies
h = class interval size of the class containing Pj.
f = frequency of the class containing Pj.
n = number of values, or the total frequency.
C.F = cumulative frequency of the class preceding the class containing Pj.
Conclusion
From P37 = 147.5, P45 = 151.1 and P90 = 182.83 we conclude that 37% of the students weigh 147.5 pounds or less,
45% weigh 151.1 pounds or less and 90% weigh 182.83 pounds or less.
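The quartile, decile and percentile formulae follow one pattern, with only the divisor changing, so a single function covers all three: Dk = P(10k) and Qk = P(25k). The sketch below reuses the cumulative frequencies 22, 50, 75 and 93 from the worked example (n = 120); the frequencies of the first three and last two classes are assumptions chosen to be consistent with the quoted results.

```python
# Sketch of the grouped-data percentile: Pj = l + (h/f)(j*n/100 - C.F)
def percentile_grouped(classes, j):
    """classes: sorted list of (lower boundary, upper boundary, frequency)."""
    n = sum(f for _, _, f in classes)
    target = j * n / 100                 # position of Pj: j*n/100
    cf = 0                               # cumulative frequency of preceding classes
    for l, u, f in classes:
        if cf + f >= target:             # Pj lies in this class
            return l + ((u - l) / f) * (target - cf)
        cf += f
    raise ValueError("position out of range")

# Same weight table as the worked example; starred frequencies are assumed.
classes = [(109.5, 119.5, 4), (119.5, 129.5, 8), (129.5, 139.5, 10),   # *
           (139.5, 149.5, 28), (149.5, 159.5, 25), (159.5, 169.5, 18),
           (169.5, 179.5, 9), (179.5, 189.5, 18)]                      # *

print(round(percentile_grouped(classes, 37), 2))   # P37 = 147.5
print(round(percentile_grouped(classes, 45), 2))   # P45 = 151.1
print(round(percentile_grouped(classes, 90), 2))   # P90 = 182.83 (also D9)
print(round(percentile_grouped(classes, 40), 2))   # P40 = D4 = 148.79
```

The outputs match the conclusions quoted for the deciles and percentiles.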

5. Constructing a Frequency Distribution Table (grouped data)

The frequency (f) of a particular observation is the number of times the observation occurs in the
data. The distribution of a variable is the pattern of frequencies of the observation. Frequency
distributions are portrayed as frequency tables, histograms, or polygons.

Frequency distributions can show either the actual number of observations falling in each range
or the percentage of observations. In the latter instance, the distribution is called a relative frequency
distribution. Frequency distribution tables can be used for both categorical and numeric variables.
Continuous variables, however, should only be summarized using class intervals, which are explained shortly.

Example 1 – Constructing a frequency distribution table

A survey was taken on Maple Avenue. In each of 20 homes, people were asked how many cars were
registered to their households. The results were recorded as follows:

1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0

Use the following steps to present this data in a frequency distribution table.

1. Divide the results (x) into intervals, and then count the number of results in each interval. In
this case, the intervals would be the number of households with no car (0), one car (1), two
cars (2) and so forth.
2. Make a table with separate columns for the interval numbers (the number of cars per
household), the tallied results, and the frequency of results in each interval. Label these
columns Number of cars, Tally and Frequency.
3. Read the list of data from left to right and place a tally mark in the appropriate row. For
example, the first result is a 1, so place a tally mark in the row beside where 1 appears in the
interval column (Number of cars). The next result is a 2, so place a tally mark in the row
beside the 2, and so on. When you reach your fifth tally mark, draw a tally line through the
preceding four marks to make your final frequency calculations easier to read.
4. Add up the number of tally marks in each row and record them in the final column entitled
Frequency. The frequency distribution table should look like this:

By looking at this frequency distribution table quickly, we can see that out of 20
households surveyed, 4 households had no cars, 6 households had 1 car, etc.
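The tally-and-count steps above map directly onto a few lines of code; `collections.Counter` does the tallying. The data are the Maple Avenue survey results from the text.

```python
from collections import Counter

# Tally the Maple Avenue survey: number of cars registered per household.
cars = [1, 2, 1, 0, 3, 4, 0, 1, 1, 1, 2, 2, 3, 2, 3, 2, 1, 4, 0, 0]
freq = Counter(cars)                 # frequency of each interval (0, 1, 2, ...)

print("Number of cars | Frequency")
for value in sorted(freq):
    print(f"{value:>14} | {freq[value]:>9}")
```

The output agrees with the observation above: 4 households with no cars, 6 with one car, and so on.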

Example 2 – Constructing a cumulative frequency distribution table


A cumulative frequency distribution table is a more detailed table. It looks almost the same as a frequency
distribution table but it has added columns that give the cumulative frequency and the cumulative
percentage of the results, as well.

At a recent chess tournament, all 10 of the participants had to fill out a form that gave their names,
address and age. The ages of the participants were recorded as follows:

36, 48, 54, 92, 57, 63, 66, 76, 66, 80

Use the following steps to present these data in a cumulative frequency distribution table.

1. Divide the results into intervals, and then count the number of results in each interval. In
this case, intervals of 10 are appropriate. Since 36 is the lowest age and 92 is the highest
age, start the intervals at 35 to 44 and end the intervals with 85 to 94.
2. Create a table similar to the frequency distribution table but with three extra columns.

● In the first column or the Lower value column, list the lower value of the result
intervals. For example, in the first row, you would put the number 35.
● The next column is the Upper value column. Place the upper value of the result
intervals. For example, you would put the number 44 in the first row.
● The third column is the Frequency column. Record the number of times a result
appears between the lower and upper values. In the first row, place the number 1.
● The fourth column is the Cumulative frequency column. Here we add the
cumulative frequency of the previous row to the frequency of the current row. Since
this is the first row, the cumulative frequency is the same as the frequency.
However, in the second row, the frequency for the 35–44 interval (i.e., 1) is added
to the frequency for the 45–54 interval (i.e., 2). Thus, the cumulative frequency is 3
(1 + 2 = 3), meaning we have 3 participants in the 35 to 54 age group.
● The next column is the Percentage column. In this column, list the percentage of the
frequency. To do this, divide the frequency by the total number of results and
multiply by 100. In this case, the frequency of the first row is 1 and the total number
of results is 10, so the percentage is (1 ÷ 10) × 100 = 10.0.
● The final column is Cumulative percentage. In this column, divide the cumulative
frequency by the total number of results and multiply by 100 to make a percentage.
Note that the last number in this column should always equal 100.0. In this
example, the cumulative frequency of the first row is 1 and the total number of results is 10,
so the cumulative percentage of the first row is (1 ÷ 10) × 100 = 10.0.

The cumulative frequency distribution table should look like this:
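As a sketch, the same table can be computed in code from the ages given in the text, using the intervals 35–44 through 85–94 described in step 1.

```python
# Build the cumulative frequency distribution for the chess-tournament ages.
ages = [36, 48, 54, 92, 57, 63, 66, 76, 66, 80]
n = len(ages)

print("Lower  Upper  Freq  Cum.freq  Pct    Cum.pct")
cum = 0
for lower in range(35, 95, 10):                  # intervals 35-44, ..., 85-94
    upper = lower + 9
    f = sum(lower <= a <= upper for a in ages)   # frequency in this interval
    cum += f                                     # running cumulative frequency
    print(f"{lower:>5}  {upper:>5}  {f:>4}  {cum:>8}  "
          f"{100 * f / n:>5.1f}  {100 * cum / n:>7.1f}")
```

As the text notes, the last cumulative percentage printed is always 100.0.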
