Nominal, Ordinal, Scale Variable
http://www.unesco.org/webworld/idams/advguide/Chapt1_3.htm
1.3 Types of Variables
We can distinguish between two types of variables according to the level of measurement:
1. Continuous or Quantitative Variables.
2. Discrete or Qualitative Variables.
A quantitative variable is one in which the variates differ in magnitude, e.g. income, age, GNP,
etc. A qualitative variable is one in which the variates differ in kind rather than in magnitude, e.g.
marital status, gender, nationality, etc.
Continuous or Quantitative Variables
Interval scale data has order and equal intervals. Interval scale variables are measured on
a linear scale, and can take on positive or negative values. It is assumed that the intervals
keep the same importance throughout the scale. They allow us not only to rank order the
items that are measured but also to quantify and compare the magnitudes of differences
between them. We can say that the temperature of 40°C is higher than 30°C, and an
increase from 20°C to 40°C is twice as much as the increase from 30°C to 40°C. Counts
are interval scale measurements, such as counts of publications or citations, years of
education, etc.
Continuous ordinal variables occur when the measurements are continuous, but one is not
certain whether they are on a linear scale, the only trustworthy information being the rank
order of the observations. For example, if a scale is transformed by an exponential,
logarithmic or any other nonlinear monotonic transformation, it loses its interval-scale
property. Here, it would be expedient to replace the observations by their ranks.
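The idea of replacing observations by their ranks can be sketched in a few lines of Python.
This is only an illustration: the measurements are hypothetical, and the helper ranks() (with
tied values sharing an average rank) is an assumption of this sketch, not part of IDAMS.

```python
import math

def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average_rank
        pos = end + 1
    return result

observations = [2.0, 0.5, 3.5, 0.5, 1.0]            # hypothetical measurements
transformed = [math.exp(x) for x in observations]   # nonlinear but monotonic

# The exponential changes the spacing between values but not their order,
# so both lists have exactly the same ranks.
print(ranks(observations))
print(ranks(transformed))
```

Because the ranks survive any monotonic transformation, an analysis based on ranks does not
depend on whether the original scale was linear.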
Ratio data are also interval data, but they are not measured on a linear scale. With
interval data, one can perform logical operations, add, and subtract, but one cannot
multiply or divide. For instance, if a liquid is at 40 degrees and we add 10 degrees, it will
be 50 degrees. However, a liquid at 40 degrees does not have twice the temperature of a
liquid at 20 degrees because 0 degrees does not represent "no temperature" -- to multiply
or divide in this way we would have to use the Kelvin temperature scale, with a true zero
point (0 degrees Kelvin = -273.15 degrees Celsius). In social sciences, the issue of "true
zero" rarely arises, but one should be aware of the statistical issues involved.
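The temperature example can be checked numerically. The short Python sketch below uses only
the conversion stated above (0 degrees Kelvin = -273.15 degrees Celsius); the variable names
are chosen for illustration.

```python
def celsius_to_kelvin(celsius):
    """Shift a Celsius reading onto the Kelvin scale, which has a true zero."""
    return celsius + 273.15

warm_c, cool_c = 40.0, 20.0

# Differences are the same on both scales, so subtraction is meaningful.
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios depend on where zero sits, so only the Kelvin ratio is meaningful.
ratio_c = warm_c / cool_c
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(f"difference: {diff_c:.2f} C, {diff_k:.2f} K")
print(f"ratio: {ratio_c:.3f} (Celsius), {ratio_k:.3f} (Kelvin)")
```

The Celsius ratio of 2 is an artifact of the arbitrary zero point; on the Kelvin scale the
warmer liquid has only about 7 percent more temperature than the cooler one.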
There are three different ways to handle the ratio-scaled variables.
Discrete or Qualitative Variables
Discrete variables are also called categorical variables. A discrete variable, X, can take on a finite
number of numerical values, categories or codes. Discrete variables can be classified into the
following categories:
1. Nominal variables
2. Ordinal variables
3. Dummy variables from quantitative variables
4. Preference variables
5. Multiple response variables
1. Nominal Variables
Nominal variables allow for only qualitative classification. That is, they can be measured
only in terms of whether the individual items belong to certain distinct categories, but we
cannot quantify or even rank order the categories. Nominal data has no order, and the
assignment of numbers to categories is purely arbitrary. Because of the lack of order or equal
intervals, one cannot perform arithmetic (+, -, /, *) or order comparisons (>, <) on
nominal data; only equality (=) can be tested. Typical examples of such variables are:
Gender:
1. Male
2. Female
Marital Status:
1. Unmarried
2. Married
3. Divorcee
4. Widower
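As a rough illustration of what can be done with nominal codes, the Python sketch below
counts how often each category occurs and finds the most frequent one. The list of responses
is hypothetical; only the code-to-label mapping comes from the list above.

```python
from collections import Counter

# Code-to-label mapping taken from the Marital Status list above.
MARITAL_STATUS = {1: "Unmarried", 2: "Married", 3: "Divorcee", 4: "Widower"}

codes = [2, 1, 2, 4, 2, 3, 1]  # hypothetical survey responses

# Counting category membership is meaningful for nominal data.
counts = Counter(MARITAL_STATUS[code] for code in codes)
mode_label, mode_count = counts.most_common(1)[0]

# The arithmetic mean of the codes can be computed, but it depends entirely
# on the arbitrary numbering and therefore describes nothing.
meaningless_mean = sum(codes) / len(codes)

print(counts)
print(mode_label, mode_count)
```

Renumbering the categories would change meaningless_mean but not the counts or the mode,
which is why frequencies are the natural summary for nominal variables.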
2. Ordinal Variables
A discrete ordinal variable is a nominal variable, but its different states are ordered in a
meaningful sequence. Ordinal data has order, but the intervals between scale points may
be uneven. Because of lack of equal distances, arithmetic operations are impossible, but
logical operations can be performed on the ordinal data. A typical example of an ordinal
variable is the socio-economic status of families. We know 'upper middle' is higher than
'middle' but we cannot say 'how much higher'. Ordinal variables are quite useful for
subjective assessment of 'quality, importance or relevance'. Ordinal scale data are very
frequently used in social and behavioral research. Almost all opinion surveys today
request answers on three-, five-, or seven-point scales. Such data are not appropriate for
analysis by classical techniques, because the numbers are comparable only in terms of
relative magnitude, not actual magnitude.
Consider for example a questionnaire item on the time involvement of scientists in the
'perception and identification of research problems'. The respondents were asked to
indicate their involvement by selecting one of the following codes:
1 = Very low or nil
2 = Low
3 = Medium
4 = Great
5 = Very great
Here, the variable 'Time Involvement' is an ordinal variable with 5 states.
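One way to mirror these properties in code is an enumeration that supports order comparisons
but not arithmetic. The Python sketch below is an assumption-laden illustration: the class
name TimeInvolvement and the sample responses are hypothetical, and only the five codes come
from the questionnaire item above.

```python
from enum import Enum

class TimeInvolvement(Enum):
    VERY_LOW_OR_NIL = 1
    LOW = 2
    MEDIUM = 3
    GREAT = 4
    VERY_GREAT = 5

    # Order comparisons use the codes; arithmetic is deliberately not defined.
    def __lt__(self, other):
        if isinstance(other, TimeInvolvement):
            return self.value < other.value
        return NotImplemented

    def __le__(self, other):
        if isinstance(other, TimeInvolvement):
            return self.value <= other.value
        return NotImplemented

    def __gt__(self, other):
        if isinstance(other, TimeInvolvement):
            return self.value > other.value
        return NotImplemented

    def __ge__(self, other):
        if isinstance(other, TimeInvolvement):
            return self.value >= other.value
        return NotImplemented

responses = [  # hypothetical answers from five respondents
    TimeInvolvement.MEDIUM, TimeInvolvement.LOW, TimeInvolvement.VERY_GREAT,
    TimeInvolvement.MEDIUM, TimeInvolvement.GREAT,
]

ordered = sorted(responses)   # ranking the states is meaningful
highest = max(responses)

try:
    TimeInvolvement.LOW + TimeInvolvement.MEDIUM   # adding states is not
    arithmetic_allowed = True
except TypeError:
    arithmetic_allowed = False

print([r.name for r in ordered], highest.name, arithmetic_allowed)
```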
Ordinal variables often cause confusion in data analysis. Some statisticians treat them as
nominal variables. Other statisticians treat them as interval scale variables, assuming that
the underlying scale is continuous, but because of the lack of a sophisticated instrument,
they could not be measured on an interval scale.
3. Dummy Variables from Quantitative Variables
A quantitative variable can be transformed into a categorical variable, called a dummy
variable by recoding the values. Consider the following example: the quantitative
variable Age can be classified into five intervals. The values of the associated categorical
variable, called dummy variables, are 1, 2, 3, 4, 5:
[Up to 25]
[25, 40]
[40, 50]
[50, 60]
[Above 60]
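The recoding of Age can be sketched in Python as follows. The intervals above share their
end points, so the sketch has to pick a convention: it assumes that a boundary age falls in
the lower group (25 is coded 1, 40 is coded 2, and so on). That choice, the function name
age_group and the sample ages are assumptions made for illustration only.

```python
from bisect import bisect_left

# Upper limits of groups 1 to 4; anything above 60 falls in group 5.
UPPER_BOUNDS = [25, 40, 50, 60]

def age_group(age):
    """Map an age in years to a category code from 1 to 5.

    Assumption: an age equal to a boundary belongs to the lower group.
    """
    return bisect_left(UPPER_BOUNDS, age) + 1

ages = [18, 25, 33, 40, 47, 60, 72]        # hypothetical respondents
codes = [age_group(age) for age in ages]
print(codes)
```

Whichever boundary convention is used, it should be stated alongside the recoded variable so
that the five categories are mutually exclusive.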
4. Preference Variables
Preference variables are specific discrete variables, whose values are either in a
decreasing or increasing order. For example, in a survey, a respondent may be
asked to indicate the importance of the following nine sources of information in
his research and development work, by using the code [1] for the most important
source and [9] for the least important source:
1. Literature published in the country
2. Literature published abroad
3. Scientific abstracts
4. Unpublished reports, material, etc.
5. Discussions with colleagues within the research unit
6. Discussions with colleagues outside the research unit but within
institution
7. Discussions with colleagues outside the institution
8. Scientific meetings in the country
Note that preference data are also ordinal. The interval distance from the first preference to the
second preference is not the same as, for example, from the sixth to the seventh preference.
5. Multiple Response Variables
Multiple response variables are those that can assume more than one value. A typical
example is a survey questionnaire about the use of computers in research. The
respondents were asked to indicate the purpose(s) for which they use computers in their
research work. The respondents could score more than one category.
1. Statistical analysis
2. Lab automation/ process control
3. Data base management, storage and retrieval
4. Modeling and simulation
5. Scientific and engineering calculations
6. Computer aided design (CAD)
7. Communication and networking
8. Graphics
Since IDAMS does not handle multiple response variables, dummy variables have to be created
for each category prior to analysis.
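Creating one dummy variable per category might look like the following Python sketch. It is
not IDAMS syntax; the respondent identifiers and their answers are hypothetical, and the
function name to_dummies is chosen for illustration.

```python
# Category codes and labels from the list above.
PURPOSES = {
    1: "Statistical analysis",
    2: "Lab automation/ process control",
    3: "Data base management, storage and retrieval",
    4: "Modeling and simulation",
    5: "Scientific and engineering calculations",
    6: "Computer aided design (CAD)",
    7: "Communication and networking",
    8: "Graphics",
}

def to_dummies(selected_codes):
    """Return one 0/1 indicator per category, in code order."""
    chosen = set(selected_codes)
    return [1 if code in chosen else 0 for code in sorted(PURPOSES)]

# Hypothetical respondents and the categories each one ticked.
answers = {"R1": [1, 4, 8], "R2": [3], "R3": []}
dummies = {respondent: to_dummies(codes) for respondent, codes in answers.items()}

for respondent, row in dummies.items():
    print(respondent, row)
```

Each respondent now has eight single-valued variables, one per purpose, which can be
analysed like any other dichotomous variable.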
http://pirate.shu.edu/~wachsmut/Teaching/MATH1101/Intro/what-variable.html
1.4 Variables and Distributions
When we are looking at a particular population, selecting samples to make inferences, we need
to record our observations or the characteristics of the data we are studying.
A variable is the term used to record a particular characteristic of the population we are studying.
For example, if our population consists of pictures taken from Mars, we might use the following
variables to capture various characteristics of our population:
quality of a picture
title of a picture
and so on. It is useful to put variables into different categories, as different statistical procedures
apply to different types of variables. Variables can be categorized into two broad categories,
numerical and categorical:
Categorical Variables are variables that have a limited number of distinct values or categories.
They are sometimes called discrete variables.
Numeric Variables refer to characteristics that have numeric values. They are usually
continuous variables, i.e. all values in an interval are possible.
Categorical variables again split up into two groups, ordinal and nominal variables:
Ordinal variables represent categories with some intrinsic order (e.g., low, medium, high;
strongly agree, agree, disagree, strongly disagree). Ordinal variables could consist of numeric
values that represent distinct categories (e.g., 1=low, 2=medium, 3=high). To best remember this
type of variable, think of "ordinal" containing the word "order".
Nominal variables represent categories with no intrinsic order (e.g., job category or company
division). Nominal variables could also consist of numeric values that represent distinct
categories (e.g., 1=Male, 2=Female).
It is usually not difficult to decide whether a variable is categorical (discrete) or numerical
(continuous).
Example: An experiment is conducted to test whether a particular drug will successfully lower
the blood pressure of people. The data collected consists of the sex of each patient, the blood
pressure measured, and the date the measurement took place. The experiment is conducted three
times, once before the patient was treated, once one hour after administering the drug, and again
2 days after administering the drug. What variables comprise this experiment?
The characteristics measured in the experiment are the patient's sex, blood pressure, and the date.
The sex is a nominal variable, the blood pressure is numeric, and the date is ordinal. The fact that
the experiment is repeated does not change the number of variables recorded, each time 3
variables are recorded.
Note that recording these three variables only would not enable a successful data analysis. It
seems likely that in order to test the drug's effectiveness you need to correlate the measurements
taken on different dates with particular patients. In other words, you want to know what the
blood pressure for each patient was each time you took it. Thus, you should introduce at least one
more variable into your experiment, such as a patient ID (which is a nominal variable, even
though the ID could be a number).
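To see why the patient ID matters, consider the following Python sketch. All records are
hypothetical, and the sketch stores a full timestamp rather than just a date (an assumption
made here so that two measurements taken on the same day can be put in order).

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical records: (patient ID, sex, time of measurement, blood pressure)
records = [
    ("P01", "F", datetime(2024, 3, 1, 9, 0), 150),
    ("P02", "M", datetime(2024, 3, 1, 9, 0), 162),
    ("P01", "F", datetime(2024, 3, 1, 10, 0), 141),
    ("P02", "M", datetime(2024, 3, 1, 10, 0), 158),
    ("P01", "F", datetime(2024, 3, 3, 9, 0), 138),
    ("P02", "M", datetime(2024, 3, 3, 9, 0), 160),
]

# The patient ID is what ties the repeated measurements together.
by_patient = defaultdict(list)
for patient_id, sex, measured_at, blood_pressure in records:
    by_patient[patient_id].append((measured_at, blood_pressure))

# Change from the first to the last measurement for each patient.
change = {}
for patient_id, readings in by_patient.items():
    readings.sort()
    change[patient_id] = readings[-1][1] - readings[0][1]

print(change)
```

Without the ID column the six readings could not be paired up, and the per-patient change
could not be computed.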
Example: Consider the following survey, given to a random sample of Seton Hall University
students:
Q1: What is your status: [ ] Freshman [ ] Sophomore [ ] Junior [ ] Senior [ ] Graduate Student
Q2: What is your major: __________________
Q3: What is your age: ___________________
Q4: How often do you use the following support services?
                        daily   few times   few times   few times   never
                                per week    per month   per year
Academic Advisement     [ ]     [ ]         [ ]         [ ]         [ ]
The Career Center       [ ]     [ ]         [ ]         [ ]         [ ]
Dining Services         [ ]     [ ]         [ ]         [ ]         [ ]
Health Services         [ ]     [ ]         [ ]         [ ]         [ ]
Counseling Services     [ ]     [ ]         [ ]         [ ]         [ ]
Recreation Center       [ ]     [ ]         [ ]         [ ]         [ ]
PC Support Services     [ ]     [ ]         [ ]         [ ]         [ ]
Campus Ministry         [ ]     [ ]         [ ]         [ ]         [ ]
Example: Suppose a company issues sales reports for two years, 2004 and 2005, as shown
in the table and picture below. We can consider this report to consist of two variables
(v_2004 and v_2005, say), each one having 4 values (for North, South, East, and West,
respectively). Are the distributions of values hetero- or homogeneous?
The values for the 2004 variable (v_2004 if you like) are pretty close to each other.
In the chart you can see that all blue bars are approximately of equal height. If I
looked at the original figures, I would find an (about) equal amount of sales for
North, South, East, and West; no region would stick out particularly. Thus, each
region is equally likely in terms of number of sales - the distribution is
heterogeneous (if I checked where an individual, randomly selected sale came from,
each region would be approximately equally likely).
The values for the 2005 variable (v_2005 in our terminology) differ widely. In the
chart the red bars are of different heights, with "East" being by far the highest. If I
looked at the original figures, I would find that most sales were made in the
East. Thus, a sale from the East is much more likely than from any other region - the
distribution is homogeneous (if I checked where an individual, randomly selected
sale came from, it would most likely come from the east).
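The reasoning about a randomly selected sale can be made concrete by computing each region's
share of the total. The sales figures in this Python sketch are hypothetical, chosen only to
resemble the two situations described above; the terms heterogeneous and homogeneous are used
as in the text.

```python
def shares(sales_by_region):
    """Return each region's fraction of total sales."""
    total = sum(sales_by_region.values())
    return {region: count / total for region, count in sales_by_region.items()}

# Hypothetical figures, not the ones from the original report.
v_2004 = {"North": 100, "South": 105, "East": 98, "West": 97}
v_2005 = {"North": 40, "South": 50, "East": 280, "West": 30}

shares_2004 = shares(v_2004)
shares_2005 = shares(v_2005)

# A share is the chance that a randomly selected sale came from that region.
print({region: round(s, 3) for region, s in shares_2004.items()})
print({region: round(s, 3) for region, s in shares_2005.items()})

dominant_2005 = max(shares_2005, key=shares_2005.get)
print(dominant_2005, shares_2005[dominant_2005])
```

With these made-up numbers, every 2004 share is close to one quarter (heterogeneous in the
sense used above), while a single region accounts for most 2005 sales (homogeneous).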
Discussion Topic: Which type of variable (ordinal, nominal, or numeric) do you think will be most
useful for statistical analysis? Which type of variable do you think is usually present in surveys
given to groups of people? Look at survey results from newspapers or online and report the
variables and their categories.
1. Types of Variables
There are three primary categories of variables: nominal, ordinal and interval.
Nominal variables are categorical, such as gender. Ordinal variables are also
categorical but have a clear ordering, such as high, medium and low
socioeconomic status. Finally, interval variables are similar to ordinals but have
defined spacing between the measures. For example, an interval measure would
categorize people into identically spaced annual income categories, such as
$10,000, $20,000, $30,000 and $40,000.
Scale Variables
Though nominal is one of the specific categories of variable, scale is a more
general distinction. In fact, any of the three formal types of variables can also be
scale. To make the variable scale, a researcher would assign numbers to the
nominal, ordinal or interval variables. For example, if the ordinal variable is
socioeconomic status and there are five categories (poverty, lower class, middle
class, upper-middle class and elite class), assigning the five options a number
would make the variable scale. So, poverty would equal 1, lower class would
equal 2 and so on.
Relationship
Nominal and scale are not mutually exclusive descriptors of variables. Rather,
they describe different aspects of variables. Nominal refers to the type of
variable whereas scale indicates how the variable is being measured or
categorized for analysis. Often, nominal variables, which are always qualitative
in nature, need to be turned into scale variables to use statistical equations to
analyze them.
Uses
Both nominal and scale variables are used often in statistics and in academic
research. Such variables can help social scientists, mathematicians and other
researchers assess data. For example, if researchers conduct a survey and want to
find out if Catholics respond differently than Muslims, they can use regression
analysis to determine if a respondent's religion, a nominal variable, leads to a
significant difference in the responses. The researchers would scale the variable
by assigning a 1 to Catholics, 2 to Muslims, 3 to Buddhists and so on.
http://www.ats.ucla.edu/stat/mult_pkg/whatstat/nominal_ordinal_interval.htm
What is the difference between categorical, ordinal and interval variables?
In talking about variables, sometimes you hear variables being described as categorical (or
sometimes nominal), or ordinal, or interval. Below we will define these terms and explain why
they are important.
Categorical
A categorical variable (sometimes called a nominal variable) is one that has two or more
categories, but there is no intrinsic ordering to the categories. For example, gender is a
categorical variable having two categories (male and female) and there is no intrinsic ordering to
the categories. Hair color is also a categorical variable having a number of categories (blonde,
brown, brunette, red, etc.) and again, there is no agreed way to order these from highest to
lowest. A purely categorical variable is one that simply allows you to assign categories but you
cannot clearly order the categories. If the variable has a clear ordering, then that variable would
be an ordinal variable, as described below.
Ordinal
An ordinal variable is similar to a categorical variable. The difference between the two is that
there is a clear ordering of the categories. For example, suppose you have a variable, economic
status, with three categories (low, medium and high). In addition to being able to classify people
into these three categories, you can order the categories as low, medium and high. Now consider
a variable like educational experience (with values such as elementary school graduate, high
school graduate, some college and college graduate). These also can be ordered as elementary
school, high school, some college, and college graduate. Even though we can order these from
lowest to highest, the spacing between the values may not be the same across the levels of the
variables. Say we assign scores 1, 2, 3 and 4 to these four levels of educational experience and
we compare the difference in education between categories one and two with the difference in
educational experience between categories two and three, or the difference between categories
three and four. The difference between categories one and two (elementary and high school) is
probably much bigger than the difference between categories two and three (high school and
some college). In this example, we can order the people in level of educational experience but
the size of the difference between categories is inconsistent (because the spacing between
categories one and two is bigger than categories two and three). If these categories were equally
spaced, then the variable would be an interval variable.
Interval
An interval variable is similar to an ordinal variable, except that the intervals between the values
of the interval variable are equally spaced. For example, suppose you have a variable such as
annual income that is measured in dollars, and we have three people who make $10,000, $15,000
and $20,000. The second person makes $5,000 more than the first person and $5,000 less than
the third person, and the size of these intervals is the same. If there were two other people who
make $90,000 and $95,000, the size of that interval between these two people is also the same
($5,000).
Why does it matter whether a variable is categorical, ordinal or interval?
Statistical computations and analyses assume that the variables have a specific level of
measurement. For example, it would not make sense to compute an average hair color. An
average of a categorical variable does not make much sense because there is no intrinsic ordering
of the levels of the categories. Moreover, if you tried to compute the average of educational
experience as defined in the ordinal section above, you would also obtain a nonsensical result.
Because the spacing between the four levels of educational experience is very uneven, the
meaning of this average would be very questionable. In short, an average requires a variable to
be interval. Sometimes you have variables that are "in between" ordinal and interval, for
example, a five-point Likert scale with values "strongly agree", "agree", "neutral", "disagree" and
"strongly disagree". If we cannot be sure that the intervals between each of these five values are
the same, then we would not be able to say that this is an interval variable, but we would say that
it is an ordinal variable. However, in order to be able to use statistics that assume the variable is
interval, we will assume that the intervals are equally spaced.
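The dependence of an average on the assumed spacing can be shown with a small Python sketch.
The two groups of respondents and the second coding scheme are hypothetical; the first coding
uses the scores 1, 2, 3 and 4 from the educational experience example above.

```python
from statistics import mean

LEVELS = ["elementary", "high school", "some college", "college graduate"]

# Two order-preserving codings of the same four levels.
equal_spacing = {"elementary": 1, "high school": 2, "some college": 3, "college graduate": 4}
# Hypothetical alternative with a larger gap after elementary school.
uneven_spacing = {"elementary": 1, "high school": 5, "some college": 6, "college graduate": 7}

# Hypothetical groups of four respondents each.
group_a = ["elementary", "college graduate", "college graduate", "college graduate"]
group_b = ["some college", "some college", "some college", "some college"]

def group_mean(group, coding):
    """Average of the numeric codes assigned to a group's categories."""
    return mean(coding[level] for level in group)

mean_a_equal = group_mean(group_a, equal_spacing)
mean_b_equal = group_mean(group_b, equal_spacing)
mean_a_uneven = group_mean(group_a, uneven_spacing)
mean_b_uneven = group_mean(group_b, uneven_spacing)

print(mean_a_equal, mean_b_equal)      # group A looks more educated
print(mean_a_uneven, mean_b_uneven)    # group B looks more educated
```

Both codings respect the ordering of the categories, yet they disagree about which group has
the higher average. This is what assuming equal spacing commits the analyst to.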
Does it matter if my dependent variable is normally distributed?
When you are doing a t-test or ANOVA, the assumption is that the sample means are normally
distributed. One way to guarantee this is for the individual observations from the sample to
be normally distributed. However, even if the individual observations are not normal, the
sample means will be approximately normally distributed if your sample size is about 30 or
larger. This is due to the "central limit theorem", which shows that even when a population is
non-normally distributed, the distribution of the "sample means" approaches a normal
distribution as the sample size grows; a sample size of 30 or more is a common rule of thumb.
For example, see Central limit theorem demonstration.
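A small simulation gives a feel for this. The Python sketch below is an illustration under
stated assumptions: it draws from an exponential population (clearly non-normal, with mean 1
and standard deviation 1), and the seed, the number of samples and the variable names are
arbitrary choices.

```python
import math
import random
import statistics

random.seed(0)            # fixed seed so the run is repeatable
SAMPLE_SIZE = 30
N_SAMPLES = 5000

def sample_mean(n):
    """Mean of n draws from an exponential population with mean 1."""
    return statistics.fmean([random.expovariate(1.0) for _ in range(n)])

means = [sample_mean(SAMPLE_SIZE) for _ in range(N_SAMPLES)]

center = statistics.fmean(means)
spread = statistics.stdev(means)
expected_spread = 1.0 / math.sqrt(SAMPLE_SIZE)   # sigma / sqrt(n)

# For a normal distribution about 95% of values lie within 1.96
# standard deviations of the center.
coverage = sum(abs(m - 1.0) <= 1.96 * expected_spread for m in means) / N_SAMPLES

print(f"center of sample means: {center:.3f}")
print(f"spread: {spread:.3f} (theory: {expected_spread:.3f})")
print(f"share within 1.96 standard errors: {coverage:.3f}")
```

The sample means cluster around the population mean with a spread close to the theoretical
standard error, and roughly 95 percent of them fall within 1.96 standard errors, as a normal
distribution would predict.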
If you are doing a regression analysis, then the assumption is that your residuals are normally
distributed. One way to make it very likely to have normal residuals is to have a dependent
variable that is normally distributed and predictors that are all normally distributed; however,
this is not necessary for your residuals to be normally distributed. See Regression with SAS:
Chapter 2 - Regression Diagnostics.