Assumptions I

This is prepared for students registered in PSY 202, and should not be
distributed to others.

What are the assumptions ?

Measurement Scale
The dependent variable (DV) or the outcome variable (OV) should be
measured on an interval or a ratio scale. When a variable is measured on
scales lower than interval, mathematical operations such as taking square
roots are not meaningful, which makes the calculation of essential
information such as variability problematic.

Independence of Observations
Observations (participant responses) should be independent from each other;
there should be no effect of one individual's scores on others. Independence
ensures that measures such as the mean and standard deviation are not
biased in estimating population parameters.

Normality
The population(s) from which the data come should be normally distributed
on the DV.

Homogeneity of Variance
Populations from which the data come should have equal variances on the
DV.
WHY SHOULD THE ASSUMPTIONS BE SATISFIED ?
The assumptions need to be satisfied so that the statistical conclusions you
reach are valid. These assumptions contribute to the validity of the test in
different ways, as described below.
Measurement Scale
With a DV measured on an interval/ratio scale, it is possible to do multiplication,
division, squaring, etc., which are essential for the calculation of important measures
such as standard deviations. Ordinal or nominal scales are not appropriate for that;
you cannot take the square of ranks, for instance.
Independence of Observations
This makes the samples more representative of the population and is one way of
making your sample resemble a randomly selected sample.
Normality
If this assumption is not satisfied, the main problem is that the critical values (z, t)
and the associated p values of the theoretical distributions would be less applicable
to your data. For instance, in a z distribution, the probability of obtaining a z value of
2.27 or larger is .0116. However, if normality is not satisfied, we cannot guarantee
that this value applies to our data. Depending on the exact nature of our (obtained)
distribution, the probability corresponding to a z value of 2.27 might be higher or
lower. Therefore, we might have an increased chance of a Type I or Type II error,
depending on the specific characteristics of the non-normality.
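As an illustration of where the .0116 figure comes from, the upper-tail probability of a standard normal distribution can be computed with the complementary error function. This is a minimal Python sketch; the function name is my own:

```python
import math

def upper_tail(z):
    """P(Z > z) for the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2))

print(round(upper_tail(2.27), 4))  # 0.0116, matching the value in the text
```

If the real distribution is not normal, this theoretical tail probability no longer describes how often such a value actually occurs, which is exactly the problem described above.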
Homogeneity of Variance
First, remember that this is an issue only when you have two or more groups (two or
more levels of the IV). The problems mentioned for normality apply to homogeneity
of variance as well: when the variances are not homogeneous, the main problem is
that the p values based on the theoretical t distributions may not be applicable. This
results in increased Type I or Type II error rates. These problems are more likely
when sample sizes are unequal. If the larger variance comes from the smaller
sample, the Type I error rate increases. If the larger variance comes from the larger
sample, the Type II error rate increases.
Keep in mind that, in general, sample sizes are considered unequal when one
sample is 1.5 times larger than the other.
HOW DO WE KNOW THAT THE ASSUMPTIONS ARE VIOLATED ?
Measurement Scale
The measurement scale assumption should be taken care of before data collection:
you select your DV so that it is measured on a ratio or an interval scale.
Independence of Observations
The best way to achieve independence would be random assignment in an
experimental situation. If two close friends signed up for an experiment, random
assignment would increase the chance that they will go into separate conditions
(compared to if people who signed up for the same time slot are assigned to the
same condition). It’s more difficult to achieve if it is a non-experimental study, such
as when a group of people are asked to fill out questionnaires. In such cases, you
should try to increase the variety of people that you invite/take into your samples, so
that the problems of dependence would have less of an effect.
Normality
You can use a variety of tools to determine whether normality is satisfied. In single-
sample z and t tests this means that the population for the DV is normally distributed.
For between- and within-subjects t-tests this means that the DV is normally
distributed in the two populations (corresponding to the two conditions). In other
words, you should look at the two distributions separately to see if they are both
normally distributed. There are a number of tools that you can use in conjunction to
test normality. They are explained below:
I. Descriptive Statistics
Four of the descriptive statistics are especially useful for determining normality: the
mean, median, skew, and kurtosis. A simple tool is to look at the difference between
the mean and the median. If they are close to each other, then this typically tells
you that there are no extreme values pulling the mean to one side, indicating a
normal distribution. If, on the other hand, there is a noticeable/large difference
between the mean and the median, this indicates that there are (more) extreme
values on one side of the distribution, leading to a skew and a tendency for a
non-normal distribution.
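The mean-median comparison can be sketched in a few lines of Python (the scores are hypothetical, for illustration only):

```python
from statistics import mean, median

symmetric = [2, 3, 4, 5, 6]
skewed = [2, 3, 4, 5, 60]  # one extreme value pulls the mean to the right

print(mean(symmetric), median(symmetric))  # 4 4   -- close: no sign of skew
print(mean(skewed), median(skewed))        # 14.8 4 -- large gap: right skew
```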
More direct information comes from the skew statistic. If the skew value is between
-.5 and +.5, you can safely assume that there is no skew. Values around -1 or +1
indicate moderate skew. Values beyond -1.5 and +1.5 indicate substantial skew.
There is a more formal test of whether the skew (and kurtosis) is large enough to
create problems. This involves hypothesis testing where you test Ho: skew is zero
against H1: skew is non-zero (the same idea applies to kurtosis). The null
hypothesis is tested with a z test such that

z(skew) = skew / standard error of skew
z(kurtosis) = kurtosis / standard error of kurtosis

If |z obtained| > z critical (±1.96), you reject Ho and conclude that there is significant
skew/kurtosis.
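The z test for skew can be sketched as follows. This is a minimal Python illustration using the common large-sample approximation SE(skew) ≈ sqrt(6/N); exact small-sample formulas differ slightly, and the function name is my own:

```python
import math

def skew_z(data):
    """z test for skewness: z = skew / SE(skew)."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in data) / n  # third central moment
    skew = m3 / m2 ** 1.5
    se = math.sqrt(6 / n)  # approximate standard error of skewness
    return skew / se

# a symmetric sample gives z = 0; compare |z| against the critical value 1.96
print(skew_z(list(range(1, 101))))  # 0.0
```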
II. Statistical Tests (Kolmogorov-Smirnov and Shapiro-Wilk)
The K-S and S-W tests evaluate the null hypothesis that the data come from a
normal distribution, so a significant result indicates non-normality. Keep in mind that
the S-W test is sensitive (liberal) to large samples, so even minor deviations from
normality might lead to a significant result. Moreover, it is not very reliable when
there are too many repeating values.
• With regard to the tests, if both K-S and S-W give you non-significant results
(telling you that the distribution is normal), then you can be quite confident that
the data are actually normally distributed. But still check the descriptives and
the visuals to make sure they converge.
• If one or both of the tests tell you that the data are non-normal, first ask
yourself whether the potential problems associated with the tests mentioned
above (sample size etc.) might be playing a role. For instance, the sample
might be very large, which may be why both tests turn out to be significant.
Or, the S-W might be significant when the K-S is not. If that is the case, this
could come from the fact that there are many repeating values in the data
(e.g., the DV is the number of children in a sample of 60 families; there will
be a lot of 1s and 2s and very few other values). These considerations will
affect the degree to which you pay attention to the results of each test.
• Use the visuals to see if the distribution agrees with the descriptive and/or the
statistical tests above. Visuals are very important because sometimes you'll
see that your evaluation based on them disagrees with the descriptive (e.g.,
the skew value) or inferential statistical information (e.g., the K-S test). In
that case, remember that the visual information may be as important and
powerful as the statistical information.
• When the statistical information (especially the significance tests: K-S, S-W,
or the z tests for skew/kurtosis) disagrees with the visual, it will typically be
the case that the statistics tell you there is significant skew/kurtosis while the
visual tells you otherwise. This will be especially true when you have large
samples. (Remember, large samples are more likely to lead to significant
results in general, even when the effect is not large.) It is rarer that the tests
tell you there is no skew/kurtosis whereas the visuals say there is.
• Finally, remember from our discussions in PSY 201 the basic rule regarding
normality. We almost never know whether the population is normally
distributed, so the main justification comes from having large samples (which
means that the sampling distribution is normally distributed). So, having a
large sample achieves a lot in terms of normality. (An interesting irony: when
the sample is large, K-S and S-W will tend to turn out significant, telling you
that the distribution is not normal! That is why you should always consider
information other than the tests.)
Homogeneity of Variance
This is an assumption that applies when you have at least two levels of the
independent/predictor variable. So it is applicable to the independent-measures
t-test and the independent-measures ANOVA.
I. A Rule of Thumb
The first thing to do is to look at the standard deviations of the groups you are
comparing. Look at each one individually and ask: "is this too large a standard
deviation for this variable?" For instance, if, again, the DV is the number of children
a family has, an SD of 5.3 would seem too large, given that the mean would be
around 2. If that is the case, check your data file to make sure there is no entry
error, and then see if there are outliers (more on that later). This gives you a general
sense of the magnitude of the variability, but does not say much about the
homogeneity of variance assumption.
Then, the important thing is whether the two standard deviations are different. This
is difficult to determine by just comparing the numbers. For instance, if SD1 = 18.21
and SD2 = 24.6, can we say that the variances are not homogeneous? There is a
rule of thumb used in these cases: if the variance of one group is not 3 times or
more the variance of the other group, we can assume that the variances are
homogeneous. If you are carrying out an ANOVA, the largest variance should not be
more than 3 times the smallest variance (because you have more than two groups).
Some suggest using 2 times rather than 3 times as the cut-off point. This is a matter
of personal choice to some degree, but you can use the following principle: if the
ratio is less than 2, assume homogeneity; if it is more than 3, homogeneity is
violated. If it is in between, look at where it falls between 2 and 3 and use judgment.
This is for a quick decision on homogeneity and cannot really be used when you
officially report results.
• Levene's test: This is the test Jamovi and SPSS use to determine
homogeneity of variance. It uses an ANOVA (F test) to determine whether
the variances of two or more groups are homogeneous. If the null hypothesis
is rejected, then we conclude that the variances are heterogeneous
(different).
• Hartley's F-max test: This test can be considered a more formal version of
the rule of thumb described above. The obtained F-max value is calculated
and then compared to the F-max critical value. The formula for the obtained
F-max is: F-max = s²(largest) / s²(smallest). F-max critical values are based
on a modified F distribution. I provide an F-max critical table as a separate
file, but you can find many versions of it on the web as well. If F-max >
F-max critical, Ho is rejected, which means that the variances are
heterogeneous.
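The F-max ratio, which also underlies the rule of thumb above, can be sketched in a few lines of Python (the groups are hypothetical; the critical-value lookup is left out because it requires the table):

```python
def f_max(groups):
    """Hartley's F-max: ratio of the largest to the smallest group variance."""
    def variance(xs):  # sample variance, n - 1 denominator
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)

# variances are 1.0 and 4.0, so F-max = 4.0 -- above the 3x rule of thumb
print(f_max([[1, 2, 3], [2, 4, 6]]))  # 4.0
```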
actual data distribution. When that is the case Type-I or Type-II error rates do not
change much (changes are minimal and acceptable).
Keep in mind that this argument of robustness holds true when the samples are
large and approximately equal, which typically means one sample is no more than
1.5 times larger than the other.
There is at least one non-parametric test that corresponds to each parametric test.
For instance, the Mann-Whitney U test is the non-parametric counterpart of the
independent-measures t-test, Spearman's r is the non-parametric version of
Pearson's r, and Friedman's analysis of variance is the non-parametric counterpart
of the repeated-measures ANOVA.
The general approach of these tests is that dependent/outcome variables measured
on a ratio/interval scale are converted to a lower measurement scale (ordinal or
nominal), and a test statistic is calculated and then compared with the critical value
for that test statistic.
In the Mann-Whitney U test, for example, you calculate a U value, compare it with
the critical value, and then make your statistical decision based on that comparison.
Although non-parametric tests are useful when assumptions are violated, they are
typically less powerful than parametric tests.
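As an illustration of the rank-based approach, here is a minimal sketch of the Mann-Whitney U computation (ties receive average ranks; the function name is my own, and the critical-value comparison is omitted since it requires a table):

```python
def mann_whitney_u(a, b):
    """Compute the Mann-Whitney U statistic for two independent samples."""
    combined = sorted(a + b)
    def avg_rank(v):
        # average of the 1-based positions of v in the combined ordering
        positions = [i + 1 for i, x in enumerate(combined) if x == v]
        return sum(positions) / len(positions)
    n1, n2 = len(a), len(b)
    r1 = sum(avg_rank(v) for v in a)  # rank sum for the first group
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    return min(u1, n1 * n2 - u1)      # report the smaller of U1 and U2

# complete separation between the groups gives the minimum possible U of 0
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))  # 0.0
```

Note that only the order of the scores matters here, which is exactly what "converting to a lower measurement scale" means.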
III. Transformation
A single transformation may actually help with both non-normality and homogeneity
of variance problems, so it may be very useful. However, one potential issue is that
it may not always be clear what the new (transformed) variable really is. Assume
that we are interested in whether two groups differ in self-esteem, as measured on
a 10-point scale. When you transform that variable so that the new one is the "log of
self-esteem score", what exactly is this new variable conceptually? Statistically, it
might lead to a better distribution, but conceptually it is difficult to understand what
you are comparing between the groups (you might end up saying that the two
groups differ in the log of self-esteem). There is no guarantee that this variable is
the same thing as the original one. So, when you plan to use transformation, you
should consider this interpretation issue.
Transformation is rarely used to deal with violations. My view is that it should be
used only under extreme circumstances, and only for variables that make (more)
sense when the transformation is applied. One such example is experiments where
the DV is reaction time. In such studies, reaction time is mostly measured in
milliseconds, and several substantial outliers may be seen.
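To make the reaction-time example concrete, here is a short Python sketch with hypothetical values showing how a log transformation pulls in a long right tail:

```python
import math

rts = [412, 389, 455, 430, 2950]      # hypothetical RTs in ms; one extreme value
log_rts = [math.log(x) for x in rts]  # natural-log transformation

# on the raw scale the extreme value is ~7.6x the smallest score;
# on the log scale it is only ~1.3x, so it distorts the distribution far less
print(max(rts) / min(rts))
print(max(log_rts) / min(log_rts))
```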
IV. Outliers
Identifying Outliers
An outlier can be described as a data point that is far away from the bulk of the
data; however, not every far-away (small or large) value is an outlier.
First, dealing with outliers is not an assumption of the tests, but it is an important
element of data analysis. Outliers show their effects indirectly, through their effects
on normality and homogeneity of variance. You should always look at the
descriptive statistics and the shape of the data before doing any kind of analysis.
Examining outliers is a part of that.
If there are "real" outliers, you should deal with them. An outlier could occur for a
number of different reasons. These include calculation, coding, or data-entry errors;
in those cases very large or very small values might be seen. On the other hand,
the outlier might be one of the actual, accurate values from the sample. It might be
a very low rating a participant gave, or a very tall person. These two types of
outliers are different in nature and should be dealt with differently. If they come from
errors and cannot be corrected, they should be trimmed (removed) from the data. If
they are of the second type, then you should carefully evaluate whether the value
can be considered an outlier, by looking at the statistics below, the shape of the
distribution, and the remaining values in the data set.
Not every large/small value that seems far away from the rest of the data is an
outlier. Below are two ways in which you can identify outliers numerically. However,
you should also evaluate these outliers in terms of the hypothesis you are testing
and determine whether to treat them as outliers or as a natural part of your data.
There are two ways you can spot an outlier numerically: 1) any value in the data set
that is more than 3 × IQR away (in either direction) can be considered an outlier;
2) any value in the data set that is beyond a z-value of ±2.5. (Some suggest z = ±3,
which may also be used.)*
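The z-value rule can be sketched as follows (using the ±2.5 cut-off from the text; the data and the function name are hypothetical):

```python
def z_outliers(data, cut=2.5):
    """Return values whose z-value is beyond +/- cut."""
    n = len(data)
    m = sum(data) / n
    sd = (sum((x - m) ** 2 for x in data) / (n - 1)) ** 0.5
    return [x for x in data if abs((x - m) / sd) > cut]

print(z_outliers([10, 11, 12, 10, 11, 12, 10, 11, 12, 100]))  # [100]
```

As the footnote below warns, a flagged value should still be judged against the shape of the distribution before being treated as an outlier.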
As I noted above, once you identify a value as an outlier, evaluate it within the
context of the actual values in the distribution. Somebody who is 197 cm tall in a
class where the mean is 168 cm could, technically, turn out to be an outlier. Or,
somebody who gives a rating of 5 on a self-esteem scale where the mean is 2.12
could technically be identified as an outlier. But, because this person gave a rating
within the normal range of values, one should be careful about calling this an
outlier; it seems to be a natural part of the data.
* An important note about using the z-value to determine outliers: remember that in a typical
normal distribution it is possible to find values that have a z-value of 2.5 or 3. Such extreme
values are part of the distribution, and if you identify a score as an outlier just because it has
a specific z-value, you might be labeling a natural part of a normal distribution as an outlier.
That is why you should look at the distribution as well when you make a decision about an
outlier.
Dealing with Outliers
There are a number of ways of minimizing or eliminating the effects of outliers on
your data analysis.
Trimming (Removing). Trimming refers to the removal of outliers from the data set
so that they are not included in the analyses. (Do not confuse this with the "trimmed
mean", where a certain percentage of the top and the bottom of the data is removed
before the mean is calculated.) When you trim data for outliers, you remove only
those outliers.
This is a valid strategy when you are sure that the outlier is a "real" outlier (see
above). It is also important that, in a typical study, there should not be many
outliers. Another rule of thumb says that you can remove up to 5% of the data if
they are outliers. If you have more outliers than that, you should look at your
design, manipulation, the DV, etc.
Trimming can be done at two levels: you can remove the participant with an outlier
completely from the data set, so that the participant never goes into any of the
analyses. The alternative is to trim that participant only for the variables on which
he/she was the outlier.
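The two levels of trimming can be illustrated with a small hypothetical data set (the variable names are made up for the example):

```python
# hypothetical participants measured on two variables; id 2 is an outlier on "rt"
rows = [
    {"id": 1, "rt": 420, "acc": 0.91},
    {"id": 2, "rt": 2950, "acc": 0.88},  # outlier on rt only
    {"id": 3, "rt": 405, "acc": 0.85},
]

# level 1: remove the participant from all analyses
listwise = [r for r in rows if r["id"] != 2]

# level 2: trim only the offending variable; keep the participant's other scores
per_variable = [dict(r, rt=None) if r["id"] == 2 else r for r in rows]

print(len(listwise))           # 2
print(per_variable[1]["acc"])  # 0.88 -- still usable for analyses of "acc"
```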
The potential disadvantages of trimming are a) you decrease the sample size
(and power), and b) you may be changing the nature of the data because you are
throwing out a legitimate data point that says something about the variable you are
investigating.