Statistics

(Worked example: S = 4.20)
Shortcut Formulae
A shortcut method of calculating variance
and standard deviation requires two
quantities: sum of the values and sum of the
squares of the values.
Σx = sum of the measures
Σx² = sum of the squares of the measures
For example, using these six measures: 3, 9,
1, 2, 5, and 4:
For these measures, Σx = 3 + 9 + 1 + 2 + 5 + 4 = 24 and Σx² = 9 + 81 + 1 + 4 + 25 + 16 = 136.
The quantities are then substituted into the shortcut formula:
s² = [Σx² - (Σx)²/n] / (n - 1)
The variance and standard deviation are now found as before.
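Assuming the sample (n - 1) form of the shortcut formula is intended, the calculation can be sketched in Python:

```python
from math import sqrt

def shortcut_variance(values):
    """Sample variance from the two shortcut quantities:
    the sum of the values and the sum of their squares."""
    n = len(values)
    sum_x = sum(values)                   # sum of the measures
    sum_x2 = sum(v * v for v in values)   # sum of the squares of the measures
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

def shortcut_sd(values):
    return sqrt(shortcut_variance(values))

measures = [3, 9, 1, 2, 5, 4]
print(shortcut_variance(measures))  # 8.0
print(shortcut_sd(measures))        # about 2.83
```

For the six measures, Σx = 24 and Σx² = 136, so s² = (136 - 576/10... rather (136 - 24²/6)/5 = 8 and s ≈ 2.83.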
Standard Deviation Practice Problems
The following practice problems show how the standard deviation is calculated.
Find the standard deviation for the values 11, 3, 12 and 6.
Solution: s = 4.24

Find the standard deviation for the values 9, 11, 8, 7, 4 and 3.
Solution: s = 3.03

Find the standard deviation for the values 8, 9, 6, 12 and 5.
Solution: s = 2.74
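The three answers can be checked with Python's statistics module, which uses the same n - 1 sample formula:

```python
import statistics

# The three practice data sets; expected sample SDs: 4.24, 3.03, 2.74
problems = [
    [11, 3, 12, 6],
    [9, 11, 8, 7, 4, 3],
    [8, 9, 6, 12, 5],
]
for values in problems:
    print(values, round(statistics.stdev(values), 2))
```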
Symbols:
σ denotes the standard deviation of a population.
S denotes the standard deviation of a sample.
S² denotes variance.
Calculating SD with Excel
1. Enter the values in a column.
2. On the Tools menu, click Data Analysis.
3. Select Descriptive Statistics and click OK.
4. Click the Input Range icon and highlight all the values in the column.
5. Check "Labels in first row" if labels are in the first row.
6. Check Summary Statistics.
7. Click OK.
The SD is calculated precisely, along with several other descriptive statistics.
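For readers without Excel, an equivalent summary can be produced in Python (an illustrative sketch; the statistics module is in the standard library):

```python
import statistics

values = [11, 3, 12, 6]  # any column of values

summary = {
    "count": len(values),
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "sample SD": statistics.stdev(values),        # what Excel reports as Standard Deviation
    "sample variance": statistics.variance(values),
    "min": min(values),
    "max": max(values),
}
for name, value in summary.items():
    print(f"{name}: {value}")
```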
MEASURES OF RELATIONSHIP -
CORRELATION
Definition: The relationship (or association) between
two quantitatively measured (continuous) variables is
called correlation. An increase in stress, for example, may
be related to an increase in specific somatic symptoms.
The data can be represented by the ordered pairs (x,y)
where x is the independent, or explanatory, variable
and y is the dependent, or response, variable.
Examples of variables that may be correlated:
height and shoe size
SAT score and grade point average
number of cigarettes smoked per day and
lung capacity
- Dull children tend to be more neurotic than bright children.
- Is there any relationship between the size
of the skull and general intelligence of the
individuals?
Note: Correlation tests establish the association among variables but cannot show a cause-and-effect relationship.
Coefficient of Correlation:
LINEAR CORRELATION
The purpose of a LINEAR CORRELATION ANALYSIS is to determine whether there is a relationship between two sets of variables, often explored through scatter plots.
A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the independent variable, x, and the dependent variable, y.
We may find that there are five kinds of
linear correlation. They are
A. Perfect positive correlation.
B. Moderately positive correlation.
C. Perfect negative correlation.
D. Moderately negative correlation.
E. Absolutely No correlation.
This relationship between the variables can be easily visualized using SCATTER DIAGRAMS:
1. Perfect positive correlation:
E.g.: height and weight; age and height; age and weight.
It is very difficult to obtain a perfect positive correlation in practice.
The x values are plotted on the abscissa and the y values on the ordinate.
Ex: Bivariate distribution. Here it should be noted that for every increase of 2 units on the x variable there is a corresponding increase of 1 unit on y. The points form a straight line running from the lower left of the scatter diagram to the upper right. In a perfect positive correlation, all of the points fall on a straight line. The more linear the data points, the closer the relationship between the two variables. Positive correlation: as x increases, y increases.
Perfect negative correlation:
Ex 1: As the pressure in the lung increases, its air volume decreases.
Example 2:
Notice that in this example, as the number of parasites increases, the harvest of unblemished apples decreases. In a perfect negative correlation, all of the points fall on a line with a negative slope. The more linear the data points, the more negatively correlated the two variables are. Negative correlation: as x increases, y decreases.
Moderately positive correlation: The correlation ranges from 0 to 0.8. Here the scatter lies around an imaginary line which runs from lower left to upper right.
Ex: temperature and pulse rate.
Moderately negative correlation:
Ex: age and vital capacity in adults.
Absolutely no correlation: Here the variables are not related to one another; X is completely independent of Y. Ex: height and pulse rate; height and I.Q.
Interpretation of Correlation Coefficient

Coefficient Range | Strength of Relationship
0.00 - 0.20      | Very Low
0.20 - 0.40      | Low
0.40 - 0.60      | Moderate
0.60 - 0.80      | High Moderate
0.80 - 1.00      | Very High
Pearson's correlation coefficient is also known as Karl Pearson's correlation coefficient.
Pearson's correlation coefficient is a method of measuring correlation. The method was developed by Karl Pearson and is therefore named after him.
Typically denoted by r, it is a measure of the correlation (linear dependence) between two variables X and Y, giving a value between +1 and -1.
ADVANTAGES
It is regarded as the best method of measuring correlation, because it is based on the method of covariance.
Pearson's correlation coefficient gives information about the degree of correlation as well as the direction of the correlation.
Pearson product-moment correlation (Method 1)

r = [ΣXY - (ΣX)(ΣY)/N] / √{[ΣX² - (ΣX)²/N] × [ΣY² - (ΣY)²/N]}

where
N = number of values or elements
X = 1st score, Y = 2nd score
ΣXY = sum of the products of the 1st and 2nd scores
ΣX = sum of the 1st scores
ΣY = sum of the 2nd scores
ΣX² = sum of the squared 1st scores
ΣY² = sum of the squared 2nd scores
Example 1: Knowledge scores in Tests I & II

X  | Y  | X²  | Y²  | XY
19 | 16 | 361 | 256 | 304
18 | 15 | 324 | 225 | 270
15 | 11 | 225 | 121 | 165
15 | 14 | 225 | 196 | 210
13 | 12 | 169 | 144 | 156
12 | 10 | 144 | 100 | 120
12 | 9  | 144 | 81  | 108
10 | 10 | 100 | 100 | 100
9  | 8  | 81  | 64  | 72
7  | 5  | 49  | 25  | 35

ΣX = 130, ΣY = 110, ΣX² = 1822, ΣY² = 1312, ΣXY = 1540
r = [1540 - (130)(110)/10] / √{[1822 - (130)²/10] × [1312 - (110)²/10]}
= (1540 - 1430) / √[(1822 - 1690)(1312 - 1210)]
= 110 / √(132 × 102)
= 110 / √13464
= 110 / 116.03
= 0.948
A correlation greater than 0.9 is generally described as a strong / very high positive correlation; hence it is inferred that those who obtained good scores in Test I also obtained good scores in Test II.
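The worked example can be reproduced in Python with the raw-score (Method 1) formula; the scores below reproduce the column sums quoted above (ΣX = 130, ΣY = 110):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation from the raw-score sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    num = sum_xy - sum_x * sum_y / n
    den = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return num / den

test1 = [19, 18, 15, 15, 13, 12, 12, 10, 9, 7]
test2 = [16, 15, 11, 14, 12, 10, 9, 10, 8, 5]
print(round(pearson_r(test1, test2), 3))  # 0.948
```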
Bivariate correlation coefficient: age & weight distribution
Example 2

Age X | Wt Y | X² | Y²  | XY
1     | 6    | 1  | 36  | 6
1     | 7    | 1  | 49  | 7
2     | 9    | 4  | 81  | 18
2     | 11   | 4  | 121 | 22
3     | 13   | 9  | 169 | 39
3     | 12   | 9  | 144 | 36
4     | 13   | 16 | 169 | 52
4     | 14   | 16 | 196 | 56
5     | 15   | 25 | 225 | 75
5     | 15   | 25 | 225 | 75

ΣX = 30, ΣY = 115, ΣX² = 110, ΣY² = 1415, ΣXY = 386
r = [386 - (30)(115)/10] / √{[110 - (30)²/10] × [1415 - (115)²/10]}
= (386 - 345) / √[(110 - 90)(1415 - 1322.5)]
= 41 / √(20 × 92.5)
= 41 / √1850
= 41 / 43
= 0.95
Inference: Increase in age is positively correlated with increase in weight among children below five years.
From deviations from the mean (Method 2)
Scores in English & Maths, n = 10
X  | Y  | x = X - Mx | y = Y - My | x² | y² | xy
19 | 16 | 6  | 5  | 36 | 25 | 30
18 | 15 | 5  | 4  | 25 | 16 | 20
15 | 11 | 2  | 0  | 4  | 0  | 0
15 | 14 | 2  | 3  | 4  | 9  | 6
13 | 12 | 0  | 1  | 0  | 1  | 0
12 | 10 | -1 | -1 | 1  | 1  | 1
12 | 9  | -1 | -2 | 1  | 4  | 2
10 | 10 | -3 | -1 | 9  | 1  | 3
9  | 8  | -4 | -3 | 16 | 9  | 12
7  | 5  | -6 | -6 | 36 | 36 | 36

ΣX = 130, ΣY = 110, Σx² = 132, Σy² = 102, Σxy = 110
Means: Mx = 130/10 = 13, My = 110/10 = 11

Formula: r = Σxy / √(Σx² × Σy²)

r = 110 / √(132 × 102) = 110 / √13464 = 110 / 116.03 = 0.948 ≈ 0.95
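Method 2 can be sketched the same way, computing deviations from the means first; it gives the same r as Method 1:

```python
from math import sqrt

def pearson_r_deviation(x, y):
    """Method 2: r = sum(xy) / sqrt(sum(x^2) * sum(y^2)),
    where x and y are deviations from the respective means."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(d * d for d in dx)
    sum_y2 = sum(d * d for d in dy)
    return sum_xy / sqrt(sum_x2 * sum_y2)

english = [19, 18, 15, 15, 13, 12, 12, 10, 9, 7]
maths = [16, 15, 11, 14, 12, 10, 9, 10, 8, 5]
print(round(pearson_r_deviation(english, maths), 3))  # 0.948, same as Method 1
```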
Rank Correlation Coefficients
Rank correlation is the study of relationships
between different rankings on the same set of items. A
rank correlation coefficient measures the
correspondence between two rankings and assesses its
significance.
Meaning: Spearman's rank correlation coefficient is a technique which can be used to summarize the strength and direction (negative or positive) of a relationship between two variables.
Two of the more popular rank correlation statistics are:
- Spearman's rank correlation coefficient (Spearman's ρ)
- Kendall's tau rank correlation coefficient (Kendall's τ)
An increasing rank correlation coefficient implies increasing agreement between rankings. The coefficient lies in the interval [-1, 1] and assumes the value:
- -1 if the disagreement between the two rankings is perfect; one ranking is the reverse of the other.
- 0 if the rankings are completely independent.
- 1 if the agreement between the two rankings is perfect; the two rankings are the same.
Spearman's rank correlation coefficient, or Spearman's rho, is named after Charles Spearman and often denoted by the Greek letter ρ (rho).
Create a table from your data.
Rank the two data sets. Ranking is achieved by
giving the ranking '1' to the biggest number in a
column, '2' to the second biggest value and so on.
The smallest value in the column will get the
lowest ranking. This should be done for both sets
of measurements.
Tied scores are given the mean (average) rank. For example, three tied scores might occupy three positions (fifth, sixth and seventh) in a ranking hierarchy of ten. The mean rank in this case is calculated as (5 + 6 + 7) / 3 = 6.
Find the difference in the ranks (d): This is
the difference between the ranks of the two
values on each row of the table. The rank of
the second value is subtracted from the
rank of the first.
Square the differences (d²) to remove negative values, and then sum them (Σd²).
Coefficient of correlation by the rank-difference method (Spearman's formula), n = 10
Marks in 1st test X | Marks in 2nd test Y | Rank in X | Rank in Y | d = R1 - R2 | d²
12 | 21 | 8   | 6   | 2    | 4
15 | 25 | 6.5 | 3.5 | 3    | 9
24 | 35 | 2   | 2   | 0    | 0
20 | 24 | 4.5 | 5   | -0.5 | 0.25
8  | 17 | 10  | 8   | 2    | 4
15 | 18 | 6.5 | 7   | -0.5 | 0.25
21 | 25 | 3   | 3.5 | -0.5 | 0.25
20 | 16 | 4.5 | 9.5 | -5   | 25
11 | 16 | 9   | 9.5 | -0.5 | 0.25
26 | 38 | 1   | 1   | 0    | 0

Σd² = 43
r = 1 - (6Σd²) / [n(n² - 1)]
= 1 - (6 × 43) / [10(100 - 1)]
= 1 - 258/990
= 1 - 0.26
= 0.74
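A small Python sketch of the rank-difference method, giving tied scores the mean rank as described above (with ties, the d² formula is the approximation the worked example itself uses):

```python
def ranks(values):
    """Rank '1' for the biggest value; tied scores share the mean rank."""
    order = sorted(values, reverse=True)
    return [order.index(v) + (order.count(v) + 1) / 2 for v in values]

def spearman_rho(x, y):
    """Spearman's rho via r = 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    n = len(x)
    return 1 - 6 * d2 / (n * (n * n - 1))

test1 = [12, 15, 24, 20, 8, 15, 21, 20, 11, 26]
test2 = [21, 25, 35, 24, 17, 18, 25, 16, 16, 38]
print(round(spearman_rho(test1, test2), 2))  # 0.74
```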
Practice Problems for Correlation Co-efficient:
Calculate Sample Correlation Co-efficient:
X Values Y Values
3 4
2 3
1 3
3 4
2 3
5 2
Answer:
Sample Correlation co-efficient = -0.3241.
Calculate Sample Correlation Co-efficient:
X Values Y Values
5 2
5 4
2 8
9 2
3 8
2 6
7 4
Answer:
Sample Correlation co-efficient = -0.80468.
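Both practice answers can be verified with the raw-score Pearson formula:

```python
from math import sqrt

def sample_r(x, y):
    """Sample correlation coefficient from raw-score sums."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2 = sum(a * a for a in x)
    sy2 = sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

print(round(sample_r([3, 2, 1, 3, 2, 5], [4, 3, 3, 4, 3, 2]), 4))        # -0.3241
print(round(sample_r([5, 5, 2, 9, 3, 2, 7], [2, 4, 8, 2, 8, 6, 4]), 5))  # -0.80468
```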
Normal Probability Distribution
Characteristics of normal Curve:
Properties of a normal distribution
A normal distribution is symmetric about its mean
The highest point is at its mean
The height of the curve decreases as one moves away from the mean in either direction.
It is bell-shaped: the central part of the curve is convex, and it becomes concave on both sides as it comes down.
It is a symmetrical distribution; the variable on either side of the mean is equal in number.
A normal distribution curve is unimodal (i.e., it has only one mode).
The skewness of the curve is zero.
It is asymptotic (i.e., the tails never touch the base line, theoretically).
The curve is continuous, that is, there are no gaps or
holes. For each value of X, there is a corresponding value
of Y
The curve never touches the x axis. Theoretically, no
matter how far in either direction the curve extends, it
never meets the x axis but it gets increasingly closer
The mean, median, and mode are equal and are located
at the center of the distribution
In some cases where the scores of individuals in a group deviate seriously from the average, the curves representing these distributions also deviate from the shape of a normal curve. These departures are called skewness and kurtosis.
The distribution is determined by the mean (μ) and the standard deviation (σ). The mean, μ, controls the centre and the standard deviation, σ, controls the spread.
The total area under a normal distribution curve is equal to 1.00. The area under parts of the curve lies as follows:
1. About 68.3% of the area under a normal curve is within one standard deviation (SD) of the mean.
2. About 95.5% is within two SDs.
3. About 99.7% is within three SDs.
4. About 32% lies outside the range mean ± 1 SD; within mean ± 2 SD, 95.45% of observations fall in the normal range and 4.55% fall outside these limits.
Normal distribution helps us to predict
that where cases will fall within a
distribution probabilistically.
For example, what are the odds, given the
population parameter of human height that
someone will grow to more than eight feet?
Answer: likely less than a .025 probability.
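The 68/95/99.7 areas quoted above can be checked with the error function from Python's standard math module:

```python
from math import erf, sqrt

def area_within(k):
    """Area under the standard normal curve within k SDs of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * area_within(k):.2f}%")
# within 1 SD: 68.27%, within 2 SD: 95.45%, within 3 SD: 99.73%
```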
Skew: a distribution may show positive skew (longer tail to the right) or negative skew (longer tail to the left).
Kurtosis is the degree of peakedness of
a distribution. A normal distribution is
a mesokurtic distribution. A pure
leptokurtic distribution has a higher
peak than the normal distribution and
has heavier tails. A pure platykurtic
distribution has a lower peak than a
normal distribution and lighter tails.
Parametric & Nonparametric
Methods
Parametric tests are statistical methods which depend
on the parameters of populations or probability
distributions and are referred to as parametric
methods.
Parametric Test - Key features
Sample randomly selected
Sample homogeneous
Data at ratio or interval level
Parametric tests include:
Large sample (>30) z test
Small sample (<30) t test.
ANOVA
Regression
Correlation
Nonparametric methods
1. Methods used with qualitative data, or
2. Methods used with quantitative data when no assumption can be made about the population probability distribution.
Non-Parametric Tests - Key features
Sample not homogeneous
Not normally distributed
Data at ordinal or nominal level
Nonparametric tests include:
Chi-squared test
Wilcoxon signed-rank test
Mann-Whitney test
Kruskal-Wallis test
Differences in Parametric and Non-
parametric Tests
1. Scales of Measurement
Example: a 100 kg person is twice as heavy as a person weighing 50 kg (the ratio scale), and 10°C is 5°C warmer than 5°C, but not twice as warm (the interval scale, in which the zero on the scale is not absolute but arbitrary).
2. Normal Distribution
Parametric statistics are used when the data are
normally distributed.
Example: If you measure the weights of
1,000 males and then graph the results
showing frequency of weights, you will likely
find a bell-shaped curve with most people
around the mean (average) weight at the
center of the curve, which tapers off at the
sides as frequency of extreme weights
decreases. This is called a normal
distribution.
Non-parametric statistics are used when the data are distribution-free (no particular distribution can be assumed).
3. Equal Variances
Parametric statistics are used when the variances of the two sets of data being compared are similar.
Example: The variance is a measure of the spread of values from the mean. Suppose you wish to test whether the mean weights of males and females differ, but the values for males are scattered much more widely (and therefore have a higher variance) than those for females.
Non-parametric tests, which do not assume equal variances among samples (such as the chi-square test or the Mann-Whitney U test), should be employed to compare the two sets of data.
Power of Test
Parametric tests are more powerful than non-parametric tests in drawing conclusions.
If the data violate one or more criteria of the parametric tests, then use the non-parametric equivalent; even though it is less powerful, the risk of error is lower.
Tests of significance
Basic Concepts:
1. The standard error of the mean (SEM):
The SEM is usually estimated by the sample estimate of the population standard deviation (sample standard deviation) divided by the square root of the sample size (assuming statistical independence of the values in the sample):
SEM = s / √n
2. Degrees of freedom
A single sample: df = n - 1
Two samples: df = n1 + n2 - 2
One-way ANOVA with g groups: df = g - 1 (between groups) and N - g (within groups)
3. Type I and II errors

Statistical Decision | H0 True      | H0 False
Reject H0            | Type I error | Correct
Do not reject H0     | Correct      | Type II error
4. Confidence Intervals
We can use the information we have about the standard deviation and the mean to calculate the range of values within which a sample mean would be expected to fall if it lies close to the population mean.
This range is based on the probability that the sample mean falls close to the population mean, with a probability of .95, i.e., 5% error.
5. How Confident Are You?
Are you 100% sure?
Social scientists use 95% as a threshold to test whether or not the results are a product of chance. That is, we accept a 1-in-20 chance of being wrong.
What do you MEAN?
We build a 95% confidence interval to make sure that the mean will be within that range.
6. Significance Level:
First, the difference between the results of the experiment and the null hypothesis is determined. Then, assuming the null hypothesis is true, the probability of a difference that large is computed. Finally, this probability is compared to the significance level from the table with the appropriate degrees of freedom.
If the calculated probability is less than or equal to the significance level, then the null hypothesis is rejected and the outcome is said to be statistically significant. Significance is set at the 0.05 level (sometimes called the 5% level) or the 0.01 level (1% level). The significance level can be defined as the probability of committing a Type I error.
Therefore, selecting a significance level of 0.05 denotes a 5% possibility of committing a Type I error; selecting a significance level of 0.01 denotes a 1% possibility of committing a Type I error.
The significance level is used in hypothesis testing.
7. What is the difference between a
probability value and the significance
level?
Odds ratios are widely used in medical
literature because:
They provide an estimate (with confidence
intervals) for the relationship between two
binary (yes/no) variables.
They enable us to examine the effects of other variables on
that relationship, using logistic regression.
They are useful in case-control studies.
The odds are a way of representing probability.
8. Two decision making rules of hypothesis testing
Rule one: If the p-value (calculated value) is less than or
equal to the significance level (table value) then reject the
null hypothesis and conclude that the research finding is
statistically significant.
Rule two: If the p-value is greater than the significance
level then you fail to reject the null hypothesis and
conclude that the finding is not statistically significant.
9. Two areas of statistical inference.
Estimation
Hypothesis testing
1.Estimation:
A. Types of estimation
B. Points to remember:
2. Hypothesis testing
10. Types of Statistical Hypotheses.
A) Null hypothesis.
B) Alternative hypothesis.
H0: P = 0.5
Ha: P ≠ 0.5
Steps of Hypothesis Testing
State the hypotheses. This involves stating the null and alternative
hypotheses. The hypotheses are stated in such a way that they are
mutually exclusive. That is, if one is true, the other must be false.
Formulate an analysis plan. The analysis plan describes how to use
sample data to evaluate the null hypothesis. The evaluation often
focuses on a single test statistic.
Analyze sample data. Find the value of the test statistic (mean score,
proportion, t-score, z-score, etc.) described in the analysis plan.
Interpret results. Apply the decision rule described in the analysis plan.
If the value of the test statistic is unlikely, based on the null hypothesis,
reject the null hypothesis. Describe the results with probability ( level
of significance ) Example: In a study designed to determine the effects
of primary care nursing as compared with functional team nursing on
patient satisfaction, a significant difference was found between the two
approaches to patient care. Higher rates of satisfaction were found
among patients exposed to primary care nursing ( t = 12.23, p < .05).
Parametric vs. Non-parametric Tests

                       | Parametric          | Non-parametric
Assumed distribution   | Normal              | Any
Assumed variance       | Homogeneous         | Any
Typical data           | Ratio or Interval   | Ordinal or Nominal
Data set relationships | Independent         | Any
Usual central measure  | Mean                | Median
Benefits               | Can draw more conclusions | Simplicity; less affected by outliers

Tests                            | Parametric test                     | Non-parametric test
Correlation test                 | Pearson                             | Spearman
Independent measures, 2 groups   | Independent-measures t-test         | Mann-Whitney test
Independent measures, >2 groups  | One-way, independent-measures ANOVA | Kruskal-Wallis test
Repeated measures, 2 conditions  | Matched-pair t-test                 | Wilcoxon test
Repeated measures, >2 conditions | One-way, repeated-measures ANOVA    | Friedman's test
Choosing an Appropriate Statistical Test

Goal | Measurement (from a normal distribution) | Rank, score, or measurement (from a non-normal distribution) | Binomial (e.g. heads or tails)
Describe one group | Mean, SD | Median, interquartile range | Proportion
Compare one group to a hypothetical value | One-sample t test | Wilcoxon test | Chi-square or binomial test
Compare two unpaired groups | Unpaired t test | Mann-Whitney test | Fisher's exact test (or chi-square for large samples)
Compare two paired groups | Paired t test | Wilcoxon test | McNemar's test
Compare three or more unmatched groups | One-way ANOVA | Kruskal-Wallis test | Chi-square test
Compare three or more matched groups | Repeated-measures ANOVA | Friedman test | Cochran Q test
Quantify association between two variables | Pearson correlation | Spearman correlation | Contingency coefficients
Predict value from another measured variable | Simple regression | Nonparametric regression | Simple logistic regression
Predict value from several measured or binomial variables | Multiple regression | | Multiple logistic regression
PARAMETRIC TESTS FOR HYPOTHESIS TESTING
LARGE & INDEPENDENT SAMPLES
(2 GROUPS, i.e., EXPERIMENTAL & CONTROL)
Z test
1. The significance of the difference between the means has to be calculated.

Step 1: Standard error of the difference (SED):
σD = √(σm1² + σm2²)
where σm1 = σ1/√N1 is the standard error of the mean of the first sample, and σm2 = σ2/√N2 is that of the second.
Or directly:
σD = √(σ1²/N1 + σ2²/N2)

Step 2: Compute the Z value:
Z = (m1 - m2) / σD = (difference between means) / (standard error of the difference between means)

Step 3: Test the null hypothesis at the 0.05 and 0.01 levels of significance.
LARGE & INDEPENDENT SAMPLES
Example 1: The teacher has taught Group A by the lecture-cum-demonstration method and Group B by the lecture method only. Which method is more effective?
H0: There exists no significant difference between the means of the 2 samples.

     | Group A | Group B
Mean | 43      | 30
σ    | 8       | 7
N    | 65      | 65
σD = √(σ1²/N1 + σ2²/N2) = √(8²/65 + 7²/65) = √[(64 + 49)/65] = √(113/65) = 1.32

Z = (m1 - m2)/σD = (43 - 30)/1.32 = 13/1.32 = 9.85
The computed Z value is higher than the critical value at the 5% level (1.96), so the difference is significant. We reject the null hypothesis and conclude that the lecture-cum-demonstration method is more effective than the lecture method alone.
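The two-step Z calculation can be sketched in Python; carrying full precision gives 9.86 rather than the 9.85 obtained by rounding σD to 1.32 first:

```python
from math import sqrt

def z_two_means(m1, m2, sd1, sd2, n1, n2):
    """Z for the difference between two large-sample means."""
    sed = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)  # SE of the difference
    return (m1 - m2) / sed

z = z_two_means(43, 30, 8, 7, 65, 65)
print(round(z, 2))  # compare with the critical value 1.96 at the 5% level
```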
T-test
The Student's t-test (or simply t-test) was developed by William Gosset ("Student") in 1908. The t-test is used to compare two groups and comes in at least 3 kinds. A t-test is an inferential statistical technique used to compare the means of two groups. The reporting of the results of a t-test generally includes the df, t-value, and probability level. A t-test can be one-tailed or two-tailed.
One-tailed test: Used where there is some basis (e.g. previous experimental observation) to predict the direction of the difference, e.g. an expectation of a significant difference between the groups in a particular direction. If the hypothesis of the study is directional, a one-tailed test is used.
Two-tailed test: Used where there is no basis to assume
that there may be a significant difference between the
groups - this is the test most frequently used. If the
hypothesis is non directional, a two-tailed test is used.
Why are they called "tails"?
Note that HA states 'there is a difference ....'; it does not state why there is a difference or whether the difference between the two groups is greater or less than zero. If HA had specified the nature of the difference, this would have been a one-tailed hypothesis. However, since HA does not specify the nature of the difference, we can accept either a reduction or an increase. This is therefore a two-tailed hypothesis. For a variety of reasons, two-tailed hypotheses are safer than one-tailed.
Another classification:
Paired t-test:
Unpaired t-test:
Two-sample assuming equal variances
Two-sample assuming unequal variances
1.Unpaired t-test
2.Paired t-test
Samples are independent when there are two
separate groups such as an experimental group and a
control group.
Samples are dependent when the participants from
the two groups are paired in some manner. For
example, when the same participants are assessed on a
given characteristic before and after an
intervention
SIGNIFICANCE OF DIFFERENCE BETWEEN CORRELATED (WITHIN) SAMPLES - SINGLE GROUP
Formula:
t = ΣD / √[(NΣD² - (ΣD)²)/(N - 1)], with df = n - 1

Example 2: Ten subjects were tested on an attitude scale. They were then made to read literature intended to change their attitude, and the attitude scale was re-administered. Check whether the literature brought about a change in attitude.

Null hypothesis: There is no significant difference in the attitude scores before and after reading the literature.
Initial | Final | D  | D²
10      | 11    | -1 | 1
9       | 7     | 2  | 4
9       | 8     | 1  | 1
8       | 9     | -1 | 1
8       | 6     | 2  | 4
7       | 6     | 1  | 1
7       | 8     | -1 | 1
5       | 4     | 1  | 1
4       | 3     | 1  | 1
4       | 4     | 0  | 0

N = 10, ΣD = 5, ΣD² = 15
t = 5 / √[(10 × 15 - 5²)/9] = 5 / √(125/9) = 5 / √13.88 = 5 / 3.73 = 1.34

df = n - 1 = 10 - 1 = 9
The table value is 2.26 at the 5% level with 9 df; the calculated value is lower than the table value. Hence we accept the null hypothesis and conclude that reading the literature has not produced a significant difference in the attitude scores.
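The correlated-sample formula above can be sketched in Python using the attitude-scale data:

```python
from math import sqrt

def paired_t(before, after):
    """t = sum(D) / sqrt((N*sum(D^2) - (sum(D))^2) / (N - 1)), D = before - after."""
    d = [b - a for b, a in zip(before, after)]
    n = len(d)
    sum_d = sum(d)
    sum_d2 = sum(v * v for v in d)
    return sum_d / sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))

initial = [10, 9, 9, 8, 8, 7, 7, 5, 4, 4]
final = [11, 7, 8, 9, 6, 6, 8, 4, 3, 4]
t = paired_t(initial, final)
print(round(t, 2))  # 1.34; df = 9, table value 2.26 at the 5% level
```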
t TEST - SMALL & INDEPENDENT SAMPLES
(2 GROUPS, i.e., EXPERIMENTAL & CONTROL)
Example 3: Two groups of 10 students each got the following scores on an attitude scale. Find the significance of the difference between means.
Group I: 10, 9, 8, 7, 7, 8, 6, 5, 6, 4
Group II: 9, 8, 6, 7, 8, 8, 11, 12, 6, 5
df = N1 + N2 - 2 = 10 + 10 - 2 = 18
Null hypothesis: There is no significant difference in the attitude scores between the two groups of 10 students.
Group I                          | Group II
x1 | m1 | x1 - m1 | (x1 - m1)²   | x2 | m2 | x2 - m2 | (x2 - m2)²
10 | 7  | 3       | 9            | 9  | 8  | 1       | 1
9  | 7  | 2       | 4            | 8  | 8  | 0       | 0
8  | 7  | 1       | 1            | 6  | 8  | -2      | 4
7  | 7  | 0       | 0            | 7  | 8  | -1      | 1
7  | 7  | 0       | 0            | 8  | 8  | 0       | 0
8  | 7  | 1       | 1            | 8  | 8  | 0       | 0
6  | 7  | -1      | 1            | 11 | 8  | 3       | 9
5  | 7  | -2      | 4            | 12 | 8  | 4       | 16
6  | 7  | -1      | 1            | 6  | 8  | -2      | 4
4  | 7  | -3      | 9            | 5  | 8  | -3      | 9

Totals: ΣX1 = 70, Σx1² = 30; ΣX2 = 80, Σx2² = 44
m1 = 70/10 = 7; m2 = 80/10 = 8
Pooled σ = √[(Σx1² + Σx2²)/((N1 - 1) + (N2 - 1))] = √[(30 + 44)/18] = √(74/18) = √4.111 = 2.03

SED, σD = pooled σ × √(1/N1 + 1/N2) = 2.03 × √(1/10 + 1/10) = 2.03 × √(1/5) = 0.908

t = (m1 - m2)/σD = (7 - 8)/0.908 = -1/0.908 = -1.1

At 18 df, the 5% t value = 2.10.
The table value is 2.10 at the 5% level with 18 df; the calculated value is lower than the table value. Hence we accept the null hypothesis and conclude that there is no significant difference in the attitude scores between the two groups of 10 students.
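The pooled-variance t for Example 3 can be sketched in Python; the sign is negative because m1 < m2, and it is the magnitude that is compared with the table value:

```python
from math import sqrt

def unpaired_t(g1, g2):
    """Pooled-variance t for two independent samples."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = sum(g1) / n1, sum(g2) / n2
    ss1 = sum((v - m1) ** 2 for v in g1)   # sum of squared deviations, group 1
    ss2 = sum((v - m2) ** 2 for v in g2)   # sum of squared deviations, group 2
    pooled_sd = sqrt((ss1 + ss2) / (n1 + n2 - 2))
    sed = pooled_sd * sqrt(1 / n1 + 1 / n2)
    return (m1 - m2) / sed

group1 = [10, 9, 8, 7, 7, 8, 6, 5, 6, 4]
group2 = [9, 8, 6, 7, 8, 8, 11, 12, 6, 5]
t = unpaired_t(group1, group2)
print(round(t, 1))  # -1.1; df = 18, table value 2.10 at the 5% level
```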
SIGNIFICANCE OF DIFFERENCE BETWEEN 2 MEANS FOR 2 SMALL BUT INDEPENDENT SAMPLES

First calculate the pooled SD. For small samples we calculate a single SD, called the pooled SD, for further calculation of the SED (σD).

1) Pooled SD = √[(Σx1² + Σx2²) / ((N1 - 1) + (N2 - 1))]
2) Calculate σD = pooled SD × √(1/N1 + 1/N2)
3) Calculate the t value: t = (m1 - m2) / σD
4) Test the null hypothesis H0 at a pre-established level of significance, df = (N1 + N2 - 2), at the 5% or 1% level.
5) Compare the t values.
Example 4: A language teacher divides the class into 2 groups. The experimental group was given 2 hours of daily reading of newspapers & magazines; the control group was not. After 6 months both groups were given a vocabulary test. The scores obtained are:
Experimental Group: 115, 112, 109, 112, 137
Control Group: 110, 112, 95, 105, 111, 97, 112, 102
Null hypothesis: There is no significant difference in the vocabulary test between the experimental group, who had 2 hours of daily reading of newspapers & magazines for 6 months, and the control group.
Group I                             | Group II
x1  | m1  | x1 - m1 | (x1 - m1)²   | x2  | m2    | x2 - m2 | (x2 - m2)²
115 | 117 | -2      | 4            | 110 | 105.5 | 4.5     | 20.25
112 | 117 | -5      | 25           | 112 | 105.5 | 6.5     | 42.25
109 | 117 | -8      | 64           | 95  | 105.5 | -10.5   | 110.25
112 | 117 | -5      | 25           | 105 | 105.5 | -0.5    | 0.25
137 | 117 | 20      | 400          | 111 | 105.5 | 5.5     | 30.25
-   | -   | -       | -            | 97  | 105.5 | -8.5    | 72.25
-   | -   | -       | -            | 112 | 105.5 | 6.5     | 42.25
-   | -   | -       | -            | 102 | 105.5 | -3.5    | 12.25

ΣX1 = 585, Σx1² = 518; ΣX2 = 844, Σx2² = 330
m1 = 585/5 = 117; m2 = 844/8 = 105.5
Pooled SD = √[(Σx1² + Σx2²)/((N1 - 1) + (N2 - 1))] = √[(518 + 330)/((5 - 1) + (8 - 1))] = √(848/11) = √77.1 = 8.78

σD = 8.78 × √(1/5 + 1/8) = 8.78 × √(0.200 + 0.125) = 8.78 × √0.325 = 8.78 × 0.57 = 5.0

t = (m1 - m2)/σD = (117 - 105.5)/5.0 = 11.5/5.0 = 2.3

df = N1 + N2 - 2 = 5 + 8 - 2 = 11; the critical value at 5% is 1.80.

Inference: The computed value is higher than the critical value, so it is significant; we reject the null hypothesis H0 and accept that the 1st method is effective in increasing the students' vocabulary.
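Example 4 can be verified with the same pooled-variance sketch used for Example 3 (group sizes need not be equal):

```python
from math import sqrt

def unpaired_t_small(g1, g2):
    """Pooled-variance t for two small independent samples of unequal size."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = sum(g1) / n1, sum(g2) / n2
    pooled_sd = sqrt((sum((v - m1) ** 2 for v in g1) +
                      sum((v - m2) ** 2 for v in g2)) / (n1 + n2 - 2))
    sed = pooled_sd * sqrt(1 / n1 + 1 / n2)
    return (m1 - m2) / sed

experimental = [115, 112, 109, 112, 137]
control = [110, 112, 95, 105, 111, 97, 112, 102]
t = unpaired_t_small(experimental, control)
print(round(t, 1))  # about 2.3; df = 11, critical value 1.80 at the 5% level
```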
SIGNIFICANCE OF DIFFERENCE BETWEEN CORRELATED SAMPLES, OR WITHIN THE SAME GROUP (PRE & POST TEST)
Example 5: A teacher of mathematics gave a test in multiplication to the 30 students of his class. He then induced a state of anxiety among them and the achievement test was re-administered. The correlation between the two sets of scores is r = 0.82.
Initial test: m1 = 70, σ1 = 6; Final test: m2 = 67, σ2 = 5.8

σD = √(σm1² + σm2² - 2r·σm1·σm2)
σm1 = σ1/√N = 6/√30 = 1.09
σm2 = σ2/√N = 5.8/√30 = 1.06
σD = √[(1.09)² + (1.06)² - 2(0.82)(1.09)(1.06)] = √(1.19 + 1.12 - 1.89) = √0.42 = 0.648

Z = (m1 - m2)/σD = (70 - 67)/0.648 = 3/0.648 = 4.63

Since 4.63 exceeds 1.96, the difference is significant at the 5% level.
Example 6: A random sample of 10 boys had the following IQs: 70, 120, 110, 111, 88, 83, 95, 98, 107, 100. Do these data support the assumption of a population mean IQ of 100?
A t test is done to test the difference between a sample mean and a population mean. It is worked out as under:
x   | (x - x̄) | (x - x̄)²
70  | -28.2   | 795.24
120 | 21.8    | 475.24
110 | 11.8    | 139.24
111 | 12.8    | 163.84
88  | -10.2   | 104.04
83  | -15.2   | 231.04
95  | -3.2    | 10.24
98  | -0.2    | 0.04
107 | 8.8     | 77.44
100 | 1.8     | 3.24

Σx = 982, x̄ = 98.2; Σ(x - x̄)² = 1999.6
Sample SD, S = √[Σ(x - x̄)²/(n - 1)] = √(1999.6/9) = 14.91

t = (x̄ - μ)/(S/√n) = (98.2 - 100)/(14.91/√10) = -1.8/4.715 = -0.3818

df = n - 1 = 9. The table value at the 5% level with 9 df is 2.26; since |t| = 0.38 is lower, the null hypothesis is accepted: the data are consistent with a population mean IQ of 100.
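A sketch of the one-sample t test for the IQ data (t ≈ -0.38, matching the worked value up to rounding):

```python
from math import sqrt

def one_sample_t(values, mu):
    """t = (mean - mu) / (S / sqrt(n)), with the sample SD (n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    s = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return (mean - mu) / (s / sqrt(n))

iq = [70, 120, 110, 111, 88, 83, 95, 98, 107, 100]
t = one_sample_t(iq, 100)
print(round(t, 4))  # df = 9, table value 2.26 at the 5% level
```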
For two small samples, the pooled SD combines the two sample variances:
Sample SD (variance S²): S = √[Σ(x - x̄)²/(n - 1)]
Pooled SD: S = √{[(n1 - 1)s1² + (n2 - 1)s2²]/(n1 + n2 - 2)}
t = (x̄1 - x̄2) / [S × √(1/n1 + 1/n2)]

Worked example: two independent samples with means 28 and 30, n1 = 12, n2 = 15, pooled S = 8.78:
t = (30 - 28)/[8.78 × √(1/12 + 1/15)] = 2/3.4 = 0.5882
df = n1 + n2 - 2 = 25. The t table value for df = 25 at P = 0.05 is 2.060. The calculated t is less than the table value for df = 25, P = 0.05: not significant. Accept the null hypothesis.
Example 8: A drug given to 12 volunteers showed the following differences in systolic BP. Can you conclude that the drug, in general, is accompanied by an increase in systolic BP?
For paired observations, a paired t test is done. The data are re-tabulated to facilitate the calculation of d̄ and s (SD).
Volunteer   | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12
Before drug | 120 | 112 | 110 | 120 | 106 | 110 | 110 | 114 | 120 | 116 | 104 | 98
After drug  | 125 | 114 | 118 | 119 | 109 | 110 | 108 | 115 | 125 | 120 | 110 | 98

Before X1 | After X2 | d  | (d - d̄) | (d - d̄)²
120       | 125      | 5  | 2.42    | 5.8564
112       | 114      | 2  | -0.58   | 0.3364
110       | 118      | 8  | 5.42    | 29.3764
120       | 119      | -1 | -3.58   | 12.8164
106       | 109      | 3  | 0.42    | 0.1764
110       | 110      | 0  | -2.58   | 6.6564
110       | 108      | -2 | -4.58   | 20.9764
114       | 115      | 1  | -1.58   | 2.4964
120       | 125      | 5  | 2.42    | 5.8564
116       | 120      | 4  | 1.42    | 2.0164
104       | 110      | 6  | 3.42    | 11.6964
98        | 98       | 0  | -2.58   | 6.6564

Σd = 31, d̄ = 31/12 = 2.5833; Σ(d - d̄)² = 104.9068
SD of the sample, S = √[Σ(d - d̄)²/(n - 1)] = √(104.9068/11) = 3.0882

t = d̄/(S/√n) = 2.5833/(3.0882/√12) = 2.5833/0.8914 = 2.8980
df = n - 1 = 11. The table value at df = 11, P = 0.05 is 2.201.
The calculated t is more than the table value at df = 11 with P = 0.05.
Hence we reject the null hypothesis: the drug has a definite influence on systolic blood pressure.
One-way Analysis of Variance
(ANOVA)
Frequently in the study of nursing practice, more than two
means are of interest when assessing an independent variable.
Example: Nurse investigators may want to compare three different patient groups (critical care patients, ambulatory inpatients, and outpatients, i.e., subgroups of one independent variable, type of patient) in terms of their level of satisfaction with patient care.
One-way analysis of variance (ANOVA) is an extension of the t-
test that permits the investigator to simultaneously compare
more than two means.
ANOVA, unlike the t-test, uses variances to calculate a value that
reflects the differences among three or more means.
In this test an F statistic or ratio is calculated.
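A minimal sketch of the F ratio, using hypothetical satisfaction scores for the three patient groups mentioned above (the data are invented for illustration):

```python
def one_way_anova_f(*groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    g, n = len(groups), len(all_values)
    ss_between = sum(len(grp) * (sum(grp) / len(grp) - grand_mean) ** 2
                     for grp in groups)
    ss_within = sum(sum((v - sum(grp) / len(grp)) ** 2 for v in grp)
                    for grp in groups)
    df_between, df_within = g - 1, n - g
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical satisfaction scores for three patient groups
critical = [4, 5, 3, 4]
ambulatory = [6, 7, 6, 5]
outpatient = [8, 7, 9, 8]
print(round(one_way_anova_f(critical, ambulatory, outpatient), 2))
```

A large F relative to the table value (here with df = 2 and 9) indicates that at least one group mean differs from the others.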
Analysis of Covariance (ANCOVA)
ANCOVA is an inferential statistical test that enables
investigators to adjust statistically for group differences that may
interfere with obtaining results that relate specifically to the
effects of the independent variable(s) on the dependent
variable(s). Usually there are two, three, or four factors
(independent variables) and a number of levels within each
variable (usually no more than ten).
Example: If there were three modes of delivering care
(primary nursing, functional team nursing, and modified primary
nursing) and both males and females were to be assessed, there
would be two independent variables (mode of delivering care
and sex) and one dependent variable (patient satisfaction).
Multivariable Analysis: Multivariate analysis refers to a
group of inferential statistical tests that enable the
investigator to examine multiple variables simultaneously.
Unlike other inferential statistical techniques, these tests
permit the investigator to examine several dependent or
independent variables simultaneously.
Example: A group of nurse investigators designed a study
to examine the effect of two forms of relaxation therapy on
levels of depression and anxiety among male and female
spinal-cord-injured young adults (paraplegics). Data
collected in relation to the two independent variables
(relaxation therapy, with two groups, and gender) and the
dependent variables (level of depression and level of
anxiety) were analyzed using a multivariate test.
The Chi-Square Test (χ²)
The chi-square (χ²) test can be used to evaluate a relationship between two
nominal or ordinal variables. It is one example of a non-parametric test. The
association between two events in a binomial or multinomial setting is tested by
the χ² test. Chi-square tests can only be used on actual numbers and not on
percentages, proportions, means, etc. The chi-square statistic compares the
tallies or counts of categorical responses between two (or more) independent
groups or in a single group.
Examples:
Smoking and cancer
Treatment and outcome of disease
Age and knowledge score
Social class and disease prevalence
Cholesterol and CAD
Weight and diabetes mellitus
Blood pressure and heart disease
There are two possibilities: either the two variables are
independent (no association) or dependent (associated with
each other). This test can be used even with a multinomial sample.
I. Incidence of filariasis and social class (very rich,
middle and poor).
II. Parity of the mother and weight of the baby (1st,
2nd, 3rd, 4th parity).
III. State of nutrition and IQ (<60%, 61-80%, 81-100%).
Death and survival among control and experimental
groups, e.g. drug versus placebo:

Groups                 Died   Survived   Total
Control on placebo       10         25      35
Experimental on drug      5         60      65
Total                    15         85     100
Here we are examining the association between two
classifications of events (died/survived,
control/experimental), so this is called a 2x2
contingency table, or four-cell table.
The test can be calculated even when there are more
than two cells or classes of events (multinomial).
E.g.: Social class and leprosy

Social class   Leprosy positive   Leprosy negative   Total
Higher                        4                 76      80
Middle                       20                180     200
Low                          60                440     500
Total                        84                696     780
Uses:
The chi-square test is best used when the sample is not
normally distributed, and even when the sample size is small,
since it is not possible to calculate significance for very small
samples using parametric tests like the t and z tests.
Example: 100 boys and 60 girls were asked to select one of
five subjects. Is the choice of subject dependent upon the
sex of the student?
Chi Square Goodness of Fit (One
Sample Test)
This test allows us to compare a collection of
categorical data with some theoretical expected
distribution. It is often used in genetics to compare the
results of a cross with the theoretically expected ratios.
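A goodness-of-fit computation can be sketched in plain Python. The observed counts below are Mendel's classic dihybrid pea data, tested against the theoretical 9:3:3:1 ratio:

```python
# Chi-square goodness of fit against a theoretical ratio (pure Python).
# Observed: Mendel's dihybrid counts (round-yellow, wrinkled-yellow,
# round-green, wrinkled-green); expected ratio is 9:3:3:1.
observed = [315, 101, 108, 32]
ratio = [9, 3, 3, 1]

n = sum(observed)
expected = [n * r / sum(ratio) for r in ratio]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1            # categories minus one
print(round(chi2, 3), df)
```

The very small χ² (about 0.47 on 3 df) indicates the observed counts fit the theoretical ratio closely.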
Example:
The opinions of 90 unmarried people and 100 married
people on child marriage were collected on an
attitude scale. Do the data indicate a significant
difference in opinion in terms of marital status?
1. Establish hypotheses
Null hypothesis: There is no difference of opinion on
attitudes about child marriage between married and
unmarried persons.
Marital status   Agree       Disagree    No opinion   Total
Unmarried        14 (19.4)   66 (62.5)   10 (8)          90
Married          27 (21.6)   66 (69.5)    7 (9)         100
Total            41          132         17             190

(Expected frequencies are shown in parentheses.)
2. Calculate the expected value for each cell of the
table (row and column):

Expected frequency = (Row total × Column total) / Total frequencies

i)   (90 × 41)/190 = 19.4      ii)  (100 × 41)/190 = 21.6
iii) (90 × 132)/190 = 62.5     iv)  (100 × 132)/190 = 69.5
v)   (90 × 17)/190 ≈ 8         vi)  (100 × 17)/190 ≈ 9
3. Calculate the chi-square statistic:

χ² = Σ (fo − fe)² / fe

fo    fe     fo−fe   (fo−fe)²   (fo−fe)²/fe
14    19.4    −5.4      29.16         1.50
66    62.5     3.5      12.25         0.196
10     8.0     2.0       4.00         0.50
27    21.6     5.4      29.16         1.35
66    69.5    −3.5      12.25         0.176
 7     9.0    −2.0       4.00         0.44
Total 190    190

χ² ≈ 4.16
4. CALCULATE DEGREES OF FREEDOM
The formula is df = (rows − 1) × (columns − 1).
In this example, df = (2 − 1) × (3 − 1) = 2.
5. Check the table (critical) values at the 0.05 and 0.01
levels of significance:
The computed value must be compared with the table value.
Critical values of χ² from the table (df = 2):
0.05 level = 5.99
0.01 level = 9.210
6. Make the inference
The computed χ² value (≈ 4.16) is lower than the critical
values at both levels of significance. When the computed
value is less than the critical value, the result is not
significant. So we accept the null hypothesis: there is no
significant difference in attitudes about child marriage
between married and unmarried persons.
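The six-cell worked example above can be checked in a few lines of plain Python. The expected frequencies here are kept exact rather than rounded, so the total differs very slightly from the hand calculation:

```python
# Chi-square test of independence for the marital-status table above.
observed = [
    [14, 66, 10],   # unmarried: agree, disagree, no opinion
    [27, 66, 7],    # married
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n   # expected frequency
        chi2 += (fo - fe) ** 2 / fe

df = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi2, 3), df)
```

The statistic (about 4.14 on 2 df) stays below the 0.05 critical value of 5.99, matching the inference drawn in the text.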
2 x 2 Contingency Table: Chi-square (χ²) Formula

Table: General notation for a 2 x 2 contingency table.

Variable 1    Data type 1   Data type 2   Totals
Category 1    a             b             a + b
Category 2    c             d             c + d
Total         a + c         b + d         a + b + c + d = N

For a 2 x 2 contingency table the chi-square statistic is
calculated by the formula:

χ² = N(AD − BC)² / [(A+B)(C+D)(A+C)(B+D)]

Note: the four components of the denominator are the four
totals from the table columns and rows.
Suppose you conducted a drug trial on a group of
animals and you hypothesized that the animals
receiving the drug would survive better than those
that did not receive the drug. You conduct the study
and collect the following data:
Ho: The survival of the animals is independent of
drug treatment.
Ha: The survival of the animals is associated with drug
treatment.
Table . Number of animals that survived a
treatment.
Dead Alive Total
Treated 36 14 50
Not treated 30 25 55
Total 66 39 105
Applying the formula above we get:

χ² = 105 × [(36)(25) − (14)(30)]² / [(50)(55)(39)(66)] = 3.418
Before we can proceed we need to know how many
degrees of freedom we have. When a comparison is
made between one sample and another, a simple rule
is that the degrees of freedom equal (number of
columns minus one) x (number of rows minus one)
not counting the totals for rows or columns. For our
data this gives (2-1) x (2-1) = 1.
We now have our chi-square statistic (χ² = 3.418), our
predetermined alpha level of significance (0.05), and
our degrees of freedom (df = 1). Since our χ² statistic
(3.418) did not exceed the critical value for the 0.05
probability level (3.841), we accept the null
hypothesis that the survival of the animals is
independent of drug treatment (i.e. the drug had no
effect on survival).
Probability level (alpha)
Df 0.5 0.10 0.05 0.02 0.01 0.001
1 0.455 2.706 3.841 5.412 6.635 10.827
2 1.386 4.605 5.991 7.824 9.210 13.815
3 2.366 6.251 7.815 9.837 11.345 16.268
4 3.357 7.779 9.488 11.668 13.277 18.465
5 4.351 9.236 11.070 13.388 15.086 20.517
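The 2 x 2 shortcut formula applied to the animal-survival table above takes only a couple of lines:

```python
# 2x2 shortcut: chi2 = N(AD - BC)^2 / [(A+B)(C+D)(A+C)(B+D)],
# applied to the animal-survival table above.
a, b = 36, 14     # treated: dead, alive
c, d = 30, 25     # not treated: dead, alive

n = a + b + c + d
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
print(round(chi2, 3))   # -> 3.418, matching the worked example
```

Since 3.418 < 3.841 (the df = 1, 0.05 critical value in the table above), the null hypothesis stands.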
300 cases of typhoid were admitted to a hospital in one
year. 150 cases were given ciprofloxacin and 150 cases
were given chloramphenicol. Which drug has the better
cure rate?
Null hypothesis: There is no significant difference in cure
rates between the two drugs.
Drugs             Cured     Not cured   Total
Ciprofloxacin     143 (A)     7 (B)       150
Chloramphenicol   137 (C)    13 (D)       150
Total             280        20           300
χ² = N(AD − BC)² / [(A+B)(C+D)(A+C)(B+D)]
   = 300 × (143 × 13 − 7 × 137)² / (150 × 150 × 280 × 20)
   = 300 × 810000 / 126000000
   = 243000000 / 126000000 = 1.928
df = (r-1) (c -1) = (2-1) (2-1) = 1
At the 0.05 level of significance the critical value is 3.84,
and the computed value (1.928) is less, so the difference is
not significant. Accept the null hypothesis.
The mothers of 200 adolescents (some of whom were
graduates and others non-graduates) were asked whether
they agreed or disagreed with certain aspects of
adolescent behavior.
Null hypothesis: The attitudes of the mothers are independent
of their being graduates or non-graduates.
                       Agree    Disagree   Total
Graduate mothers       38 (A)   12 (B)        50
Non-graduate mothers   84 (C)   66 (D)       150
Total                  122      78        N = 200
χ² = N(AD − BC)² / [(A+B)(C+D)(A+C)(B+D)]
   = 200 × (38 × 66 − 84 × 12)² / (50 × 150 × 122 × 78)
   = 200 × (2508 − 1008)² / 71370000
   = 200 × (1500)² / 71370000
   = 200 × 2250000 / 71370000
   = 450000000 / 71370000 = 6.305
Df = (r-1) (c-1)
= (2-1) (2-1) = 1
The table value at the 0.05 level of significance is 3.841.
The computed value is higher than the
critical table value at the 0.05 level of
significance, so the difference is significant.
We reject the null hypothesis and conclude that
attitudes are influenced by the educational
status of the mother.
YATES CORRECTION
Note: If any one of the cell frequencies in a 2x2
contingency table is less than 5, we use Yates'
correction in the formula for calculating the chi-
square test statistic. The corrected formula is:

χ² = N(|ad − bc| − N/2)² / [(a+b)(c+d)(a+c)(b+d)]
Example:
The following data were obtained in an investigation of the
effect of vaccination on smallpox.

                           Vaccinated   Non-vaccinated   Total
Attacked by smallpox       3 (a)        12 (b)              15
Not attacked by smallpox   8 (c)         5 (d)              13
Total                      11 (a+c)     17 (b+d)         N = 28

Examine whether vaccination is effective in preventing
smallpox.
Sol. Here, we want to test:
H0: There is no association between attack of smallpox
and vaccination (i.e. vaccination is not effective).
H1: There is an association between attack of smallpox
and vaccination (i.e. vaccination is effective).
For testing the association we can use the χ² test. Here
a = 3, b = 12, c = 8, d = 5. Since the first cell frequency
(a = 3) is less than 5, we use Yates' correction.
The table value of χ² at the 5% level with 1 df is 3.84.

χ² = N(|ad − bc| − N/2)² / [(a+b)(c+d)(a+c)(b+d)]
   = 28 × (|3 × 5 − 12 × 8| − 14)² / [(3+12)(8+5)(3+8)(12+5)]
   = 28 × (|15 − 96| − 14)² / (15 × 13 × 11 × 17)
   = 28 × (81 − 14)² / 36465
   = 28 × 4489 / 36465
   = 3.44
Inference: The calculated value is less than the table
value of χ² at the 5% level with 1 df, hence we accept
the null hypothesis. So vaccination is not shown to be
effective against attack of smallpox.
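The Yates-corrected statistic for the smallpox table can be reproduced directly:

```python
# Yates-corrected chi-square for a 2x2 table with a small cell frequency,
# applied to the smallpox/vaccination table above.
a, b = 3, 12      # attacked by smallpox: vaccinated, non-vaccinated
c, d = 8, 5       # not attacked: vaccinated, non-vaccinated

n = a + b + c + d
chi2 = n * (abs(a * d - b * c) - n / 2) ** 2 / (
    (a + b) * (c + d) * (a + c) * (b + d))
print(round(chi2, 2))   # 3.45 -- the slide truncates this to 3.44
```

Either way the value stays below 3.84, so the conclusion (accept H0) is unchanged.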
Regression:
After having understood the correlation between two
variables it is necessary to estimate or predict the
value of one character (variable say Y) from the known
value of the other character (variable say X) such as to
estimate height when weight is known. This is
possible when the two (variables) are linearly
correlated. The variable (Y, i.e. height) to be estimated
is called dependent variable and the variable (X, i.e.
weight) which is known, is called independent
variable. This is done by means of regression line or
equation.
Regression is the measure of the average relationship
between two or more variables in terms of the original units
of the data. The prediction or estimation of the most likely
values of one variable for specified values of the other is done
by using suitable equations involving the two variables. Such
equations are known as regression equations.
In linear regression the relationship between the two
variables X and Y is linear (i.e., straight line of the type X = a
+ bY (or) Y = a + bX). In order to estimate the best average
values of the two variables two regression equations are
required. One equation is used for estimating the value of X
variable for a given value of Y variable and the second
equation is used for estimating the value of Y variable for a
given value of X variable. Therefore, the two lines of
regression are:
(i) Regression equation of X on Y:  X − X̄ = r(σx/σy)(Y − Ȳ)
(ii) Regression equation of Y on X: Y − Ȳ = r(σy/σx)(X − X̄)

Where X = value of X; X̄ = mean of the X values
Y = value of Y; Ȳ = mean of the Y values
σx = standard deviation of the X values (series)
σy = standard deviation of the Y values (series)
r = correlation coefficient between X and Y
Regression Coefficients:
The regression coefficient of Y on X is denoted by byx and the
regression coefficient of X on Y is denoted by bxy.
These are found by any of the following three formulae.
(i) If the correlation coefficient r is already calculated, the
regression coefficients are derived as:

byx = r(σy/σx)   and   bxy = r(σx/σy)

(ii) If the means are already calculated, the regression
coefficients are derived by the least-squares method:

byx = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)²
bxy = Σ(X − X̄)(Y − Ȳ) / Σ(Y − Ȳ)²
(iii) If the means are not calculated, a simple and
direct method is used to find the regression
coefficients:

byx = [Σxy − (Σx)(Σy)/n] / [Σx² − (Σx)²/n]
bxy = [Σxy − (Σx)(Σy)/n] / [Σy² − (Σy)²/n]
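The direct method translates into a short Python sketch; the x and y values below are small illustrative numbers, not data from the text:

```python
# Regression coefficients by the direct method (no means required).
# x and y are illustrative values chosen for easy arithmetic.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
sxx = sum(a * a for a in x) - sum(x) ** 2 / n
syy = sum(b * b for b in y) - sum(y) ** 2 / n

b_yx = sxy / sxx    # regression coefficient of Y on X
b_xy = sxy / syy    # regression coefficient of X on Y
print(b_yx, b_xy)   # -> 0.6 1.0
```

Note that byx × bxy = r², which gives a quick consistency check on the two coefficients.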
EXAMPLE 1
The following results were obtained for the height and weight
of 1000 students:
Ȳ = 170 cm; X̄ = 60 kg; r = 0.6; σy = 6.5 cm; σx = 5 kg.
Anil weighs 45 kg. Sunil is 165 cm tall. Estimate the height of
Anil from his weight and the weight of Sunil from his
height.
SOLUTION
Here, Height = Y and Weight = X:
Ȳ = 170 cm, X̄ = 60 kg, r = 0.6, σy = 6.5 cm, σx = 5 kg
(i) The regression equation of Y on X is:

Y − Ȳ = r(σy/σx)(X − X̄)
Y − 170 = 0.6 × (6.5/5)(X − 60)
Y − 170 = 0.78(X − 60)
Y = 0.78X − 46.8 + 170
Y = 0.78X + 123.2
When Anil's weight X = 45 kg, his height Y will be
Y = 0.78 × 45 + 123.2 = 35.1 + 123.2 = 158.3 cm.
Required height of Anil = 158.3 cm.
(ii) The regression equation of X on Y is:

X − X̄ = r(σx/σy)(Y − Ȳ)
X − 60 = 0.6 × (5/6.5)(Y − 170)
X − 60 = 0.46(Y − 170)
X − 60 = 0.46Y − 78.2
X = 0.46Y − 18.2
When Sunil's height Y = 165 cm, his weight X will be
X = 0.46 × 165 − 18.2 = 75.9 − 18.2 = 57.7 kg.
Required weight of Sunil = 57.7 kg.
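Both predictions from Example 1 can be reproduced directly from the summary statistics given in the text:

```python
# Predicting with the two regression lines from Example 1:
# Y = height (cm), X = weight (kg); means, SDs and r from the text.
y_bar, x_bar = 170, 60
sigma_y, sigma_x = 6.5, 5
r = 0.6

b_yx = r * sigma_y / sigma_x          # 0.78
b_xy = r * sigma_x / sigma_y          # about 0.46

height_anil = y_bar + b_yx * (45 - x_bar)    # Y on X, at X = 45 kg
weight_sunil = x_bar + b_xy * (165 - y_bar)  # X on Y, at Y = 165 cm
print(round(height_anil, 1), round(weight_sunil, 1))  # -> 158.3 57.7
```

Note that each line passes through the point of means (X̄, Ȳ), which is why the prediction can be written as mean plus coefficient times deviation.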
Mann-Whitney U Test
A nonparametric test, alternative to the two-sample t-test.
The actual measurements are not used; the ranks of the
measurements are used instead.
Data can be ranked from highest to lowest or
lowest to highest values.
Calculate the Mann-Whitney U statistic:

U = n1·n2 + n1(n1 + 1)/2 − R1
Example of Mann-Whitney U test
Two tailed null hypothesis that there is no
difference between the heights of male and
female students
Ho: Male and female students are the same
height
HA: Male and female students are not the
same height
Heights of males (cm)   Rank      Heights of females (cm)   Rank
193                     1         175                       7
188                     2         173                       8
185                     3         168                       10
183                     4         165                       11
180                     5         163                       12
178                     6
170                     9

n1 = 7, R1 = 30                   n2 = 5, R2 = 48
U = n1·n2 + n1(n1 + 1)/2 − R1
U = (7)(5) + (7)(8)/2 − 30
U = 35 + 28 − 30
U = 33
U′ = n1·n2 − U = (7)(5) − 33 = 2
Critical value: U(0.05(2),7,5) = U(0.05(2),5,7) = 30
As 33 > 30, Ho is rejected.
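The ranking and the two U statistics for the height example can be checked in plain Python (these data have no ties, so simple sequential ranks suffice):

```python
# Mann-Whitney U for the height example above (no tied values).
males   = [193, 188, 185, 183, 180, 178, 170]
females = [175, 173, 168, 165, 163]

# Rank the pooled values from highest (rank 1) to lowest, as in the table.
pooled = sorted(males + females, reverse=True)
rank = {v: i + 1 for i, v in enumerate(pooled)}

n1, n2 = len(males), len(females)
r1 = sum(rank[v] for v in males)            # R1 = 30
u = n1 * n2 + n1 * (n1 + 1) // 2 - r1       # U  = 33
u_prime = n1 * n2 - u                       # U' = 2
print(r1, u, u_prime)
```

With ties present, tied observations would instead share the average of the ranks they span, so this simple dictionary approach applies only to tie-free data like these.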
Research Designs to Appropriate Statistical
Analyses
-----------------------------------------
DESIGN STATISTICAL TEST
-----------------------------------------
EXPERIMENTAL DESIGN
1. Basic two-group design 1. a. t-test - independent means
(Interval or ratio data)
b. Mann-Whitney U test
(Ordinal data)
c. Chi-square (nominal data)
2. Pre-test and post-test 2. a. t-test - dependent
Design. (non-independent) means
(Interval)
b. Wilcoxon or Sign test
(Ordinal)
c. McNemar test (Nominal)
4. Covariance, or repeated 4. a. Repeated measures analysis
measures design. of variance OR Analysis of
Co-variance (Interval)
b. Friedman's AOV by ranks
(Ordinal)
c. Cochran's Q (Nominal)
5. Three or more groups 5. a. Analysis of variance
Design (Interval)
b. Kruskal-Wallis (Ordinal)
c. Chi-square test
Independent groups (Nominal)
DESCRIPTIVE RESEARCH
6. one-group sample from a 6. a. One-group t-test (Interval)
Known population. b. Kolmogorov-Smirnov test for
Goodness-of-fit (Ordinal)
c. Chi-square goodness-of-fit
Test (Nominal)
Summary of Statistical Tests
t -test for independent means
t -test for dependent means
One group t-test
Analysis of variance (ANOVA)
Repeated measures analysis of
variance (RAOV)
Analysis of covariance (ANCOVA)
Categorical I.V and D.V
a. Chi-square
b. Cochran's Q
c. McNemar test
d. Lambda beta
When data are scores (ordinal measurement) use
these methods. These can be used with interval data as
well by converting the interval data to ranks.
Spearman's rank order correlation
Kendall's Tau
Kolmogorov-Smirnov test
Mann-Whitney U test
Wilcoxon matched-pairs test
Kruskal-Wallis test
Friedman analysis of variance by ranks
These methods address relationship research questions and
require large sample sizes, i.e. twenty or more per group.
Smaller sample sizes (10 to 15 per group) can be used, but
the validity of the results may be reduced.
1. Pearson's product moment correlation
coefficient
2. Regression analysis
3.Multiple regression analysis
4. Multivariate Multiple Regression Analysis -
What is Excel?
Data are organized by worksheets, rows and columns
Worksheet limits are 256 columns and 65,536 rows
Cells contain data or formulas with relative or
absolute references to other cells
Direct manipulation of data and flexibility to move
data around (e.g. sorting, replacing, merging)
Opens many file types
Quite useful in prepping files for use in SPSS, SAS
or other programs
What is SPSS?
SPSS (originally, Statistical Package for the Social
Sciences) was released in its first version in 1968 after
being developed by Norman H. Nie and C. Hadlai Hull
A general purpose statistical package with a basic
programming capability utilizing scores of statistical and
mathematical functions in numerous modules
Can readily access data from a wide variety of sources,
perform data management, and present findings in a
variety of report and graph formats
Provides powerful tools for both specialized and
enterprise-wide analytical needs
Who uses SPSS?
It is used by market researchers, health researchers,
survey companies, government, education
researchers, marketing organizations and others.
Program functionality is broken into over a dozen
different modules which are sold individually
Most commonly used are Base, Regression Models, and
Advanced Models
Other modules can be installed to run more complex
analyses
SPSS data files include both the data and the
variable information (variable and value labels,
formats, and missing values).
SPSS - Strengths
Easily opens data from other programs such as Excel
and SAS
Variable view screen allows for quick overview of file
contents and allows for easy modifications of names,
formats, labels, and variable order
Having all data information in a single file allows
sharing files on a project to be very easy
Point-and-click menus do not require memorizing
syntax for majority of procedures
Many procedures can be expanded beyond the menu
options in syntax
Split-file command allows all output to be replicated
for various groups through a single command
Journal file tracks all commands used for the life of the
program, making it easier to recover code that was
accidentally deleted
SPSS Weaknesses
Ease of doing data manipulation can sometimes lead
to mistakes as the program does not preclude
inappropriate modifications to the data
Matching feature requires exact match
Duplicate records generate warnings but can be marked in
file
Error logs are hard to interpret at times
Incompleteness of menus means some options are
only available via syntax
While the majority of output is saved as pivot tables,
allowing great flexibility in modifying tables, output
tables and graphs are generally not rendered as well as
in Excel and are harder to manipulate
What is analyzed in Statistics?
Descriptive statistics: Cross tabulation, Frequencies,
Descriptive, Explore, Descriptive Ratio Statistics
Bivariate statistics: Means, t-test, ANOVA,
Correlation , Nonparametric tests
Prediction for numerical outcomes: Linear regression
Prediction for identifying groups: Factor analysis,
cluster analysis
LISREL (statistics package used in structural equation
modeling)
Ideal for discrete data types
Test data, Likert scale item data
Data can be imported in various types
ASCII, Access, Excel, SAS, SPSS, etc.
Variable names have length restrictions
Data files then stored as system files for later use
Basic statistics (e.g. means and correlations) are
generated in an underlying program called
PRELIS
LISREL itself is used to confirm the structural
validity of a measurement model for any
assessment
Requires syntax and input matrices
HLM
Hierarchical Linear Modeling (HLM) is
becoming a more popular type of analysis,
namely in cohort trend modeling
Also allows you to look at variance component
estimates and regression models given a nested
sample of respondents
Students within countries within global regions on
personality variables
More tedious to set up analysis with fewer
available file types
Also requires more upfront work as multiple data files
are needed
What Program should be used?
Microsoft Excel is the most basic and accessible
spreadsheet program available today
It is most ideal for general data exploration, histograms, scatter
plots, etc.
Appearance of tables can be customized to meet APA standards
Allows for easy transition to other programs to complete analyses
and write reports
However, its heritage is not as a statistical analysis program
Certain statistical programs are designed for specific
analytic tasks
Balance the results and what will be presented
Choose wisely in the interests of efficiency and accuracy of results
Some output is good for looking at the data through basic
exploration and to generate basic tables, but not to present the data
Computers and Research
Computers are now used by researchers throughout
the research process: to conduct bibliographic
searches, to learn about funding opportunities
for research projects, to collect and store data of
all types, to maintain administrative records, etc.
Computers can be used by just about anyone:
doctors, police officers, pilots, scientists, nurses,
engineers, and even housewives.
Use of computers in research
1. Problem identification
2. Literature review
3. Research design
4. Data collection and analysis
PREPARING THE DATA FOR COMPUTER ANALYSIS AND
PRESENTATION
Computers have clearly created numerous opportunities for
researchers. The computer centers at universities are
particularly likely to have a variety of software packages
available to their users. Sophisticated programmes are
available for performing statistical analysis.
Computers are ideally suited for data analysis in
large research projects. Researchers are essentially concerned
with huge storage of data, its faster retrieval when required,
and processing of data with the aid of various techniques. In
all these operations, computers are very helpful.
Computers facilitate the research work: innumerable data
can be processed and analyzed with greater ease and speed.
Moreover, the results obtained are generally correct and
reliable. Beyond this, even the design, pictorial graphing, and
report are developed with the help of computers.
Steps for preparing the data for analysis and
presentation
The data organization and coding
Storing the data in the computer
Selection of appropriate statistical measures
I. mean
II. median
III. standard deviation ,
IV. frequency distribution
V. percentages
VI. range
VII. Variance
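The measures listed above are all available in Python's standard `statistics` module; here they are computed on a small illustrative sample:

```python
# Descriptive measures from the list above, via the stdlib statistics module.
import statistics

data = [3, 9, 1, 2, 5, 4]   # illustrative sample

mean = statistics.mean(data)          # 4.0
median = statistics.median(data)      # 3.5
sd = statistics.stdev(data)           # sample standard deviation
var = statistics.variance(data)       # 8.0 (sample variance)
value_range = max(data) - min(data)   # 8

print(mean, median, round(sd, 2), var, value_range)
```

`stdev`/`variance` use the n − 1 (sample) denominator; `pstdev`/`pvariance` are the population versions when the data are a whole population rather than a sample.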
Selection of appropriate software package
SPSS
SPSS represents a highly flexible programme with a syntax
that is not technically oriented. It has a data entry
programme that can be used to create data files for
subsequent analysis. It can perform the most widely used
multivariate analyses, including multiple regression,
analysis of covariance, discriminant function analysis,
factor analysis, multivariate analysis of variance, logistic
regression, life table analysis, etc.
SAS
SAS is an integrated set of data management tools that includes a
complete programming language as well as modules for multiple
functions, including spreadsheets, project management, scheduling,
and mathematical, engineering, and statistical applications.
Execution of the computer program
The computer may be operated to execute instructions.
Limitations of computer based analysis
Computers are machines that only compute, they do not think.
As such, researchers should be fully aware about the following
limitations of computer-based analysis.
Computerized analysis requires setting of an elaborate system of
monitoring, collection and feeding of data. All these require
time, effort, and money. Hence, computer based analysis may
not prove economical in case of small projects.
Various items of detail that are not specifically fed into the
computer may be lost sight of.
The computer does not think; it can only execute the instructions of
a thinking person. If poor data or faulty programs were introduced
into the computer, the data analysis would not be worthwhile.
THE END