SPSS Assignment
SPSS is a comprehensive system for analyzing data. SPSS can take data from almost any
type of file and use them to generate tabulated reports, charts, and plots of distributions and
trends, descriptive statistics, and complex statistical analyses. SPSS makes statistical analysis
more accessible for the beginner and more convenient for the experienced user. Simple menus
and dialog box selections make it possible to perform complex analyses without typing a single
line of command syntax.
SPSS has four windows: the Data Editor, the Output Viewer, the Syntax Editor, and the Script window.
Many tasks can be performed with the menus and dialog boxes, but some very powerful features
are available only with command syntax. The Graphs command is used in SPSS to make graphs.
SPSS creates the graphics commonly used in the social sciences, such as histograms,
scatterplots, and regression lines. The Graphs command allows changing aspects of axes, adding
text, changing colors and fonts, copying, pasting, and exporting. You can also edit graphics
manually for publication. For more information on making publication-ready graphics, please
refer to “http://www.techdocs.ku.edu/docs/spss_output-publishing.pdf”. However, SPSS cannot
produce very complicated statistical graphics, such as maps or contour plots.
1.2 OPENING SPSS
We will open an existing dataset.
To open the dataset for analysis, go to the folder in which it is located.
If the dataset is saved as a .sav file, it will appear in that folder.
1.3 SPSS MAIN MENUS
SPSS has 11 main menus, which provide access to every tool of the SPSS program. We
can see the menus on the top of Figure
Menus
The File, Edit and View menus are very similar to what we get on opening a spreadsheet.
The File menu lets us open, save, print and close files and provides access to recently used files.
The Edit menu lets us do things like cut, copy and paste. With the View menu we can hide or show
the toolbar, status bar, gridlines etc.
The Data menu is an important tool in SPSS. It allows us to manipulate the data in
various ways. We can define variables, go to a particular case, sort cases, transpose them, and
merge cases as well as variables from another file. The Transform menu is another very useful
tool, which lets us compute new variables and make changes to existing ones.
The Analyze menu lets us perform all the statistical analyses. Its statistical tools are
grouped into categories. The Graphs menu lets us make various types of plots from our data.
The Add-ons menu tells us about other programs of the SPSS family such as Amos and Clementine.
In addition, we can find newly added functions under Add-ons. Finally, the Window and Help
menus are very similar to other Windows application menus (Gaur & Gaur, 2010).
1.4 SPSS HAS 4 WINDOWS:
Data Editor
Viewer or Draft Viewer which displays the output files
Syntax Editor, which displays syntax files
Script window
1.4.1 The Data Editor has two parts:
Data View window, which displays data from the active file in spreadsheet format
Variable View window, which displays metadata or information about the data in the
active file, such as variable names and labels, value labels, formats, and missing value
indicators.
1.4.2 SPSS Data View
The Data View is shown by default and allows the data to be entered and viewed. The user can
toggle between the Data View and the Variable View by clicking on the appropriate tabs on the
bottom left of the screen. The Data View is the place to enter data, where
Columns: variables
Rows: records
Fig.4 Data view
1.4.3 SPSS Variable View
The Variable View spreadsheet serves to define the variables. Each variable definition occupies a
row of this spreadsheet. As soon as data is entered under a column in the Data View, the default
name of the column occupies a row in the Variable View. The Variable View is the place to enter variables.
There are 10 characteristics to be specified under the columns of the Variable View:
1. Name — the chosen variable name. This can be up to eight alphanumeric characters but must
begin with a letter. While the underscore (_) is allowed, hyphens (-), ampersands (&), and spaces
cannot be used. Variable names are not case sensitive.
2. Type — the type of data. SPSS provides a default variable type once variable values have
been entered in a column of the Data View. The type can be changed by highlighting the
respective entry in the second column of the Variable View and clicking the three-periods
symbol (…) appearing on the right-hand side of the cell. This results in the Variable Type box
being opened, which offers a number of types of data including various formats for numerical
data, dates, or currencies. (Note that a common mistake made by first-time users is to enter
categorical variables as type “string” by typing text into the Data View. To enable later analyses,
categories should be given artificial number codes and defined to be of type “numeric.”)
3. Width — the width of the actual data entries. The default width of numerical variable entries
is eight. The width can be increased or decreased by highlighting the respective cell in the third
column and using the arrows that appear on the right-hand side of the cell or by simply typing a new number in the cell.
4. Decimals — the number of digits to the right of the decimal place to be displayed for data
entries. This is not relevant for string data and for such variables the entry under the fourth
column is given as a greyed-out zero. The value can be altered in the same way as the value of
Width.
5. Label — a label attached to the variable name. In contrast to the variable name, this is not
confined to eight characters and spaces can be used. It is generally a good idea to assign variable
labels. They are helpful for reminding users of the meaning of variables (placing the cursor over
the variable name in the Data View will make the variable label appear) and can be displayed in
the output from statistical analyses.
6. Values — labels attached to category codes. For categorical variables, an integer code should
be assigned to each category and the variable defined to be of type “numeric.” When this has
been done, clicking on the respective cell under the sixth column of the Variable View makes the
three-periods symbol appear, and clicking this opens the Value Labels dialogue box, which in
turn allows assignment of labels to category codes. For example, our data set included a
categorical variable sex indicating
7. Missing — missing value codes. SPSS recognizes the period symbol as indicating a missing
value. If other codes have been used (e.g., 99, 999) these have to be declared to represent missing
values by highlighting the respective cell in the seventh column, clicking the three-periods
symbol and filling in the resulting Missing Values dialogue box accordingly.
8. Columns — width of the variable column in the Data View. The default cell width for
numerical variables is eight. Note that when the Width value is larger than the Columns value,
only part of the data entry might be seen in the Data View. The cell width can be changed in the
same way as the width of the data entries or simply by dragging the relevant column boundary.
(Place cursor on right-hand boundary of the title of the column to be resized. When the cursor
changes into a vertical line with a right and left arrow, drag the cursor to the right or left to
increase or decrease the column width.)
9. Align — alignment of variable entries. The SPSS default is to align numerical variables to the
right-hand side of a cell and string variables to the left. It is generally helpful to adhere to this
default; but if necessary, alignment can be changed by highlighting the relevant cell in the ninth
column and choosing an option from the drop-down list.
10. Measure — measurement scale of the variable. The default chosen by SPSS depends on the
data type. For example, for variables of type “numeric,” the default measurement scale is a
continuous or interval scale (referred to by SPSS as “scale”). For variables of type “string,” the
default is a nominal scale. The third option, “ordinal,” is for categorical variables with ordered
categories but is not used by default. It is good practice to assign each variable the highest
appropriate measurement scale (“scale” > “ordinal” > “nominal”) since this has implications for
the statistical methods that are applicable (Sabine & Brian, 2004).
To create a new variable you can use either the drop-down menu in SPSS or SPSS syntax.
For this example, the drop-down menu will be used to create a new variable.
To start, go to the "Transform" option on the menu bar.
Then go down and click on "Compute Variable...".
Fig.6 Compute variable
Give an ID number to each case (NOT the real identification numbers of your subjects) if
you use a paper survey.
If you use an online survey, you need something to identify your cases.
You can also use Excel to do data entry.
Example of a code book
A code book documents how you code your variables. What is in a code book?
1. Variable names
2. Values for each response option
3. How to recode variables
• SPSS is much better at handling numeric variables than string variables (categorical data
entered as text).
• Therefore, if you want to transfer data from Excel to SPSS it is a good idea to ensure that
any categorical data (e.g. yes/no/don’t know, male/female, etc.) are entered in Excel as
numeric data (codes) rather than text.
• For example, you could always code ‘No’ as 0 and ‘Yes’ as 1, and so on.
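The coding step described above can be sketched in a few lines of Python (this is an illustration outside SPSS, not SPSS syntax). The codebook `CODES`, the function `encode`, and the sample responses are hypothetical names and values chosen for the example.

```python
# Hypothetical codebook: text responses mapped to numeric codes.
CODES = {"No": 0, "Yes": 1, "Don't know": 2}

def encode(responses, codes=CODES):
    """Replace each text response with its numeric code (None if unrecognised)."""
    return [codes.get(r.strip()) for r in responses]

raw = ["Yes", "No", "Yes ", "Don't know", "Maybe"]   # made-up survey answers
coded = encode(raw)
print(coded)
```

Responses that are not in the codebook come back as `None`, which makes entry errors easy to spot before the data are moved into SPSS.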
Option 1: Copy and paste data from another spreadsheet directly into the Data Editor.
Option 2:
• 1 Start SPSS.
• 5 Check that the box labelled “Read variable names from the first row of data” is ticked
and click OK (that is, if the first row in Excel contains your variable names; otherwise
leave it unticked)
After reading in the data, it is a good idea to ‘tell’ SPSS what the codes for your categorical
variables are. This ensures that tables and graphs are labelled appropriately. More detailed
instructions:
1. Click on the Variable View tab in the bottom left hand corner of the data editor window.
2. Look at the row for the variable you’re dealing with and go to the Values column. Click
on the word None.
3. Click on the little grey square (with dots in it) on the right.
4. Enter the first value (code) — e.g. 0 — and the corresponding label — e.g. No — then
click on Add.
5. Repeat until you have entered all the labels & codes for this variable, then click OK.
6. Repeat this process for the other categorical variables.
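What value labels buy you can be illustrated with a small Python sketch (again an illustration outside SPSS): the data stay numeric, but counts are reported under their labels. `VALUE_LABELS`, `labelled_counts`, and the sample answers are hypothetical.

```python
from collections import Counter

# Hypothetical value labels for one categorical variable (code -> label).
VALUE_LABELS = {0: "No", 1: "Yes"}

def labelled_counts(values, labels):
    """Count each code and report it under its label; unlabelled codes keep the raw code."""
    counts = Counter(values)
    return {labels.get(code, str(code)): n for code, n in sorted(counts.items())}

answers = [1, 0, 1, 1, 0, 9]   # 9 has no label defined
table = labelled_counts(answers, VALUE_LABELS)
print(table)
```

A code without a label shows up under its raw number, which is a quick hint that either a label or a missing-value declaration is still needed.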
Entering data
• Data can also be transferred in from other databases
• Ideally use Stat/Transfer, a commercially produced software package for transferring
data between spreadsheet software packages (including Excel, Lotus, Paradox, Dbase and
Quattro Pro) and statistical software packages.
• Import ASCII (text) files, File | Read ASCII Data
Shape of data
• SPSS usually requires data in wide format
• One row per observation (e.g. one row per patient)
• Columns represent the different variables
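As a rough illustration of what "wide format" means, the Python sketch below turns long records (one row per patient and variable) into one row per patient. The record layout and the function `to_wide` are assumptions made for the example, not part of SPSS.

```python
# Hypothetical long-format records: one row per (patient, variable) pair.
long_rows = [
    ("p01", "age", 34), ("p01", "sex", 1),
    ("p02", "age", 51), ("p02", "sex", 0),
]

def to_wide(rows):
    """Collect all variables of one observation into a single row."""
    wide = {}
    for case_id, variable, value in rows:
        wide.setdefault(case_id, {"id": case_id})[variable] = value
    return list(wide.values())

wide_rows = to_wide(long_rows)
for row in wide_rows:
    print(row)
```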
Rules for Defining Variable Names
• The name must begin with a letter.
• Maximum of 8 characters and no spaces.
• Names must be unique.
• The characters @ # _ and $ are allowed.
• A full stop can be used but not as the last character, so it is best avoided.
• The space character and others such as * ! ? and ‘ are not allowed.
• Names are not case sensitive, so ID, id and Id are identical.
• Certain SPSS keywords are not allowed as variable names. They are:
ALL TO WITH BY AND
OR NOT EQ NE LE
LT GE GT
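The naming rules above can be collected into a small checker. The following Python sketch encodes only the rules as listed in this section (including the eight-character limit); `is_valid_name`, `PATTERN`, and `RESERVED` are hypothetical names, and the check is an illustration rather than SPSS's own validation.

```python
import re

RESERVED = {"ALL", "TO", "WITH", "BY", "AND", "OR", "NOT",
            "EQ", "NE", "LE", "LT", "GE", "GT"}
# First character a letter; the rest letters, digits, or @ # _ $ and full stop.
PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9@#_$.]*")

def is_valid_name(name, existing=()):
    """Check a proposed name against the rules listed above."""
    if not PATTERN.fullmatch(name) or len(name) > 8:
        return False
    if name.endswith(".") or name.upper() in RESERVED:
        return False
    # Names are not case sensitive, so compare in one case.
    return name.upper() not in {e.upper() for e in existing}

for candidate in ["age", "2ndvisit", "my var", "income_1", "WITH", "id."]:
    print(candidate, is_valid_name(candidate, existing=["ID"]))
```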
1.9 OUTPUT VIEWER
The Output Viewer is where the results of statistical analyses performed via Analyze are
displayed (it opens automatically when an analysis is performed).
1.10 STORING AND RETRIEVING DATA FILES
Storing and retrieving data files are carried out via the drop-down menu available after
selecting File on the menu bar.
1.10.1 Saving the Dataset
A data file shown in the Data Editor can be saved by using the commands Save or Save
As…. In the usual Windows fashion Save (or from the toolbar) will save the data file under its
current name, overwriting an existing file or prompting for a name otherwise. By contrast, Save
As always
To save the new dataset that has been imported into SPSS, click on the File menu and then
go down to the Save option.
Alternatively, click on the button with the little disk on it to save the file.
Go to the folder in which you would like to save the file.
Type in a name for the dataset.
In this example, the dataset was called "Navel Data.sav".
Once this is done, click "Save" and the file will be saved as an SPSS dataset.
1.10.2 Opening the Dataset
To open existing SPSS data files we use the commands File – Open – Data… from the menu
bar (or from the toolbar). This opens the Open File dialogue box from which the appropriate file
can be chosen in the usual way. Recently used files are also accessible by placing the cursor
over Recently Used Data on the File drop-down menu and double clicking on the required file.
In addition, files can be opened when first starting SPSS by checking Open an existing data
source on the initial dialogue box (SPSS guide, 2007).
Before you perform an analysis in SPSS, set up the following option.
Go to Edit, Options...
Fig.11 options
Fig.12 Frequencies
Click Statistics.
Select Mean, Median, Std. deviation, Minimum, and Maximum.
Fig.13 Frequencies-Statistics
Click Continue.
Click OK to run the procedure.
The Frequencies Statistics table is displayed in the Viewer window.
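To make clear what the selected statistics measure, here is a minimal Python sketch that computes the same five quantities for one variable. It is an illustration outside SPSS; the `scores` values are made up, and the standard deviation shown is the sample version (n - 1 denominator).

```python
import statistics

scores = [82, 75, 91, 68, 88, 79, 95]   # hypothetical values for one scale variable

summary = {
    "Mean": statistics.mean(scores),
    "Median": statistics.median(scores),
    "Std. deviation": statistics.stdev(scores),   # sample SD (n - 1 denominator)
    "Minimum": min(scores),
    "Maximum": max(scores),
}
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```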
The first step is to go to the "Analyze" option on the menu bar.
Then go down to "Descriptive Statistics" and click on the "Frequencies" option.
Fig.15 Frequency Distribution
Fig.16 Frequencies-Statistics
Click Continue
Click OK to run the procedure
In addition to the frequency tables, the same information is now displayed in the form of
bar charts.
The first step in data analysis is simply understanding what the data mean. In order to guide this
process, the social sciences often use the concept of measurement levels. Measurement levels are
classifications of variables. Each variable is classified as having one of the following four
measurement levels: ratio, interval, ordinal and nominal.
Mainstream statistics recognises four levels or scales of measurement: Nominal, Ordinal,
Interval and Ratio (SPSS combines Interval and Ratio as Scale).
These are in order from most name-like to most number-like. Each level has its own
characteristics and is associated with a set of permissible statistical procedures. Below, each
level is characterised and associated with one or more measures of central tendency, viz., mode,
median, and mean.
The Nominal level of measure is used for categorical data, where each value has been
assigned to a discrete category. For instance, eye color of participants in a study might be
nominally (from Latin nomen for name) categorised into the groups brown, blue, green, and other.
The only procedure for quantitative analysis of these data is counting, to discover
frequencies of occurrence. That is, how many individuals are assigned to each category. The
categories are often coded numerically (i.e., assigned a unique number) and named (using
the values attribute of the variable). The only measure of central tendency for nominal data is
the mode, which is the most frequently occurring category. Note that there is no guarantee that a
sample will produce a unique modal value.
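Counting and the mode, including the case of a tie, can be sketched in Python as follows. The sample `eye_colour` is invented for illustration; `multimode` is from Python's standard `statistics` module (Python 3.8 or later).

```python
from collections import Counter
from statistics import multimode

eye_colour = ["brown", "blue", "brown", "green", "blue", "other"]  # hypothetical sample

frequencies = Counter(eye_colour)   # how many individuals fall in each category
modes = multimode(eye_colour)       # every category tied for the highest count

print(frequencies.most_common())
print(modes)
```

Here two categories share the highest count, so the sample has no unique modal value.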
Ordinal Data: The Ordinal level of measure is used for data which form discrete categories and
can be naturally ranked on some scale. This ranking is a weak ordering of the data in that two
values may share the same rank: the relative rank of a and b is a < b, a > b, or a = b. While
income might be treated as a scalar variable, it is often useful to create categories out of income
ranges. As with nominal data, these groupings might each be represented by a numeric code.
The central tendency in ordinal data may be represented by the mode (defined above) and by
the median, the value that divides the data into equal halves.
Ratio data is all of: ordered, of comparable distance (successive, integral points on the scale are
equally spaced), and on a scale with a true zero point.
For example, consider the measurement of height in meters: some objects have no elevation,
which naturally maps to a height of zero meters. The values can also form ratios, such that any
value can be expressed as a ratio of other values. If we find, for example, three people of heights
1.5m, 1.75m and 2m, we can express any one of these as a multiple of any other.
The central tendency in scale data can be indicated by the mode, the median and
the arithmetic mean. The first two are discussed above. The arithmetic mean is the sum of all the
data values divided by the number of data points. In other words, it is what is commonly referred
to as the average.
2.4.2 Dichotomous, Categorical and Metric
Although four measurement levels have been defined, we often use a different classification for
practical purposes.
Categorical variables are variables on which calculations are not meaningful. They thus
comprise nominal and ordinal variables. The distinction between these is of minor importance
because we often use the same analytical procedures for them.
Metric variables are variables on which calculations are meaningful. They thus
comprise interval and ratio variables. It's rarely necessary to distinguish these and we virtually
always use the same analytical procedures for them.
Dichotomous variables are variables that have only two distinct valid values. Dichotomous
variables are a special case of metric variables; calculations are meaningful and usually result in
a proportion or a percentage. It is useful to distinguish dichotomous variables as a separate
measurement level because they require different analytical procedures than other variables, such
as SPSS Independent Samples T Test and SPSS Binomial Test.
T-tests and z-tests are commonly used when making comparisons between the means of two
samples or between some standard value and the mean of one sample. There are different
varieties of t-tests which are used in different conditions depending on the design of the
experiment or nature of the data.
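For the two-sample case, the statistic behind an independent-samples t-test can be sketched in Python as below. This is a minimal version that assumes equal variances in both groups (the pooled-variance form); the function name `independent_t` and both groups of scores are hypothetical.

```python
import math
import statistics

def independent_t(sample_a, sample_b):
    """Student's t for two independent samples, assuming equal variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    se = math.sqrt(pooled * (1 / n_a + 1 / n_b))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / se
    return t, n_a + n_b - 2

group_1 = [12, 14, 16, 18, 20]   # hypothetical scores
group_2 = [10, 11, 13, 14, 12]
t_value, df = independent_t(group_1, group_2)
print(f"t({df}) = {t_value:.3f}")
```

The p-value is then read from the t distribution with the returned degrees of freedom, which is not computed here.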
2. Move a scale variable to the Dependent List and the categorical variable to the Factor List,
then click Statistics (fig.22)
3. Select Descriptives, click Continue, then click Plots (fig 23)
Table. 1 result
2.5.2 Independent t-test
Click Analyze – Compare Means – Independent-Samples T Test
Select the test variable and the grouping variable, then click Define Groups
table. 2
Research Question (Paired Samples t test)
A sample of students was administered a Math test (pretest scores). Afterwards, they
went through a special Math class designed to improve their scores. At the end of the course,
they were administered the same Math test (posttest scores). We want to determine if the Math
class was effective in increasing Math scores. RQ: Is there a difference between mean pretest
and posttest Math scores?
Results
The following table presents descriptive statistics on the pretest and posttest Math scores. As can
be seen from this table, the sample mean Math score was higher at posttest (M = 88.4) than at
pretest (M = 82.1).
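Because the same students are measured twice, the pretest and posttest scores form pairs, and the comparison can be made on the per-student differences. The Python sketch below shows that calculation under this assumption. The function `paired_t` and the scores are hypothetical and are not the study's data.

```python
import math
import statistics

def paired_t(before, after):
    """t statistic for paired measurements, computed on the per-case differences."""
    diffs = [a - b for a, b in zip(after, before)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se, len(diffs) - 1

pretest = [80, 78, 85, 90, 76]    # hypothetical scores, not the study's data
posttest = [84, 83, 88, 97, 79]
t_value, df = paired_t(pretest, posttest)
print(f"t({df}) = {t_value:.3f}")
```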
Table-3
Move the numerical variable to the Dependent List and move the categorical variable to the
Factor list.
Research Question (ANOVA)
A sample of students has been randomly assigned to three groups. Each group attended
one specific Reading class. Afterwards, all students were administered a standardized test. The
objective is to determine which class (if any) was associated with the highest average test score.
RQ: Are there differences in the mean Reading test scores among students in Classes A, B & C?
Results- The following table presents descriptive statistics on the Reading test scores by Class:
Table-5
Tukey's HSD was used to conduct multiple comparisons among the three groups. Results
showed that students in Class C had a significantly higher score than students in Class A (p =
0.033).
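The overall test that precedes such multiple comparisons is the one-way ANOVA F test. A minimal Python sketch of the F statistic follows; the function `one_way_f` and the three groups of scores are hypothetical, and the post hoc Tukey comparison itself is not shown.

```python
import statistics

def one_way_f(groups):
    """F statistic for a one-way ANOVA: between-group over within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

class_a = [70, 72, 74]   # hypothetical Reading scores
class_b = [75, 77, 79]
class_c = [80, 82, 84]
f_value, df1, df2 = one_way_f([class_a, class_b, class_c])
print(f"F({df1}, {df2}) = {f_value:.2f}")
```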
2.6 CORRELATION ANALYSIS
Correlation is a measure of relationship between two variables. The Correlation
coefficient gives a mathematical value for measuring the strength of the linear relationship
between two variables. In the SPSS program, there are three types of correlation coefficients:
Pearson’s, Kendall’s tau-b, and Spearman’s. While Pearson’s coefficient is commonly used for
continuous data, the other two are used mainly for ranked data.
In SPSS
Under the Analyze menu, choose Correlate, then choose Bivariate. Move all variables you are
interested in correlating to the "Variables" box. Make sure that "Pearson" is checked in the
Correlation Coefficients box. This will produce a correlation matrix which shows the correlation
coefficients among all possible pairs of variables from the ones selected.
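For a single pair of variables, the Pearson coefficient in that matrix can be sketched in Python as follows. The function `pearson_r` and the paired observations are hypothetical and serve only to show the calculation.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance of x and y scaled by both spreads."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

hours = [1, 2, 3, 4, 5]        # hypothetical paired observations
score = [52, 55, 61, 64, 70]
print(round(pearson_r(hours, score), 3))
```

A value near +1 or -1 indicates a strong linear relationship, and a value near 0 indicates a weak one.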
From this table, 66.7% of males stated they would vote for Candidate A, while 45.5% of
females stated they would vote for that candidate. However, results of Pearson's chi-square test
showed no statistically significant relationship between gender and preference for Candidate A
(Chi-Square(1) = 0.9, p = 0.343).
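The chi-square calculation for a 2x2 table can be sketched in Python as below. The counts are an assumption: the source table is not reproduced here, so they were chosen to match the reported percentages (6 of 9 males and 5 of 11 females preferring Candidate A). With these assumed counts the result should come out close to the reported statistic.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Assumed counts: rows = male, female; columns = Candidate A, other.
counts = [[6, 3], [5, 6]]
chi2, p = chi_square_2x2(counts)
print(f"Chi-Square(1) = {chi2:.1f}, p = {p:.3f}")
```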
The R-Squared estimate was 0.433, suggesting that IQ and gender explained 43.3% of the
variability in Math scores. As can be gleaned from the above table, only IQ was significantly
related to Math Score (b = 0.945, p = 0.003). This suggests that each 1-point increase in IQ was
associated, on average, with a 0.945-point increase in Math test scores.