Appendix R

The following is indicative R syntax for running the main analyses used in the data analysis
examples in this book. In Chapter 3 we show you the basic way of running the main statistical analyses
with SPSS; below we set out examples of how to run these in R. Those familiar with R will be aware
that there are many ways to run any one type of analysis. Here we suggest syntax that produces
most (if not all) of the output that SPSS produces. There may be more elegant ways to conduct the
analyses, but the examples here show how the authors run these analyses with R.

Note that we run the syntax files from the same working directory that the data files are stored in;
that way we do not have to indicate the directory location of our datafiles here. However, the
working directory will need to be adjusted to match the location where the analyst has stored the
datafiles. These examples will run using the datafiles indicated, assuming that you have downloaded
these from the Kogan Page web address: koganpage.com/PHRA2

To find out which working directory your R programme is currently running from, type:

getwd()

A simple way of managing your syntax and datafiles is to save both the syntax file and the datafile
in the directory that you use as your working directory. To set the working directory, use the
setwd("") command, e.g. here we are working off our C: hard drive:

setwd("C:\\Nameofyourdesiredworkingdirectory\\")

This may be a number of folders deep, e.g.:

setwd("C:\\Users\\MyRFiles\\Nameofyourdesiredworkingdirectory\\")

You can also select the directory interactively (using your mouse) with the following command
(choose.dir() is available in R for Windows only):

setwd(choose.dir())
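
As a minimal sketch of the same steps (the folder name MyRFiles and the file Example.dat are hypothetical placeholders created under tempdir(), not files from the book's download), R also accepts forward slashes and file.path() in place of doubled backslashes, and file.exists() can confirm that a datafile is visible from the working directory before you try to read it:

```r
# Sketch only: "MyRFiles" and "Example.dat" are hypothetical placeholders
old_wd <- getwd()                              # remember the current working directory
demo_dir <- file.path(tempdir(), "MyRFiles")   # build a path without typing \\
dir.create(demo_dir, showWarnings = FALSE)
setwd(demo_dir)

# write a tiny tab-separated file so there is something to find
write.table(data.frame(Gender = c(1, 2), LeaverStatus = c(0, 1)),
            "Example.dat", sep = "\t", row.names = FALSE, quote = FALSE)

file.exists("Example.dat")   # TRUE when the datafile is in the working directory
list.files()                 # lists the files R can see in this directory

# read it back the same way the appendix examples read their datafiles
ExampleData <- read.table("Example.dat", header = TRUE, sep = "\t")
names(ExampleData)

setwd(old_wd)                # return to the original working directory
```

If file.exists() returns FALSE for one of the book's datafiles, the working directory is probably not the folder where the download was saved.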

You will find the R syntax and datafiles for every analysis conducted in the data analysis Chapters
4-11 in the data and syntax download files available at the following web address:

koganpage.com/PHRA2

A syntax file (called “Appendix R Syntax.R”) with all of the syntax set out below is included in the R
syntax files download.
#################################

# Appendix R1 #
#R Code Matching Figure 3.1: running a Crosstabs and Chi square

#read the datafile (here Appendix R1.dat) in working directory location and give name
#(here DataForChi)for the data that R is reading
DataForChi<-read.table("Appendix R1.dat",header=T,sep="\t")
#obtain names in the file
names(DataForChi)
#e.g. "Gender" "LeaverStatus" "Country"
#attach the datafile for analyses
attach(DataForChi)
#create Factor (call it a new name or leave it as is) and label values for categorical variables
LeaverNoYesFact <- factor(LeaverStatus,
levels = c(0,1),
labels = c("Stayer", "Leaver"))
#make a frequency table of this, call this table Freq1 here
Freq1=table(LeaverNoYesFact)
Freq1
#Create a table of this factor
PcntLeaver <- table(LeaverNoYesFact)
#ask for proportions of the levels in the factor
prop.table(PcntLeaver)
#follow same procedure with the other categorical variable that you are going to use:
CountryCode <- factor(Country,
levels = c(1,2,3,4,5,6,7,8,9,10),
labels = c("Belgium", "Sweden", "Italy", "France", "Poland", "Mexico", "Spain", "UK", "USA",
"Australia"))
Freq2=table(CountryCode)
Freq2
PcntCountry <- table(CountryCode)
prop.table(PcntCountry)
#produce basic frequency cross-tabulation across these two variables
CountryByLeaver=table(CountryCode , LeaverNoYesFact)
CountryByLeaver
#This gives you your marginal Freq of country (regardless of stayer V Leaver)
margin.table(CountryByLeaver,1)
#the following will give you the proportions of stayer V Leaver in each country:
prop.table(CountryByLeaver,1)
#This gives you your marginal Stayer V Leaver in your table
margin.table(CountryByLeaver,2)
#the following will give you the proportions of Country in each level of Stayer Versus Leaver
prop.table(CountryByLeaver,2)
#round this to 2 decimal places
round(prop.table(CountryByLeaver,2),digits=2)
#This gives you your Chi square test
chisq.test(CountryByLeaver)
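
SPSS Crosstabs can also report expected counts and residuals alongside the test. A minimal sketch of how these might be pulled from the object that chisq.test() returns, using made-up counts for three hypothetical groups (the numbers are illustrative only and do not come from the book's datafile):

```r
# Hypothetical 3 x 2 table of counts (illustrative values only)
toyTable <- matrix(c(30, 10,
                     20, 20,
                     25, 15),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(Country = c("A", "B", "C"),
                                   Status  = c("Stayer", "Leaver")))

# store the test result rather than only printing it
toyChi <- chisq.test(toyTable)

toyChi$observed    # the counts that went into the test
toyChi$expected    # counts expected if Country and Status were independent
toyChi$stdres      # standardised residuals (large absolute values flag unusual cells)
toyChi$parameter   # degrees of freedom: (rows - 1) * (columns - 1)
```

The same components are available from the result of chisq.test(CountryByLeaver) if it is stored in an object first.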
##################################

# Appendix R2 #
#R Code Matching Figure 3.2: running a Binary Logistic Regression

#read the datafile (here Appendix R2.txt) in working directory location and give name
#(here DataForLOG) for the data that R is reading
DataForLOG<-read.table("Appendix R2.txt",header=T,sep="\t")
#obtain names in the read file
names(DataForLOG)
# e.g. some variables in this dataset
#"ApplicantCode" "Gender" "BAMEyn" "ShortlistedNY"
#make the dataset live
attach(DataForLOG)

#you then set up (and name, modelFig32 here) the logistic regression model
#notice the glm is the key command here, then the regression formula is set out: DV~IV1 + IV2
modelFig32 <- glm(ShortlistedNY ~ Gender + BAMEyn, family=binomial("logit"))
#this gives results of the logistic model
modelFig32
summary(modelFig32)
#for odds ratios:
exp(coef(modelFig32))
#for some Logistic regression stats (Nagelkerke and pseudo R sq) you need to install package
#rcompanion
#install.packages("rcompanion") if not installed already
library(rcompanion)
nagelkerke(modelFig32, null = NULL, restrictNobs = TRUE)
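
The odds ratios above come without confidence intervals. A minimal sketch of one way to add Wald-type intervals with confint.default(), fitted here to simulated data (the variable names follow the syntax above, but every value is randomly generated for illustration and says nothing about the book's dataset):

```r
# Simulated data for illustration only (not the book's datafile)
set.seed(1)
n <- 200
toyLOG <- data.frame(Gender = rbinom(n, 1, 0.5),
                     BAMEyn = rbinom(n, 1, 0.3))
toyLOG$ShortlistedNY <- rbinom(n, 1,
                               plogis(-0.5 + 0.4 * toyLOG$Gender - 0.6 * toyLOG$BAMEyn))

# same model form as modelFig32, but using data= instead of attach()
toyModel <- glm(ShortlistedNY ~ Gender + BAMEyn,
                data = toyLOG, family = binomial("logit"))

# exponentiate the coefficients and their Wald 95% limits together
oddsTable <- exp(cbind(OddsRatio = coef(toyModel),
                       confint.default(toyModel)))
round(oddsTable, 2)
```

Applied to the real model, the equivalent call would be exp(cbind(OddsRatio = coef(modelFig32), confint.default(modelFig32))).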
#################################

# Appendix R3 #
#R Code Matching Figure 3.5: running an Independent t-test

#read the datafile (here Appendix R3.txt) in working directory location
#and give name (here DataForINDt) for the data that R is reading
DataForINDt<-read.table("Appendix R3.txt",header=T,sep="\t")
#obtain names in the file
names(DataForINDt)
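#e.g. "ID" "Engagement" "LondonorNot"
#the syntax below refers to the engagement variable as EMPsurvEngagement; if names() returns a
#different name (e.g. Engagement), use that name in the commands that follow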
attach(DataForINDt)
#Levene's needs the binary grouping/factor variable to be a "Factor" with the two levels/values
#labelled
LondonFrame <- factor(LondonorNot,
levels = c(1,2),
labels = c("London", "Not London"))
#if we want to get frequencies for this Factor
Freq1=table(LondonFrame)
Freq1
#we need to install and load a package (if not already installed) called "car"
library(car)
#run homogeneity of variance test
leveneTest(EMPsurvEngagement, group = LondonFrame, center = mean)
#ask for the two t-tests one with and one without assuming equal variances
t.test(EMPsurvEngagement~LondonFrame, var.equal=TRUE)
t.test(EMPsurvEngagement~LondonFrame, var.equal=FALSE)

################################
# Appendix R4 #
#R Code Matching Figure 3.6: running a paired t-test

#read the datafile (here Appendix R4.txt) in working directory location
#and give name (here DataForPairedt) for the data that R is reading
DataForPairedt<-read.table("Appendix R4.txt",header=T,sep="\t")
#obtain names in the file
names(DataForPairedt)
#e.g. "ID" "stressT1" "stressT2" "Gender"
attach(DataForPairedt)
#get descriptives for the two T1 and T2 variables:
#install "pastecs" package
library(pastecs)
Dataforstress <-subset(DataForPairedt, select = c(stressT1,stressT2))
stat.desc(Dataforstress)
#run paired t-test
t.test(stressT1,stressT2,paired=TRUE)
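
A paired t-test is a one-sample t-test on the within-person differences. A minimal sketch that checks this equivalence on six made-up pairs of scores (the values and the object names toyT1 and toyT2 are hypothetical):

```r
# Hypothetical stress scores for six people at two time points
toyT1 <- c(3.2, 4.1, 2.8, 3.9, 4.4, 3.0)
toyT2 <- c(2.9, 3.8, 3.0, 3.1, 4.0, 2.7)

# paired test, as in the syntax above
pairedResult <- t.test(toyT1, toyT2, paired = TRUE)

# one-sample test of the T1 minus T2 differences against zero
diffResult <- t.test(toyT1 - toyT2, mu = 0)

# the two approaches give the same t statistic, degrees of freedom and p value
all.equal(unname(pairedResult$statistic), unname(diffResult$statistic))
all.equal(pairedResult$p.value, diffResult$p.value)

# mean and standard deviation of the differences
mean(toyT1 - toyT2)
sd(toyT1 - toyT2)
```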
################################
# Appendix R5 #
#R Code Matching Figure 3.7 - 3.9: running a One-Way ANOVA

#read the datafile (here Appendix R5.dat) in working directory location
#and give name (e.g. here DataForONEwayANOVA) for the data that R is reading
DataForONEwayANOVA<-read.table("Appendix R5.dat",header=T,sep="\t")
#obtain names in the file
names(DataForONEwayANOVA)
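#e.g. "ID" "Country" "TeamEngagement" "TeamSeparation"
#the syntax below refers to the engagement variable as Engagement; if names() returns a
#different name (e.g. TeamEngagement), use that name in the commands that follow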
#attach the datafile for analyses
attach(DataForONEwayANOVA)
#create Factor for the grouping/factor variable, e.g. here is Country (call it a new name or
#leave it as is) and label values for categorical variables
CountryCode <- factor(Country,
levels = c(1,2,3,4),
labels = c("UK", "USA", "Canada", "Spain"))
#obtain counts for these categories
Freq1=table(CountryCode)
Freq1
#install plyr package if not already installed
library(plyr)
#for some mean descriptives (using tapply command)
meanCountryEng <- tapply(DataForONEwayANOVA$Engagement,CountryCode, mean)
meanCountryEng
meanCountrySep <- tapply(DataForONEwayANOVA$TeamSeparation,CountryCode, mean)
meanCountrySep
# you need to install a number of packages to get the ANOVA in R to mirror the SPSS output
#for Games Howell Post Hoc you need to install userfriendlyscience with XML & some dependencies
library(XML)
#install.packages("userfriendlyscience") if not installed already
library(userfriendlyscience)
# install.packages("htmlwidgets") if not installed already
library(htmlwidgets)
#For Levene's test - you will need to install and load car
#install.packages("car") # if not already installed
library(car)
# For engagement one way by country Fig 3.6
OneWayEngagement <- aov(Engagement ~ CountryCode)
OneWayEngagement
summary(OneWayEngagement)
# to call for the Levene's test:
leveneTest(Engagement~CountryCode)
#For Welch Test ANOVA with unequal variance
oneway.test(Engagement ~ CountryCode, var.equal=FALSE)
#produces AIC and Total stats
drop1(OneWayEngagement,~.,test="F")
plot(Engagement ~ CountryCode)
#for Tukey Post hoc (Figure 3.7)
TukeyHSD(OneWayEngagement)
#for Games Howell Post hoc (3.7)
PostHocGHEngagement <- oneway(CountryCode, y = Engagement, posthoc = 'games-howell')
PostHocGHEngagement
################################

# Appendix R6 #
#R Code Matching Figure 3.10: running a One-Way repeated Measures ANOVA

#read the datafile (here Appendix R6.txt) in working directory location and
#give name (here DataForRepOneAnova)for the data that R is reading
DataForRepOneAnova<-read.table("Appendix R6.txt",header=T,sep="\t")
#obtain names in the file
names(DataForRepOneAnova)
#e.g. names:
# "ID" "FUNCTION" "ValsCompositeT1" "ValsCompositeT2" "ValsCompositeT3"
#make the dataset live
attach(DataForRepOneAnova)
#install.packages("car") if not already installed
library(car)
#this approach produces lots of output that matches SPSS
# make a multivariate linear model with only the intercept as a
# predictor for your within-participants observations
multmodel=lm(cbind(ValsCompositeT1,ValsCompositeT2, ValsCompositeT3) ~ 1)
# create a factor for your repeated measures variable
Time=factor(c("ValsCompositeT1","ValsCompositeT2","ValsCompositeT3"), ordered=F)
model1=Anova(multmodel,idata=data.frame(Time),idesign=~Time,type="III")
model1
summary(model1,multivariate=F)

################################

# Appendix R7 #
#R Code Matching Figure 3.11: running a Repeated Measures ANOVA with a Between Subject
#Factor

#This has an added level of complication that we need to restructure our wide dataset
#to a tall setup and we need to create a new ID variable
#read the datafile (here Appendix R7.txt) in working directory location and
#give name (here DataForANOVAWithBet)for the data that R is reading
DataForANOVAWithBet<-read.table("Appendix R7.txt",header=T,sep="\t")
#obtain names in the file
names(DataForANOVAWithBet)
#e.g.: "FUNCTION" "ValsCompositeT1" "ValsCompositeT2" "ValsCompositeT3" "Country"
#notice in this dataset there is no ID variable - for this analysis we have to
#create this as it is needed later
#make the dataset live
attach(DataForANOVAWithBet)
#install.packages("car")
library(car)
#need to add an ID variable as this will be needed when transposing the dataset to tall (rather than
#wide) format
#install.packages("tidyverse") # if not already installed
library(tidyverse)
# this adds an ID variable to the datafile read and creates a new name (Valsdata) for the
#datafile that now has this extra variable included
Valsdata <- tibble::rowid_to_column(DataForANOVAWithBet, "ID")
names(Valsdata)
# select the variables that you are going to use in the dataset, call this subset ValuesData here:
#the variables you need here are the three repeated measures variables, the grouping variable and
#the new ID variable
ValuesData <-subset(Valsdata, select = c(ID, FUNCTION, ValsCompositeT1, ValsCompositeT2,
ValsCompositeT3))
# trim the NAs (if they exist) from this subset of variables and create a clean dataframe called
#"Trimmed" here
Trimmed <- na.omit(ValuesData)
# now need to transform the wide dataset to a tall stacked version - this will create the new
#variables Time and ValueLevel from your "Trimmed" dataframe and stack the ID and composite
#variables. This new stacked dataset is called "Datalong" here. Note your new categorical
#variable indicating what time period the data refers to is called "Time", the "ValueLevel"
#variable is your new dependent variable and you indicate the three repeated measures variables
#that get stacked into this "ValueLevel" variable. The ID variable gets stacked also
Datalong <- gather(Trimmed, Time, ValueLevel, ValsCompositeT1:ValsCompositeT3,
factor_key=TRUE)
Datalong
#change new categorical variables into categorical factors within the tall/stacked dataframe
Datalong <- within(Datalong, {
FUNCTION <- factor(FUNCTION)
Time <- factor(Time)
ID <- factor(ID)})
#check your new stacked dataframe
names(Datalong)
# you can look at the data; the editor window needs to be closed before running the ANOVA
fix(Datalong)
#install.packages("ez") to enable the ezANOVA command
#load this
library(ez)
#run the within and between ANOVA from the stacked dataframe with ValueLevel as the DV, ID as
#the stack identifier key, "Time" as the time condition indicator and "FUNCTION" as the
#between-subjects factor
ezANOVA(data = Datalong, dv=.(ValueLevel), wid=.(ID), within=.(Time),
between=.(FUNCTION), detailed=T, type=3)
# now create the interaction plot:
with(Datalong, interaction.plot(Time, FUNCTION, ValueLevel,
ylim = c(3.9, 4.4), lty= c(1, 12), lwd = 3,
ylab = "mean of Values", xlab = "Time", trace.label = "FUNCTION"))
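
In current versions of tidyr, gather() is superseded by pivot_longer(). A minimal sketch of the same wide-to-tall restructuring with pivot_longer(), run on a three-row made-up dataset (the values are hypothetical; only the variable names follow the syntax above):

```r
library(tidyr)

# Hypothetical wide data: one row per person, one column per time point
toyWide <- data.frame(ID = 1:3,
                      FUNCTION = c(1, 2, 1),
                      ValsCompositeT1 = c(4.0, 3.5, 4.2),
                      ValsCompositeT2 = c(4.1, 3.6, 4.0),
                      ValsCompositeT3 = c(4.3, 3.4, 4.1))

# stack the three repeated measures into one ValueLevel column,
# with Time recording which column each value came from
toyLong <- pivot_longer(toyWide,
                        cols = ValsCompositeT1:ValsCompositeT3,
                        names_to = "Time",
                        values_to = "ValueLevel")

# convert the identifiers to factors, as for Datalong above
toyLong$Time <- factor(toyLong$Time)
toyLong$ID <- factor(toyLong$ID)
toyLong$FUNCTION <- factor(toyLong$FUNCTION)

toyLong   # 3 people x 3 time points = 9 rows
```

The rows come out ordered by person rather than by time point, which differs from the gather() output but does not affect the ANOVA.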
################################

# Appendix R8 #
#R Code Matching Figure 3.14: running correlations

#read the datafile (here Appendix R8.txt) in working directory location
#and give name (here DataForCorr) for the data that R is reading
DataForCorr<-read.table("Appendix R8.txt",header=T,sep="\t")
#obtain names in the file
names(DataForCorr)
#e.g. "ID" "Gender" "JobTenure2014" "JOBSAT2014" "PerformanceRating2014" "SickDays2014"
#attach the datafile for analyses
attach(DataForCorr)

#call for correlations of all variables (excluding cases where there are NAs)
cor(DataForCorr, use = "complete.obs")
#specify particular variables to correlate
cor(PerformanceRating2014, SickDays2014, use = "complete.obs")
#this doesn't give p values
#install and load Hmisc package
library(Hmisc)
rcorr(as.matrix(DataForCorr))
rcorr(PerformanceRating2014, SickDays2014, type="pearson")
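
For a single pair of variables, base R's cor.test() also returns a p value and a confidence interval without any extra package. A minimal sketch on eight made-up pairs of values (toyPerf and toySick are hypothetical and unrelated to the book's datafile):

```r
# Hypothetical performance ratings and sick days for eight employees
toyPerf <- c(3, 4, 2, 5, 4, 3, 5, 2)
toySick <- c(6, 3, 8, 1, 2, 5, 2, 9)

# Pearson correlation with significance test
toyCor <- cor.test(toyPerf, toySick, method = "pearson")

toyCor$estimate   # the correlation coefficient r
toyCor$p.value    # two-sided p value
toyCor$conf.int   # 95% confidence interval for r

# the estimate matches what cor() returns for the same two variables
all.equal(unname(toyCor$estimate), cor(toyPerf, toySick))
```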
################################

# Appendix R9 #
#R Code Matching Figure 3.18: running Exploratory Factor Analyses (EFA)

#read the datafile (here Appendix R9.txt) in working directory location
#and give name (here DataForEFA) for the data that R is reading
DataForEFA<-read.table("Appendix R9.txt",header=T,sep="\t")
#obtain names in the file
names(DataForEFA)
#e.g:"Gender" "jbstatus" "age" "tenure" "Eng1" "Eng2" "Eng3" "Eng4" "pos1" "pos2" "pos3"
#attach the datafile for analyses
attach(DataForEFA)
#install and load a number of packages
library(psych)
library(GPArotation)
library(mvtnorm)
library(nFactors)

#select a subset of variables for the EFA
FORefa <-subset(DataForEFA, select = c(Eng1, Eng2, Eng3, Eng4, pos1, pos2, pos3))
#restrict the dataset to cases where you don't have NAs
TrimmedData <- na.omit(FORefa)
#Run an EFA and get eigen values for factors observed in the data
eigv <- eigen(cor(TrimmedData)) # get eigenvalues
eigv
# note the number of eigenvalues that are greater than 1. In the example later in the book
# we find two eigenvalues above 1, thus the optimal number of factors in this example is 2
#we can check this with another command
ap <- parallel(subject=nrow(TrimmedData),var=ncol(TrimmedData),
rep=100,cent=.05)
nS <- nScree(x=eigv$values, aparallel=ap$eigen$qevpea)
plotnScree(nS)
#this confirms the 2 factor solution as it produces a scree plot with 2 factors with eigen values > 1
#THUS we can then ask for the factor loadings for a 2 factor solution
Loadings2f <- principal(TrimmedData,nfactors=2,rotate="varimax")
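
Counting the eigenvalues above 1 can be done in code rather than by eye. A minimal sketch using R's built-in attitude dataset as a stand-in (an assumption made purely for illustration; it is not the book's survey data):

```r
# Built-in example data with seven numeric survey-style variables
toyItems <- na.omit(attitude)

# eigenvalues of the correlation matrix, as in the syntax above
toyEig <- eigen(cor(toyItems))$values
toyEig

# eigenvalues of a correlation matrix sum to the number of variables
sum(toyEig)
ncol(toyItems)

# number of eigenvalues greater than 1 (the "eigenvalue > 1" rule)
nAboveOne <- sum(toyEig > 1)
nAboveOne
```

With the book's data the equivalent line would be sum(eigv$values > 1).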
Loadings2f
################################

# Appendix R10 #
#R Code Matching Figure 3.17: running a Multiple Linear (OLS) Regression

#read the datafile (here Appendix R10.txt) in working directory location and
#give name (here DataForOLSReg)for the data that R is reading
DataForOLSReg<-read.table("Appendix R10.txt",header=T,sep="\t")
#obtain names in the file
names(DataForOLSReg)
#e.g. pasted some names in dataset from R console "DepartmentGroupNumber" "GroupSize"
#"PercentMale" "BAME" "NumberTeamLeads" "NumberFeMaleTeamLeads"
#"Location" "LondonorNot" "Function" "EMPsurvEngagement"
attach(DataForOLSReg)
#this matches Fig 3.17 of the book, so we call the model "modelFig317" here
modelFig317=lm(BAME ~ LondonorNot + Function + GroupSize + NumberFeMaleTeamLeads +
PercentMale)
modelFig317
anova(modelFig317)
summary(modelFig317)
coef(modelFig317)
#to get standardised Beta coefficients
#install and load QuantPsyc package
library(QuantPsyc)
lm.beta(modelFig317)
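
A standardised Beta is usually defined as the unstandardised coefficient multiplied by the standard deviation of the predictor and divided by the standard deviation of the outcome. A minimal sketch that checks this against a regression on z-scored variables, using R's built-in mtcars dataset as a stand-in (an assumption for illustration only; it is not the book's data):

```r
# ordinary regression on the raw variables
toyFit <- lm(mpg ~ wt + hp, data = mtcars)
b <- coef(toyFit)[-1]                        # drop the intercept

# standardised Betas by hand: b * sd(predictor) / sd(outcome)
betaByHand <- b * sapply(mtcars[, c("wt", "hp")], sd) / sd(mtcars$mpg)
betaByHand

# the same values from a regression on z-scored variables
toyZ <- as.data.frame(scale(mtcars[, c("mpg", "wt", "hp")]))
toyFitZ <- lm(mpg ~ wt + hp, data = toyZ)
betaFromZ <- coef(toyFitZ)[-1]
betaFromZ

all.equal(unname(betaByHand), unname(betaFromZ))
```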

################################

# Appendix R11 #
#R Code Matching Figure 3.19: running reliability analyses

#read the datafile (here Appendix R11.txt) in working directory location and give
#name (here DataForReliable) for the data that R is reading
DataForReliable<-read.table("Appendix R11.txt",header=T,sep="\t")
#obtain names in the file
names(DataForReliable)
#e.g. "sex" "jbstatus" "age" "tenure" "ocb1" "ocb2" "ocb3" "ocb4"
#"Eng1" "Eng2" "Eng3" "Eng4" "pos1" "pos2" "pos3"
#attach the datafile for analyses
attach(DataForReliable)
#install and load the psych package
library(psych)
#select variables for reliability dataframe
dATAFORRel <-subset(DataForReliable, select = c(ocb1, ocb2, ocb3, ocb4))
#create new data frame and clear NAs
TrimmedData <- na.omit(dATAFORRel)
alpha(TrimmedData, keys=NULL,cumulative=FALSE, title=NULL, max=10,na.rm = TRUE,
check.keys=FALSE,n.iter=1,delete=TRUE,use="pairwise",warnings=TRUE,n.obs=NULL)
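
Raw Cronbach's alpha can also be computed directly from the item variances and the variance of the total score, which is a useful check on the package output. A minimal sketch on six made-up respondents (the item names follow the syntax above, but the scores are hypothetical):

```r
# Hypothetical responses from six people to four items
toyRel <- data.frame(ocb1 = c(4, 3, 5, 2, 4, 3),
                     ocb2 = c(4, 2, 5, 3, 4, 3),
                     ocb3 = c(5, 3, 4, 2, 5, 2),
                     ocb4 = c(3, 3, 5, 1, 4, 3))

k <- ncol(toyRel)                    # number of items
itemVars <- sapply(toyRel, var)      # variance of each item
totalVar <- var(rowSums(toyRel))     # variance of the summed scale score

# alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
alphaByHand <- (k / (k - 1)) * (1 - sum(itemVars) / totalVar)
alphaByHand
```

This should agree with the raw_alpha value that alpha() prints for the same items.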
################################

# Appendix R12 #
#R Code for Kaplan-Meier survival analyses discussion in Chapter 3 – Figure 3.4

#read the datafile (here Appendix R12.dat) in working directory location and give
#name (here DataForSurvival) for the data that R is reading
DataForSurvival<-read.table("Appendix R12.dat",header=T,sep="\t")
#obtain names in the file
names(DataForSurvival) #E.G. "Gender" "LengthOfService" "LeaverStatus"
#attach the datafile for analyses
attach(DataForSurvival)
#create a factor for the gender variable and label values
SEX <- factor(Gender,
levels = c(0,1),
labels = c("Female", "Male"))
#create a factor for the LeaverStatus variable and label values
LeaverStatus <- factor(LeaverStatus,
levels = c(0,1),
labels = c("Stayer", "Leaver"))
#install survival package if not already installed
#install.packages("survival")
library(survival)
#install survminer package if not already installed
#install.packages("survminer")
library(survminer)
#the following sets out the variables to use for the survival analyses leaver status on tenure overall
IndTO.survival <- with(DataForSurvival, Surv(LengthOfService,LeaverStatus))
#survival analyses of all data in dataset
kmall <- survfit(IndTO.survival~1,data=DataForSurvival)
#prints the survival table statistics
summary(kmall)
#produce event statistics and mean overall survival statistics
print(kmall, print.rmean=TRUE)
#Plot of the overall turnover/tenure data
plot(kmall, xlab="Job Tenure",
ylab="% Surviving", yscale=100,
main="Survival Distribution (Overall)")
#conduct survival analyses BY Gender
kmGEN <- survfit(IndTO.survival~Gender,data=DataForSurvival)
#prints the survival table statistics
summary(kmGEN)
#the following prints the number of events (leaver) and mean survival by gender
print(kmGEN, print.rmean=TRUE)
#Plot of the different survival curves across males V females
plot(kmGEN, xlab="Job Tenure",
ylab="% Surviving", yscale=100, col=c("blue","red"),
main="Survival Distributions by Gender")
legend("topright", title="Gender", legend=c("Female", "Male"),
fill=c("blue" , "red"))
#to get the significance of any differences in survival patterns across gender
survdiff(IndTO.survival~Gender, data=DataForSurvival)
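
The legend in the by-gender plot above is typed by hand, so its order has to match the order of the curves. A minimal sketch of one way to take the labels from the fitted object instead, using a made-up six-row dataset (the values are hypothetical; only the variable names follow the syntax above): grouping on a labelled factor makes survfit() name each stratum, and those names can be passed to legend().

```r
library(survival)

# Hypothetical data: tenure, leaver flag (1 = left) and gender code
toySurv <- data.frame(LengthOfService = c(2, 5, 7, 3, 9, 4),
                      LeaverStatus    = c(1, 0, 1, 1, 0, 1),
                      Gender          = c(0, 1, 0, 1, 0, 1))

# labelled factor kept inside the data frame
toySurv$SEX <- factor(toySurv$Gender,
                      levels = c(0, 1),
                      labels = c("Female", "Male"))

# Kaplan-Meier curves by the labelled factor
kmToy <- survfit(Surv(LengthOfService, LeaverStatus) ~ SEX, data = toySurv)

# stratum names are taken from the factor labels
names(kmToy$strata)

# use the stratum names for the legend so labels and curves cannot get out of step
plot(kmToy, xlab = "Job Tenure", ylab = "% Surviving", yscale = 100,
     col = c("blue", "red"))
legend("topright", title = "Gender", legend = names(kmToy$strata),
       fill = c("blue", "red"))
```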
