Problem Set 1: Introduction to R - Solutions with R Output
Financial Econometrics in R
Tim Maurer & Jack Melbourne & Chandler Lutz & Lisbeth La Cour

The aim of this assignment is to learn:


• how to execute commands in R
• how to get started: download R and RStudio, and open up RStudio (if you haven’t used RStudio before,
search YouTube for “rstudio tutorial for beginners”; there are also various other tutorials online)
• how to do descriptive statistics
• how to write a simple function
• how to solve basic problems

1 Install Packages
Please run the package-installation code below before the first exercise class.
Packages are collections of R functions, data, and compiled code in a well-defined format. The directory
where packages are stored is called the library. R comes with a standard set of packages. Others are available
for download and installation. Once installed, they have to be loaded into the session to be used.
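As a minimal sketch of this install-once, load-every-session workflow (using ggplot2 purely as an example):

```r
# Install once: this downloads the package into your library
install.packages("ggplot2")

# Load in every R session where you want to use the package
library(ggplot2)
```

install.packages() typically only needs to be run again to update a package or on a new machine; library() must be called in each new session.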
Copy, paste, and run the following code to install the packages we will use throughout the course (this can take a
few minutes):
#Clear the workspace
rm(list = ls())

#To install R packages

packages <- c("abind", "AER", "astsa", "car", "caret", "cowplot", "censReg",
              "data.table", "dummies", "devtools", "dynlm", "effects",
              "fBasics", "fGarch", "forecast", "foreign",
              "ggplot2", "haven", "knitr",
              "lfe", "lubridate", "lmtest",
              "matrixcalc", "maps", "matrixStats", "mfx", "orcutt",
              "pdfetch", "plm", "RcppArmadillo", "quantmod", "quantreg",
              "readr", "readstata13", "roxygen2", "rmgarch", "rugarch",
              "rmarkdown", "sampleSelection", "sandwich", "seasonal", "stargazer", "Synth",
              "survival", "systemfit", "texreg", "tidyverse", "timeSeries", "tseries",
              "truncreg", "TTR", "urca", "vars", "VARsignR",
              "xtable", "xts", "zoo", "dplyr", "base")

install.packages(packages, dependencies = TRUE)

#List any packages that failed to install
packages[!(packages %in% installed.packages()[, "Package"])]

2 Introduction

2.1 How to get started

We don’t expect you to have prior knowledge of how to use R, and we will go through all code that is relevant
for the exam. One nice feature of R is that there is a lot of free online guidance to help you get familiar
with it. The one online introduction we recommend you do is the Introduction to R on
Datacamp.

2.1.1 Comments

We can write comments in R with the pound (hashtag) sign (#):


# This is a comment
# This is another comment
# R ignores commented out lines

2.1.2 Basic Code

Here we’ll execute some basic addition. The lines beginning with pound signs are comments; then we do the
addition and show the output from R. Hopefully, this is review for you.
2 + 3
3 + 4

2.1.3 Saving your program

All R programs end in “.r” or “.R”. It’s a good idea to always save your program in a place that’s easy to find.
We can also use an “Rmd” file to weave code and text; this document, for example, was created from an Rmd
document. Oftentimes, it’s convenient to save the program in the same folder as your data. That’s what I’ll
do here. This makes importing and exporting data very convenient. In the editor window, write the name of
the program on the first line.
Now save the program: go to File -> Save As, label your program “lab1.R”, choose the appropriate
folder, and save. If you’re using an “Rmd” document, save your file as “lab1.Rmd”.

2.1.4 Your Working directory

Your working directory is the base directory where R is going to look for files. To see your working directory,
use the getwd() function. Here’s my working directory for this file:
getwd()

To set your working directory to the location where you saved your file in RStudio, go to Session -> Set
Working Directory -> To Source File Location. If you want to set your working directory within your
program, use the setwd() function.
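For example, assuming a hypothetical folder path, you can set and then confirm the working directory from code like this:

```r
# Set the working directory (this path is just an illustration;
# replace it with the folder where you saved lab1.R)
setwd("C:/Users/yourname/Documents/lab1")
# Confirm that the working directory changed
getwd()
```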

2.2 Basics

2.2.1 Importing Data

Here we’ll use the caschool.csv dataset. We’re going to use the readr package, which makes reading
data very convenient. For reading in Excel files (we’ll primarily use csv files in this course), see the
readxl package. The caschool.csv file has information on California schools. The output below describes
the column types.
library(readr) #load the readr package
caschool <- read_csv("caschool.csv")

## Parsed with column specification:


## cols(
## `Observation Number` = col_double(),
## dist_cod = col_double(),
## county = col_character(),
## district = col_character(),
## gr_span = col_character(),
## enrl_tot = col_double(),
## teachers = col_double(),
## calw_pct = col_double(),
## meal_pct = col_double(),
## computer = col_double(),
## testscr = col_double(),
## comp_stu = col_double(),
## expn_stu = col_double(),
## str = col_double(),
## avginc = col_double(),
## el_pct = col_double(),
## read_scr = col_double(),
## math_scr = col_double()
## )
Let’s print the first few rows. (Simply typing the name of the dataset will also print it.)
print(caschool)

Get the names of the data. For now, don’t worry about what the variables mean.
names(caschool)

2.2.2 Get some summary statistics

summary(caschool)

2.2.3 Run a Regression

Here, we’ll use the lm() function to run a regression and look at a summary of the output. Note how the
<- operator assigns output to a variable. testscr is the student standardized test scores, meal_pct is the
percentage of students receiving a government subsidized lunch, and comp_stu is the number of computers
per student.

#Simple linear regression
model.1 <- lm(testscr ~ meal_pct, caschool)
summary(model.1)

#Multivariate regression
model.2 <- lm(testscr ~ meal_pct + comp_stu, data = caschool)
summary(model.2)

2.2.4 Draw a scatterplot

Here, we’ll use the ggplot2 package, which is automatically loaded with the tidyverse package.
library(tidyverse)
p <- ggplot(caschool, aes(x = meal_pct, y = testscr)) + #The basis for the plot
  geom_point() +                                        #Add points for a scatterplot
  geom_smooth(method = lm) +                            #Add a regression line
  labs(x = "Percentage of Students Receiving a Free Lunch",
       y = "Test Score",
       title = "California Schools")

print(p)

[Figure: "California Schools" scatterplot of Test Score against Percentage of Students Receiving a Free Lunch, with a fitted regression line.]

The fun thing about R is that there are many different ways to get things done. An alternative way to produce
the plot without using the ggplot2 package is to use the plot() function and add a fitted regression line with abline().
plot(caschool$meal_pct, y = caschool$testscr)
abline(lm(testscr ~ meal_pct, data = caschool), col = "red")

[Figure: base-R scatterplot of caschool$testscr against caschool$meal_pct with a red fitted regression line.]

2.2.5 Getting Help

To get help for a specific function, use ?function_name. You can search for a more general topic using the
help.search() function. Google and StackOverflow are also helpful.
?mean
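help.search() takes a quoted search string. For instance, to search the installed documentation for a general topic:

```r
# Search installed package documentation for a topic
help.search("linear model")
# The ?? operator is shorthand for the same search
??"linear model"
```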

2.2.6 The pipe operator

The R pipe operator %>% automatically pipes the output of one expression to the next expression. This makes
chaining R expressions together easy and convenient. The best way to see this is through an example.
Let’s say we want to generate random numbers from the normal distribution, then take their mean, and print
the output. Here’s a reasonable way to do this:
x <- rnorm(n = 1000, mean = 0, sd = 2)
norm.mean <- mean(x)
print(norm.mean)

Let’s do the same thing with the pipe operator. Again, the pipe operator “pipes” the output of the previous
expression into the next expression, and the . holds the output from the previous expression.
The following prints the output, just as above:
rnorm(n = 1000, mean = 0, sd = 2) %>%
mean(.) %>% #Take the mean
print(.) #Print

In fact, if the first argument of the function is the target of the pipe, we can omit the `.`.
The following again prints the output from the mean of randomly generated normal variables:
rnorm(n = 1000, mean = 0, sd = 2) %>%
mean %>% #Take the mean
print #Print

To see the power of pipes, let’s use dplyr::filter to regress testscr on meal_pct only for the county of
Los Angeles.
caschool %>%
filter(county == "Los Angeles") %>% #filter data
lm(testscr ~ meal_pct, data = .) %>% #run regression
summary %>% #take summary of regression output
print #print regression output

3 Basic Problems
Read chapters 1 and 2 of Applied Econometrics with R.
(1) Create a vector with values that range from 1 to 10, get the length of the vector, square each element
using sapply(), vector multiply x’x and print the first and last elements.
#Solution
#Vector from 1 to 10 -- a few different ways
x <- 1:10
x <- seq(from=1,to=10,by=1)
x <- c(1,2,3,4,5,6,7,8,9,10)
#Length of x
length(x)
#Square each element using sapply
sapply(x,function(number) number^2)
#also can just use
x^2
#vector multiply x'x
t(x)%*%x
#print the first and last elements
x[1]
x[length(x)]
x[10] ## length(x) == 10

(2) Load the “women” dataset into R by typing data(women)


(a) View the data, look at the first rows using the head() function, look at the last rows using the tail()
function, and view the class of the dataset.
#Solution
#Load in some preset data. The height and weight for
#15 women
data(women)
#view the data
women
#view the first six observations
head(women)
#view the last six observations
tail(women)
#view the class of the dataset
class(women)

(b) Select the first and second columns by the column name and by the column number. Select the first and last
rows by row number, and select the tenth row of the second column.

#Solution
#Select the first column in the "dataframe"
women[,1]
women[,"height"]
women$height
#Select the second column in the dataframe
women[,2]
women[,"weight"]
women$weight
#Select the first row
women[1,]
#Select the last row
women[15,]
women[nrow(women),]
#Note that women[i,j] selects the ith row and the jth column
women[10,2]

(c) Summarize the data using the summary() function; get the mean and standard deviations for each
column using the apply() and sapply() functions. What do the apply() and sapply() functions do?
#Solution
#summarize the data
summary(women)
#get the mean of the data
apply(women,2,mean) #2 applies the function on columns.
sapply(women,mean)
lapply(women, mean) #Results will be a list

#the standard deviation
apply(women,2,sd)
sapply(women,sd)
lapply(women, sd)

(d) Run a regression of height on weight using the lm() function, compare the results to those that use
White standard errors via the coeftest() function, and plot the data.
#Solution
#Run a linear regression of height on weight
women.model <- lm(height ~ weight,data=women)
#Look at a summary of the output
summary(women.model)
##White standard errors
require(lmtest);require(sandwich)
coeftest(women.model,vcov=sandwich)
# vcov = sandwich means that the variance / covariance matrix uses White
# standard errors (robustness checks)

#plot the data
plot(women)
pairs(women)

(3) Write your first function in R that is able to calculate the BMI,

BMI = (lbs / in^2) * 703.

Now add a column to the women dataframe with the BMI.

#Solution
#BMI function
bmi.func <- function(inches,lbs) {
  bmi <- lbs/(inches^2)*703
  return(bmi)
}
women$bmi <- bmi.func(women$height,women$weight)
women

(4) In this exercise we will replicate the output of the lm() function, so you will have to think back to the
classes on OLS you had with Ralf last semester. First clear the workspace (for a new question). Then
set the seed using set.seed(). Set n = 100 for the number of random draws that we’ll use and
generate two random draws from the normal distribution with n data points; call them x1 and x2.
(set.seed() makes sure that you get the same random draws every time you run the code.)
(a) Regress x1 on x2 and view a summary of the regression output
#Solution
#clear the workspace
rm(list=ls())
#Set the seed
set.seed(123456)
#the number of observations
n <- 100
#Two random series from the normal distribution
x1 <- rnorm(n)
x2 <- rnorm(n)
#(a) the model
model <- lm(x1 ~ x2)
#The output we will try and replicate
summary(model)

(b) Assign x1 to Y (the dependent variable) and a constant and x2 to X (the independent variables). Program the
OLS regression estimator using matrix notation, get the residuals, and compare them to the output
from R’s built-in functions.
#Solution
#The dependent variable
Y <- x1
#The matrix of independent variables. Notice how we include the constant
X <- matrix(c(rep(1,n),x2),nrow=n,ncol=2)
#The regression estimator
bhat <- solve(t(X)%*%X)%*%t(X)%*%Y # (X'X)^(-1)X'Y
bhat #same as output from R!
#Get the residuals
residuals <- Y - X%*%bhat
cor(residuals,model$residuals) #equals 1 same as R!

(c) Get the standard errors of the residuals, the variance-covariance matrix, and the standard errors for the
regression estimators, and compare them to R’s built-in functions.
#Solution
#Get the variance of the residuals --> note that we have two
#independent variables
resid.var <- as.numeric(t(residuals)%*%residuals)*(1/(n-2))
#Get the standard error of the residuals --> in regression analysis,
#this is just the square root of the variance of the residuals

se.resid <- sqrt(resid.var)
se.resid #Same as R!
#Get the variance-cov matrix
varCov <- solve(t(X)%*%X)*resid.var
varCov
#Compare this to the var-cov matrix from R
vcov(model)
#They are the same!
#Get the standard errors using our varCov matrix
se <- sqrt(diag(varCov))
se
#compare these to the standard errors from R
sqrt(diag(vcov(model))) #Same!!
#or
summary(model)

(d) Get the t-statistics, p-values for the t-statistics, F-statistic, and the p-value for the F-statistic.
#Solution
#Now get the t-stats. Note: our null is that the regression estimators
#are equal to zero. This will be a two-sided test
tstat <- bhat/se
tstat
#Compare these to R
summary(model) #Same!
#p-values. this is a two sided test
#First get the degrees of freedom. df = n-2 since
#we have one independent variable plus a constant
df <- n-2
#Use the cdf of the t distribution
pvals <- 2*(1 - pt(abs(tstat),df))
pvals #same as R!
#The F-statistic
#RSS under the null
RSS_0 <- as.numeric(t((Y-mean(Y)))%*%(Y-mean(Y)))
#RSS under the alternative
RSS_1 <- as.numeric(t(residuals)%*%residuals)
#The numerator of the F-stat
Fnum <- (RSS_0 - RSS_1)/(2-1)
#The denominator of the F-stat
Fdenom <- (RSS_1)/(n-2)
#The F-stat
F <- Fnum/Fdenom
F #Same as R!
#The p-value for the f-stat
dfnum <- 2-1 #the degrees of freedom for the numerator
dfdenom <- n-2 #the df for the denominator
Fpval <- 1 - pf(F,dfnum,dfdenom)
Fpval #Same as R

(e) Get the R^2 and adjusted R^2 statistics.


#Solution
#The R^2
#We already defined the Residual sum of squares above as RSS_1

#Define the total sum of squares
TSS <- as.numeric(t((Y-mean(Y)))%*%(Y-mean(Y)))
R2 <- 1 - RSS_1/TSS
R2 #same as R!
#The adjusted R^2 -- For the adjusted r-squared we
#divide the numerator and the denominator by the degrees
#of freedom.
R2_adj <- 1 - ((1/(n-2))*RSS_1)/((1/(n-1))*TSS)
R2_adj# Same as R!

(5) Use the quantmod package to download the stock prices for Apple and Microsoft from Yahoo finance.
Also download the S&P500. Use the adjusted close and convert the prices into returns. Report the
alpha and beta for these stocks from 2010-01-01 to 2014-12-31. Use a for loop. The 1-month treasury
can be downloaded from the FRED database.
#Solution
#Load the quantmod package
require(quantmod);
stocks <- c("AAPL","IBM","MSFT","SPY")
for (i in 1:length(stocks)) {
  getSymbols(stocks[i],from="2010-01-01",to="2014-12-31")
}
#Look at the results for apple
head(AAPL)
#Merge all of the data and convert to returns
all.data <- merge(AAPL[,6],IBM[,6],MSFT[,6],SPY[,6])
# We can also do : all.data <- cbind(AAPL[,6],IBM[,6],MSFT[,6],SPY[,6])
#Update the table names
names(all.data) <- c("AAPL","IBM","MSFT","SPY")
class(all.data) ##xts object (great for time series)
head(all.data)
#Convert to returns
all.data <- na.omit(diff(log(all.data))*100)
#remove the "SPY" from our list of stocks
stocks <- stocks[1:3]
#Now get the risk free rate -- 1 month treasury from the
#fred
getSymbols("DGS1MO",src="FRED")
head(DGS1MO)
#The interest rates are in annual terms; convert to daily terms
#(assume 252 trading days a year)
#using the following formula
# r = (1 + i)^{1/n} - 1
#where i is the annual interest rate
DGS1MO <- (1 + DGS1MO)^(1/252) - 1
#merge with the stock return data
all.data <- na.omit(merge(all.data,DGS1MO))
head(all.data)
#The alpha and beta from the summary function
for (i in 1:length(stocks)) {
  print(stocks[i])
  print(summary(lm((all.data[,i] - all.data$DGS1MO) ~
                     (all.data$SPY - all.data$DGS1MO))))
}

4 Time Series
In this part and later in the course we are using a data set offered by Professor Jesper Rangvid (CBS -
Department of Finance).
Find more information on Jesper Rangvid’s work with the data set using the link: Rangvid’s Research
(1)
(a) Import the data set “Rangvid_to_R.csv” and have a look at it.
#Solution
rm(list=ls())
library(readr) #load the readr package
ts.data <- read_csv("Rangvid_to_R.csv")

## Parsed with column specification:


## cols(
## .default = col_double()
## )
## See spec(...) for full column specifications.
head(ts.data)

## # A tibble: 6 x 24
## Year P D E R RLONG CPI RealR Inflation
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1871 4.44 0.26 0.4 6.35 5.32 12.5 1.05 0.0152
## 2 1872 4.86 0.3 0.43 7.81 5.36 12.7 1.05 0.0223
## 3 1873 5.11 0.33 0.46 8.35 5.58 12.9 1.13 -0.0451
## 4 1874 4.66 0.33 0.46 6.86 5.47 12.4 1.15 -0.0717
## 5 1875 4.54 0.3 0.36 4.96 5.07 11.5 1.11 -0.0596
## 6 1876 4.46 0.3 0.28 5.33 4.59 10.8 1.04 0.00874
## # ... with 15 more variables: average_Inflation_12 <dbl>, `Real
## # RLONG` <dbl>, RealP <dbl>, RealD <dbl>, Return <dbl>,
## # `ln(1+ret)` <dbl>, RealE <dbl>, P_D <dbl>, P_E <dbl>, E10 <dbl>,
## # P_E10 <dbl>, ln_1_CAPE <dbl>, `LN(E_P)` <dbl>, `LN(D_P)` <dbl>,
## # `10y_return` <dbl>
(b) To see what the different variables mean, please open the Excel file on Learn that is called “Rangvid.xlsx”.
Looking at the first 6 variables you will find that the numbers are imported from a database, but all
the other variables are generated by formulas. Click into the cells of the different variables to see what formula
Jesper used. Try to generate the following variables yourself.
Inflation: Inflation
Price-Dividend ratio: P/D

#Solution
# add inflation to dataframe
library(dplyr); library(base); library(fBasics); library(xts);library(quantmod)

## Loading required package: timeDate


## Loading required package: timeSeries
## Loading required package: zoo
##
## Attaching package: 'zoo'

## The following object is masked from 'package:timeSeries':
##
## time<-
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## Attaching package: 'xts'
## The following objects are masked from 'package:dplyr':
##
## first, last
## Loading required package: TTR
##
## Attaching package: 'TTR'
## The following object is masked from 'package:fBasics':
##
## volatility
## Version 0.4-0 included new data defaults. See ?getSymbols.
ts.data$Inflation_replicated <- log(lead(ts.data$CPI,1)/ts.data$CPI)
ts.data$PD_replicated <- ts.data$RealP/Lag(ts.data$RealD)

(c) Calculate the mean, standard deviation, median, min, and max of all variables in the data set.
#Solution
summary(ts.data)

## Year P D E
## Min. :1871 Min. : 3.25 Min. : 0.180 Min. : 0.1600
## 1st Qu.:1908 1st Qu.: 7.39 1st Qu.: 0.400 1st Qu.: 0.5225
## Median :1944 Median : 15.98 Median : 0.820 Median : 1.3300
## Mean :1944 Mean : 248.63 Mean : 5.142 Mean : 12.4887
## 3rd Qu.:1980 3rd Qu.: 117.85 3rd Qu.: 5.650 3rd Qu.: 13.6825
## Max. :2017 Max. :2580.00 Max. :43.390 Max. :102.3100
## NA's :2 NA's :1
## R RLONG CPI RealR
## Min. : 0.120 Min. : 1.880 Min. : 6.47 Min. :0.8529
## 1st Qu.: 2.955 1st Qu.: 3.305 1st Qu.: 10.14 1st Qu.:0.9927
## Median : 4.405 Median : 3.860 Median : 18.20 Median :1.0201
## Mean : 4.561 Mean : 4.554 Mean : 56.40 Mean :1.0265
## 3rd Qu.: 5.695 3rd Qu.: 5.125 3rd Qu.: 82.40 3rd Qu.:1.0497
## Max. :17.630 Max. :14.590 Max. :242.84 Max. :1.2510
## NA's :1 NA's :2
## Inflation average_Inflation_12 Real RLONG
## Min. :-0.17022 Min. :-0.03499 Min. :-0.033482
## 1st Qu.: 0.00000 1st Qu.: 0.01329 1st Qu.: 0.009039
## Median : 0.02186 Median : 0.02600 Median : 0.022423
## Mean : 0.02029 Mean : 0.02554 Mean : 0.022130
## 3rd Qu.: 0.03952 3rd Qu.: 0.04461 3rd Qu.: 0.037427
## Max. : 0.18805 Max. : 0.07618 Max. : 0.068269
## NA's :1 NA's :22 NA's :10
## RealP RealD Return ln(1+ret)

## Min. : 76.87 Min. : 4.743 Min. :-0.36543 Min. :-0.45481
## 1st Qu.: 158.50 1st Qu.: 8.069 1st Qu.:-0.03489 1st Qu.:-0.03552
## Median : 237.97 Median :12.065 Median : 0.08472 Median : 0.08132
## Mean : 475.79 Mean :14.009 Mean : 0.08188 Mean : 0.06487
## 3rd Qu.: 564.79 3rd Qu.:17.702 3rd Qu.: 0.20996 3rd Qu.: 0.19059
## Max. :2517.07 Max. :43.390 Max. : 0.51445 Max. : 0.41505
## NA's :2 NA's :2 NA's :2
## RealE P_D P_E E10
## Min. : 4.065 Min. :10.12 Min. : 5.633 Min. : 8.547
## 1st Qu.: 12.217 1st Qu.:18.57 1st Qu.:11.860 1st Qu.:13.848
## Median : 19.583 Median :23.44 Median :14.731 Median :17.641
## Mean : 28.052 Mean :27.79 Mean :15.678 Mean :26.679
## 3rd Qu.: 36.164 3rd Qu.:30.63 3rd Qu.:18.123 3rd Qu.:36.764
## Max. :103.715 Max. :85.42 Max. :58.171 Max. :79.690
## NA's :1 NA's :2 NA's :2 NA's :10
## P_E10 ln_1_CAPE LN(E_P) LN(D_P)
## Min. : 5.288 Min. :0.02324 Min. :0.01704 Min. :0.01164
## 1st Qu.:11.737 1st Qu.:0.04816 1st Qu.:0.05371 1st Qu.:0.03213
## Median :16.185 Median :0.05995 Median :0.06568 Median :0.04177
## Mean :16.782 Mean :0.06687 Mean :0.07040 Mean :0.04240
## 3rd Qu.:20.266 3rd Qu.:0.08177 3rd Qu.:0.08095 3rd Qu.:0.05246
## Max. :42.537 Max. :0.17319 Max. :0.16342 Max. :0.09421
## NA's :10 NA's :10 NA's :2 NA's :2
## 10y_return Inflation_replicated PD_replicated.Lag.1
## Min. :-0.04438 Min. :-0.17022 Min. :10.12195
## 1st Qu.: 0.03542 1st Qu.: 0.00000 1st Qu.:18.56738
## Median : 0.06348 Median : 0.02226 Median :23.44444
## Mean : 0.06273 Mean : 0.02034 Mean :27.79440
## 3rd Qu.: 0.09731 3rd Qu.: 0.03952 3rd Qu.:30.62564
## Max. : 0.16436 Max. : 0.18805 Max. :85.41582
## NA's :12 NA's :1 NA's :2
basicStats(ts.data)

## Year P D E R
## nobs 147.000000 1.470000e+02 147.000000 147.000000 147.000000
## NAs 0.000000 0.000000e+00 2.000000 1.000000 1.000000
## Minimum 1871.000000 3.250000e+00 0.180000 0.160000 0.120000
## Maximum 2017.000000 2.580000e+03 43.390000 102.310000 17.630000
## 1. Quartile 1907.500000 7.390000e+00 0.400000 0.522500 2.955000
## 3. Quartile 1980.500000 1.178500e+02 5.650000 13.682500 5.695000
## Mean 1944.000000 2.486320e+02 5.142276 12.488699 4.561336
## Median 1944.000000 1.598000e+01 0.820000 1.330000 4.405000
## Sum 285768.000000 3.654891e+04 745.630000 1823.350000 665.955000
## SE Mean 3.511885 4.127184e+01 0.711612 1.922375 0.237117
## LCL Mean 1937.059302 1.670646e+02 3.735721 8.689202 4.092684
## UCL Mean 1950.940698 3.301995e+02 6.548830 16.288196 5.029987
## Variance 1813.000000 2.503947e+05 73.426770 539.546916 8.208744
## Stdev 42.579338 5.003945e+02 8.568942 23.228149 2.865090
## Skewness 0.000000 2.414899e+00 2.252251 2.381911 1.033260
## Kurtosis -1.224516 5.383880e+00 4.910606 4.895781 2.735285
## RLONG CPI RealR Inflation
## nobs 147.000000 147.000000 147.000000 147.000000
## NAs 0.000000 0.000000 2.000000 1.000000
## Minimum 1.880000 6.469903 0.852945 -0.170224

## Maximum 14.590000 242.839000 1.250995 0.188055
## 1. Quartile 3.305000 10.140290 0.992715 0.000000
## 3. Quartile 5.125000 82.400000 1.049694 0.039523
## Mean 4.554014 56.397438 1.026502 0.020292
## Median 3.860000 18.200000 1.020051 0.021865
## Sum 669.440000 8290.423402 148.842798 2.962649
## SE Mean 0.185907 5.724091 0.005391 0.004817
## LCL Mean 4.186597 45.084657 1.015847 0.010771
## UCL Mean 4.921431 67.710220 1.037157 0.029813
## Variance 5.080543 4816.486356 0.004213 0.003388
## Stdev 2.254006 69.400910 0.064910 0.058208
## Skewness 1.798725 1.398465 0.504977 -0.046549
## Kurtosis 3.641912 0.511163 1.781835 1.567576
## average_Inflation_12 Real.RLONG RealP RealD
## nobs 147.000000 147.000000 1.470000e+02 147.000000
## NAs 22.000000 10.000000 0.000000e+00 2.000000
## Minimum -0.034988 -0.033482 7.686637e+01 4.742906
## Maximum 0.076181 0.068269 2.517072e+03 43.390000
## 1. Quartile 0.013292 0.009039 1.585010e+02 8.068878
## 3. Quartile 0.044606 0.037427 5.647888e+02 17.702364
## Mean 0.025544 0.022130 4.757868e+02 14.008674
## Median 0.026002 0.022423 2.379662e+02 12.065169
## Sum 3.193027 3.031841 6.994066e+04 2031.257673
## SE Mean 0.002385 0.002104 4.224384e+01 0.611852
## LCL Mean 0.020825 0.017969 3.922984e+02 12.799302
## UCL Mean 0.030264 0.026291 5.592752e+02 15.218045
## Variance 0.000711 0.000607 2.623277e+05 54.282605
## Stdev 0.026660 0.024628 5.121793e+02 7.367673
## Skewness -0.309209 -0.165027 1.908424e+00 1.237537
## Kurtosis -0.406066 -0.553745 2.940391e+00 1.788320
## Return ln.1.ret. RealE P_D P_E
## nobs 147.000000 147.000000 147.000000 147.000000 147.000000
## NAs 2.000000 2.000000 1.000000 2.000000 2.000000
## Minimum -0.365431 -0.454809 4.065422 10.121951 5.632812
## Maximum 0.514446 0.415050 103.714805 85.415818 58.170699
## 1. Quartile -0.034893 -0.035517 12.216946 18.567376 11.859587
## 3. Quartile 0.209964 0.190591 36.163679 30.625641 18.122807
## Mean 0.081876 0.064875 28.052208 27.794395 15.678130
## Median 0.084718 0.081320 19.583138 23.444444 14.731343
## Sum 11.872086 9.406832 4095.622415 4030.187287 2273.328860
## SE Mean 0.014605 0.014089 1.850316 1.203044 0.559712
## LCL Mean 0.053008 0.037026 24.395133 25.416487 14.571818
## UCL Mean 0.110745 0.092723 31.709284 30.172303 16.784443
## Variance 0.030930 0.028784 499.855915 209.860816 45.425215
## Stdev 0.175870 0.169658 22.357458 14.486574 6.739823
## Skewness -0.091411 -0.599082 1.580540 1.817487 2.595212
## Kurtosis -0.075190 0.441570 2.040474 3.293026 12.049427
## E10 P_E10 ln_1_CAPE LN.E_P. LN.D_P.
## nobs 147.000000 147.000000 147.000000 147.000000 147.000000
## NAs 10.000000 10.000000 10.000000 2.000000 2.000000
## Minimum 8.547449 5.288354 0.023237 0.017045 0.011639
## Maximum 79.690087 42.537002 0.173192 0.163420 0.094214
## 1. Quartile 13.848084 11.736667 0.048165 0.053711 0.032131
## 3. Quartile 36.764130 20.265948 0.081767 0.080953 0.052458

## Mean 26.679018 16.782367 0.066872 0.070396 0.042403
## Median 17.640883 16.184741 0.059953 0.065678 0.041769
## Sum 3655.025484 2299.184261 9.161417 10.207375 6.148405
## SE Mean 1.520858 0.564689 0.002308 0.002090 0.001342
## LCL Mean 23.671429 15.665661 0.062306 0.066265 0.039749
## UCL Mean 29.686608 17.899073 0.071437 0.074526 0.045056
## Variance 316.882292 43.685656 0.000730 0.000633 0.000261
## Stdev 17.801188 6.609513 0.027020 0.025165 0.016165
## Skewness 1.193412 1.044700 1.286196 0.954997 0.390897
## Kurtosis 0.738935 1.866967 1.954151 1.173184 0.197390
## X10y_return Inflation_replicated PD_replicated
## nobs 147.000000 147.000000 147.000000
## NAs 12.000000 1.000000 2.000000
## Minimum -0.044383 -0.170224 10.121951
## Maximum 0.164355 0.188055 85.415818
## 1. Quartile 0.035418 0.000000 18.567376
## 3. Quartile 0.097314 0.039523 30.625641
## Mean 0.062735 0.020339 27.794395
## Median 0.063484 0.022264 23.444444
## Sum 8.469211 2.969549 4030.187287
## SE Mean 0.004117 0.004817 1.203044
## LCL Mean 0.054592 0.010818 25.416487
## UCL Mean 0.070877 0.029861 30.172303
## Variance 0.002288 0.003388 209.860816
## Stdev 0.047834 0.058209 14.486574
## Skewness -0.202640 -0.048963 1.817487
## Kurtosis -0.521449 1.567493 3.293026
(d) Show the graphs of the following variables:
Price-Earnings-Ratio 10year earnings: P_E
Price-Dividend-Ratio: P_D
Dividend-Price-Ratio: 1/(P_D)
Log inverse of cyclically adjusted price 10year earnings: ln_1_CAPE
S&P500 price: P
S&P500 return: Return
Long interest rate (10 year): RLONG
Finally plot the return and the long interest rate against time simultaneously
#Solution
class(ts.data$Year)

## [1] "numeric"
#The "Year" variable is numeric. To be compatible with the xts format,
#it needs to be in the "Date" format.
ts.data$Date <- as.Date(as.yearmon(ts.data$Year))
#Convert into xts format
xts.data <- xts(ts.data[,-c(1,ncol(ts.data))],ts.data$Date)
#Plot Return against time
plot(xts.data$P_E)

[Figure: time series plot of xts.data$P_E, 1871-01-01 to 2017-01-01.]
plot(xts.data$P_D)

[Figure: time series plot of xts.data$P_D, 1871-01-01 to 2017-01-01.]

plot(1/xts.data$P_D)

[Figure: time series plot of 1/xts.data$P_D, 1871-01-01 to 2017-01-01.]
plot(xts.data$ln_1_CAPE)

[Figure: time series plot of xts.data$ln_1_CAPE, 1871-01-01 to 2017-01-01.]

plot(xts.data$P)

[Figure: time series plot of xts.data$P, 1871-01-01 to 2017-01-01.]
plot(xts.data$Return)

[Figure: time series plot of xts.data$Return, 1871-01-01 to 2017-01-01.]

#Plot the short rate (R) and the long rate (RLONG) against time simultaneously
plot(cbind(xts.data$R,xts.data$RLONG),plot.type="single",col=c("red","blue"))

[Figure: time series plot of R (red) and RLONG (blue), 1871-01-01 to 2017-01-01.]
(e) Repeat (d) for two variables of your choice, but now for the sample periods 1891-2016 & 1946-2016. We
chose Return and Price.
plot(xts.data$Return["1891-01-01/2016-01-01",])

[Figure: time series plot of xts.data$Return, 1891-01-01 to 2016-01-01.]
plot(xts.data$P["1891-01-01/2016-01-01",])

[Figure: time series plot of xts.data$P, 1891-01-01 to 2016-01-01.]

plot(xts.data$Return["1946-01-01/2016-01-01",])

[Figure: time series plot of xts.data$Return, 1946-01-01 to 2016-01-01.]
plot(xts.data$P["1946-01-01/2016-01-01",])

[Figure: time series plot of xts.data$P, 1946-01-01 to 2016-01-01.]

plot(cbind(xts.data$R["1946-01-01/2016-01-01",],xts.data$RLONG["1946-01-01/2016-01-01",]),
plot.type="single",col=c("red","blue"))

[Figure: time series plot of R (red) and RLONG (blue), 1946-01-01 to 2016-01-01.]
(f) Create the first differences of stock price, earnings and dividend and plot them. What do you think are
important features of these graphs?
#Calculate the first difference for the Price, Earnings and Dividend variables
xts.data$d.P <- diff(xts.data$P)
xts.data$d.E <- diff(xts.data$E)
xts.data$d.D <- diff(xts.data$D)
#Plot the first difference for the Price, Earnings and Dividend variables
plot(xts.data$d.P)

[Figure: time series plot of xts.data$d.P, 1871-01-01 to 2017-01-01.]
plot(xts.data$d.E)

[Figure: time series plot of xts.data$d.E, 1871-01-01 to 2017-01-01.]

plot(xts.data$d.D)

[Figure: time series plot of xts.data$d.D, 1871-01-01 to 2017-01-01.]
# We can see that over time the time series get more volatile.
#In 2008, when the financial crisis started, we can see a sharp drop.
