Basic R Easy
Basic R
Brigid Wilson brigid@stat.ucla.edu
April 5, 2010
Outline
I. Preliminaries
II. Variable Assignment
III. Working with Vectors
IV. Working with Matrices
V. From Vectors to Matrices
VI. More on Handling Missing Data
VII. The Help System
VIII. Datasets in R
IX. Overview of Plots
X. R Environment
XI. Common Bugs and Fixes
XII. Online Resources for R
XIII. Exercises
XIV. Upcoming Mini-Courses
Brigid Wilson brigid@stat.ucla.edu Basic R UCLA SCC
Part I Preliminaries
Installing R on Mac
1. Go to http://cran.r-project.org and select MacOS X.
2. Select to download the latest version: 2.10.1.
3. Install and open R.
[Figure: the R window on Mac]
Installing R on Windows
1. Go to http://cran.r-project.org and select Windows.
2. Select base to install the R system.
3. Click on the large download link. There is other information available on the page.
4. Install and open R.
[Figure: the R window on Windows]
Variable Assignment
Creating Variables I
To use R as a calculator, type an expression and press ENTER. (Note how R prints the result.) Your output should look like this:
    2 + 5
    [1] 7
Creating Variables II
To create variables in R, use either <- or =:
    # Assign a value to a
    a <- -2
    # Is a less than -5?
    a <-5
    a
    [1] 5
    # Expected FALSE, but R read "a <-5" as the assignment a <- 5
Creating Variables IV
Use spaces so that R does not confuse a comparison with an assignment: write a < -5, not a <-5. It is better still to use parentheses: a < (-5).
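The effect of spacing and parentheses can be seen in a minimal sketch. It reuses the variable a from the example above; the names r1 and r2 are made up for illustration.

```r
a <- -2           # assign -2 to a
r1 <- a < -5      # with a space, "<" is a comparison
r2 <- a < (-5)    # parentheses make the intent unambiguous
r1                # FALSE
r2                # FALSE
a                 # still -2: nothing was reassigned
```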
Creating Variables V
Caution! It is important not to name your variables after existing variables or functions. For example, a bad habit is to name your data frames data, because data is a function used to load some datasets.
If you give a variable the same name as an existing constant, the variable masks that constant, and the name then returns the variable's value. So it is possible to define a new value for pi.
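A short sketch of what happens when a constant's name is reused, assuming the constant in question is pi. The variable names circle.area and restored are made up for illustration.

```r
pi                        # the built-in constant, 3.141593
pi <- 3                   # a variable named pi now hides the constant
circle.area <- pi * 2^2   # silently uses 3, so the result is 12
rm(pi)                    # remove the variable; the constant is visible again
restored <- pi            # 3.141593 once more
base::pi                  # the built-in value can always be reached this way
```

Removing the variable with rm() is enough to get the original value back, because the built-in constant itself is never changed.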
Creating Variables VI
Caution! On the other hand, if you give a variable the same name as an existing function, R will treat the identifier as a variable when it is used as a variable, and as a function when it is used as a function:
    c <- 2      # typing c yields 2
    c(c, c)     # yields a vector containing two 2s
Caution! As we have seen, you can get away with giving a variable the same name as an existing function. You will be in trouble, however, if you define a function whose name is already used by an existing function: your version is called instead of the original.
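To illustrate the second caution, here is a hedged sketch in which a user-defined function takes the name of the existing function mean(). The replacement function and the variable names are hypothetical.

```r
# Hypothetical example: a user-defined mean() hides the built-in mean()
mean <- function(x) "not the real mean"
masked <- mean(c(1, 2, 3))           # calls the user-defined version
original <- base::mean(c(1, 2, 3))   # the built-in version is still reachable: 2
rm(mean)                             # remove the user-defined version
after <- mean(c(1, 2, 3))            # the built-in mean() again: 2
```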
Creating Vectors I
A scalar is the most basic vector: a vector of length one. To create vectors of length greater than one, use the concatenation function c():
    d = c(3, 4, 7); d
    [1] 3 4 7
The More You Know... The semicolon ; is used to combine multiple statements on one line.
Creating Vectors II
To create a null vector:
    x = c(); x
    NULL
To create a vector of missing values, use rep():
    f = rep(NA, 6); f
    [1] NA NA NA NA NA NA
To find the length of a vector, use length():
    length(d)
    [1] 3
To find the maximum value of the vector, use the maximum function max():
    max(d)
    [1] 7
Similarly, min() returns the minimum value:
    min(d)
    [1] 3
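Caution! Although T and F work in place of TRUE and FALSE, using them is not recommended: T and F are ordinary variables that can be reassigned, whereas TRUE and FALSE are reserved words.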
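A minimal sketch of why T and F are risky; the variable name is.true is made up for illustration.

```r
T <- 0                  # allowed, because T is just a variable
is.true <- isTRUE(T)    # FALSE: T no longer stands for TRUE
rm(T)                   # remove the variable; T is TRUE again
# TRUE <- 0             # would be an error, because TRUE is a reserved word
```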
    unique(g)
    [1] 2 6 7 4 5 9 3
    duplicated(g)
    [1] FALSE FALSE FALSE FALSE FALSE
    [7] FALSE FALSE TRUE TRUE TRUE
TRUE
    sum(is.na(a))
    [1] 1
    summary(a)
    Median 2.50
    Max.   6.00
There are many, many other functions you can use on vectors!
Comparisons in R
    Symbol   Meaning
    !        logical NOT
    &        logical AND
    |        logical OR
    <        less than
    <=       less than or equal to
    >        greater than
    >=       greater than or equal to
    ==       logical equals
    !=       not equal
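To extract a single element, give its position in square brackets:
    d[2]
    [1] 4
To extract the elements that satisfy a condition, put the condition in the brackets:
    d[d >= 2]
    [1] 3 5 7 9
R will return the values of d for which the expression within the brackets is TRUE. Think of these statements as: give me all d such that d >= 2.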
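The comparison and logical operators can be combined inside the brackets. The following sketch assumes the vector d = c(3, 4, 7) created earlier; the particular conditions are chosen only for illustration.

```r
d <- c(3, 4, 7)
d >= 4                # FALSE  TRUE  TRUE
d[d >= 4]             # 4 7
d[d > 3 & d < 7]      # 4     (both conditions must hold)
d[d == 3 | d == 7]    # 3 7   (either condition may hold)
d[!(d == 4)]          # 3 7   (negation)
d[d != 4]             # 3 7   (same result, written with "not equal")
```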
Exercise 1
1. Create a vector of the positive odd integers less than 100.
2. Remove the values greater than 60 and less than 80.
3. Find the variance of the remaining set of values.
Creating Matrices I
    mat <- matrix(10:15, nrow = 3, ncol = 2); mat
         [,1] [,2]
    [1,]   10   13
    [2,]   11   14
    [3,]   12   15
Alternatively, we can find the number of rows and the number of columns of the matrix with nrow() and ncol().
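A minimal sketch, assuming mat is the 3-by-2 matrix created above. The call to dim() is included for comparison and returns both numbers at once.

```r
mat <- matrix(10:15, nrow = 3, ncol = 2)
nrow(mat)    # 3
ncol(mat)    # 2
dim(mat)     # 3 2
```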
    mat %*% t(mat)
         [,1] [,2] [,3]
    [1,]  269  292  315
    [2,]  292  317  342
    [3,]  315  342  369

    mat[1, ]
    [1] 10 13

    mat[, 2]
    [1] 13 14 15
To extract elements 1 and 3 from the second column, use c() and [ ]:
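A minimal sketch of this extraction, assuming mat is the 3-by-2 matrix created above: the row positions go in c() before the comma and the column number after it.

```r
mat <- matrix(10:15, nrow = 3, ncol = 2)
mat[c(1, 3), 2]    # 13 15
```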
To count the missing values in a matrix h, use sum() and is.na():
    sum(is.na(h))
    [1] 2
To obtain the positions (element numbers) of the missing values in the matrix, use which() and is.na():
    which(is.na(h))
    [1] 1 6
To drop the rows that contain missing values, use na.omit():
    na.omit(h)
         [,1] [,2]
    [1,]    1    7
    attr(,"na.action")
    [1] 1 3
    attr(,"class")
    [1] "omit"
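The matrix h is not defined in the text above. The sketch below builds a hypothetical 3-by-2 matrix that is consistent with the printed output (the values 5 and 2 are made up) and repeats the same steps. The variable names and the arr.ind = TRUE variant are additions for illustration.

```r
# Hypothetical h: NA in elements 1 and 6, counted down the columns
h <- matrix(c(NA, 1, 5, 2, 7, NA), nrow = 3)

n.missing <- sum(is.na(h))                     # 2
where <- which(is.na(h))                       # 1 6
where.rc <- which(is.na(h), arr.ind = TRUE)    # row and column of each NA
complete <- na.omit(h)                         # keeps only row 2: 1 7
```

Element numbers run down the first column and then down the second, which is why the NA in row 3, column 2 is element 6.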
Exercise 2
Getting Help in R
    ?read.table
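Besides the ? shortcut, a few related functions may be useful. This is a hedged sketch rather than a complete tour of the help system; the search term and the variable name found are chosen only for illustration.

```r
help("read.table")                  # the same as ?read.table
found <- apropos("^read\\.")        # names of visible objects starting with "read."
found
if (interactive()) {
  help.search("read table")         # search the documentation; same as ??"read table"
}
```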
Datasets in R
To read a data set from a website, pass its address to read.table():
    stock.data <- read.table("http://www.google.com/finance/historical?q=NASDAQ:AAPL&output=csv",
                             header = TRUE, sep = ",")
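To read a data set stored on your computer:
1. Find out which folder R is currently working in:
    getwd()
2. Tell R in what folder the data set is stored (if different from the folder found in step 1), using setwd(). Suppose your data set is on your desktop.
3. Now use the read.table() command to read in the data, substituting the name of the file for the website address.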
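The steps above can be sketched as follows. To keep the example runnable, it first writes a small made-up file to a temporary folder; the file name mydata.csv, its contents, and the variable names are all hypothetical.

```r
# Create a hypothetical data file in a temporary folder
folder <- tempdir()
write.csv(data.frame(x = 1:3, y = c(2.5, 4.0, 5.5)),
          file = file.path(folder, "mydata.csv"), row.names = FALSE)

old <- getwd()     # step 1: the current working directory
setwd(folder)      # step 2: e.g. setwd("~/Desktop") for a file on a Mac desktop
my.data <- read.table("mydata.csv", header = TRUE, sep = ",")   # step 3
setwd(old)         # return to the original folder
my.data
```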
To use a data set that comes with a package, first load the package:
    library(alr3)
Load the data set you want from that package, using the data() function. In our case, the data set is called UN2.
    data(UN2)
To use the variable names when working with data, use attach():
    attach(UN2)
    names(UN2)
To read the documentation for the data set:
    ?UN2
    # Make a copy of the data set
    UN2.copy <- UN2
    detach(UN2)
    attach(UN2.copy)
    # Change the 10th observation for logFertility
    UN2.copy[10, 2] <- 999
    # Check that the change has been made
    summary(UN2)
    summary(UN2.copy)
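The same copy-then-modify pattern can be tried without installing alr3. The sketch below uses women, a small data frame built into R, as a stand-in for UN2; the name women.copy is made up for illustration.

```r
# women (columns height and weight) stands in for UN2 here
women.copy <- women
women.copy[10, 2] <- 999     # change the 10th observation of column 2 (weight)
max(women$weight)            # 164: the original is unchanged
max(women.copy$weight)       # 999: only the copy was modified
```

Note that attach() works from a snapshot of the data frame, so a change made to the data frame after attaching it is not seen through the attached variable names.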
Purban 55.538860
    cor(UN2[, 1:2])
                  logPPgdp logFertility
    logPPgdp      1.000000    -0.677604
    logFertility -0.677604     1.000000
Exercise 3
1. Load the Animals dataset from the MASS package.
2. Examine the documentation for this dataset.
3. Find the correlation coefficient of brain weight and body weight in this dataset.
Overview of Plots in R
Basic scatterplot I
    plot(x = UN2$logPPgdp, y = UN2$logFertility,
         main = "Fertility vs. PerCapita GDP",
         xlab = "log PerCapita GDP, in $US",
         ylab = "Log Fertility")
Basic scatterplot II
[Figure: scatterplot "Fertility vs. PerCapita GDP"; x-axis: log PerCapita GDP, in $US; y-axis: Log Fertility]
Histogram I
    hist(UN2$logPPgdp,
         main = "Distribution of PerCapita GDP",
         xlab = "log PerCapita GDP")
Histogram II
[Figure: histogram "Distribution of PerCapita GDP"; x-axis: log PerCapita GDP; y-axis: Frequency]
Boxplot I
Boxplot II
[Figure: boxplot "Boxplot of PerCapita GDP"]
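The call that produced this boxplot is not shown. A hedged sketch of a similar call uses boxplot() on the built-in women data as a stand-in for UN2; the title, label, and variable name b are made up. For the UN2 data the first argument would presumably be UN2$logPPgdp.

```r
if (!interactive()) pdf(NULL)    # no screen device is needed when run as a script

# women$height stands in for UN2$logPPgdp
b <- boxplot(women$height,
             main = "Boxplot of height",
             ylab = "height (in)")
b$stats[3, 1]                    # the median drawn in the box: 65

if (!interactive()) dev.off()
```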
Matrix of Scatterplots I
To make scatterplots of all the numeric variables in your dataset in R, you can use pairs():
    pairs(UN2)
Matrix of Scatterplots II
[Figure: scatterplot matrix of logPPgdp, logFertility, and Purban]
We then add in the points where % urban is at least 50 and mark these points with a different color.
    points(logPPgdp[Purban >= 50], logFertility[Purban >= 50], col = "red")
[Figure: scatterplot with x-axis "log PPGDP"; the points with Purban >= 50 are drawn in red]
Overlaying
Caution! Once a plot is constructed using plot, whatever is contained in the plot cannot be modified. To overlay things on a rendered plot, use one of the following:
1. abline - add a line with slope b and intercept a, or a horizontal/vertical line.
2. points - add points.
3. lines - add lines.
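The three overlay functions can be sketched on a small made-up data set. The values of x and y, the colors, and the line types are arbitrary choices for illustration.

```r
if (!interactive()) pdf(NULL)    # no screen device is needed when run as a script

x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)             # made-up values

plot(x, y, main = "Overlay example")         # draw the base plot first
abline(a = 0, b = 2)                         # line with intercept a = 0 and slope b = 2
abline(h = mean(y), lty = 2)                 # horizontal line at the mean of y
abline(v = 3, lty = 3)                       # vertical line at x = 3
points(x[y >= 6], y[y >= 6], col = "red")    # re-mark some points in red
lines(x, y, col = "blue")                    # connect the points with line segments

if (!interactive()) dev.off()
```

Each overlay call draws on top of the existing plot, so plot() must be called first.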
Part X R Environment
Exploring R Objects I
To see the names of the objects available to be saved (in your current workspace), use ls().
    ls()
    [1] "UN2" "a" "b" "d" "data" "e" "f" "h" "mat1" "mat2"
Exploring R Objects II
To remove objects from your workspace, use rm().
    rm(d)
    ls()
    [1] "UN2" "a" "b" "data" "e" "f" "h" "mat1" "mat2"
To remove all objects from your workspace at once:
    rm(list = ls())
    ls()
    character(0)
To save (to the current directory) all the objects in the workspace, use save.image().
To save (to the current directory) a single object in the workspace, use save().
To save (to the current directory) certain objects in the workspace to be used in Excel, use write.csv().
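The three ways of saving can be sketched as follows. The objects x and scores and all file names are made up, and the files are written to a temporary folder so that the example leaves nothing behind; giving just the file name, without file.path(), writes to the current directory instead.

```r
x <- c(1, 2, 3)
scores <- data.frame(id = 1:3, score = c(90, 85, 77))    # made-up data

out <- tempdir()
save.image(file = file.path(out, "workspace.RData"))     # every object in the workspace
save(x, file = file.path(out, "x.RData"))                # a single object
write.csv(scores, file = file.path(out, "scores.csv"),   # a table that Excel can open
          row.names = FALSE)

rm(x)
load(file.path(out, "x.RData"))                          # brings x back
x
```

Files written by save() and save.image() are read back with load(); a file written by write.csv() can be read back with read.csv().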
Saving R Commands I
To see all of the commands you typed in an R session, click on the yellow and green tablet icon.
Saving R Commands II
To save all of the commands you typed in an R session, use:
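You can also keep your commands in a script file:
1. Go to: File -> New Document.
2. Type your commands.
3. Save the file as "code.r".
4. Go back to the R Console.
5. To run all the commands, use source() with the name of the file.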
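A hedged sketch of both steps, assuming that savehistory() and source() are the intended functions. The history file name is made up, and a tiny code.r is written to a temporary folder so that the example is runnable.

```r
# Save the commands typed so far (only meaningful in an interactive session)
if (interactive()) try(savehistory(file = "mycommands.Rhistory"))

# Create a small script file, then run every command in it
script <- file.path(tempdir(), "code.r")
writeLines(c("# commands saved in a script",
             "total <- 2 + 5"),
           script)
source(script)     # runs all the commands in the file
total              # 7
```

If code.r is in the current working directory, source("code.r") is enough.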
The More You Know... Use the # sign to write comments in your code. Use them!
Error: syntax error
Possible causes:
- Incorrect spelling (of the function, variable, etc.)
- Including a + when copying code from the Console
- Having an extra parenthesis at the end of a function
- Having an extra bracket when subsetting
Trailing +
Possible causes:
- Not closing a function call with a parenthesis
- Not closing brackets when subsetting
- Not closing a function you wrote with a squiggly brace
You can escape this sticky situation by hitting the ESCAPE key to exit your command.
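Both situations can be reproduced safely with parse(), which checks the syntax of a command without running it. The helper check() is hypothetical and exists only for this sketch; the exact wording of the messages may differ between R versions.

```r
# Hypothetical helper: returns "ok" or the syntax error message
check <- function(code) {
  tryCatch({
    parse(text = code)
    "ok"
  }, error = function(e) conditionMessage(e))
}

fine <- check("max(c(1, 2))")            # "ok"
extra.paren <- check("max(c(1, 2)))")    # extra parenthesis: an "unexpected ')'" error
unclosed <- check("max(c(1, 2)")         # unclosed call: "unexpected end of input"
```

At the Console, the unclosed call does not produce an error; R shows the trailing + and waits for the rest of the command.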
CRAN
http://cran.stat.ucla.edu/
http://www.rseek.org
R Bootcamp is a day-long introduction to R. Handouts and datasets from Bootcamp 2008 can be found on Ryan Rosario's website: http://www.stat.ucla.edu/rosario/boot08/
Exercises
Solutions I
    e1 <- seq(from = 1, to = 100, by = 2)
    e1.2 <- e1[e1 <= 60 | e1 >= 80]
    var(e1.2)
    [1] 931.282
Solutions II
    A <- matrix(c(2, 3, 7, 1, 6, 2, 3, 5, 1),
                nrow = 3, byrow = TRUE)
    B <- matrix(c(3, 2, 9, 0, 7, 8, 5, 8, 2),
                nrow = 3, byrow = TRUE)
    A %*% B
         [,1] [,2] [,3]
    [1,]   41   81   56
    [2,]   13   60   61
    [3,]   14   49   69
Solutions III
    library(MASS)
    data(Animals)
    ?Animals
    cor(Animals)
                  body        brain
    body   1.000000000 -0.005341163
    brain -0.005341163  1.000000000
Upcoming Mini-Courses
April 7 - R Programming II: Data Manipulation and Functions
April 12 - LaTeX I: Writing a Document, Paper, or Thesis
April 14 - LaTeX II: Bibliographies, Style and Math in LaTeX
April 19 - LaTeX III: Sweave, Embedding R in LaTeX
For a schedule of all mini-courses offered please visit http://scc.stat.ucla.edu/mini-courses .