Rbasics

R Programming

What is R?
• R is a scripting language for statistical data manipulation, analysis, graphical representation and reporting.
• R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Core Team.
• R is freely available and offers great graphical facilities for data analysis and visualization.
• R documentation and manuals are available at
http://cran.r-project.org/manuals.html
• An alphabetical list of R packages is available at
https://cran.r-project.org/web/packages/available_packages_by_name.html
R features

• R is case sensitive.
• Commands are separated either by a semicolon ";" or by a new line.
• If a command is not complete at the end of a line, R keeps reading the following lines until the command is syntactically complete.
• The up and down arrow keys can be used to scroll backward and forward through the command history.
• help() or ? can be used for accessing the documentation of any function, e.g. help(mean) or ?mean
Objects and Workspace
• The entities that R creates and manipulates are known as objects.
• These may be variables, arrays of numbers, character strings, functions or more general structures built from such components.
• During an R session, objects are created and stored by name.
• The ls() command displays the names of the objects currently stored within R.
• The collection of objects currently stored is called the workspace.
• The rm() function removes an object from the current workspace.
• If the workspace is saved when the session ends, all objects are written to a file called .RData in the current directory and the commands used are saved in a file called .Rhistory.
• When R is later started from the same directory, the objects and the command history are automatically reloaded from these files.
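The workspace commands above can be tried out with a short sketch such as the following. The object names `temp_value` and `greeting` are made up for illustration.

```r
# Create two objects in the current workspace (names are illustrative)
temp_value <- 42
greeting <- "hello"

# List the names of the objects currently stored
print(ls())

# Remove one object and confirm that it is gone
rm(temp_value)
print(exists("temp_value", inherits = FALSE))  # FALSE
```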
Basic Data Types

• Numeric
• Integer
• Logical
• Character
• Complex
• Double (the default storage type of numeric values)

• The class() function reveals the class of a variable, i.e. whether it is numeric, integer or any other type
• is.numeric() can be used to check whether a value is numeric (similarly is.integer(), is.logical() and so on)
Numeric and Integer
• > a <- 6.5 # assign a value to the variable "a"
> a # print the value of "a"
[1] 6.5
> class(a) # print the class of "a"
[1] "numeric"
• > a <- 5
> class(a)
[1] "numeric"
> is.integer(a)
[1] FALSE
• The class of a is numeric, not integer

• > a <- 5L # the suffix "L" specifies that the number is an integer
> class(a)
[1] "integer"
> is.integer(a)
[1] TRUE
> is.numeric(a)
[1] TRUE
Logical
• TRUE, FALSE and NA are logical
• > class(TRUE)
[1] "logical"
• TRUE and FALSE can be abbreviated to T and F respectively
• A logical value is often created via a comparison between two values
> a <- 7 < 9
> a
[1] TRUE
> a <- 7 > 9
> a
[1] FALSE
> is.logical(a)
[1] TRUE
• NA stands for "Not Available" and represents missing data
Character
• Used to represent string values
> x <- "This is a character string"
> x
[1] "This is a character string"
> class(x)
[1] "character"
• The paste() function concatenates two or more strings (separated by a space by default)
> conc_string <- paste("1st string", "2nd string", "and 3rd string")
> conc_string
[1] "1st string 2nd string and 3rd string"
• The substr() function extracts a substring; its general form is substr(string_name, start, stop)
> substr(conc_string, 3, 14) # same as substr(conc_string, start = 3, stop = 14)
[1] "t string 2nd"
Coercion of data type
• In R a value of one type can be converted to another type whenever the conversion is possible
• Example
> as.numeric(TRUE)
[1] 1
> as.character(5)
[1] "5"
> as.numeric("6.5")
[1] 6.5
> as.integer(6.5)
[1] 6
> as.numeric("abc")
[1] NA
Warning message:
NAs introduced by coercion
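When text input may contain values that cannot be converted, the resulting NA entries can be detected with is.na(). The following is a minimal sketch with made-up input values; suppressWarnings() is used here only to keep the console quiet.

```r
# Hypothetical raw input: two valid numbers and one invalid entry
raw_values <- c("6.5", "abc", "10")

# Coerce to numeric; "abc" cannot be converted and becomes NA
parsed <- suppressWarnings(as.numeric(raw_values))

print(is.na(parsed))               # FALSE TRUE FALSE
print(mean(parsed, na.rm = TRUE))  # 8.25, the mean of 6.5 and 10
```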
Data Structures
• Vectors

• Matrix

• List

• Factors

• Data Frames
Vectors
• A vector is a sequence of data elements of the same basic data type.

• The c() function is used for creating a vector
> c(1,2,3,4,5) # Numeric vector
[1] 1 2 3 4 5
> c("abc", "xyz") # Character vector
[1] "abc" "xyz"
> x <- c(TRUE, TRUE, FALSE, TRUE) # Logical vector
> x
[1]  TRUE  TRUE FALSE  TRUE

• The length() function gives the number of members (elements) of a vector
> length(x)
[1] 4
Combining Vectors
• Vectors can be combined by simply using the c() function
> x <- c(1,2,3)
> y <- c("four", "five", "six")
> c(x, y)
[1] "1"    "2"    "3"    "four" "five" "six"

• The code above is also an example of value coercion in vectors: the numbers are converted to character strings

Atomic Vectors
• R does not provide a separate data structure to hold a single element or variable

• A single value assigned to a variable is stored as a vector of length 1; such vectors are called atomic vectors
> x <- 5
> is.vector(x)
[1] TRUE
Vector Arithmetic
• Arithmetic operations on vectors are performed member by member. Example:
> x <- c(1,2,5,8)
> 2*x
[1]  2  4 10 16
> y <- c(2,5,6,3)
> x+y
[1]  3  7 11 11
> x*y
[1]  2 10 30 24
> x^2
[1]  1  4 25 64

• Similarly, subtraction and division give new vectors via memberwise operations.
Vector Arithmetic
• Recycling rule (recycling of members)
> x <- c(1,2,2,6)
> y <- c(2,4)
> x+y
[1]  3  6  4 10

• Here y is treated as a vector of length 4, i.e. c(2,4,2,4)

> x <- c(x,9)
> x
[1] 1 2 2 6 9
> x+y
[1]  3  6  4 10 11
Warning message:
In x + y : longer object length is not a multiple of shorter object length

• Now y is treated as a vector of length 5, i.e. c(2,4,2,4,2)
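One way to see the recycling rule at work is to build the recycled vector explicitly. The sketch below uses the base function rep_len(), chosen here for illustration, and the helper name `recycled_y` is made up.

```r
x <- c(1, 2, 2, 6)
y <- c(2, 4)

# Repeat the members of y until the result is as long as x
recycled_y <- rep_len(y, length.out = length(x))
print(recycled_y)        # 2 4 2 4

# Adding the explicitly recycled vector gives the same result as x + y
print(x + recycled_y)    # 3 6 4 10
print(identical(x + y, x + recycled_y))
```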


Labelling of Vector elements
• Suppose a vector holds the counts of the outcomes of 100 coin tosses
> coin.toss <- c(47, 53)
> coin.toss
[1] 47 53

• The names() function can be used to label the members of a vector
> labels <- c("Heads", "Tails")
> names(coin.toss) <- labels
> coin.toss
Heads Tails
   47    53

• Alternatively, the labels can be given when the vector is created
> coin.toss <- c(Heads = 47, Tails = 53)
Subset of Vector elements
• We can retrieve values in a vector by giving an index inside the single square bracket operator "[]".
> x <- c("abc", "xyz", "lmn")
> x[2]
[1] "xyz"

• Unlike in many other programming languages, the square bracket operator returns more than just an individual member. The result of the square bracket operator is another vector, and x[2] is a vector slice containing the single member "xyz".
Negative Index
• If the index is negative, the member whose position equals the absolute value of the index is removed from the result.
> x <- c("abc", "xyz", "lmn")
> x[-2]
[1] "abc" "lmn"

Out of Range Index
• If an index is out of range, a missing value is reported via the symbol NA.
> x <- c("abc", "xyz", "lmn")
> x[5]
[1] NA
Subset of Multiple elements
• A new vector can be sliced from a given vector with a numeric index vector, which consists of the positions of the members to be retrieved.
> x <- c("abc", "xyz", "lmn")
> x[c(1,3)]
[1] "abc" "lmn"
> x[c(3,1)]
[1] "lmn" "abc"

• The range (colon) operator ":" can be used to subset a range of indices
> x[c(1:3)]
[1] "abc" "xyz" "lmn"

• The range operator can also be used to create a vector
> c(1:11)
[1]  1  2  3  4  5  6  7  8  9 10 11
Subset by Name/label
• > dice.roll <- c(One = 3, Two = 6, Three = 1, Four = 5, Five = 6, Six = 3)
> dice.roll
  One   Two Three  Four  Five   Six
    3     6     1     5     6     3
> dice.roll["Six"]
Six
  3
> dice.roll[c("Six", "One")]
Six One
  3   3
Subset using Logical Vector
• A new vector can be sliced from a given vector with a logical index vector, which has the same length as the original vector. Its members are TRUE if the corresponding members of the original vector are to be included in the slice, and FALSE otherwise.
> y <- c(1:6)
> y
[1] 1 2 3 4 5 6
> y[c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE)]
[1] 1 4 6

• If the logical vector is shorter than the original vector, its members are recycled
> y[c(TRUE, FALSE)]
[1] 1 3 5
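In practice the logical index vector is rarely typed by hand; it usually comes from a comparison on the vector itself. The following is a minimal sketch, with the helper name `is_large` made up for illustration.

```r
y <- 1:6

# A comparison produces a logical vector of the same length as y
is_large <- y > 3
print(is_large)         # FALSE FALSE FALSE TRUE TRUE TRUE

# Use it as an index to keep only the matching members
print(y[is_large])      # 4 5 6

# The condition can also be written directly inside the brackets
print(y[y %% 2 == 0])   # 2 4 6
```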
Matrix
• Vector: 1D array of data elements of same type
• Matrix: 2D array of data elements of same type
• The matrix() function can be used to build a matrix; its general form is
matrix(data = x, nrow = i, ncol = j, byrow = FALSE, dimnames = NULL)
> mat1 <- matrix(1:6, nrow = 2)
> mat1
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> mat1 <- matrix(1:6, ncol = 3)
> mat1
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
> mat1 <- matrix(1:6, nrow = 2, byrow = TRUE)
> mat1
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
Matrix
• > mat2 <- matrix(1:6, nrow = 3, ncol = 4)
> mat2
[,1] [,2] [,3] [,4]
[1,] 1 4 1 4
[2,] 2 5 2 5
[3,] 3 6 3 6
• Naming the rows and columns of a matrix
> rownames(mat2) <- c("row1", "row2", "row3")
> colnames(mat2) <- c("col1", "col2", "col3", "col4")
> mat2
     col1 col2 col3 col4
row1    1    4    1    4
row2    2    5    2    5
row3    3    6    3    6
OR
> mat2 <- matrix(1:6, nrow = 3, ncol = 4, dimnames = list(c("row1", "row2", "row3"), c("col1", "col2", "col3", "col4")))
OR
> dimnames(mat2) <- list(c("row1", "row2", "row3"), c("col1", "col2", "col3", "col4"))
Matrix (rbind & cbind)
• cbind() function can be used for adding new column to a matrix
> mat1 <- matrix(1:6, ncol = 3)
> mat1
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
> cbind(mat1, c(7,8))
[,1] [,2] [,3] [,4]
[1,] 1 3 5 7
[2,] 2 4 6 8
• rbind() function can be used for adding a new row to a matrix
> rbind(mat1, c(1,2)) # the vector is shorter than a row, so its values are recycled (R also gives a warning here)
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
[3,]    1    2    1
Matrix (rbind & cbind)
• While adding a new row or column to a matrix using rbind() or cbind(), if fewer elements are passed than required, the elements are recycled
• If more elements are passed than required, R uses only as many values as are needed and ignores the rest
• rbind() and cbind() can also be used to create a new matrix from scratch
> rbind(c(1:5), c(6:10), c( 11:15))
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 6 7 8 9 10
[3,] 11 12 13 14 15
> cbind(c(1:3), c(3:5))
[,1] [,2]
[1,] 1 3
[2,] 2 4
[3,] 3 5
• Two matrices can be attached using rbind() or cbind(). For rbind() both matrices must have the same number of columns, and for cbind() they must have the same number of rows.
• > A <- matrix(1:6, ncol = 3)
> B <- matrix(7:12, ncol = 3)
> rbind(A,B)
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
[3,] 7 9 11
[4,] 8 10 12
> cbind(A,B)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 3 5 7 9 11
[2,] 2 4 6 8 10 12
> cbind(A, rbind(A,B))
Error in cbind(A, rbind(A, B)) :
number of rows of matrices must match (see arg 2)
• If a row or column added to the matrix has a different data type, coercion happens
> rbind(A, c(TRUE, FALSE, FALSE))
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
[3,] 1 0 0
Subset Matrix
• > mat1 <- matrix(1:15, nrow = 3, byrow = TRUE)
> mat1
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 6 7 8 9 10
[3,] 11 12 13 14 15
• Suppose we want to select "8", which is in the 2nd row and 3rd column
> mat1[2,3]
[1] 8
• For selecting entire row
> mat1[2, ]
[1] 6 7 8 9 10
• For selecting entire column
> mat1[ ,3]
[1] 3 8 13
• If the comma is left out, R counts the elements column by column up to the given index
> mat1[10]
[1] 4
Subset Multiple elements
• > A <- matrix(LETTERS[1:16], ncol = 4, byrow = TRUE)
# LETTERS and letters are built-in vectors containing the uppercase and lowercase alphabet respectively
> A
     [,1] [,2] [,3] [,4]
[1,] "A"  "B"  "C"  "D"
[2,] "E"  "F"  "G"  "H"
[3,] "I"  "J"  "K"  "L"
[4,] "M"  "N"  "O"  "P"
> A[2, c(2,4)]
[1] "F" "H"
> A[c(1,4), 4]
[1] "D" "P"
> A[c(2,4), c(1,3,4)] # creates a sub-matrix with the elements from rows 2 & 4 and columns 1, 3 & 4
     [,1] [,2] [,3]
[1,] "E"  "G"  "H"
[2,] "M"  "O"  "P"
> A[ c(1,2), c(3,4) ]
[,1] [,2]
[1,] "C" "D"
[2,] "G" "H"
Subset by Names and Logicals
• > A <- matrix(LETTERS[1:16], ncol = 4, byrow = TRUE)
> rownames(A) <- c("r1", "r2", "r3", "r4")
> colnames(A) <- c("c1", "c2", "c3", "c4")
> A
   c1  c2  c3  c4
r1 "A" "B" "C" "D"
r2 "E" "F" "G" "H"
r3 "I" "J" "K" "L"
r4 "M" "N" "O" "P"
> A["r1", "c2"]
[1] "B"
> A[2, "c3"] # a combination of index and name can also be used
[1] "G"
• Using logicals
> A[c(TRUE, FALSE), c(TRUE, FALSE, FALSE, TRUE)] # the row vector is shorter than the number of rows, so its elements are recycled
   c1  c4
r1 "A" "D"
r3 "I" "L"
Matrix Arithmetic
• Very similar to vectors
• Operations happen elementwise
> A <- matrix(1:12, nrow = 3, byrow = TRUE)
> A
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
> A * 10 # all other operations with a single number work elementwise in the same way
     [,1] [,2] [,3] [,4]
[1,]   10   20   30   40
[2,]   50   60   70   80
[3,]   90  100  110  120
> A * A # for an elementwise operation between two matrices their dimensions must be the same
     [,1] [,2] [,3] [,4]
[1,]    1    4    9   16
[2,]   25   36   49   64
[3,]   81  100  121  144
Matrix Arithmetic
• > A - c(1,5) # the elements of the vector are recycled column by column until an equivalent matrix is obtained
[,1] [,2] [,3] [,4]
[1,] 0 -3 2 -1
[2,] 0 5 2 7
[3,] 8 5 10 7
• rowSums() and colSums() calculate the sums of each row and each column of the matrix respectively
• For algebraic matrix multiplication the %*% operator is used; it works only when the number of columns of the first matrix equals the number of rows of the second matrix
• The t() function gives the transpose of a matrix
• The solve() function gives the inverse of a square, invertible matrix
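The functions listed above can be tried on a small matrix. The following is a minimal sketch; the 2 x 2 matrix `M` is made up for illustration and was chosen to be invertible.

```r
# A small invertible matrix, filled column by column
M <- matrix(c(2, 1, 1, 3), nrow = 2)

print(rowSums(M))   # 3 4
print(colSums(M))   # 3 4
print(t(M))         # transpose (M is symmetric, so it equals M)

# Inverse and algebraic product; M %*% M_inv should be the identity matrix
M_inv <- solve(M)
print(M %*% M_inv)
print(isTRUE(all.equal(M %*% M_inv, diag(2))))
```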
Factors
• This data structure is used for handling categorical variables (e.g. blood group, gender etc.)
• Factors are useful for variables which have a limited number of unique values.
• Suppose the blood groups of 15 people are stored in a vector
> blood.group <- c("A", "AB", "A", "O", "A", "B", "B", "O", "AB", "A", "B", "O", "AB", "O", "O")
> blood.group
[1] "A" "AB" "A" "O" "A" "B" "B" "O" "AB" "A" "B" "O" "AB" "O" "O"
• The factor() function converts the above vector into a factor
> B.G.factor <- factor(blood.group)
> B.G.factor
[1] A AB A O A B B O AB A B O AB O O
Levels: A AB B O
# The levels are the different unique categories. By default they are sorted alphabetically
> str(B.G.factor) # str() shows the structure of any R object
Factor w/ 4 levels "A","AB","B","O": 1 2 1 4 1 3 3 4 2 1 ...
• Each level is stored as an integer, because this requires much less space (repeating large strings for every observation can take up a lot of space).
• Basically, factors are integer vectors with factor levels associated with them.
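The integer codes behind a factor can be inspected directly. The sketch below uses a shortened, made-up blood group vector; levels(), as.integer() and table() are base R functions.

```r
blood_group <- c("A", "AB", "A", "O", "B")
bg_factor <- factor(blood_group)

print(levels(bg_factor))       # "A" "AB" "B" "O"
print(as.integer(bg_factor))   # 1 2 1 4 3, the position of each value in the levels

# Looking the codes up in the levels recovers the original strings
print(levels(bg_factor)[as.integer(bg_factor)])

# table() counts the observations per level
print(table(bg_factor))
```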
Factors
• Setting order of levels manually
> B.G.factor2 <- factor(blood.group, levels = c("O","A","B","AB") )
> B.G.factor2
[1] A AB A O A B B O AB A B O AB O O
Levels: O A B AB
> str(B.G.factor2)
Factor w/ 4 levels "O","A","B","AB": 2 4 2 1 2 3 3 1 4 2 . . .
> str(B.G.factor)
Factor w/ 4 levels "A","AB","B","O": 1 2 1 4 1 3 3 4 2 1 . . .
• Renaming factor levels
> levels(B.G.factor) <- c("BT_A", "BT_AB", "BT_B", "BT_O")
OR
> B.G.factor <- factor(blood.group, labels = c("BT_A", "BT_AB", "BT_B", "BT_O"))
> B.G.factor
[1] BT_A BT_AB BT_A BT_O BT_A BT_B BT_B BT_O BT_AB BT_A BT_B BT_O BT_AB BT_O BT_O
Levels: BT_A BT_AB BT_B BT_O
# Both methods above have the limitation that the labels must be given in the same order as the factor levels, i.e. A, AB, B, O
# To avoid relying on the default order, levels and labels can be specified together
> B.G.factor <- factor(blood.group, levels = c("O","A","B","AB"), labels = c("BT_O", "BT_A", "BT_B","BT_AB"))
Factors
• Comparison operators do not work on simple (unordered) factors
> B.G.factor[1] < B.G.factor[2] # the comparison does not work because these are nominal variables
[1] NA
Warning message:
In Ops.factor(B.G.factor[1], B.G.factor[2]) :
‘<’ not meaningful for factors
• Ordinal variable in factors
> tshirt <- c("L", "M", "M", "S","L","M")
> tshirt_size <- factor(tshirt, ordered = TRUE, levels = c("S","M","L") ) #levels are specified in ascending order
> tshirt_size
[1] L M M S L M
Levels: S < M < L
> str(tshirt_size)
Ord.factor w/ 3 levels "S"<"M"<"L": 3 2 2 1 3 2
> tshirt_size[1] > tshirt_size[2]
[1] TRUE
List
• A list is a vector containing other R objects. A list can store practically anything, i.e. numeric, character, vector, matrix, factor etc.
• A list can even contain other lists
• No coercion happens
• Some functionality is lost compared to vectors and matrices, e.g. performing calculations on a list is a tedious job
Example:
> c(pdb_id = "5FAC", Protein = "Alanine Racemase", resolution = 2.8, seq_len = 410, uniprot_id = "O86786") # coercion will happen
            pdb_id            Protein         resolution            seq_len         uniprot_id
            "5FAC" "Alanine Racemase"              "2.8"              "410"           "O86786"

> list("5FAC", "Alanine Racemase", 2.8, 410, "O86786")
[[1]]
[1] "5FAC"

[[2]]
[1] "Alanine Racemase"

[[3]]
[1] 2.8

[[4]]
[1] 410

[[5]]
[1] "O86786"
List
• Assigning labels
> protein <- list(pdb_id = "5FAC", protein = "Alanine Racemase", resolution = 2.8, seq_len = 410, uniprot_id = "O86786")
OR
> names(protein) <- c("pdb_id", "protein", "resolution", "seq_len", "uniprot_id")
> protein
$pdb_id
[1] "5FAC"

$protein
[1] "Alanine Racemase"

$resolution
[1] 2.8

$seq_len
[1] 410

$uniprot_id
[1] "O86786"

> str(protein)
List of 5
 $ pdb_id    : chr "5FAC"
 $ protein   : chr "Alanine Racemase"
 $ resolution: num 2.8
 $ seq_len   : num 410
 $ uniprot_id: chr "O86786"
• List can store any type of object
> v <- c(1,2,3) #numeric vector
> v1 <- c("a", "b", "c") # character vector
> v2 <- c(TRUE, FALSE, TRUE, TRUE) #logical vector
> m1 <- matrix( c(9,5,6,7), nrow = 2, byrow = TRUE ) #matrix
> list1 <- list(v, v1, v2, m1) # list1 will have copies of v, v1, v2, m1
> str(list1)
List of 4
$ : num [1:3] 1 2 3
$ : chr [1:3] "a" "b" "c"
$ : logi [1:4] TRUE FALSE TRUE TRUE
$ : num [1:2, 1:2] 9 6 5 7
• A list can store other lists
> protein <- list(pdb_id = "5FAG", protein = "Alanine Racemase", resolution = 1.51, seq_len = 410, uniprot_id = "O86786", prev_pro = protein)
> str(protein)
List of 6
 $ pdb_id    : chr "5FAG"
 $ protein   : chr "Alanine Racemase"
 $ resolution: num 1.51
 $ seq_len   : num 410
 $ uniprot_id: chr "O86786"
 $ prev_pro  :List of 5
  ..$ pdb_id    : chr "5FAC"
  ..$ protein   : chr "Alanine Racemase"
  ..$ resolution: num 2.8
  ..$ seq_len   : num 410
  ..$ uniprot_id: chr "O86786"
Subset List
• > protein <- list(pdb_id = "5FAG", protein = "Alanine Racemase", resolution = 1.51, seq_len = 410, uniprot_id = "O86786", prev_pro = protein)
# Subset the 1st element, i.e. "pdb_id"
> protein[1] # or protein["pdb_id"]
$pdb_id
[1] "5FAG"
# output is a list of length 1
> protein[[1]] # or protein[["pdb_id"]]
[1] "5FAG"
# output is a character string
# [ ] gives a sublist and [[ ]] gives a single element
> protein[c(1, 3)] # output is a sublist
$pdb_id
[1] "5FAG"

$resolution
[1] 1.51

> protein[[c(1, 3)]]
Error in protein[[c(1, 3)]] : subscript out of bounds
# [[ ]] with a vector indexes recursively, so separate elements have to be accessed separately
Subset List
• [[c(1,3)]] is equivalent to [[1]][[3]], which means "from the 1st element of the list select its 3rd element". The 1st element is a vector of length 1 ("5FAG"), which has no 3rd element, hence the error "subscript out of bounds"
• Selecting sub-elements
> protein[[6]][[1]] # or protein[[c(6, 1)]] or protein[["prev_pro"]][["pdb_id"]]
[1] "5FAC"

> v <- c(1,2,3)
> v1 <- c("a", "b", "c")
> v2 <- c(TRUE, FALSE, TRUE, TRUE)
> m1 <- matrix(c(9,5,6,7), nrow = 2, byrow = TRUE)
> list1 <- list(v, v1, v2, m1)
> list1[[2]][[3]]
[1] "c"
# selecting elements of the matrix stored in the list
> list1[[4]][2, 1] # 2nd row and 1st column of the matrix
[1] 6
> list1[[4]][2, ]
[1] 6 7
Subset List
• Subsetting with a logical vector works only with single brackets
> protein <- list(pdb_id = "5FAG", protein = "Alanine Racemase", resolution = 1.51, seq_len = 410, uniprot_id = "O86786")
> protein[c(TRUE, FALSE, TRUE, TRUE, FALSE)]
$pdb_id
[1] "5FAG"

$resolution
[1] 1.51

$seq_len
[1] 410

> protein[[c(T, F, T, T, F)]] # treated like protein[[T]][[F]][[T]][[T]][[F]]
Error in protein[[c(T, F, T, T, F)]]
: recursive indexing failed at level 2
# a logical vector inside double brackets does not make sense
Subset List using “$” and Extending List
• Selection using the $ sign works only with labelled elements of a list
> protein <- list(pdb_id = "5FAG", protein = "Alanine Racemase", resolution = 1.51, seq_len = 410, uniprot_id = "O86786", prev_pro = protein)
> protein$resolution
[1] 1.51
• Selection of sub-elements
> protein$prev_pro$resolution
[1] 2.8
• A list can be extended using $ or [[ ]] (double square brackets)
> ligands <- c("Sodium ion", "Propanoic acid", "Nitrate ion", "PYRIDOXAL-5'-PHOSPHATE")
> protein$ligands <- ligands # or protein[["ligands"]] <- ligands
> str(protein)
List of 7
 $ pdb_id    : chr "5FAG"
 $ protein   : chr "Alanine Racemase"
 $ resolution: num 1.51
 $ seq_len   : num 410
 $ uniprot_id: chr "O86786"
 $ prev_pro  :List of 5
  ..$ pdb_id    : chr "5FAC"
  ..$ protein   : chr "Alanine Racemase"
  ..$ resolution: num 2.8
  ..$ seq_len   : num 410
  ..$ uniprot_id: chr "O86786"
 $ ligands   : chr [1:4] "Sodium ion" "Propanoic acid" "Nitrate ion" "PYRIDOXAL-5'-PHOSPHATE"
Datasets
• Observations
• Variables
Example:
Name                                  Activity  CID       Structure
17-alpha-Ethynylestradiol             0         5991      OC4(C#C)C3(C)C(C2C(C1=C(C=C(O)C=C1)CC2)CC3)CC4
1-Chloro-10 11-dehydroamitriptyline   1         52944020  ClC3=C2C(C(=CCCN(C)C)C1=C(C=CC=C1)C=C2)=CC=C3
1-Chloroamitriptyline                 1         103696    ClC3=C2C(C(=CCCN(C)C)C1=C(C=CC=C1)CC2)=CC=C3
1-Phenylpiperazine                    0         7096      N2(C1=CC=CC=C1)CCNCC2
3-Methylcholanthrene                  0         1674      C15=C2C(=C(C)C=C1)CCC2=C3C(C4=C(C=C3)C=CC=C4)=C5
4-Cyano-5-chlorophenyl-amidinourea    1         30760     ClC1=C(C#N)C=CC(NC(=O)NC(=N)N)=C1
5-Hydroxydopamine                     1         114772    OC1=C(O)C=C(CCN)C=C1O
Abacavir                              0         441300    OCC4C=CC(N3C2=NC(N)=NC(NC1CC1)=C2N=C3)C4
ABT-518 (parent)                      0         9827497   S(=O)(=O)(C2=CC=C(OC1=CC=C(OC(F)(F)F)C=C1)C=C2)CC(N(O)C=O)C3OC(C)(C)OC3

• Each row represents an observation
• Name, Activity, CID and Structure are the variables of the above dataset
• A matrix cannot be used for storing this dataset because the columns have different data types
• A list can be used, but it is not practical because of the structure of lists (even retrieving a small piece of information requires a lot of code)
Data frames
• Data frames are the fundamental data structure for storing datasets
• They can contain elements of different types, but all elements within one column must be of the same data type
• All the columns must have equal length
• Creating a data frame with data.frame()
> v <- c(1,2,3)
> v1 <- c("a", "b", "c")
> v2 <- c(TRUE, FALSE, TRUE)
> df <- data.frame(v1, v, v2)
# In R versions before 4.0.0 character columns were converted to factors by default, and data.frame(v1, v, v2, stringsAsFactors = FALSE) was needed to keep them as character. Since R 4.0.0 they stay character by default.
> df
  v1 v    v2
1  a 1  TRUE
2  b 2 FALSE
3  c 3  TRUE
# columns are named automatically; row names can be assigned using the rownames() function
> str(df)
'data.frame': 3 obs. of 3 variables:
 $ v1: chr "a" "b" "c" (in R versions before 4.0.0 this column is shown as a Factor w/ 3 levels)
 $ v : num 1 2 3
 $ v2: logi TRUE FALSE TRUE
# the output is similar to that of a list because a data frame is, under the hood, basically a list in which each column is stored as a vector
Data frames
• In most cases data frames are created by importing datasets from external sources such as CSV files, xlsx files, SQL databases etc.
Example:
> df <- read.csv("c:/mydataset.csv", header = TRUE, sep = ",")
• Subsetting data frames
Data frames are basically an intersection between a matrix and a list, therefore the syntax of both can be used to subset them, i.e.
[ ] from matrix subsetting
[[ ]] and $ from list subsetting
> v <- c(1,2,3)
> v1 <- c("a", "b", "c")
> v2 <- c(TRUE, FALSE, TRUE)
> df <- data.frame(v1, v, v2)
> df
  v1 v    v2
1  a 1  TRUE
2  b 2 FALSE
3  c 3  TRUE
> df[3, 1] # or df[3, "v1"]
[1] "c"
> df[1, ] # result is a data frame
  v1 v   v2
1  a 1 TRUE
> df[ , 1] # or df[ , "v1"]; result is a vector
[1] "a" "b" "c"
• The only difference between matrix and data frame subsetting arises when an index is given inside single square brackets without a comma
> df[3] # output is a data frame (the 3rd column), not a single element as in the case of a matrix
     v2
1  TRUE
2 FALSE
3  TRUE
• This is because a data frame is a list under the hood, and therefore list syntax for subsetting works as well
> df$v1 # or df[["v1"]] or df[[1]]
[1] "a" "b" "c"
# single brackets give a list, i.e. a data frame, as in the df[3] case above
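Matrix-style subsetting combines naturally with logical conditions on a column, which is a common way of filtering observations. The following is a minimal sketch on the same small data frame; subset() is a base R convenience function that the slides do not cover.

```r
df <- data.frame(
  v1 = c("a", "b", "c"),
  v  = c(1, 2, 3),
  v2 = c(TRUE, FALSE, TRUE)
)

# Keep the rows where v is greater than 1 (all columns)
print(df[df$v > 1, ])

# Use the logical column v2 as the row index and pick one column
print(df[df$v2, "v1"])

# subset() expresses the same kind of filter with column names
print(subset(df, v > 1, select = c(v1, v)))
```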
• Extending a data frame
adding a column = adding a new variable
adding a row = adding a new observation
# adding a column
> x <- c(45, 56, 89)
> df$x <- x # or df[["x"]] <- x or df <- cbind(df, x)
> df
  v1 v    v2  x
1  a 1  TRUE 45
2  b 2 FALSE 56
3  c 3  TRUE 89
• Similarly, rbind() can be used for adding observations (rows), but a vector cannot be used for a new row because the columns of a data frame have different data types and a vector can store only one type of element
• Therefore either a list or a data frame has to be passed to rbind() for adding new observations; a data frame must use the same column names as df
Example:
> df <- rbind(df, list("b", 5, FALSE, 50))
or
> df <- rbind(df, data.frame(v1 = "b", v = 5, v2 = FALSE, x = 50))
> df
  v1 v    v2  x
1  a 1  TRUE 45
2  b 2 FALSE 56
3  c 3  TRUE 89
4  b 5 FALSE 50
Sorting data frame
• > names <- c("Ankita", "Aman", "Ravi", "Pankaj", "Lokesh")
> age <- c(20, 23, 28, 19, 22)
> weight <- c(52, 68, 65, 57, 82)
> people <- data.frame(names, age, weight)
• For sorting this data frame according to age in ascending order
> ranks <- order(people$age)
> ranks
[1] 4 1 5 2 3
> people[ranks, ]
names age weight
4 Pankaj 19 57
1 Ankita 20 52
5 Lokesh 22 82
2 Aman 23 68
3 Ravi 28 65
or
> people[order(people$age), ]
• For sorting in descending order (here by weight)
> people[order(people$weight, decreasing = TRUE), ]
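order() also accepts several keys, which are used in turn to break ties. The sketch below uses a made-up variant of the people data with repeated ages, and negates the numeric weight column to sort that key in descending order.

```r
# Hypothetical data with tied ages (not the values from the example above)
people <- data.frame(
  name   = c("Ankita", "Aman", "Ravi", "Pankaj", "Lokesh"),
  age    = c(20, 23, 20, 19, 23),
  weight = c(52, 68, 65, 57, 82)
)

# Sort by age ascending; within the same age, by weight descending
sorted <- people[order(people$age, -people$weight), ]
print(sorted)
```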
Operators
• Arithmetic operators
• Relational operators
• Logical operators
• Assignment operators
• Miscellaneous operators
• User-defined (manual) operators
Arithmetic operators
Operator  Description
+         Addition of two vectors or matrices
-         Subtraction
*         Multiplication
/         Division
%%        Gives the remainder of a division
%/%       Gives the integer quotient of a division
          e.g.
          > 16 %/% 3
          [1] 5
^         Power operator (raises a value to the given exponent)

• Elementwise operation takes place for vectors and matrices
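The quotient and remainder operators are linked: multiplying the quotient by the divisor and adding the remainder gives back the original number. The following is a minimal sketch with illustrative variable names.

```r
x <- 16
y <- 3

q <- x %/% y   # integer quotient: 5
r <- x %% y    # remainder: 1
print(q * y + r == x)   # TRUE

# For negative numbers the quotient is rounded down,
# so the remainder takes the sign of the divisor
print(-7 %/% 2)   # -4
print(-7 %% 2)    # 1
```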


Relational operator
Operator  Description                                                    Example
<         Checks if the first value is less than the second              > 5 < 7
                                                                         [1] TRUE
>         Checks if the first value is greater than the second           > 5 > 7
                                                                         [1] FALSE
<=        Checks if the first value is less than or equal to the second  > 5 <= 7
                                                                         [1] TRUE
>=        Checks if the first value is greater than or equal to the      > 8 >= 8
          second                                                         [1] TRUE
==        Checks if the first value is equal to the second               > 10 == 11
                                                                         [1] FALSE
!=        Checks if the first value is not equal to the second           > 10 != 11
                                                                         [1] TRUE

• Elementwise operation takes place for vectors and matrices


Logical or Boolean operator
Operator Description Example

|   Element-wise logical OR. Combines each element of the first vector with the corresponding element of the second vector and gives TRUE if at least one of the two elements is TRUE.
    > v <- c(1,2,FALSE)
    > v1 <- c(0,3,FALSE)
    > v | v1
    [1] TRUE TRUE FALSE

&   Element-wise logical AND. Combines each element of the first vector with the corresponding element of the second vector and gives TRUE only if both elements are TRUE.
    > v & v1
    [1] FALSE TRUE FALSE

!   Logical NOT. Takes each element of a vector and gives the opposite logical value.
    > !v
    [1] FALSE FALSE TRUE

||  Logical OR for single values. Gives TRUE if at least one of the two values is TRUE. Older versions of R used only the first element of longer vectors; since R 4.3.0 vectors with more than one element cause an error.
    > v[1] || v1[1]
    [1] TRUE

&&  Logical AND for single values. Gives TRUE only if both values are TRUE. The same length-one restriction applies as for ||.
    > v[1] && v1[1]
    [1] FALSE
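The difference between the element-wise and the single-value operators matters most inside if() conditions, which expect exactly one logical value. The following is a minimal sketch; any() and all() are base R functions used here to reduce an element-wise result to a single value, and `x` is a made-up variable.

```r
v  <- c(1, 2, FALSE)
v1 <- c(0, 3, FALSE)

# Element-wise results have one entry per pair of elements
print(v & v1)        # FALSE TRUE FALSE
print(any(v & v1))   # TRUE: at least one pair is TRUE
print(all(v | v1))   # FALSE: the last pair is FALSE

# && evaluates its right-hand side only when it is needed, so the
# comparison on a NULL value below is never reached
x <- NULL
print(!is.null(x) && x > 0)   # FALSE
```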
Assignment operator
Operator          Description                    Example
<- or = or <<-    Leftward assignment operator   > x <- 5
-> or ->>         Rightward assignment operator  > 5 -> x

Miscellaneous operator
Operator  Description                                      Example
:         Colon operator, gives a series of numbers in     > c(5:15)
          sequence                                         [1] 5 6 7 8 9 10 11 12 13 14 15
%*%       Algebraic multiplication of matrices             _
%in%      Checks whether an element belongs to a vector    > 21 %in% c(5:15)
                                                           [1] FALSE
Conditional statements
• if statements
• if ... else statements
• if ... else if ... else statements
• Very similar to C++ conditional statements
> if (condition) {
expression
}
> x <- -5
> if(x<0){
print("x is a negative number")
}
[1] "x is a negative number"
Conditional statements
• > if (x > 0) {
print("x is a positive number")
} else {
print("x is a negative number")
}
[1] "x is a negative number"
• > x <- 12
> if (x %% 2 == 0) {
print("x is divisible by 2")
} else if (x %% 3 == 0) { # also true, but not executed because the first condition already matched
print("x is divisible by 3")
} else {
print("x is neither divisible by 2 nor by 3")
}
[1] "x is divisible by 2"
Loops
• For loop
> for(n in x) { expr }
Example
> z <- c(5,12,13)
> for (i in z) {
print(i^2)
}
[1] 25
[1] 144
[1] 169
# looping over a sequence of numbers
> for (n in 1:10) { print(n) }
[1] 1
[1] 2
.
.
[1] 10
• C-style looping using while and repeat loops
> i <- 1
> while (TRUE) {
i <- i+4
if (i > 10) break
}
> i
[1] 13
# the same result can be obtained by specifying the condition in while()
> i <- 1
> while (i < 10) {
i <- i+4
}
> i
[1] 13
# the break statement can also be used within a for loop
# Another useful statement is next, which instructs the interpreter to skip to the next iteration of the loop.
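Both statements can be combined in one for loop. The following is a minimal sketch; the variable name `odd_squares` and the cut-off value 7 are made up for illustration.

```r
odd_squares <- c()
for (n in 1:10) {
  if (n %% 2 == 0) next   # skip even numbers and continue with the next n
  if (n > 7) break        # leave the loop completely once n exceeds 7
  odd_squares <- c(odd_squares, n^2)
}
print(odd_squares)   # 1 9 25 49
```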
• repeat loop
> repeat {
commands
if(condition) { break }
}
# A repeat loop iterates over a block of code multiple times. It has no condition check of its own to exit the loop,
# so a condition with a break statement must be specified within the loop
> x <- 1
> repeat {
print(x)
x = x+1
if (x == 6){ break }
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
Functions
• Basic syntax for declaring function
> func_name <- function (argument) {
statement
}
Example:
> pow <- function(x, y) { #x & y are arguments of function
result <- x^y
print(result)
}
> pow(5,3) #function is called
[1] 125

• Named arguments can also be used


> pow(y=3,x=4)
[1] 64

• Default values of arguments can also be set while declaring a function. An argument with a default value is optional when calling the function
> pow <- function(x, y=2) { # y is set to 2 by default, which can be changed while calling the function
result <- x^y
print(result)
}
> pow(x=6)
[1] 36
> pow (x= 3,y=5)
[1] 243
Functions
• A particular return value can also be specified inside the declaration of a function using the return() function
Example 1
> check <- function(x) {
  if (x > 0) {
    result <- "Positive"
  } else if (x < 0) {
    result <- "Negative"
  } else {
    result <- "Zero"
  }
  return(result)
}
> check(x = -65)
[1] "Negative"
Example 2
> check <- function(x) {
  if (x > 0) {
    return("Positive")
  } else if (x < 0) {
    return("Negative")
  } else {
    return("Zero")
  }
}
> check(2)
[1] "Positive"
• If there is no explicit return from a function, the value of the last evaluated expression is returned automatically in R.
• We generally use an explicit return() to return a value immediately from a function. If it is not the last statement of the function, it ends the function early and brings control back to the place from which the function was called.
# In example 2, once "Positive" is returned no further code of the function is executed
Functions
• In the return() function we can specify a vector, matrix, data frame or even a list, but return() can take only a single object at a time
• Therefore, if we want to return multiple values from an R function, we can put them in a list (or another object) and return it.
Example
> multi_return <- function() {
my_list <- list("color" = "red", "size" = 20, "shape" = "round")
return(my_list) }
> a <- multi_return()
> a$color
[1] "red"
> a$size
[1] 20
> a$shape
[1] "round"

• Writing your own binary operators (manual operators)
> "%a2b%" <- function(a,b) {return(a+2*b)}
> 5 %a2b% 10
[1] 25
# the name of such an operator begins and ends with the "%" symbol
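A user-defined operator is an ordinary function, so it works on vectors whenever the code inside it does. The following is a minimal sketch; the operator name `%cat%` is hypothetical and simply wraps the base function paste0().

```r
# Hypothetical operator that glues two strings together without a separator
"%cat%" <- function(lhs, rhs) paste0(lhs, rhs)

print("R" %cat% "basics")      # "Rbasics"

# paste0() is vectorised, so the operator is vectorised as well
print(c("a", "b") %cat% "1")   # "a1" "b1"
```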
R Environment and Scope
• The top level environment available to us at the R command prompt is the global
environment called R_GlobalEnv.
• environment() function can be used to get the current environment.
> a <- 2
> b <- 5
> f <- function(x) {
x<-0
print(ls())
print(environment())
}
> ls()
[1] "a" "b" "f"
# notice that "x" is not listed: it is not in the global environment, because each call of a function creates a new local environment in which the function's variables live
> environment()
<environment: R_GlobalEnv>
> f(3)
[1] "x"
<environment: 0x0000000010c2bdc8>
R Environment and Scope
• Cascading of environments
> f <- function(f_x)
  {
    g <- function(g_x)
    {
      print("Inside g")
      print(ls())
    }
    g(5)
    print("Inside f")
    print(ls())
  }
> f(6)
[1] "Inside g"
[1] "g_x"
[1] "Inside f"
[1] "f_x" "g"
> ls()
[1] "f"
• Local variables are variables that exist only within a certain part of a
program, such as a function, and are released when the function call ends.
• Global variables are variables that exist throughout the execution of a
program. They can be accessed and changed from any part of the program. (A variable
that is local to a parent function behaves like a global variable for a function
nested inside that parent.)
R Environment and Scope
• > a <- 10
> outer_func <- function()
  {
    a <- 20
    inner_func <- function(){
      a <- 30
      print(a) }
    inner_func()
    print(a)
  }
> outer_func()
[1] 30 # for inner_func()
[1] 20 # for outer_func()
> print(a)
[1] 10 # for global one
• > a <- 10
> func <- function()
  {
    a <<- 20
    print(a)
  }
> func()
[1] 20
> print(a)
[1] 20
• "a" is a global variable, so one might expect the assignments inside outer_func() to
change it, but the global "a" remains unchanged.
• This is because “<-” used inside a function always creates or modifies a local variable
in that function's own environment, even when a variable with the same name exists
elsewhere. A global variable can still be read inside a function, but it cannot be
reassigned with “<-”.
• “<<-” or “->>” (superassignment operators) modify a variable in an enclosing environment:
R searches the parent environments for an existing variable with that name and, if none is
found, creates it in the global environment. This is how func() changes the global "a".
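Superassignment does not have to reach the global environment; it stops at the nearest enclosing environment that already holds the variable. The sketch below illustrates this with a small counter. The name `make_counter` is hypothetical and chosen only for this example.

```r
# make_counter() is a hypothetical helper: it returns a function that
# remembers how many times it has been called.
make_counter <- function() {
  count <- 0                 # lives in the environment created by this call
  function() {
    count <<- count + 1      # modifies 'count' in the enclosing environment
    count
  }
}

counter <- make_counter()
counter()                    # 1
counter()                    # 2

other <- make_counter()      # a second counter gets its own 'count'
other()                      # 1
```

Each call to `make_counter()` creates a fresh environment, so the two counters do not share state.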
Recursive function
• A function that calls itself is called a recursive function and this technique is
known as recursion.
# Recursive function to find factorial
> recursive.factorial <- function(x)
{
if (x == 0) return (1)
else return (x * recursive.factorial(x-1))
}
> recursive.factorial(0)
[1] 1
> recursive.factorial(5)
[1] 120
Assignment – use recursive function to convert decimal to binary
Assignment – use recursive function to print fibonacci series
Assignment – to subset only those values from a vector which are greater than a certain value
Assignment – use loops to determine factorial of a number
R switch() Function
• The switch() function in R tests an expression against a list of alternatives. If the
value of the expression matches an item in the list, the corresponding value is
returned.
• Syntax for switch function
> switch (expression, list)
#Here, the expression is evaluated and the corresponding item in the list is returned: a number
selects an item by position, a character string selects it by name. If the string matches more than
one item of the list, switch() returns the first matched item.
• Example:
> switch(2,"red","green","blue")
[1] "green"
> switch(1,"red","green","blue")
[1] "red"
> switch(0,"red","green","blue") # no item is selected: NULL is returned invisibly, so nothing is printed
> switch("color", "color" = "red", "shape" = "square", "length" = 5)
[1] "red"
# Simple calculator that can add, subtract, multiply and divide using functions
> add <- function(x, y) { return(x + y) }
> subtract <- function(x, y) { return(x - y) }
> multiply <- function(x, y) { return(x * y) }
> divide <- function(x, y) { return(x / y) }
# take input from the user
> print("Select operation.")
> print("1.Add")
> print("2.Subtract")
> print("3.Multiply")
> print("4.Divide")
> choice = as.integer(readline(prompt="Enter choice[1/2/3/4]: ")) # readline() reads input from the keyboard
> num1 = as.integer(readline(prompt="Enter first number: "))
> num2 = as.integer(readline(prompt="Enter second number: "))
> operator <- switch(choice,"+","-","*","/")
> result <- switch(choice, add(num1, num2), subtract(num1, num2), multiply(num1, num2), divide(num1, num2))
> print(paste(num1, operator, num2, "=", result))
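The calculator above needs keyboard input, so it is awkward to run from a script. A non-interactive variant, sketched below, selects the operation by name instead of by position and adds a default branch. The function name `calculate` is an assumption made for this example.

```r
# calculate() is a hypothetical wrapper around switch(); the final unnamed
# argument acts as the default when no name matches.
calculate <- function(op, x, y) {
  switch(op,
         "+" = x + y,
         "-" = x - y,
         "*" = x * y,
         "/" = x / y,
         stop("Unknown operator: ", op))
}

calculate("+", 6, 3)   # 9
calculate("/", 6, 3)   # 2
```

Because switch() evaluates only the selected alternative, stop() runs only when the operator is not one of the four listed.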
apply() family of functions
• The apply() family contains functions to manipulate slices of data from matrices,
arrays, lists and dataframes in a repetitive way. These functions allow crossing the
data in a number of ways and avoid explicit use of loops. They act on an input list,
matrix or array and apply a named function with one or several optional arguments.
• apply() operates on arrays (mainly 2D, i.e. matrices)
> apply(X, MARGIN, FUN, ...)
• where:
X is an array (a matrix if the array has 2 dimensions);
MARGIN defines how the function is applied: with MARGIN=1 it is applied over rows,
with MARGIN=2 over columns, and with MARGIN=c(1,2) over both rows and columns,
i.e. to each individual element; and
FUN is the function that you want to apply to the data. It can be any R
function, including a User Defined Function (UDF).
> x <- matrix(rnorm(36), nrow=6, ncol=6)
> apply(x, 2, sum) # x: matrix, 2: MARGIN, sum: function applied
# the output is the column-wise sum of the matrix
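Since rnorm() gives different numbers on every run, a small fixed matrix makes the effect of MARGIN easier to check by hand. The following is a minimal sketch; the variable names are illustrative only.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)   # columns are (1,2), (3,4), (5,6)

row_sums <- apply(m, 1, sum)           # MARGIN = 1: one result per row
col_sums <- apply(m, 2, sum)           # MARGIN = 2: one result per column
squared  <- apply(m, c(1, 2), function(v) v^2)  # MARGIN = c(1, 2): each element

row_sums   # 9 12
col_sums   # 3 7 11
squared    # same shape as m, every entry squared
```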

• lapply() function
It can be used for other objects such as data frames, lists or vectors; the output
is a list with the same number of elements as the object passed to it
• If lapply() is used on a data frame, only column-wise operation is possible
• lapply() applied to matrix x (converted to a data frame) gives a list of 6 elements
(1 for each column)

• lapply() can also be used on a data frame with a user-defined function
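The two uses of lapply() mentioned above might look like the sketch below. The data frame `df` and the function `range_width` are made up for this illustration.

```r
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))

# Built-in function: one list element per column
means <- lapply(df, mean)
means$a                       # 2
means$b                       # 20

# User-defined function (range_width is a made-up name)
range_width <- function(v) max(v) - min(v)
widths <- lapply(df, range_width)
widths$a                      # 2
widths$b                      # 20
```

In both cases the result is a list whose element names are the column names of the data frame.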


• sapply() function
• The sapply() function works like lapply(), but it tries to simplify the output to the
most elementary data structure that is possible. And indeed, sapply() is a
‘wrapper’ function for lapply().

> str(sapply(as.data.frame(x),var))
Named num [1:6] 0.759 1.86 0.89 1.62 0.289 ...
 - attr(*, "names")= chr [1:6] "V1" "V2" "V3" "V4" ...
• Applying the lapply() function would give us a list, but when sapply is used a
vector is returned as in above case
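The difference in return type can be seen side by side in a small sketch. The list `scores` is invented for the example.

```r
scores <- list(math = c(80, 90), art = c(70, 75, 95))

as_list   <- lapply(scores, mean)   # list with elements math and art
as_vector <- sapply(scores, mean)   # named numeric vector

class(as_list)      # "list"
class(as_vector)    # "numeric"
as_vector["art"]    # 80

# When every result has the same length greater than 1, sapply() simplifies to a matrix
ranges <- sapply(scores, range)     # 2 x 2 matrix, one column per list element
```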
Other important functions in R
• cat(x) # Prints the arguments after concatenating them
• identical() # Test if 2 objects are *exactly* equal
• rep(2,5) # Repeat the number 2 five times
• rev(x) # reverse the elements of x
• seq(1,10,0.4) # Generate a sequence (1 -> 10, spaced by 0.4)
• floor(x), ceiling(x), round(x), signif(x) # rounding functions
• unique(x) # Remove duplicate entries from vector
• getwd() # Return working directory
• setwd() # Set working directory
# Built-in constants:
• pi,letters,LETTERS # Pi, lower & uppercase letters, e.g. letters[7] = "g"
• month.abb, month.name # Abbreviated & full names for months
Other important functions in R
• range(x) # Returns the minimum and maximum of x
• mean(x) # Returns mean of x
• var(x) # Returns variance of x
• sd(x) # Returns standard deviation of x
• median(x) # Returns median of x
• weighted.mean(x, w) # Returns weighted mean of x using weights w
• min(x), max(x), quantile(x)
• cor(x,y) # Gives correlation coefficient between x and y
• rnorm(n, mean = 0, sd = 1) # generates n random normal deviates with the specified mean and sd
• lm() # Fit linear regression model
• sample(x, size, replace = FALSE, prob = NULL) # for random or weighted sampling
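A few of the functions listed above are applied to a small vector in the sketch below, so the results can be verified by hand. The values in `x` and the weights are arbitrary choices for illustration.

```r
x <- c(2, 4, 4, 6, 9)

range(x)                                  # 2 9
mean(x)                                   # 5
median(x)                                 # 4
var(x)                                    # 7 (sample variance, divides by n - 1)
sd(x)                                     # sqrt(7), about 2.65
weighted.mean(x, w = c(1, 1, 1, 1, 6))    # 7
unique(x)                                 # 2 4 6 9
rev(x)                                    # 9 6 4 4 2
seq(1, 2, 0.5)                            # 1.0 1.5 2.0
rep(2, 5)                                 # 2 2 2 2 2
```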
Object oriented programming(OOP)
• Object oriented programming (OOP) is a programming structure where programs are
organized around objects as opposed to action and logic.
• OOP encourages programmers to follow a defined structure rather than writing ad hoc code just to ‘get stuff done’
• Everything in OOP is grouped as self sustainable “objects”
• In OOP programmers define not only the data type of a data structure, but also the
types of operation/methods(functions) that can be applied to the data structure
• In this way data structure becomes an object that includes both data and
functions(methods) in one unit. In OOP, computer programs are designed by making
them out of objects that interact with one another.
• A key aspect of object-oriented programming is the use of classes. A class is a
blueprint of an object.
• Think of a class as a concept, and the object as the embodiment of that concept.(or
we can say object is an instance of class)
What is Method ?
• A method in object-oriented programming is like a procedure or
function in procedural programming.
• The key difference is that a method is part of an object. In
object-oriented programming, code is organized by creating objects,
then giving those objects properties and making them do certain
things.
What is CLASS ?
• A class is a collection of variables and methods for objects that have
common properties, operations and behaviors.
• A class is a combination of state (data) and behavior (methods).
• In object-oriented languages, a class is a data type, and objects are
instances of that data type. In other words, classes are prototypes
from which objects are created.
• Once a class is defined, any number of objects can be created which
belong to that class and each object created of particular class will
have independent memory allocation
What is Object ?
• In R everything is an object
• Each object belongs to a particular class (e.g. numeric, factor, list, data frame etc.)
• Objects are the basic run-time entities in an object-oriented system. They are
instances of a class and units of abstraction.
• Programming problem is analyzed in terms of objects and nature of
communication between them. When a program is executed, objects interact
with each other by sending messages.
• Different objects can also interact with each other without knowing the details
of their data or code.
• An object passes a message to another object, which results in the invocation
of a method. Objects then perform the actions that are required to get a
response from the system.
• Encapsulation is when a group of related methods, properties, and
other members are treated as a single object.
• Inheritance is the ability to receive (“inherit”) methods and
properties from an existing class.
• Polymorphism is when each class implements the same methods in
varying ways, but you can still have several classes that can be utilized
interchangeably.
• Abstraction is the process by which a developer hides everything
other than the relevant data about an object in order to simplify and
increase efficiency.
Data Abstraction and Encapsulation

• Abstraction refers to the act of representing essential features without including
the background details or explanations.
• An object hides its data and methods from the rest of the world (the user).
• Because objects encapsulate data and implementation, the user of an object
can view the object as a black box that provides services.
• Instance variables and methods can be added, deleted, or changed, but as long
as the services provided by the object remain the same, code that uses the
object can continue to use it without being rewritten.
• Classes use the concept of abstraction and are defined as a list of abstract
attributes.
• Storing data and functions in a single unit (class) is encapsulation.
• Data is not accessible to the outside world; only the functions
stored in the class can access it.
Inheritance
• Inheritance is the process by which objects of one class can acquire the
properties of objects of another class.
• In OOP, inheritance provides reusability, like, adding additional
features to an existing class without modifying it.
• This is achieved by deriving a new class from the existing one. The
new class will have combined features of both the classes.
Polymorphism
• Polymorphism means the ability to take more than one form.
• An operation may exhibit different behaviors in different instances.
• The behavior depends on the data types used in the operation.
• Polymorphism is extensively used in implementing Inheritance.
Advantages of OOP
• Modularity
• The source code for a class can be written and maintained independently of
the source code for other classes. Once created, an object can be easily passed
around inside the system.
• Information hiding
• OOP provides a clear modular structure for programs which makes it good for
defining abstract data types where implementation details are hidden and the
unit has a clearly defined interface.
• Code re-use
• If a class already exists, you can use objects from that class in your program.
• OOP makes it easy to maintain and modify existing code as new objects can be
created with small differences to existing ones.
• Easy debugging
Disadvantages of OOP
• It can slow down processing or fail to harness the full processing
power of the CPU
• Not required, and sometimes not desirable, for data analysis purposes
• It works best only with a limited number of complex objects whose
behaviour you completely understand
Types of OOP Systems in R
• S3
• S4
• Reference Classes
• R6
S3
• Mainly focused on function overloading (polymorphism)
• Simple to use
• Important terms in the S3 system are:
 - A class is an attribute of an object that dictates what messages the
object can receive and return
 - A method is a function that is designed for a specific class
 - Dispatch is the selection of the class-specific method
• S3 operates around classes and methods
> print
function (x, ...)
UseMethod("print")
<bytecode: 0x000000000b849258>
<environment: namespace:base>
What does ‘UseMethod("print")’ mean?
• “print” is an example of generic function
• A generic function does not do the work itself; it invokes (chooses)
another function, called a method, to perform the action
• The generic function looks at the class of the argument passed to it (in the above case ‘x’) and
then dispatches to the method for that class
Example:
> print(x) #suppose x is data frame
is equivalent to or gets converted to
> print.data.frame(x) #for factor class print.factor(x) and similarly for others
• If no method exists for the class of the object, the default method gets
dispatched #print.default(x) in the above case
How to make a class?
> x <- 1:6
> class(x) <- "myclass" # or: attr(x, "class") <- "myclass"
> attributes(x)
$class
[1] "myclass"
Defining Methods for the assigned class
> print.myclass <- function(x, ...) {cat(x, sep = ", ")}
#In this case print() function is overloaded (polymorphism)
> print(x)
1, 2, 3, 4, 5, 6
Now we have a method for printing objects that belong to “myclass”
How to define your own generic function?
> MyGenFunction <- function(x, ...) UseMethod("MyGenFunction")

Now a number of methods for this generic function can be defined

> MyGenFunction.default <- function(x, ...) { head(x) }

It is always good practice to define a “default” method for a newly
defined generic function

> MyGenFunction.myclass <- function(x, ...) { cat(sort(x, decreasing=TRUE), sep = " > ") }

> MyGenFunction(x)
6 > 5 > 4 > 3 > 2 > 1
• For checking the methods associated with generic function use
methods() function
> methods(print)
[1] print.acf*
[2] print.anova*
.
.
.
[181] print.xngettext*
[182] print.xtabs*
Non-visible functions are asterisked
• For checking the methods associated with particular class
> methods( class = "myclass" )
[1] print MyGenFunction
Inheritance in S3
• Just for understanding
> class(x) <- c("myclass", "numeric")
> mean(x) # mean() is inherited: no mean.myclass() has to be written
[1] 3.5
• R first looks for a mean.myclass() method; if none is found it
looks for mean.numeric(), and finally falls back to mean.default()
• This means there is no need to write a mean function again for the new class of
objects (i.e. “myclass”)
• When assigning multiple classes to an object, the order of the classes is very
important because it determines dispatch priority
> class(x) <- c("numeric", "myclass")
#In this case a mean.numeric() method is searched for first, before
mean.myclass()
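How dispatch walks through the class vector can be sketched with two made-up classes, where "puppy" inherits from "dog". The generic `speak` and both class names are hypothetical and exist only for this illustration.

```r
# Hypothetical generic with methods for two classes and a default
speak <- function(x, ...) UseMethod("speak")
speak.default <- function(x, ...) "..."
speak.dog     <- function(x, ...) "Woof"
speak.puppy   <- function(x, ...) {
  paste("Yip,", NextMethod())   # NextMethod() calls the method of the next class
}

rex <- structure(list(name = "Rex"), class = c("puppy", "dog"))
speak(rex)             # "Yip, Woof"
inherits(rex, "dog")   # TRUE
speak(42)              # "..." (no method for a plain number, so the default is used)
```

Reversing the class vector to c("dog", "puppy") would make speak.dog() run first, and speak.puppy() would never be reached because speak.dog() does not call NextMethod().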
Constructor Function
> # a constructor function for the "student" class
> student <- function(n,a,g) {
# we can add our own integrity checks
if(g>4 || g<0) stop("GPA must be between 0 and 4")
value <- list(name = n, age = a, GPA = g)
# class can be set using class() or attr() function
attr(value, "class") <- "student"
value
}
> s <- student("Paul", 26, 3.7)
> str(s)
List of 3
 $ name: chr "Paul"
 $ age : num 26
 $ GPA : num 3.7
 - attr(*, "class")= chr "student"

> s <- student("Paul", 26, 5)
Error in student("Paul", 26, 5) : GPA must be between 0 and 4
> # these integrity checks only work while creating the object using the constructor
> s <- student("Paul", 26, 2.5)
> s$GPA <- 5 # no error: direct assignment bypasses the check
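Because an S3 object is an ordinary list with a class attribute, nothing re-runs the constructor's checks after the object has been created. One possible workaround, sketched here under the assumption that the check is moved into a separate function, is to call a validator explicitly. `validate_student` is a hypothetical helper, not part of base R.

```r
# validate_student() is a hypothetical helper, not part of base R.
validate_student <- function(s) {
  if (s$GPA > 4 || s$GPA < 0) stop("GPA must be between 0 and 4")
  invisible(s)
}

s <- structure(list(name = "Paul", age = 26, GPA = 2.5), class = "student")
validate_student(s)               # passes silently

s$GPA <- 5                        # accepted silently: S3 has no built-in checks
result <- try(validate_student(s), silent = TRUE)
inherits(result, "try-error")     # TRUE: the explicit check catches the bad value
```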
S4
• Unlike S3 classes and objects, which lack a formal definition, an S4 class is stricter in the
sense that it has a formal definition and a uniform way to create objects.
• An S4 class is defined using the setClass() function. In R terminology, member variables are called
slots. While defining a class, we need to set its name and its slots (along with the class of
each slot).
Example:
> myClass <- setClass("myClass", slots= ....,validity = …., contains = ....)
> student <- setClass("student", slots=list(name="character", age="numeric", GPA="numeric"))
• S4 objects are created using the new() function.
> s <- new("student",name="John", age=21, GPA=3.5)
> s
An object of class "student"
Slot "name":
[1] "John"
Slot "age":
[1] 21
Slot "GPA":
[1] 3.5
> isS4(s) # if object is an s4 object then it returns TRUE as output
[1] TRUE
• The function setClass() returns a generator function. This generator function
(usually having same name as the class) can be used to create new objects. It acts
as a constructor.
> student <- setClass("student", slots=list(name="character", age="numeric", GPA="numeric"))
> s <- student(name="John", age=21, GPA=3.5)
> s
An object of class "student"
Slot "name":
[1] "John"
Slot "age":
[1] 21
Slot "GPA":
[1] 3.5
> student
class generator function for class “student” from package ‘.GlobalEnv’
function (...)
new("student", ...)
# the constructor in turn uses the new() function to create objects. It is just a wrapper.
• Now we know how to define S4 class and how to create object for S4 class
How to access and modify slot?
• slots of an S4 object are accessed using “@”
> s@name
[1] "John"
> s@GPA
[1] 3.5
> s@age
[1] 21
• A slot can be modified directly through reassignment.
> s@GPA <- 3.7
> s
An object of class "student"
Slot "name":
[1] "John"
Slot "age":
[1] 21
Slot "GPA":
[1] 3.7
• > slotNames(s) # or slotNames("student"); returns the names of the slots as a character vector
[1] "name" "age" "GPA"
• Similarly, slots can be accessed or modified using the slot() function.
> slot(s,"name") #Accessing slot named as “name”
[1] "John"
> slot(s,"name") <- "Paul" #Modifying slot
> s
An object of class "student"
Slot "name":
[1] "Paul"
Slot "age":
[1] 21
Slot "GPA":
[1] 3.7
• > getSlots("student") #returns slots of class as named vector along with their data type
       name         age         GPA
"character"   "numeric"   "numeric"
• To set the default values for slots “prototype” argument is used within the setClass() function
> student <- setClass("student",
slots=list(name="character", age="numeric", GPA="numeric"),
# Set the default values for the slots. (optional)
prototype = list(name = "No name", age = 0, GPA = 0)
)
> student()
An object of class "student"
Slot "name":
[1] "No name"
Slot "age":
[1] 0
Slot "GPA":
[1] 0
• To check whether the data passed while creating object is consistent
or not, validity argument is used within the class definition
> student <- setClass("student",
slots=list(name="character", age="numeric", GPA="numeric"),
# Set the default values for the slots. (optional)
prototype = list(name = "Enter name", age = 1, GPA =3 ),
validity=function(object)
{
if( object@age<=0 || object@GPA < 2 || object@GPA > 10 )
{
return("Data is out of bound.")
}
return(TRUE)
}
)
> student(name="paul",age=25,GPA=1)
Error in validObject(.Object) :
invalid class “student” object: Data is out of bound.
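The validity function runs when an object is created, but assigning to a slot directly with “@” does not re-run it; validObject() can be called to check the object again. The sketch below uses a made-up class, "Account", to keep it separate from the student example.

```r
library(methods)

# "Account" is a made-up class used only for this sketch.
setClass("Account",
         slots = list(balance = "numeric"),
         validity = function(object) {
           if (object@balance < 0) return("Balance must not be negative.")
           TRUE
         })

acc <- new("Account", balance = 100)   # validity runs here
acc@balance <- -50                     # direct slot assignment skips the validity function
check <- try(validObject(acc), silent = TRUE)
inherits(check, "try-error")           # TRUE: validObject() reports the problem
```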
Defining methods for S4 class
• We can write our own method using setMethod() function.
Example:
> setMethod(f = "show",
signature="student",
definition = function(object)
{
cat(object@name, "\n")
cat(object@age, "years old\n")
cat("GPA:", object@GPA, "\n")
}
)
#Here method for generic function “show()” is defined
• The show() function is the S4 analogue of the S3 print() function.
> show(student(name="xyz",age=22,GPA=5))
xyz
22 years old
GPA: 5
Defining new generic functions for S4 class
• First the generic function itself is defined using the setGeneric() function
• Then methods for that function can be defined for different classes
> setGeneric(name="reveal",
def=function(object, ...)
{
standardGeneric("reveal")
}
)
> setMethod(f = "reveal",
signature="student",
definition = function(object, reg_no)
{
cat("Name of the student is ", object@name, " \n Age of the student is " , object@age ,
"\n GPA of the student is " , object@GPA, "\n Registration Number is", reg_no)
}
)
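A call to a generic of this kind might look like the following sketch. It redefines a trimmed-down "student" class and "reveal" generic so that it runs on its own; the registration number "R-001" is a made-up value.

```r
library(methods)

setClass("student",
         slots = list(name = "character", age = "numeric", GPA = "numeric"))

# The generic only fixes the argument 'object'; extra arguments travel through '...'
setGeneric("reveal", function(object, ...) standardGeneric("reveal"))

setMethod("reveal", "student", function(object, reg_no, ...) {
  cat("Name:", object@name, "\n")
  cat("Registration Number:", reg_no, "\n")
  invisible(object)
})

s <- new("student", name = "John", age = 21, GPA = 3.5)
reveal(s, reg_no = "R-001")   # "R-001" is a made-up registration number
```

The method can take the extra argument `reg_no` only because the generic includes `...` in its definition.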
Inheritance
• The “contains” argument of setClass() can be used to specify a
vector of class names from which the new class should inherit
contains = c("Pclass1", "Pclass2") #within the definition of class

> class_name <- setClass(
    "class_name",
    slots=list( ),
    # Set the default values for the slots. (optional)
    prototype = list( ),
    validity=function(object) { },
    contains = "parent_class"
  )
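A concrete instance of the template above could look like this sketch. The classes "Person" and "Employee" and the generic `describe` are hypothetical names chosen for illustration.

```r
library(methods)

# "Person" and "Employee" are hypothetical classes for illustration.
setClass("Person", slots = list(name = "character", age = "numeric"))
setClass("Employee", contains = "Person", slots = list(company = "character"))

setGeneric("describe", function(object, ...) standardGeneric("describe"))
setMethod("describe", "Person", function(object, ...) {
  paste(object@name, "is", object@age, "years old")
})

e <- new("Employee", name = "Asha", age = 30, company = "Acme")
is(e, "Person")     # TRUE: an Employee is also a Person
describe(e)         # "Asha is 30 years old" (the Person method is inherited)
```

The child class gets the parent's slots (name, age) in addition to its own (company), and any method defined for the parent applies to it unless a more specific method is defined.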
