R Basics
What is R?
• R is a scripting language for statistical data manipulation, analysis, graphical representation and reporting.
• R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team.
• R is freely available and offers great graphical facilities for data analysis and visualization.
• R documentation and manuals are available at
http://cran.r-project.org/manuals.html
• R is case sensitive.
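Data Types
• The basic (atomic) data types in R include numeric (stored as double), integer, complex, logical and character
Numeric and Integer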
• > a <- 6.5 # assign a value to the variable a
> a # print the value of a
[1] 6.5
> class(a) # print the class of a
[1] "numeric"
• > a <- 5
> class(a)
[1] "numeric"
> is.integer(a)
[1] FALSE
• The class of a is numeric but not integer: a number typed without a suffix is stored as a double
• > a <- 5L # append "L" to specify that the number is an integer
> class(a)
[1] "integer"
> is.integer(a)
[1] TRUE
> is.numeric(a)
[1] TRUE
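class() reports "numeric" for both kinds of number, so a quick way to see the underlying storage is typeof(). The short base-R sketch below contrasts a plain number with one carrying the L suffix:

```r
# A number typed without a suffix is stored as a double-precision value
a <- 5
print(class(a))   # "numeric"
print(typeof(a))  # "double"

# The L suffix creates an integer instead
b <- 5L
print(class(b))   # "integer"
print(typeof(b))  # "integer"

# Both count as numeric, but only b is an integer
print(c(is.numeric(a), is.integer(a), is.numeric(b), is.integer(b)))  # TRUE FALSE TRUE TRUE
```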
Logical
• TRUE, FALSE and NA are logical values
• > class(TRUE)
[1] "logical"
• TRUE and FALSE can be abbreviated to T and F respectively (unlike TRUE and FALSE, T and F are ordinary variables that can be reassigned, so the full words are safer)
• A logical value is often created by comparing two values
> a <- 7 < 9
> a
[1] TRUE
> a <- 7 > 9
> a
[1] FALSE
> is.logical(a)
[1] TRUE
• NA stands for "Not Available" and represents missing data
Character
• Used to represent string values
> x <- "This is a character string"
> x
[1] "This is a character string"
> class(x)
[1] "character"
• Two or more strings can be concatenated with the paste() function, which separates its arguments with a single space by default
> conc_string <- paste("1st string", "2nd string", "and 3rd string")
> conc_string
[1] "1st string 2nd string and 3rd string"
• A substring can be extracted with the substr() function
> substr(x, start, stop) # general form
> substr(conc_string, 3, 14) # or substr(conc_string, start = 3, stop = 14)
[1] "t string 2nd"
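Because paste() inserts its own separator, spaces left at the end of the pieces end up doubled. The base-R sketch below shows the sep argument, the related paste0() (no separator), and substr() counting characters from 1 with both ends inclusive:

```r
s1 <- paste("1st string", "2nd string", "and 3rd string")
print(s1)                  # "1st string 2nd string and 3rd string"

# A trailing space inside a piece is kept, so the default sep adds a second one
s2 <- paste("a ", "b")
print(s2)                  # "a  b" (two spaces)

# sep changes the separator; paste0() uses no separator at all
s3 <- paste("A", "B", "C", sep = "-")
s4 <- paste0("chr", 1:3)   # vectorised: one result per element of 1:3
print(s3)                  # "A-B-C"
print(s4)                  # "chr1" "chr2" "chr3"

# substr() keeps characters 3 through 14, inclusive
print(substr(s1, 3, 14))   # "t string 2nd"
```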
Coercion of data type
• In R, a value of one type can be converted (coerced) to another type where this is possible
• Examples
> as.numeric(TRUE)
[1] 1
> as.character(5)
[1] "5"
> as.numeric("6.5")
[1] 6.5
> as.integer(6.5) # the decimal part is truncated
[1] 6
> as.numeric("abc")
[1] NA
Warning message:
NAs introduced by coercion
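Besides the explicit as.*() functions, R also coerces implicitly whenever values of different types are combined into one vector. The sketch below (base R only) shows the usual order logical < integer < double < character, and what happens when converting back fails for some elements:

```r
# c() converts every element to the most flexible type present
v1 <- c(TRUE, FALSE, 2L)   # logical + integer   -> integer (1, 0, 2)
v2 <- c(1L, 2.5)           # integer + double    -> double
v3 <- c(1, "a", TRUE)      # anything + character -> character
print(typeof(v1))          # "integer"
print(typeof(v2))          # "double"
print(v3)                  # "1" "a" "TRUE"

# Converting back only works where the text is a valid number;
# other elements become NA (R normally warns, silenced here)
num <- suppressWarnings(as.numeric(v3))
print(num)                 # 1 NA NA
```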
Data Structures
• Vectors
• Matrix
• List
• Factors
• Data Frames
Vectors
• A vector is a sequence of data elements of the same basic data type.
• The length() function returns the number of elements in a vector
> x <- c(3, 6, 9, 12)
> length(x)
[1] 4
Combining Vectors
• Vectors can be combined with the c() function; when their types differ, all elements are coerced to a common type (here, character)
> x <- c(1,2,3)
> y <- c("four", "five", "six")
> c(x, y)
[1] "1" "2" "3" "four" "five" "six"
Atomic Vectors
• R does not provide a separate data structure for holding a single value
• A single value is stored as a vector of length 1, i.e. an atomic vector with one element
> x <- 5
> is.vector(x)
[1] TRUE
Vector Arithmetic
• Arithmetic operations on vectors are performed member-by-member. For example:
> x <- c(1,2,5,8)
> 2*x
[1] 2 4 10 16
> y <- c(2,5,6,3)
> x+y
[1] 3 7 11 11
> x*y
[1] 2 10 30 24
> x^2
[1] 1 4 25 64
• Subtraction and division likewise produce new vectors via member-wise operations.
Vector Arithmetic
• Recycling rule: when two vectors have different lengths, the members of the shorter vector are recycled
> x <- c(1,2,2,6)
> y <- c(2,4)
> x+y
[1] 3 6 4 10
• Likewise, if a logical index vector is shorter than the vector being indexed, the members of the logical vector are recycled
> y <- c(1, 2, 3, 4, 5)
> y[ c(TRUE, FALSE) ]
[1] 1 3 5
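Recycling also happens when the longer length is not a multiple of the shorter one, but R then raises a warning. The base-R sketch below captures that warning text with tryCatch() and repeats the logical-index example:

```r
x <- c(1, 2, 2, 6)
y <- c(2, 4)
print(x + y)   # 3 6 4 10: y is used as c(2, 4, 2, 4)

# 4 is not a multiple of 3: the result is still computed, but R warns
z <- c(10, 20, 30)
print(suppressWarnings(x + z))   # 11 22 32 16
msg <- tryCatch(x + z, warning = function(w) conditionMessage(w))
print(msg)     # "longer object length is not a multiple of shorter object length"

# A short logical index is recycled too: c(TRUE, FALSE) keeps every other element
v <- c(1, 2, 3, 4, 5)
print(v[c(TRUE, FALSE)])   # 1 3 5
```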
Matrix
• Vector: 1D array of data elements of same type
• Matrix: 2D array of data elements of same type
• The matrix() function can be used to build a matrix
matrix(data = x, nrow = i, ncol = j, byrow = FALSE, dimnames = NULL)
> mat1 <- matrix(1:6, nrow = 2)
> mat1
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
> mat1 <- matrix(1:6, ncol = 3)
> mat1
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
> mat1 <- matrix( 1:6, nrow = 2, byrow = TRUE )
> mat1
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
Matrix
• > mat2 <- matrix(1:6, nrow = 3, ncol = 4) # the 6 values are recycled to fill all 12 cells
> mat2
[,1] [,2] [,3] [,4]
[1,] 1 4 1 4
[2,] 2 5 2 5
[3,] 3 6 3 6
• Naming the rows and columns of a matrix
> rownames(mat2) <- c("row1", "row2", "row3")
> colnames(mat2) <- c("col1", "col2", "col3", "col4")
> mat2
col1 col2 col3 col4
row1 1 4 1 4
row2 2 5 2 5
row3 3 6 3 6
OR
> mat2 <- matrix(1:6, nrow = 3, ncol = 4, dimnames = list(c("row1", "row2", "row3"), c("col1", "col2", "col3", "col4")))
OR
> dimnames(mat2) <- list(c("row1", "row2", "row3"), c("col1", "col2", "col3", "col4"))
Matrix (rbind & cbind)
• The cbind() function can be used to add a new column to a matrix
> mat1 <- matrix(1:6, ncol = 3)
> mat1
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
> cbind(mat1, c(7,8))
[,1] [,2] [,3] [,4]
[1,] 1 3 5 7
[2,] 2 4 6 8
• The rbind() function can be used to add a new row to a matrix
> rbind(mat1, c(1,2))
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
[3,] 1 2 1
# The values are recycled to fill the row; R also warns because 3 columns is not a multiple of the vector length 2
Matrix (rbind & cbind)
• When adding a row or column with rbind() or cbind(), if fewer elements are supplied than needed, the elements are recycled
• If more elements are supplied than needed, R uses only as many as are required and ignores the rest
• rbind() and cbind() can also be used to create a new matrix from scratch
> rbind(c(1:5), c(6:10), c( 11:15))
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 6 7 8 9 10
[3,] 11 12 13 14 15
> cbind(c(1:3), c(3:5))
[,1] [,2]
[1,] 1 3
[2,] 2 4
[3,] 3 5
• Two matrices can be attached with rbind() or cbind(). To attach matrices with rbind(), both must have the same number of columns; to attach them with cbind(), both must have the same number of rows.
• > A <- matrix(1:6, ncol = 3)
> B <- matrix(7:12, ncol = 3)
> rbind(A,B)
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
[3,] 7 9 11
[4,] 8 10 12
> cbind(A,B)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 3 5 7 9 11
[2,] 2 4 6 8 10 12
> cbind(A, rbind(A,B))
Error in cbind(A, rbind(A, B)) :
number of rows of matrices must match (see arg 2)
• If a row or column added to a matrix has a different data type, coercion to a common type happens (here TRUE and FALSE become 1 and 0)
> rbind(A, c(TRUE, FALSE, FALSE))
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
[3,] 1 0 0
Subset Matrix
• > mat1 <- matrix(1:15, nrow = 3, byrow = TRUE)
> mat1
[,1] [,2] [,3] [,4] [,5]
[1,] 1 2 3 4 5
[2,] 6 7 8 9 10
[3,] 11 12 13 14 15
• Suppose we want to subset "8", which is in the 2nd row and 3rd column
> mat1[2,3]
[1] 8
• Selecting an entire row
> mat1[2, ]
[1] 6 7 8 9 10
• Selecting an entire column
> mat1[ ,3]
[1] 3 8 13
• If the comma is omitted, R treats the matrix as a vector and counts elements column by column up to the given index
> mat1[10]
[1] 4
Subset Multiple elements
• > A <- matrix(LETTERS[1:16], ncol = 4, byrow = TRUE)
> A # LETTERS and letters are built-in vectors of the uppercase and lowercase alphabet
[,1] [,2] [,3] [,4]
[1,] "A" "B" "C" "D"
[2,] "E" "F" "G" "H"
[3,] "I" "J" "K" "L"
[4,] "M" "N" "O" "P"
> A[ 2, c(2,4) ]
[1] "F" "H"
> A[ c(1,4), 4]
[1] "D" "P"
> A[ c(2,4), c(1,3,4) ] # creates a sub-matrix from rows 2 and 4 and columns 1, 3 and 4
[,1] [,2] [,3]
[1,] "E" "G" "H"
[2,] "M" "O" "P"
> A[ c(1,2), c(3,4) ]
[,1] [,2]
[1,] "C" "D"
[2,] "G" "H"
Subset by Names and Logicals
• > A <- matrix(LETTERS[1:16], ncol = 4, byrow = TRUE)
> rownames(A) <- c("r1", "r2", "r3", "r4")
> colnames(A) <- c("c1", "c2", "c3", "c4")
> A
c1 c2 c3 c4
r1 "A" "B" "C" "D"
r2 "E" "F" "G" "H"
r3 "I" "J" "K" "L"
r4 "M" "N" "O" "P"
> A["r1", "c2"]
[1] "B"
> A[2, "c3"] # a combination of index and name can also be used
[1] "G"
• Using logicals
> A[c(TRUE, FALSE), c(TRUE, FALSE, FALSE, TRUE)] # the row index c(TRUE, FALSE) is recycled, selecting rows 1 and 3
c1 c4
r1 "A" "D"
r3 "I" "L"
Matrix Arithmetic
• Very similar to vectors
• Operations are performed element-wise
> A <- matrix(1:12 , nrow = 3, byrow = TRUE)
> A
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 9 10 11 12
> A * 10 # all other arithmetic operators also work element-wise
[,1] [,2] [,3] [,4]
[1,] 10 20 30 40
[2,] 50 60 70 80
[3,] 90 100 110 120
> A * A # for element-wise operations between two matrices, their dimensions must be the same
[,1] [,2] [,3] [,4]
[1,] 1 4 9 16
[2,] 25 36 49 64
[3,] 81 100 121 144
Matrix Arithmetic
• > A - c(1,5) # the vector is recycled column-wise until it matches the size of the matrix
[,1] [,2] [,3] [,4]
[1,] 0 -3 2 -1
[2,] 0 5 2 7
[3,] 8 5 10 7
• colSums() and rowSums() calculate the sums of each column and of each row of a matrix, respectively
• For algebraic matrix multiplication the %*% operator is used; it works only when the number of columns of the first matrix equals the number of rows of the second matrix
• The transpose of a matrix is obtained with the t() function
• The inverse of a (square, non-singular) matrix is obtained with the solve() function
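The functions listed above can be tried on the 3 x 4 matrix from the previous slide. A minimal base-R sketch; the 2 x 2 matrix M passed to solve() is an arbitrary non-singular example chosen for illustration:

```r
A <- matrix(1:12, nrow = 3, byrow = TRUE)   # 3 x 4

print(rowSums(A))   # 10 26 42     (one sum per row)
print(colSums(A))   # 15 18 21 24  (one sum per column)

tA <- t(A)          # transpose: 4 x 3
P <- A %*% tA       # (3 x 4) %*% (4 x 3): inner dimensions match, result is 3 x 3
print(dim(P))       # 3 3

# solve() with a single argument returns the inverse of a square matrix
M <- matrix(c(2, 1, 1, 3), nrow = 2)   # hypothetical example matrix
Minv <- solve(M)
print(M %*% Minv)   # the 2 x 2 identity matrix (up to rounding)
```

Note that A %*% A would fail here, because A has 4 columns but only 3 rows.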
Factors
• This data structure is used for handling categorical variables (e.g. blood group, gender etc.)
• Factors are useful for variables that have a limited number of unique values.
• Suppose the blood group types of 15 people are stored in a vector
> blood.group <- c("A", "AB", "A", "O", "A", "B", "B", "O", "AB", "A", "B", "O", "AB", "O", "O")
> blood.group
[1] "A" "AB" "A" "O" "A" "B" "B" "O" "AB" "A" "B" "O" "AB" "O" "O"
• The factor() function converts the above vector into a factor
> B.G.factor <- factor(blood.group)
> B.G.factor
[1] A AB A O A B B O AB A B O AB O O
Levels: A AB B O # these are the different unique categories; by default the levels are sorted alphabetically
> str(B.G.factor) # str() displays the structure of any R object
Factor w/ 4 levels "A","AB","B","O": 1 2 1 4 1 3 3 4 2 1 ...
• Each observation is stored as an integer code pointing to a level, because this requires much less space (repeating long strings for every observation can take up a lot of space).
• Factors are essentially integer vectors with a set of levels associated with them.
Factors
• Setting order of levels manually
> B.G.factor2 <- factor(blood.group, levels = c("O","A","B","AB") )
> B.G.factor2
[1] A AB A O A B B O AB A B O AB O O
Levels: O A B AB
> str(B.G.factor2)
Factor w/ 4 levels "O","A","B","AB": 2 4 2 1 2 3 3 1 4 2 ...
> str(B.G.factor)
Factor w/ 4 levels "A","AB","B","O": 1 2 1 4 1 3 3 4 2 1 ...
• Renaming factor levels
> levels(B.G.factor) <- c("BT_A", "BT_AB", "BT_B", "BT_O")
OR
> B.G.factor <- factor( blood.group, labels = c("BT_A", "BT_AB", "BT_B", "BT_O") )
> B.G.factor
[1] BT_A BT_AB BT_A BT_O BT_A BT_B BT_B BT_O BT_AB BT_A BT_B BT_O BT_AB BT_O BT_O
Levels: BT_A BT_AB BT_B BT_O
# A limitation of the two methods above is that the new labels must follow the same order as the existing factor levels, i.e. A, AB, B, O. Specifying levels and labels together removes this restriction:
> B.G.factor <- factor(blood.group, levels = c("O","A","B","AB"), labels = c("BT_O", "BT_A", "BT_B","BT_AB"))
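To see that a factor really is an integer vector with labels attached, the base-R sketch below builds the relabelled factor from the last line and inspects its integer codes, its level counts (via table(), which is not covered in the slides) and its character form:

```r
blood.group <- c("A", "AB", "A", "O", "A", "B", "B", "O", "AB", "A", "B", "O", "AB", "O", "O")

f <- factor(blood.group, levels = c("O", "A", "B", "AB"),
            labels = c("BT_O", "BT_A", "BT_B", "BT_AB"))

print(levels(f))             # "BT_O" "BT_A" "BT_B" "BT_AB"
print(as.integer(f)[1:5])    # 2 4 2 1 2: the integer codes stored inside the factor
print(table(f))              # number of observations per level, in level order

# Converting to character returns the labels, not the codes
print(as.character(f)[1:3])  # "BT_A" "BT_AB" "BT_A"
```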
Factors
• Comparison operators do not work on plain (unordered) factors
> B.G.factor[1] < B.G.factor[2] # comparison does not work because blood group is a nominal variable
[1] NA
Warning message:
In Ops.factor(B.G.factor[1], B.G.factor[2]) :
‘<’ not meaningful for factors
• Ordinal variables as ordered factors
> tshirt <- c("L", "M", "M", "S","L","M")
> tshirt_size <- factor(tshirt, ordered = TRUE, levels = c("S","M","L") ) #levels are specified in ascending order
> tshirt_size
[1] L M M S L M
Levels: S < M < L
> str(tshirt_size)
Ord.factor w/ 3 levels "S"<"M"<"L": 3 2 2 1 3 2
> tshirt_size[1] > tshirt_size[2]
[1] TRUE
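Because the levels of an ordered factor carry an order, comparisons, max() and sort() all respect it. A short base-R sketch using the T-shirt example; comparing against the character string "M" relies on R matching the string to the factor's levels:

```r
tshirt <- c("L", "M", "M", "S", "L", "M")
tshirt_size <- factor(tshirt, ordered = TRUE, levels = c("S", "M", "L"))

print(tshirt_size[1] > tshirt_size[2])  # TRUE, because L > M
print(tshirt_size >= "M")               # TRUE TRUE TRUE FALSE TRUE TRUE
print(max(tshirt_size))                 # L
print(sort(tshirt_size))                # S M M M L L, sorted by level order
```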
List
• A list is a vector that can contain other R objects. A list can store practically anything, i.e. numeric, character, vector, matrix, factor etc.
• A list can even contain other lists
• No coercion takes place: each element keeps its own type
• Some functionality is lost compared to vectors and matrices, e.g. performing calculations on a list is tedious
Example:
> c(pdb_id = "5FAC", protein = "Alanine Racemase", resolution = 2.8, seq_len = 410, uniprot_id = "O86786") # coercion: every element becomes a character string, e.g. 2.8 becomes "2.8"
> list("5FAC", "Alanine Racemase", 2.8, 410, "O86786") # no coercion
[[1]]
[1] "5FAC"
[[2]]
[1] "Alanine Racemase"
[[3]]
[1] 2.8
[[4]]
[1] 410
[[5]]
[1] "O86786"
List
• Assigning labels
> protein <- list(pdb_id = "5FAC", protein = "Alanine Racemase", resolution = 2.8, seq_len = 410, uniprot_id = "O86786")
OR (starting from an unnamed list)
> names(protein) <- c("pdb_id", "protein", "resolution", "seq_len", "uniprot_id")
> protein
$pdb_id
[1] "5FAC"
$protein
[1] "Alanine Racemase"
$resolution
[1] 2.8
$seq_len
[1] 410
$uniprot_id
[1] "O86786"
> str(protein)
List of 5
$ pdb_id    : chr "5FAC"
$ protein   : chr "Alanine Racemase"
$ resolution: num 2.8
$ seq_len   : num 410
$ uniprot_id: chr "O86786"
• List can store any type of object
> v <- c(1,2,3) #numeric vector
> v1 <- c("a", "b", "c") # character vector
> v2 <- c(TRUE, FALSE, TRUE, TRUE) #logical vector
> m1 <- matrix( c(9,5,6,7), nrow = 2, byrow = TRUE ) #matrix
> list1 <- list(v, v1, v2, m1) # list1 will have copies of v, v1, v2, m1
> str(list1)
List of 4
$ : num [1:3] 1 2 3
$ : chr [1:3] "a" "b" "c"
$ : logi [1:4] TRUE FALSE TRUE TRUE
$ : num [1:2, 1:2] 9 6 5 7
• A list can store another list
> protein <- list(pdb_id = "5FAG", protein = "Alanine Racemase", resolution = 1.51, seq_len = 410, uniprot_id = "O86786", prev_pro = protein)
> str(protein)
List of 6
$ pdb_id    : chr "5FAG"
$ protein   : chr "Alanine Racemase"
$ resolution: num 1.51
$ seq_len   : num 410
$ uniprot_id: chr "O86786"
$ prev_pro  :List of 5
..$ pdb_id    : chr "5FAC"
..$ protein   : chr "Alanine Racemase"
..$ resolution: num 2.8
..$ seq_len   : num 410
..$ uniprot_id: chr "O86786"
Subset List
• # protein is the nested list defined above: pdb_id "5FAG", with the earlier "5FAC" record stored in prev_pro
# Subset the 1st element, i.e. "pdb_id"
> protein[1] # or protein["pdb_id"]
$pdb_id
[1] "5FAG" # output is a list of length 1
> protein[[1]] # or protein[["pdb_id"]]
[1] "5FAG" # output is a character string
# [ ] returns a sub-list, whereas [[ ]] returns a single element
> protein[c(1, 3)] # output is a sub-list
$pdb_id
[1] "5FAG"
$resolution
[1] 1.51
> protein[[c(1, 3)]]
Error in protein[[c(1, 3)]] : subscript out of bounds
# Separate elements have to be accessed separately
Subset List
• [[c(1, 3)]] is equivalent to [[1]][[3]], which means: from the 1st element of the list, select its 3rd element. The 1st element is a character vector of length 1 ("5FAG"), which has no 3rd element, hence the error "subscript out of bounds"
• Selecting sub-elements
> protein[[6]][[1]] # or protein[[c(6, 1)]] or protein[["prev_pro"]][["pdb_id"]]
[1] "5FAC"
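The difference between [ ], [[ ]] and the $ shorthand can be checked directly with class(). A self-contained base-R sketch that rebuilds the nested protein list from the slides (the intermediate name old is introduced here only for illustration):

```r
old <- list(pdb_id = "5FAC", protein = "Alanine Racemase", resolution = 2.8,
            seq_len = 410, uniprot_id = "O86786")
protein <- list(pdb_id = "5FAG", protein = "Alanine Racemase", resolution = 1.51,
                seq_len = 410, uniprot_id = "O86786", prev_pro = old)

print(class(protein[1]))        # "list": [ ] returns a sub-list
print(class(protein[[1]]))      # "character": [[ ]] returns the element itself
print(protein$resolution)       # 1.51: $name is shorthand for [["name"]]
print(protein$prev_pro$pdb_id)  # "5FAC": $ can be chained into nested lists
print(protein[[c(6, 1)]])       # "5FAC": recursive indexing, element 6 then its element 1
```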
Miscellaneous operators
• : (colon operator) is used for obtaining a series of numbers in sequence
> c(5:15)
[1] 5 6 7 8 9 10 11 12 13 14 15
• %*% is used for algebraic multiplication of matrices
• %in% is used for checking whether an element belongs to a vector or not
> 21 %in% c(5:15)
[1] FALSE
Conditional statements
• if () statements
• if () ... else statements
• if () ... else if () ... else statements
• The syntax is very similar to C/C++ conditional statements
> if(condition){
expression
}
> x <- -5
> if(x<0){
print("x is a negative number")
}
[1] "x is a negative number"
Conditional statements
• > if(x>0){
print("x is a positive number")
} else {
print("x is a negative number")
}
[1] "x is a negative number"
• > x <- 12
> if(x %% 2 == 0) {
print("x is divisible by 2")
} else if(x %% 3 == 0){ # also TRUE, but this branch is skipped because the first condition already matched
print("x is divisible by 3")
} else {
print("x is neither divisible by 2 nor by 3")
}
[1] "x is divisible by 2"
Loops
• For loop
> for(n in x) { expr }
Example
> z <- c(5,12,13)
> for (i in z) {
print(i^2)
}
[1] 25
[1] 144
[1] 169
# looping over a sequence of numbers
> for (n in 1:10) { print(n) }
[1] 1
[1] 2
.
.
[1] 10
• C-style looping using while and repeat loops
> i <- 1
> while(TRUE) {
i <- i+4
if (i > 10) break
}
> i
[1] 13
# The same result can be obtained by specifying the condition inside while()
> i <- 1
> while(i < 10) {
i <- i+4
}
> i
[1] 13
# The break statement can also be used with a for loop
# Another useful statement is next, which instructs the interpreter to skip to the next iteration of the loop.
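A small base-R sketch of break and next inside a for loop: next skips the rest of the current iteration, while break leaves the loop entirely.

```r
# Collect the odd numbers from 1 to 10, stopping once a value exceeds 7
collected <- c()
for (n in 1:10) {
  if (n %% 2 == 0) next   # even number: skip straight to the next iteration
  if (n > 7) break        # first odd number above 7 ends the loop
  collected <- c(collected, n)
}
print(collected)  # 1 3 5 7
```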
• repeat loop
> repeat {
commands
if(condition) { break }
}
# A repeat loop is used to iterate over a block of code multiple times. There is no built-in condition check to exit a repeat loop,
# so a condition with a break statement must be specified within the loop
> x <- 1
> repeat {
print(x)
x = x+1
if (x == 6){ break }
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
Functions
• Basic syntax for declaring function
> func_name <- function (argument) {
statement
}
Example:
> pow <- function(x, y) { #x & y are arguments of function
result <- x^y
print(result)
}
> pow(5,3) #function is called
[1] 125
• Default values of arguments can also be set when declaring a function. Giving an argument a default value makes it optional when calling the function
> pow <- function(x, y=2) { # y defaults to 2, which can be overridden when calling the function
result <- x^y
print(result)
}
> pow(x=6)
[1] 36
> pow (x= 3,y=5)
[1] 243
Functions
• A particular return value can also be specified inside the declaration of a function using the return() function
Example 1
> check <- function(x) {
if (x > 0) {
result <- "Positive"
} else if (x < 0) {
result <- "Negative"
} else {
result <- "Zero"
}
return(result)
}
> check(x = -65)
[1] "Negative"
Example 2
> check <- function(x) {
if (x > 0) {
return("Positive")
} else if (x < 0) {
return("Negative")
} else {
return("Zero")
}
}
> check(2)
[1] "Positive"
• If there is no explicit return from a function, the value of the last evaluated expression is returned automatically in R.
• An explicit return() is generally used to return a value immediately from a function. If it is not the last statement of the function, it ends the function prematurely and returns control to the place from which the function was called.
# In Example 2, once "Positive" is returned, no further code in the function is executed
Functions
• The return() function can return a vector, matrix, data frame or even a list, but it accepts only a single object at a time
• Therefore, to return multiple values from an R function, we can collect them in a list (or another object) and return that.
Example
> multi_return <- function() {
my_list <- list("color" = "red", "size" = 20, "shape" = "round")
return(my_list) }
> a <- multi_return()
> a$color
[1] "red"
> a$size
[1] 20
> a$shape
[1] "round"
• lapply() function
lapply() applies a function to each element of a list, vector or data frame, and returns a list with the same number of elements as the object passed to it
• When lapply() is used on a data frame, the function is applied column-wise (a data frame is a list of columns)
• Applied to a matrix x with 6 columns (converted to a data frame), lapply() gives a list of 6 elements (1 for each column); sapply() does the same but simplifies the result:
> str(sapply(as.data.frame(x), var))
Named num [1:6] 0.759 1.86 0.89 1.62 0.289 ...
- attr(*, "names")= chr [1:6] "V1" "V2" "V3" "V4" ...
• lapply() always returns a list, whereas sapply() returns a vector where possible, as in the above case
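Since the matrix x used above is not defined in this section, the sketch below uses a small hypothetical data frame instead to contrast the two functions: both apply mean() to every column, but only sapply() simplifies the result into a named vector.

```r
# Hypothetical two-column data frame standing in for as.data.frame(x)
df <- data.frame(V1 = c(1, 2, 3), V2 = c(4, 6, 8))

res_list <- lapply(df, mean)   # a list with one element per column
res_vec  <- sapply(df, mean)   # the same values, simplified to a named numeric vector

str(res_list)                  # List of 2, with elements $V1 = 2 and $V2 = 6
print(res_vec)                 # named vector: V1 = 2, V2 = 6
```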
Other important functions in R
• cat(x) # Prints the arguments after concatenating them
• identical() # Test if 2 objects are *exactly* equal
• rep(2,5) # Repeat the number 2 five times
• rev(x) # reverse the elements of x
• seq(1,10,0.4) # Generate a sequence (1 -> 10, spaced by 0.4)
• floor(x), ceiling(x), round(x), signif(x) # rounding functions
• unique(x) # Remove duplicate entries from vector
• getwd() # Return working directory
• setwd() # Set working directory
# Built-in constants:
• pi, letters, LETTERS # Pi, lowercase and uppercase letters, e.g. letters[7] is "g"
• month.abb, month.name # Abbreviated & full names for months
Other important functions in R
• range(x) # Returns the minimum and maximum of x
• mean(x) # Returns mean of x
• var(x) # Returns variance of x
• sd(x) # Returns standard deviation of x
• median(x) # Returns median of x
• weighted.mean(x, w) # Returns the mean of x weighted by w
• min(x), max(x), quantile(x) # Minimum, maximum and sample quantiles of x
• cor(x,y) # Gives the correlation coefficient between x and y
• rnorm(n, mean = 0, sd = 1) # Generates n random deviates from a normal distribution with the specified mean and sd
• lm() # Fits a linear regression model
• sample(x, size, replace = FALSE, prob = NULL) # For random or weighted sampling
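A quick base-R sketch exercising several of the functions listed on these two slides; the data vector x and the weights are made up for illustration. set.seed() is used so that sample() gives reproducible draws, but the actual values drawn are not shown because they depend on R's random number generator.

```r
x <- c(4, 8, 15, 16, 23, 42)   # illustrative data

print(range(x))    # 4 42
print(mean(x))     # 18
print(median(x))   # 15.5
print(sd(x))       # sample standard deviation (n - 1 in the denominator)
print(weighted.mean(c(1, 2, 3), w = c(3, 1, 1)))  # (1*3 + 2*1 + 3*1) / 5 = 1.6

print(rep(2, 5))                   # 2 2 2 2 2
print(rev(seq(1, 2, by = 0.25)))   # 2.00 1.75 1.50 1.25 1.00

set.seed(1)                        # makes the random draw reproducible
s <- sample(1:10, size = 3)        # 3 values drawn without replacement
print(length(unique(s)))           # 3: no value is repeated
```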
Object oriented programming(OOP)
• Object-oriented programming (OOP) is a programming paradigm in which programs are organized around objects rather than actions and logic.
• OOP helps programmers develop in a defined style instead of just ‘getting stuff done’
• Everything in OOP is grouped into self-contained “objects”
• In OOP, programmers define not only the data type of a data structure, but also the types of operations/methods (functions) that can be applied to it
• In this way the data structure becomes an object that combines data and functions (methods) in one unit. In OOP, computer programs are designed by building them out of objects that interact with one another.
• A key aspect of object-oriented programming is the use of classes. A class is a blueprint for an object.
• Think of a class as a concept, and the object as the embodiment of that concept (in other words, an object is an instance of a class)
What is Method ?
• A method in object-oriented programming is like a procedure or function in procedural programming.
• The key difference is that a method is part of an object. In object-oriented programming, code is organized by creating objects, giving those objects properties and making them do certain things.
What is CLASS ?
• A class is a collection of variables and methods for objects that share common properties, operations and behaviors.
• A class is a combination of state (data) and behavior (methods).
• In object-oriented languages, a class is a data type, and objects are instances of that data type. In other words, classes are prototypes from which objects are created.
• Once a class is defined, any number of objects belonging to that class can be created, and each object gets its own independent memory allocation
What is Object ?
• In R everything is an object
• Each object belongs to a particular class (e.g. numeric, factor, list, data frame etc.)
• Objects are the basic run-time entities in an object-oriented system. They are instances of a class and units of abstraction.
• A programming problem is analyzed in terms of objects and the nature of the communication between them. When a program is executed, objects interact with each other by sending messages.
• Different objects can also interact with each other without knowing the details of each other's data or code.
• An object passes a message to another object, which results in the invocation of a method. Objects then perform the actions that are required to get a response from the system.
• Encapsulation is when a group of related methods, properties, and other members are treated as a single object.
• Inheritance is the ability to receive (“inherit”) methods and properties from an existing class.
• Polymorphism is when different classes implement the same methods in their own ways, while the classes can still be used interchangeably.
• Abstraction is the process by which a developer hides everything other than the relevant data about an object in order to simplify it and increase efficiency.
OOP Systems in R
• R has several OOP systems: S3, S4, Reference Classes and R6
S3
• Mainly focused on function overloading (polymorphism)
• Simple to use
• Important terms in the S3 system are:
A class is an attribute of an object that dictates what messages the object can receive and return
A method is a function that is designed for a specific class
Dispatch is the selection of the class-specific method
• S3 operates around classes and methods
> print
function (x, ...)
UseMethod("print")
<bytecode: 0x000000000b849258>
<environment: namespace:base>
• What does UseMethod("print") mean?
• "print" is an example of a generic function
• A generic function does not do anything by itself; it invokes or chooses another function (a method) to perform the action
• The generic function looks at the class of the argument passed to it (in the above case 'x') and then dispatches to the method for that class
Example:
> print(x) # suppose x is a data frame
is equivalent to (gets dispatched to)
> print.data.frame(x) # for a factor, print.factor(x), and similarly for other classes
• If no separate method exists for the object's class, the default method is dispatched # print.default(x) in the above case
How to make a class?
> x <- 1:6
> class(x) <- "myclass" # or attr(x, "class") <- "myclass"
> attributes(x)
$class
[1] "myclass"
Defining Methods for the assigned class
> print.myclass <- function(x, ...) {cat(x, sep = ", ")}
#In this case print() function is overloaded (polymorphism)
> print(x)
1, 2, 3, 4, 5, 6
Now we have a method for printing objects that belong to “myclass”
How to define your own generic function?
> MyGenFunction <- function(x, ...) UseMethod("MyGenFunction")
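A generic function only becomes useful once methods exist for it. The sketch below completes the MyGenFunction example with a method for "myclass" and a default fallback; the two method bodies are hypothetical and were written only to show S3 dispatch:

```r
# Generic: UseMethod() dispatches on the class of x
MyGenFunction <- function(x, ...) UseMethod("MyGenFunction")

# Hypothetical method, called for objects of class "myclass"
MyGenFunction.myclass <- function(x, ...) {
  paste("myclass object with", length(x), "elements")
}

# Hypothetical fallback, called when no class-specific method exists
MyGenFunction.default <- function(x, ...) {
  paste("object of class", class(x)[1])
}

x <- structure(1:6, class = "myclass")   # same as class(x) <- "myclass"
print(MyGenFunction(x))      # "myclass object with 6 elements"
print(MyGenFunction(3.14))   # "object of class numeric" (dispatched to the default)
```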