SVM Example in R
Let's first generate some data in 2 dimensions and make the classes a little separated. After
setting the random seed, you make a matrix x of 20 observations on 2 variables, drawn from a
standard normal distribution. Then you make a y variable that is either 1 or -1, with 10
observations in each class. For the observations with y = 1, you shift the mean from 0 to 1 in
each coordinate. Finally, you plot the data and color the points according to their response.
The plotting character 19 gives you nice big visible dots, colored blue or red according to
whether the response is 1 or -1.
set.seed(10111)
x <- matrix(rnorm(40), 20, 2)
y <- rep(c(1, -1), c(10, 10))
x[y == 1, ] <- x[y == 1, ] + 1
plot(x, col = y + 3, pch = 19)
Now load the package e1071 which contains the svm function.
library(e1071)
Now you make a data frame of the data, turning y into a factor variable. After that, you make a
call to svm on this data frame, using y as the response variable and the other variables as the
predictors. The data frame will have unpacked the matrix x into 2 columns named X1 and X2.
You tell svm that the kernel is linear, the tuning parameter cost is 10, and scale is FALSE. In
this example, you ask it not to standardize the variables.
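The call described above is not reproduced in this section. A minimal sketch that matches the description might look like the following. The object names dat and svmfit are taken from the surrounding text, and the data-generation lines are repeated only so the block runs on its own.

```r
library(e1071)

# Recreate the simulated data from the start of the example.
set.seed(10111)
x <- matrix(rnorm(40), 20, 2)
y <- rep(c(1, -1), c(10, 10))
x[y == 1, ] <- x[y == 1, ] + 1

# data.frame() splits the unnamed matrix into columns X1 and X2.
dat <- data.frame(x, y = as.factor(y))

# Linear kernel, cost 10, no standardization of the predictors.
svmfit <- svm(y ~ ., data = dat, kernel = "linear", cost = 10, scale = FALSE)
print(svmfit)
```

Note that the argument name has to be spelled kernel exactly. A misspelled name is not matched to the kernel argument, and the fit falls back to the default radial kernel.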
Call:
svm(formula = y ~ ., data = dat, kernel = "linear", cost = 10,
scale = FALSE)
Parameters:
SVM-Type: C-classification
SVM-Kernel: linear
cost: 10
Code explanation:
- This code is written in R and performs support vector machine (SVM) classification on a
dataset.
- The first line, the data.frame() call, creates a data frame called dat from the matrix x and
the response y.
- The matrix x is assumed to already exist in the workspace and supplies the predictor
columns, while the y column is created by converting the existing variable y into a factor
with the as.factor() function.
- The second line fits an SVM model to the data using the svm() function.
- The formula y ~ . specifies that the response variable is y and that all other columns in the
data frame should be used as predictors.
- The kernel argument specifies that a linear kernel should be used, while the cost
argument sets the cost parameter to 10.
- The scale argument is set to FALSE, which means that the data will not be scaled
before fitting the model.
- The third line prints the SVM model to the console.
Printing svmfit gives a summary of the fit. The number of support vectors is 6: they are the
points that are close to the boundary or on the wrong side of the boundary.
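To see which observations are the support vectors, the fitted object can be inspected directly. The sketch below refits the model so that it runs on its own. It assumes the fit described above (linear kernel, cost = 10, scale = FALSE) and reads the tot.nSV, index, and SV components that e1071 stores on an svm object.

```r
library(e1071)

# Recreate the data and the fit described above.
set.seed(10111)
x <- matrix(rnorm(40), 20, 2)
y <- rep(c(1, -1), c(10, 10))
x[y == 1, ] <- x[y == 1, ] + 1
dat <- data.frame(x, y = as.factor(y))
svmfit <- svm(y ~ ., data = dat, kernel = "linear", cost = 10, scale = FALSE)

svmfit$tot.nSV       # total number of support vectors
svmfit$index         # row numbers of the support vectors in dat
dat[svmfit$index, ]  # the support-vector observations themselves
```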
There's a plot function for SVM that shows the decision boundary, as you can see below. It
doesn't seem to offer much control over the colors. It also breaks with convention, since it
puts X2 on the horizontal axis and X1 on the vertical axis.
plot(svmfit, dat)
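If more control over the colors and the axis orientation is wanted, one option is to draw the picture by hand: predict the class on a fine grid and plot the grid together with the data. The following is only a sketch under the assumptions used so far. The names n, xgrid, and ygrid are illustrative, and the grid size of 75 is an arbitrary choice.

```r
library(e1071)

# Recreate the data and the fit described above.
set.seed(10111)
x <- matrix(rnorm(40), 20, 2)
y <- rep(c(1, -1), c(10, 10))
x[y == 1, ] <- x[y == 1, ] + 1
dat <- data.frame(x, y = as.factor(y))
svmfit <- svm(y ~ ., data = dat, kernel = "linear", cost = 10, scale = FALSE)

# Build a regular grid covering the range of both predictors.
# Column names must match the predictor names in dat (X1, X2).
n <- 75
xgrid <- expand.grid(
  X1 = seq(min(x[, 1]), max(x[, 1]), length.out = n),
  X2 = seq(min(x[, 2]), max(x[, 2]), length.out = n)
)

# Classify every grid point with the fitted model.
ygrid <- predict(svmfit, xgrid)

# Small dots show the predicted region, large dots the training data,
# and open diamonds mark the support vectors.
plot(xgrid, col = c("red", "blue")[as.numeric(ygrid)], pch = 20, cex = 0.2)
points(x, col = y + 3, pch = 19)
points(x[svmfit$index, ], pch = 5, cex = 2)
```

Here X1 ends up on the horizontal axis, and the region colors follow the same blue and red coding as the first scatter plot.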
plot(x, col = y + 1)