5.2 Support Vector Machine (SVM)
The Support Vector Machine is an algorithm introduced by Corinna Cortes and Vladimir Vapnik. The objective of this algorithm is to fit a line that divides a group of points/features into two or more classes.
Figure 1 shows the possible lines to segregate the two types of points.
All three lines could be a solution, but the SVM algorithm aims to maximize the distance to the nearest point, which is the so-called margin.
Keep in mind that the first objective of the SVM is to classify correctly; only then does it maximize the margin.
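As a quick illustration of the margin, here is a minimal sketch (not from the course; it assumes scikit-learn and a tiny made-up dataset) that fits a linear SVM and computes the margin width, which for a linear kernel equals 2/||w||.

# A minimal sketch of the margin idea, assuming scikit-learn and a tiny
# made-up dataset (not from the course). For a linear kernel, the margin
# width equals 2 / ||w||, where w is the learned weight vector.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [1, 1], [3, 3], [4, 4]])  # hypothetical points
y = np.array([0, 0, 1, 1])                      # their classes

clf = SVC(kernel = 'linear')
clf.fit(X, y)

w = clf.coef_[0]                    # weight vector of the separating line
print(2 / np.linalg.norm(w))        # margin width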
Additional notebooks
For a better understanding of SVM, read the bookdown of the Machine Learning course offered by Stanford and taught by Andrew Ng.
This course is focused on application and does not go deep into the theoretical aspects.
5.2.1 Scikit Learn
Scikit Learn also provides methods to compute the SVM.
5.2.1.1 SVC()
This function creates a classifier object.
# Importing the SVC function from the svm module.
from sklearn.svm import SVC

# Creating a classifier object.
clf = SVC(kernel = 'linear')
Remember, the kernel used here is linear, which means the SVM will draw a straight line; the kernel could also be non-linear.
5.2.1.2 Non linear SVM
The non-linear SVM occurs when you adopt a non-linear kernel, as you can see in Figure 2.
There are several non-linear kernels:
- poly;
- rbf (gives a curvy decision boundary);
- sigmoid, etc.
Using a polynomial kernel gives the output shown in Figure 3.
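To see the kernels in action, here is a minimal sketch (assuming scikit-learn's make_circles as a hypothetical stand-in for the figures' data) comparing the training accuracy of the linear, poly, and rbf kernels on data that no straight line can separate.

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Hypothetical data: one class inside a circle, the other around it,
# so no straight line can separate them.
X, y = make_circles(n_samples = 200, noise = 0.1, factor = 0.4,
                    random_state = 42)

for kernel in ['linear', 'poly', 'rbf']:
    clf = SVC(kernel = kernel)
    clf.fit(X, y)
    print(kernel, clf.score(X, y))  # training accuracy per kernel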
5.2.1.3 Parameters
These are characteristics that you define when creating the classifier, that is, before fitting the data. There are three:
- Kernel;
- Gamma; and
- C.
Figure 4 shows two examples of output with different kernels. As you can see, the kernel can completely change the decision boundary.
# Creating a classifier with an rbf kernel and explicit C and gamma.
clf = SVC(C = 1, gamma = 1, kernel = "rbf")
5.2.1.4 C
Controls the tradeoff between a smooth decision boundary and classifying training points correctly.
A low C makes the decision surface smooth, while a high C aims at classifying all training examples correctly.
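A quick way to see this tradeoff is to compare training and test accuracy across several values of C. The sketch below uses a hypothetical noisy dataset from make_classification, not the course data.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Noisy hypothetical data to expose the tradeoff controlled by C.
X, y = make_classification(n_samples = 300, n_features = 2,
                           n_redundant = 0, flip_y = 0.15,
                           random_state = 0)
X_train, X_test, Y_train, Y_test = train_test_split(X, y, random_state = 0)

for C in [0.01, 1, 100]:
    clf = SVC(kernel = 'rbf', C = C)
    clf.fit(X_train, Y_train)
    # A high C chases every training point; watch the gap between the two.
    print(C, clf.score(X_train, Y_train), clf.score(X_test, Y_test))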
5.2.1.5 Gamma
Defines how far the influence of a single training example reaches. The larger gamma is, the closer other examples must be to be affected.
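The sketch below (hypothetical data from make_moons, not the course dataset) illustrates this reach: as gamma grows, each example's influence shrinks, the boundary bends around individual points, and more support vectors are kept.

from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Hypothetical two-moons data to illustrate gamma's reach.
X, y = make_moons(n_samples = 200, noise = 0.2, random_state = 1)

for gamma in [0.1, 1, 100]:
    clf = SVC(kernel = 'rbf', gamma = gamma)
    clf.fit(X, y)
    # Larger gamma: each point's influence shrinks, the boundary bends
    # around individual examples, and more support vectors are kept.
    print(gamma, clf.n_support_, clf.score(X, y))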
5.2.1.6 Overfitting
To avoid overfitting, keep an eye on the three SVM parameters that can lead to it:
- Kernel;
- C; and
- gamma.
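One common remedy (not discussed in the course itself) is to tune these three parameters by cross-validation; below is a minimal sketch assuming scikit-learn's GridSearchCV and a made-up dataset.

from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples = 300, noise = 0.25, random_state = 0)
X_train, X_test, Y_train, Y_test = train_test_split(X, y, random_state = 0)

# Cross-validation picks the combination that generalizes best instead
# of the one that merely memorizes the training set.
param_grid = {'kernel': ['linear', 'rbf'],
              'C': [0.1, 1, 10],
              'gamma': [0.01, 0.1, 1]}
search = GridSearchCV(SVC(), param_grid, cv = 5)
search.fit(X_train, Y_train)
print(search.best_params_, search.score(X_test, Y_test))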
5.2.1.7 .fit()
Similar to the .fit() method from GaussianNB.
# Fitting a model.
clf.fit(X_train,Y_train)
Where:
- X_train: the “coordinates” of the points used to train the model;
- Y_train: the label of each pair of coordinates.
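If you are wondering where X_train and Y_train come from, here is a minimal sketch (hypothetical data; in practice they come from your own dataset) that builds them with train_test_split before calling .fit().

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hypothetical data; in practice X and Y come from your own dataset.
X, Y = make_classification(n_samples = 200, n_features = 2,
                           n_redundant = 0, random_state = 0)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.3,
                                                    random_state = 0)

clf = SVC(kernel = 'linear')
clf.fit(X_train, Y_train)   # learns the boundary from the training set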
5.2.1.8 .predict()
Based on the clf object, it is possible to predict the labels of a test dataset.
# After fitting the clf, you can use it to predict.
clf.predict(X_test)
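Continuing the hypothetical split from the .fit() section, the predictions can be checked against the held-out labels with accuracy_score.

from sklearn.metrics import accuracy_score

# Predicting the labels of the held-out points (clf, X_test and Y_test
# are assumed to come from the .fit() sketch above).
pred = clf.predict(X_test)
print(accuracy_score(Y_test, pred))  # fraction of correct predictions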