Understanding the Support Vector Machine (SVM) algorithm from examples (along with code)

Sunil 15 Apr, 2020 • 9 min read

Note: This article was originally published on Oct 6th, 2015 and updated on Sept 13th, 2017

Overview

  • Explanation of the support vector machine (SVM), a popular machine learning algorithm for classification
  • Implementation of SVM in R and Python
  • Learn about the pros and cons of Support Vector Machines (SVM) and their different applications

 

Introduction

Mastering machine learning algorithms isn’t a myth at all. Most beginners start by learning regression. It is simple to learn and use, but does that solve our purpose? Of course not! You can do so much more than just regression!

Think of machine learning algorithms as an armoury packed with axes, swords, blades, bows, daggers, etc. You have various tools, but you ought to learn to use them at the right time. As an analogy, think of ‘Regression’ as a sword capable of slicing and dicing data efficiently, but incapable of dealing with highly complex data. On the contrary, ‘Support Vector Machines’ is like a sharp knife – it works on smaller datasets, but on the complex ones, it can be far stronger and more powerful in building machine learning models.

By now, I hope you’ve mastered Random Forest, the Naive Bayes algorithm and Ensemble Modeling. If not, I’d suggest you take out a few minutes and read about them as well. In this article, I shall guide you from the basics to advanced knowledge of a crucial machine learning algorithm, support vector machines.


 

Table of Contents

  1. What is Support Vector Machine?
  2. How does it work?
  3. How to implement SVM in Python and R?
  4. How to tune Parameters of SVM?
  5. Pros and Cons associated with SVM

 

What is Support Vector Machine?

“Support Vector Machine” (SVM) is a supervised machine learning algorithm that can be used for both classification and regression challenges. However, it is mostly used in classification problems. In the SVM algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features you have), with the value of each feature being the value of a particular coordinate. Then, we perform classification by finding the hyper-plane that best differentiates the two classes (look at the snapshot below).

[Image: SVM_1]

Support vectors are simply the coordinates of the individual observations that lie closest to the boundary. The SVM classifier is the frontier (hyper-plane/line) that best segregates the two classes.
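
To make this concrete, here is a minimal sketch (the toy data below is made up purely for illustration): a fitted scikit-learn SVM exposes exactly these boundary-defining observations through its support_vectors_ attribute.

import numpy as np
from sklearn import svm

# Two tiny, linearly separable clusters (made-up toy data)
X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = svm.SVC(kernel='linear').fit(X, y)
print(clf.support_vectors_)  # coordinates of the observations that define the margin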

You can look at support vector machines and a few examples of how they work here.

 

How does it work?

Above, we got accustomed to the process of segregating the two classes with a hyper-plane. Now the burning question is “How can we identify the right hyper-plane?”. Don’t worry, it’s not as hard as you think!

Let’s understand:

  • Identify the right hyper-plane (Scenario-1): Here, we have three hyper-planes (A, B and C). Now, identify the right hyper-plane to classify the stars and circles.
    [Image: SVM_2]
    You need to remember a rule of thumb to identify the right hyper-plane: “Select the hyper-plane which segregates the two classes better”. In this scenario, hyper-plane “B” has done this job excellently.
  • Identify the right hyper-plane (Scenario-2): Here, we have three hyper-planes (A, B and C) and all segregate the classes well. Now, how can we identify the right hyper-plane?

    [Image: SVM_3]
    Here, maximizing the distance between the nearest data point (of either class) and the hyper-plane will help us decide the right hyper-plane. This distance is called the margin. Let’s look at the snapshot below:
    [Image: SVM_4]
    Above, you can see that the margin for hyper-plane C is high compared to both A and B. Hence, we name C the right hyper-plane. Another compelling reason for selecting the hyper-plane with the higher margin is robustness: if we select a hyper-plane with a low margin, there is a high chance of misclassification.

  • Identify the right hyper-plane (Scenario-3): Hint: use the rules discussed in the previous sections to identify the right hyper-plane.

    [Image: SVM_5]
    Some of you may have selected hyper-plane B, as it has a higher margin than A. But here is the catch: SVM selects the hyper-plane which classifies the classes accurately prior to maximizing the margin. Here, hyper-plane B has a classification error, while A has classified everything correctly. Therefore, the right hyper-plane is A.

  • Can we classify two classes (Scenario-4)?: Below, I am unable to segregate the two classes using a straight line, as one of the stars lies in the territory of the other (circle) class as an outlier.
    [Image: SVM_6]
    As I have already mentioned, the one star at the other end is like an outlier for the star class. The SVM algorithm has a feature to ignore outliers and find the hyper-plane that has the maximum margin. Hence, we can say that SVM classification is robust to outliers.
    [Image: SVM_7]
  • Find the hyper-plane to segregate two classes (Scenario-5): In the scenario below, we can’t have a linear hyper-plane between the two classes, so how does SVM classify them? Until now, we have only looked at linear hyper-planes.
    [Image: SVM_8]
    SVM can solve this problem easily: it introduces an additional feature. Here, we will add a new feature z = x^2 + y^2. Now, let’s plot the data points on the x and z axes:
    [Image: SVM_9]
    In the above plot, the points to consider are:

    • All values of z will always be positive, because z is the sum of the squares of x and y
    • In the original plot, the red circles appear close to the origin of the x and y axes, leading to lower values of z, while the stars lie relatively far from the origin, resulting in higher values of z

    In the SVM classifier, it is easy to have a linear hyper-plane between these two classes. But another burning question arises: do we need to add this feature manually to get a hyper-plane? No. The SVM algorithm has a technique called the kernel trick. The SVM kernel is a function that takes a low-dimensional input space and transforms it into a higher-dimensional space, i.e. it converts a non-separable problem into a separable problem. It is mostly useful in non-linear separation problems. Simply put, it performs some extremely complex data transformations, then finds the process to separate the data based on the labels or outputs you’ve defined. (A runnable sketch of both the manual-feature and kernel approaches follows after this list.)

    When we look at the hyper-plane in the original input space, it looks like a circle:
    [Image: SVM_10]
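
To see both routes in action, here is a minimal sketch (the ring-shaped toy data is generated purely for illustration): first we add z = x^2 + y^2 by hand and separate the classes with a linear kernel, then we let the RBF kernel perform an equivalent lifting implicitly.

import numpy as np
from sklearn import svm
from sklearn.datasets import make_circles

# Toy data: one class forms an inner ring, the other an outer ring
X, y = make_circles(n_samples=100, factor=0.3, noise=0.05, random_state=0)

# Option 1: add the feature z = x^2 + y^2 manually, then fit a linear SVM
Z = (X ** 2).sum(axis=1).reshape(-1, 1)
linear_clf = svm.SVC(kernel='linear').fit(np.hstack([X, Z]), y)

# Option 2: keep the original features and let the RBF kernel trick do the work
rbf_clf = svm.SVC(kernel='rbf').fit(X, y)

print(linear_clf.score(np.hstack([X, Z]), y), rbf_clf.score(X, y))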

Now, let’s look at how to apply the SVM classifier algorithm in a data science challenge.

How to implement SVM in Python and R?

In Python, scikit-learn is a widely used library for implementing machine learning algorithms. SVM is also available in the scikit-learn library, and we follow the same structure for using it (import the library, create an object, fit the model and predict).

Now, let us have a look at a real-life problem statement and dataset to understand how to apply SVM for classification.

Problem Statement

Dream Housing Finance company deals in all kinds of home loans. It has a presence across urban, semi-urban and rural areas. A customer first applies for a home loan, after which the company validates the customer’s eligibility for a loan.

The company wants to automate the loan eligibility process (in real time) based on the customer details provided while filling out an online application form. These details are Gender, Marital Status, Education, Number of Dependents, Income, Loan Amount, Credit History and others. To automate this process, they have posed a problem: identify the customer segments that are eligible for a loan amount, so that they can specifically target these customers. They have provided a partial dataset for this.

Try to predict loan eligibility on the test set, and try changing the hyperparameters of the linear SVM to improve the accuracy (a starter sketch follows below).
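
Since the original interactive coding window is not reproduced here, below is a minimal sketch of the approach. The file name and the Loan_Status target column are assumptions about the competition data; adapt them to the actual files.

import pandas as pd
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

train = pd.read_csv('train.csv')                        # hypothetical file name
train = train.dropna()                                  # simplest possible missing-value handling
X = pd.get_dummies(train.drop('Loan_Status', axis=1))   # assumed target column; encode categoricals
y = train['Loan_Status']

X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)
model = SVC(kernel='linear', C=1.0).fit(X_tr, y_tr)
print(model.score(X_val, y_val))                        # hold-out accuracy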

 

Support Vector Machine (SVM) code in R

The e1071 package in R is used to create Support Vector Machines with ease. It has helper functions as well as code for the Naive Bayes Classifier. The creation of a support vector machine in R and Python follows a similar approach; let’s now take a look at the following code:

#Import library
require(e1071) # contains the svm() function

Train <- read.csv(file.choose())
Test <- read.csv(file.choose())

# There are various options associated with SVM training,
# such as changing the kernel, gamma and cost values.

# Create model
# (gamma is ignored by the linear kernel; cost is the C parameter)
model <- svm(Target ~ Predictor1 + Predictor2 + Predictor3,
             data = Train, kernel = "linear", gamma = 0.2, cost = 100)

#Predict output
preds <- predict(model, Test)
table(preds)

 

How to tune Parameters of SVM?

Tuning the parameters’ values for machine learning algorithms effectively improves model performance. Let’s look at the list of parameters available with SVM.

sklearn.svm.SVC(C=1.0, kernel='rbf', degree=3, gamma=0.0, coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, random_state=None)

(Note: this signature is from an older scikit-learn release; in current versions the default gamma is 'scale', and gamma must be positive or one of 'scale'/'auto'.)

I am going to discuss some important parameters that have a higher impact on model performance: “kernel”, “gamma” and “C”.

kernel: We have already discussed it. Here, we have various options available for the kernel, like “linear”, “rbf”, “poly” and others (the default value is “rbf”). “rbf” and “poly” are useful for non-linear hyper-planes. Let’s look at an example, where we use a linear kernel on two features of the iris dataset to classify their class.

Support Vector Machine (SVM) code in Python

Example: a linear SVM kernel

import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets

# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features; we could
                      # avoid this ugly slicing by using a two-dim dataset
y = iris.target

# we create an instance of SVM and fit our data. We do not scale our
# data since we want to plot the support vectors
C = 1.0  # SVM regularization parameter
svc = svm.SVC(kernel='linear', C=C).fit(X, y)  # gamma is not used by the linear kernel

# create a mesh to plot in
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
h = (x_max - x_min) / 100  # mesh step size
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))

plt.subplot(1, 1, 1)
Z = svc.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.xlim(xx.min(), xx.max())
plt.title('SVC with linear kernel')
plt.show()

[Image: SVM_11]

Example: Use the SVM rbf kernel

Change the kernel type to rbf in the line below and look at the impact (current scikit-learn requires gamma to be positive, so 'auto', i.e. 1/n_features, is used in place of the original 0):

svc = svm.SVC(kernel='rbf', C=1, gamma='auto').fit(X, y)

[Image: SVM_12]

I would suggest you go for a linear SVM kernel if you have a large number of features (>1000), because it is more likely that the data is linearly separable in a high-dimensional space. You can also use RBF, but do not forget to cross-validate its parameters to avoid over-fitting.
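
For that high-dimensional regime, scikit-learn also offers LinearSVC, which typically trains much faster than SVC with a linear kernel. A small sketch on synthetic wide data (the dataset here is generated purely for illustration):

from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic data with many features, standing in for e.g. text vectors
X, y = make_classification(n_samples=500, n_features=2000, random_state=0)

clf = LinearSVC(C=1.0, max_iter=5000).fit(X, y)
print(clf.score(X, y))  # training accuracy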

gamma: Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’. The higher the value of gamma, the more the model tries to fit the training dataset exactly, which hurts generalization and causes over-fitting.

Example: Let’s see the difference when we use different gamma values, such as 0.1, 10 or 100 (gamma must be positive, so 0.1 stands in for the original 0):

svc = svm.SVC(kernel='rbf', C=1, gamma=0.1).fit(X, y)

[Image: SVM_15]
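
To compare these settings yourself, here is a small sketch that refits the iris model from above at several gamma values:

from sklearn import svm, datasets

iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target

# Higher gamma hugs individual training points more tightly
for gamma in [0.1, 10, 100]:
    svc = svm.SVC(kernel='rbf', C=1, gamma=gamma).fit(X, y)
    print('gamma=%s: training accuracy %.3f' % (gamma, svc.score(X, y)))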

C: Penalty parameter of the error term. It controls the trade-off between a smooth decision boundary and classifying the training points correctly.

[Image: SVM_18]

We should always look at the cross-validation score to find an effective combination of these parameters and avoid over-fitting.
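
One standard way to do this is a grid search with cross-validation; a minimal sketch using scikit-learn’s GridSearchCV on the iris data:

from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()

# Try every combination of C and gamma with 5-fold cross-validation
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [0.01, 0.1, 1, 10]}
search = GridSearchCV(svm.SVC(kernel='rbf'), param_grid, cv=5)
search.fit(iris.data, iris.target)

print(search.best_params_, search.best_score_)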

In R, SVMs can be tuned in a similar fashion to Python. The corresponding parameters for the e1071 package are:

  • The kernel parameter can be set to "linear", "polynomial", "radial" (the RBF kernel), "sigmoid", etc.
  • The gamma value can be tuned by setting the gamma parameter.
  • The C value in Python is tuned via the cost parameter in R.

 

Pros and Cons associated with SVM

  • Pros:
    • It works really well with a clear margin of separation.
    • It is effective in high-dimensional spaces.
    • It is effective in cases where the number of dimensions is greater than the number of samples.
    • It uses a subset of training points in the decision function (called support vectors), so it is also memory efficient.
  • Cons:
    • It doesn’t perform well when we have a large dataset, because the required training time is higher.
    • It also doesn’t perform very well when the dataset has more noise, i.e. when the target classes are overlapping.
    • SVM doesn’t directly provide probability estimates; these are calculated using an expensive five-fold cross-validation, available through the SVC class of the scikit-learn library (a minimal sketch follows below).
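
For completeness, a minimal sketch of requesting those probability estimates in scikit-learn (probability=True triggers the extra internal cross-validation mentioned above):

from sklearn import svm, datasets

iris = datasets.load_iris()

# probability=True enables the (expensive) probability calibration
clf = svm.SVC(kernel='linear', probability=True).fit(iris.data, iris.target)
print(clf.predict_proba(iris.data[:1]))  # class probability estimates for one sample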

Practice Problem

Find the right additional feature to get a hyper-plane segregating the classes in the snapshot below:

[Image: SVM_19]

Post the variable name in the comments section below. I’ll then reveal the answer.

 

End Notes

In this article, we looked at the machine learning algorithm Support Vector Machine in detail. I discussed how it works, the process of implementing it in Python and R, the tricks to make the model efficient by tuning its parameters, its pros and cons, and finally a problem to solve. I would suggest you use SVM and analyse the power of this model by tuning its parameters. I also want to hear about your experience with SVM: how have you tuned parameters to avoid over-fitting and reduce the training time?

Did you find this article helpful? Please share your opinions/thoughts in the comments section below.

If you like what you just read and want to continue your analytics learning, subscribe to our emails, follow us on Twitter or like our Facebook page.



Responses From Readers


nishant 07 Oct, 2015

hi, gr8 articles..explaining the nuances of SVM...hope u can reproduce the same with R.....it would be gr8 help to all R junkies like me

ASHISH 07 Oct, 2015

NEW VARIABLE (Z) = SQRT(X) + SQRT (Y)

Mahmood A. Sheikh 07 Oct, 2015

Kernel

Sanjay 07 Oct, 2015

Nicely Explained . The hyperplane to separate the classes for the above problem can be imagined as 3-D Parabola. z=ax^2 + by^2 + c

FrankSauvage 12 Oct, 2015

Thanks a lot for this great hands-on article!

Harsha 08 Nov, 2015

Really impressive content. Simple and effective. It could be more efficient if you can describe each of the parameters and practical application where you faced non-trivial problem examples.

Aman Srivastava 26 Nov, 2015

kernel

Ephraim Admassus 14 Feb, 2016

How does the python code look like if we are using LSSVM instead of SVM?

Janpreet Singh 04 Mar, 2016

Polynomial kernel function?! for exzmple : Z= A(x^2) + B(y^2) + Cx + Dy + E

Krishna Kalaparti 18 Apr, 2016

Hi Sunil. Great Article. However, there's an issue in the code you've provided. When i compiled the code, i got the following error: Name error: name 'h' is not defined. I've faced this error at line 16, which is: "xx, yy = np.meshgrid(np.arange(x_min, x_min, h), ...). Could you look into it and let me know how to fix it?

Shikha 28 May, 2016

great explanation :) I think new variable Z should be x^2 + y.

VEERAMANI NATARAJAN 03 Jun, 2016

Nice Articlel

Carlos 14 Jun, 2016

The solution is analogue to scenario-5 if you replace y by y-k

K.Krithiga Lakshmi 15 Jun, 2016

Your SVM explanation and kernel definition is very simple, and easy to understand. Kudos for that effort.

pfcohen 19 Jun, 2016

Most intuitive explanation of multidimensional svm I have seen. Thank you!

yc 27 Jun, 2016

what is 'h' in the code of SVM . xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

Iresh 14 Jul, 2016

z = (x^2 - y) z > 0, red circles

LI JENG HUANG 05 Aug, 2016

very neat explaination to SVC. For the proposed problem, my answers are: (1) z = a* x^2 + b y + c, a parabola. (2) z = a (x-0)^2 + b (y- y0)^2 - R^2, a circle or an ellipse enclosing red stars.

Hari 18 Aug, 2016

Great article.. I think the below formula would give a new variable that help to separate the points in hyper plane z = y - |x|

MADHVI 23 Aug, 2016

THANKS FOR EASY EXPLANATION

Raghu 31 Aug, 2016

Useful article for Machine learners.. Why can't you discuss about effect of kernel functions.

Yamani 22 Sep, 2016

The explanation is really impressive. Can you also provide some information about how to determine the theoretical limits for the parameter's optimal accuracy.

harshel jain 23 Sep, 2016

how can we use SVM for regression? can someone please explain..

Diana 04 Oct, 2016

That was a really good explanation! thanks a lot. I read many explanations about SVM but this one help me to understand the basics which I really needed it.

Manjunath GS 27 Oct, 2016

please give us the answer

Diptesh 28 Oct, 2016

This is very useful for understanding easily.

Dan 14 Nov, 2016

just substitude x with |x|

Min 23 Nov, 2016

Same goes with Diana. This really help me a lot to figure out things from basic. I hope you would also share any computation example using R provided with simple dataset, so that anyone can practice with their own after referring to your article. I have a question, if i have time-series dataset containing mixed linear and nonlinear data, (for example oxygen saturation data ; SaO2), by using svm to do classification for diseased vs health subjects, do i have to separate those data into linear and non-linear fisrt, or can svm just performed the analysis without considering the differences between the linearity of those data? Thanks a lot!

anu 29 Nov, 2016

z

Renny Varghese 03 Dec, 2016

Could you please explain how SVM works for multiple classes? How would it work for 9 classes? I used a function called multisvm here: http://www.mathworks.com/matlabcentral/fileexchange/39352-multi-class-svm but I'm not sure how it's working behind the scenes. Everything I've read online is rather confusing.

lubna 06 Dec, 2016

NEW VARIABLE (Z) = SQRT(X) + SQRT (Y)

Haftom A. 07 Dec, 2016

Thank you so much!! That is really good explanation! I read many explanations about SVM but this one help me to understand the basics which I really needed it. keep it up!!

Frank 06 Jan, 2017

Thanks for the great article. There are even cool shirts for anyone who became SVM fan ;) http://www.redbubble.com/de/people/perceptron/works/24728522-support-vector-machines?grid_pos=2&p=t-shirt&style=mens

bilashi 10 Jan, 2017

great explanation!! Thanks for posting it.

arun 19 Jan, 2017

I think this is |X|

Priodyuti Pradhan 21 Jan, 2017

It is very nicely written and understandable. Thanks a lot...

Walter 06 Feb, 2017

z=ax^2 + by^2

madhavi 21 Feb, 2017

nice explanations with scenarios and margin values

lishanth 01 Mar, 2017

wow!!! excellent explanation.. only now i understood the concepts clearly thanks a lot..

anwar 01 Mar, 2017

(Z) = SQRT(X) + SQRT (Y)

Kresla Matty 20 Mar, 2017

thanks, and well done for the good article

Jonathan benitez 16 Apr, 2017

it's magnific your explanation

Aishwarya Jangam 20 Apr, 2017

Great Explanation..Thanks..

Hams 17 May, 2017

simple and refreshed the core concepts in just 5 mins! kudos Mr.Sunil

Shashi 17 May, 2017

Best starters material for SVM, really appreciate the simple and comprehensive writing style. Expecting more such articles from you

Ravindar 20 May, 2017

Z= square(x)

Narasimha 25 May, 2017

Hey Sunil, Nice job of explaining it concisely and intuitively! Easy to follow and covers many aspects in a short space. Thanks!

John Doe 30 May, 2017

Very well written - concise, clear, well-organized. Thank you.

Radhika 14 Jun, 2017

Excellent explanation..Can you please also tell what are the parameter values one should start with - like C, gamma ..Also, again a very basic question.. Can we say that lesser the % of support vectors (count of SVs/total records) better my model/richer my data is- assuming the datasize to be the same.. Waiting for more on parameter tuning..Really appreciate the knowledge shared..

Kirana 15 Jun, 2017

Hi could you please explain why SVM perform well on small dataset?

Chris 20 Jun, 2017

Another nice kernel for the problem stated in the article is the radial basis kernel.

A Practical Guide to Data Preprocessing with Scikit-learn in Python – 数据分析网 22 Jun, 2017

[…] Resource: read this article to understand SVM (support vector machines). […]

chiru 23 Jun, 2017

wow excellent

Zhen Zhang 26 Jun, 2017

very appreciating for explaining

Andrey 27 Jun, 2017

Nice tutorial. The new feature to separate data would be something like z = y - x^2 as most dots following the parabola will have lower z than stars.

BanavaD 04 Jul, 2017

Very intuitive explanation. Thank you! Good to add SVM for Regression of Continuous variables.

neha 11 Jul, 2017

this is so simple method that anyone can get easily thnx for that but also explain the 4 senario of svm.

Nirav Pingle 20 Jul, 2017

Great article for understanding of SVM: But, When and Why do we use the SVM algorithm can anyone make that help me understand because until this thing is clear there may not be use of this article. Thanks in advance.

Mostafa 02 Aug, 2017

It is one of best explanation of machine learning technique that i have seen! and new variable: i think Z=|x| and new Axis are Z and Y

venkat 03 Aug, 2017

higher degree polynomial will separate the points in the problem,

Tirthankar 08 Aug, 2017

I guess the required feature is z = x^2 / y^2 For the red points, z will be close to 1 but for the blue points z values will be significantly more than 1

murtaza ali 09 Aug, 2017

amazing article no doubt! It makes me clear all the concept and deep points regarding SVM. many thanks.

katherine 19 Aug, 2017

The best explanation ever! Thank you!

Rahul 20 Aug, 2017

z = x^2+y^2

Applied text classification on Email Spam Filtering [part 1] – Sarah Mestiri 01 Sep, 2017

[…] [1] Naive Bayes and Text Classification. [2]Naive Bayes by Example. [3] Andrew Ng explanation of Naive Bayes video 1 and video 2 [4] Please explain SVM like I am 5 years old. [5] Understanding Support Vector Machines from examples. […]

roshan 07 Sep, 2017

new variable = ABS(Y)

Robert 13 Sep, 2017

Man, I was looking for definition of SVM for my diploma, but I got interested in explanation part of this article. Keep up good work!

Aman Goel 15 Sep, 2017

we can use 'poly' kernel with degree=2

Nethra Kulkarni 21 Sep, 2017

Hi.. Very well written, great article !:). Thanks so much share knowledge on SVM.

S Sen Sharma 23 Sep, 2017

z=y-x^2

Dalon 02 Oct, 2017

Wonderful, easy to understand explanation.

Eka A 11 Oct, 2017

Thanks a lot for your explanations, they were really helpful and easy to understand

Kevin Mekulu 19 Oct, 2017

It would be a parabola z = a*x^2 + b*y^2 + c*x + d*y + e

Yadi 25 Oct, 2017

Very good explanation, helpful

shefali 03 Nov, 2017

valuable explanation!!

vami 15 Nov, 2017

Very helpfull

Shivam Misra 12 Jan, 2018

|X|

panimalar 18 Jan, 2018

thank u sir ,it is easy to understand

John 09 Feb, 2018

z = x^2 + y

Pavan Kumar 07 Mar, 2018

It may be z=x^2+y

Jose 10 Mar, 2018

y=x^2

anoop 21 Mar, 2018

z=ax^2 + by^2 + c

quandapro 28 Mar, 2018

Nice. new variable is z = abs(x). Then replace x coordinates with z coordinates

Athul 31 Mar, 2018

z = |x|

Deyire Yusuf Umar 02 May, 2018

PARABOLA

Jason 02 May, 2018

I think the boundaryf between two type of snapshot could be a curve (of a part of circle). So I prefer kernel Z=sqrt(X^2+(Y-c)^2)

ILA 08 May, 2018

Thanks a lot. I like how you define a problem and then solve it. It makes things clear.

Prachi 26 May, 2018

z=x-y^2
