Support Vector Machine (SVM)
A powerful algorithm that finds the optimal hyperplane to separate data into classes with maximum margin.
What is SVM?
SVM is a supervised learning algorithm that finds the hyperplane (decision boundary) that best separates data points into different classes. It maximizes the margin between the closest points of each class (called support vectors) and the boundary.
SVM works by finding the widest possible "street" between two classes. The edges of the street are defined by the support vectors: the data points closest to the boundary.
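As a quick illustration of the "street" idea, here is a minimal sketch using scikit-learn's SVC on a tiny made-up two-cluster dataset. The fitted model exposes the support vectors directly:

```python
import numpy as np
from sklearn.svm import SVC

# Two tiny, linearly separable clusters (toy data for illustration)
X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# The support vectors are the points on the edges of the "street"
print(clf.support_vectors_)
print(clf.n_support_)  # number of support vectors per class
```

Only the points nearest the boundary end up in `support_vectors_`; moving any other point (without crossing the margin) would not change the fitted hyperplane.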
Key Concepts
- Hyperplane — The decision boundary that separates classes
- Support Vectors — Data points closest to the hyperplane that define its position
- Margin — The distance between the hyperplane and nearest support vectors (SVM maximizes this)
- Kernel Trick — Transforms non-linearly separable data into higher dimensions where a linear boundary works
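The kernel trick from the list above can be demonstrated on data that is not linearly separable. The sketch below uses scikit-learn's make_circles (a synthetic toy dataset) to compare a linear kernel against the RBF kernel; exact scores will vary slightly with the data:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: no straight line can separate the two classes
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=42)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)

# Linear kernel fails (~chance level); RBF separates the rings cleanly
print(f"Linear kernel accuracy: {linear.score(X, y):.2f}")
print(f"RBF kernel accuracy:    {rbf.score(X, y):.2f}")
```

The RBF kernel implicitly maps each point into a higher-dimensional space where the inner ring and outer ring become linearly separable, without ever computing those coordinates explicitly.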
Kernel Types
Linear
For linearly separable data. Fastest. Use when data can be separated by a straight line/plane.
RBF (Radial Basis Function)
Default kernel. Maps data to infinite-dimensional space. Works well for most non-linear problems.
Polynomial
Maps data using polynomial features. Good for data with polynomial relationships.
Sigmoid
Similar to a neural network activation. Rarely used in practice.
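A practical way to choose among these kernels is to cross-validate each one on the same data. This sketch uses a synthetic dataset from scikit-learn's make_classification, so the scores are illustrative rather than definitive:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=42)

results = {}
for kernel in ["linear", "rbf", "poly", "sigmoid"]:
    # Scaling inside the pipeline keeps the comparison fair for every kernel
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    results[kernel] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{kernel:>8}: mean CV accuracy = {results[kernel]:.2f}")
```

On most tabular datasets the linear and RBF kernels are the two worth trying first; sigmoid rarely wins, matching the note above.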
SVR (Support Vector Regression)
For regression tasks, SVR fits a hyperplane within an epsilon-insensitive tube around the data:
- Points inside the tube incur no penalty (treated as correct)
- Points outside the tube incur a loss
- Only support vectors (points on or outside the tube) define the model
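The effect of the tube can be seen by counting support vectors at different tube widths. This sketch fits SVR to synthetic sine data (made up for illustration); a wider tube leaves more points inside it, so fewer support vectors remain:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(42)
X = np.sort(rng.uniform(0, 5, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 80)  # noisy sine curve

counts = {}
for eps in [0.01, 0.1, 0.5]:
    svr = SVR(kernel="rbf", C=1.0, epsilon=eps).fit(X, y)
    counts[eps] = len(svr.support_)
    print(f"epsilon={eps}: {counts[eps]} support vectors")
```

With a tiny epsilon almost every point sits outside the tube and becomes a support vector; widening the tube prunes the model down to the points that actually constrain the fit.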
Code: SVR for Rent Prediction
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
# Dataset
data = {
"Size_sqft": [500, 700, 900, 1100, 1500, 1800, 2100],
"Bedrooms": [1, 1, 2, 2, 3, 3, 4],
"Rent": [12, 15, 20, 25, 30, 35, 50] # Rent in thousands
}
df = pd.DataFrame(data)
# Features and target
X = df[["Size_sqft", "Bedrooms"]]
y = df["Rent"]
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Feature scaling (critical for SVM)
scaler_X = StandardScaler()
scaler_y = StandardScaler()
X_train_scaled = scaler_X.fit_transform(X_train)
X_test_scaled = scaler_X.transform(X_test)
y_train_scaled = scaler_y.fit_transform(y_train.values.reshape(-1, 1)).flatten()
# Train SVR with RBF kernel
svr = SVR(kernel="rbf", C=100, epsilon=0.01)
svr.fit(X_train_scaled, y_train_scaled)
# Predictions (inverse transform to original scale)
y_pred_scaled = svr.predict(X_test_scaled)
y_pred = scaler_y.inverse_transform(y_pred_scaled.reshape(-1, 1)).flatten()
# Evaluate
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MAE: {mae:.2f}")
print(f"MSE: {mse:.2f}")
print(f"R2 Score: {r2:.2f}")
SVM is very sensitive to feature scales, so always standardize features (for example with StandardScaler) before training. For SVR, scale the target as well, since epsilon is measured in the target's units.
Key Parameters
| Parameter | Description |
| --- | --- |
| C | Regularization strength. Higher C = less tolerance for misclassification (higher risk of overfitting) |
| kernel | "linear", "rbf", "poly", or "sigmoid". Default is "rbf" |
| epsilon (SVR) | Width of the epsilon-insensitive tube. Larger = more tolerance for errors and fewer support vectors |
| gamma | Controls how far the influence of a single training example reaches. "scale" (the default) or "auto" are common choices |
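In practice, C and gamma are usually tuned together rather than set by hand. Here is a sketch using GridSearchCV on a synthetic dataset; the parameter grid shown is just an example, not a recommended range:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=42)

# Scaling lives inside the pipeline so each CV fold is scaled independently
pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC(kernel="rbf"))])
grid = GridSearchCV(pipe,
                    {"svc__C": [0.1, 1, 10, 100],
                     "svc__gamma": ["scale", 0.01, 0.1, 1]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_, f"CV accuracy: {grid.best_score_:.2f}")
```

Because C and gamma interact (a large gamma can compensate for a small C and vice versa), searching them jointly is more reliable than tuning each in isolation.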
When to Use SVM
| Good For | Not Ideal For |
| --- | --- |
| High-dimensional data (text, genomics) | Very large datasets (training scales poorly) |
| Clear margin of separation | Noisy data with overlapping classes |
| Binary and multi-class classification | When you need calibrated probability estimates (SVC's probability=True adds an expensive extra calibration step) |
| Small to medium datasets | When interpretability is important |
Tags: Classification, Regression, Supervised, Kernel, Margin