Support Vector Machine (SVM)
A powerful algorithm that finds the optimal hyperplane to separate data into classes with maximum margin.
What is SVM?
SVM is a supervised learning algorithm that finds the hyperplane (decision boundary) that best separates data points into different classes. It maximizes the margin between the closest points of each class (called support vectors) and the boundary.
SVM works by finding the widest possible "street" between two classes. The edges of the street are defined by the support vectors: the data points closest to the boundary.
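As a quick illustration of the "street" idea, here is a minimal sketch using scikit-learn's SVC on a tiny made-up two-cluster dataset. The fitted model exposes the support vectors directly:

```python
import numpy as np
from sklearn.svm import SVC

# Two tiny, linearly separable clusters (toy data for illustration)
X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# The support vectors are the points on the edges of the "street"
print(clf.support_vectors_)
print(clf.n_support_)  # number of support vectors per class
```

Only the points nearest the boundary end up in `support_vectors_`; moving any other point (without crossing the margin) would not change the fitted hyperplane.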
Key Concepts
- Hyperplane — The decision boundary that separates classes
- Support Vectors — Data points closest to the hyperplane that define its position
- Margin — The distance between the hyperplane and nearest support vectors (SVM maximizes this)
- Kernel Trick — Transforms non-linearly separable data into higher dimensions where a linear boundary works
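The kernel trick from the list above can be demonstrated on data that is not linearly separable. The sketch below uses scikit-learn's make_circles (a synthetic toy dataset) to compare a linear kernel against the RBF kernel; exact scores will vary slightly with the data:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: no straight line can separate the two classes
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=42)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)

# Linear kernel fails (~chance level); RBF separates the rings cleanly
print(f"Linear kernel accuracy: {linear.score(X, y):.2f}")
print(f"RBF kernel accuracy:    {rbf.score(X, y):.2f}")
```

The RBF kernel implicitly maps each point into a higher-dimensional space where the inner ring and outer ring become linearly separable, without ever computing those coordinates explicitly.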
Kernel Types
Linear
For linearly separable data. Fastest. Use when data can be separated by a straight line/plane.
RBF (Radial Basis Function)
Default kernel. Maps data to infinite-dimensional space. Works well for most non-linear problems.
Polynomial
Maps data using polynomial features. Good for data with polynomial relationships.
Sigmoid
Similar to a neural network activation. Rarely used in practice.
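A practical way to choose among these kernels is to cross-validate each one on the same data. This sketch uses a synthetic dataset from scikit-learn's make_classification, so the scores are illustrative rather than definitive:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=42)

results = {}
for kernel in ["linear", "rbf", "poly", "sigmoid"]:
    # Scaling inside the pipeline keeps the comparison fair for every kernel
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    results[kernel] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{kernel:>8}: mean CV accuracy = {results[kernel]:.2f}")
```

On most tabular datasets the linear and RBF kernels are the two worth trying first; sigmoid rarely wins, matching the note above.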
SVR (Support Vector Regression)
For regression tasks, SVR fits a hyperplane within an epsilon-insensitive tube around the data:
- Points inside the tube incur no penalty (treated as correct)
- Points outside the tube incur a loss
- Only support vectors (points on or outside the tube) define the model
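The effect of the tube can be seen by counting support vectors at different tube widths. This sketch fits SVR to synthetic sine data (made up for illustration); a wider tube leaves more points inside it, so fewer support vectors remain:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(42)
X = np.sort(rng.uniform(0, 5, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 80)  # noisy sine curve

counts = {}
for eps in [0.01, 0.1, 0.5]:
    svr = SVR(kernel="rbf", C=1.0, epsilon=eps).fit(X, y)
    counts[eps] = len(svr.support_)
    print(f"epsilon={eps}: {counts[eps]} support vectors")
```

With a tiny epsilon almost every point sits outside the tube and becomes a support vector; widening the tube prunes the model down to the points that actually constrain the fit.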
Code: SVR for Rent Prediction
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
# Dataset
data = {
"Size_sqft": [500, 700, 900, 1100, 1500, 1800, 2100],
"Bedrooms": [1, 1, 2, 2, 3, 3, 4],
"Rent": [12, 15, 20, 25, 30, 35, 50] # Rent in thousands
}
df = pd.DataFrame(data)
# Features and target
X = df[["Size_sqft", "Bedrooms"]]
y = df["Rent"]
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Feature scaling (critical for SVM)
scaler_X = StandardScaler()
scaler_y = StandardScaler()
X_train_scaled = scaler_X.fit_transform(X_train)
X_test_scaled = scaler_X.transform(X_test)
y_train_scaled = scaler_y.fit_transform(y_train.values.reshape(-1, 1)).flatten()
# Train SVR with RBF kernel
svr = SVR(kernel="rbf", C=100, epsilon=0.01)
svr.fit(X_train_scaled, y_train_scaled)
# Predictions (inverse transform to original scale)
y_pred_scaled = svr.predict(X_test_scaled)
y_pred = scaler_y.inverse_transform(y_pred_scaled.reshape(-1, 1)).flatten()
# Evaluate
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MAE: {mae:.2f}")
print(f"MSE: {mse:.2f}")
print(f"R2 Score: {r2:.2f}")
SVM is very sensitive to feature scales, so always standardize features (for example with StandardScaler) before training. For SVR, scale the target as well, since epsilon is measured in the target's units.
Key Parameters
| Parameter | Description |
| --- | --- |
| C | Regularization strength. Higher C = less tolerance for misclassification (higher risk of overfitting) |
| kernel | "linear", "rbf", "poly", or "sigmoid". Default is "rbf" |
| epsilon (SVR) | Width of the epsilon-insensitive tube. Larger = more tolerance for errors and fewer support vectors |
| gamma | Controls how far the influence of a single training example reaches. "scale" (the default) or "auto" are common choices |
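In practice, C and gamma are usually tuned together rather than set by hand. Here is a sketch using GridSearchCV on a synthetic dataset; the parameter grid shown is just an example, not a recommended range:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=42)

# Scaling lives inside the pipeline so each CV fold is scaled independently
pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC(kernel="rbf"))])
grid = GridSearchCV(pipe,
                    {"svc__C": [0.1, 1, 10, 100],
                     "svc__gamma": ["scale", 0.01, 0.1, 1]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_, f"CV accuracy: {grid.best_score_:.2f}")
```

Because C and gamma interact (a large gamma can compensate for a small C and vice versa), searching them jointly is more reliable than tuning each in isolation.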
When to Use SVM
| Good For | Not Ideal For |
| --- | --- |
| High-dimensional data (text, genomics) | Very large datasets (training scales poorly) |
| Clear margin of separation | Noisy data with overlapping classes |
| Binary and multi-class classification | When you need calibrated probability estimates (SVC's probability=True adds an expensive extra calibration step) |
| Small to medium datasets | When interpretability is important |
Tags: Classification, Regression, Supervised, Kernel, Margin