Isolation Forest
Detect anomalies by isolating outliers with random partitioning. Anomalies are "few and different" — they're easier to isolate.
Core Idea
Normal points are clustered together and need many splits to isolate. Anomalies are far from the cluster and can be isolated with very few splits. Isolation Forest builds random trees and measures how quickly each point gets isolated — shorter paths = more anomalous.
Traditional methods model what's "normal" and flag whatever doesn't fit. Isolation Forest does the opposite: it directly finds what is easy to isolate, which avoids building a density model and runs in linear time.
How It Works
Algorithm Steps
- Build trees: For each tree, randomly select a feature and a random split value within the feature's range
- Split recursively until each point is isolated (alone in its partition) or max depth is reached
- Measure path length: Count how many splits it took to isolate each point
- Average across trees: Points with short average path lengths are anomalies
- Compute anomaly score: Normalize path lengths to a score between 0 and 1
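The steps above can be sketched in a few lines. The following toy implementation (illustrative only, not scikit-learn's actual code; all names are made up) isolates a point by random axis-aligned splits and counts the splits needed:

```python
import numpy as np

def path_length(X, x, depth=0, max_depth=10, rng=None):
    """Split X at random until x is isolated (or max depth); return split count."""
    if rng is None:
        rng = np.random.default_rng()
    if len(X) <= 1 or depth >= max_depth:     # x is alone in its partition
        return depth
    f = rng.integers(X.shape[1])              # random feature
    lo, hi = X[:, f].min(), X[:, f].max()
    if lo == hi:                              # degenerate range; stop splitting
        return depth
    split = rng.uniform(lo, hi)               # random split value in the range
    # Recurse into whichever side of the split contains x
    side = X[:, f] < split if x[f] < split else X[:, f] >= split
    return path_length(X[side], x, depth + 1, max_depth, rng)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                 # a normal cluster
inlier, outlier = X[0], np.array([6.0, 6.0])  # point far from the cluster

def avg(p, trials=50):
    """Average path length across many random trees."""
    return np.mean([path_length(X, p, rng=rng) for _ in range(trials)])

print(avg(outlier), avg(inlier))              # outlier isolates in fewer splits
```

Averaged over many random trees, the far-away point consistently gets a shorter path than a point inside the cluster, which is exactly the signal the anomaly score is built from.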
Anomaly Score:

s(x, n) = 2^(-E(h(x)) / c(n))

Where:
- h(x) = path length of point x in a single tree
- E(h(x)) = mean path length of x across all trees
- c(n) = average path length of an unsuccessful search in a binary search tree built from n samples (the normalization constant)

Interpretation:
- Score ≈ 1 → Anomaly (short path, easy to isolate)
- Score ≈ 0.5 → Normal (average path length)
- Score ≈ 0 → Very normal (deep in the cluster)
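The formula is easy to evaluate directly. A minimal sketch, using the standard approximation c(n) = 2H(n-1) - 2(n-1)/n with H(i) ≈ ln(i) + 0.5772 (Euler-Mascheroni constant):

```python
import math

def c(n):
    """Average path length of an unsuccessful BST search over n samples."""
    if n <= 1:
        return 0.0
    harmonic = math.log(n - 1) + 0.5772156649   # H(n-1) approximation
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def anomaly_score(avg_path, n):
    """s(x, n) = 2^(-E(h(x)) / c(n))."""
    return 2.0 ** (-avg_path / c(n))

n = 256
print(anomaly_score(c(n), n))   # average path length -> exactly 0.5 (normal)
print(anomaly_score(3.0, n))    # short path -> close to 1 (anomalous)
```

A path length equal to c(n) gives a score of exactly 0.5, which is why 0.5 marks the normal/anomalous midpoint in the interpretation above.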
Code Implementation
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest

# Generate data: normal cluster + outliers
np.random.seed(42)
X_normal = np.random.randn(300, 2)                   # Normal data
X_outliers = np.random.uniform(-4, 4, size=(20, 2))  # Outliers
X = np.vstack([X_normal, X_outliers])

# Train Isolation Forest
model = IsolationForest(
    n_estimators=100,     # Number of trees
    contamination=0.06,   # Expected fraction of outliers (~20/320)
    random_state=42,
)
predictions = model.fit_predict(X)   # 1 = normal, -1 = anomaly
scores = model.decision_function(X)  # Lower = more anomalous

# Results
n_anomalies = (predictions == -1).sum()
print(f"Detected {n_anomalies} anomalies out of {len(X)} points")

# Visualize
plt.figure(figsize=(10, 6))
plt.scatter(X[predictions == 1, 0], X[predictions == 1, 1],
            c='blue', s=20, label='Normal')
plt.scatter(X[predictions == -1, 0], X[predictions == -1, 1],
            c='red', s=50, marker='x', label='Anomaly')
plt.legend()
plt.title("Isolation Forest Anomaly Detection")
plt.show()
```
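Once fitted, the model also scores unseen points. A short sketch (the model is refit here so the snippet is self-contained; the two test points are made-up examples):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
X = np.vstack([rng.randn(300, 2),                     # normal cluster
               rng.uniform(-4, 4, size=(20, 2))])     # outliers
model = IsolationForest(n_estimators=100, contamination=0.06,
                        random_state=42).fit(X)

X_new = np.array([[0.0, 0.1],     # inside the normal cluster
                  [3.8, -3.9]])   # far outside it
print(model.predict(X_new))        # 1 = normal, -1 = anomaly
print(model.score_samples(X_new))  # lower = more anomalous
```

`score_samples` returns the raw (negated) anomaly score, while `decision_function` is the same quantity shifted so that the contamination-based threshold sits at zero.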
Key Parameters
| Parameter | Default | Description |
|---|---|---|
| n_estimators | 100 | Number of trees. More = more stable results |
| contamination | 'auto' | Expected proportion of outliers (0 to 0.5). Sets the decision threshold |
| max_samples | 'auto' (256) | Samples per tree. Smaller = faster, more randomness |
| max_features | 1.0 | Features per tree. Fewer = more diversity between trees |
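One point worth verifying: `contamination` does not change the anomaly scores themselves, only where the flagging threshold is placed. A quick check (assuming scikit-learn, with made-up data):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(300, 2), rng.uniform(-4, 4, size=(20, 2))])

low  = IsolationForest(contamination=0.02, random_state=0).fit(X)
high = IsolationForest(contamination=0.10, random_state=0).fit(X)

# Identical raw scores, but a different cutoff -> different numbers flagged
print(np.allclose(low.score_samples(X), high.score_samples(X)))
print((low.predict(X) == -1).sum(), (high.predict(X) == -1).sum())
```

This is why tuning `contamination` is really about choosing a threshold on a fixed ranking, not about retraining a different model.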
Real-World Applications
- Fraud detection — Flag unusual credit card transactions
- Network intrusion — Detect unusual network traffic patterns
- Manufacturing — Identify defective products on assembly lines
- Healthcare — Flag abnormal patient readings
- Log analysis — Find unusual system events or errors
Isolation Forest vs Other Methods
| Method | Approach | Scalability |
|---|---|---|
| Isolation Forest | Isolation-based (how easy to separate) | Excellent (linear time) |
| One-Class SVM | Find boundary around normal data | Poor (quadratic) |
| LOF | Local density comparison | Moderate |
| DBSCAN | Density-based clustering | Good |
| Autoencoder | Reconstruction error | Good (needs GPU) |
Isolation Forest works best on low-dimensional data (roughly under 20 features) with a clear separation between normal and anomalous points. For high-dimensional data, consider reducing dimensionality with PCA first.
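The PCA-then-forest idea chains naturally in a scikit-learn pipeline. A minimal sketch with synthetic 50-dimensional data (the dimensions, component count, and contamination value are illustrative choices, not recommendations):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.pipeline import make_pipeline

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(500, 50),                  # 50-dimensional normal data
               rng.uniform(-6, 6, size=(25, 50))])  # scattered outliers

# Reduce to 10 components, then isolate in the reduced space
pipe = make_pipeline(PCA(n_components=10),
                     IsolationForest(contamination=0.05, random_state=0))
labels = pipe.fit_predict(X)   # 1 = normal, -1 = anomaly
print((labels == -1).sum(), "points flagged")
```

`fit_predict` on the pipeline applies the PCA transform before fitting the forest, so new data passed to `pipe.predict` is projected with the same components.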