
Cross-Validation

Properly estimate model performance by training and testing on different subsets of your data.

Why Cross-Validation?

A single train-test split can be misleading. If your test set happens to be "easy", you'll overestimate performance. Cross-validation uses multiple splits to get a far more reliable estimate of how your model will perform on unseen data.

A single 80/20 split gives you ONE accuracy number. 5-fold cross-validation gives you FIVE, plus a mean and standard deviation — much more trustworthy.
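To make the contrast concrete, here is a minimal sketch (iris data and a logistic regression, chosen purely for illustration) showing the single number a one-off split gives versus the five numbers, mean, and spread from 5-fold CV:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# One 80/20 split -> a single number that depends on the random seed
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
single = model.fit(X_tr, y_tr).score(X_te, y_te)

# 5-fold CV -> five numbers, a mean, and a standard deviation
scores = cross_val_score(model, X, y, cv=5)
print(f"Single split: {single:.3f}")
print(f"5-fold CV:    {scores.mean():.3f} ± {scores.std():.3f}  {scores.round(3)}")
```

Try changing `random_state` in the split: the single number moves around, while the CV mean stays far more stable.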

Types of Cross-Validation

K-Fold Cross-Validation

Data split into K equal folds (K = 5 shown):

Fold 1: [TEST]  [Train] [Train] [Train] [Train]  → Score 1
Fold 2: [Train] [TEST]  [Train] [Train] [Train]  → Score 2
Fold 3: [Train] [Train] [TEST]  [Train] [Train]  → Score 3
Fold 4: [Train] [Train] [Train] [TEST]  [Train]  → Score 4
Fold 5: [Train] [Train] [Train] [Train] [TEST]   → Score 5

Final Score = Mean(Score 1..5) ± Std(Score 1..5)

Stratified K-Fold

Same as K-Fold but ensures each fold has the same class distribution as the full dataset. Essential for imbalanced datasets where one class is rare.
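One way to see what stratification buys you is to count the rare class in each test fold. This sketch uses a synthetic 90/10 label array (the features are dummies, since only the split matters here):

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Imbalanced toy labels: 90 samples of class 0, 10 of class 1
y = np.array([0] * 90 + [1] * 10)
X = np.zeros((100, 1))  # features are irrelevant to the split itself

results = {}
for name, cv in [("KFold", KFold(n_splits=5, shuffle=True, random_state=0)),
                 ("StratifiedKFold", StratifiedKFold(n_splits=5, shuffle=True, random_state=0))]:
    # How many rare-class samples land in each test fold?
    rare = [int((y[test] == 1).sum()) for _, test in cv.split(X, y)]
    results[name] = rare
    print(f"{name:16s} rare-class count per test fold: {rare}")
```

Plain K-Fold scatters the 10 rare samples unevenly across folds; StratifiedKFold places exactly 2 in every fold, matching the 10% overall rate.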

Leave-One-Out (LOO)

K = N (number of samples). Each sample is used as test set once. Very thorough but computationally expensive. Best for very small datasets.

Time Series Split

For time-ordered data where future data can't be used to predict the past. Training set grows with each fold:

Fold 1: [Train] [Test]  [----]  [----]  [----]
Fold 2: [Train] [Train] [Test]  [----]  [----]
Fold 3: [Train] [Train] [Train] [Test]  [----]
Fold 4: [Train] [Train] [Train] [Train] [Test]
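The growing-window behaviour is easy to verify by printing the indices the splitter produces; a small sketch with 12 made-up time-ordered samples:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)  # 12 time-ordered observations
tscv = TimeSeriesSplit(n_splits=4)

folds = list(tscv.split(X))
for i, (train_idx, test_idx) in enumerate(folds, start=1):
    # Every training index precedes every test index: no future leakage
    print(f"Fold {i}: train={list(train_idx)} test={list(test_idx)}")
```

The training window grows from 4 to 10 samples while each test window stays 2 samples long and always sits strictly after the training data.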

Code Implementation

from sklearn.model_selection import (cross_val_score, KFold, StratifiedKFold,
                                     LeaveOneOut, TimeSeriesSplit)
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
import numpy as np

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=42)

# --- K-Fold (5 folds) ---
kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=kf, scoring='accuracy')
print(f"K-Fold:     {scores.mean():.4f} ± {scores.std():.4f}")
print(f"  Per fold: {scores}")

# --- Stratified K-Fold (preserves class distribution) ---
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=skf, scoring='accuracy')
print(f"Stratified: {scores.mean():.4f} ± {scores.std():.4f}")

# --- Leave-One-Out (one fold per sample: 150 fits on iris) ---
loo = LeaveOneOut()
scores = cross_val_score(model, X, y, cv=loo, scoring='accuracy')
print(f"LOO:        {scores.mean():.4f} ({len(X)} folds)")

# --- Time Series Split ---
# Note: iris is NOT time-ordered; this only demonstrates the API.
tscv = TimeSeriesSplit(n_splits=5)
scores = cross_val_score(model, X, y, cv=tscv, scoring='accuracy')
print(f"TimeSeries: {scores.mean():.4f} ± {scores.std():.4f}")

# --- Shorthand (just pass k) ---
scores = cross_val_score(model, X, y, cv=10)  # 10-fold (stratified for classifiers)
print(f"10-Fold:    {scores.mean():.4f} ± {scores.std():.4f}")

Which to Use?

Method              When to Use                                  K Value
K-Fold              General purpose, balanced classes            5 or 10
Stratified K-Fold   Imbalanced classes (always prefer this)      5 or 10
Leave-One-Out       Very small datasets (< 50 samples)           N
Repeated K-Fold     Need very stable estimates                   5 folds × 10 repeats
Time Series Split   Time-ordered data (stock prices, weather)    5-10
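Repeated K-Fold appears in the table but not in the code above; a sketch using scikit-learn's RepeatedStratifiedKFold, which reshuffles the data before each repeat so every repeat yields a different set of folds:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=42)

# 5 folds x 10 repeats = 50 scores, each repeat with a fresh shuffle
rkf = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=42)
scores = cross_val_score(model, X, y, cv=rkf, scoring='accuracy')
print(f"Repeated: {scores.mean():.4f} ± {scores.std():.4f} over {len(scores)} folds")
```

The price is 10× the training cost of plain 5-fold CV, so reserve it for cases where fold-to-fold variance is genuinely a concern.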

Cross-Validation with Hyperparameter Tuning

from sklearn.model_selection import GridSearchCV

param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [3, 5, 10, None]
}

# GridSearchCV runs cross-validation internally for every parameter combination
grid_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=StratifiedKFold(n_splits=5),
    scoring='accuracy',
    n_jobs=-1
)
grid_search.fit(X, y)

print(f"Best params:   {grid_search.best_params_}")
print(f"Best CV score: {grid_search.best_score_:.4f}")

Never use cross-validation scores as final test scores. Always keep a completely separate holdout test set that is never used during training or hyperparameter tuning.
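A minimal sketch of that workflow (iris data and a deliberately small grid, for illustration only): carve out the holdout first, tune with CV on the rest, and score the holdout exactly once at the end.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)

# 1. Carve out a holdout set FIRST; it takes no part in tuning
X_dev, X_hold, y_dev, y_hold = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# 2. Tune with cross-validation on the development portion only
grid = GridSearchCV(RandomForestClassifier(random_state=42),
                    {'max_depth': [3, 5, None]}, cv=5)
grid.fit(X_dev, y_dev)

# 3. Report the holdout score once, at the very end
holdout = grid.score(X_hold, y_hold)
print(f"Best CV score (used for tuning): {grid.best_score_:.4f}")
print(f"Holdout score (final report):    {holdout:.4f}")
```

Because the CV score was used to pick hyperparameters, it is optimistically biased; the holdout score is the one to report.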