Feature Selection
Pick the most important features and drop the noise. Fewer features = faster training, less overfitting, better interpretability.
Why Feature Selection?
Not all features contribute equally to predictions. Some are noisy, redundant, or irrelevant. Feature selection identifies and keeps only the useful ones, improving model performance and reducing complexity.
With 100 features, your model might overfit noise. With the right 15 features, it could perform better and train 10x faster.
Three Categories
Filter Methods
Score each feature independently using statistical tests. Fast but ignores feature interactions. Examples: correlation, chi-squared, mutual information.
Wrapper Methods
Train models with different feature subsets and pick the best. Accurate but slow. Examples: RFE, forward/backward selection.
Embedded Methods
Feature selection is built into the model training. Best of both worlds. Examples: Lasso (L1), tree-based feature importance.
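Forward/backward selection is listed above but gets no dedicated section below; scikit-learn ships it as `SequentialFeatureSelector` (available since 0.24). A minimal forward-selection sketch on iris — the KNN estimator and the choice of 2 features are arbitrary for illustration:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Forward selection: start with no features, greedily add the one that
# most improves cross-validated accuracy, stop at n_features_to_select.
sfs = SequentialFeatureSelector(
    KNeighborsClassifier(n_neighbors=3),
    n_features_to_select=2,
    direction="forward",  # "backward" starts full and removes instead
    cv=5,
)
sfs.fit(X, y)
print(f"Selected mask: {sfs.get_support()}")
```

Because every candidate subset is evaluated by retraining the model, this is accurate but much slower than filter methods — the wrapper trade-off described above.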
Method 1: Correlation Analysis
import pandas as pd
import numpy as np
# Remove highly correlated features (redundant)
df = pd.DataFrame(np.random.randn(100, 5), columns=['A', 'B', 'C', 'D', 'E'])
df['B'] = df['A'] * 0.95 + np.random.randn(100) * 0.1 # B ≈ A (highly correlated)
corr_matrix = df.corr().abs()
# Find pairs with correlation > 0.9
upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))
to_drop = [col for col in upper.columns if any(upper[col] > 0.9)]
print(f"Drop: {to_drop}") # ['B'] - redundant with A
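The 0.9 cutoff is a judgment call, so it can help to wrap the logic above in a reusable helper — `drop_correlated` is a hypothetical name for this sketch, not a library function:

```python
import numpy as np
import pandas as pd

def drop_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one column from each pair whose |correlation| exceeds threshold."""
    corr = df.corr().abs()
    # Upper triangle only, so each pair is examined exactly once
    upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100, 3)), columns=["A", "B", "C"])
df["B"] = df["A"]  # make B an exact duplicate of A
print(drop_correlated(df).columns.tolist())  # ['A', 'C']
```

Note that which column of a correlated pair gets dropped is arbitrary (here, always the later one); domain knowledge may favor keeping a specific one.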
Method 2: SelectKBest (Filter)
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif, chi2
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
# Select top 2 features using F-statistic (ANOVA)
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)
print(f"Original: {X.shape}") # (150, 4)
print(f"Selected: {X_selected.shape}") # (150, 2)
print(f"Scores: {selector.scores_}")
print(f"Selected features: {selector.get_support()}") # [False, False, True, True] (petal length & width)
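The other score functions imported above work the same way. A quick sketch of both — mutual information can capture nonlinear dependence, while chi-squared requires non-negative features (which the iris measurements are):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif

X, y = load_iris(return_X_y=True)

# Mutual information: captures nonlinear feature-target dependence
mi_selector = SelectKBest(score_func=mutual_info_classif, k=2)
X_mi = mi_selector.fit_transform(X, y)

# Chi-squared: only valid for non-negative features (counts, frequencies,
# or physical measurements like these)
chi2_selector = SelectKBest(score_func=chi2, k=2)
X_chi2 = chi2_selector.fit_transform(X, y)

print(X_mi.shape, X_chi2.shape)  # (150, 2) (150, 2)
```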
Method 3: Recursive Feature Elimination (RFE)
from sklearn.feature_selection import RFE
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators=100, random_state=42)
# Select top 2 features by recursively removing the least important
rfe = RFE(estimator=model, n_features_to_select=2, step=1)
rfe.fit(X, y)
print(f"Selected: {rfe.support_}") # [False, False, True, True]
print(f"Ranking: {rfe.ranking_}") # e.g. [2, 3, 1, 1] (1 = selected; higher = eliminated earlier)
print(f"Feature names: {np.array(load_iris().feature_names)[rfe.support_]}")
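If you'd rather not hard-code `n_features_to_select`, scikit-learn's `RFECV` lets cross-validation choose the count. A sketch using the same random forest:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV

X, y = load_iris(return_X_y=True)

# Same elimination loop as RFE, but 5-fold CV scores each feature count
# and keeps the best-performing one
rfecv = RFECV(
    estimator=RandomForestClassifier(n_estimators=100, random_state=42),
    step=1,
    cv=5,
    scoring="accuracy",
)
rfecv.fit(X, y)
print(f"Optimal feature count: {rfecv.n_features_}")
print(f"Selected mask: {rfecv.support_}")
```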
Method 4: Tree-Based Importance (Embedded)
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X, y)
importances = model.feature_importances_
feature_names = load_iris().feature_names
# Sort by importance
idx = np.argsort(importances)[::-1]
for i in idx:
    print(f"  {feature_names[i]:20s}: {importances[i]:.4f}")
# Threshold-based selection
from sklearn.feature_selection import SelectFromModel
selector = SelectFromModel(model, threshold="mean") # Keep above-average importance
X_selected = selector.fit_transform(X, y)
print(f"Selected {X_selected.shape[1]} features out of {X.shape[1]}")
Method 5: L1 Regularization (Lasso)
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Lasso drives unimportant feature weights to exactly 0.
# Note: Lasso is a regression model; here the iris class labels (0, 1, 2)
# are treated as a continuous target, which is fine for illustration.
lasso = LassoCV(cv=5).fit(X_scaled, y)
print(f"Coefficients: {lasso.coef_}")
# Non-zero coefficients = selected features
selected = np.where(lasso.coef_ != 0)[0]
print(f"Selected feature indices: {selected}")
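For a genuine classification target like iris, the embedded-selection analogue of Lasso is L1-penalized logistic regression, which zeroes out coefficients the same way. A sketch — the regularization strength `C=0.1` is an arbitrary choice for this example:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# L1 penalty drives per-class coefficients to exactly 0,
# just as Lasso does for regression weights
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X_scaled, y)

# A feature counts as selected if any class assigns it a non-zero weight
selected = np.where(np.any(clf.coef_ != 0, axis=0))[0]
print(f"Selected feature indices: {selected}")
```

Smaller `C` means stronger regularization and fewer surviving features; tune it with cross-validation in practice.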
Comparison
| Method | Speed | Considers Interactions | Best For |
|---|---|---|---|
| Correlation | Very fast | Pairwise only | Removing redundant features |
| SelectKBest | Fast | No | Quick baseline, high-dimensional data |
| RFE | Slow | Yes | When accuracy matters more than speed |
| Tree importance | Medium | Yes | General purpose, interpretable |
| Lasso (L1) | Medium | Partially | Linear models, automatic selection |
Always fit feature selection inside cross-validation, not on the full dataset beforehand. Otherwise information from the held-out folds leaks into the selection process, and your scores come out optimistically biased.
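In scikit-learn, the cleanest way to follow this rule is to put the selector in a `Pipeline`, so it is refit from scratch inside every fold. A sketch combining `SelectKBest` with the random forest from earlier:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)

# The selector is refit on each fold's training split, so the held-out
# fold never influences which features are kept -- no leakage
pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif, k=2)),
    ("model", RandomForestClassifier(n_estimators=100, random_state=42)),
])
scores = cross_val_score(pipe, X, y, cv=5)
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Running `selector.fit_transform(X, y)` on the full dataset and then cross-validating the model on the reduced `X` is the leaky version of the same computation — the pipeline form costs nothing extra and gives honest estimates.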