
Gradient Boosting

A sequential ensemble method that builds models one after another, each correcting the errors of the previous one using gradient descent.

What is Gradient Boosting?

Gradient Boosting is an ensemble technique where weak models (typically small decision trees) are trained sequentially. Each new tree focuses on correcting the residual errors of the combined predictions so far. The method uses gradient descent to minimize a loss function.

Unlike Random Forest (bagging) where trees are independent, Gradient Boosting trains trees sequentially. Each tree learns from the mistakes of all previous trees combined.
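To make the bagging-vs-boosting contrast concrete, here is a minimal sketch training both ensembles on the same split. The synthetic dataset and parameter choices are assumptions for illustration, not part of the original article:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic tabular data (illustrative; any tabular dataset works)
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# Bagging: 100 trees trained independently on bootstrap samples
rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)

# Boosting: 100 trees trained sequentially, each fit to the
# errors of the combined ensemble built so far
gb = GradientBoostingClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)

print("Random Forest accuracy:    ", rf.score(X_te, y_te))
print("Gradient Boosting accuracy:", gb.score(X_te, y_te))
```

Both APIs look the same from the outside; the difference is entirely in how the trees are built.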

How Boosting Works

Algorithm Steps
  1. Start with a simple prediction (e.g., the mean for regression, or the log-odds of the positive class for classification)
  2. Calculate residuals — the errors between actual and predicted values
  3. Train a new weak tree to predict these residuals
  4. Add the new tree's predictions (scaled by learning rate) to the ensemble
  5. Repeat for N iterations, each time reducing the remaining error
For regression, the ensemble prediction is:

Prediction = initial_prediction + lr * tree_1(x) + lr * tree_2(x) + ... + lr * tree_n(x)

For classification: each tree corrects the classification errors of the previous ensemble, and the final prediction is a weighted combination of all weak learners.
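The steps above can be sketched from scratch in a few lines. This is a toy illustration on synthetic data using squared-error loss, where the residuals are exactly the negative gradient; it is not the article's notebook code:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data: y = x^2 plus noise (an assumption for illustration)
rng = np.random.RandomState(42)
X = np.sort(rng.uniform(-3, 3, 200)).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(0, 0.5, 200)

lr = 0.1        # learning rate
n_trees = 100   # boosting rounds

# Step 1: start with a constant prediction (the mean)
pred = np.full_like(y, y.mean())
trees = []
for _ in range(n_trees):
    residuals = y - pred                       # Step 2: errors of the current ensemble
    tree = DecisionTreeRegressor(max_depth=3)  # weak learner: a shallow tree
    tree.fit(X, residuals)                     # Step 3: fit the tree to the residuals
    pred += lr * tree.predict(X)               # Step 4: add its scaled predictions
    trees.append(tree)                         # Step 5: repeat

# Final model: initial mean + lr * sum of all trees
def predict(X_new):
    out = np.full(len(X_new), y.mean())
    for tree in trees:
        out += lr * tree.predict(X_new)
    return out

print("Baseline MSE:", np.mean((y - y.mean()) ** 2))
print("Boosted MSE: ", np.mean((predict(X) - y) ** 2))
```

Each iteration shrinks the remaining error, which is why the training MSE falls well below the baseline of predicting the mean.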

Code: Spam Detection with Gradient Boosting

```python
import pandas as pd
import nltk
import string
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, classification_report

# Load SMS spam dataset
dataset_url = "https://raw.githubusercontent.com/justmarkham/pycon-2016-tutorial/master/data/sms.tsv"
df = pd.read_csv(dataset_url, sep='\t', header=None, names=['label', 'message'])

# Encode labels: ham=0, spam=1
df['label'] = df['label'].map({'ham': 0, 'spam': 1})

# Text preprocessing
nltk.download('punkt', quiet=True)
nltk.download('stopwords', quiet=True)
nltk.download('punkt_tab', quiet=True)

stop_words = set(stopwords.words('english'))  # build the set once, not per message

def preprocess_text(text):
    text = text.lower()
    text = text.translate(str.maketrans('', '', string.punctuation))
    words = word_tokenize(text)
    words = [w for w in words if w not in stop_words]
    return ' '.join(words)

df['cleaned_message'] = df['message'].apply(preprocess_text)

# TF-IDF feature extraction
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(df['cleaned_message'])
y = df['label']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Gradient Boosting Classifier
gbc = GradientBoostingClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=3,
    random_state=42
)
gbc.fit(X_train, y_train)

# Evaluate
y_pred = gbc.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
print(classification_report(y_test, y_pred))

# Predict a custom message
custom = "Free entry in 2 a wkly comp to win FA Cup final tkts"
cleaned = preprocess_text(custom)
vector = vectorizer.transform([cleaned])
result = "spam" if gbc.predict(vector)[0] == 1 else "ham"
print(f"Prediction: {result}")
```

Key Parameters

  n_estimators: Number of boosting rounds (trees). More trees can improve accuracy but train slower and risk overfitting.
  learning_rate: Shrinks the contribution of each tree. Lower values need more trees but generalize better.
  max_depth: Depth of each tree. Typically 3-5 for boosting, since shallow trees are "weak learners".
  subsample: Fraction of samples used per tree. Values below 1.0 add randomness (stochastic gradient boosting).
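As an illustration of subsample and of how error drops round by round, the sketch below uses scikit-learn's staged_predict, which yields the ensemble's prediction after each boosting iteration. The dataset is synthetic and chosen only for the example:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# subsample < 1.0 trains each tree on a random 80% of rows:
# stochastic gradient boosting
gbc = GradientBoostingClassifier(
    n_estimators=200, learning_rate=0.1, max_depth=3,
    subsample=0.8, random_state=42,
)
gbc.fit(X_tr, y_tr)

# Test error after each boosting round
errors = [np.mean(pred != y_te) for pred in gbc.staged_predict(X_te)]
print("Error after 10 trees: ", errors[9])
print("Error after 200 trees:", errors[-1])
```

Plotting these staged errors is a common way to spot the point where adding more trees stops helping.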

When to Use Gradient Boosting

Good for:
  - Structured/tabular data
  - High prediction accuracy
  - Both classification and regression
  - Kaggle competitions (a frequent top performer)

Not ideal for:
  - Image or sequence data
  - When training speed is critical
  - Very small datasets
  - When interpretability is required

The learning_rate and n_estimators are coupled: a smaller learning rate needs more estimators. A common strategy is to set a small learning_rate (0.01-0.1) and use early stopping to find the right number of trees.
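A sketch of that strategy using scikit-learn's built-in early stopping (the validation_fraction and n_iter_no_change parameters); the dataset here is synthetic and illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Small learning rate, generous tree budget; early stopping
# decides how many trees are actually needed
gbc = GradientBoostingClassifier(
    n_estimators=1000,
    learning_rate=0.05,
    max_depth=3,
    validation_fraction=0.2,  # hold out 20% of training data internally
    n_iter_no_change=10,      # stop after 10 rounds without improvement
    random_state=42,
)
gbc.fit(X, y)

# Number of trees selected by early stopping
print("Trees actually fitted:", gbc.n_estimators_)
```

Fitting usually halts well before the 1000-tree budget, so the budget can be set generously without paying the full training cost.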
