Gradient Boosting
A sequential ensemble method that builds models one after another, each one correcting the errors of the ensemble built so far by taking a gradient-descent step on a loss function.
What is Gradient Boosting?
Gradient Boosting is an ensemble technique where weak models (typically small decision trees) are trained sequentially. Each new tree focuses on correcting the residual errors of the combined predictions so far. The method uses gradient descent to minimize a loss function.
Unlike Random Forest (bagging) where trees are independent, Gradient Boosting trains trees sequentially. Each tree learns from the mistakes of all previous trees combined.
How Boosting Works
Algorithm Steps
- Start with a simple prediction (e.g., the mean of the targets for regression, or the log-odds of the positive class for classification)
- Calculate residuals — the errors between actual and predicted values
- Train a new weak tree to predict these residuals
- Add the new tree's predictions (scaled by learning rate) to the ensemble
- Repeat for N iterations, each time reducing the remaining error
For regression:
Prediction = initial_prediction + lr * tree_1(x) + lr * tree_2(x) + ... + lr * tree_n(x)
For classification:
Each tree is fit to the gradient of the classification loss (e.g., log loss) with respect to the current ensemble's predictions.
The final prediction passes the summed tree outputs through a sigmoid (binary) or softmax (multiclass) to obtain class probabilities.
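The regression recipe above can be sketched from scratch in a few lines. This is an illustrative implementation for squared-error loss only (where the negative gradient is exactly the residual); the helper names `fit_gbm` and `predict_gbm` are made up for this example, not a standard API.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbm(X, y, n_trees=50, lr=0.1, max_depth=3):
    init = y.mean()                               # step 1: start from the mean
    pred = np.full_like(y, init, dtype=float)
    trees = []
    for _ in range(n_trees):
        residuals = y - pred                      # step 2: errors of ensemble so far
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)                    # step 3: weak tree fits residuals
        pred += lr * tree.predict(X)              # step 4: add scaled contribution
        trees.append(tree)
    return init, trees

def predict_gbm(model, X, lr=0.1):                # lr must match the one used in fit
    init, trees = model
    return init + lr * sum(t.predict(X) for t in trees)

# Toy data: y = x^2 plus noise
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.1, size=200)

model = fit_gbm(X, y)
mse = np.mean((predict_gbm(model, X) - y) ** 2)
print(f"Training MSE: {mse:.4f}")
```

Each iteration shrinks the remaining error, so the training MSE ends up far below the variance of `y` (the error of predicting the mean alone).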
Code: Spam Detection with Gradient Boosting
import pandas as pd
import nltk
import string
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, classification_report
# Load SMS spam dataset
dataset_url = "https://raw.githubusercontent.com/justmarkham/pycon-2016-tutorial/master/data/sms.tsv"
df = pd.read_csv(dataset_url, sep='\t', header=None, names=['label', 'message'])
# Encode labels: ham=0, spam=1
df['label'] = df['label'].map({'ham': 0, 'spam': 1})
# Text preprocessing
nltk.download('punkt', quiet=True)
nltk.download('stopwords', quiet=True)
nltk.download('punkt_tab', quiet=True)
stop_words = set(stopwords.words('english'))  # build the set once, not per word
def preprocess_text(text):
    text = text.lower()
    text = text.translate(str.maketrans('', '', string.punctuation))
    words = word_tokenize(text)
    words = [w for w in words if w not in stop_words]
    return ' '.join(words)
df['cleaned_message'] = df['message'].apply(preprocess_text)
# TF-IDF feature extraction
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(df['cleaned_message'])
y = df['label']
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train Gradient Boosting Classifier
gbc = GradientBoostingClassifier(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=3,
    random_state=42
)
gbc.fit(X_train, y_train)
# Evaluate
y_pred = gbc.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
print(classification_report(y_test, y_pred))
# Predict custom message
custom = "Free entry in 2 a wkly comp to win FA Cup final tkts"
cleaned = preprocess_text(custom)
vector = vectorizer.transform([cleaned])
result = "spam" if gbc.predict(vector)[0] == 1 else "ham"
print(f"Prediction: {result}")
Key Parameters
| Parameter | Description |
| --- | --- |
| n_estimators | Number of boosting rounds (trees). More = better fit but slower, with a risk of overfitting |
| learning_rate | Shrinks the contribution of each tree. Lower = needs more trees but generalizes better |
| max_depth | Depth of each tree. Typically 3-5 for boosting (shallow trees are "weak learners") |
| subsample | Fraction of samples used per tree. <1.0 adds randomness (stochastic gradient boosting) |
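A quick way to see the learning_rate / n_estimators trade-off is to train a few configurations on a synthetic task. This is a hedged sketch: the data comes from scikit-learn's make_classification helper, and the specific (lr, n) pairs are arbitrary illustrations, not recommended defaults.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data
X, y = make_classification(n_samples=1000, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

# A large learning rate with few trees vs. a small rate with many trees
for lr, n in [(0.5, 20), (0.1, 100), (0.02, 500)]:
    clf = GradientBoostingClassifier(
        learning_rate=lr,
        n_estimators=n,
        max_depth=3,
        subsample=0.8,      # stochastic gradient boosting
        random_state=42,
    )
    clf.fit(X_tr, y_tr)
    print(f"lr={lr:<5} n_estimators={n:<4} test accuracy={clf.score(X_te, y_te):.3f}")
```

On most datasets the smaller-rate/more-trees configurations take longer to train but tend to generalize at least as well, which is why the two parameters should be tuned together rather than independently.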
When to Use Gradient Boosting
| Good For | Not Ideal For |
| --- | --- |
| Structured/tabular data | Image or sequence data |
| High prediction accuracy | When training speed is critical |
| Both classification and regression | Very small datasets |
| Kaggle competitions (top performer) | When interpretability is required |
The learning_rate and n_estimators are coupled: a smaller learning rate needs more estimators. A common strategy is to set a small learning_rate (0.01-0.1) and use early stopping to find the right number of trees.
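scikit-learn supports this early-stopping strategy directly through the n_iter_no_change and validation_fraction parameters of GradientBoostingClassifier: training stops once the score on an internal validation split fails to improve for the given number of rounds. A minimal sketch, again on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1000, random_state=42)

gbc = GradientBoostingClassifier(
    n_estimators=1000,          # generous upper bound on the number of trees
    learning_rate=0.05,         # small rate, so more trees may be needed
    n_iter_no_change=10,        # stop after 10 rounds with no improvement
    validation_fraction=0.1,    # fraction held out for the stopping check
    random_state=42,
)
gbc.fit(X, y)

# n_estimators_ is the number of trees actually fitted before stopping
print(f"Trees fitted: {gbc.n_estimators_} of {gbc.n_estimators} allowed")
```

If the validation score plateaus early, the model keeps only the trees fitted so far, which saves training time and guards against overfitting without hand-tuning n_estimators.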
Ensemble Boosting Classification Regression Supervised