Logistic Regression
A classification algorithm that predicts the probability of an event using the sigmoid function.
What is Logistic Regression?
Despite the name, Logistic Regression is a classification algorithm, not a regression algorithm. It predicts the probability of an outcome (a value between 0 and 1) using the sigmoid function, then assigns a class based on a threshold (typically 0.5).
Linear Regression predicts continuous values. Logistic Regression predicts probabilities for classification. The sigmoid function is what makes the difference.
Why Not Use Linear Regression for Classification?
- Linear Regression outputs can go below 0 or above 1 (not valid probabilities)
- Logistic Regression squashes output to [0, 1] using the sigmoid function
- The decision boundary is smooth and probabilistic
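To make the first point concrete, the sketch below (a made-up toy example, not from this article) fits an ordinary least-squares line directly to 0/1 labels and shows its prediction escaping the valid probability range, while the sigmoid squashes the same score back into (0, 1):

```python
import numpy as np

# Toy binary data: feature x, label y in {0, 1}
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0, 0, 0, 1, 1, 1])

# Fit an ordinary least-squares line to the 0/1 labels
slope, intercept = np.polyfit(x, y, deg=1)

# Evaluating the line at a far-out point pushes the "probability" above 1
linear_pred = slope * 10.0 + intercept
print(linear_pred > 1.0)   # True: not a valid probability

# The sigmoid keeps any real-valued score strictly inside (0, 1)
squashed = 1.0 / (1.0 + np.exp(-linear_pred))
print(0.0 < squashed < 1.0)  # True
```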
The Sigmoid Function
sigmoid(z) = 1 / (1 + e^(-z))
Where z = B0 + B1*x1 + B2*x2 + ... + Bn*xn
Output is always between 0 and 1:
- If sigmoid(z) >= 0.5 --> classify as 1 (positive class)
- If sigmoid(z) < 0.5 --> classify as 0 (negative class)
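The formula and thresholding rule above translate directly into code; here is a minimal sketch (the helper names `sigmoid` and `classify` are my own):

```python
import math

def sigmoid(z):
    """Map any real-valued score z to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(z, threshold=0.5):
    """Hard 0/1 label from the sigmoid probability."""
    return 1 if sigmoid(z) >= threshold else 0

print(sigmoid(0))     # 0.5 exactly: the decision boundary
print(classify(2.0))  # 1, since sigmoid(2) ≈ 0.88
print(classify(-2.0)) # 0, since sigmoid(-2) ≈ 0.12
```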
How It Works
Algorithm Steps
- Compute linear combination z = B0 + B1*x1 + B2*x2 + ...
- Apply sigmoid function to get probability p = 1/(1+e^-z)
- Apply a threshold: if p >= 0.5, predict class 1; otherwise class 0
- Optimize coefficients using gradient descent to minimize log loss
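The four steps above can be sketched end to end with plain NumPy: batch gradient descent on the mean log loss over a tiny made-up dataset (a didactic sketch, not production code):

```python
import numpy as np

# Assumed toy data: one feature, linearly separable labels
X = np.array([[0.5], [1.5], [2.5], [3.5]])
y = np.array([0, 0, 1, 1])

# Prepend a bias column so B0 is learned alongside B1 (step 1)
Xb = np.hstack([np.ones((len(X), 1)), X])
w = np.zeros(Xb.shape[1])

lr = 0.5
for _ in range(2000):
    z = Xb @ w                       # step 1: linear combination
    p = 1.0 / (1.0 + np.exp(-z))     # step 2: sigmoid probability
    grad = Xb.T @ (p - y) / len(y)   # gradient of the mean log loss
    w -= lr * grad                   # step 4: gradient descent update

probs = 1.0 / (1.0 + np.exp(-(Xb @ w)))
preds = (probs >= 0.5).astype(int)   # step 3: threshold at 0.5
print(preds)  # recovers [0, 0, 1, 1]
```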
Code: Predicting House Purchase
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
# Dataset
data = {
"Age": [22, 25, 28, 30, 32, 35, 40, 45, 50, 55],
"Income": [13, 4, 5, 12, 27, 11, 30, 52, 15, 20],
"Buys_House": [1, 0, 0, 0, 1, 0, 1, 1, 0, 1]
}
df = pd.DataFrame(data)
# Features and target
X = df[["Age", "Income"]]
y = df["Buys_House"]
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Feature scaling (important for Logistic Regression)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Train the model
model = LogisticRegression()
model.fit(X_train_scaled, y_train)
# Predict
y_pred = model.predict(X_test_scaled)
# Evaluate
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
print(f"Model Accuracy: {accuracy * 100:.2f}%")
print("Classification Report:\n", report)
# Predict for a new person
new_data = scaler.transform(pd.DataFrame([[29, 55]], columns=["Age", "Income"]))
prediction = model.predict(new_data)
if prediction[0] == 1:
    print("Person is likely to buy a house.")
else:
    print("Person is unlikely to buy a house.")
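Beyond the hard 0/1 label used above, scikit-learn's predict_proba exposes the underlying sigmoid probabilities. A minimal sketch on an assumed tiny dataset (values invented for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Assumed tiny dataset, loosely mirroring the example above
X = pd.DataFrame({"Age": [22, 30, 40, 50], "Income": [13, 12, 30, 52]})
y = np.array([1, 0, 1, 1])

model = LogisticRegression().fit(X, y)

# predict() returns the hard label; predict_proba() returns [P(class 0), P(class 1)]
new_person = pd.DataFrame({"Age": [29], "Income": [55]})
probs = model.predict_proba(new_person)[0]
print(probs)        # two probabilities that sum to 1
print(probs.sum())  # ≈ 1.0
```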
When to Use Logistic Regression
| Good For | Not Ideal For |
| --- | --- |
| Binary classification (yes/no, spam/not spam) | Non-linear decision boundaries |
| When you need probability estimates | Multi-class problems (use the softmax/multinomial variant) |
| Linearly separable data | Complex patterns requiring deep learning |
| Fast training, interpretable model | Image or sequence data |
Always scale features before Logistic Regression. Unscaled features can slow gradient-based convergence, and with regularization (scikit-learn applies L2 by default) they cause coefficients to be penalized unevenly.
Tags: Classification, Supervised, Sigmoid, Binary