
Logistic Regression

A classification algorithm that predicts the probability of an event using the sigmoid function.

What is Logistic Regression?

Despite the name, Logistic Regression is a classification algorithm, not regression. It predicts the probability of an outcome (between 0 and 1) using the sigmoid function, then classifies based on a threshold (typically 0.5).

Linear Regression predicts continuous values. Logistic Regression predicts probabilities for classification. The sigmoid function is what makes the difference.

Why Not Use Linear Regression for Classification?

Linear Regression fits an unbounded line, so its output can fall below 0 or rise above 1 and cannot be interpreted as a probability. It is also pulled around by extreme feature values, which shifts the decision threshold. Logistic Regression solves this by passing the linear output through the sigmoid, which squashes it into the range (0, 1).
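A quick sketch of the problem, using a hypothetical 1-D dataset with one large feature value: ordinary least squares fitted to 0/1 labels happily predicts values outside [0, 1].

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical binary labels against a single feature
X = np.array([[1], [2], [3], [4], [50]])
y = np.array([0, 0, 1, 1, 1])

lin = LinearRegression().fit(X, y)

# The fitted line is unbounded: for large feature values the
# "probability" it predicts exceeds 1, which is meaningless.
print(lin.predict([[50]]))
print(lin.predict([[100]]))
```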

The Sigmoid Function

sigmoid(z) = 1 / (1 + e^(-z))

where z = B0 + B1*x1 + B2*x2 + ... + Bn*xn

The output is always between 0 and 1:
  - If sigmoid(z) >= 0.5 --> classify as 1 (positive class)
  - If sigmoid(z) < 0.5 --> classify as 0 (negative class)
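The sigmoid is one line of NumPy. A minimal sketch:

```python
import numpy as np

def sigmoid(z):
    """Map any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))    # exactly 0.5: the decision boundary
print(sigmoid(4))    # close to 1, so classified as the positive class
print(sigmoid(-4))   # close to 0, so classified as the negative class
```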

How It Works

Algorithm Steps
  1. Compute linear combination z = B0 + B1*x1 + B2*x2 + ...
  2. Apply sigmoid function to get probability p = 1/(1+e^-z)
  3. Apply threshold — if p >= 0.5, predict class 1; otherwise class 0
  4. Optimize coefficients using gradient descent to minimize log loss
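The four steps above can be sketched from scratch with plain batch gradient descent on the mean log loss, assuming a small synthetic 1-D dataset (the data and learning rate are illustrative, not from the article):

```python
import numpy as np

# Synthetic data: class 0 centered at -2, class 1 centered at +2
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 1)), rng.normal(2, 1, (50, 1))])
y = np.concatenate([np.zeros(50), np.ones(50)])

w, b = np.zeros(1), 0.0          # coefficients B1 and intercept B0
lr = 0.1                         # learning rate (illustrative choice)

for _ in range(1000):
    z = X @ w + b                        # step 1: linear combination
    p = 1 / (1 + np.exp(-z))             # step 2: sigmoid -> probability
    grad_w = X.T @ (p - y) / len(y)      # step 4: gradient of mean log loss
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

# Step 3: threshold the probabilities at 0.5 to get class labels
pred = (1 / (1 + np.exp(-(X @ w + b))) >= 0.5).astype(int)
print("training accuracy:", np.mean(pred == y))
```

The gradient of the log loss with respect to the weights has the same convenient form as in linear regression, X^T(p - y)/n, which is what makes this loop so short.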

Code: Predicting House Purchase

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Dataset
data = {
    "Age": [22, 25, 28, 30, 32, 35, 40, 45, 50, 55],
    "Income": [13, 4, 5, 12, 27, 11, 30, 52, 15, 20],
    "Buys_House": [1, 0, 0, 0, 1, 0, 1, 1, 0, 1]
}
df = pd.DataFrame(data)

# Features and target
X = df[["Age", "Income"]]
y = df["Buys_House"]

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Feature scaling (important for Logistic Regression)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train the model
model = LogisticRegression()
model.fit(X_train_scaled, y_train)

# Predict
y_pred = model.predict(X_test_scaled)

# Evaluate
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
print(f"Model Accuracy: {accuracy * 100:.2f}%")
print("Classification Report:\n", report)

# Predict for a new person (wrapped in a DataFrame so the column
# names match those the scaler was fitted on)
new_data = scaler.transform(pd.DataFrame([[29, 55]], columns=["Age", "Income"]))
prediction = model.predict(new_data)
if prediction[0] == 1:
    print("Person is likely to buy a house.")
else:
    print("Person is unlikely to buy a house.")
```

When to Use Logistic Regression

| Good For | Not Ideal For |
| --- | --- |
| Binary classification (yes/no, spam/not spam) | Non-linear decision boundaries |
| When you need probability estimates | Multi-class problems (use the softmax variant) |
| Linearly separable data | Complex patterns requiring deep learning |
| Fast training, interpretable model | Image or sequence data |

Always scale features before fitting Logistic Regression. Unscaled features can slow the solver's convergence and make coefficient magnitudes impossible to compare across features.
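One way to make the scaling step hard to forget is scikit-learn's Pipeline, which applies the scaler inside every fit and predict call so training and new data are always transformed consistently. A minimal sketch with made-up data:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Toy data with very different feature scales (age vs. income in dollars)
X = np.array([[25, 40_000], [32, 85_000], [47, 120_000],
              [51, 30_000], [29, 95_000], [60, 150_000]])
y = np.array([0, 1, 1, 0, 1, 1])

# The pipeline scales automatically inside fit and predict
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X, y)

print(clf.predict([[40, 100_000]]))        # predicted class
print(clf.predict_proba([[40, 100_000]]))  # probability estimates
```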
