Logistic Regression
A classification algorithm that predicts the probability of an event using the sigmoid function.
What is Logistic Regression?
Despite the name, Logistic Regression is a classification algorithm, not a regression algorithm. It predicts the probability of an outcome (a value between 0 and 1) using the sigmoid function, then assigns a class based on a threshold (typically 0.5).
Linear Regression predicts continuous values. Logistic Regression predicts probabilities for classification. The sigmoid function is what makes the difference.
Why Not Use Linear Regression for Classification?
- Linear Regression outputs can go below 0 or above 1 (not valid probabilities)
- Logistic Regression squashes output to [0, 1] using the sigmoid function
- The decision boundary is smooth and probabilistic
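To make the first point concrete, the sketch below (a made-up toy example, not from this article) fits an ordinary least-squares line directly to 0/1 labels and shows its prediction escaping the valid probability range, while the sigmoid squashes the same score back into (0, 1):

```python
import numpy as np

# Toy binary data: feature x, label y in {0, 1}
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0, 0, 0, 1, 1, 1])

# Fit an ordinary least-squares line to the 0/1 labels
slope, intercept = np.polyfit(x, y, deg=1)

# Evaluating the line at a far-out point pushes the "probability" above 1
linear_pred = slope * 10.0 + intercept
print(linear_pred > 1.0)   # True: not a valid probability

# The sigmoid keeps any real-valued score strictly inside (0, 1)
squashed = 1.0 / (1.0 + np.exp(-linear_pred))
print(0.0 < squashed < 1.0)  # True
```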
The Sigmoid Function
sigmoid(z) = 1 / (1 + e^(-z))
Where z = B0 + B1*x1 + B2*x2 + ... + Bn*xn
Output is always between 0 and 1:
- If sigmoid(z) >= 0.5 --> classify as 1 (positive class)
- If sigmoid(z) < 0.5 --> classify as 0 (negative class)
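The formula and thresholding rule above translate directly into code; here is a minimal sketch (the helper names `sigmoid` and `classify` are my own):

```python
import math

def sigmoid(z):
    """Map any real-valued score z to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(z, threshold=0.5):
    """Hard 0/1 label from the sigmoid probability."""
    return 1 if sigmoid(z) >= threshold else 0

print(sigmoid(0))     # 0.5 exactly: the decision boundary
print(classify(2.0))  # 1, since sigmoid(2) ≈ 0.88
print(classify(-2.0)) # 0, since sigmoid(-2) ≈ 0.12
```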
How It Works
Algorithm Steps
- Compute linear combination z = B0 + B1*x1 + B2*x2 + ...
- Apply sigmoid function to get probability p = 1/(1+e^-z)
- Apply a threshold: if p >= 0.5, predict class 1; otherwise class 0
- Optimize coefficients using gradient descent to minimize log loss
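The four steps above can be sketched end to end with plain NumPy: batch gradient descent on the mean log loss over a tiny made-up dataset (a didactic sketch, not production code):

```python
import numpy as np

# Assumed toy data: one feature, linearly separable labels
X = np.array([[0.5], [1.5], [2.5], [3.5]])
y = np.array([0, 0, 1, 1])

# Prepend a bias column so B0 is learned alongside B1 (step 1)
Xb = np.hstack([np.ones((len(X), 1)), X])
w = np.zeros(Xb.shape[1])

lr = 0.5
for _ in range(2000):
    z = Xb @ w                       # step 1: linear combination
    p = 1.0 / (1.0 + np.exp(-z))     # step 2: sigmoid probability
    grad = Xb.T @ (p - y) / len(y)   # gradient of the mean log loss
    w -= lr * grad                   # step 4: gradient descent update

probs = 1.0 / (1.0 + np.exp(-(Xb @ w)))
preds = (probs >= 0.5).astype(int)   # step 3: threshold at 0.5
print(preds)  # recovers [0, 0, 1, 1]
```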
Code: Predicting House Purchase
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
# Dataset
data = {
"Age": [22, 25, 28, 30, 32, 35, 40, 45, 50, 55],
"Income": [13, 4, 5, 12, 27, 11, 30, 52, 15, 20],
"Buys_House": [1, 0, 0, 0, 1, 0, 1, 1, 0, 1]
}
df = pd.DataFrame(data)
# Features and target
X = df[["Age", "Income"]]
y = df["Buys_House"]
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Feature scaling (important for Logistic Regression)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Train the model
model = LogisticRegression()
model.fit(X_train_scaled, y_train)
# Predict
y_pred = model.predict(X_test_scaled)
# Evaluate
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
print(f"Model Accuracy: {accuracy * 100:.2f}%")
print("Classification Report:\n", report)
# Predict for a new person
new_data = scaler.transform(pd.DataFrame([[29, 55]], columns=["Age", "Income"]))
prediction = model.predict(new_data)
if prediction[0] == 1:
    print("Person is likely to buy a house.")
else:
    print("Person is unlikely to buy a house.")
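Beyond the hard 0/1 label used above, scikit-learn's predict_proba exposes the underlying sigmoid probabilities. A minimal sketch on an assumed tiny dataset (values invented for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Assumed tiny dataset, loosely mirroring the example above
X = pd.DataFrame({"Age": [22, 30, 40, 50], "Income": [13, 12, 30, 52]})
y = np.array([1, 0, 1, 1])

model = LogisticRegression().fit(X, y)

# predict() returns the hard label; predict_proba() returns [P(class 0), P(class 1)]
new_person = pd.DataFrame({"Age": [29], "Income": [55]})
probs = model.predict_proba(new_person)[0]
print(probs)        # two probabilities that sum to 1
print(probs.sum())  # ≈ 1.0
```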
When to Use Logistic Regression
| Good For | Not Ideal For |
| --- | --- |
| Binary classification (yes/no, spam/not spam) | Non-linear decision boundaries |
| When you need probability estimates | Multi-class problems (use the softmax/multinomial variant) |
| Linearly separable data | Complex patterns requiring deep learning |
| Fast training, interpretable model | Image or sequence data |
Always scale features before Logistic Regression. Unscaled features can slow gradient-based convergence, and with regularization (scikit-learn applies L2 by default) they cause coefficients to be penalized unevenly.
Tags: Classification, Supervised, Sigmoid, Binary