Linear Regression
Predicting continuous values by fitting a straight line through data points.
What is Linear Regression?
Linear Regression models the relationship between a dependent variable (target) and one or more independent variables (features) by fitting a straight line to the data. It predicts a continuous numeric output.
Linear Regression is one of the simplest supervised learning algorithms and is often the first model worth trying on a regression problem.
The Equation
y = B0 + B1*x1 + B2*x2 + ... + Bn*xn + e
Where:
y = Target variable (what you predict)
x1..xn = Input features
B0 = Intercept (y value when all x = 0)
B1..Bn = Coefficients (slope for each feature)
e = Error term (actual - predicted)
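The equation above can be evaluated directly once the coefficients are known. A minimal sketch with assumed example values (the coefficients and feature values below are made up for illustration):

```python
# Evaluate y = B0 + B1*x1 + B2*x2 with assumed example values
b0 = 1.0           # intercept B0 (assumed)
b = [2.0, 0.5]     # coefficients B1, B2 (assumed)
x = [3.0, 4.0]     # feature values x1, x2 (assumed)

# Prediction: intercept plus the weighted sum of features
y = b0 + sum(bi * xi for bi, xi in zip(b, x))
print(y)  # 1.0 + 2.0*3.0 + 0.5*4.0 = 9.0
```

In practice the coefficients are not chosen by hand; they are learned from data, as shown later in this article.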
Types of Linear Regression
Simple Linear Regression
One feature, one target. Equation: y = B0 + B1*x + e. Fits a line in 2D space.
Multiple Linear Regression
Multiple features, one target. Equation: y = B0 + B1*x1 + B2*x2 + ... + e. Fits a hyperplane.
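Both variants can be sketched with scikit-learn; the only difference is the number of feature columns. The data below is synthetic and chosen so the true coefficients are known:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Simple linear regression: one feature, fits a line in 2D
X_simple = np.array([[1], [2], [3], [4]])
y_simple = np.array([3, 5, 7, 9])           # generated from y = 1 + 2x
simple = LinearRegression().fit(X_simple, y_simple)
print(simple.intercept_, simple.coef_)      # recovers ~1.0 and ~[2.0]

# Multiple linear regression: two features, fits a plane
X_multi = np.array([[1, 2], [2, 1], [3, 4], [4, 3]])
y_multi = 1 + 2 * X_multi[:, 0] + 3 * X_multi[:, 1]  # y = 1 + 2*x1 + 3*x2
multi = LinearRegression().fit(X_multi, y_multi)
print(multi.coef_)                          # recovers ~[2.0, 3.0]
```

Because the synthetic targets are exactly linear in the features, the fitted coefficients match the generating values up to floating-point precision.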
How It Works
Algorithm Steps
- Initialize coefficients (B0, B1, ...) to some starting values
- Make predictions using the current equation
- Calculate error: the difference between predicted and actual values
- Adjust coefficients to minimize the error (using Ordinary Least Squares or Gradient Descent)
- Repeat until the error stops decreasing significantly
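The steps above can be sketched as a bare-bones gradient descent loop for simple linear regression. The data, learning rate, and iteration count are assumptions chosen for illustration:

```python
# Gradient descent for y = b0 + b1*x, minimizing mean squared error
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]   # generated from y = 1 + 2x
b0, b1 = 0.0, 0.0           # step 1: initialize coefficients
lr = 0.05                   # learning rate (assumed)
n = len(xs)

for _ in range(2000):       # step 5: repeat until error stops improving
    preds = [b0 + b1 * x for x in xs]             # step 2: predict
    errors = [p - y for p, y in zip(preds, ys)]   # step 3: calculate error
    # step 4: adjust coefficients along the negative gradient of MSE
    b0 -= lr * 2 * sum(errors) / n
    b1 -= lr * 2 * sum(e * x for e, x in zip(errors, xs)) / n

print(round(b0, 2), round(b1, 2))  # approaches 1.0 and 2.0
```

Scikit-learn's `LinearRegression` uses a closed-form least-squares solution rather than this loop, but the loop shows what "minimizing the error" means mechanically.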
Code: Predicting House Prices
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Create dataset
data = {
"House Size": [500, 800, 1000, 1500, 1800, 2000, 2500, 3000, 3500, 4000],
"Price": [2000000, 3000000, 4000000, 6000000, 7200000, 8000000, 10000000, 12000000, 14000000, 16000000]
}
df = pd.DataFrame(data)
# Split features and target
X = df[["House Size"]] # feature / input
Y = df["Price"] # target / value to predict
# Train-test split
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=42)
# Train the model
model = LinearRegression()
model.fit(x_train, y_train)
# Predictions
y_train_pred = model.predict(x_train)
y_test_pred = model.predict(x_test)
# Evaluate
print(f"Intercept (B0): {model.intercept_}")
print(f"Coefficient (B1): {model.coef_[0]}")
print(f"Train MSE: {mean_squared_error(y_train, y_train_pred)}")
print(f"Test MSE: {mean_squared_error(y_test, y_test_pred)}")
print(f"Test R2 Score: {r2_score(y_test, y_test_pred)}")
Evaluation Metrics
| Metric | What It Measures | Ideal Value |
| --- | --- | --- |
| MSE (Mean Squared Error) | Average squared difference between actual and predicted values | As close to 0 as possible |
| R-squared (R2) | Proportion of variance in the target that the model explains | 1.0 = perfect fit, 0 = no explanatory power |
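Both metrics are simple enough to compute by hand, which makes their definitions concrete. The actual and predicted values below are assumed example numbers:

```python
# Compute MSE and R^2 manually for a small assumed example
actual = [3.0, 5.0, 7.0]
predicted = [2.5, 5.0, 7.5]   # assumed example predictions

n = len(actual)
# MSE: average of squared residuals
mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n

# R^2: 1 - (residual sum of squares / total sum of squares)
mean_y = sum(actual) / n
ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
ss_tot = sum((a - mean_y) ** 2 for a in actual)
r2 = 1 - ss_res / ss_tot

print(mse)  # (0.25 + 0 + 0.25) / 3 ≈ 0.1667
print(r2)   # 1 - 0.5/8 = 0.9375
```

These match what `mean_squared_error` and `r2_score` from `sklearn.metrics` would return on the same inputs.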
When to Use Linear Regression
| Good For | Not Ideal For |
| --- | --- |
| Linear relationships between features and target | Non-linear or complex patterns |
| Continuous numeric predictions | Classification tasks |
| Fast training, interpretable coefficients | Data with many outliers |
| Baseline model for any regression problem | High-dimensional sparse data |
Linear Regression assumes a linear relationship. If your data has curves or complex patterns, consider polynomial regression, decision trees, or ensemble methods instead.
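When the relationship is curved, polynomial regression reuses the same linear machinery on expanded features. A minimal sketch on synthetic quadratic data (the data is assumed for illustration):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Quadratic data that a straight line cannot fit well
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1, 4, 9, 16, 25])   # y = x^2

# Expand features to [1, x, x^2], then fit an ordinary linear model
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly.fit(X, y)
print(poly.predict([[6]]))  # close to 36
```

The model is still "linear" in its coefficients; only the features were transformed, which is why this counts as linear regression under the hood.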