ML Playground / Linear Regression

Linear Regression

Predicting continuous values by fitting a straight line through data points.

What is Linear Regression?

Linear Regression models the relationship between a dependent variable (target) and one or more independent variables (features) by fitting a straight line to the data. It predicts a continuous numeric output.

Linear Regression is one of the simplest supervised learning algorithms and is often the first model to try for regression problems.

The Equation

y = B0 + B1*x1 + B2*x2 + ... + Bn*xn + e

Where:
- y = target variable (what you predict)
- x1..xn = input features
- B0 = intercept (value of y when all features are 0)
- B1..Bn = coefficients (slope for each feature)
- e = error term (actual - predicted)

Types of Linear Regression

Simple Linear Regression

One feature, one target. Equation: y = B0 + B1*x + e. Fits a line in 2D space.
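For a single feature, the least-squares line has a closed-form solution: B1 = cov(x, y) / var(x) and B0 = mean(y) - B1 * mean(x). Here is a small illustrative sketch of those formulas on made-up data (not part of the article's house-price example):

```python
import numpy as np

# Made-up data that roughly follows y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 10.0])

# Closed-form least-squares estimates for simple linear regression:
# B1 = cov(x, y) / var(x), B0 = mean(y) - B1 * mean(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

print(f"B0 = {b0:.3f}, B1 = {b1:.3f}")  # slope comes out close to 2
```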

Multiple Linear Regression

Multiple features, one target. Equation: y = B0 + B1*x1 + B2*x2 + ... + e. Fits a hyperplane.
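With several features the same idea generalizes to the normal equation, B = (XᵀX)⁻¹Xᵀy, where a column of ones is prepended to X to capture the intercept B0. A minimal sketch on made-up data (feature names and prices are invented for illustration):

```python
import numpy as np

# Made-up features: house size (sq ft) and bedroom count
X = np.array([[1200, 2],
              [1500, 3],
              [1700, 3],
              [2000, 4],
              [2400, 4]], dtype=float)
y = np.array([240, 300, 340, 400, 480], dtype=float)  # price in $1000s

X1 = np.column_stack([np.ones(len(X)), X])   # prepend intercept column
B = np.linalg.lstsq(X1, y, rcond=None)[0]    # least-squares solve of X1 @ B = y

print(f"B0 = {B[0]:.2f}, B1 = {B[1]:.4f}, B2 = {B[2]:.2f}")
```

Using `np.linalg.lstsq` rather than inverting XᵀX directly is the numerically safer way to solve the same least-squares problem.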

How It Works

Algorithm Steps
  1. Initialize coefficients (B0, B1, ...) to some starting values
  2. Make predictions using the current equation
  3. Calculate error — the difference between predicted and actual values
  4. Adjust coefficients to minimize the error (using Ordinary Least Squares or Gradient Descent)
  5. Repeat until the error stops decreasing significantly
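The steps above can be sketched as a tiny gradient-descent loop (an illustrative toy, not the article's code): initialize B0 and B1, predict, measure the error, and nudge both coefficients in the direction that reduces MSE.

```python
import numpy as np

# Made-up data on the true line y = 1 + 3x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 3.0 * x + 1.0

b0, b1 = 0.0, 0.0                       # step 1: initialize coefficients
lr = 0.02                               # learning rate
for _ in range(5000):
    y_pred = b0 + b1 * x                # step 2: predict with current equation
    error = y_pred - y                  # step 3: predicted minus actual
    b0 -= lr * 2 * error.mean()         # step 4: gradient of MSE w.r.t. B0
    b1 -= lr * 2 * (error * x).mean()   #         gradient of MSE w.r.t. B1
                                        # step 5: loop repeats until convergence
print(f"B0 = {b0:.3f}, B1 = {b1:.3f}")  # approaches B0 = 1, B1 = 3
```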

Code: Predicting House Prices

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Create dataset
data = {
    "House Size": [500, 800, 1000, 1500, 1800, 2000, 2500, 3000, 3500, 4000],
    "Price": [2000000, 3000000, 4000000, 6000000, 7200000, 8000000,
              10000000, 12000000, 14000000, 16000000]
}
df = pd.DataFrame(data)

# Split features and target
X = df[["House Size"]]  # feature / input
Y = df["Price"]         # target / need to predict

# Train-test split
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

# Train the model
model = LinearRegression()
model.fit(x_train, y_train)

# Predictions
y_train_pred = model.predict(x_train)
y_test_pred = model.predict(x_test)

# Evaluate
print(f"Intercept (B0): {model.intercept_}")
print(f"Coefficient (B1): {model.coef_[0]}")
print(f"Train MSE: {mean_squared_error(y_train, y_train_pred)}")
print(f"Test MSE: {mean_squared_error(y_test, y_test_pred)}")
print(f"Test R2 Score: {r2_score(y_test, y_test_pred)}")
```

Evaluation Metrics

| Metric | What It Measures | Ideal Value |
| --- | --- | --- |
| MSE (Mean Squared Error) | Average squared difference between actual and predicted values | As close to 0 as possible |
| R-squared (R2) | How much of the target's variance the model explains | 1.0 = perfect, 0 = no explanatory power |
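Both metrics are easy to compute by hand. A short sketch on made-up values, following the same definitions sklearn's `mean_squared_error` and `r2_score` use:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.0, 7.5, 9.0])

# MSE: mean of squared residuals
mse = np.mean((y_true - y_pred) ** 2)

# R2: 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"MSE = {mse}, R2 = {r2}")  # MSE = 0.125, R2 = 0.975
```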

When to Use Linear Regression

| Good For | Not Ideal For |
| --- | --- |
| Linear relationships between features and target | Non-linear or complex patterns |
| Continuous numeric predictions | Classification tasks |
| Fast training, interpretable coefficients | Data with many outliers |
| Baseline model for any regression problem | High-dimensional sparse data |

Linear Regression assumes a linear relationship. If your data has curves or complex patterns, consider polynomial regression, decision trees, or ensemble methods instead.
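As one option for curved data, polynomial regression keeps the linear model but expands the features. A minimal sketch using sklearn's `PolynomialFeatures` on made-up quadratic data:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Made-up quadratic data: a plain straight line cannot fit this well
x = np.linspace(0, 10, 50).reshape(-1, 1)
y = 2 * x.ravel() ** 2 + 3

# Expand x into [x, x^2], then fit an ordinary linear model on top
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x, y)

print(f"R2: {model.score(x, y):.4f}")  # near-perfect fit on quadratic data
```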

Tags: Regression, Supervised, Linear, Continuous