Linear Regression
Predicting continuous values by fitting a straight line through data points.
What is Linear Regression?
Linear Regression models the relationship between a dependent variable (target) and one or more independent variables (features) by fitting a straight line to the data. It predicts a continuous numeric output.
Linear Regression is one of the simplest supervised learning algorithms and is often the first model worth trying on a regression problem.
The Equation
y = B0 + B1*x1 + B2*x2 + ... + Bn*xn + e
Where:
y = Target variable (what you predict)
x1..xn = Input features
B0 = Intercept (y value when all x = 0)
B1..Bn = Coefficients (slope for each feature)
e = Error term (actual - predicted)
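The equation above can be evaluated directly once the coefficients are known. A minimal sketch with assumed example values (the coefficients and feature values below are made up for illustration):

```python
# Evaluate y = B0 + B1*x1 + B2*x2 with assumed example values
b0 = 1.0           # intercept B0 (assumed)
b = [2.0, 0.5]     # coefficients B1, B2 (assumed)
x = [3.0, 4.0]     # feature values x1, x2 (assumed)

# Prediction: intercept plus the weighted sum of features
y = b0 + sum(bi * xi for bi, xi in zip(b, x))
print(y)  # 1.0 + 2.0*3.0 + 0.5*4.0 = 9.0
```

In practice the coefficients are not chosen by hand; they are learned from data, as shown later in this article.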
Types of Linear Regression
Simple Linear Regression
One feature, one target. Equation: y = B0 + B1*x + e. Fits a line in 2D space.
Multiple Linear Regression
Multiple features, one target. Equation: y = B0 + B1*x1 + B2*x2 + ... + e. Fits a hyperplane.
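Both variants can be sketched with scikit-learn; the only difference is the number of feature columns. The data below is synthetic and chosen so the true coefficients are known:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Simple linear regression: one feature, fits a line in 2D
X_simple = np.array([[1], [2], [3], [4]])
y_simple = np.array([3, 5, 7, 9])           # generated from y = 1 + 2x
simple = LinearRegression().fit(X_simple, y_simple)
print(simple.intercept_, simple.coef_)      # recovers ~1.0 and ~[2.0]

# Multiple linear regression: two features, fits a plane
X_multi = np.array([[1, 2], [2, 1], [3, 4], [4, 3]])
y_multi = 1 + 2 * X_multi[:, 0] + 3 * X_multi[:, 1]  # y = 1 + 2*x1 + 3*x2
multi = LinearRegression().fit(X_multi, y_multi)
print(multi.coef_)                          # recovers ~[2.0, 3.0]
```

Because the synthetic targets are exactly linear in the features, the fitted coefficients match the generating values up to floating-point precision.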
How It Works
Algorithm Steps
- Initialize coefficients (B0, B1, ...) to some starting values
- Make predictions using the current equation
- Calculate error: the difference between predicted and actual values
- Adjust coefficients to minimize the error (using Ordinary Least Squares or Gradient Descent)
- Repeat until the error stops decreasing significantly
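The steps above can be sketched as a bare-bones gradient descent loop for simple linear regression. The data, learning rate, and iteration count are assumptions chosen for illustration:

```python
# Gradient descent for y = b0 + b1*x, minimizing mean squared error
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]   # generated from y = 1 + 2x
b0, b1 = 0.0, 0.0           # step 1: initialize coefficients
lr = 0.05                   # learning rate (assumed)
n = len(xs)

for _ in range(2000):       # step 5: repeat until error stops improving
    preds = [b0 + b1 * x for x in xs]             # step 2: predict
    errors = [p - y for p, y in zip(preds, ys)]   # step 3: calculate error
    # step 4: adjust coefficients along the negative gradient of MSE
    b0 -= lr * 2 * sum(errors) / n
    b1 -= lr * 2 * sum(e * x for e, x in zip(errors, xs)) / n

print(round(b0, 2), round(b1, 2))  # approaches 1.0 and 2.0
```

Scikit-learn's `LinearRegression` uses a closed-form least-squares solution rather than this loop, but the loop shows what "minimizing the error" means mechanically.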
Code: Predicting House Prices
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Create dataset
data = {
"House Size": [500, 800, 1000, 1500, 1800, 2000, 2500, 3000, 3500, 4000],
"Price": [2000000, 3000000, 4000000, 6000000, 7200000, 8000000, 10000000, 12000000, 14000000, 16000000]
}
df = pd.DataFrame(data)
# Split features and target
X = df[["House Size"]] # feature / input
Y = df["Price"] # target / value to predict
# Train-test split
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=42)
# Train the model
model = LinearRegression()
model.fit(x_train, y_train)
# Predictions
y_train_pred = model.predict(x_train)
y_test_pred = model.predict(x_test)
# Evaluate
print(f"Intercept (B0): {model.intercept_}")
print(f"Coefficient (B1): {model.coef_[0]}")
print(f"Train MSE: {mean_squared_error(y_train, y_train_pred)}")
print(f"Test MSE: {mean_squared_error(y_test, y_test_pred)}")
print(f"Test R2 Score: {r2_score(y_test, y_test_pred)}")
Evaluation Metrics
| Metric | What It Measures | Ideal Value |
| --- | --- | --- |
| MSE (Mean Squared Error) | Average squared difference between actual and predicted values | As close to 0 as possible |
| R-squared (R2) | Proportion of variance in the target that the model explains | 1.0 = perfect fit, 0 = no explanatory power |
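Both metrics are simple enough to compute by hand, which makes their definitions concrete. The actual and predicted values below are assumed example numbers:

```python
# Compute MSE and R^2 manually for a small assumed example
actual = [3.0, 5.0, 7.0]
predicted = [2.5, 5.0, 7.5]   # assumed example predictions

n = len(actual)
# MSE: average of squared residuals
mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n

# R^2: 1 - (residual sum of squares / total sum of squares)
mean_y = sum(actual) / n
ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
ss_tot = sum((a - mean_y) ** 2 for a in actual)
r2 = 1 - ss_res / ss_tot

print(mse)  # (0.25 + 0 + 0.25) / 3 ≈ 0.1667
print(r2)   # 1 - 0.5/8 = 0.9375
```

These match what `mean_squared_error` and `r2_score` from `sklearn.metrics` would return on the same inputs.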
When to Use Linear Regression
| Good For | Not Ideal For |
| --- | --- |
| Linear relationships between features and target | Non-linear or complex patterns |
| Continuous numeric predictions | Classification tasks |
| Fast training, interpretable coefficients | Data with many outliers |
| Baseline model for any regression problem | High-dimensional sparse data |
Linear Regression assumes a linear relationship. If your data has curves or complex patterns, consider polynomial regression, decision trees, or ensemble methods instead.
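When the relationship is curved, polynomial regression reuses the same linear machinery on expanded features. A minimal sketch on synthetic quadratic data (the data is assumed for illustration):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Quadratic data that a straight line cannot fit well
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1, 4, 9, 16, 25])   # y = x^2

# Expand features to [1, x, x^2], then fit an ordinary linear model
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly.fit(X, y)
print(poly.predict([[6]]))  # close to 36
```

The model is still "linear" in its coefficients; only the features were transformed, which is why this counts as linear regression under the hood.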