K-Nearest Neighbors (KNN)
A simple algorithm that classifies data points based on the majority vote of their K closest neighbors.
What is KNN?
KNN is an instance-based algorithm used for both classification and regression. It makes predictions by finding the K training examples closest to a new data point and taking a majority vote (classification) or average (regression).
KNN has no training phase. It stores all training data and does all the work at prediction time. This makes it a "lazy learner."
How It Works
Algorithm Steps
- Choose K (number of nearest neighbors)
- Calculate distance from the new point to all training points (usually Euclidean distance)
- Find the K closest training points
- Classification: Take the majority class among K neighbors
- Regression: Take the average value of K neighbors
Euclidean Distance = sqrt( (x1-x2)^2 + (y1-y2)^2 + ... )
Example: If K=3 and the 3 closest neighbors are [Red, Red, Blue]
--> Classify as Red (majority vote)
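The steps above can be sketched from scratch in a few lines. `knn_predict` is a hypothetical helper name, and the three toy points are chosen to mirror the [Red, Red, Blue] example:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    # Step 2: Euclidean distance from x_new to every training point
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Step 3: indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Step 4: majority class among those neighbors
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Three training points mirroring the [Red, Red, Blue] example
X_train = np.array([[1.0, 1.0], [1.5, 1.2], [3.0, 3.0]])
y_train = ["Red", "Red", "Blue"]
print(knn_predict(X_train, y_train, np.array([1.1, 1.0]), k=3))  # → Red
```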
Why Feature Scaling Matters
KNN relies on distance calculations. If one feature has a range of 0-100 and another 0-1, the larger feature dominates the distance. Always scale features before using KNN.
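A quick sketch of the effect, using made-up numbers on the ranges mentioned above (0-100 vs 0-1):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# One feature ranges over 0-100, the other over 0-1
X = np.array([[10.0, 0.2],
              [90.0, 0.9],
              [12.0, 0.9]])

# Unscaled: the 0-100 feature supplies almost all of the distance,
# so the 0-1 feature is effectively ignored
d_unscaled = np.sqrt(((X[0] - X[1]) ** 2).sum())
print(f"Unscaled distance: {d_unscaled:.2f}")  # ~80, dominated by the first feature

# Scaled: both features contribute comparably to the distance
X_scaled = StandardScaler().fit_transform(X)
d_scaled = np.sqrt(((X_scaled[0] - X_scaled[1]) ** 2).sum())
print(f"Scaled distance: {d_scaled:.2f}")
```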
Code Implementation
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report
# Dataset
data = {
"Age": [22, 25, 28, 30, 32, 35, 40, 45, 50, 55],
"Income_LPA": [16, 8, 10, 9, 15, 12, 22, 26, 10, 35],
"Buys_House": [1, 0, 0, 0, 0, 1, 1, 1, 0, 1]
}
df = pd.DataFrame(data)
# Features and target
X = df[["Age", "Income_LPA"]]
y = df["Buys_House"]
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Feature scaling (critical for KNN)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Train KNN model
knn = KNeighborsClassifier(n_neighbors=3, weights="distance")  # odd K avoids tie votes
knn.fit(X_train_scaled, y_train)
# Predict and evaluate
y_pred = knn.predict(X_test_scaled)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred, zero_division=0)
print(f"Model Accuracy: {accuracy * 100:.2f}%")
print("Classification Report:\n", report)
# Predict for a new person (use a DataFrame so column names match the fitted scaler)
new_data = scaler.transform(pd.DataFrame([[21, 2]], columns=["Age", "Income_LPA"]))
prediction = knn.predict(new_data)
print("Can buy house" if prediction[0] == 1 else "Cannot buy house")
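For regression, the same recipe applies with KNeighborsRegressor: the prediction is the average target value of the K nearest neighbors rather than a vote. A minimal sketch, with prices invented purely for illustration:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler

# Illustrative data: predict house price (in lakhs) from age and income
X = np.array([[22, 16], [25, 8], [28, 10], [30, 9], [35, 12],
              [40, 22], [45, 26], [50, 10], [55, 35]])
y = np.array([60, 30, 35, 32, 45, 80, 95, 38, 120])

# Scaling matters for KNN regression just as it does for classification
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

reg = KNeighborsRegressor(n_neighbors=3)
reg.fit(X_scaled, y)

# Prediction is the mean target value of the 3 nearest neighbors
pred = reg.predict(scaler.transform([[33, 11]]))
print(f"Predicted price: {pred[0]:.1f}")
```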
Choosing K
- Small K (e.g., 1-3): More sensitive to noise, risk of overfitting
- Large K: Smoother decision boundary, risk of underfitting
- Rule of thumb: Start with K = sqrt(n) where n is the number of training samples
- Use odd K for binary classification to avoid tie votes (or set weights="distance" so ties are broken by proximity)
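In practice, a common way to pick K is to compare cross-validation scores across candidate values. A sketch on a synthetic dataset (the data and K range here are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data for illustration
X, y = make_classification(n_samples=200, n_features=4, random_state=42)

# Evaluate odd K values with 5-fold cross-validation;
# the pipeline scales inside each fold to avoid data leakage
for k in [1, 3, 5, 7, 9, 11]:
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores = cross_val_score(model, X, y, cv=5)
    print(f"K={k:2d}: mean accuracy = {scores.mean():.3f}")
```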
Key Parameters
| Parameter | Description |
| --- | --- |
| n_neighbors | Number of neighbors (K). Default is 5 |
| weights | "uniform" (all neighbors count equally) or "distance" (closer neighbors have more influence) |
| metric | Distance metric: "euclidean", "manhattan", "minkowski" |
When to Use KNN
| Good For | Not Ideal For |
| --- | --- |
| Small to medium datasets | Large datasets (slow prediction) |
| Non-linear decision boundaries | High-dimensional data (curse of dimensionality) |
| Quick baseline model | When training speed matters |
| Multi-class classification | When features are not scaled |
KNN becomes very slow with large datasets because it calculates distance to every training point for each prediction. For large datasets, consider using KD-trees or Ball trees (set algorithm="kd_tree").
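A sketch of the KD-tree option on a larger synthetic dataset (the data is random and purely illustrative):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# 10,000 random 3-D points; class = whether the coordinates sum to a positive value
rng = np.random.default_rng(0)
X = rng.normal(size=(10000, 3))
y = (X.sum(axis=1) > 0).astype(int)

# A KD-tree partitions the training points so each query searches only a
# fraction of them instead of computing all 10,000 distances
knn = KNeighborsClassifier(n_neighbors=5, algorithm="kd_tree")
knn.fit(X, y)
print(knn.predict([[0.5, 0.5, 0.5]]))
```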
Classification Regression Supervised Instance-based Lazy Learner