Recommendation Systems
Build personalized suggestions the way Netflix, Amazon, and Spotify do: match users with items they'll love.
Types of Recommendation Systems
Collaborative Filtering
"Users who liked what you liked also liked this." Based on user-item interactions. No item features needed.
Content-Based Filtering
"Because you liked Action movies, here's another Action movie." Based on item features (genre, description, etc).
Hybrid
Combines both approaches. Netflix, for example, blends collaborative filtering, content-based signals, and deep learning models.
Collaborative Filtering
User-User Collaborative Filtering
Find users similar to you, then recommend what they liked but you haven't seen yet.
User-Item Rating Matrix:

        Movie1  Movie2  Movie3  Movie4
Alice      5       3       4      ?     ← What will Alice rate Movie4?
Bob        3       1       2      3
Carol      4       3       4      5     ← Carol is similar to Alice
Dave       3       3       1      1

Alice ≈ Carol (similar ratings) → Recommend Movie4 to Alice (Carol gave it 5)
Item-Item Collaborative Filtering
Find items similar to what the user already liked. More stable than user-user filtering: item-item similarities drift slowly, while user tastes change often.
Code: Collaborative Filtering
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# User-Item rating matrix (0 = not rated)
ratings = np.array([
    [5, 3, 4, 0, 0],  # Alice
    [3, 1, 2, 3, 3],  # Bob
    [4, 3, 4, 5, 0],  # Carol
    [3, 3, 1, 1, 5],  # Dave
    [1, 5, 5, 2, 1],  # Eve
])
users = ['Alice', 'Bob', 'Carol', 'Dave', 'Eve']
items = ['Movie1', 'Movie2', 'Movie3', 'Movie4', 'Movie5']

# --- User-User Similarity ---
user_sim = cosine_similarity(ratings)
print("User Similarity Matrix:")
for i, u in enumerate(users):
    print(f"  {u}: {np.round(user_sim[i], 2)}")

# Predict Alice's rating for Movie4 (row 0, column 3)
alice_idx = 0
movie_idx = 3

# Find users who rated Movie4
rated_mask = ratings[:, movie_idx] > 0
similarities = user_sim[alice_idx][rated_mask]
their_ratings = ratings[rated_mask, movie_idx]

# Weighted average of their ratings, weighted by similarity to Alice
predicted = np.dot(similarities, their_ratings) / similarities.sum()
print(f"\nPredicted rating for Alice on Movie4: {predicted:.2f}")

# --- Item-Item Similarity ---
item_sim = cosine_similarity(ratings.T)
print("\nItem Similarity (Movie1 vs others):", np.round(item_sim[0], 2))
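To turn item-item similarities into actual recommendations, one common scheme (a sketch, reusing the same ratings matrix as above) scores each movie the user hasn't rated by a similarity-weighted average of the movies they have rated:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Same ratings matrix as above (0 = not rated)
ratings = np.array([
    [5, 3, 4, 0, 0],  # Alice
    [3, 1, 2, 3, 3],  # Bob
    [4, 3, 4, 5, 0],  # Carol
    [3, 3, 1, 1, 5],  # Dave
    [1, 5, 5, 2, 1],  # Eve
])
items = ['Movie1', 'Movie2', 'Movie3', 'Movie4', 'Movie5']

item_sim = cosine_similarity(ratings.T)  # similarity between item columns

# Score each movie Alice hasn't rated: average her own ratings,
# weighted by how similar each rated movie is to the candidate
alice = ratings[0]
rated = alice > 0
scores = {}
for j, name in enumerate(items):
    if alice[j] == 0:
        sims = item_sim[j][rated]
        scores[name] = sims @ alice[rated] / sims.sum()

for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: predicted {s:.2f}")
```

Because the prediction is a weighted average of Alice's own ratings, it always lands inside the range of scores she has actually given.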
Content-Based Filtering
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Item features (descriptions)
movies = {
    'The Matrix': 'sci-fi action computer hacker virtual reality',
    'John Wick': 'action thriller assassin revenge guns',
    'Inception': 'sci-fi thriller dreams mind bending',
    'The Notebook': 'romance drama love story emotional',
    'Interstellar': 'sci-fi space adventure time gravity',
    'Titanic': 'romance drama love ship historical',
}

# Convert descriptions to TF-IDF vectors
tfidf = TfidfVectorizer()
tfidf_matrix = tfidf.fit_transform(movies.values())

# Compute pairwise similarity between all movies
sim_matrix = cosine_similarity(tfidf_matrix)

# Recommend movies similar to "The Matrix"
movie_names = list(movies.keys())
idx = movie_names.index('The Matrix')
scores = list(enumerate(sim_matrix[idx]))
scores = sorted(scores, key=lambda x: x[1], reverse=True)

print("If you liked 'The Matrix', try:")
for i, score in scores[1:4]:
    print(f"  {movie_names[i]:20s} (similarity: {score:.3f})")
Matrix Factorization (Advanced)
Decompose the sparse user-item matrix into two lower-dimensional matrices. Discovers latent factors (hidden features like "genre preference" or "movie quality").
R ≈ U × V^T
R = User-Item matrix (m users × n items)
U = User factors matrix (m users × k latent factors)
V = Item factors matrix (n items × k latent factors)
Example: k=3 latent factors might represent:
Factor 1: Action vs Romance preference
Factor 2: Old vs New movie preference
Factor 3: Blockbuster vs Indie preference
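A minimal sketch of R ≈ U × Vᵀ, fit by gradient descent on the observed entries only. The learning rate, regularization strength, and k=3 are illustrative assumptions, not tuned values:

```python
import numpy as np

# Same ratings matrix as before (0 = not rated)
R = np.array([
    [5, 3, 4, 0, 0],
    [3, 1, 2, 3, 3],
    [4, 3, 4, 5, 0],
    [3, 3, 1, 1, 5],
    [1, 5, 5, 2, 1],
], dtype=float)
mask = R > 0              # fit only the observed ratings

k = 3                     # number of latent factors (assumed, as above)
rng = np.random.default_rng(0)
U = 0.1 * rng.standard_normal((R.shape[0], k))   # user factors (m × k)
V = 0.1 * rng.standard_normal((R.shape[1], k))   # item factors (n × k)
lr, reg = 0.01, 0.02      # learning rate and L2 penalty (illustrative)

for _ in range(2000):
    E = mask * (R - U @ V.T)          # error on observed entries only
    U += lr * (E @ V - reg * U)       # descent step on user factors
    V += lr * (E.T @ U - reg * V)     # descent step on item factors

pred = U @ V.T            # dense matrix: fills in the missing ratings
print(f"Predicted rating for Alice on Movie4: {pred[0, 3]:.2f}")
```

The key trick is the `mask`: zeros mean "unknown", not "rated 0", so they must not contribute to the loss. After training, `U @ V.T` is dense and supplies a prediction for every empty cell.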
Evaluation Metrics
| Metric | What It Measures |
|--------|------------------|
| RMSE | How close predicted ratings are to actual ratings |
| Precision@K | Of top-K recommendations, how many were relevant |
| Recall@K | Of all relevant items, how many were in top-K |
| MAP | Average precision across all users |
| NDCG | Considers the ranking order (better items ranked higher) |
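Precision@K and Recall@K can be computed directly from a ranked list. A small sketch (the item IDs below are made up for illustration):

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision@K and Recall@K for a single user's ranked list."""
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / k                                   # share of top-K that was relevant
    recall = hits / len(relevant) if relevant else 0.0     # share of relevant items recovered
    return precision, recall

# Hypothetical example: system ranked five items, user actually liked three
recommended = ['M1', 'M2', 'M3', 'M4', 'M5']
relevant = {'M2', 'M4', 'M7'}

p, r = precision_recall_at_k(recommended, relevant, k=5)
print(f"Precision@5 = {p:.2f}, Recall@5 = {r:.2f}")  # hits: M2 and M4
```

Note the two metrics pull apart: `M7` was relevant but never recommended, so recall suffers even though precision only looks at the top-K list itself.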
Cold Start Problem
The biggest challenge: how to recommend for new users (no history) or new items (no ratings)?
- New user → Ask preferences during onboarding, use popularity-based recs
- New item → Use content-based features until enough ratings accumulate
- Hybrid approach → Combine collaborative + content-based to handle both
Collaborative filtering needs enough user-item interactions to work. With very sparse data (<1% filled), consider content-based or hybrid methods.
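One way to sketch these fallbacks in code. The threshold and blending weight here are illustrative assumptions, not a standard formula:

```python
import numpy as np

def cold_start_recommend(user_history, popularity, content_scores):
    """Pick a strategy based on how much history the user has."""
    if len(user_history) == 0:
        # Brand-new user: fall back to global popularity
        return np.argsort(-popularity)
    # Some history, but maybe not enough for collaborative filtering:
    # blend content-based scores with popularity as a prior,
    # trusting the content signal more as history grows
    alpha = min(len(user_history) / 5, 1.0)
    blended = alpha * content_scores + (1 - alpha) * popularity
    return np.argsort(-blended)

popularity = np.array([0.9, 0.5, 0.7, 0.2])  # normalized view counts (made up)
content = np.array([0.1, 0.8, 0.3, 0.6])     # similarity to user's liked items (made up)

print(cold_start_recommend([], popularity, content))        # pure popularity order
print(cold_start_recommend([10, 42], popularity, content))  # blended order
```

With no history the ranking is purely popularity-driven; after a couple of interactions the content signal starts reordering the list, which is exactly the hybrid hand-off described above.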