Recommendation Systems
Build personalized suggestions the way Netflix, Amazon, and Spotify do: match users with items they'll love.
Types of Recommendation Systems
Collaborative Filtering
"Users who liked what you liked also liked this." Based on user-item interactions. No item features needed.
Content-Based Filtering
"Because you liked Action movies, here's another Action movie." Based on item features (genre, description, etc).
Hybrid
Combines both approaches. Netflix, for example, blends collaborative filtering, content-based signals, and deep learning models.
Collaborative Filtering
User-User Collaborative Filtering
Find users similar to you, then recommend what they liked but you haven't seen yet.
User-Item Rating Matrix:

        Movie1  Movie2  Movie3  Movie4
Alice      5       3       4      ?     ← What will Alice rate Movie4?
Bob        3       1       2      3
Carol      4       3       4      5     ← Carol is similar to Alice
Dave       3       3       1      1

Alice ≈ Carol (similar ratings) → Recommend Movie4 to Alice (Carol gave it 5)
Item-Item Collaborative Filtering
Find items similar to what the user already liked. More stable than user-user filtering: item-item similarities drift slowly, while user tastes change often.
Code: Collaborative Filtering
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# User-Item rating matrix (0 = not rated)
ratings = np.array([
    [5, 3, 4, 0, 0],  # Alice
    [3, 1, 2, 3, 3],  # Bob
    [4, 3, 4, 5, 0],  # Carol
    [3, 3, 1, 1, 5],  # Dave
    [1, 5, 5, 2, 1],  # Eve
])
users = ['Alice', 'Bob', 'Carol', 'Dave', 'Eve']
items = ['Movie1', 'Movie2', 'Movie3', 'Movie4', 'Movie5']

# --- User-User Similarity ---
user_sim = cosine_similarity(ratings)
print("User Similarity Matrix:")
for i, u in enumerate(users):
    print(f"  {u}: {np.round(user_sim[i], 2)}")

# Predict Alice's rating for Movie4 (row 0, column 3)
alice_idx = 0
movie_idx = 3

# Find users who rated Movie4
rated_mask = ratings[:, movie_idx] > 0
similarities = user_sim[alice_idx][rated_mask]
their_ratings = ratings[rated_mask, movie_idx]

# Weighted average of their ratings, weighted by similarity to Alice
predicted = np.dot(similarities, their_ratings) / similarities.sum()
print(f"\nPredicted rating for Alice on Movie4: {predicted:.2f}")

# --- Item-Item Similarity ---
item_sim = cosine_similarity(ratings.T)
print("\nItem Similarity (Movie1 vs others):", np.round(item_sim[0], 2))
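To turn item-item similarities into actual recommendations, one common scheme (a sketch, reusing the same ratings matrix as above) scores each movie the user hasn't rated by a similarity-weighted average of the movies they have rated:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Same ratings matrix as above (0 = not rated)
ratings = np.array([
    [5, 3, 4, 0, 0],  # Alice
    [3, 1, 2, 3, 3],  # Bob
    [4, 3, 4, 5, 0],  # Carol
    [3, 3, 1, 1, 5],  # Dave
    [1, 5, 5, 2, 1],  # Eve
])
items = ['Movie1', 'Movie2', 'Movie3', 'Movie4', 'Movie5']

item_sim = cosine_similarity(ratings.T)  # similarity between item columns

# Score each movie Alice hasn't rated: average her own ratings,
# weighted by how similar each rated movie is to the candidate
alice = ratings[0]
rated = alice > 0
scores = {}
for j, name in enumerate(items):
    if alice[j] == 0:
        sims = item_sim[j][rated]
        scores[name] = sims @ alice[rated] / sims.sum()

for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: predicted {s:.2f}")
```

Because the prediction is a weighted average of Alice's own ratings, it always lands inside the range of scores she has actually given.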
Content-Based Filtering
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Item features (descriptions)
movies = {
    'The Matrix': 'sci-fi action computer hacker virtual reality',
    'John Wick': 'action thriller assassin revenge guns',
    'Inception': 'sci-fi thriller dreams mind bending',
    'The Notebook': 'romance drama love story emotional',
    'Interstellar': 'sci-fi space adventure time gravity',
    'Titanic': 'romance drama love ship historical',
}

# Convert descriptions to TF-IDF vectors
tfidf = TfidfVectorizer()
tfidf_matrix = tfidf.fit_transform(movies.values())

# Compute pairwise similarity between all movies
sim_matrix = cosine_similarity(tfidf_matrix)

# Recommend movies similar to "The Matrix"
movie_names = list(movies.keys())
idx = movie_names.index('The Matrix')
scores = list(enumerate(sim_matrix[idx]))
scores = sorted(scores, key=lambda x: x[1], reverse=True)

print("If you liked 'The Matrix', try:")
for i, score in scores[1:4]:
    print(f"  {movie_names[i]:20s} (similarity: {score:.3f})")
Matrix Factorization (Advanced)
Decompose the sparse user-item matrix into two lower-dimensional matrices. Discovers latent factors (hidden features like "genre preference" or "movie quality").
R ≈ U × V^T
R = User-Item matrix (m users × n items)
U = User factors matrix (m users × k latent factors)
V = Item factors matrix (n items × k latent factors)
Example: k=3 latent factors might represent:
Factor 1: Action vs Romance preference
Factor 2: Old vs New movie preference
Factor 3: Blockbuster vs Indie preference
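A minimal sketch of R ≈ U × Vᵀ, fit by gradient descent on the observed entries only. The learning rate, regularization strength, and k=3 are illustrative assumptions, not tuned values:

```python
import numpy as np

# Same ratings matrix as before (0 = not rated)
R = np.array([
    [5, 3, 4, 0, 0],
    [3, 1, 2, 3, 3],
    [4, 3, 4, 5, 0],
    [3, 3, 1, 1, 5],
    [1, 5, 5, 2, 1],
], dtype=float)
mask = R > 0              # fit only the observed ratings

k = 3                     # number of latent factors (assumed, as above)
rng = np.random.default_rng(0)
U = 0.1 * rng.standard_normal((R.shape[0], k))   # user factors (m × k)
V = 0.1 * rng.standard_normal((R.shape[1], k))   # item factors (n × k)
lr, reg = 0.01, 0.02      # learning rate and L2 penalty (illustrative)

for _ in range(2000):
    E = mask * (R - U @ V.T)          # error on observed entries only
    U += lr * (E @ V - reg * U)       # descent step on user factors
    V += lr * (E.T @ U - reg * V)     # descent step on item factors

pred = U @ V.T            # dense matrix: fills in the missing ratings
print(f"Predicted rating for Alice on Movie4: {pred[0, 3]:.2f}")
```

The key trick is the `mask`: zeros mean "unknown", not "rated 0", so they must not contribute to the loss. After training, `U @ V.T` is dense and supplies a prediction for every empty cell.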
Evaluation Metrics
| Metric | What It Measures |
|--------|------------------|
| RMSE | How close predicted ratings are to actual ratings |
| Precision@K | Of top-K recommendations, how many were relevant |
| Recall@K | Of all relevant items, how many were in top-K |
| MAP | Average precision across all users |
| NDCG | Considers the ranking order (better items ranked higher) |
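Precision@K and Recall@K can be computed directly from a ranked list. A small sketch (the item IDs below are made up for illustration):

```python
def precision_recall_at_k(recommended, relevant, k):
    """Precision@K and Recall@K for a single user's ranked list."""
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    precision = hits / k                                   # share of top-K that was relevant
    recall = hits / len(relevant) if relevant else 0.0     # share of relevant items recovered
    return precision, recall

# Hypothetical example: system ranked five items, user actually liked three
recommended = ['M1', 'M2', 'M3', 'M4', 'M5']
relevant = {'M2', 'M4', 'M7'}

p, r = precision_recall_at_k(recommended, relevant, k=5)
print(f"Precision@5 = {p:.2f}, Recall@5 = {r:.2f}")  # hits: M2 and M4
```

Note the two metrics pull apart: `M7` was relevant but never recommended, so recall suffers even though precision only looks at the top-K list itself.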
Cold Start Problem
The biggest challenge: how to recommend for new users (no history) or new items (no ratings)?
- New user → Ask preferences during onboarding, use popularity-based recs
- New item → Use content-based features until enough ratings accumulate
- Hybrid approach → Combine collaborative + content-based to handle both
Collaborative filtering needs enough user-item interactions to work. With very sparse data (<1% filled), consider content-based or hybrid methods.
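One way to sketch these fallbacks in code. The threshold and blending weight here are illustrative assumptions, not a standard formula:

```python
import numpy as np

def cold_start_recommend(user_history, popularity, content_scores):
    """Pick a strategy based on how much history the user has."""
    if len(user_history) == 0:
        # Brand-new user: fall back to global popularity
        return np.argsort(-popularity)
    # Some history, but maybe not enough for collaborative filtering:
    # blend content-based scores with popularity as a prior,
    # trusting the content signal more as history grows
    alpha = min(len(user_history) / 5, 1.0)
    blended = alpha * content_scores + (1 - alpha) * popularity
    return np.argsort(-blended)

popularity = np.array([0.9, 0.5, 0.7, 0.2])  # normalized view counts (made up)
content = np.array([0.1, 0.8, 0.3, 0.6])     # similarity to user's liked items (made up)

print(cold_start_recommend([], popularity, content))        # pure popularity order
print(cold_start_recommend([10, 42], popularity, content))  # blended order
```

With no history the ranking is purely popularity-driven; after a couple of interactions the content signal starts reordering the list, which is exactly the hybrid hand-off described above.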