ML Playground

A hands-on journey through Machine Learning, Deep Learning, NLP, and Prompt Engineering. 30 interactive notebooks you can read right here as web pages.

Author: Gokul

30 Notebooks · 7 Categories · 20+ Algorithms · 44 Total Topics

Learning Path

Intro → Preprocessing → Supervised → Unsupervised → Reinforcement → Deep Learning → NLP & Prompts
🎯

Supervised Learning

Learn from labeled data. Map inputs to known outputs — predicting prices, classifying spam, and more.

Linear Regression
Read

Fit a straight line to predict house prices. y = β₀ + β₁x. Train-test split, MSE, R² evaluation, regression line visualization.

Key Concepts
Simple Linear Regression Fit a line in 2D space for single-feature continuous prediction
Multiple Linear Regression Fit a hyperplane with multiple features
Ordinary Least Squares Minimize squared differences between actual and predicted
Gradient Descent Iteratively adjust coefficients to minimize error
MSE & R² Evaluate prediction error and variance explained
House Prices (10 samples) · sklearn
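The fit-evaluate loop above can be sketched in a few lines of sklearn. The house sizes and prices below are illustrative toy values, not the notebook's dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Toy data: size in sqft -> price in $1000s (illustrative values)
X = np.array([[600], [800], [1000], [1200], [1400], [1600], [1800], [2000]])
y = np.array([150, 190, 240, 280, 330, 360, 410, 450])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

model = LinearRegression().fit(X_train, y_train)  # OLS fit: y = b0 + b1*x
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)  # average squared prediction error
r2 = r2_score(y_test, y_pred)             # variance explained
```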
Logistic Regression
Read

Classification using the sigmoid function. Binary & multi-class with one-vs-rest. Decision boundaries and log-odds transform.

Key Concepts
Sigmoid Function Transform linear output to [0,1] probability for classification
Log Loss Optimization Minimize cross-entropy loss for optimal classification
Feature Scaling Essential to ensure convergence and unbiased coefficients
Probability Thresholding Convert probabilities to class labels using 0.5 threshold
Decision Boundary Line separating classes in feature space
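A minimal sketch of the sigmoid → probability → threshold pipeline, using made-up hours-studied data (not the notebook's dataset):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Toy binary data: hours studied -> pass (1) / fail (0)
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0], [8.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

X_scaled = StandardScaler().fit_transform(X)   # scaling helps convergence
clf = LogisticRegression().fit(X_scaled, y)

proba = clf.predict_proba(X_scaled)[:, 1]      # sigmoid output in [0, 1]
labels = (proba >= 0.5).astype(int)            # threshold probabilities at 0.5
```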
Decision Tree
Read

Tree-shaped model splitting by questions. Gini impurity, Entropy, MSE criteria. Tree depth control with visual plot_tree diagram.

Key Concepts
Classification Tree Interpretable decisions through recursive feature-based splits
Regression Tree Predict continuous values using tree structure with leaf values
Gini Impurity Measure split quality by misclassification probability
Entropy / Information Gain Measure disorder to identify optimal splits
Max Depth Regulation Limit tree depth to prevent overfitting
Age & Income (7 samples) · sklearn
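The Gini criterion and depth control look like this in sklearn; the age/income rows here are illustrative stand-ins for the notebook's 7-sample dataset:

```python
from sklearn.tree import DecisionTreeClassifier

# Toy age/income data (illustrative); label 1 = buys the product
X = [[25, 30], [30, 45], [35, 60], [40, 80], [45, 50], [50, 90], [55, 110]]
y = [0, 0, 0, 1, 0, 1, 1]

# max_depth caps recursive splitting to reduce overfitting; gini is the default criterion
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0).fit(X, y)
pred = tree.predict([[48, 85]])   # classify a new person
```

`plot_tree(tree)` from `sklearn.tree` then renders the same diagram the notebook shows.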
K-Nearest Neighbors (KNN)
Read

Classify by majority vote of K closest neighbors. Euclidean distance, why feature scaling is critical. Interactive new predictions.

Key Concepts
K-Nearest Neighbors Classify based on majority vote of K closest training points
Euclidean Distance Calculate distances to identify nearest neighbors
Feature Scaling Critical for distance-based algorithms to prevent feature dominance
K Selection Start with K=sqrt(n), use odd values for binary to prevent ties
Weighted KNN Give closer neighbors more influence in predictions
n_neighbors=2 · sklearn
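A short sketch showing why scaling matters: without it, the income column would dominate the Euclidean distance. Data is illustrative, not the notebook's:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Toy age/income data; label 1 = senior segment
X = [[25, 30000], [28, 35000], [30, 40000], [45, 80000], [50, 90000], [52, 95000]]
y = [0, 0, 0, 1, 1, 1]

scaler = StandardScaler().fit(X)  # put age and income on comparable scales
knn = KNeighborsClassifier(n_neighbors=3).fit(scaler.transform(X), y)  # odd K avoids ties

pred = knn.predict(scaler.transform([[27, 33000]]))  # majority vote of 3 nearest points
```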
Random Forest
Read

Ensemble of decision trees with bagging. A basic regression example plus an extended version with categorical encoding. Feature importance. R² = 0.96.

Key Concepts
Bagging Train trees on random data subsets with replacement for diversity
Random Feature Selection Consider random features per split to reduce tree correlation
Classification Majority voting across ensemble for robust predictions
Regression Average predictions across ensemble for robust continuous values
Feature Importance Rank features by contribution to reduce variance
R²=0.96 · sklearn
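Bagging plus averaged predictions and feature importances, sketched on synthetic data (the target function and seed are illustrative assumptions):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = 3 * X[:, 0] + 5 * np.sin(X[:, 1]) + rng.normal(0, 0.5, 200)  # non-linear target

# 100 trees, each trained on a bootstrap sample; regression output is the average
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
importances = forest.feature_importances_  # normalized, sums to 1.0
```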
Support Vector Machine (SVM / SVR)
Read

Maximize margin between classes. Kernel trick (RBF), support vectors, ε-insensitive tube for regression. Apartment rent prediction.

Key Concepts
Margin Maximization Find optimal hyperplane that maximizes gap between classes
Support Vectors Closest data points to boundary that define the hyperplane
Linear Kernel Use for linearly separable data (fastest option)
RBF Kernel Map to infinite dimensions for non-linear separation
SVR (Regression) Fit hyperplane within epsilon-tube for regression tolerance
Apartments (7 samples) · sklearn
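An RBF-kernel classifier with scaling in a pipeline — a sketch with two made-up point clouds, not the notebook's apartment data:

```python
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Two toy classes; the RBF kernel can also handle non-linear boundaries
X = [[1, 1], [1, 2], [2, 1], [2, 2], [7, 7], [7, 8], [8, 7], [8, 8]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)).fit(X, y)
n_support = clf.named_steps["svc"].support_vectors_.shape[0]  # points defining the margin
pred = clf.predict([[7.5, 7.5]])
```

Swapping `SVC` for `SVR(epsilon=...)` gives the ε-insensitive regression variant.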
Gradient Boosting
Read

End-to-end SMS spam detection. NLTK tokenization, stopword removal, TF-IDF vectorization, GradientBoostingClassifier. 97% accuracy.

Key Concepts
Sequential Boosting Train trees one after another, each correcting previous errors
Residual Learning New trees predict errors of the combined previous predictions
Learning Rate Shrinkage Scale each tree's contribution to prevent overfitting
Weak Base Learners Use shallow trees to benefit from the boosting effect
Early Stopping Stop training when validation performance plateaus
SMS Spam (5,572 samples) · 97% Accuracy
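The TF-IDF → boosted-trees pipeline in miniature. These six hand-written messages stand in for the 5,572-sample SMS corpus, so the numbers prove nothing — the structure is the point:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import GradientBoostingClassifier

texts = [
    "win a free prize now", "claim your free cash reward", "urgent winner call now",
    "are we still meeting for lunch", "see you at the office tomorrow", "thanks for the notes",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = spam

tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(texts)

# Sequential trees: each new shallow tree corrects the residual errors of the ensemble
gb = GradientBoostingClassifier(n_estimators=50, learning_rate=0.1, max_depth=2, random_state=0)
gb.fit(X.toarray(), labels)
pred = gb.predict(tfidf.transform(["free prize winner"]).toarray())
```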
XGBoost
Read

Extreme Gradient Boosting with L1 (Lasso) vs L2 (Ridge) regularization. Visualizes how regularization affects weights. Parallelization.

Key Concepts
XGBoost Optimized gradient boosting with regularization and parallelization
L1 Regularization (Lasso) Penalize absolute weights for automatic feature elimination
L2 Regularization (Ridge) Penalize squared weights to smoothly shrink without zeroing
Missing Value Handling Learns optimal direction for missing data automatically
Sparse Data Efficiency Efficiently handles sparse matrices like TF-IDF
House Prices · xgboost
Hyperparameter Tuning
Read

Three approaches compared: GridSearchCV (exhaustive), RandomizedSearchCV (sampling), Bayesian Optimization with Optuna (intelligent).

Key Concepts
Grid Search Test every hyperparameter combination (slow, but guaranteed to find the best option within the grid)
Random Search Sample random combinations efficiently (faster, comparable results)
Bayesian Optimization Intelligently search using probability to focus on promising regions
Cross-Validation Use multiple folds within tuning for robust estimates
Decision Tree params · sklearn, optuna
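The exhaustive variant in a few lines — a GridSearchCV sketch over the same kind of decision-tree grid, here on the built-in iris dataset rather than the notebook's data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
param_grid = {"max_depth": [2, 3, 4], "min_samples_split": [2, 5]}

# Every combination (3 x 2 = 6 fits x 5 folds), scored with cross-validation
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
best = search.best_params_   # the winning combination
```

`RandomizedSearchCV` takes the same grid plus `n_iter`; Optuna replaces the grid with a sampled objective function.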
ML Evaluation Metrics
Read

Classification: Accuracy, Precision, Recall, F1, Confusion Matrix, ROC-AUC. Regression: MAE, MSE, RMSE, R². When to use which.

When to Use Which
Accuracy Correct prediction percentage (only reliable on balanced datasets)
Precision Optimize when false positives are costly (e.g. spam filter)
Recall Optimize when false negatives are costly (e.g. cancer detection)
F1 Score Harmonic mean balancing precision and recall for imbalanced data
ROC-AUC Performance across all thresholds (1.0=perfect, 0.5=random)
MAE / MSE / RMSE Regression error metrics in original units
R² Fraction of variance explained by the model (0=baseline, 1=perfect)
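The classification metrics above, computed on a hand-checkable toy example (3 TP, 3 TN, 1 FP, 1 FN):

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

acc = accuracy_score(y_true, y_pred)    # 6/8 correct = 0.75
prec = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3/4
rec = recall_score(y_true, y_pred)      # TP / (TP + FN) = 3/4
f1 = f1_score(y_true, y_pred)           # harmonic mean of precision and recall
cm = confusion_matrix(y_true, y_pred)   # rows = actual, cols = predicted
```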
Naive Bayes
Read

Probabilistic classifier based on Bayes' theorem. Gaussian, Multinomial, Bernoulli variants. Fast and effective for text, spam detection, sentiment. Laplace smoothing explained.

Key Concepts
Bayes' Theorem Calculate posterior probability from likelihood, prior, and evidence
Gaussian NB Use for continuous features following normal distributions
Multinomial NB Use for discrete count features like word frequencies in text
Bernoulli NB Use for binary feature presence/absence
Laplace Smoothing Add pseudocounts to prevent zero probabilities for unseen features
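Multinomial NB on word counts, with `alpha=1.0` as the Laplace smoothing term. The four messages are illustrative, not the notebook's data:

```python
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

texts = ["free prize money", "win money now", "project meeting today", "lunch with the team"]
labels = [1, 1, 0, 0]  # 1 = spam

vec = CountVectorizer()
X = vec.fit_transform(texts)  # word counts: the input Multinomial NB expects

# alpha=1.0 adds one pseudocount per word, so unseen words never get zero probability
nb = MultinomialNB(alpha=1.0).fit(X, labels)
pred = nb.predict(vec.transform(["win free money"]))
```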
Cross-Validation
Read

K-Fold, Stratified K-Fold, Leave-One-Out, Time Series Split. Get reliable performance estimates instead of one lucky train-test split.

Key Concepts
K-Fold CV Split data into K folds, train K times with rotating test sets
Stratified K-Fold Maintain class distribution in each fold (essential for imbalanced data)
Leave-One-Out (LOO) Use each single sample as a test set (thorough but computationally expensive)
Time Series Split Expanding train window for temporal data (preserve causality)
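Stratified K-Fold in practice — five scores instead of one lucky split. Iris stands in for whatever dataset you are evaluating:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)

# 5 folds, each preserving iris's 50/50/50 class balance
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
mean_score = scores.mean()   # report mean ± std, not a single number
```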
Feature Selection
Read

Filter, wrapper, and embedded methods. SelectKBest, RFE, tree importance, Lasso L1. Pick the best features and drop the noise.

Key Concepts
Filter Methods (SelectKBest) Score features independently using statistical tests (fast)
Mutual Information Measure information dependency between features and target
RFE (Recursive Elimination) Iteratively remove least important features from model
Tree-Based Importance Extract importance scores from tree models for ranking
L1 Lasso Automatically drives unimportant weights to zero
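A filter-method sketch: SelectKBest scores each feature independently with an ANOVA F-test and keeps the top k. Iris is used here as a stand-in dataset:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)   # 150 samples, 4 features

# Score every feature against the target, keep the 2 strongest
selector = SelectKBest(score_func=f_classif, k=2)
X_new = selector.fit_transform(X, y)
kept = selector.get_support()       # boolean mask of selected features
```

For iris this keeps the two petal measurements, which dominate the F-scores.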
ARIMA & Prophet
Read

Classical time series forecasting. ARIMA(p,d,q), stationarity testing, auto_arima. Facebook Prophet with holidays, multiple seasonality, component plots.

Key Concepts
AR (AutoRegressive) Use past values to forecast future values
I (Integrated) Difference data to remove trends and achieve stationarity
MA (Moving Average) Incorporate past forecast errors for prediction
SARIMA Extend ARIMA with seasonal components (P,D,Q,s)
Auto ARIMA Automatically determine optimal (p,d,q) parameters
Facebook Prophet User-friendly forecasting with automatic trend and seasonality
🔭

Unsupervised Learning

Discover hidden patterns in unlabeled data. Clustering, dimensionality reduction, and association rules.

K-Means Clustering
Read

Centroid-based partitioning. k-means++ init, Elbow method for K, Silhouette score. Iterative assignment & update until convergence.

Key Concepts
K-Means Clustering Partition data into K clusters by minimizing within-cluster variance
Elbow Method Determine optimal K by plotting inertia vs number of clusters
Silhouette Score Evaluate cluster quality from -1 to 1
k-means++ Smarter centroid initialization for better starting positions
300 samples, 3 clusters · sklearn
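The assign-update loop, k-means++ init, and both quality measures in one sketch (synthetic blobs, matching the card's 300-sample setup):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

# init="k-means++" is the sklearn default; n_init reruns from different seeds
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

score = silhouette_score(X, km.labels_)  # closer to 1 = tight, well-separated clusters
inertia = km.inertia_                    # within-cluster sum of squares (Elbow plots use this)
```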
Hierarchical Clustering
Read

Agglomerative (bottom-up) & Divisive (top-down). Linkage methods: Single, Complete, Average, Ward. Dendrogram tree diagrams.

Key Concepts
Agglomerative Bottom-up approach starting from individual points and merging
Divisive Top-down approach starting from one cluster and splitting
Ward's Linkage Minimizes total within-cluster variance (most popular)
Dendrogram Tree visualization showing hierarchical cluster structure
Age & Income (10 samples) · scipy, sklearn
DBSCAN
Read

Density-based clustering. No need to specify K. Finds arbitrary shapes, marks outliers as noise (-1). ε and MinPts parameters.

Key Concepts
Core Points Points with at least MinPts neighbors within eps radius
Border Points Within eps of core points but with fewer than MinPts neighbors
Epsilon (eps) Radius parameter defining neighborhood size around each point
Noise Detection Automatically marks outliers that don't belong to any cluster
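Two dense blobs plus one planted outlier show eps/MinPts and the -1 noise label. The blob centers and outlier position are illustrative choices:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=[[0, 0], [5, 5]], cluster_std=0.4, random_state=7)
X = np.vstack([X, [[20.0, 20.0]]])   # one far-away outlier

# eps = neighborhood radius, min_samples = MinPts
db = DBSCAN(eps=0.8, min_samples=5).fit(X)
labels = db.labels_                  # -1 marks noise points
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
```

Note that K was never specified; the two clusters fall out of the density structure.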
Gaussian Mixture Models (GMM)
Read

Soft probabilistic clustering via EM algorithm. Each point gets a probability per cluster. Gaussian distributions with μ, Σ, π.

Key Concepts
Soft Clustering Assigns probability of belonging to each cluster (not hard labels)
EM Algorithm Iteratively estimates Gaussian parameters to maximize likelihood
Covariance Types Full, tied, diag, or spherical options for different cluster shapes
Component Weights Learned proportions of data belonging to each Gaussian
300 samples, 3 clusters · sklearn
Mean Shift Clustering
Read

Density peak-seeking. Each point shifts towards the mean of nearby points. Auto-discovers cluster count. Bandwidth parameter.

Key Concepts
Density Peak Seeking Iteratively shift points toward regions of highest density
Bandwidth Parameter Controls window size: small = many clusters, large = fewer
Auto Cluster Detection No need to pre-specify number of clusters
Arbitrary Shapes Can find non-spherical cluster shapes
300 samples, 3 clusters · sklearn
Principal Component Analysis (PCA)
Read

Dimensionality reduction via maximum variance projection. MNIST 64D → 2D. Eigenvalues, eigenvectors, linear transformation.

Key Concepts
PCA Find axes capturing maximum variance for dimensionality reduction
Explained Variance Ratio Shows how much information each component captures
Eigenvalues & Eigenvectors Eigenvectors are principal components, eigenvalues show importance
Feature Scaling Essential preprocessing to standardize data before PCA
MNIST (1,797 samples, 64→2D) · sklearn
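The 64D → 2D projection on sklearn's built-in digits dataset (the same 1,797-sample set the card describes), with scaling first:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)       # 1,797 samples, 64 pixel features
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)        # project onto the top-2 variance axes
explained = pca.explained_variance_ratio_ # information captured per component
```

Scatter-plotting `X_2d` colored by `y` reproduces the notebook's 2D digit map.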
Association Rule Mining
Read

Market basket analysis. Apriori, Eclat, FP-Growth compared. Support, Confidence, Lift metrics. Real-world product bundling.

Key Concepts
Apriori Generate frequent itemsets level-by-level using pruning
FP-Growth Compressed tree-based method, much faster than Apriori
Support Frequency of itemset in all transactions
Confidence How often a rule is correct (antecedent → consequent)
Lift Strength of association compared to random chance (>1 is positive)
4 transactions · mlxtend
Isolation Forest
Read

Anomaly detection by random partitioning. Anomalies are isolated with fewer splits. Anomaly scores, contamination parameter. Fraud and intrusion detection.

Key Concepts
Isolation Forest Detect anomalies by isolating outliers with random partitions
Anomaly Score Normalized path length showing how easily a point is isolated
Random Partitioning Uses feature selection and split values to isolate points
Contamination Expected proportion of outliers in dataset
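A sketch with 200 normal points and two planted anomalies; `contamination` tells the model roughly what fraction to flag. Seed and outlier positions are illustrative:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_normal = rng.normal(0, 1, size=(200, 2))
X_outliers = np.array([[6.0, 6.0], [-7.0, 5.0]])   # far from the cloud
X = np.vstack([X_normal, X_outliers])

# contamination = expected proportion of anomalies in the data
iso = IsolationForest(contamination=0.01, random_state=0).fit(X)
pred = iso.predict(X)           # -1 = anomaly, 1 = normal
scores = iso.score_samples(X)   # lower = isolated in fewer random splits = more anomalous
```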
Recommendation Systems
Read

Collaborative filtering (user-user, item-item), content-based filtering with TF-IDF, matrix factorization. Build Netflix/Amazon-style recommendations.

Key Concepts
Collaborative Filtering "Users who liked what you liked also liked this"
Content-Based Filtering "Because you liked X, here's another similar X"
User-User Similarity Find similar users and recommend their liked items
Item-Item Similarity Find items similar to what user already liked
Matrix Factorization Decompose sparse user-item matrix into latent factors
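User-user similarity from first principles: cosine similarity over a tiny made-up rating matrix (rows = users, columns = items, 0 = unrated):

```python
import numpy as np

R = np.array([
    [5, 4, 0, 1],   # user 0
    [4, 5, 1, 0],   # user 1: similar taste to user 0
    [1, 0, 5, 4],   # user 2
    [0, 1, 4, 5],   # user 3
], dtype=float)

def cosine_sim(a, b):
    # angle-based similarity, insensitive to rating magnitude
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# User-user collaborative filtering: find the user most similar to user 0,
# then recommend items that user rated highly
sims = np.array([cosine_sim(R[0], R[u]) for u in range(1, 4)])
most_similar = 1 + int(np.argmax(sims))
```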
🎮

Reinforcement Learning

An agent learns by trial and error, receiving rewards for good actions and penalties for bad ones.

Introduction to Reinforcement Learning
Read

Core vocabulary: Agent, Environment, State, Action, Reward, Policy, Episode. Model-free vs model-based. Real-world examples.

Key Concepts
Agent-Environment Loop Agent observes state, takes action, receives reward, updates policy
Policy Strategy mapping states to actions; goal is to find the optimal one
Reward Signal Feedback indicating quality of actions; guides learning
Exploration vs Exploitation Tradeoff between trying new actions and using best known
Model-Free vs Model-Based Learning from experience vs planning with environment model
Q-Learning
Read

Model-free RL with Q-table. 4×4 grid world navigation. Bellman equation, epsilon-greedy exploration. 500 episodes to optimal path.

Key Concepts
Q-Table Stores expected cumulative reward for each state-action pair
Bellman Equation Update rule for Q-values incorporating rewards and future values
Epsilon-Greedy Balance exploration (random) vs exploitation (best known)
Off-Policy Learning Learns optimal policy regardless of current exploration behavior
4×4 Grid · 500 Episodes
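The Q-table, Bellman update, and epsilon-greedy loop in pure numpy. To keep it short, this sketch uses a 1-D corridor (states 0–4, goal at 4) instead of the notebook's 4×4 grid; the hyperparameters are illustrative:

```python
import numpy as np

n_states, n_actions = 5, 2          # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions)) # expected return per state-action pair
alpha, gamma, eps = 0.5, 0.9, 0.2
rng = np.random.default_rng(0)

for _ in range(500):                # episodes
    s = 0
    while s != 4:
        # epsilon-greedy: explore with prob eps, else take the best known action
        a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next = max(0, s - 1) if a == 0 else s + 1
        r = 1.0 if s_next == 4 else 0.0
        # Bellman update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

best_action_at_start = int(np.argmax(Q[0]))   # learned policy: go right
```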
Deep Q-Network (DQN)
Read

Q-Learning + neural networks for large state spaces. Experience replay, target networks, epsilon-greedy. The approach that mastered Atari games.

Key Concepts
Deep Q-Network Combines Q-Learning with neural networks for large state spaces
Experience Replay Store and randomly sample past transitions for stable training
Target Network Separate network updated infrequently for stability
Neural Approximation Replaces Q-table with network predicting Q-values
🧠

Deep Learning

Neural networks with multiple layers. From basic perceptrons to CNNs for images and LSTMs for sequences.

Deep Learning Overview
Read

Neural network fundamentals, activation functions (ReLU, Sigmoid, Tanh, Softmax), backpropagation, common architectures.

Key Concepts
ANN Multi-layer networks with input, hidden, output layers
CNN Spatial pattern detection using filters/kernels for images
RNN Process sequences with memory via hidden states
Transformers Self-attention based architecture, parallelizable across sequence
Backpropagation Algorithm for computing gradients and updating weights
Neural Network Basics (ANN)
Read

MNIST digit classification. 784 → 128 (ReLU) → 64 (ReLU) → 10 (Softmax). Adam optimizer. 97.78% accuracy in 10 epochs.

Key Concepts
Feedforward Network Data flows one direction through layers with no recurrence
ReLU Activation max(0, x) introduces non-linearity in hidden layers
Softmax Activation Converts outputs to probabilities summing to 1 for multi-class
Adam Optimizer Adaptive learning rate combining momentum with adaptive estimates
MNIST (60K train) · 97.78% Accuracy
CNN Image Classification
Read

CIFAR-10 color images. Conv2D → MaxPool → Conv2D → MaxPool → Conv2D → Dense. Feature extraction, translation invariance. 71.86% accuracy.

Key Concepts
Conv2D Layers Filter slides across image computing dot products at each position
MaxPooling Takes maximum value in window, reducing spatial dimensions
Feature Maps Output of convolutional layers representing learned features
Hierarchical Features Early layers learn edges, deeper layers learn objects
CIFAR-10 (50K, 32×32 RGB) · 71.86% Accuracy
RNN Sequence Modeling
Read

Next-word prediction. Embedding → SimpleRNN(50) → Dense. N-gram sequences, hidden states, vanishing gradient problem.

Key Concepts
Simple RNN Processes sequences step-by-step with hidden state carrying info
Recurrent Connection Previous hidden state feeds as input to next step
Embedding Layer Converts word indices to dense vector representations
Vanishing Gradient Information loss in long sequences (solved by LSTM)
Text Prediction · 200 Epochs
LSTM (Long Short-Term Memory)
Read

Gated architecture solving vanishing gradients. Input, Forget, Output gates. Cell state as long-term memory. Text generation.

Key Concepts
Forget Gate Decides what past information to discard from cell state
Input Gate Decides what new information to add to cell state
Output Gate Decides what cell state information to output
Cell State "Conveyor belt" carrying information through sequence unchanged
Text Generation · TensorFlow/Keras
LSTM for Time Series
Read

Sales forecasting with stacked LSTMs. LSTM(50) → LSTM(25) → Dense(1). Sliding window lookback=5, MinMaxScaler.

Key Concepts
Stacked LSTM Multiple LSTM layers with return_sequences for deeper temporal learning
Sliding Window Uses lookback window of past values to predict next
MinMaxScaler Scales values to [0,1] range for neural network training
MSE Loss Mean Squared Error appropriate for regression tasks
100 sales points · 50 Epochs
Multi-Layer Perceptron (MLP)
Read

MLPRegressor for house prices. Hidden (64, 32) with ReLU, Adam. Why StandardScaler is essential for NNs. R² = 0.97.

Key Concepts
MLP Fully connected layers for supervised learning on tabular data
Hidden Layers Intermediate layers learning non-linear feature transformations
Feature Scaling StandardScaler normalizes features, essential for neural networks
Linear Output No activation for regression; softmax/sigmoid for classification
10 houses · R²=0.97
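The (64, 32) architecture with ReLU and Adam, sketched on synthetic data rather than the notebook's 10 houses (10 samples are too few for a reproducible demo):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = 2 * np.sin(X[:, 0]) + rng.normal(0, 0.1, 300)   # non-linear target

X_scaled = StandardScaler().fit_transform(X)        # scaling is essential for NN training
mlp = MLPRegressor(hidden_layer_sizes=(64, 32), activation="relu",
                   solver="adam", max_iter=2000, random_state=0)
mlp.fit(X_scaled, y)
r2 = mlp.score(X_scaled, y)   # R² on the training data
```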
GANs (Generative Adversarial Networks)
Read

Generator vs Discriminator competing to produce realistic data. DCGAN, WGAN, CycleGAN, StyleGAN variants. Training challenges: mode collapse, instability.

Key Concepts
Generator Creates synthetic data from random noise trying to fool discriminator
Discriminator Classifies real vs fake data, evaluating generator quality
Minimax Game Generator minimizes, discriminator maximizes loss function
Mode Collapse Generator learns limited diversity of outputs (training problem)
Training Balance Networks must stay balanced; discriminator can't be too strong/weak
Autoencoders
Read

Encoder-decoder networks learning compressed representations. Vanilla, Denoising, Variational (VAE), Convolutional variants. Anomaly detection, image denoising.

Key Concepts
Encoder Compresses input into lower-dimensional bottleneck representation
Decoder Reconstructs input from bottleneck latent representation
Bottleneck Layer Forced compression learning most important features
Denoising Autoencoder Learns to reconstruct clean data from corrupted input
Reconstruction Loss Difference between input and output drives learning
Transfer Learning
Read

Reuse pre-trained models (ResNet, VGG, MobileNet, EfficientNet). Feature extraction vs fine-tuning strategies. When to freeze, when to unfreeze.

Key Concepts
Feature Extraction Freeze pre-trained layers, add custom head for new task
Fine-Tuning Unfreeze and retrain some/all layers with low learning rate
Pre-trained Models Leverage knowledge from ImageNet or other large datasets
Early Layers Learn universal features (edges, textures) shared across tasks
Later Layers Learn task-specific features needing adaptation
Transformer Architecture
Read

Self-attention, multi-head attention, positional encoding. The architecture behind GPT, BERT, and all modern LLMs. Encoder-decoder explained step by step.

Key Concepts
Self-Attention Each position attends to all other positions computing relevance
Multi-Head Attention Multiple attention heads learning different relationships
Query, Key, Value Three learned projections for attention computation
Positional Encoding Adds position information since processing is fully parallel
Encoder-Decoder Encoder processes input, decoder generates output
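The Q/K/V mechanics above reduce to a few matrix products. A single-head, scaled dot-product attention sketch in numpy, with random weights standing in for learned projections:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

seq_len, d_model = 4, 8
rng = np.random.default_rng(0)
X = rng.normal(size=(seq_len, d_model))              # token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))

Q, K, V = X @ Wq, X @ Wk, X @ Wv                     # learned projections
scores = Q @ K.T / np.sqrt(d_model)                  # relevance of every position to every other
weights = softmax(scores, axis=-1)                   # each row is a distribution over positions
out = weights @ V                                    # weighted mix of value vectors
```

Multi-head attention runs several such blocks in parallel with smaller per-head dimensions and concatenates the outputs.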
Regularization Techniques
Read

Prevent overfitting. L1/L2 regularization, Dropout, Batch Normalization, Early Stopping, Data Augmentation, Weight Decay. When to use each.

When to Use Each
L1 (Lasso) Adds sum of absolute weights; drives some to zero for sparsity
L2 (Ridge) Adds sum of squared weights; shrinks all proportionally
Dropout Randomly deactivates neurons during training, forcing redundancy
Batch Normalization Normalizes layer outputs to zero mean / unit variance
Early Stopping Stop training when validation loss increases
Data Augmentation Create synthetic variations of training data
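L1 vs L2 side by side: with only 2 of 10 features informative, Lasso zeroes the rest while Ridge merely shrinks them. The synthetic data and alpha values are illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 0.1, 100)   # only features 0 and 1 matter

lasso = Lasso(alpha=0.1).fit(X, y)   # L1: drives irrelevant weights to exactly zero
ridge = Ridge(alpha=1.0).fit(X, y)   # L2: shrinks all weights, none to zero

n_zero_lasso = int(np.sum(lasso.coef_ == 0))
n_zero_ridge = int(np.sum(ridge.coef_ == 0))
```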
🚀

Advanced Topics

NLP and Prompt Engineering — where classical ML meets modern AI.

NLP Fundamentals
Read

Full pipeline: tokenization, stopwords, stemming vs lemmatization, BoW, TF-IDF. Word embeddings: Word2Vec, GloVe, FastText, BERT.

Key Concepts
Tokenization Split text into discrete units (words, characters, subwords)
Bag of Words Count word frequencies ignoring order and context
TF-IDF Weight terms by importance in document vs corpus
Word2Vec Learn dense embeddings capturing semantic relationships
BERT Embeddings Contextual embeddings where same word gets different vectors per context
NLTK, sklearn, gensim · Text → Vectors
Prompt Engineering
Read

The most comprehensive notebook. Zero/few-shot, Chain-of-Thought, Tree of Thoughts, ReAct, Reflexion. APIs for GPT-4, Claude, Gemini. Debugging & domain-specific prompts.

Key Techniques
Zero-Shot Task performance with instructions only, no examples needed
Few-Shot Provide 2-5 examples showing expected pattern and format
Chain-of-Thought Instruct model to reason step-by-step before answering
Self-Consistency Ask same question multiple times, majority vote on answer
Tree of Thoughts Explore multiple reasoning branches, evaluating and pruning the most promising paths
16+ sections · Most comprehensive
BERT / GPT Fine-Tuning
Read

HuggingFace Transformers for sentiment classification, NER, QA. Pre-trained BERT fine-tuning with Trainer API. Pipeline for instant inference.

Key Concepts
BERT Fine-Tuning Adapt bidirectional transformer for classification, NER, QA
GPT Fine-Tuning Adapt autoregressive transformer for generation tasks
Low Learning Rate Use 2e-5 to 5e-5 to preserve pre-trained knowledge
HuggingFace Load 100K+ pre-trained models and tokenizers
Classification Heads Remove pre-training head, add custom layers for your task
Model Deployment
Read

Notebook to production. Save models (joblib, ONNX), build APIs (FastAPI, Flask), Dockerize, deploy to cloud. Input validation and monitoring.

Key Concepts
Model Serialization Save trained model with joblib, pickle, or ONNX
FastAPI / Flask Create REST endpoints wrapping model for predictions
Docker Package model and dependencies for reproducible deployment
Cloud Deployment Heroku, AWS Lambda, EC2, GCP for different needs
Monitoring Track prediction latency, error rates, data/model drift
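The core deployment round-trip is serialize → reload → verify identical predictions. A joblib sketch (iris and logistic regression stand in for whatever model you ship):

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Dump to disk, reload, and check the restored model predicts identically
path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(model, path)
restored = joblib.load(path)
same_predictions = bool((restored.predict(X) == model.predict(X)).all())
```

In production the `restored` object is what a FastAPI/Flask endpoint loads at startup and calls inside the request handler.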