
Deep Learning Overview

A subfield of machine learning that uses multi-layer neural networks to learn complex patterns from raw data.

What is Deep Learning?

Deep Learning uses artificial neural networks with multiple layers (hence "deep") to learn complex patterns from data. Unlike classical ML, deep learning automatically learns features from raw data -- pixels, text, or audio -- without manual feature engineering.

Core idea: just like the human brain processes signals through layers of neurons, deep learning models learn using layers of artificial neurons.

Neural Network Structure

Input Layer --> Hidden Layers --> Output Layer

Each layer has:

- Neurons (nodes) - mini math units
- Weights - control the importance of inputs
- Biases - an offset added to the weighted input
- Activation Function - introduces non-linearity

Deep Learning Algorithm Map

| Algorithm | Full Form | Used For | Key Idea |
| --- | --- | --- | --- |
| ANN | Artificial Neural Network | Tabular data, basic problems | Input -> hidden -> output layers |
| DNN | Deep Neural Network | Any complex task | ANN with many hidden layers |
| MLP | Multilayer Perceptron | Classification & regression | Fully connected layers, no memory |
| CNN | Convolutional Neural Network | Image & video | Filters/kernels detect spatial patterns |
| RNN | Recurrent Neural Network | Time-series & sequences | Has memory; passes info through time |
| LSTM | Long Short-Term Memory | Long sequences | Improved RNN with memory gates |
| Transformers | -- | NLP, vision, LLMs | Self-attention; parallel processing |
| GANs | Generative Adversarial Networks | Image generation | Generator vs. Discriminator |

Activation Functions

After computing the weighted sum in a neuron, the activation function decides whether the neuron should "fire". Without it, the network is just a linear function.

```
z = (input1 * weight1) + (input2 * weight2) + bias
a = activation_function(z)
```
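The two steps above can be sketched directly in NumPy for a single neuron (the inputs, weights, and bias below are illustrative values, not from any real model):

```python
import numpy as np

def sigmoid(z):
    # squash the weighted sum into the range (0, 1)
    return 1 / (1 + np.exp(-z))

# hypothetical inputs, weights, and bias for one neuron
inputs = np.array([0.5, -1.2])
weights = np.array([0.8, 0.3])
bias = 0.1

z = np.dot(inputs, weights) + bias  # weighted sum: 0.4 - 0.36 + 0.1 = 0.14
a = sigmoid(z)                      # activation applied to z
print(z, a)
```

Stacking many such neurons into layers, and feeding each layer's activations into the next, gives the input -> hidden -> output structure described above.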

Sigmoid

Output: 0 to 1. Used for binary classification output layer. Limitation: vanishing gradient in deep networks.

ReLU

f(x) = max(0, x). Default for hidden layers. Fast and simple. Limitation: dead neurons if input is always negative.

Leaky ReLU

Allows a small negative output to fix the dead-neuron problem: f(x) = x if x > 0, else 0.01x.

Softmax

Converts outputs to probabilities summing to 1. Used for multi-class classification output layer.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def relu(x):
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)
```
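Softmax is not included in the snippet above; a minimal sketch of a numerically stable version (subtracting the max before exponentiating, which leaves the result unchanged) might look like:

```python
import numpy as np

def softmax(x):
    # subtract the max for numerical stability; the output is unchanged
    e = np.exp(x - np.max(x))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])  # hypothetical raw output scores
probs = softmax(logits)
print(probs, probs.sum())           # probabilities that sum to 1
```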

Loss Functions

A loss function measures how far the model's prediction is from the actual value. Smaller loss means better performance. The optimizer uses the loss to adjust weights via backpropagation.

Regression

MSE (Mean Squared Error) and MAE (Mean Absolute Error).
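Both regression losses are one-liners in NumPy; a sketch with illustrative values:

```python
import numpy as np

def mse(y_true, y_pred):
    # mean of squared differences; penalizes large errors heavily
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    # mean of absolute differences; more robust to outliers
    return np.mean(np.abs(y_true - y_pred))

y_true = np.array([3.0, 5.0, 2.0])
y_pred = np.array([2.5, 5.5, 2.0])
print(mse(y_true, y_pred), mae(y_true, y_pred))
```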

Classification

Binary Cross-Entropy (2 classes) and Categorical Cross-Entropy (multi-class).
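Binary cross-entropy penalizes confident wrong predictions heavily, since log of a probability near 0 is a large negative number. A minimal sketch (the predicted probabilities are illustrative):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # clip predictions away from 0 and 1 to avoid log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.1, 0.8])  # hypothetical predicted probabilities
print(binary_cross_entropy(y_true, y_pred))
```

Categorical cross-entropy is the multi-class generalization: the same negative-log-probability idea applied to the softmax output for the true class.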

Optimizers

An optimizer updates weights to minimize the loss function. It uses gradients computed via backpropagation.

```
W_new = W_old - learning_rate * gradient
```

| Optimizer | How It Works | When to Use |
| --- | --- | --- |
| Gradient Descent | Updates after a full dataset pass | Small datasets |
| SGD | Updates after each sample | Large datasets; noisy but fast |
| Mini-Batch GD | Updates after a batch (e.g., 32) | Most common in practice |
| AdaGrad | Adapts the learning rate per parameter | Sparse data (text) |
| RMSProp | Fixes AdaGrad's shrinking LR | RNNs, non-stationary problems |
| Adam | Momentum + RMSProp combined | Default choice for most tasks |

Adam (Adaptive Moment Estimation) is the most widely used optimizer. It combines the benefits of momentum and adaptive learning rates. Use it as your default.
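The update rule above can be demonstrated on a toy one-parameter problem: minimizing loss(w) = (w - 3)^2, whose gradient is 2 * (w - 3). This is a sketch of plain gradient descent, not of Adam's momentum and adaptive-rate machinery:

```python
# Minimize loss(w) = (w - 3)^2; its gradient is 2 * (w - 3)
w = 0.0
learning_rate = 0.1

for step in range(100):
    gradient = 2 * (w - 3)
    w = w - learning_rate * gradient  # W_new = W_old - learning_rate * gradient

print(w)  # converges toward the minimum at w = 3
```

Each step moves w opposite the gradient, so the loss shrinks until w settles at the minimum.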

Epochs, Batch Size, Learning Rate

Training Hyperparameters

An epoch is one full pass over the training data; batch size is the number of samples processed before each weight update. Learning rate is the single most important hyperparameter. Start with 0.001 (the Adam default). If the training loss oscillates, reduce it; if training is too slow, increase it.
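A plain-NumPy sketch of how epochs, batch size, and learning rate interact in a training loop, using linear regression as a stand-in model (all values here are illustrative, and the learning rate is larger than the Adam default because plain gradient descent on this toy problem tolerates it):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 1))
y = 2.0 * X[:, 0] + 1.0          # ground truth: w = 2, b = 1

w, b = 0.0, 0.0
learning_rate = 0.1
batch_size = 32
epochs = 20

for epoch in range(epochs):       # one epoch = one full pass over the data
    idx = rng.permutation(len(X)) # shuffle sample order each epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        xb, yb = X[batch, 0], y[batch]
        err = (w * xb + b) - yb
        # mini-batch gradients of mean squared error
        w -= learning_rate * np.mean(2 * err * xb)
        b -= learning_rate * np.mean(2 * err)

print(w, b)  # should approach w = 2, b = 1
```

With 20 epochs of 8 mini-batches each, the model performs 160 weight updates; smaller batches mean more (noisier) updates per epoch, and the learning rate scales how far each update moves.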

When to Use Deep Learning

| Good For | Not Ideal For |
| --- | --- |
| Images, video, audio, text | Small tabular datasets |
| Large datasets (thousands+ samples) | When interpretability is critical |
| Complex non-linear relationships | Low-compute environments |
| End-to-end feature learning | When classical ML works well enough |
