Autoencoders
Neural networks that learn compressed representations by encoding input to a bottleneck and decoding it back.
What is an Autoencoder?
An autoencoder is an unsupervised neural network that learns to compress (encode) data into a lower-dimensional representation and then reconstruct (decode) it back. The network is trained to minimize the difference between input and output, forcing the bottleneck layer to learn the most important features.
The key insight: by squeezing data through a bottleneck, the network must learn which features matter most — it can't memorize everything.
Architecture
Input → [Encoder] → Bottleneck (Latent Space) → [Decoder] → Output (Reconstruction)
Example for MNIST (28×28 = 784 pixels):
Encoder: 784 → 256 → 128 → 32 (latent)
Decoder: 32 → 128 → 256 → 784
How It Works
Training Process
- Encoder compresses input x into latent representation z = f(x)
- Bottleneck (latent space) holds the compressed representation
- Decoder reconstructs the input: x' = g(z)
- Loss = difference between x and x' (reconstruction error)
- Backpropagate and update weights to minimize reconstruction error
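The loop above can be sketched end-to-end in plain NumPy. This is a toy illustration, not the training code used later: the dimensions, random weights, and single gradient step (taken on the decoder only) are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: an 8-D input squeezed through a 3-D bottleneck
n_in, n_latent, lr = 8, 3, 0.01
W_enc = rng.random((n_in, n_latent))                   # encoder weights (positive, so the code is nonzero)
W_dec = rng.normal(scale=0.1, size=(n_latent, n_in))   # decoder weights

x = rng.random(n_in)                  # one input sample
z = np.maximum(x @ W_enc, 0)          # encode: z = f(x), ReLU
x_hat = z @ W_dec                     # decode: x' = g(z), linear output
loss_before = np.mean((x - x_hat) ** 2)   # reconstruction error

# One gradient step on the decoder weights via the chain rule
d = 2 * (x_hat - x) / n_in            # dL/dx_hat
W_dec -= lr * np.outer(z, d)          # dL/dW_dec = outer(z, dL/dx_hat)

loss_after = np.mean((x - z @ W_dec) ** 2)
```

A full implementation would also update the encoder weights; frameworks like Keras handle all of this automatically, as in the code further below.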
Loss Function
Reconstruction Loss (MSE):
L = (1/n) * Σ (x_i - x'_i)²
Or Binary Cross-Entropy (for normalized inputs):
L = -Σ [x_i * log(x'_i) + (1 - x_i) * log(1 - x'_i)]
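Both losses are simple to compute directly. A small NumPy check with made-up values (the clipping epsilon is a standard numerical safeguard, not part of the formula):

```python
import numpy as np

x = np.array([0.0, 0.5, 1.0, 0.25])        # original input (values in [0, 1])
x_hat = np.array([0.1, 0.4, 0.9, 0.35])    # reconstruction

# Mean squared error: every element is off by 0.1, so the MSE is 0.01
mse = np.mean((x - x_hat) ** 2)            # → 0.01

# Binary cross-entropy, clipping to avoid log(0)
eps = 1e-7
x_hat_c = np.clip(x_hat, eps, 1 - eps)
bce = -np.sum(x * np.log(x_hat_c) + (1 - x) * np.log(1 - x_hat_c))
```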
Types of Autoencoders
Vanilla Autoencoder
Simple encoder-decoder with fully connected layers. Basic dimensionality reduction and feature learning.
Denoising Autoencoder
Input is corrupted with noise; network learns to reconstruct the clean version. Learns more robust features.
Variational Autoencoder (VAE)
Encoder outputs a probability distribution (mean + variance) instead of a fixed vector. Can generate new data by sampling.
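The sampling step is usually implemented with the reparameterization trick so gradients can flow through it. A NumPy sketch with made-up encoder outputs; the KL term shown is the standard regularizer added to the VAE loss:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical encoder outputs for one sample (latent_dim = 4)
mu = np.array([0.5, -1.0, 0.0, 2.0])       # predicted mean
log_var = np.array([0.0, -2.0, 1.0, 0.5])  # predicted log-variance

# Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)
eps = rng.standard_normal(mu.shape)
z = mu + np.exp(0.5 * log_var) * eps

# KL divergence between N(mu, sigma^2) and the standard normal prior
kl = -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var))
```

Sampling eps externally keeps the randomness outside the network, so mu and log_var stay differentiable.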
Convolutional Autoencoder
Uses Conv2D for encoding and Conv2DTranspose for decoding. Better for image data.
Sparse Autoencoder
Adds sparsity constraint to bottleneck. Most neurons are inactive, forcing useful feature extraction.
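The constraint is an extra penalty term on the bottleneck activations, added to the reconstruction loss. A sketch of two common choices, using invented activations: an L1 penalty, and the KL-based penalty used with sigmoid units.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented sigmoid bottleneck activations: batch of 5 samples, 10 units, mostly near 0
h = 1 / (1 + np.exp(-rng.normal(-2.0, 1.0, size=(5, 10))))

# Option 1: L1 penalty on activations (the 1e-3 weight is a tunable hyperparameter)
l1_penalty = 1e-3 * np.sum(np.abs(h))

# Option 2: KL divergence between a target sparsity rho and the
# average activation of each unit across the batch
rho = 0.05
rho_hat = h.mean(axis=0)
kl_penalty = np.sum(rho * np.log(rho / rho_hat)
                    + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
```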
Contractive Autoencoder
Adds penalty on the derivatives of the hidden layer. Learns representations robust to small input changes.
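For a sigmoid encoder layer this penalty has a closed form, since dh_j/dx_i = h_j(1 - h_j) * W_ij. A NumPy sketch with invented weights:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

W = rng.normal(scale=0.5, size=(6, 4))  # encoder weights: 6 inputs -> 4 hidden units
x = rng.random(6)
h = sigmoid(x @ W)

# Contractive penalty: squared Frobenius norm of the Jacobian dh/dx.
# For sigmoid units, ||J||_F^2 = sum_j (h_j (1 - h_j))^2 * sum_i W_ij^2
penalty = np.sum((h * (1 - h)) ** 2 * np.sum(W ** 2, axis=0))
```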
Code Implementation
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.datasets import mnist
# Load MNIST and scale pixel values to [0, 1]
(X_train, _), (X_test, _) = mnist.load_data()
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0
# Flatten 28×28 images to 784-D vectors
X_train = X_train.reshape(-1, 784)
X_test = X_test.reshape(-1, 784)
# --- Vanilla Autoencoder ---
latent_dim = 32
# Encoder
encoder_input = layers.Input(shape=(784,))
x = layers.Dense(256, activation='relu')(encoder_input)
x = layers.Dense(128, activation='relu')(x)
latent = layers.Dense(latent_dim, activation='relu')(x)
encoder = Model(encoder_input, latent, name='encoder')
# Decoder
decoder_input = layers.Input(shape=(latent_dim,))
x = layers.Dense(128, activation='relu')(decoder_input)
x = layers.Dense(256, activation='relu')(x)
output = layers.Dense(784, activation='sigmoid')(x)
decoder = Model(decoder_input, output, name='decoder')
# Autoencoder
autoencoder_input = layers.Input(shape=(784,))
encoded = encoder(autoencoder_input)
decoded = decoder(encoded)
autoencoder = Model(autoencoder_input, decoded)
autoencoder.compile(optimizer='adam', loss='mse')
autoencoder.fit(X_train, X_train, epochs=20, batch_size=256,
                validation_data=(X_test, X_test))
# --- Denoising Autoencoder ---
noise_factor = 0.3
X_train_noisy = X_train + noise_factor * np.random.normal(size=X_train.shape)
X_train_noisy = np.clip(X_train_noisy, 0., 1.)
# Train with: autoencoder.fit(X_train_noisy, X_train, ...) # noisy→clean
# --- Use encoder for dimensionality reduction ---
compressed = encoder.predict(X_test) # Shape: (10000, 32)
print(f"Compressed shape: {compressed.shape}") # 784D → 32D
Applications
| Application | How |
| --- | --- |
| Dimensionality Reduction | Nonlinear alternative to PCA. Bottleneck = compressed features |
| Anomaly Detection | Train on normal data. Anomalies have high reconstruction error |
| Image Denoising | Denoising autoencoder removes noise from images |
| Data Generation | VAEs sample from latent space to create new data |
| Feature Learning | Use encoder output as features for downstream tasks |
| Image Compression | Lossy compression by encoding to small latent space |
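The anomaly-detection recipe from the table can be sketched without a trained network by using the training mean as a stand-in "reconstruction": normal samples land close to it, outliers do not. All data here is synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic "normal" data clustered around 0.5, plus out-of-distribution samples
normal = rng.normal(0.5, 0.05, size=(100, 784))
anomaly = rng.random((5, 784))

# Stand-in for the autoencoder output: reconstruct everything toward the training mean
recon = normal.mean(axis=0)

err_normal = np.mean((normal - recon) ** 2, axis=1)   # per-sample reconstruction error
err_anom = np.mean((anomaly - recon) ** 2, axis=1)

# Flag samples whose error is far above what was seen on normal data
threshold = err_normal.mean() + 3 * err_normal.std()
flagged = err_anom > threshold
```

With a real autoencoder, `recon` would be `autoencoder.predict(...)`; the thresholding logic is the same.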
Autoencoder vs PCA
PCA
Linear transformation. Fast and deterministic. Optimal among linear methods in the least-squares sense, but limited to linear structure.
Autoencoder
Nonlinear transformation. Slower. Can capture complex patterns. More powerful but needs more data and tuning.
If your data has mostly linear relationships, PCA will work just as well and is much simpler. Use autoencoders when you need nonlinear compression.
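A quick way to check is to run PCA first and look at its reconstruction error. A self-contained sketch using SVD on synthetic near-linear data (the sizes and noise level are made up):

```python
import numpy as np

rng = np.random.default_rng(4)

# 200 samples in 20-D that lie near a 3-D linear subspace
basis = rng.normal(size=(3, 20))
X = rng.normal(size=(200, 3)) @ basis + 0.01 * rng.normal(size=(200, 20))

# PCA via SVD: project onto the top-k principal components, then reconstruct
k = 3
X_mean = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - X_mean, full_matrices=False)
Z = (X - X_mean) @ Vt[:k].T          # compressed representation, shape (200, 3)
X_rec = Z @ Vt[:k] + X_mean

pca_mse = np.mean((X - X_rec) ** 2)  # near zero: PCA suffices for this data
```

If this error is already small, the simpler linear method is the right tool; reach for an autoencoder when it stays large no matter how many components you keep.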