Autoencoders
Neural networks that learn compressed representations by encoding input to a bottleneck and decoding it back.
What is an Autoencoder?
An autoencoder is an unsupervised neural network that learns to compress (encode) data into a lower-dimensional representation and then reconstruct (decode) it back. The network is trained to minimize the difference between input and output, forcing the bottleneck layer to learn the most important features.
The key insight: by squeezing data through a bottleneck, the network must learn which features matter most — it can't memorize everything.
Architecture
Input → [Encoder] → Bottleneck (Latent Space) → [Decoder] → Output (Reconstruction)
Example for MNIST (28×28 = 784 pixels):
Encoder: 784 → 256 → 128 → 32 (latent)
Decoder: 32 → 128 → 256 → 784
How It Works
Training Process
- Encoder compresses input x into latent representation z = f(x)
- Bottleneck (latent space) holds the compressed representation
- Decoder reconstructs the input: x' = g(z)
- Loss = difference between x and x' (reconstruction error)
- Backpropagate and update weights to minimize reconstruction error
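The loop above can be sketched end-to-end in plain NumPy. This is a toy illustration, not the training code used later: the dimensions, random weights, and single gradient step (taken on the decoder only) are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: an 8-D input squeezed through a 3-D bottleneck
n_in, n_latent, lr = 8, 3, 0.01
W_enc = rng.random((n_in, n_latent))                   # encoder weights (positive, so the code is nonzero)
W_dec = rng.normal(scale=0.1, size=(n_latent, n_in))   # decoder weights

x = rng.random(n_in)                  # one input sample
z = np.maximum(x @ W_enc, 0)          # encode: z = f(x), ReLU
x_hat = z @ W_dec                     # decode: x' = g(z), linear output
loss_before = np.mean((x - x_hat) ** 2)   # reconstruction error

# One gradient step on the decoder weights via the chain rule
d = 2 * (x_hat - x) / n_in            # dL/dx_hat
W_dec -= lr * np.outer(z, d)          # dL/dW_dec = outer(z, dL/dx_hat)

loss_after = np.mean((x - z @ W_dec) ** 2)
```

A full implementation would also update the encoder weights; frameworks like Keras handle all of this automatically, as in the code further below.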
Loss Function
Reconstruction Loss (MSE):
L = (1/n) * Σ (x_i - x'_i)²
Or Binary Cross-Entropy (for normalized inputs):
L = -Σ [x_i * log(x'_i) + (1 - x_i) * log(1 - x'_i)]
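Both losses are simple to compute directly. A small NumPy check with made-up values (the clipping epsilon is a standard numerical safeguard, not part of the formula):

```python
import numpy as np

x = np.array([0.0, 0.5, 1.0, 0.25])        # original input (values in [0, 1])
x_hat = np.array([0.1, 0.4, 0.9, 0.35])    # reconstruction

# Mean squared error: every element is off by 0.1, so the MSE is 0.01
mse = np.mean((x - x_hat) ** 2)            # → 0.01

# Binary cross-entropy, clipping to avoid log(0)
eps = 1e-7
x_hat_c = np.clip(x_hat, eps, 1 - eps)
bce = -np.sum(x * np.log(x_hat_c) + (1 - x) * np.log(1 - x_hat_c))
```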
Types of Autoencoders
Vanilla Autoencoder
Simple encoder-decoder with fully connected layers. Basic dimensionality reduction and feature learning.
Denoising Autoencoder
Input is corrupted with noise; network learns to reconstruct the clean version. Learns more robust features.
Variational Autoencoder (VAE)
Encoder outputs a probability distribution (mean + variance) instead of a fixed vector. Can generate new data by sampling.
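The sampling step is usually implemented with the reparameterization trick so gradients can flow through it. A NumPy sketch with made-up encoder outputs; the KL term shown is the standard regularizer added to the VAE loss:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical encoder outputs for one sample (latent_dim = 4)
mu = np.array([0.5, -1.0, 0.0, 2.0])       # predicted mean
log_var = np.array([0.0, -2.0, 1.0, 0.5])  # predicted log-variance

# Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)
eps = rng.standard_normal(mu.shape)
z = mu + np.exp(0.5 * log_var) * eps

# KL divergence between N(mu, sigma^2) and the standard normal prior
kl = -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var))
```

Sampling eps externally keeps the randomness outside the network, so mu and log_var stay differentiable.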
Convolutional Autoencoder
Uses Conv2D for encoding and Conv2DTranspose for decoding. Better for image data.
Sparse Autoencoder
Adds sparsity constraint to bottleneck. Most neurons are inactive, forcing useful feature extraction.
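The constraint is an extra penalty term on the bottleneck activations, added to the reconstruction loss. A sketch of two common choices, using invented activations: an L1 penalty, and the KL-based penalty used with sigmoid units.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented sigmoid bottleneck activations: batch of 5 samples, 10 units, mostly near 0
h = 1 / (1 + np.exp(-rng.normal(-2.0, 1.0, size=(5, 10))))

# Option 1: L1 penalty on activations (the 1e-3 weight is a tunable hyperparameter)
l1_penalty = 1e-3 * np.sum(np.abs(h))

# Option 2: KL divergence between a target sparsity rho and the
# average activation of each unit across the batch
rho = 0.05
rho_hat = h.mean(axis=0)
kl_penalty = np.sum(rho * np.log(rho / rho_hat)
                    + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
```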
Contractive Autoencoder
Adds penalty on the derivatives of the hidden layer. Learns representations robust to small input changes.
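For a sigmoid encoder layer this penalty has a closed form, since dh_j/dx_i = h_j(1 - h_j) * W_ij. A NumPy sketch with invented weights:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

W = rng.normal(scale=0.5, size=(6, 4))  # encoder weights: 6 inputs -> 4 hidden units
x = rng.random(6)
h = sigmoid(x @ W)

# Contractive penalty: squared Frobenius norm of the Jacobian dh/dx.
# For sigmoid units, ||J||_F^2 = sum_j (h_j (1 - h_j))^2 * sum_i W_ij^2
penalty = np.sum((h * (1 - h)) ** 2 * np.sum(W ** 2, axis=0))
```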
Code Implementation
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.datasets import mnist
# Load MNIST and scale pixel values to [0, 1]
(X_train, _), (X_test, _) = mnist.load_data()
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0
# Flatten 28×28 images to 784-D vectors
X_train = X_train.reshape(-1, 784)
X_test = X_test.reshape(-1, 784)
# --- Vanilla Autoencoder ---
latent_dim = 32
# Encoder
encoder_input = layers.Input(shape=(784,))
x = layers.Dense(256, activation='relu')(encoder_input)
x = layers.Dense(128, activation='relu')(x)
latent = layers.Dense(latent_dim, activation='relu')(x)
encoder = Model(encoder_input, latent, name='encoder')
# Decoder
decoder_input = layers.Input(shape=(latent_dim,))
x = layers.Dense(128, activation='relu')(decoder_input)
x = layers.Dense(256, activation='relu')(x)
output = layers.Dense(784, activation='sigmoid')(x)
decoder = Model(decoder_input, output, name='decoder')
# Autoencoder
autoencoder_input = layers.Input(shape=(784,))
encoded = encoder(autoencoder_input)
decoded = decoder(encoded)
autoencoder = Model(autoencoder_input, decoded)
autoencoder.compile(optimizer='adam', loss='mse')
autoencoder.fit(X_train, X_train, epochs=20, batch_size=256,
                validation_data=(X_test, X_test))
# --- Denoising Autoencoder ---
noise_factor = 0.3
X_train_noisy = X_train + noise_factor * np.random.normal(size=X_train.shape)
X_train_noisy = np.clip(X_train_noisy, 0., 1.)
# Train with: autoencoder.fit(X_train_noisy, X_train, ...) # noisy→clean
# --- Use encoder for dimensionality reduction ---
compressed = encoder.predict(X_test) # Shape: (10000, 32)
print(f"Compressed shape: {compressed.shape}") # 784D → 32D
Applications
| Application | How |
| --- | --- |
| Dimensionality Reduction | Nonlinear alternative to PCA. Bottleneck = compressed features |
| Anomaly Detection | Train on normal data. Anomalies have high reconstruction error |
| Image Denoising | Denoising autoencoder removes noise from images |
| Data Generation | VAEs sample from latent space to create new data |
| Feature Learning | Use encoder output as features for downstream tasks |
| Image Compression | Lossy compression by encoding to small latent space |
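The anomaly-detection recipe from the table can be sketched without a trained network by using the training mean as a stand-in "reconstruction": normal samples land close to it, outliers do not. All data here is synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic "normal" data clustered around 0.5, plus out-of-distribution samples
normal = rng.normal(0.5, 0.05, size=(100, 784))
anomaly = rng.random((5, 784))

# Stand-in for the autoencoder output: reconstruct everything toward the training mean
recon = normal.mean(axis=0)

err_normal = np.mean((normal - recon) ** 2, axis=1)   # per-sample reconstruction error
err_anom = np.mean((anomaly - recon) ** 2, axis=1)

# Flag samples whose error is far above what was seen on normal data
threshold = err_normal.mean() + 3 * err_normal.std()
flagged = err_anom > threshold
```

With a real autoencoder, `recon` would be `autoencoder.predict(...)`; the thresholding logic is the same.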
Autoencoder vs PCA
PCA
Linear transformation. Fast and deterministic. Optimal among linear methods in the least-squares sense, but limited to linear structure.
Autoencoder
Nonlinear transformation. Slower. Can capture complex patterns. More powerful but needs more data and tuning.
If your data has mostly linear relationships, PCA will work just as well and is much simpler. Use autoencoders when you need nonlinear compression.
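A quick way to check is to run PCA first and look at its reconstruction error. A self-contained sketch using SVD on synthetic near-linear data (the sizes and noise level are made up):

```python
import numpy as np

rng = np.random.default_rng(4)

# 200 samples in 20-D that lie near a 3-D linear subspace
basis = rng.normal(size=(3, 20))
X = rng.normal(size=(200, 3)) @ basis + 0.01 * rng.normal(size=(200, 20))

# PCA via SVD: project onto the top-k principal components, then reconstruct
k = 3
X_mean = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - X_mean, full_matrices=False)
Z = (X - X_mean) @ Vt[:k].T          # compressed representation, shape (200, 3)
X_rec = Z @ Vt[:k] + X_mean

pca_mse = np.mean((X - X_rec) ** 2)  # near zero: PCA suffices for this data
```

If this error is already small, the simpler linear method is the right tool; reach for an autoencoder when it stays large no matter how many components you keep.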