CNN Image Classification
Using Convolutional Neural Networks to classify images from the CIFAR-10 dataset (10 classes, 32x32 RGB).
What is a CNN?
A Convolutional Neural Network (CNN) exploits the spatial structure of images by using convolutional layers to extract patterns such as edges, textures, and shapes. Unlike fully connected networks, which flatten an image into a 1D vector before the first layer, CNNs preserve and learn from the 2D spatial relationships between pixels.
CNNs are a standard choice for image tasks such as object recognition, medical imaging, face recognition, and satellite imagery.
CNN Architecture Components
Layer-by-Layer Breakdown
- Conv2D(32, 3x3, relu) -- 32 filters scan 3x3 patches, detect basic features (edges, curves)
- MaxPooling2D(2x2) -- Halves the height and width, keeping the strongest activation in each 2x2 window
- Conv2D(64, 3x3, relu) -- 64 filters detect more complex patterns (combinations of edges)
- MaxPooling2D(2x2) -- Further reduces dimensions
- Conv2D(64, 3x3, relu) -- Learns even higher-level features
- Flatten() -- Converts the 3D feature maps to a 1D vector for the Dense layers
- Dense(64, relu) -- Combines features for classification
- Dense(10, softmax) -- Output probabilities for 10 classes
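To see how the tensor shrinks through the stack above, the shapes and parameter counts can be worked out by hand. The following is a minimal sketch in plain Python, not Keras code. It assumes the Keras defaults of 'valid' padding and stride 1 for Conv2D, and a pooling stride equal to the pool size. The helper names (conv2d, max_pool, dense, trace_network) are made up for this illustration.

```python
# Shape and parameter bookkeeping for the layer stack listed above.
# Assumptions: 'valid' padding and stride 1 for Conv2D, pooling stride
# equal to the pool size. All helper names here are hypothetical.

def conv2d(shape, filters, k=3):
    h, w, c = shape
    params = (k * k * c + 1) * filters      # kernel weights plus one bias per filter
    return (h - k + 1, w - k + 1, filters), params

def max_pool(shape, p=2):
    h, w, c = shape
    return (h // p, w // p, c), 0           # pooling has no trainable weights

def dense(units_in, units_out):
    return units_out, (units_in + 1) * units_out

def trace_network(input_shape=(32, 32, 3)):
    """Return (layer name, output shape, parameter count) for each layer."""
    rows = []
    shape = input_shape
    steps = [
        ("Conv2D(32)", lambda s: conv2d(s, 32)),
        ("MaxPooling2D", max_pool),
        ("Conv2D(64)", lambda s: conv2d(s, 64)),
        ("MaxPooling2D", max_pool),
        ("Conv2D(64)", lambda s: conv2d(s, 64)),
    ]
    for name, step in steps:
        shape, n_params = step(shape)
        rows.append((name, shape, n_params))
    flat = shape[0] * shape[1] * shape[2]
    rows.append(("Flatten", (flat,), 0))
    units, n_params = dense(flat, 64)
    rows.append(("Dense(64)", (units,), n_params))
    units_out, n_params = dense(units, 10)
    rows.append(("Dense(10)", (units_out,), n_params))
    return rows

rows = trace_network()
for name, shape, n_params in rows:
    print(f"{name:<14} {str(shape):<14} {n_params:>7,}")
print("total parameters:", sum(r[2] for r in rows))
```

Under these assumptions the feature maps shrink from 32x32 to 4x4 before flattening into a 1,024-element vector, and the total comes to 122,570 trainable parameters, more than half of them in the Dense(64) layer.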
Key Concepts
- Convolution: A filter slides across the image, computing dot products to detect features at every position.
- MaxPooling: Takes the maximum value in each 2x2 block. Reduces computation and makes the output less sensitive to small shifts in the input (a limited form of translation invariance).
- Filters/Kernels: Small weight matrices (e.g., 3x3) that learn to detect specific patterns during training.
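The three concepts above can be sketched in a few lines of NumPy. This is an illustration on a toy single-channel image, not how Keras implements its layers. The function names and the hand-written edge filter are assumptions for the example, since in a real CNN the filter weights are learned during training.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over a 2D `image` with no padding and stride 1.

    As in most deep learning libraries, the kernel is not flipped,
    so strictly speaking this is cross-correlation.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)   # dot product of patch and filter
    return out

def max_pool_2x2(fmap):
    """Keep the maximum of each non-overlapping 2x2 block."""
    h = fmap.shape[0] // 2 * 2                   # drop an odd last row/column
    w = fmap.shape[1] // 2 * 2
    blocks = fmap[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

# Toy 6x6 image: dark left half, bright right half (a vertical edge).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-written vertical-edge filter (hypothetical; a CNN would learn this).
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

feature_map = conv2d_valid(image, kernel)   # shape (4, 4)
pooled = max_pool_2x2(feature_map)          # shape (2, 2)

print(feature_map)
print(pooled)
```

The feature map responds only in the columns where the filter straddles the edge, and pooling keeps that response while halving each spatial dimension.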
Code: Load CIFAR-10 Data
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
import numpy as np
# Load the dataset
(x_train, y_train), (x_test, y_test) = datasets.cifar10.load_data()
# Normalize pixel values to [0,1]
x_train = x_train / 255.0
x_test = x_test / 255.0
# Class names for CIFAR-10
class_names = ['airplane','automobile','bird','cat','deer',
               'dog','frog','horse','ship','truck']
print("Training images shape:", x_train.shape) # (50000, 32, 32, 3)
print("Test images shape:", x_test.shape) # (10000, 32, 32, 3)
# Visualize first 5 training images
plt.figure(figsize=(5,2))
for i in range(5):
    plt.subplot(1,5,i+1)
    plt.imshow(x_train[i])
    # y_train has shape (50000, 1), so take the single label in each row
    plt.title(class_names[y_train[i][0]])
    plt.axis('off')
plt.show()
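Two details of the loading step are easy to miss: the dtype that results from normalizing, and the shape of the label array. The sketch below uses small synthetic arrays as stand-ins with the same dtype and layout as CIFAR-10 (uint8 pixels, labels of shape (N, 1)), so it runs without downloading the dataset. The variable names are made up for this example.

```python
import numpy as np

# Synthetic stand-ins for the CIFAR-10 arrays (not the real data).
rng = np.random.default_rng(0)
x_fake = rng.integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)
y_fake = rng.integers(0, 10, size=(8, 1))

# Dividing a uint8 array by a Python float promotes it to float64.
x_f64 = x_fake / 255.0

# Casting first keeps the result in float32, half the memory of float64.
x_f32 = x_fake.astype("float32") / 255.0

# Labels arrive as a column vector; flattening gives plain class ids.
labels = y_fake.flatten()

print(x_f64.dtype, x_f64.nbytes)
print(x_f32.dtype, x_f32.nbytes)
print(y_fake.shape, "->", labels.shape)
```

Both versions hold the same values in [0, 1]. If memory is tight, casting to float32 before dividing is one option.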
Code: Build CNN Model
# Build CNN Model
model = models.Sequential()
# 1st Convolution + Pooling
model.add(layers.Conv2D(32, (3,3), activation='relu', input_shape=(32,32,3)))
model.add(layers.MaxPooling2D((2,2)))
# 2nd Convolution + Pooling
model.add(layers.Conv2D(64, (3,3), activation='relu'))
model.add(layers.MaxPooling2D((2,2)))
# 3rd Convolution
model.add(layers.Conv2D(64, (3,3), activation='relu'))
# Flatten + Dense layers
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))
Code: Train and Evaluate
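# Compile and train
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# Note: the test set is reused as validation data here; hold out part of
# x_train instead if you tune hyperparameters against the validation score
history = model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test))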
# Evaluate
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_acc:.4f}")
# Visualize Predictions
predictions = model.predict(x_test[:5])
plt.figure(figsize=(6,1))
for i in range(5):
    plt.subplot(1,5,i+1)
    plt.imshow(x_test[i])
    plt.title(f"{class_names[np.argmax(predictions[i])]}")
    plt.axis('off')
plt.show()
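The training code pairs a softmax output with the 'sparse_categorical_crossentropy' loss and reads predictions with np.argmax. The following NumPy sketch shows roughly what those pieces compute. It is not the Keras implementation: the raw scores are invented for illustration, and the clipping value eps is an assumption.

```python
import numpy as np

def softmax(logits):
    """Turn each row of raw scores into probabilities that sum to 1."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def sparse_categorical_crossentropy(labels, probs, eps=1e-7):
    """Mean negative log-probability assigned to the true class.

    `labels` are integer class ids (hence "sparse"), not one-hot vectors.
    """
    picked = probs[np.arange(len(labels)), labels]
    return float(np.mean(-np.log(np.clip(picked, eps, 1.0))))

# Hypothetical raw scores for 3 images over 10 classes.
logits = np.zeros((3, 10))
logits[0, 3] = 4.0    # confident, and class 3 is the true label
logits[1, 5] = 4.0    # confident, but the true label is class 8
logits[2, 0] = 1.0    # mildly in favor of the true label, class 0

labels = np.array([3, 8, 0])
probs = softmax(logits)

loss = sparse_categorical_crossentropy(labels, probs)
predicted = probs.argmax(axis=1)              # same idea as np.argmax(predictions[i])
accuracy = float(np.mean(predicted == labels))

print("predicted classes:", predicted)
print("loss:", loss, "accuracy:", accuracy)
```

The confidently wrong second image contributes the largest share of the loss, which is how cross-entropy penalizes misplaced confidence more heavily than uncertainty.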
When to Use CNNs
| Good For | Not Ideal For |
| Image classification | Tabular / structured data |
| Object detection | Sequential / time-series data |
| Medical imaging | Small datasets without augmentation |
| Face recognition | Tasks where spatial structure is irrelevant |
CIFAR-10 images are only 32x32 pixels. A test accuracy of around 72% is typical for a basic CNN like the one built above on this dataset. Adding data augmentation, dropout, batch normalization, or a deeper architecture such as ResNet can push accuracy above 90%.
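As one illustration of data augmentation, the sketch below applies a random horizontal flip and a random crop from a zero-padded image, written in plain NumPy. The function name augment_batch and the pad size of 4 are assumptions for this example, not part of the tutorial's code, and the input is a random stand-in for a batch of training images.

```python
import numpy as np

def augment_batch(images, rng, pad=4):
    """Random horizontal flip plus random crop from a zero-padded image.

    `images` has shape (N, H, W, C). The pad size is an assumed setting
    for 32x32 images, not a value taken from the tutorial.
    """
    n, h, w, c = images.shape
    padded = np.pad(images, ((0, 0), (pad, pad), (pad, pad), (0, 0)))
    out = np.empty_like(images)
    for i in range(n):
        top = rng.integers(0, 2 * pad + 1)     # crop offset in [0, 2*pad]
        left = rng.integers(0, 2 * pad + 1)
        crop = padded[i, top:top + h, left:left + w]
        if rng.random() < 0.5:
            crop = crop[:, ::-1]               # mirror left-right
        out[i] = crop
    return out

rng = np.random.default_rng(42)
batch = rng.random((16, 32, 32, 3)).astype("float32")   # stand-in for x_train[:16]
augmented = augment_batch(batch, rng)

print(batch.shape, "->", augmented.shape)
```

Each call produces slightly different versions of the same images, so a model trained on fresh augmented batches every epoch sees more variation than the raw 50,000 training images provide.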