CNN Image Classification

Using Convolutional Neural Networks to classify images from the CIFAR-10 dataset (10 classes, 32x32 RGB).

What is a CNN?

A Convolutional Neural Network exploits the spatial structure of images by using convolutional layers to extract patterns such as edges, textures, and shapes. Unlike fully connected networks, which flatten an image into a 1D vector before the first layer, CNNs preserve and learn from the 2D spatial relationships between pixels.

CNNs are the standard for image tasks: object recognition, medical imaging, face recognition, and satellite imagery.

CNN Architecture Components

Layer-by-Layer Breakdown

Conv2D(32, 3x3, relu) -- 32 filters scan 3x3 patches, detect basic features (edges, curves)
MaxPooling2D(2x2) -- Halves the height and width, keeps the strongest activation in each block
Conv2D(64, 3x3, relu) -- 64 filters detect more complex patterns (combinations of edges)
MaxPooling2D(2x2) -- Further reduces dimensions
Conv2D(64, 3x3, relu) -- Learns even higher-level features
Flatten() -- Converts 3D feature maps to 1D vector for Dense layers
Dense(64, relu) -- Combines features for classification
Dense(10, softmax) -- Output probabilities for 10 classes
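
To make the breakdown above concrete, the following sketch traces the output shape and parameter count of each layer in plain Python, without TensorFlow. It assumes the Keras defaults for these layers (stride 1 and 'valid' padding for Conv2D, pooling stride equal to the pool size); the helper names (conv2d, max_pool, flatten, dense, trace_cnn) are hypothetical and exist only for this illustration.

```python
def conv2d(shape, filters, kernel=3):
    """Output shape and parameter count for a stride-1, 'valid' convolution."""
    h, w, c = shape
    params = kernel * kernel * c * filters + filters  # weights + biases
    return (h - kernel + 1, w - kernel + 1, filters), params


def max_pool(shape, pool=2):
    """Output shape for non-overlapping pooling; leftover rows/columns are dropped."""
    h, w, c = shape
    return (h // pool, w // pool, c), 0


def flatten(shape):
    """Collapse a 3D feature map into a 1D vector; no trainable parameters."""
    h, w, c = shape
    return (h * w * c,), 0


def dense(shape, units):
    """Fully connected layer: one weight per input per unit, plus one bias per unit."""
    (n,) = shape
    return (units,), n * units + units


def trace_cnn(input_shape=(32, 32, 3)):
    """Return (layer name, output shape, parameter count) for each layer."""
    steps = [
        ("Conv2D(32)", lambda s: conv2d(s, 32)),
        ("MaxPooling2D", max_pool),
        ("Conv2D(64)", lambda s: conv2d(s, 64)),
        ("MaxPooling2D", max_pool),
        ("Conv2D(64)", lambda s: conv2d(s, 64)),
        ("Flatten", flatten),
        ("Dense(64)", lambda s: dense(s, 64)),
        ("Dense(10)", lambda s: dense(s, 10)),
    ]
    rows, shape = [], input_shape
    for name, fn in steps:
        shape, params = fn(shape)
        rows.append((name, shape, params))
    return rows


rows = trace_cnn()
for name, shape, params in rows:
    print(f"{name:<14} {str(shape):<14} {params:>7}")
print("Total parameters:", sum(p for _, _, p in rows))
```

Under these assumptions the feature maps shrink from 32x32 to 4x4 before Flatten, and most of the parameters sit in the first Dense layer rather than in the convolutions.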

Key Concepts

Convolution: A filter slides across the image, computing dot products to detect features at every position.
MaxPooling: Takes the maximum value in each 2x2 block. Reduces computation and makes the output less sensitive to small shifts in the input.
Filters/Kernels: Small weight matrices (e.g., 3x3) that learn to detect specific patterns during training.
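
The two operations described above can be sketched in a few lines of NumPy. This is a minimal illustration for a single-channel image and a single hand-written filter, not how Keras implements them; the function names (conv2d_single, max_pool_2x2) and the vertical-edge kernel are assumptions chosen for the example. In a real CNN the kernel values are learned rather than fixed.

```python
import numpy as np


def conv2d_single(image, kernel):
    """Slide a 2D kernel over a 2D image (stride 1, no padding).

    Each output value is the dot product of the kernel with the patch
    it currently covers.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2x2 block."""
    h2, w2 = feature_map.shape[0] // 2, feature_map.shape[1] // 2
    trimmed = feature_map[:h2 * 2, :w2 * 2]  # drop an odd trailing row/column
    return trimmed.reshape(h2, 2, w2, 2).max(axis=(1, 3))


# A 6x6 image that is dark on the left and bright on the right.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# A hand-written filter that responds to dark-to-bright vertical edges.
vertical_edge = np.array([[-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0]])

feature_map = conv2d_single(image, vertical_edge)
pooled = max_pool_2x2(feature_map)
print(feature_map)
print(pooled)
```

The feature map is non-zero only where the filter straddles the edge, and pooling keeps that response while halving each spatial dimension.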

Code: Load CIFAR-10 Data

import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
import numpy as np

# Load the dataset
(x_train, y_train), (x_test, y_test) = datasets.cifar10.load_data()

# Normalize pixel values to [0, 1]
x_train = x_train / 255.0
x_test = x_test / 255.0

# Class names for CIFAR-10
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

print("Training images shape:", x_train.shape)  # (50000, 32, 32, 3)
print("Test images shape:", x_test.shape)       # (10000, 32, 32, 3)

# Visualize first 5 training images
plt.figure(figsize=(5, 2))
for i in range(5):
    plt.subplot(1, 5, i + 1)
    plt.imshow(x_train[i])
    # y_train has shape (50000, 1), so index element 0 to get a scalar label
    plt.title(class_names[int(y_train[i][0])])
    plt.axis('off')
plt.show()

Code: Build CNN Model

# Build CNN Model
model = models.Sequential()

# 1st Convolution + Pooling
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))

# 2nd Convolution + Pooling
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))

# 3rd Convolution
model.add(layers.Conv2D(64, (3, 3), activation='relu'))

# Flatten + Dense layers
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

Code: Train and Evaluate

# Compile and train
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(x_train, y_train, epochs=10,
                    validation_data=(x_test, y_test))

# Evaluate
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_acc:.4f}")

# Visualize Predictions
predictions = model.predict(x_test[:5])

plt.figure(figsize=(6, 1))
for i in range(5):
    plt.subplot(1, 5, i + 1)
    plt.imshow(x_test[i])
    plt.title(f"{class_names[np.argmax(predictions[i])]}")
    plt.axis('off')
plt.show()
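
For readers wondering what the softmax output and the 'sparse_categorical_crossentropy' loss compute, here is a rough NumPy sketch. It is an illustration under simplifying assumptions, not the Keras implementation (which also clips probabilities for numerical safety); the function names are hypothetical. "Sparse" means the labels are integer class ids, as in CIFAR-10, rather than one-hot vectors.

```python
import numpy as np


def softmax(logits):
    """Turn raw scores of shape (N, num_classes) into probabilities per row."""
    # Subtracting the row maximum avoids overflow and does not change the result.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def sparse_categorical_crossentropy(probs, labels):
    """Mean negative log-probability assigned to the correct class.

    probs:  array of shape (N, num_classes), each row summing to 1
    labels: integer class ids, shape (N,) or (N, 1) as in CIFAR-10
    """
    labels = np.asarray(labels).reshape(-1)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked).mean())


def accuracy(probs, labels):
    """Fraction of rows whose highest-probability class matches the label."""
    labels = np.asarray(labels).reshape(-1)
    return float((probs.argmax(axis=-1) == labels).mean())


# An untrained-looking model: equal scores for all 10 classes.
uniform_probs = softmax(np.zeros((4, 10)))
example_labels = np.array([[0], [3], [5], [9]])
print(sparse_categorical_crossentropy(uniform_probs, example_labels))
```

A model that assigns equal probability to all 10 classes has a loss of ln(10), roughly 2.30, which is a useful reference point when reading the first epoch of the training log.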

When to Use CNNs

Good For	Not Ideal For
Image classification	Tabular / structured data
Object detection	Sequential / time-series data
Medical imaging	Small datasets without augmentation
Face recognition	Tasks where spatial structure is irrelevant

CIFAR-10 images are only 32x32 pixels. A test accuracy of about 72% is typical for a basic CNN like the one above on this dataset. Adding data augmentation, dropout, batch normalization, and deeper architectures (such as ResNet) can push accuracy above 90%.
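
As one illustration of the data augmentation mentioned above, the sketch below applies a random horizontal flip and a random pad-and-crop shift to a batch of images using NumPy only. It is an assumption-laden example, not part of the original pipeline: the function name augment_batch and the 4-pixel padding are choices made here, and Keras offers its own preprocessing layers for the same purpose. Dropout and batch normalization are not covered by this sketch.

```python
import numpy as np


def augment_batch(images, rng, pad=4):
    """Randomly shift and horizontally flip a batch of images.

    images: array of shape (N, H, W, C), e.g. normalized CIFAR-10 images
    rng:    a numpy.random.Generator, so results are reproducible
    pad:    number of pixels of reflected padding added on each side
    """
    n, h, w, _ = images.shape
    padded = np.pad(images, ((0, 0), (pad, pad), (pad, pad), (0, 0)),
                    mode='reflect')
    out = np.empty_like(images)
    for i in range(n):
        # Pick a random HxW window inside the padded image (a small shift).
        top = rng.integers(0, 2 * pad + 1)
        left = rng.integers(0, 2 * pad + 1)
        crop = padded[i, top:top + h, left:left + w]
        # Mirror left-to-right with probability 0.5.
        if rng.random() < 0.5:
            crop = crop[:, ::-1]
        out[i] = crop
    return out


rng = np.random.default_rng(0)
fake_images = rng.random((8, 32, 32, 3))  # stand-in for a batch of x_train
augmented = augment_batch(fake_images, rng)
print(augmented.shape)
```

Augmentation is applied to training batches only; the test images stay unchanged so that evaluation remains comparable.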
