Neural Network Basics
Building and training an Artificial Neural Network (ANN) for handwritten digit classification using MNIST.
What is an ANN?
An Artificial Neural Network is a stack of layers: an input layer, one or more hidden layers, and an output layer. Each layer contains neurons that apply weights, biases, and an activation function to transform inputs into outputs. This layered structure is the foundation of deep learning.
This notebook builds a simple feedforward ANN that recognizes handwritten digits (0-9) from the MNIST dataset with ~98% accuracy.
How It Works
Architecture: MNIST Digit Classifier
- Flatten(28, 28) -- Convert 28x28 pixel image into a 1D vector of 784 values
- Dense(128, relu) -- First hidden layer: 128 neurons learn features
- Dense(64, relu) -- Second hidden layer: 64 neurons learn higher-level features
- Dense(10, softmax) -- Output layer: 10 neurons (one per digit), probabilities sum to 1
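The architecture above fixes the model's parameter count: each Dense layer has inputs × neurons weights plus one bias per neuron. As a sanity check, the totals can be computed by hand:

```python
# Parameters in a Dense layer = inputs * neurons (weights) + neurons (biases)
layers = [(784, 128), (128, 64), (64, 10)]  # (inputs, neurons) per Dense layer

total = 0
for inputs, neurons in layers:
    params = inputs * neurons + neurons
    print(f"Dense({neurons}): {params} parameters")
    total += params

print(f"Total trainable parameters: {total}")  # → 109386, matching model.summary()
```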
Key Components
- ReLU activation: f(x) = max(0, x). Introduces non-linearity and helps mitigate the vanishing-gradient problem.
- Softmax activation: Converts raw outputs into probabilities. The highest probability is the predicted digit.
- Adam optimizer: Efficient weight updates combining momentum and adaptive learning rates.
- sparse_categorical_crossentropy: Loss function for multi-class classification with integer labels (not one-hot encoded).
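The two activations listed above are simple enough to sketch directly in NumPy (a minimal illustration of the math, not the Keras implementation):

```python
import numpy as np

def relu(x):
    # Zero out negative values, pass positives through unchanged
    return np.maximum(0, x)

def softmax(x):
    # Subtract the max before exponentiating for numerical stability
    e = np.exp(x - np.max(x))
    return e / e.sum()

logits = np.array([2.0, 1.0, -3.0])  # example raw outputs
probs = softmax(logits)

print(probs.sum())       # probabilities sum to 1.0
print(np.argmax(probs))  # index of the predicted class → 0
```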
Code: Load and Prepare Data
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam
import matplotlib.pyplot as plt
import numpy as np
# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
# Normalize pixel values (0-255 -> 0-1)
x_train, x_test = x_train / 255.0, x_test / 255.0
# Display a sample image
plt.imshow(x_train[2], cmap='gray')
plt.title(f"Label: {y_train[2]}")
plt.show()
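Normalization simply rescales the 0-255 pixel range into 0-1, which helps training converge. The effect can be checked on a small synthetic array (made-up values standing in for real MNIST pixels):

```python
import numpy as np

# A fake 2x2 "image" with raw uint8 pixel values
raw = np.array([[0, 128], [255, 64]], dtype=np.uint8)

scaled = raw / 255.0  # same scaling applied to x_train and x_test above

print(scaled.min(), scaled.max())  # values now lie in [0, 1]
```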
Code: Build the Model
# Define the ANN model
model = Sequential([
Flatten(input_shape=(28, 28)), # Convert 2D image to 1D vector
Dense(128, activation='relu'), # First hidden layer with 128 neurons
Dense(64, activation='relu'), # Second hidden layer with 64 neurons
Dense(10, activation='softmax') # Output layer for 10 classes (0-9)
])
Code: Compile and Train
model.compile(optimizer=Adam(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test))
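The sparse loss is used here because y_train holds integer labels. With one-hot encoded labels, categorical_crossentropy would be needed instead. A quick sketch of the difference (example label values, not necessarily the actual dataset contents):

```python
import numpy as np

labels = np.array([5, 0, 4])  # integer labels, the format y_train uses

# The equivalent one-hot encoding (what categorical_crossentropy expects)
one_hot = np.eye(10)[labels]

print(one_hot[0])  # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```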
Code: Evaluate and Predict
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test Accuracy: {test_acc:.2f}")
predictions = model.predict(x_test)
predicted_label = np.argmax(predictions[0]) # Get the most probable digit
# Show the predicted and actual label
plt.imshow(x_test[0], cmap='gray')
plt.title(f"Predicted: {predicted_label}, Actual: {y_test[0]}")
plt.show()
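model.predict returns one probability vector per image, and np.argmax picks the most probable class. A minimal sketch with a made-up probability vector shows the idea:

```python
import numpy as np

# Hypothetical softmax output for one image (10 class probabilities)
probs = np.array([0.01, 0.02, 0.01, 0.05, 0.01, 0.02, 0.01, 0.80, 0.02, 0.05])

predicted = np.argmax(probs)  # index of the largest probability
confidence = probs[predicted]

print(f"Predicted digit: {predicted} (p={confidence:.2f})")  # Predicted digit: 7 (p=0.80)
```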
Code: Plot Training History
plt.plot(history.history['accuracy'], label='train_acc')
plt.plot(history.history['val_accuracy'], label='val_acc')
plt.title('Accuracy over epochs')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
When to Use ANNs
| Good For | Not Ideal For |
| --- | --- |
| Tabular/structured data | Images (use CNNs instead) |
| Simple classification/regression | Sequential data (use RNN/LSTM) |
| Quick baseline deep learning model | Very small datasets (use classical ML) |
ANNs with Dense layers treat each pixel independently -- they ignore spatial structure. For image tasks, CNNs are significantly better. ANNs work well on MNIST because the digits are centered and simple.
Tags: ANN, MNIST, Keras, Softmax, ReLU, Adam