Neural Network Basics
Building and training an Artificial Neural Network (ANN) for handwritten digit classification using MNIST.
What is an ANN?
An Artificial Neural Network is a stack of layers: an input layer, one or more hidden layers, and an output layer. Each layer contains neurons that apply weights, biases, and an activation function to transform inputs into outputs. This layered structure is the foundation of deep learning.
This notebook builds a simple feedforward ANN that recognizes handwritten digits (0-9) from the MNIST dataset with ~98% accuracy.
How It Works
Architecture: MNIST Digit Classifier
- Flatten(28, 28) -- Convert 28x28 pixel image into a 1D vector of 784 values
- Dense(128, relu) -- First hidden layer: 128 neurons learn features
- Dense(64, relu) -- Second hidden layer: 64 neurons learn higher-level features
- Dense(10, softmax) -- Output layer: 10 neurons (one per digit), probabilities sum to 1
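The architecture above fixes the model's parameter count: each Dense layer has inputs × neurons weights plus one bias per neuron. As a sanity check, the totals can be computed by hand:

```python
# Parameters in a Dense layer = inputs * neurons (weights) + neurons (biases)
layers = [(784, 128), (128, 64), (64, 10)]  # (inputs, neurons) per Dense layer

total = 0
for inputs, neurons in layers:
    params = inputs * neurons + neurons
    print(f"Dense({neurons}): {params} parameters")
    total += params

print(f"Total trainable parameters: {total}")  # → 109386, matching model.summary()
```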
Key Components
- ReLU activation: f(x) = max(0, x). Introduces non-linearity and helps mitigate the vanishing-gradient problem.
- Softmax activation: Converts raw outputs into probabilities. The highest probability is the predicted digit.
- Adam optimizer: Efficient weight updates combining momentum and adaptive learning rates.
- sparse_categorical_crossentropy: Loss function for multi-class classification with integer labels (not one-hot encoded).
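The two activations listed above are simple enough to sketch directly in NumPy (a minimal illustration of the math, not the Keras implementation):

```python
import numpy as np

def relu(x):
    # Zero out negative values, pass positives through unchanged
    return np.maximum(0, x)

def softmax(x):
    # Subtract the max before exponentiating for numerical stability
    e = np.exp(x - np.max(x))
    return e / e.sum()

logits = np.array([2.0, 1.0, -3.0])  # example raw outputs
probs = softmax(logits)

print(probs.sum())       # probabilities sum to 1.0
print(np.argmax(probs))  # index of the predicted class → 0
```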
Code: Load and Prepare Data
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam
import matplotlib.pyplot as plt
import numpy as np
# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
# Normalize pixel values (0-255 -> 0-1)
x_train, x_test = x_train / 255.0, x_test / 255.0
# Display a sample image
plt.imshow(x_train[2], cmap='gray')
plt.title(f"Label: {y_train[2]}")
plt.show()
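Normalization simply rescales the 0-255 pixel range into 0-1, which helps training converge. The effect can be checked on a small synthetic array (made-up values standing in for real MNIST pixels):

```python
import numpy as np

# A fake 2x2 "image" with raw uint8 pixel values
raw = np.array([[0, 128], [255, 64]], dtype=np.uint8)

scaled = raw / 255.0  # same scaling applied to x_train and x_test above

print(scaled.min(), scaled.max())  # values now lie in [0, 1]
```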
Code: Build the Model
# Define the ANN model
model = Sequential([
Flatten(input_shape=(28, 28)), # Convert 2D image to 1D vector
Dense(128, activation='relu'), # First hidden layer with 128 neurons
Dense(64, activation='relu'), # Second hidden layer with 64 neurons
Dense(10, activation='softmax') # Output layer for 10 classes (0-9)
])
Code: Compile and Train
model.compile(optimizer=Adam(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test))
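The sparse loss is used here because y_train holds integer labels. With one-hot encoded labels, categorical_crossentropy would be needed instead. A quick sketch of the difference (example label values, not necessarily the actual dataset contents):

```python
import numpy as np

labels = np.array([5, 0, 4])  # integer labels, the format y_train uses

# The equivalent one-hot encoding (what categorical_crossentropy expects)
one_hot = np.eye(10)[labels]

print(one_hot[0])  # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```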
Code: Evaluate and Predict
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test Accuracy: {test_acc:.2f}")
predictions = model.predict(x_test)
predicted_label = np.argmax(predictions[0]) # Get the most probable digit
# Show the predicted and actual label
plt.imshow(x_test[0], cmap='gray')
plt.title(f"Predicted: {predicted_label}, Actual: {y_test[0]}")
plt.show()
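model.predict returns one probability vector per image, and np.argmax picks the most probable class. A minimal sketch with a made-up probability vector shows the idea:

```python
import numpy as np

# Hypothetical softmax output for one image (10 class probabilities)
probs = np.array([0.01, 0.02, 0.01, 0.05, 0.01, 0.02, 0.01, 0.80, 0.02, 0.05])

predicted = np.argmax(probs)  # index of the largest probability
confidence = probs[predicted]

print(f"Predicted digit: {predicted} (p={confidence:.2f})")  # Predicted digit: 7 (p=0.80)
```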
Code: Plot Training History
plt.plot(history.history['accuracy'], label='train_acc')
plt.plot(history.history['val_accuracy'], label='val_acc')
plt.title('Accuracy over epochs')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
When to Use ANNs
| Good For | Not Ideal For |
| --- | --- |
| Tabular/structured data | Images (use CNNs instead) |
| Simple classification/regression | Sequential data (use RNN/LSTM) |
| Quick baseline deep learning model | Very small datasets (use classical ML) |
ANNs with Dense layers treat each pixel independently -- they ignore spatial structure. For image tasks, CNNs are significantly better. ANNs work well on MNIST because the digits are centered and simple.
Tags: ANN, MNIST, Keras, Softmax, ReLU, Adam