MNIST Classifier From Scratch¶

Goal¶

In this notebook, I will compare three image classification models. I will build them with the PyTorch library and train them on the MNIST dataset of handwritten digits, writing my own training and evaluation loops to practice the basic mechanics of deep learning.

Outline¶

I will train three models:

  • A simple linear model with no hidden layers
  • A two-layer neural network with ReLU activations
  • A variant of the two-layer neural network, with different hyperparameters

Preview of Results¶

Model             | Train Accuracy | Test Accuracy
Linear            | 0.92           | 0.92
Two-layer         | 0.98           | 0.97
Two-layer (tuned) | 0.99           | 0.98

Setup¶

Imports¶

In [ ]:
import torch
from torch import nn
from torch.utils.data import DataLoader  # for batching the dataset
from torchvision import datasets, transforms  # assumed source of the MNIST data
%matplotlib inline
import matplotlib.pyplot as plt

Utility Functions¶

We'll be training several models, so I'll define a function to train a model and a function to evaluate a model.
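
The original helper cell isn't shown, so here is a minimal sketch of the two functions. The names `train_model` and `evaluate`, the cross-entropy loss, and the flatten-then-forward shape are my assumptions, not necessarily the original implementation.

In [ ]:
def train_model(model, loader, optimizer, loss_fn, epochs):
    """Train for `epochs` passes over `loader`; return the per-epoch mean loss."""
    history = []
    for epoch in range(epochs):
        running, seen = 0.0, 0
        for images, labels in loader:
            optimizer.zero_grad()
            logits = model(images.view(images.size(0), -1))  # flatten 28x28 -> 784
            loss = loss_fn(logits, labels)
            loss.backward()
            optimizer.step()
            running += loss.item() * labels.size(0)
            seen += labels.size(0)
        history.append(running / seen)
    return history

@torch.no_grad()
def evaluate(model, loader, loss_fn):
    """Return (mean loss, accuracy) over `loader`, without tracking gradients."""
    running, correct, seen = 0.0, 0, 0
    for images, labels in loader:
        logits = model(images.view(images.size(0), -1))
        running += loss_fn(logits, labels).item() * labels.size(0)
        correct += (logits.argmax(dim=1) == labels).sum().item()
        seen += labels.size(0)
    return running / seen, correct / seen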

Network 1: Linear Model¶

We define a simple linear model with no hidden layers. The input is a 28x28 image, which we flatten into a 784-dimensional vector. The output is a 10-dimensional vector of scores (logits), one per digit class; taking a softmax over these scores would give class probabilities.

We need to decide whether the linear layer should have a bias term. A bias would let the model learn a per-class offset, which could be useful if some digits were more common than others. However, the MNIST dataset is roughly balanced across classes, so we will omit the bias term.
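
As a sketch, the model is then a single fully connected layer (the variable name `model_linear` is my own):

In [ ]:
# One linear layer mapping 784 pixels to 10 class scores,
# with the bias omitted as discussed above.
model_linear = nn.Linear(28 * 28, 10, bias=False)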

Training Loop¶

We use an SGD optimizer with a learning rate of 0.01 and train for 10 epochs, i.e., 10 full passes over the training set.

In [ ]:
# the code ...
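
The original training code isn't shown; the sketch below is one plausible version consistent with the description above (SGD, learning rate 0.01, 10 epochs). It assumes the MNIST data comes from torchvision and reuses the `train_model` helper and `model_linear` sketched earlier.

In [ ]:
# Assumed data pipeline: torchvision MNIST as tensors scaled to [0, 1].
transform = transforms.ToTensor()
train_set = datasets.MNIST("data", train=True, download=True, transform=transform)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)

optimizer = torch.optim.SGD(model_linear.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

losses = train_model(model_linear, train_loader, optimizer, loss_fn, epochs=10)
plt.plot(losses)
plt.xlabel("epoch")
plt.ylabel("mean training loss")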

[Figure: training loss per epoch]

The loss stopped decreasing over the last few epochs, so the network has most likely converged. We reached a training loss of 0.26 and a training accuracy of 0.92 (see the results table in the summary).

To make sure we're not overfitting, we will evaluate the model on the validation set.

Validation Loop¶
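
As a minimal sketch, we can reuse the `evaluate` helper from the utilities section. MNIST's held-out split stands in for the validation set here, which matches how the results tables report test numbers.

In [ ]:
# MNIST's held-out split, used here as the validation set.
test_set = datasets.MNIST("data", train=False, download=True, transform=transform)
test_loader = DataLoader(test_set, batch_size=256, shuffle=False)

val_loss, val_acc = evaluate(model_linear, test_loader, loss_fn)
print(f"validation loss {val_loss:.2f}, accuracy {val_acc:.2f}")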

Model 2: Two-Layer Neural Network¶

...
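
The code for this model is elided above; a minimal definition matching the outline (two linear layers with a ReLU between them) might look like the following. The hidden width of 128 is a placeholder, not a value from the original.

In [ ]:
hidden_dim = 128  # placeholder width; the original value is not shown
model_mlp = nn.Sequential(
    nn.Linear(28 * 28, hidden_dim),
    nn.ReLU(),
    nn.Linear(hidden_dim, 10),
)
optimizer = torch.optim.SGD(model_mlp.parameters(), lr=0.01)
losses = train_model(model_mlp, train_loader, optimizer, loss_fn, epochs=10)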

Model 3: Two-Layer Neural Network (Tuned)¶

The choice of hidden dimension seemed arbitrary, so I wanted to try out a few different values.
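
A sketch of such a sweep, reusing the helpers defined above. The hidden widths tried here are hypothetical, since the values used in the original are not shown.

In [ ]:
# Hypothetical sweep over hidden widths; the original values are not shown.
for hidden_dim in (64, 128, 256):
    model = nn.Sequential(
        nn.Linear(28 * 28, hidden_dim),
        nn.ReLU(),
        nn.Linear(hidden_dim, 10),
    )
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    train_model(model, train_loader, optimizer, loss_fn, epochs=10)
    test_loss, test_acc = evaluate(model, test_loader, loss_fn)
    print(f"hidden={hidden_dim}: test loss {test_loss:.2f}, accuracy {test_acc:.2f}")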

Summary¶

Overall, the two-layer networks clearly outperformed the linear model. The linear model was the simplest but also the least accurate (0.92 test accuracy), the two-layer network reached 0.97, and tuning its hyperparameters pushed test accuracy to 0.98. The small gaps between train and test accuracy suggest that none of the models overfit badly.

Results table, with both loss and accuracy:

Model             | Train Loss | Train Accuracy | Test Loss | Test Accuracy
Linear            | 0.26       | 0.92           | 0.26      | 0.92
Two-layer         | 0.08       | 0.98           | 0.10      | 0.97
Two-layer (tuned) | 0.04       | 0.99           | 0.07      | 0.98