Homework 1: Train and evaluate a classifier on your own images

Important: Read this whole document before you start.

Goal

In this assignment, you will train and evaluate your own image classifier to distinguish the handwritten letters A, B, and C.

Completing this homework will give you practice collecting your own dataset and training and evaluating an image classifier.

A famous image classification example is handwritten digits (the MNIST dataset). For fun, we’ll remix that idea and classify handwritten letters. To keep it manageable, we’ll just work with the first 3 letters (a through c).

Try to make the best model you can, under the following constraints:

  1. No more than 100 training images. (Note: This is a maximum, not a minimum.)
  2. No more than 5 minutes compute time (on a Kaggle, Colab, or lab machine GPU) to train a model.
  3. Only use models that are already built into torchvision.

Instructions

Let’s make this a friendly competition: which team (of up to 5) can make the best classifier?

  1. Collect your own set of images of handwritten letters, one letter per image. (Do this yourself, don’t get it from the Internet.)
    • You are encouraged to share images within your team; a OneDrive shared folder or similar works well.
  2. Organize your dataset into a folder structure like images/c/c01.png.
    • Make an images/README.txt describing your dataset (see below for details)
  3. Train a classifier to indicate which letter is contained in the image.
  4. Evaluate the accuracy of the classifier on the validation set. (See below for details).
  5. Submit your Jupyter Notebook and dataset ZIP file to Moodle.

Report Expectations

Your report should be a professionally crafted Jupyter Notebook, suitable for use in a portfolio.

We highly recommend the following structure:

  1. A compelling opening vision statement, with appropriate citations of any code or notebooks on which you are basing this work (e.g., for this assignment that would be the Lab 1 notebook);
  2. A clear explanation of the source and nature of the data, including links that would allow others to access the same data (e.g., how you built your dataset and where it can be found);
  3. A complete discussion/demonstration of the analysis, with explanations and code required to build and evaluate the models;
  4. Strong conclusions.

The notebook shouldn’t include anything that doesn’t serve these goals (e.g., no inapplicable text retained from an original notebook).

For this assignment:

Notes

Tips

To get the confusion matrix, loop over the validation dataloader and accumulate all of the probabilities:

import numpy as np
import torch
from tqdm import tqdm

val_predicted_probs = []
model.eval()  # switch to evaluation mode (disables dropout, freezes batch-norm stats)
with torch.no_grad():
    for inputs, _ in tqdm(val_dataloader, desc="Predicting on validation set"):
        inputs = inputs.to(device)
        outputs = model(inputs)
        probs = outputs.softmax(dim=1).cpu().numpy()
        val_predicted_probs.append(probs)
val_predicted_probs = np.vstack(val_predicted_probs)  # Shape: (num_val_samples, num_classes)

Look at val_predicted_probs.shape and make sure you understand why its second dimension is 3.

Then get the model’s top prediction for each image using val_predictions = np.argmax(val_predicted_probs, axis=1)

To get the true labels out of the dataset, use the following (this only lines up with your predictions if the validation dataloader does not shuffle, so both loops see the images in the same order):

val_labels = np.hstack([
    labels.numpy() for _, labels in val_dataloader
])
val_labels.shape
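Once you have both arrays, a quick sanity check to report alongside the confusion matrix is overall accuracy. A small self-contained sketch (the two arrays here are hypothetical stand-ins for the val_predictions and val_labels built above):

```python
import numpy as np

# Hypothetical stand-ins; with real data these come from the loops shown above.
val_labels = np.array([0, 1, 2, 2, 1, 0])
val_predictions = np.array([0, 1, 2, 1, 1, 0])

# Fraction of validation images whose top prediction matches the true label.
accuracy = (val_predictions == val_labels).mean()
print(f"Validation accuracy: {accuracy:.1%}")  # Validation accuracy: 83.3%
```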

Then to show a confusion matrix, use:

from sklearn.metrics import ConfusionMatrixDisplay
ConfusionMatrixDisplay.from_predictions(val_labels, val_predictions, display_labels=class_names)

(assuming that class_names is the same list of class names you used when constructing the dataset and data loader).

Preparation 2