Unit 2: Supervised Learning

Supervised Learning

Students who complete this unit will demonstrate that they can:

Contents

Intro to Array Computing

You might have heard (or experienced) that Python is slow. So how can Python be the language behind basically all of the recent advances in AI, which all require huge amounts of computing? The secret is array computing. The Python code orchestrates operations that happen on powerful “accelerator” hardware like GPUs and TPUs. Those operations typically involve repeatedly applying an operation to big (usually rectangular) arrays of numbers, hence the name: array computing.

For those used to writing loops, this sort of coding can take some getting used to. Here are two exercises that previous students have found very helpful in getting their minds around how arrays work in PyTorch. (The concepts are basically identical in other libraries like TensorFlow, NumPy, and JAX.)

CAUTION

Today’s notebook has blanks that are designed to make you think. Colab’s AI autocomplete will try to suggest filling in each blank, which totally defeats the point. So:

  1. Open Colab’s Settings (the gear icon)
  2. Select the “AI Assistance” section of that window.
  3. Uncheck “Show AI-powered inline completions”

We’ve also disabled it in the notebook itself (under Edit -> Notebook settings -> Hide generative AI features); that’s where to look if you ever need to turn it back on for some reason.

Objectives

Notebooks

The reference below is an AI-generated summary of the material in the notebook.

Dot Products

A dot product is a fundamental operation in neural networks, particularly in linear (Dense) layers. Key concepts:

Intuitions

Mathematical Form

Basic form: y = w1*x1 + w2*x2 + ... + wN*xN + b

Implementation Methods

  1. Using PyTorch’s built-in operations:
    • torch.dot(w, x) or w @ x
  2. Using elementwise operations:
    • Multiply corresponding elements: w * x
    • Sum the results: (w * x).sum()
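Both approaches give the same result. Here is a quick sketch (the numbers are made up purely for illustration):

```python
import torch

# A tiny weight vector, input vector, and bias (values chosen for illustration)
w = torch.tensor([0.5, -1.0, 2.0])
x = torch.tensor([1.0, 2.0, 3.0])
b = 0.25

# Built-in dot product (two equivalent spellings)
y1 = torch.dot(w, x) + b
y2 = w @ x + b

# Elementwise multiply, then sum
y3 = (w * x).sum() + b

print(y1.item(), y2.item(), y3.item())  # all three agree: 4.75
```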

Linear Transformations

A linear transformation is the basic building block of neural networks:
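As a sketch, `torch.nn.Linear` computes `y = x @ W.T + b`, i.e. one dot product (plus bias) per output unit (the layer sizes here are arbitrary):

```python
import torch

torch.manual_seed(0)

# A linear layer mapping 4 input features to 2 outputs
layer = torch.nn.Linear(in_features=4, out_features=2)

x = torch.randn(3, 4)   # a batch of 3 examples, 4 features each
y = layer(x)            # shape: (3, 2)

# The same computation written out using the layer's own parameters
y_manual = x @ layer.weight.T + layer.bias

print(torch.allclose(y, y_manual))  # True
```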

PyTorch Operations

Elementwise Operations

Reduction Operations

Common reduction methods:

Can be called as methods (x.sum()) or functions (torch.sum(x))
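For example (assuming `sum`, `mean`, and `max` are among the reductions covered):

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0, 4.0])

# Method form and function form give identical results
total = x.sum()           # tensor(10.)  -- same as torch.sum(x)
average = torch.mean(x)   # tensor(2.5)  -- same as x.mean()
largest = x.max()         # tensor(4.)

print(total.item(), average.item(), largest.item())
```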

Mean Squared Error (MSE)

Common error metric for regression tasks.

Formula: MSE = (1/n)Σ(y_true - y_pred)²

Implementation steps:

  1. Compute residuals: y_true - y_pred
  2. Square residuals: (y_true - y_pred)**2
  3. Take mean: ((y_true - y_pred)**2).mean()

PyTorch provides built-in implementations:
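A sketch of both routes, step-by-step and built-in (`torch.nn.functional.mse_loss` is one such built-in; the sample values are made up):

```python
import torch
import torch.nn.functional as F

y_true = torch.tensor([1.0, 2.0, 3.0])
y_pred = torch.tensor([1.5, 2.0, 2.0])

# Step-by-step: residuals, squared residuals, mean
mse_manual = ((y_true - y_pred) ** 2).mean()

# Built-in equivalent (default reduction is the mean)
mse_builtin = F.mse_loss(y_pred, y_true)

print(mse_manual.item(), mse_builtin.item())  # both about 0.4167
```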

Multidimensional Arrays

Key Concepts

Reduction Operations on Multiple Dimensions
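With more than one dimension, the `dim` argument controls which axis gets collapsed; a small sketch:

```python
import torch

x = torch.arange(6.0).reshape(2, 3)   # tensor([[0., 1., 2.],
                                      #         [3., 4., 5.]])

col_sums = x.sum(dim=0)   # collapse rows:    tensor([3., 5., 7.])
row_sums = x.sum(dim=1)   # collapse columns: tensor([ 3., 12.])
total = x.sum()           # no dim: reduce everything, tensor(15.)
```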

Preparation 2 (draft!)
The content may not be revised for this year. If you really want to see it, click the link above.
Homework 1: Train and evaluate a classifier on your own images

Important: Read this whole document before you start.

Goal

In this assignment, you will train and evaluate your own image classifier to distinguish the handwritten letters A, B, and C.

Completing this homework will give you practice

A famous image classification example is handwritten digits (called MNIST). For fun, we’ll remix that idea and classify handwritten letters. To keep it manageable, we’ll just work with the first 3 letters (a through c).

Try to make the best model you can, under the following constraints:

  1. No more than 100 training images. (Note: This is a maximum, not a minimum.)
  2. No more than 5 minutes compute time (on a Kaggle, Colab, or lab machine GPU) to train a model.
  3. Only use models that are already built into torchvision.

Instructions

Let’s make this a friendly competition: which team (of up to 5) can make the best classifier?

  1. Collect your own set of images of handwritten letters, one letter per image. (Do this yourself, don’t get it from the Internet.)
    • Please do share images amongst your team. You might use a OneDrive shared folder or similar.
  2. Organize your dataset into a folder structure like images/c/c01.png.
    • Make an images/README.txt describing your dataset (see below for details)
  3. Train a classifier to indicate which letter is contained in the image.
  4. Evaluate the accuracy of the classifier on the validation set. (See below for details).
  5. Submit your Jupyter Notebook and dataset ZIP file to Moodle.

Report Expectations

Your report should be a professionally crafted Jupyter Notebook, suitable to use in a portfolio. So your notebook should be:

We highly recommend the following structure:

  1. A compelling opening vision statement, with appropriate citations of any code or notebooks on which you are basing this work (e.g., for this assignment that would be the Lab 1 notebook);
  2. A clear explanation of the source and nature of the data, including links that would allow others to access the same data (e.g., how you built your dataset and where it can be found);
  3. A complete discussion/demonstration of the analysis, with explanations and code required to build and evaluate the models;
  4. Strong conclusions.

The notebook shouldn’t include anything that doesn’t serve these goals (e.g., no inapplicable text retained from the original notebook).

For this assignment:

Notes

Tips

To get the confusion matrix, loop over the validation dataloader and accumulate all of the predicted probabilities:

import numpy as np
import torch
from tqdm import tqdm

val_predicted_probs = []
model.eval()
with torch.no_grad():
    for inputs, _ in tqdm(val_dataloader, desc="Predicting on validation set"):
        inputs = inputs.to(device)
        outputs = model(inputs)
        probs = outputs.softmax(dim=1).cpu().numpy()
        val_predicted_probs.append(probs)
val_predicted_probs = np.vstack(val_predicted_probs)  # Shape: (num_val_samples, num_classes)

Look at val_predicted_probs.shape and make sure you understand why its second dimension is 3.

Then get the model’s top prediction for each image using val_predictions = np.argmax(val_predicted_probs, axis=1)

To get the true labels out of the dataset, use

val_labels = np.hstack([
    labels.numpy() for _, labels in val_dataloader
])
val_labels.shape

Then to show a confusion matrix, use:

from sklearn.metrics import ConfusionMatrixDisplay
ConfusionMatrixDisplay.from_predictions(val_labels, val_predictions, display_labels=class_names)

(assuming that class_names is the same list you used when constructing the data loader).
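With `val_labels` and `val_predictions` in hand, overall validation accuracy is a one-liner. A standalone sketch with tiny made-up arrays (in your notebook, use the arrays from the steps above):

```python
import numpy as np

# Stand-in arrays; in your notebook these come from the dataloader and model
val_labels = np.array([0, 1, 2, 2, 1])
val_predictions = np.array([0, 1, 2, 1, 1])

# Fraction of images where the top prediction matches the true label
accuracy = (val_predictions == val_labels).mean()
print(f"Validation accuracy: {accuracy:.1%}")  # 80.0%
```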