Unit 2: Supervised Learning

Supervised Learning

Students who complete this unit will demonstrate that they can:

Contents

Intro to Array Computing

You might have heard (or experienced) that Python is slow. So how can Python be the language behind basically all of the recent advances in AI, which all require huge amounts of computing? The secret is array computing. The Python code orchestrates operations that happen on powerful “accelerator” hardware like GPUs and TPUs. Those operations typically involve repeatedly applying an operation to big (usually rectangular) arrays of numbers, hence the name: array computing.

For those used to writing loops, this sort of coding can take some getting used to. Here are two exercises that previous students have found very helpful in getting their minds around how arrays work in PyTorch. (The concepts are basically identical in other libraries like TensorFlow, NumPy, and JAX.)

CAUTION

Today’s notebook has blanks that are designed to make you think. Colab’s AI autocomplete will try to suggest filling in each blank, which totally defeats the point. So:

  1. Open Colab’s Settings (the gear icon)
  2. Select the “AI Assistance” section of that window.
  3. Uncheck “Show AI-powered inline completions”

We’ve also disabled it in the notebook itself (under Edit -> Notebook settings -> Hide generative AI features); that’s where to look if you ever need to turn it back on for some reason.

Objectives

Notebooks

The reference below is an AI-generated summary of the material in the notebook.

Dot Products

A dot product is a fundamental operation in neural networks, particularly in linear (Dense) layers. Key concepts:

Intuitions

Mathematical Form

Basic form: y = w1*x1 + w2*x2 + ... + wN*xN + b

Implementation Methods

  1. Using PyTorch’s built-in operations:
    • torch.dot(w, x) or w @ x
  2. Using elementwise operations:
    • Multiply corresponding elements: w * x
    • Sum the results: (w * x).sum()
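Both approaches give the same result. Here is a quick sketch (the numbers are made up purely for illustration):

```python
import torch

# A tiny weight vector, input vector, and bias (values chosen for illustration)
w = torch.tensor([0.5, -1.0, 2.0])
x = torch.tensor([1.0, 2.0, 3.0])
b = 0.25

# Built-in dot product (two equivalent spellings)
y1 = torch.dot(w, x) + b
y2 = w @ x + b

# Elementwise multiply, then sum
y3 = (w * x).sum() + b

print(y1.item(), y2.item(), y3.item())  # all three agree: 4.75
```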

Linear Transformations

A linear transformation is the basic building block of neural networks:
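As a sketch, `torch.nn.Linear` computes `y = x @ W.T + b`, i.e. one dot product (plus bias) per output unit (the layer sizes here are arbitrary):

```python
import torch

torch.manual_seed(0)

# A linear layer mapping 4 input features to 2 outputs
layer = torch.nn.Linear(in_features=4, out_features=2)

x = torch.randn(3, 4)   # a batch of 3 examples, 4 features each
y = layer(x)            # shape: (3, 2)

# The same computation written out using the layer's own parameters
y_manual = x @ layer.weight.T + layer.bias

print(torch.allclose(y, y_manual))  # True
```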

PyTorch Operations

Elementwise Operations

Reduction Operations

Common reduction methods:

Can be called as methods (x.sum()) or functions (torch.sum(x))
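For example (assuming `sum`, `mean`, and `max` are among the reductions covered):

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0, 4.0])

# Method form and function form give identical results
total = x.sum()           # tensor(10.)  -- same as torch.sum(x)
average = torch.mean(x)   # tensor(2.5)  -- same as x.mean()
largest = x.max()         # tensor(4.)

print(total.item(), average.item(), largest.item())
```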

Mean Squared Error (MSE)

Common error metric for regression tasks.

Formula: MSE = (1/n)Σ(y_true - y_pred)²

Implementation steps:

  1. Compute residuals: y_true - y_pred
  2. Square residuals: (y_true - y_pred)**2
  3. Take mean: ((y_true - y_pred)**2).mean()

PyTorch provides built-in implementations:
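A sketch of both routes, step-by-step and built-in (`torch.nn.functional.mse_loss` is one such built-in; the sample values are made up):

```python
import torch
import torch.nn.functional as F

y_true = torch.tensor([1.0, 2.0, 3.0])
y_pred = torch.tensor([1.5, 2.0, 2.0])

# Step-by-step: residuals, squared residuals, mean
mse_manual = ((y_true - y_pred) ** 2).mean()

# Built-in equivalent (default reduction is the mean)
mse_builtin = F.mse_loss(y_pred, y_true)

print(mse_manual.item(), mse_builtin.item())  # both about 0.4167
```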

Multidimensional Arrays

Key Concepts

Reduction Operations on Multiple Dimensions
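With more than one dimension, the `dim` argument controls which axis gets collapsed; a small sketch:

```python
import torch

x = torch.arange(6.0).reshape(2, 3)   # tensor([[0., 1., 2.],
                                      #         [3., 4., 5.]])

col_sums = x.sum(dim=0)   # collapse rows:    tensor([3., 5., 7.])
row_sums = x.sum(dim=1)   # collapse columns: tensor([ 3., 12.])
total = x.sum()           # no dim: reduce everything, tensor(15.)
```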

Preparation 2 (draft!)
The content may not be revised for this year. If you really want to see it, click the link above.
Homework 1: Train and evaluate a classifier on your own images

Important: Read this whole document before you start.

Goal

In this assignment, you will train and evaluate your own image classifier to distinguish the handwritten letters A, B, and C.

Completing this homework will give you practice

A famous image classification example is handwritten digits (called MNIST). For fun, we’ll remix that idea and classify handwritten letters. To keep it manageable, we’ll just work with the first 3 letters (a through c).

Try to make the best model you can, under the following constraints:

  1. No more than 100 training images. (Note: This is a maximum, not a minimum.)
  2. No more than 5 minutes compute time (on a Kaggle, Colab, or lab machine GPU) to train a model.
  3. Only use models that are already built into torchvision.

Instructions

Let’s make this a friendly competition: which team (of up to 5) can make the best classifier?

  1. Collect your own set of images of handwritten letters, one letter per image. (Do this yourself, don’t get it from the Internet.)
    • Please do share images amongst your team. You might use a OneDrive shared folder or similar.
  2. Organize your dataset into a folder structure like images/c/c01.png.
    • Make an images/README.txt describing your dataset (see below for details)
  3. Train a classifier to indicate which letter is contained in the image.
  4. Evaluate the accuracy of the classifier on the validation set. (See below for details).
  5. Submit your Jupyter Notebook and dataset ZIP file to Moodle.

Report Expectations

Your report should be a professionally crafted Jupyter Notebook, suitable to use in a portfolio. So your notebook should be:

We highly recommend the following structure:

  1. A compelling opening vision statement, with appropriate citations of any code or notebooks on which you are basing this work (e.g., for this assignment that would be the Lab 1 notebook);
  2. A clear explanation of the source and nature of the data, including links that would allow others to access the same data (e.g., how you built your dataset and where it can be found);
  3. A complete discussion/demonstration of the analysis, with explanations and code required to build and evaluate the models;
  4. Strong conclusions.

The notebook shouldn’t include anything that doesn’t serve these goals (e.g., no inapplicable text retained from the original notebook).

For this assignment:

Notes

Tips

To get the confusion matrix, loop over the validation dataloader and accumulate all of the predicted probabilities:

import numpy as np
import torch
from tqdm import tqdm

val_predicted_probs = []
model.eval()
with torch.no_grad():
    for inputs, _ in tqdm(val_dataloader, desc="Predicting on validation set"):
        inputs = inputs.to(device)
        outputs = model(inputs)
        probs = outputs.softmax(dim=1).cpu().numpy()
        val_predicted_probs.append(probs)
val_predicted_probs = np.vstack(val_predicted_probs)  # Shape: (num_val_samples, num_classes)

Look at val_predicted_probs.shape and make sure you understand why its second dimension is 3.

Then get the model’s top prediction for each image using val_predictions = np.argmax(val_predicted_probs, axis=1)

To get the true labels out of the dataset, use

val_labels = np.hstack([
    labels.numpy() for _, labels in val_dataloader
])
val_labels.shape

Then to show a confusion matrix, use:

from sklearn.metrics import ConfusionMatrixDisplay
ConfusionMatrixDisplay.from_predictions(val_labels, val_predictions, display_labels=class_names)

(assuming that class_names is the same list you used when constructing the data loader).
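With `val_labels` and `val_predictions` in hand, overall validation accuracy is a one-liner. A standalone sketch with tiny made-up arrays (in your notebook, use the arrays from the steps above):

```python
import numpy as np

# Stand-in arrays; in your notebook these come from the dataloader and model
val_labels = np.array([0, 1, 2, 2, 1])
val_predictions = np.array([0, 1, 2, 1, 1])

# Fraction of images where the top prediction matches the true label
accuracy = (val_predictions == val_labels).mean()
print(f"Validation accuracy: {accuracy:.1%}")  # 80.0%
```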