You might have heard (or experienced) that Python is slow. So how can Python be the language behind basically all of the recent advances in AI, which all require huge amounts of computing? The secret is array computing: the Python code orchestrates operations that happen on powerful “accelerator” hardware like GPUs and TPUs. Those operations typically involve repeatedly applying an operation to big (usually rectangular) arrays of numbers, hence the name array computing.
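For example, here’s a tiny sketch (the array size and the operation are arbitrary) of the same elementwise computation written as a Python loop and as a single array operation:

import torch

x = torch.rand(1_000_000)

# Loop version: Python runs a million tiny operations, one at a time.
y_loop = torch.empty_like(x)
for i in range(len(x)):
    y_loop[i] = 2 * x[i] + 1

# Array version: one call; the actual work runs in optimized native (or GPU) code.
y_vec = 2 * x + 1

assert torch.allclose(y_loop, y_vec)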
For those used to writing loops, this sort of coding can take some getting used to. Here are two exercises that previous students have found very helpful in getting their minds around how arrays work in PyTorch. (The concepts are basically identical in other libraries like TensorFlow, NumPy, and JAX.)
The notebook today has blanks that are designed for you to think about. Colab’s AI autocomplete will try to suggest filling in the blanks, which totally defeats the point. So we’ve disabled it in the notebook itself (under Edit->Notebook settings->Hide generative AI features); that’s also where to turn it back on if you need to for some reason.
u02n1-pytorch.ipynb (show preview, open in Colab)
The reference below is an AI-generated summary of the material in the notebook.
A dot product is a fundamental operation in neural networks, particularly in linear (Dense) layers. Key concepts:
- Basic form: y = w1*x1 + w2*x2 + ... + wN*xN + b
- Each x[i] is multiplied by its corresponding weight w[i]
- b is the bias (can be omitted for simplicity)
- In PyTorch: torch.dot(w, x) or w @ x
- Elementwise product: w * x; summing it gives the dot product: (w * x).sum()

A linear transformation is the basic building block of neural networks:

- y = w*x + b
- w represents weights
- b represents bias
- w * x multiplies corresponding elements

Common reduction methods:

- sum(): adds all elements
- mean(): computes the average
- max(): finds the maximum value
- argmax(): finds the index of the maximum value
- These can be called as methods (x.sum()) or functions (torch.sum(x))
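Putting those pieces together, here’s a small sketch (the particular numbers are made up) of the dot product written as a loop and as array operations, plus a few reductions:

import torch

w = torch.tensor([0.5, -1.0, 2.0])
x = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor(0.1)

# Loop version: y = w1*x1 + w2*x2 + ... + wN*xN + b
y_loop = b.clone()
for i in range(len(x)):
    y_loop = y_loop + w[i] * x[i]

# Array versions: all give the same result
y1 = (w * x).sum() + b     # elementwise multiply, then reduce with sum
y2 = torch.dot(w, x) + b   # built-in dot product
y3 = w @ x + b             # @ also computes the dot product for 1-D tensors

# A few common reductions
x.sum(), x.mean(), x.max(), x.argmax()
torch.sum(x)               # same as x.sum()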
Mean squared error (MSE) is a common error metric for regression tasks.
Formula: MSE = (1/n)Σ(y_true - y_pred)²
Implementation steps:
- Differences: y_true - y_pred
- Squared differences: (y_true - y_pred)**2
- Mean of the squared differences: ((y_true - y_pred)**2).mean()

PyTorch provides built-in implementations:

- F.mse_loss(y_pred, y_true)
- nn.MSELoss()

Reductions can also be applied along a specific dimension using the axis parameter:

- x.sum(axis=1) sums along axis 1
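Here’s a small sketch that puts the MSE steps, the built-in loss, and an axis reduction together (the example values are made up):

import torch
import torch.nn.functional as F

y_true = torch.tensor([1.0, 2.0, 3.0])
y_pred = torch.tensor([1.5, 1.5, 2.0])

diff = y_true - y_pred                  # differences
sq = diff ** 2                          # squared differences
mse = sq.mean()                         # mean of the squared differences

print(mse, F.mse_loss(y_pred, y_true))  # the two values should match

# Reducing along an axis: sum each row of a 2-D tensor
m = torch.arange(6.0).reshape(2, 3)
m.sum(axis=1)                           # tensor([ 3., 12.])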
Important: Read this whole document before you start.

In this assignment, you will train and evaluate your own image classifier to distinguish the handwritten letters A, B, and C.
Completing this homework will give you practice with the whole process of training and evaluating an image classifier.
A famous image classification example is handwritten digits (called MNIST). For fun, we’ll remix that idea and classify handwritten letters.
To keep it manageable, we’ll just work with the first 3 letters (a through c).
Try to make the best model you can, under the following constraints:
torchvision.

Let’s make this a friendly competition: which team (of up to 5) can make the best classifier?
images/c/c01.png.
images/README.txt describing your dataset (see below for details)

Your report should be a professionally crafted Jupyter Notebook, suitable to use in a portfolio. So your notebook should be:
## Headings; proofread text and code

We highly recommend the following structure:
The notebook shouldn’t include anything that doesn’t apply to these goals (e.g., no inapplicable text retained from an original notebook).
For this assignment:
README.txt.

Put your zip file in your public_html folder on the lab computers. Then you can access it at https://students.cs.calvin.edu/~username/filename.zip (make sure you include the tilde). Then you can set url to that in the Lab 1 data loader code. (Be sure to change archive_path and extract_path as needed.)
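For example, the variables in the data loader cell might end up looking something like this (the exact filename and paths are placeholders; use your own username and names):

url = "https://students.cs.calvin.edu/~username/letter_images.zip"
archive_path = "letter_images.zip"
extract_path = "letter_images"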
Make sure the file is readable by others: chmod o+r ~/public_html/letter_images.zip.

torchvision models
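One common starting point (not required; the particular model and variable names below are just for illustration) is to load a pretrained torchvision model and replace its final layer so it predicts the 3 letter classes:

import torch
import torch.nn as nn
from torchvision import models

device = "cuda" if torch.cuda.is_available() else "cpu"

# Start from a pretrained ResNet-18 and swap its final fully-connected layer
# so it outputs one score per class (a, b, c).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 3)
model = model.to(device)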
To get the confusion matrix, loop over the validation dataloader and accumulate all of the probabilities:

import numpy as np
import torch
from tqdm import tqdm

val_predicted_probs = []
model.eval()  # evaluation mode: disables dropout, uses running batch-norm statistics
with torch.no_grad():  # no gradients needed for evaluation
    for inputs, _ in tqdm(val_dataloader, desc="Predicting on validation set"):
        inputs = inputs.to(device)
        outputs = model(inputs)
        probs = outputs.softmax(dim=1).cpu().numpy()  # logits -> per-class probabilities
        val_predicted_probs.append(probs)

val_predicted_probs = np.vstack(val_predicted_probs)  # Shape: (num_val_samples, num_classes)
Look at val_predicted_probs.shape and make sure you understand why its second dimension is 3.
Then get the model’s top prediction for each image using val_predictions = np.argmax(val_predicted_probs, axis=1)
To get the true labels out of the validation dataloader, use
val_labels = np.hstack([
labels.numpy() for _, labels in val_dataloader
])
Check val_labels.shape to make sure it matches the number of validation images.
Then to show a confusion matrix, use:
from sklearn.metrics import ConfusionMatrixDisplay
ConfusionMatrixDisplay.from_predictions(val_labels, val_predictions, display_labels=class_names)
(assuming that class_names is the same list you used when constructing the data loader).