In this unit, after reviewing where we’ve been, we push towards state-of-the-art models (still focusing on computer vision). We’ll first show how our work over the last two weeks connects to the pre-trained models we used in the opening weeks. Then, we’ll introduce or revisit tools that allow our models to achieve high performance, such as data augmentation and regularization.
The process of completing this assignment will improve your ability to:
Along the way, we’ll participate in a Kaggle competition, so you’ll get to practice with that.
Load up the classifier you trained in Homework 1. Use it to make predictions on a set of images collected by others in the class. You’ll do this by participating in a Kaggle competition.
Click the link provided in Moodle to join the Kaggle competition. Then make a copy of your Homework 1 notebook (in Google Colab, File → Save a copy) to use as your starting point for this assignment.
Download the competition dataset into your Colab notebook:
```python
import urllib.request
import zipfile
from pathlib import Path

competition_url = "https://students.cs.calvin.edu/~ka37/letter-images-26sp.zip"
competition_dir = Path("./data/competition")
competition_dir.mkdir(parents=True, exist_ok=True)

archive_path = competition_dir / "letter-images-26sp.zip"
if not archive_path.exists():
    print(f"Downloading {competition_url}...")
    urllib.request.urlretrieve(competition_url, archive_path)
    with zipfile.ZipFile(archive_path, "r") as z:
        z.extractall(competition_dir)
    print("Done.")
```
Then check what’s inside:
```
!ls {competition_dir}
```
You should see folders and CSV files similar to:
```
sample_submission.csv  test/  test.csv  train/  train.csv
```
The competition images are in a flat folder with a CSV file mapping filenames to labels (not sorted into class subfolders like your Homework 1 data). Here’s how to load them:
```python
import pandas as pd
from pathlib import Path
from PIL import Image
from torch.utils.data import Dataset, DataLoader

valid_df = pd.read_csv(competition_dir / 'train.csv').sort_values('filename')
test_df = pd.read_csv(competition_dir / 'test.csv').sort_values('filename')

class CSVImageDataset(Dataset):
    """Load images from a flat folder using a CSV for labels."""
    def __init__(self, df, image_dir, transform, class_names):
        self.df = df.reset_index(drop=True)
        self.image_dir = Path(image_dir)
        self.transform = transform
        self.class_names = class_names

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        image = Image.open(self.image_dir / row['filename']).convert('RGB')
        image = self.transform(image)
        if 'label' in row and pd.notna(row['label']):
            label = self.class_names.index(row['label'])
            return image, label
        return image, -1  # test set has no labels

valid_dataset = CSVImageDataset(valid_df, competition_dir / 'train', data_transforms, class_names)
valid_dataloader = DataLoader(valid_dataset, batch_size=config.batch_size, shuffle=False)
```
Here data_transforms and class_names should be the same ones from your HW1 notebook (class_names should be ['a', 'b', 'c']).
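Before moving on to test predictions, it’s worth sanity-checking your model on this labeled split, since it was collected by different people than your own training data. Here’s a minimal accuracy helper; the demo model and data at the bottom are stand-ins so the snippet runs on its own, and in your notebook you’d instead pass your trained model, valid_dataloader, and device:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def accuracy(model, dataloader, device):
    """Fraction of examples the model labels correctly."""
    correct = total = 0
    model.eval()
    with torch.no_grad():
        for inputs, labels in dataloader:
            inputs, labels = inputs.to(device), labels.to(device)
            preds = model(inputs).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / total

# Self-contained demo with a tiny untrained model and random data
# (stand-ins for your real model and valid_dataloader):
demo_model = nn.Linear(4, 3)
demo_data = TensorDataset(torch.randn(8, 4), torch.randint(0, 3, (8,)))
acc = accuracy(demo_model, DataLoader(demo_data, batch_size=4), torch.device("cpu"))
print(f"validation accuracy: {acc:.2f}")
```

If this accuracy is much lower than what you saw on your own validation split in Homework 1, that’s worth noting in your analysis.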
```python
test_dataset = CSVImageDataset(test_df, competition_dir / 'test', data_transforms, class_names)
test_dataloader = DataLoader(test_dataset, batch_size=config.batch_size, shuffle=False)

# Get predictions (model and device come from your HW1 notebook)
test_predictions = []
model.eval()
with torch.no_grad():
    for inputs, _ in test_dataloader:
        inputs = inputs.to(device)
        outputs = model(inputs)
        preds = outputs.argmax(dim=1).cpu().numpy()
        test_predictions.extend(preds)

# Map predictions to class names and save
test_df['label'] = [class_names[p] for p in test_predictions]
test_df[['id', 'label']].to_csv('submission.csv', index=False)
```

This saves your predictions as submission.csv. Upload it to the Kaggle competition page. Name your submission “Homework 1 baseline” or the like. Write down in your analysis how well your baseline does on the leaderboard.
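Kaggle will reject a submission whose format doesn’t match sample_submission.csv, so a quick local check can save you a wasted upload. Here’s a sketch of such a check; the expected column names and labels are assumptions based on the files above, and the in-memory demo at the bottom just makes the snippet self-contained (in your notebook, call check_submission('submission.csv')):

```python
import io
import pandas as pd

def check_submission(csv):
    """Basic format checks before uploading to Kaggle."""
    df = pd.read_csv(csv)
    assert list(df.columns) == ['id', 'label'], f"unexpected columns: {list(df.columns)}"
    assert df['label'].isin(['a', 'b', 'c']).all(), "labels must be a, b, or c"
    assert df['id'].is_unique, "duplicate ids"
    return len(df)

# Demo on a tiny in-memory example instead of a real submission file:
n = check_submission(io.StringIO("id,label\n0,a\n1,b\n2,c\n"))
print(f"{n} rows, format looks OK")
```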
Submit your Homework 3 notebook to Moodle. You don’t need to submit a revised Homework 1 notebook, but make sure that your Homework 3 notebook includes details of what you changed in that notebook.
Your notebook should include:
Possible things to adjust:
Think about what other sources of variation might come up, and how you might be systematic about them.
We didn’t write code for image augmentation in the lab, but it’s straightforward with torchvision.transforms. In your Homework 1 notebook, create an augmentation pipeline that applies random transformations before the standard preprocessing:

```python
from torchvision import transforms

augmentation_transforms = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.Resize((config.image_size, config.image_size)),
    # then apply the same normalization as your pretrained model expects:
    config.pretrained_weights.transforms(crop_size=config.image_size),
])
```
Then create a new version of your training dataset that uses these augmented transforms:
```python
from torchvision import datasets

train_dataset_aug = datasets.ImageFolder(
    root=your_data_path,
    transform=augmentation_transforms,
)
```
Because the random transforms are applied each time an image is loaded, each epoch will see different augmented versions of your training images.
I suggest visualizing some example batches from the augmented dataset to make sure the augmentation is working as you expect. When you train the model, use the augmented dataset instead of the original.
Think carefully about which augmentations make sense for handwritten letters. For example, would a horizontal flip be appropriate for distinguishing ‘b’ from ‘d’?
None yet this year.
Try the Gradient Game: How few calls do you need to get the Loss small? How do you do it?
MNIST with PyTorch (name: u06n1-mnist-torch.ipynb; show preview, open in Colab)
Some stragglers:
Critical Theory has monopolized ethical discussions in many areas of society lately, so many ethics researchers tend to highlight aspects of their work that relate to power differentials across race and gender. But other aspects are also important; just a few examples include the relationship of AI moderation to freedom of speech, the environmental impact of AI, the existential risks that we may be taking in developing more powerful AI technology, and systems that optimize themselves to hold our attention.
There are some things in our book that the fastai people make a much bigger deal about than most researchers and practitioners. It’s a lot of reading and stuff to understand, so I’m trying to help you focus on the parts that are going to pay most dividends down the line.
Yes!
Generally very much so. They’re trained with lots of noise added intentionally, so little differences in floating point behavior don’t tend to matter much.
The abs function should work as a nonlinearity, but my intuition is it would be harder to learn (because the effect of increasing an activation flips when you’re on the other side of 0).
A linear layer.
The Wikipedia article is actually pretty good here.
They used to have to be, but modern networks can work with any size. It’s still more efficient to run a batch of images at the same size through at the same time though.
Piecewise linear approximation.
See the Glossary.