Outcomes
The process of completing this assignment will improve your ability to:
- Explain the importance of evaluating image classifiers on unseen data.
- Describe characteristics of a dataset that are relevant to downstream performance.
- Tweak a model to try to improve its performance.
We’ll also be writing a datasheet for our dataset, following the Datasheets for Datasets template.
Task
Load up the classifier you trained in Homework 1. Evaluate it on a set of images collected by others in the class.
To keep our attention on the data instead of the process of writing code, an example notebook containing all of the code needed for a basic analysis is provided. However, please do not simply copy and paste from that notebook; retype any code you need yourself. That should help you make sure that you understand what each step is for.
- Report the actual accuracy you got on this new data. Also report the loss, if you can. (A rough sketch of this evaluation appears at the end of this Task section.)
- What were your classifier’s most frequent mistakes? Quantify the mistakes (using the confusion matrix) and make an educated guess as to why these might be the most common mistakes (by, for example, studying the top losses).
- Recall that in Homework 1 you estimated the accuracy that your classifier would obtain on other people’s images. Compare the accuracy you observe to the accuracy that you thought you’d get.
- Now let’s write a datasheet for the specific training data you collected. Read the introduction to the Dataset Documentation (Datasheets for Datasets) template. Then, skim through the questions that follow. Choose two or three questions that are most relevant to how well the model that you trained on that data worked on new data. At the end of your notebook, include both the question texts and your answers. Good answers are those that would most help someone who is training on your dataset predict how it will work on new data.
- Go back to your Homework 1 classifier. Make one change to how the classifier is trained. Evaluate the change in accuracy you observe during training, and the change in accuracy you observe here (i.e., on the broader set of data). (Note: 0/1 accuracy has high variance with such a small dataset, so you probably want to compare the cross-entropy loss values you see on the validation set.)
Possible things to adjust:
- How big your validation set is
- Which foundation model to use (e.g., resnet34 vs. resnet18)
- What data augmentation (if any) to apply
- How many epochs to train
- What learning rate to use
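For the evaluation itself, here is a minimal sketch in fastai. It assumes learn is your (re)loaded Homework 1 learner, that accuracy is its only metric, and that dataset_path points at the class-wide images described under “Obtaining data” below, with the same folder-per-label layout as Homework 1; the example notebook may organize this differently.

from fastai.vision.all import *

# Wrap the class-wide images in a labeled dataloader that reuses the learner's own transforms
test_items = get_image_files(dataset_path)
test_dl = learn.dls.test_dl(test_items, with_labels=True)

# Cross-entropy loss and accuracy on data the model has never seen
loss, acc = learn.validate(dl=test_dl)
print(f"loss: {loss:.3f}, accuracy: {acc:.3f}")

# Confusion matrix and the images the model got most confidently wrong
interp = ClassificationInterpretation.from_learner(learn, dl=test_dl)
interp.plot_confusion_matrix()
interp.plot_top_losses(9)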
Details
Obtaining data
A dataset of images uploaded by 11 different students is available here: https://students.cs.calvin.edu/~ka37/letter_images_dataset_v0.zip. The structure is the same as what was expected in Homework 1. So you should be able to run:
url = 'https://students.cs.calvin.edu/~ka37/letter_images_dataset_v0.zip'
dataset_path = untar_data(url)
In future years we might actually make this a Kaggle competition, but I couldn’t get the logistics worked out in time this year.
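Before evaluating anything, it can be worth sanity-checking what the download actually contains, since dataset characteristics are part of what this assignment asks about. A small sketch, assuming the same folder-per-label layout as Homework 1:

from fastai.vision.all import *
from collections import Counter

files = get_image_files(dataset_path)
print(f"{len(files)} images in total")
# Count images per class, assuming each image sits in a folder named after its label
print(Counter(parent_label(f) for f in files))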
Note: The example provided re-trains the classifier each time it is run, leading to different performance depending on the random initialization of the classification head. You may want to save your trained classifier and reload it instead, e.g.:
saved_clf_file = Path('classifier.pkl').resolve()  # absolute path, since learn.export saves relative to learn.path
if saved_clf_file.exists():
    learn = load_learner(saved_clf_file)
else:
    learn = ...  # train as in Homework 1
    learn.export(saved_clf_file)
Think about what other sources of variation might come up, and how you might be systematic about them.
Batches
- fastai drops incomplete batches in the training set. Unfortunately you can’t set the batch size to 1 (because the model uses batch normalization, which is beyond the scope of this course). So give the keyword argument bs=2 to whatever creates your dataloaders (e.g., ImageDataLoaders.from_path_func(..., bs=2)).
- There’s a code chunk in the sample solution that demonstrates how to look at the batch sizes and the number of images that your classifier actually gets trained on.
- Since you have so little data, run fine_tune() for several epochs; use at least fine_tune(4). A rough sketch combining these notes appears below.
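Here is a rough sketch (not the code chunk from the sample solution) of creating dataloaders with bs=2, checking how many images and batches actually reach the model, and fine-tuning for several epochs. hw1_path is a hypothetical name for wherever your Homework 1 training images live, and parent_label, Resize(224), and resnet34 are just illustrative choices:

from fastai.vision.all import *

hw1_path = Path('hw1_images')  # hypothetical: your Homework 1 training images
# bs=2 means the dropped incomplete final batch costs at most one image
dls = ImageDataLoaders.from_path_func(
    hw1_path, get_image_files(hw1_path), label_func=parent_label,
    valid_pct=0.2, item_tfms=Resize(224), bs=2)

# How much data the training loop will actually see each epoch
print(f"training images: {len(dls.train_ds)}, batches per epoch: {len(dls.train)}")
print(f"validation images: {len(dls.valid_ds)}")

learn = vision_learner(dls, resnet34, metrics=accuracy)  # cnn_learner in older fastai versions
learn.fine_tune(4)  # several epochs, since there is so little data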
Other details
All code is given in the example linked above.