Task: Train a simple image classifier using cross-entropy loss
from fastai.vision.all import *
from fastbook import *
# Input tensors get tagged as `TensorImageBW`, and they keep that tag even after going through the model.
# I'm not sure how you're supposed to drop that tag, but this works around a type dispatch error.
TensorImageBW.register_func(F.cross_entropy, TensorImageBW, TensorCategory)
import sys
if sys.platform == "darwin":
# https://stackoverflow.com/a/64855500/69707
import os
os.environ['OMP_NUM_THREADS'] = '1'
Load up the MNIST dataset. It contains images of handwritten digits, so there are 10 classes (the digits 0 through 9).
path = untar_data(URLs.MNIST)
path
Create a subset of the images, so we train faster. We do this by taking 500 random images of each digit.
set_seed(0)
num_imgs_per_digit = 500
items = L([
p
for split in ['training', 'testing']
for digit in range(10)
for p in (path/split/str(digit)).ls().shuffle()[:num_imgs_per_digit]
])
Create the dataloaders. We need a slightly special ImageBlock because we want grayscale images.
block = DataBlock(
blocks=(ImageBlock(PILImageBW), CategoryBlock),
get_y = parent_label,
splitter=GrandparentSplitter(train_name='training', valid_name="testing"),
)
dataloaders = block.dataloaders(items, bs=16)
print(f"{dataloaders.train.n} training images, {dataloaders.valid.n} validation images")
Let's inspect a batch of data.
dataloaders.train.show_batch()
print(f"Available categories: {dataloaders.train.vocab}")
Let's make a neural network to predict which digit was written, using the raw pixel values. We'll keep it at a single layer today, so this is actually just a fancy way of doing logistic regression. But it'll give us a chance to work with minibatches and loss functions.
Step 1: Create a linear layer of the appropriate dimensionality. Leave bias off because it's redundant in this setting.
linear_1 = nn.Linear(in_features=..., out_features=..., bias=False)
model = nn.Sequential(
nn.Flatten(),
linear_1,
)
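For reference, here is one way those blanks could be filled in, under the assumption that the flattened 28x28 grayscale images give 784 input features and the 10 digit classes give 10 outputs (a sketch, not the only way to write it):
# A possible completion (assumes 28x28 = 784 flattened pixels and 10 digit classes).
linear_1 = nn.Linear(in_features=28 * 28, out_features=10, bias=False)
model = nn.Sequential(
    nn.Flatten(),  # [batch, 1, 28, 28] -> [batch, 784]
    linear_1,
)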
We test out our model on one batch of data.
images, labels = dataloaders.train.one_batch()
images = TensorBase(images) # work around a fastai quirk
images.shape
logits = model(images)
assert logits.shape == (16, 10)
logits.shape
Think about why logits.shape contains those two numbers.
labels
Now let's compute the cross-entropy loss. We'll use F.cross_entropy from PyTorch, which has the following basic signature (simplified somewhat from the official docs):
F.cross_entropy(
    logits: Tensor[Batch, Categories],  # the unnormalized scores of each class, for each item in the batch
    target: TensorCategory[Batch],      # the correct label index (an int) for each item in the batch
    reduction: str = 'mean',            # 'mean' returns the average loss across the batch; 'none' returns per-item losses
    label_smoothing: float = 0.0,       # how much label smoothing to apply (none by default)
)
Let's try it on our logits and labels for this batch. (I'm unsure why the result is still a TensorCategory; it should just be a plain Tensor.)
loss = F.cross_entropy(logits, labels, reduction='none')
loss
loss.mean()
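As a sanity check (a sketch, not part of the assignment), cross-entropy is just the negative log-softmax of the logit at the true class, so we can reproduce the per-item losses by hand:
# Reproduce F.cross_entropy(reduction='none') manually: -log_softmax at the true class.
log_probs = F.log_softmax(logits, dim=1)                # [16, 10] log-probabilities
manual = -log_probs[torch.arange(len(labels)), labels]  # pick out each item's true class
# Cast to plain tensors to sidestep fastai's subclass dispatch before comparing.
assert torch.allclose(manual.as_subclass(torch.Tensor), loss.as_subclass(torch.Tensor))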
We can use argmax to ask which category got the highest score for each image. This will be useful for computing a metric like accuracy.
predictions = logits.argmax(dim=1)
print(predictions.shape)
predictions
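For example (a single-batch sketch), accuracy on this batch is just the fraction of predictions that match the labels:
# Fraction of correct predictions in this one batch.
batch_accuracy = (predictions == labels).float().mean()
batch_accuracy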
Now, fill in the blanks in the code below to train the model.
num_epochs = 10
learning_rate = .01
losses = []
# Re-initialize the parameters of the model, so training restarts when this block starts.
linear_1.reset_parameters()
for epoch in range(num_epochs):
# Keep track of some things for each epoch.
total_images = 0
total_correct = 0
for images, labels in dataloaders.train:
images = TensorBase(images) # work around a quirk in fastai, ignore this
logits = ...
loss = ...
# take an SGD step.
loss.backward()
for parameter in model.parameters():
parameter.data -= learning_rate * parameter.grad
model.zero_grad()
# Track metrics
predictions = logits.argmax(axis=1)
num_accurate = (predictions == labels).sum()
total_images += len(labels)
total_correct += num_accurate
# Track losses.
losses.append(loss.item())
# Epoch done, print some stats.
    # `losses` has one entry per batch, so average over just this epoch's batches.
    avg_loss_this_epoch = np.mean(losses[-len(dataloaders.train):])
print(f"Epoch {epoch:2d}: loss={avg_loss_this_epoch:.2f}, train accuracy {total_correct:3d}/{total_images}")
# Plot the un-smoothed loss
#plt.plot(losses)
# Plot a smoothed version of the loss (easier to see the trend)
pd.Series(losses).ewm(alpha = .1).mean().plot()
plt.xlabel("Iteration")
plt.ylabel("Cross-Entropy Loss");
Let's inspect the weights of our trained network. Since we have a single layer, it's relatively easy to do this. First, look at the weights of the linear_1 layer:
linear_1.weight.shape
weight_images = linear_1.weight.data.view((10, 28, 28))
show_images(weight_images)
Q1: Why is logits.shape 16 by 10?
your answer here
Q2: Before we trained the model (i.e., it just had random weights), the cross-entropy was all about the same number. What was that number, and why? Hint: np.log(10)
your answer here
Q3: Adjust the learning rate parameter. Give an example of a learning rate that is too high, one that is too low, and one that is good. For each, explain your answer by describing what the loss curve looks like; how do its shape and its values indicate good or bad training?
your answer here
Q4: Why do the weight images look the way they do? (Why might they look similar to the digits in question? Why might they not look exactly like the digits in question?)
your answer here
PyTorch gives us optimizer objects that do all the work of updating parameters. This not only saves code, it also lets us swap in fancier optimizers. Replace SGD with AdamW and compare the results.
num_epochs = 10
learning_rate = .01
losses = []
# Initialize the optimizer.
optimizer = torch.optim.SGD(params=model.parameters(), lr=learning_rate)
# Re-initialize the parameters of the model, so training restarts when this block starts.
linear_1.reset_parameters()
for epoch in range(num_epochs):
# Keep track of some things for each epoch.
total_images = 0
total_correct = 0
for images, labels in dataloaders.train:
images = TensorBase(images) # work around a quirk in fastai, ignore this
logits = ...
loss = ...
        # Take an optimizer step.
loss.backward()
optimizer.step()
model.zero_grad()
# Track metrics
predictions = logits.argmax(axis=1)
num_accurate = (predictions == labels).sum()
total_images += len(labels)
total_correct += num_accurate
# Track losses.
losses.append(loss.item())
# Epoch done, print some stats.
    # `losses` has one entry per batch, so average over just this epoch's batches.
    avg_loss_this_epoch = np.mean(losses[-len(dataloaders.train):])
print(f"Epoch {epoch:2d}: loss={avg_loss_this_epoch:.2f}, train accuracy {total_correct:3d}/{total_images}")
# Plot the un-smoothed loss
#plt.plot(losses)
# Plot a smoothed version of the loss (easier to see the trend)
pd.Series(losses).ewm(alpha = .1).mean().plot()
plt.xlabel("Iteration")
plt.ylabel("Cross-Entropy Loss");