Task: perform broadcast and reduction operations on a tensor representing a batch of color images
Goal: The goal of this exercise is simply to get used to thinking about the shapes of multidimensional structures. A surprisingly large share of the thinking that goes into implementing neural-net code is getting the shapes right. I didn’t really believe that until I had to work it out myself a couple of times, and that experience convinced me that everyone could use some guided practice with it.
As usual, you don't need to understand the code in this section.
from fastai.vision.all import *
# Make one-channel images display in greyscale.
# See https://forums.fast.ai/t/show-image-displays-color-image-for-mnist-sample-dataset/78932/4
# But "Greys" is inverted, so we use "gray" instead.
matplotlib.rc('image', cmap='gray')
Download the dataset.
path = untar_data(URLs.PETS) / "images"
Make a stable order for the images: first sort, then randomize using a known seed.
set_seed(42)  # seeds Python's `random` (used by L.shuffle), NumPy and PyTorch
image_files = get_image_files(path).sorted().shuffle()
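Why sort before shuffling? A directory listing can come back in a different order on different machines. Sorting first gives every run the same starting point, and a seeded shuffle then turns that into the same "random" order each time. Here is a minimal sketch of the idea using Python's standard random module. The helper stable_shuffled and the filenames are hypothetical, and this is only an illustration, not fastai's implementation:

```python
import random

def stable_shuffled(names, seed):
    """Hypothetical helper: sort, then shuffle with a dedicated seeded RNG."""
    items = sorted(names)        # same starting order regardless of listing order
    rng = random.Random(seed)    # private RNG, so other code can't disturb its sequence
    rng.shuffle(items)
    return items

# Two made-up directory listings holding the same files in different orders.
listing_a = ["b.jpg", "a.jpg", "d.jpg", "c.jpg"]
listing_b = ["d.jpg", "c.jpg", "b.jpg", "a.jpg"]

order_a = stable_shuffled(listing_a, seed=42)
order_b = stable_shuffled(listing_b, seed=42)
print(order_a == order_b)  # True
```

Without the sort, the two listings would be shuffled from different starting orders and would generally end up different even with the same seed.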
Define how we're going to split the data into a training and validation set.
splitter = RandomSplitter(valid_pct=0.2, seed=42)
In this dataset, each filename starts with the breed name. Cat breed names are capitalized and dog breed names are not, so we can get the label from the filename.
def cat_or_dog(x):
    return 'cat' if x[0].isupper() else 'dog'

def get_y(file_path):
    return cat_or_dog(file_path.name)
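A quick way to check the labelling rule is to call get_y on a couple of paths. The filenames below follow the breed_number.jpg pattern of this dataset but were typed in by hand, so treat them as examples rather than a listing of the actual files. The two functions are repeated so the snippet runs on its own:

```python
from pathlib import Path

# Same functions as above, repeated so this snippet is self-contained.
def cat_or_dog(x):
    return 'cat' if x[0].isupper() else 'dog'

def get_y(file_path):
    return cat_or_dog(file_path.name)

print(get_y(Path("images/Abyssinian_1.jpg")))         # cat
print(get_y(Path("images/american_bulldog_10.jpg")))  # dog

# Using .name matters: the full path string starts with the directory, not the breed.
print(cat_or_dog(str(Path("/data/Abyssinian_1.jpg"))))  # dog (wrong label!)
```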
Define a standard image-classification DataBlock.
dblock = DataBlock(blocks=(ImageBlock, CategoryBlock),
                   get_y=get_y,
                   splitter=splitter,
                   item_tfms=Resize(224))
Override shuffle_fn so that the images never actually get shuffled (batch order is consistent).
dataloaders = dblock.dataloaders(image_files, batch_size=9, shuffle_fn=lambda idxs: idxs)
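Roughly speaking, a shuffling data loader permutes the item indices with its shuffle function and then cuts the result into consecutive groups of batch_size. The sketch below is a simplified model built on that assumption. make_batches and the two shuffle functions are hypothetical and are not fastai's code. It shows why an identity shuffle_fn makes the first batch the same every time:

```python
import random

def make_batches(n_items, bs, shuffle_fn):
    """Simplified model (hypothetical): permute the indices, then chunk them."""
    idxs = shuffle_fn(list(range(n_items)))
    return [idxs[i:i + bs] for i in range(0, len(idxs), bs)]

def identity(idxs):
    # Like the lambda passed as shuffle_fn above: indices come back untouched.
    return idxs

rng = random.Random(0)

def random_perm(idxs):
    # An ordinary random shuffle: a new permutation on each call.
    return rng.sample(idxs, len(idxs))

print(make_batches(20, 9, identity)[0])     # [0, 1, 2, 3, 4, 5, 6, 7, 8] on every call
print(make_batches(20, 9, random_perm)[0])  # generally differs from call to call
```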
Since we set the shuffle_fn to the identity above, the images will always get loaded in the same order, so the first batch will always be the same:
batch = dataloaders.train.one_batch()
images_orig, labels = batch
images = images_orig.clone() # make a copy that we can modify later.
show_image_batch((images, labels))
Note: all of these operations are one-liners. If you find yourself writing lots of code, pause and think more (or ask for help).
What does each number represent?
images.shape
torch.Size([9, 3, 224, 224])
your answer here
Explain those numbers with the help of dataloaders.train.vocab.
TensorCategory([1, 0, 0, 0, 0, 1, 1, 0, 1])
['cat', 'dog']
your answer here
# your code here
Think about what the blank is.
# your code here
You'll need to use slicing to compute this. To make sure you're doing it right, first show the middle 3 images.
# your code here
across channels. Use show_images to show all of the images.
# your code here
# your code here
These are a bit more involved; you might need two or three lines of code.
The next exercise will require you to assign to slices. It will also require you to "skip" dimensions in slicing. To prepare, study what this does:
torch.Size([9, 3, 224, 224])
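One way to build intuition before touching the real batch is to experiment on a small tensor with the same layout. The sketch below uses a hypothetical 9×3×4×4 tensor t (the real batch is 9×3×224×224). It shows that ":" keeps a dimension whole while an integer index removes it. It also shows "..." (Python's Ellipsis), a PyTorch shorthand that the exercise does not require, which stands for "all the remaining dimensions":

```python
import torch

t = torch.zeros(9, 3, 4, 4)  # small stand-in with the same number of dimensions as images

print(t.shape)             # torch.Size([9, 3, 4, 4])
print(t[:, 0].shape)       # torch.Size([9, 4, 4]): ':' keeps dim 0, the index 0 removes dim 1
print(t[:, :, 1].shape)    # torch.Size([9, 3, 4]): skip two dims, index into dim 2
print(t[..., 1].shape)     # torch.Size([9, 3, 4]): '...' skips every dim before the last
print(t[0, ..., 1].shape)  # torch.Size([3, 4])
```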
Consider assigning to a slice along the first dimension. First, we'll assign to a single index on that dimension. Think about what shape we get from a single item on the first dimension:
torch.Size([3, 224, 224])
# The following three lines are equivalent in this case:
images[2] = 0.0
images[2] = torch.zeros(3, 224, 224)
images[2, :, :, :] = 0.0
images = images_orig.clone() # restore the original images
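The claim that those three assignments are equivalent is easy to check on a small ones-filled tensor that stands in for images (the helper fresh is hypothetical). The scalar 0.0 is broadcast to the (3, 4, 4) shape of a single item, which is why it can replace an explicit torch.zeros of that shape:

```python
import torch

def fresh():
    # Small stand-in for `images`: 9 items of shape (3, 4, 4), all ones.
    return torch.ones(9, 3, 4, 4)

a = fresh()
a[2] = 0.0                      # scalar, broadcast over the whole item
b = fresh()
b[2] = torch.zeros(3, 4, 4)     # explicit tensor with the item's shape
c = fresh()
c[2, :, :, :] = 0.0             # same as a, with the trailing dims spelled out

print(a[2].shape)                            # torch.Size([3, 4, 4])
print(torch.equal(a, b), torch.equal(b, c))  # True True
print(a.sum().item())                        # 384.0: the 8 untouched items still hold 8 * 48 ones
```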
Now the same thing, but with a slice:
# assign to first dimension. The following three lines are equivalent in this case:
images[2:4] = 0.0
images[2:4] = torch.zeros(2, 3, 224, 224)
images[2:4, :, :, :] = 0.0
images = images_orig.clone() # restore the original images
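With a slice, the selected region keeps its first dimension. In the small stand-in below, t[2:4] has shape (2, 3, 4, 4). Broadcasting also lets a value with fewer dimensions fill that region: a (3, 4, 4) tensor is reused along the missing leading dimension, while a value whose trailing sizes don't line up is rejected. Here is a sketch on a hypothetical small tensor:

```python
import torch

t = torch.ones(9, 3, 4, 4)   # small stand-in for `images`
print(t[2:4].shape)          # torch.Size([2, 3, 4, 4])

# (3, 4, 4) broadcasts to (2, 3, 4, 4): the same zeros are used for both selected items.
t[2:4] = torch.zeros(3, 4, 4)
print(t[2:4].sum().item(), t[4:].sum().item())  # 0.0 240.0

# (4, 4, 4) does not broadcast to (2, 3, 4, 4): its second-to-last... first size 4
# would have to line up with the 3 channels, so PyTorch raises an error.
try:
    t[2:4] = torch.zeros(4, 4, 4)
except RuntimeError as err:
    print("shape mismatch:", err)
```

Broadcasting compares shapes from the right, so (3, 4, 4) matches the last three dimensions of the slice and only the leading 2 is "stretched".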
Now we slice along the second dimension:
images[:, 0] = 0.0
images = images_orig.clone() # restore the original images
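Indexing the second dimension with 0 selects that channel in every image at once, so t[:, 0] has shape (9, 4, 4) in the hypothetical stand-in below. Reducing over every dimension except the channel one confirms that only channel 0 was cleared. Assuming the usual RGB channel order, that is the red channel:

```python
import torch

t = torch.ones(9, 3, 4, 4)   # small stand-in for `images`
print(t[:, 0].shape)         # torch.Size([9, 4, 4]): channel 0 of all 9 items

t[:, 0] = 0.0
# Sum over items, rows and columns, leaving one total per channel.
per_channel = t.sum(dim=(0, 2, 3))
print(per_channel.tolist())  # [0.0, 144.0, 144.0]  (9 items * 16 pixels = 144)
```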
Try this for the third dimension:
images[:, :, 30] = 0.0
images = images_orig.clone() # restore the original images
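Fixing an index on the third dimension selects one row of pixels in every channel of every image, which is why the assignment above shows up as a horizontal line. Here is the same thing on a hypothetical stand-in with 4×4 images:

```python
import torch

t = torch.ones(9, 3, 4, 4)   # small stand-in for `images`
print(t[:, :, 1].shape)      # torch.Size([9, 3, 4]): row 1 of every channel of every item

t[:, :, 1] = 0.0
for row in t[0, 0].tolist():  # channel 0 of item 0, printed row by row
    print(row)
# [1.0, 1.0, 1.0, 1.0]
# [0.0, 0.0, 0.0, 0.0]
# [1.0, 1.0, 1.0, 1.0]
# [1.0, 1.0, 1.0, 1.0]
```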
# your code here
# restore the original images for the next step
images = images_orig.clone()
# your code here