Image Operations¶

Task: perform broadcast and reduction operations on a tensor representing a batch of color images

Goal: The goal of this exercise is to get used to thinking about the shapes of multidimensional structures. A surprisingly large share of the work in implementing neural net code is getting the shapes right. I didn’t really believe that until I had to work it out myself a few times, which convinced me that everyone could use some guided practice.

Setup¶

from fastai.vision.all import *

# Make one-channel images display in greyscale.
# See https://forums.fast.ai/t/show-image-displays-color-image-for-mnist-sample-dataset/78932/4
# But "Grays" is inverted, so we use "gray" instead.
matplotlib.rc('image', cmap='gray')

Download dataset.

path = untar_data(URLs.PETS) / "images"

Make a stable order for the images: first sort, then randomize using a known seed.

set_seed(333)
image_files = get_image_files(path).sorted().shuffle()
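
To see why sorting before shuffling matters, here is a minimal sketch that uses only the standard library. The file names and the helper `stable_shuffle` are hypothetical, and it uses `random.Random` rather than fastai's `set_seed`, so it illustrates the idea and is not a description of what fastai does internally.

```python
import random

def stable_shuffle(names, seed):
    """Sort first (removes any dependence on listing order), then shuffle with a fixed seed."""
    ordered = sorted(names)
    random.Random(seed).shuffle(ordered)
    return ordered

# Hypothetical file names, listed in two different starting orders.
files_a = ["Bengal_1.jpg", "pug_7.jpg", "Abyssinian_3.jpg", "beagle_2.jpg"]
files_b = list(reversed(files_a))

order_a = stable_shuffle(files_a, seed=333)
order_b = stable_shuffle(files_b, seed=333)
print(order_a == order_b)  # same result regardless of the starting order
```

Without the sort, the shuffled result would depend on whatever order the file system happened to return.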

Define how we're going to split the data into training and validation sets.

splitter = RandomSplitter(valid_pct=0.2, seed=42)

In this dataset, cat breeds start with a capital letter, so we can get the label from the filename.

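def cat_or_dog(x):
    # Cat breed names are capitalized in this dataset; dog breed names are lowercase.
    return 'cat' if x[0].isupper() else 'dog'

def get_y(file_path):
    # Label each image from its file name alone.
    return cat_or_dog(file_path.name)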
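
A quick sanity check of the labeling rule, restated here so the snippet runs on its own. The file names are illustrative examples in the dataset's naming style, not a listing of actual files.

```python
from pathlib import Path

def cat_or_dog(x):
    return 'cat' if x[0].isupper() else 'dog'

def get_y(file_path):
    return cat_or_dog(file_path.name)

# Illustrative paths only; any directory prefix is ignored because get_y uses .name.
examples = [Path("images/Abyssinian_1.jpg"), Path("images/beagle_32.jpg")]
example_labels = {p.name: get_y(p) for p in examples}
print(example_labels)
```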

Define a standard image-classification DataBlock.

dblock = DataBlock(blocks    = (ImageBlock, CategoryBlock),
                   get_y     = get_y,
                   splitter  = splitter,
                   item_tfms = Resize(224))

Override shuffle_fn so that the images never actually get shuffled (batch order is consistent).

dataloaders = dblock.dataloaders(image_files, batch_size=9, shuffle_fn=lambda idxs: idxs)

Since we set the shuffle_fn to the identity above, the images will always get loaded in the same order, so the first batch will always be the same:

batch = dataloaders.train.one_batch()
images_orig, labels = batch
images = images_orig.clone()

show_image_batch((images, labels))

Task¶

Evaluate images.shape. What does each number represent?

images.shape

torch.Size([9, 3, 224, 224])

your answer here

Evaluate labels. Explain those numbers, with the help of dataloaders.train.vocab.

labels

TensorCategory([1, 0, 0, 0, 0, 1, 1, 0, 1])

dataloaders.train.vocab

['cat', 'dog']

your answer here

Show the first image in the batch. (Use show_image.)

# your code here

Show the average image. Hint: you can compute this by taking the .mean(axis=___); think about what the blank is.
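
As a warm-up for the hint, this sketch shows how reducing over each axis changes the shape of a small 4-dimensional array. It uses NumPy as a stand-in so it runs without the dataset; the array `toy` and its shape are made up for illustration. The axis you reduce over is the one that disappears from the shape.

```python
import numpy as np

# A made-up 4-D array; the sizes are arbitrary and all different so each axis is easy to spot.
toy = np.arange(2 * 3 * 4 * 5, dtype=float).reshape(2, 3, 4, 5)

# Reduce over each axis in turn and record the resulting shape.
shapes = {axis: toy.mean(axis=axis).shape for axis in range(toy.ndim)}
for axis, shape in shapes.items():
    print(f"mean over axis {axis}: {toy.shape} -> {shape}")
```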

# your code here

Show the average of the middle 3 images.

You'll need to use slicing to compute this.

# your code here

Show the grayscale version of all of the images.

Do this by making minimal changes to the previous exercise; do not import anything new.
For simplicity, just use an equal weighting of the red, green, and blue channels.
You can use show_images to show all of the images.
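
To make "equal weighting" concrete, the sketch below checks on a made-up array that adding the three channels and dividing by 3 gives the same result as a mean over the channel axis. It uses NumPy as a stand-in and assumes the channels sit on axis 1, as in the `[9, 3, 224, 224]` shape above; the array `toy` is random data, not real images.

```python
import numpy as np

rng = np.random.default_rng(0)
toy = rng.random((2, 3, 4, 4))  # assumed layout: 2 images, 3 channels, 4x4 pixels

# Equal weighting written out by hand...
r, g, b = toy[:, 0], toy[:, 1], toy[:, 2]
by_hand = (r + g + b) / 3

# ...and as a reduction over the channel axis.
by_mean = toy.mean(axis=1)

print(by_hand.shape, np.allclose(by_hand, by_mean))
```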

# your code here

Invert the color of the images (e.g., black becomes white). Show the result.

# your code here

The next exercise will require you to assign to slices. It will also require you to "skip" dimensions in slicing. To prepare, study what this does:

images.shape

torch.Size([9, 3, 224, 224])

images[:, 0].shape

torch.Size([9, 224, 224])

images[:, :, 0, 0] = 0.0
print(images[0, 0, 0, 0], images[5, 1, 0, 0], images[0, 0, 5, 1])
# restore the original images
images = images_orig.clone()

TensorImage(0.) TensorImage(0.) TensorImage(0.8353)
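
The same pattern on a small made-up array makes it easier to count what changed. This is a NumPy sketch, not the batch itself: a bare `:` keeps every index along that dimension, so the assignment touches one pixel position in every channel of every image and nothing else.

```python
import numpy as np

toy = np.ones((2, 3, 4, 5))  # made-up stand-in: 2 images, 3 channels, 4x5 pixels

# ":" skips over the first two dimensions; only position (0, 0) is written.
toy[:, :, 0, 0] = 0.0

changed = int((toy == 0).sum())
print(changed)                           # one element per (image, channel) pair
print(toy[1, 2, 0, 0], toy[1, 2, 0, 1])  # the written position vs. its neighbor
```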

Black out (set to 0) the top 75 pixels of each image, then grey out (set to 0.5) the right 50 pixels of each image.

# your code here

# restore the original images for the next step
images = images_orig.clone()

Show only the red color channel by zeroing out the green and blue channels.

# your code here

Analysis¶

What does each number in images.shape represent?

your answer here

Explain the numbers in labels.

your answer here