Image Operations¶

Task: perform broadcast and reduction operations on a tensor representing a batch of color images

Goal: The goal of this exercise is to get used to thinking about the shapes of multidimensional structures. A surprisingly large amount of the thinking that goes into implementing neural-net code is getting the shapes right. I didn't really believe that until I had to figure it out myself a couple of times, and that experience convinced me that everyone could use some guided practice with it.

Setup¶

As usual, you don't need to understand the code in this section.

In [1]:
from fastai.vision.all import *

# Make one-channel images display in greyscale.
# See https://forums.fast.ai/t/show-image-displays-color-image-for-mnist-sample-dataset/78932/4
# But "Grays" is inverted, so we use "gray" instead.
matplotlib.rc('image', cmap='gray')

Download dataset.

In [2]:
path = untar_data(URLs.PETS) / "images"

Put the images in a stable, reproducible order: first sort, then shuffle with a known seed.

In [3]:
set_seed(333)
image_files = get_image_files(path).sorted().shuffle()

Define how we're going to split the data into training and validation sets.

In [4]:
splitter = RandomSplitter(valid_pct=0.2, seed=42)

In this dataset, cat breeds start with a capital letter, so we can get the label from the filename.

In [5]:
def cat_or_dog(x):
    return 'cat' if x[0].isupper() else 'dog'

def get_y(file_path):
    return cat_or_dog(file_path.name)

Define a standard image-classification DataBlock.

In [6]:
dblock = DataBlock(blocks    = (ImageBlock, CategoryBlock),
                   get_y     = get_y,
                   splitter  = splitter,
                   item_tfms = Resize(224))

Override shuffle_fn so that the images never actually get shuffled (batch order is consistent).

In [7]:
dataloaders = dblock.dataloaders(image_files, batch_size=9, shuffle_fn=lambda idxs: idxs)

Since we set the shuffle_fn to the identity above, the images will always get loaded in the same order, so the first batch will always be the same:

In [8]:
batch = dataloaders.train.one_batch()
images_orig, labels = batch
images = images_orig.clone() # make a copy that we can modify later.
In [9]:
show_image_batch((images, labels))

Task¶

Note: all of these operations are one-liners. If you find yourself writing lots of code, pause and think more (or ask for help).

  1. Evaluate images.shape. What does each number represent?
In [10]:
images.shape
Out[10]:
torch.Size([9, 3, 224, 224])

your answer here

  2. Evaluate labels. Explain those numbers, with the help of dataloaders.train.vocab.
In [11]:
labels
Out[11]:
TensorCategory([1, 0, 0, 0, 0, 1, 1, 0, 1])
In [12]:
dataloaders.train.vocab
Out[12]:
['cat', 'dog']

your answer here

  3. Show the first image in the batch. (Use show_image.)
In [13]:
# your code here
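One possible solution (a sketch, if you want to check your answer): indexing into the batch dimension gives a single (3, 224, 224) image, which show_image can display directly.

show_image(images[0])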
  4. Show the average image. Hint: you can compute this by taking the .mean(axis=___); think about what the blank is.
In [14]:
# your code here
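One possible solution as a sketch: averaging over the batch dimension (axis 0) collapses the 9 images into a single (3, 224, 224) average image.

show_image(images.mean(axis=0))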
  5. Show the average of the middle 3 images.

You'll need to use slicing to compute this. To make sure you're doing it right, first show the middle 3 images.

In [15]:
# your code here
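One possible solution as a sketch: with a batch of 9, the middle 3 images are at indices 3 through 5, so slice with images[3:6] and then average over the batch dimension as before.

show_images(images[3:6])              # sanity check: the middle 3 images
show_image(images[3:6].mean(axis=0))  # their average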
  6. Show the grayscale version of all of the images.
  • Do this by making minimal changes to the previous exercise; do not import anything new.
  • For simplicity, just use an equal weighting of the red, green, and blue channels, i.e., take the mean across channels.
  • You can use show_images to show all of the images.
In [16]:
# your code here
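One possible solution as a sketch: average over the channel dimension (axis 1) instead of the batch dimension, giving a (9, 224, 224) batch of single-channel images that show_images can display.

show_images(images.mean(axis=1))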
  7. Invert the colors of the images (e.g., black becomes white). Show the result.
In [17]:
# your code here
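One possible solution as a sketch, assuming the pixels are floats in [0, 1] (fastai's default after loading): subtract every pixel from 1.0; broadcasting applies the subtraction elementwise across the whole batch.

show_images(1.0 - images)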

Second Set of Tasks¶

These are a bit more involved; you might need two or three lines of code.

The next exercise will require you to assign to slices. It will also require you to "skip" dimensions in slicing. To prepare, study what this does:

In [18]:
images.shape
Out[18]:
torch.Size([9, 3, 224, 224])

Consider assigning to a slice along the first dimension. First, we'll assign to a single index on that dimension. Think about what shape we get from a single item on the first dimension:

In [19]:
images[0].shape
Out[19]:
torch.Size([3, 224, 224])
In [20]:
# The following three lines are equivalent in this case:
images[2] = 0.0
images[2] = torch.zeros(3, 224, 224)
images[2, :, :, :] = 0.0
show_images(images)
images = images_orig.clone() # restore the original images

Now the same thing, but with a slice:

In [21]:
# Assign to a slice along the first dimension. The following three lines are equivalent in this case:
images[2:4] = 0.0
images[2:4] = torch.zeros(2, 3, 224, 224)
images[2:4, :, :, :] = 0.0
show_images(images)
images = images_orig.clone() # restore the original images

Now we slice along the second dimension:

In [22]:
images[:, 0] = 0.0
show_images(images)
images = images_orig.clone() # restore the original images

Try this for the third dimension:

In [23]:
images[:, :, 30] = 0.0
show_images(images)
images = images_orig.clone() # restore the original images
  1. Black out (set to 0) the top 75 pixels of each image, then grey out (set to 0.5) the right 50 pixels of each image.
In [24]:
# your code here
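One possible solution as a sketch: the dimensions are (batch, channel, height, width), so the top 75 pixels are the first 75 rows along the height dimension, and the right 50 pixels are the last 50 columns along the width dimension.

images[:, :, :75, :] = 0.0   # black out the top 75 rows of every image
images[:, :, :, -50:] = 0.5  # grey out the rightmost 50 columns
show_images(images)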
In [25]:
# restore the original images for the next step
images = images_orig.clone()
  2. Show only the red color channel by zeroing out the green and blue channels.
In [26]:
# your code here
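One possible solution as a sketch: the channels along dimension 1 are ordered red, green, blue, so zero out channels 1 and 2 and keep channel 0.

images[:, 1:] = 0.0  # zero the green and blue channels, keeping red
show_images(images)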