Task: trace and explain the dimensionality of each tensor in a simple image classifier.
from fastai.vision.all import *
from fastbook import *
matplotlib.rc('image', cmap='Greys')
Get some example digits from the MNIST dataset.
path = untar_data(URLs.MNIST_SAMPLE)
threes = (path/'train'/'3').ls().sorted()
sevens = (path/'train'/'7').ls().sorted()
len(threes), len(sevens)
(6131, 6265)
Here is one image:
example_3 = Image.open(threes[1])
example_3
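To see the raw pixel values behind the picture, we can slice into the image as a tensor (a quick sketch; the exact slice indices are arbitrary):
# A small patch of the raw image: integers from 0 (background) to 255 (ink).
tensor(example_3)[4:10, 4:10]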
To prepare to use it as input to a neural net, we first convert the pixel values, which are integers from 0 to 255, into floating-point numbers between 0 and 1.
example_3_tensor = tensor(example_3).float() / 255
example_3_tensor.shape
torch.Size([28, 28])
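A quick check that the conversion worked as intended (a sketch):
# After dividing by 255, every pixel value lies between 0 and 1.
example_3_tensor.min(), example_3_tensor.max()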
height, width = example_3_tensor.shape
Our particular network will ignore the spatial relationships between features; later we'll meet network architectures that do pay attention to spatial neighbors. So we'll flatten the image tensor into a single vector of 28*28 = 784 values.
example_3_flat = example_3_tensor.view(width * height)
example_3_flat.shape
torch.Size([784])
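(As an aside, view reshapes without copying data; flatten gives the same result, as this quick sketch checks:)
# view(width * height) and flatten() produce the same 784-element tensor.
assert torch.equal(example_3_flat, example_3_tensor.flatten())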
We'll define a simple neural network (the 3-vs-7 classifier from chapter 4 of the book) as the sequential combination of three layers.
Terminology note: this is a Multi-Layer Perceptron (MLP) with one hidden layer of 30 features. It has one output feature (which we would train to give the log-odds of 3 vs. 7).
First we define each layer:
# Define the layers. This is where you'll try changing constants.
linear_1 = nn.Linear(in_features=784, out_features=30, bias=True)
relu_layer = nn.ReLU()
linear_2 = nn.Linear(in_features=30, out_features=1, bias=True)
# Then we put them together in sequence.
simple_net = nn.Sequential(
    linear_1,
    relu_layer,
    linear_2
)
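As a sanity check on all these dimensions, we can count the network's parameters (a sketch; the arithmetic follows directly from the shapes above):
# linear_1 has 30*784 weights + 30 biases = 23,550 parameters;
# linear_2 has 1*30 weights + 1 bias = 31; total 23,581.
sum(p.numel() for p in simple_net.parameters())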
Each of nn.Linear, nn.ReLU, and nn.Sequential is a PyTorch module. We can call a module with some input data to get the output data:
simple_net(example_3_flat)
tensor([-0.1385], grad_fn=<AddBackward0>)
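Since the output is a log-odds score (see the terminology note above), a sigmoid converts it to a probability between 0 and 1. A sketch, with the caveat that the network is untrained, so the number is meaningless for now:
# Sigmoid maps log-odds to a probability in (0, 1).
torch.sigmoid(simple_net(example_3_flat))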
Your turn:
The outputs of each layer are called activations, so we'll name the variables act1 for the activations of layer 1, and so forth. Each act is a function of the previous act (or of the input, for the first layer).
inp = example_3_flat
act1 = linear_1(inp)
act2 = relu_layer(act1)
act3 = linear_2(act2)
Now look at the values of act1, act2, and act3. (Code already provided; look at the results.)
act1
tensor([-0.1971, -0.2886, 0.2023, -0.0984, 0.1338, -0.1604, 0.2701, -0.3103, 0.2313, 0.1280, -0.3245, 0.1302, -0.1761, -0.1394, 0.0234, -0.1384, 0.3531, 0.5236, -0.1388, 0.1109, 0.0033,
0.1793, -0.3673, -0.0706, -0.1324, -0.4853, 0.3566, 0.1476, -0.2868, -0.0929], grad_fn=<AddBackward0>)
act2
tensor([0.0000, 0.0000, 0.2023, 0.0000, 0.1338, 0.0000, 0.2701, 0.0000, 0.2313, 0.1280, 0.0000, 0.1302, 0.0000, 0.0000, 0.0234, 0.0000, 0.3531, 0.5236, 0.0000, 0.1109, 0.0033, 0.1793, 0.0000, 0.0000,
0.0000, 0.0000, 0.3566, 0.1476, 0.0000, 0.0000], grad_fn=<ReluBackward0>)
act3
tensor([-0.1385], grad_fn=<AddBackward0>)
How would you describe the relationship between act1 and act2? Specifically, what happened to the negative values?
Your turn: write code to look at the shape of act1, act2, and act3.
# your code here
act1.shape, act2.shape, act3.shape
(torch.Size([30]), torch.Size([30]), torch.Size([1]))
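Returning to the question about act1 and act2: one way to test a hypothesis numerically is a check like this (a sketch, assuming the relationship you suspect is a ReLU clamp):
# True if act2 is act1 with every negative value replaced by zero.
torch.equal(act2, torch.relu(act1))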
Your turn: fill in each shape below in terms of linear_1.in_features, linear_2.out_features, etc. (Ignore the torch.Size( part.) For example:
linear_1.in_features
784
act1_shape = [linear_1.out_features]
act2_shape = [linear_1.out_features]  # ReLU doesn't change the shape
act3_shape = [linear_2.out_features]
assert list(act1_shape) == list(act1.shape)
assert list(act2_shape) == list(act2.shape)
assert list(act3_shape) == list(act3.shape)
Next, look at the shape of linear_1.weight and linear_1.bias, and the same for linear_2. Write expressions that give the value of each shape in terms of in_features and the other layer parameters.
print(f"Linear 1: Weight shape is {list(linear_1.weight.shape)}, bias shape is {list(linear_1.bias.shape)}")
print(f"Linear 2: Weight shape is {list(linear_2.weight.shape)}, bias shape is {list(linear_2.bias.shape)}")
Linear 1: Weight shape is [30, 784], bias shape is [30]
Linear 2: Weight shape is [1, 30], bias shape is [1]
linear_1_weight_shape = [linear_1.out_features, linear_1.in_features]
linear_1_bias_shape = [linear_1.out_features]
linear_2_weight_shape = [linear_2.out_features, linear_2.in_features]
linear_2_bias_shape = [linear_2.out_features]
assert list(linear_1_weight_shape) == list(linear_1.weight.shape)
assert list(linear_1_bias_shape) == list(linear_1.bias.shape)
assert list(linear_2_weight_shape) == list(linear_2.weight.shape)
assert list(linear_2_bias_shape) == list(linear_2.bias.shape)
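To make the weight shapes concrete, here's a sketch that reproduces the network's output by hand with matrix multiplies (the assert is just a self-check; @ is matrix multiplication and .T is transpose):
# linear_1.weight has shape [30, 784], so we multiply by its transpose.
manual_act1 = inp @ linear_1.weight.T + linear_1.bias          # [784] -> [30]
manual_act2 = torch.relu(manual_act1)                          # [30]  -> [30]
manual_act3 = manual_act2 @ linear_2.weight.T + linear_2.bias  # [30]  -> [1]
assert torch.allclose(manual_act3, simple_net(example_3_flat))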
Finally, identify an example in the code above of each of the following, and describe its role:
the nn.Linear modules. (your answer here)
act1 and act2. (your answer here)
a Linear layer's parameters (weight and bias). (your answer here)