In this unit we extend our modeling skills to encompass classification models, and start to build the tools that will let us represent complex functions by using hidden layers. Both of these objectives require us to learn about nonlinear operations. We’ll focus on the two most commonly used ones: the softmax operator (which converts scores to probabilities) and the rectifier (“ReLU”, which replaces negative values with zero).
We’ll be doing some automatic differentiation this week:
autograd-for-dummies: A minimal autograd engine and neural network library for machine learning students.

softmax input: a vector x of shape (n,)
softmax output: a vector y of shape (n,) where y[i] is the probability that x is in class i.
y[i] = exp(x[i]) / sum(exp(x))

Jargon:
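The definition above can be sketched in a few lines of NumPy. (Subtracting the max before exponentiating is an extra numerical-stability trick, not part of the definition; it shifts every score equally, so the resulting probabilities are unchanged.)

```python
import numpy as np

def softmax(x):
    # subtract the max for numerical stability; probabilities are unchanged
    e = np.exp(x - np.max(x))
    return e / e.sum()

probs = softmax(np.array([1.0, 2.0, -1.0]))
# probs is nonnegative, sums to 1, and the largest score gets the largest probability
```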
Open the softmax and cross-entropy interactive demo that Prof Arnold created.
Try adjusting the logits (the inputs to softmax) and get a sense for how the outputs change. Describe the outputs when:
Finally, describe the input that gives the largest possible value for output 1.
Softmax, part 1
(name: u04n2-softmax.ipynb; show preview,
open in Colab)
logistic_regression input: an array X of shape (samples, features)
logistic_regression output: an array y of shape (samples, classes) where y[i, j] is the probability that sample i is in class j.
logits = X @ W + b, where W is an array of shape (features, classes) and b is an array of shape (classes,).
logits is then passed through the softmax function to get the output probabilities: y = softmax(logits).
softmax(x) = exp(x) / sum(exp(x)), where the sum is taken across the classes.
loss_i = -sum(y_true_onehot_i * log(y_pred_i))

Jargon:
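The whole forward pass above (logits, row-wise softmax, per-sample cross-entropy loss) can be sketched in NumPy. The sizes and labels here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
samples, features, classes = 4, 2, 3     # arbitrary sizes for illustration

X = rng.normal(size=(samples, features))
W = rng.normal(size=(features, classes))
b = np.zeros(classes)

logits = X @ W + b                                    # shape: (samples, classes)
shifted = logits - logits.max(axis=1, keepdims=True)  # numerical-stability shift
y_pred = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)  # rows sum to 1

y_true_onehot = np.eye(classes)[[0, 2, 1, 1]]         # made-up true labels, one-hot encoded
loss = -np.sum(y_true_onehot * np.log(y_pred), axis=1)  # per-sample cross-entropy
```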
Imports:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
Building a model object with the desired architecture (structure):
model = nn.Linear(in_features=2, out_features=3, bias=True)
# or
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(in_features=2, out_features=3, bias=True)

    def forward(self, x):
        return self.linear(x)
model = Model()
# or
n_hidden = 100
model = nn.Sequential(
    nn.Linear(in_features=2, out_features=n_hidden, bias=True),
    nn.ReLU(),
    nn.Linear(in_features=n_hidden, out_features=3, bias=True)
)
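Whichever of the three constructions you use, the model maps a batch of 2-feature inputs to 3 raw scores per sample. A quick sanity check (the batch size of 5 is arbitrary):

```python
import torch
import torch.nn as nn

model = nn.Linear(in_features=2, out_features=3, bias=True)
x = torch.randn(5, 2)    # a batch of 5 samples with 2 features each
scores = model(x)        # one raw score ("logit") per class for each sample
```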
Training a model:
loss_fn = nn.MSELoss()  # for classification, nn.CrossEntropyLoss() is the usual choice
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # the learning rate must be specified
# in a training loop ...
optimizer.zero_grad()  # clear gradients accumulated from the previous step
y_pred = model(x)
loss = loss_fn(y_pred, y_true)
loss.backward()
optimizer.step()
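Putting the pieces together, a minimal end-to-end classification loop might look like the sketch below. The data, step count, and learning rate are made up for illustration; note that nn.CrossEntropyLoss expects raw logits and integer class labels, and applies softmax internally.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(100, 2)                # 100 samples, 2 features (made-up data)
y_true = torch.randint(0, 3, (100,))   # integer class labels in {0, 1, 2}

model = nn.Linear(in_features=2, out_features=3)
loss_fn = nn.CrossEntropyLoss()        # takes raw logits + integer labels
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(200):
    optimizer.zero_grad()              # clear gradients from the previous step
    logits = model(X)                  # shape: (100, 3)
    loss = loss_fn(logits, y_true)
    loss.backward()                    # compute gradients
    optimizer.step()                   # update the weights
```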
We’re classifying houses as low/medium/high price based on longitude and latitude using logistic regression. The model outputs 3 scores, one for each class. For 100 houses (processed all at once in a “batch” of samples):
a. What shape is X? X.shape =
b. What shape should W (the array of weights) be? W.shape =
c. What shape should b (the array of biases) be? b.shape =
d. What shape will the output have? (X @ W + b).shape =
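After working these out on paper, you can check your shape reasoning numerically. This sketch just instantiates zero arrays with the sizes stated in the problem (100 houses, 2 location features, 3 price classes) and lets NumPy report the result:

```python
import numpy as np

n_samples, n_features, n_classes = 100, 2, 3   # 100 houses, 2 coordinates, 3 price classes
X = np.zeros((n_samples, n_features))
W = np.zeros((n_features, n_classes))
b = np.zeros(n_classes)
out = X @ W + b    # b broadcasts across the batch dimension
print(out.shape)
```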
For one house, if our model outputs scores [1.0, 2.0, -1.0] for low/med/high prices:
Write the steps to convert these scores to probabilities that sum to 1. (You can use words or math notation.)
If the true label for this house is “medium”, what’s the model’s accuracy and loss for this house? (You can use words or math notation.)
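Once you have answered on paper, a few lines of NumPy can check the arithmetic. Here index 1 encodes "medium", matching the low/med/high ordering of the scores:

```python
import numpy as np

scores = np.array([1.0, 2.0, -1.0])           # low / medium / high
probs = np.exp(scores) / np.exp(scores).sum() # softmax: exponentiate, then normalize
predicted = int(np.argmax(probs))             # the class with the highest probability
loss = -np.log(probs[1])                      # cross-entropy when the true class is "medium"
```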
From Linear Regression in NumPy to Logistic Regression in PyTorch
(name: u04n3-logreg-pytorch.ipynb; show preview,
open in Colab)