In this unit we extend our modeling skills to encompass classification models, and start to build the tools that will let us represent complex functions by using hidden layers. Both of these objectives require us to learn about nonlinear operations. We’ll focus on the two most commonly used ones: the softmax operator (which converts scores to probabilities) and the rectifier (“ReLU”, which replaces negative values with zero).
We’ll be doing some automatic differentiation this week:
autograd-for-dummies: A minimal autograd engine and neural network library for machine learning students.

softmax input: a vector x of shape (n,)
softmax output: a vector y of shape (n,) where y[i] is the probability that x is in class i.
y[i] = exp(x[i]) / sum(exp(x))

Jargon:
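The definition above can be sketched in a few lines of NumPy. (Subtracting the max before exponentiating is an extra numerical-stability trick, not part of the definition; it shifts every score equally, so the resulting probabilities are unchanged.)

```python
import numpy as np

def softmax(x):
    # subtract the max for numerical stability; probabilities are unchanged
    e = np.exp(x - np.max(x))
    return e / e.sum()

probs = softmax(np.array([1.0, 2.0, -1.0]))
# probs is nonnegative, sums to 1, and the largest score gets the largest probability
```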
Open the softmax and cross-entropy interactive demo that Prof Arnold created.
Try adjusting the logits (the inputs to softmax) and get a sense for how the outputs change. Describe the outputs when:
Finally, describe the input that gives the largest possible value for output 1.
Softmax, part 1
(name: u04n2-softmax.ipynb; show preview,
open in Colab)
logistic_regression input: an array X of shape (samples, features)
logistic_regression output: an array y of shape (samples, classes) where y[i, j] is the probability that sample i is in class j.
logits = X @ W + b, where W is an array of shape (features, classes) and b is an array of shape (classes,).
logits is then passed through the softmax function to get the output probabilities: y = softmax(logits).
softmax(x) = exp(x) / sum(exp(x)), where the sum is taken across the classes.
loss_i = -sum(y_true_onehot_i * log(y_pred_i))

Jargon:
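The whole forward pass above (logits, row-wise softmax, per-sample cross-entropy loss) can be sketched in NumPy. The sizes and labels here are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
samples, features, classes = 4, 2, 3     # arbitrary sizes for illustration

X = rng.normal(size=(samples, features))
W = rng.normal(size=(features, classes))
b = np.zeros(classes)

logits = X @ W + b                                    # shape: (samples, classes)
shifted = logits - logits.max(axis=1, keepdims=True)  # numerical-stability shift
y_pred = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)  # rows sum to 1

y_true_onehot = np.eye(classes)[[0, 2, 1, 1]]         # made-up true labels, one-hot encoded
loss = -np.sum(y_true_onehot * np.log(y_pred), axis=1)  # per-sample cross-entropy
```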
Imports:
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
Building a model object with the desired architecture (structure):
model = nn.Linear(in_features=2, out_features=3, bias=True)
# or
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(in_features=2, out_features=3, bias=True)

    def forward(self, x):
        return self.linear(x)
model = Model()
# or
n_hidden = 100
model = nn.Sequential(
    nn.Linear(in_features=2, out_features=n_hidden, bias=True),
    nn.ReLU(),
    nn.Linear(in_features=n_hidden, out_features=3, bias=True)
)
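Whichever of the three constructions you use, the model maps a batch of 2-feature inputs to 3 raw scores per sample. A quick sanity check (the batch size of 5 is arbitrary):

```python
import torch
import torch.nn as nn

model = nn.Linear(in_features=2, out_features=3, bias=True)
x = torch.randn(5, 2)    # a batch of 5 samples with 2 features each
scores = model(x)        # one raw score ("logit") per class for each sample
```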
Training a model:
loss_fn = nn.MSELoss()  # for classification, nn.CrossEntropyLoss() is the usual choice
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # the learning rate must be specified
# in a training loop ...
optimizer.zero_grad()  # clear gradients accumulated from the previous step
y_pred = model(x)
loss = loss_fn(y_pred, y_true)
loss.backward()
optimizer.step()
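Putting the pieces together, a minimal end-to-end classification loop might look like the sketch below. The data, step count, and learning rate are made up for illustration; note that nn.CrossEntropyLoss expects raw logits and integer class labels, and applies softmax internally.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(100, 2)                # 100 samples, 2 features (made-up data)
y_true = torch.randint(0, 3, (100,))   # integer class labels in {0, 1, 2}

model = nn.Linear(in_features=2, out_features=3)
loss_fn = nn.CrossEntropyLoss()        # takes raw logits + integer labels
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(200):
    optimizer.zero_grad()              # clear gradients from the previous step
    logits = model(X)                  # shape: (100, 3)
    loss = loss_fn(logits, y_true)
    loss.backward()                    # compute gradients
    optimizer.step()                   # update the weights
```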
We’re classifying houses as low/medium/high price based on longitude and latitude using logistic regression. The model outputs 3 scores, one for each class. For 100 houses (processed all at once in a “batch” of samples):
a. What shape is X? X.shape =
b. What shape should W (the array of weights) be? W.shape =
c. What shape should b (the array of biases) be? b.shape =
d. What shape will the output have? (X @ W + b).shape =
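After working these out on paper, you can check your shape reasoning numerically. This sketch just instantiates zero arrays with the sizes stated in the problem (100 houses, 2 location features, 3 price classes) and lets NumPy report the result:

```python
import numpy as np

n_samples, n_features, n_classes = 100, 2, 3   # 100 houses, 2 coordinates, 3 price classes
X = np.zeros((n_samples, n_features))
W = np.zeros((n_features, n_classes))
b = np.zeros(n_classes)
out = X @ W + b    # b broadcasts across the batch dimension
print(out.shape)
```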
For one house, if our model outputs scores [1.0, 2.0, -1.0] for low/med/high prices:
Write the steps to convert these scores to probabilities that sum to 1. (You can use words or math notation.)
If the true label for this house is “medium”, what’s the model’s accuracy and loss for this house? (You can use words or math notation.)
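Once you have answered on paper, a few lines of NumPy can check the arithmetic. Here index 1 encodes "medium", matching the low/med/high ordering of the scores:

```python
import numpy as np

scores = np.array([1.0, 2.0, -1.0])           # low / medium / high
probs = np.exp(scores) / np.exp(scores).sum() # softmax: exponentiate, then normalize
predicted = int(np.argmax(probs))             # the class with the highest probability
loss = -np.log(probs[1])                      # cross-entropy when the true class is "medium"
```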
From Linear Regression in NumPy to Logistic Regression in PyTorch
(name: u04n3-logreg-pytorch.ipynb; show preview,
open in Colab)