Task: fit a linear regression using gradient descent, with gradients computed by backpropagation using PyTorch
import torch
torch.set_printoptions(precision=3)
from torch import tensor
import ipywidgets as widgets
import matplotlib.pyplot as plt
%matplotlib inline
Suppose we have a dataset with just a single feature x and continuous outcome variable y.
import pandas as pd
data = pd.read_csv("https://data.giss.nasa.gov/gistemp/graphs_v4/graph_data/Global_Mean_Estimates_based_on_Land_and_Ocean_Data/graph.csv", skiprows=1)
data.head()
|   | Year | No_Smoothing | Lowess(5) |
|---|---|---|---|
| 0 | 1880 | -0.16 | -0.09 |
| 1 | 1881 | -0.08 | -0.13 |
| 2 | 1882 | -0.11 | -0.16 |
| 3 | 1883 | -0.17 | -0.20 |
| 4 | 1884 | -0.28 | -0.24 |
# Shape x to be items-by-features
x = torch.tensor(data.iloc[:, 0], dtype=torch.float).view(-1, 1)
# scale x to a reasonable range
x -= 1880.0
x /= 100.
y_true = torch.tensor(data.iloc[:, 1]).view(-1, 1)
plt.plot(x, y_true);
print(x.shape, y_true.shape)
torch.Size([143, 1]) torch.Size([143, 1])
Create some features that we'll need later; don't worry about this code right now.
# a simple hardcoded linear layer
pretend_first_layer_out = x @ torch.ones(1, 2) + torch.tensor([0.0, -0.5])
# a rectifier
pretend_first_layer_activations = pretend_first_layer_out.clamp_min(0.0)
f1, f2 = pretend_first_layer_activations.T
We'll be using our Linear layer from the previous activity.
class Linear:
    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def forward(self, x):
        return x @ self.weights + self.bias
Next time: It was a bit confusing to students that this Linear layer did the matrix multiply and the bias term all at the same time. Perhaps we should have students start with just directly computing y_pred = x @ w + b and then later introduce the Linear layer.
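For reference, the two forms compute the same thing. Here is a minimal sketch comparing the direct expression with the Linear layer defined above; the toy tensors are made up for illustration and are not part of the activity.

# Hypothetical toy tensors, just to show the equivalence.
x_demo = torch.tensor([[0.0], [0.5], [1.0]])
w_demo = torch.tensor([[0.3]])
b_demo = torch.tensor([0.1])

y_direct = x_demo @ w_demo + b_demo                            # computing the prediction directly
y_layer = Linear(weights=w_demo, bias=b_demo).forward(x_demo)  # via the Linear layer
assert torch.allclose(y_direct, y_layer)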
This Fundamentals notebook will be a bit different from usual: you'll be editing a single block of code. So the final notebook won't show your intermediate steps. That doesn't mean they aren't important; make sure you successfully complete each step before moving on to the next one.
Here is a basic outline for fitting a model using PyTorch. We'll be filling in the blanks here later. Read the code and check that you understand the basic steps. We'll fill in the ...s in Step 1 below.
# Set hyperparameters
learning_rate = 0.1
num_iter = 50

# Initialize parameters
layer = Linear(
    weights=torch.tensor([[0.0]], requires_grad=True),
    bias=torch.tensor([0.0], requires_grad=True))

# Keep a log of loss values
errors = []

# Repeat for each iteration
for i in range(num_iter):
    # Forward pass
    # 1. Compute predictions
    # In Step 1, this is just doing y_pred = x * weights because `bias` will always be 0.
    y_pred = layer.forward(x)

    # 2. Compute loss
    resid = ...
    loss = ...
    errors.append(loss.item())

    # Backward pass: compute gradients
    loss.backward()

    # Descend the gradient.
    layer.weights.data -= learning_rate * layer.weights.grad

    # Clear out the gradient data so it doesn't corrupt the next iteration.
    layer.weights.grad.zero_()

# Plot the results
fig, axs = plt.subplots(ncols=2, figsize=(10, 4))
axs[0].plot(errors)
axs[0].set(title="Loss", xlabel="iteration", ylabel="Loss")
axs[1].scatter(x, y_true)
axs[1].plot(x, y_pred.detach(), 'r')
axs[1].set(title="Fitted model", xlabel='x', ylabel='y')
print(f"Final Loss: {errors[-1]:.3f}, final weights: {layer.weights.data.flatten()}, final bias: {layer.bias.data}")
Final Loss: 0.093, final weights: tensor([0.259]), final bias: tensor([0.])
layer.weights.data.flatten().tolist()
[0.25871825218200684]
Fill in the blanks in the code above so that it fits the model y_pred = weights * x to the data x and y_true in a way that minimizes the mean squared error (MSE).
Note that there is no intercept, so y_pred is 0.0 when x is 0.0.
Terminology: the residual (resid) is the difference between the true value and the prediction, y_true - y_pred; the mean squared error is the mean of the squared residuals.
You should achieve a loss between 0.09 and 0.11.
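If the pattern is unfamiliar, here is a minimal sketch of a residual and MSE computation on made-up values (not the dataset above); the blanks above follow the same pattern.

# Hypothetical values, just to illustrate the arithmetic.
y_true_demo = torch.tensor([[1.0], [2.0], [3.0]])
y_pred_demo = torch.tensor([[0.5], [2.5], [2.0]])

resid_demo = y_true_demo - y_pred_demo   # residuals: true minus predicted
mse_demo = resid_demo.pow(2).mean()      # mean squared error
print(mse_demo)  # tensor(0.500)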
# don't write new code here; edit the code block above.
The model currently has a single parameter, weights. Now, incorporate another parameter into the model, bias. The new parameter should have the same shape as the output (in this case, a single number). The prediction equation will now be y_pred = weights * x + bias. Use the same data and loss function.
Terms: the bias (also called the intercept) is the constant added to every prediction; it is the value of y_pred when x is 0.0.
You can initialize bias to 0 just like weights. (In general you'd use a random number, but this problem is well-behaved so zero will work.)
You should get an MSE of between 0.03 and 0.035.
Note: Make sure that you descend the gradient of the bias, and also that you zero this gradient each time, just like we do for the weights. You probably need to increase the number of iterations or tune the learning rate.
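The bias update mirrors the weight update. Here is a minimal self-contained sketch of that step, using a made-up gradient value and learning rate instead of ones produced by a real training loop.

# Hypothetical stand-in for layer.bias; the gradient is made up here,
# but in the real loop it is filled in by loss.backward().
bias_demo = torch.tensor([0.0], requires_grad=True)
bias_demo.grad = torch.tensor([0.2])
lr_demo = 0.1  # same spirit as learning_rate in the block above

# Descend the gradient of the bias, just like the weights...
bias_demo.data -= lr_demo * bias_demo.grad
# ...and zero it so it doesn't accumulate into the next iteration.
bias_demo.grad.zero_()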
Now, minimize the Mean Absolute Error (mean of abs(resid)) instead of the MSE. Use the same model y_pred = weights * x + bias and data.
You should get an MAE of around 0.15.
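Only the loss line changes; the rest of the loop stays the same. A minimal sketch on made-up residuals comparing the two losses:

resid_demo = torch.tensor([[0.5], [-0.5], [1.0]])  # hypothetical residuals

mse_demo = resid_demo.pow(2).mean()   # mean squared error (Steps 1 and 2)
mae_demo = resid_demo.abs().mean()    # mean absolute error (this step)
print(mse_demo, mae_demo)  # MSE 0.5, MAE about 0.667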
Note that the data seems to have different slopes in different parts. Ideally, rather than fitting a single line, we should fit a model where the slopes can change.
We can actually do this using linear regression in the same way as we did before. We can think of linear regression as mixing together some functions. In Steps 2 and 3 we mixed together two functions: an "always going up" function for x and a flat function, i.e., $f_1(x) = x$ and $f_2(x) = 1.0$.
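As a quick check of that framing, here is a minimal sketch (with made-up mixing weights) showing that mixing $f_1(x) = x$ and $f_2(x) = 1.0$ gives back exactly the straight-line model from Steps 2 and 3.

x_demo = torch.linspace(0.0, 1.4, 5).view(-1, 1)  # hypothetical inputs
w1_demo, w2_demo = 0.3, -0.1                      # hypothetical mixing weights

f1_demo = x_demo                     # "always going up": f1(x) = x
f2_demo = torch.ones_like(x_demo)    # flat: f2(x) = 1.0

mixed = w1_demo * f1_demo + w2_demo * f2_demo
line = 0.3 * x_demo + (-0.1)         # the familiar weights * x + bias form
assert torch.allclose(mixed, line)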
Let's add another function into the mix: a hinge that stays at zero until x reaches 0.5 and then rises with x. In the code below, f1 is the always-going-up function and f2 is this new hinge (the flat function is still covered by the bias term). The easiest way to see what the hinge does is to plot it:
plt.plot(x, f1, label="always going up")
plt.plot(x, f2, label="hinge at 0.5")
plt.legend();
Why might that help us? Well, we can mix them together. Try adjusting the mixing weights to fit the data.
r = 2.0
@widgets.interact(w1=(-r, r), w2=(-r, r), bias=(-1.0, 1.0))
def plot_linreg(w1=.1, w2=-.3, bias=0.0):
    y_pred = w1 * f1 + w2 * f2 + bias
    plt.scatter(x, y_true)
    plt.plot(x, y_pred, 'r')
    # Flatten y_true so the subtraction doesn't broadcast (143, 1) against (143,).
    resid = y_true.flatten() - y_pred
    mse = resid.pow(2).mean()
    mae = resid.abs().mean()
    print(f"MSE: {mse}, MAE: {mae}")
[interactive output: sliders for w1, w2, and bias controlling the fitted line]
Now, switch back to MSE, but edit your code block so that it fits the model y_pred = w1 * f1 + w2 * f2 + bias. Use MSE loss and the same data.
We've given you a suggestively-named pretend_first_layer_activations that has f1 and f2 as its two columns. So you can use that as the x for your model. First, study the following code block to see how the shapes work. Try changing torch.zeros(2, 1) to torch.zeros(1, 2) and be able to explain the error you get.
Once you understand that, adapt the code block to use the pretend_first_layer_activations (a sketch of the adapted parameter initialization follows the shape demo below).
You should get an MSE between 0.01 and 0.03, i.e., significantly better than the first model.
example_weight_mat = torch.zeros(2, 1)
print("weight matrix:", example_weight_mat)
print("input has shape", pretend_first_layer_activations.shape)
print("example_weight_mat has shape", example_weight_mat.shape)
print("Matrix multiply has shape", (pretend_first_layer_activations @ example_weight_mat).shape)
weight matrix: tensor([[0.],
[0.]])
input has shape torch.Size([143, 2])
example_weight_mat has shape torch.Size([2, 1])
Matrix multiply has shape torch.Size([143, 1])
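When you adapt the main block, the piece that has to change shape is the parameter initialization; the forward pass and loss stay the same. A minimal sketch of one reasonable initialization, matching the shape demonstration above:

# Two input features (f1 and f2) map to one output, so the weights are 2-by-1;
# the bias still has the same shape as a single output.
layer = Linear(
    weights=torch.zeros(2, 1, requires_grad=True),
    bias=torch.zeros(1, requires_grad=True))

print(layer.forward(pretend_first_layer_activations).shape)  # torch.Size([143, 1])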