Compute gradients using PyTorch¶
Task: compute the gradient of a simple function using PyTorch
So far we've had to compute gradients by numerical approximation. That approach is both unreliable and inefficient. Today we'll look at how PyTorch's autograd functionality lets us compute gradients exactly and efficiently. (Under the hood this uses backpropagation; we'll learn about that in a future lesson.)
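As a quick preview, here is a minimal sketch contrasting the two approaches on a toy function; the function g and the step size h are made up for illustration, and the autograd steps are exactly what we'll walk through below.
import torch

def g(w):
    return w * w + 3.0 * w

# Numerical approximation: nudge the input and measure the change in the output.
w = 2.0
h = 1e-4  # arbitrary step size; too large or too small hurts accuracy
numeric_grad = (g(w + h) - g(w)) / h
print(numeric_grad)  # roughly 7.0, with some approximation error

# Autograd: PyTorch records the computation and differentiates it exactly.
w_t = torch.tensor(2.0, requires_grad=True)
g(w_t).backward()
print(w_t.grad)  # tensor(7.)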
Setup¶
import torch
from torch import tensor
import matplotlib.pyplot as plt
%matplotlib inline
We now define a function of two variables:
def square(w):
    return w * w

def double(w):
    return 2 * w

def f(w1, w2):
    return double(w1) + square(w2) + 5.0
We evaluate it at a few values.
w1 = 1.0
w2 = 2.0
f(w1, w2)
11.0
How does it change when we increment w1 by a bit?
f(w1 + 0.1, w2)
11.2
How does it change when we increment w2 by a bit?
f(w1, w2 + 0.1)
11.41
Reflection: which variable has the larger effect on the output?
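One way to make that comparison concrete is to turn each increment into an approximate slope (a finite difference). This is a small sketch using the f and the point (w1, w2) = (1.0, 2.0) defined above; the step size h = 0.1 matches the increments we just tried.
h = 0.1
# Approximate slope with respect to each variable at (w1, w2) = (1.0, 2.0)
slope_w1 = (f(w1 + h, w2) - f(w1, w2)) / h
slope_w2 = (f(w1, w2 + h) - f(w1, w2)) / h
print(slope_w1)  # 2.0 (up to floating-point rounding)
print(slope_w2)  # about 4.1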
f is a function of two variables. Let's sweep each one and see how it changes.
# make two subplots
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5), sharey=True)
# plot f(w1, w2) as a function of w1
w1s = torch.arange(-3, 3, 0.1)
w2 = 2.0
ax1.plot(w1s, f(w1s, w2))
# make a dot at the point (w1, w2)
ax1.plot(w1, f(w1, w2), 'ro')
# label the dot
ax1.text(w1, f(w1, w2), ' ({:.1f}, {:.1f})'.format(w1, w2))
ax1.set_xlabel('w1')
ax1.set_ylabel('f(w1, w2)')
ax1.set_title('f(w1, w2) as a function of w1')
# plot f(w1, w2) as a function of w2
w1 = 1.0
w2s = torch.arange(-3, 3, 0.1)
ax2.plot(w2s, f(w1, w2s))
# make a dot at the point (w1, w2)
ax2.plot(w2, f(w1, w2), 'ro')
# label the dot
ax2.text(w2, f(w1, w2), ' ({:.1f}, {:.1f})'.format(w1, w2))
ax2.set_xlabel('w2')
ax2.set_ylabel('f(w1, w2)')
ax2.set_title('f(w1, w2) as a function of w2');
At the marked point, what is the slope of the line when we increment w1 by a bit?
At the marked point, what is the slope of the line when we increment w2 by a bit?
Task 1¶
Compute the gradient of f with respect to w1, when w1 = 1.0 and w2 = 2.0.
Steps:
- Initialize the input tensors. Tell PyTorch to track their gradients.
w1 = torch.tensor(1.0, requires_grad=True)
w2 = ...
Note that these tensors have a grad attribute, but it's currently None because we haven't yet defined what we want to compute the gradient of.
- Call the function to get the output.
result = f(w1, w2)
result
tensor(11., grad_fn=<AddBackward0>)
Notice that the output is a tensor with a single element. We can get the value of that element using .item(). It also has a grad_fn attribute, which you'll notice is an AddBackward0 object. This is a PyTorch object that represents the operation that was performed to get this result. We'll learn more about this later; for now, just know that it's how PyTorch keeps track of the computation that was performed. (This is what makes it possible to compute the gradient.) Look at the definition of f to see if you can figure out which addition was performed.
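If you're curious, you can poke at this bookkeeping interactively. This is just an exploratory sketch; next_functions is an internal attribute whose exact contents can vary between PyTorch versions.
# Each grad_fn points back at the operations that produced its inputs.
print(result.grad_fn)                 # AddBackward0 (the outermost +)
print(result.grad_fn.next_functions)  # the operations feeding into that addition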
Note that the grad attribute of the input tensors is still None. That's because we haven't yet told PyTorch to run backpropagation.
w1.grad, w2.grad
(None, None)
- Call backward() on the result. This will compute the gradient of the result with respect to the input tensors. It doesn't return anything, but it does set the .grad attribute of the input tensors.
result.backward()
The gradient is now stored in w1.grad.
w1.grad
tensor(2.)
Note: you might not get the value of 2.0 for a few reasons:
- If you didn't call backward() on the result, then the gradient won't be computed. Remember that Jupyter notebooks don't run cells in order, only when you tell them to. So if you run the cell that prints the gradient before the cell that calls backward(), the gradient won't have been computed yet.
- If you re-run the result = line and call backward() again, the new gradient will be added to the previous value, so you'll get 2.0 + 2.0 = 4.0. To fix this, you can either restart the kernel and run all the cells again, or reset the .grad attribute of the input tensors (e.g., set w1.grad = None) before calling backward() again.
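Here is a short sketch that demonstrates the accumulation pitfall from the second bullet and one way to reset the gradient; assigning None to .grad is one common option.
w1 = torch.tensor(1.0, requires_grad=True)
w2 = torch.tensor(2.0, requires_grad=True)

# Run backward twice without resetting: the gradients add up.
f(w1, w2).backward()
print(w1.grad)  # tensor(2.)
f(w1, w2).backward()
print(w1.grad)  # tensor(4.) -- accumulated, not recomputed

# Reset before the next backward pass.
w1.grad = None
w2.grad = None
f(w1, w2).backward()
print(w1.grad)  # tensor(2.) again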
Check that this value matches the slope of the line at the marked point. (Which slope are we talking about? The one when we increment w1 by a bit, or the one when we increment w2 by a bit?)
Task 2¶
Compute the gradient of f with respect to w2, when w1 = 1.0 and w2 = 2.0. (i.e., same point, but evaluating the gradient wrt a different parameter.)
w1 = torch.tensor(1.0, requires_grad=True)
w2 = torch.tensor(2.0, requires_grad=True)
# your code here
tensor(4.)
Check that this value matches the slope of the line at the marked point. (Which slope are we talking about? The one when we increment w1 by a bit, or the one when we increment w2 by a bit?)
Analysis¶
Repeat both tasks above for several other values of w1 and w2. Also look at the definition of f and recall what you learned about derivatives in Calculus. Based on that:
- Write a simple mathematical expression that evaluates to the value of w1.grad for any values of w1 and w2. Use only basic math operations like + or *; don't use any autograd functionality (like .backward()). The correct solution here is extremely simple!
w1_grad = ...
- Write a simple mathematical expression that evaluates to the value of w2.grad for any values of w1 and w2.
Make sure that you understand why this is different from the value of w1.grad.
w2_grad = ...
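Once you have candidate expressions, a quick way to check them is to compare against autograd at several points. Here is a minimal sketch of such a check, with the expressions left as placeholders for you to fill in.
# Compare your closed-form expressions against autograd at a few points.
for w1_val, w2_val in [(1.0, 2.0), (-0.5, 3.0), (4.0, -1.0)]:
    w1 = torch.tensor(w1_val, requires_grad=True)
    w2 = torch.tensor(w2_val, requires_grad=True)
    f(w1, w2).backward()

    w1_grad = ...  # your expression, using only w1_val, w2_val, and basic math
    w2_grad = ...  # your expression, using only w1_val, w2_val, and basic math

    # Once filled in, each pair should match.
    print(w1.grad, w1_grad)
    print(w2.grad, w2_grad)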