How to Compute Gradients
- Numerical approximation: $\frac{f(x+h) - f(x)}{h}$ for a small $h$
  - Pros: Easy to implement
  - Cons: Computationally expensive, only approximate (a sketch comparing the three approaches follows this list)
- Symbolic differentiation: $\frac{df}{dx}$
  - Pros: Accurate
  - Cons: Can produce unwieldy expressions
- Automatic differentiation: grad(f, x)
  - Pros: Accurate, efficient, works even with billions of parameters
  - Cons: Can be hard to debug, requires intermediate values to be stored
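
To make the trade-offs concrete, here is a minimal sketch comparing the three approaches on one example function; the function, step size, and the choice of sympy for the symbolic case are illustrative assumptions (it presumes torch and sympy are installed):

```python
# Compare the three approaches on f(x) = x**2 + 3*x,
# whose derivative is 2*x + 3, so f'(2) = 7.
import torch
import sympy

def f(x):
    return x**2 + 3*x

x0, h = 2.0, 1e-5

# 1. Numerical approximation: (f(x+h) - f(x)) / h
numerical = (f(x0 + h) - f(x0)) / h        # ~7.00001, off by roughly h

# 2. Symbolic differentiation with sympy
x_sym = sympy.symbols('x')
symbolic = sympy.diff(f(x_sym), x_sym)     # 2*x + 3, an exact expression

# 3. Automatic differentiation with PyTorch
x = torch.tensor(x0, requires_grad=True)
f(x).backward()

print(numerical, symbolic, x.grad.item())  # ~7.00001, 2*x + 3, 7.0
```

The numerical estimate is off by roughly $h$, the symbolic result is an expression rather than a number, and autograd gives the exact value directly at the point we care about.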
PyTorch Autograd
PyTorch uses automatic differentiation to compute gradients.
Example:
import torch

# Let's call it "w" as if it were a weight in a neural network.
# requires_grad=True tells PyTorch to track operations on w so gradients can be computed.
w = torch.tensor(2.0, requires_grad=True)
y = w**2
y.backward()    # computes dy/dw and stores it in w.grad
print(w.grad)   # tensor(4.), since dy/dw = 2*w = 4
After calling y.backward(), the gradient of y with respect to w is stored in w.grad.
(Stochastic) Gradient Descent
If we want to minimize a function $f(w)$, we can use gradient descent:
- Initialize $w$ randomly
- Repeat:
  - Compute the gradient of $f$ with respect to $w$
  - Update $w$ by taking a small step (the learning rate) in the opposite direction of the gradient (a minimal sketch follows these steps)
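
As a concrete illustration, here is a minimal sketch of that loop; the function $g(w) = (w - 3)^2$, the learning rate, and the number of steps are just example choices:

```python
import random

# Minimal gradient descent sketch: minimize g(w) = (w - 3)^2,
# whose gradient is dg/dw = 2 * (w - 3); the minimum is at w = 3.
def grad_g(w):
    return 2 * (w - 3)

learning_rate = 0.1
w = random.uniform(-10, 10)    # initialize w randomly

for step in range(100):
    g = grad_g(w)              # compute the gradient of g with respect to w
    w = w - learning_rate * g  # move in the opposite direction of the gradient

print(w)  # close to 3 after 100 steps
```

The learning rate controls how big each step is: too large and the updates overshoot, too small and convergence is slow.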
If the function depends on some data (e.g., it’s the loss of a neural network computed on a batch of data), we often use stochastic gradient descent (SGD):
- Initialize $w$ randomly
- Repeat:
  - Sample a batch of data
  - Compute the gradient of the loss with respect to $w$ on the batch
  - Update $w$ by moving in the opposite direction of the gradient
We call it stochastic because each batch gives only a noisy (stochastic) estimate of the full gradient; a sketch of the loop in PyTorch follows.
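
Here is a minimal sketch of SGD using PyTorch autograd; the data, the model ($y \approx w x$), the batch size, and the learning rate are made-up examples:

```python
import torch

# Made-up example: fit y = w * x to noisy data with SGD; the "true" w is 3.0.
torch.manual_seed(0)
X = torch.linspace(-1, 1, 100)
Y = 3.0 * X + 0.1 * torch.randn(100)

w = torch.randn(1, requires_grad=True)   # initialize w randomly
learning_rate = 0.1

for step in range(200):
    # Sample a batch of data
    idx = torch.randint(0, 100, (10,))
    x_batch, y_batch = X[idx], Y[idx]

    # Compute the loss and its gradient with respect to w on the batch
    loss = ((w * x_batch - y_batch) ** 2).mean()
    loss.backward()

    # Update w by moving in the opposite direction of the gradient
    with torch.no_grad():
        w -= learning_rate * w.grad
        w.grad.zero_()

print(w.item())  # close to 3.0
```

In practice you would typically let an optimizer such as torch.optim.SGD perform the update and gradient-zeroing steps for you.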
Warm-Up Activity
Suppose we’re trying to minimize the function $f(p) = p^2 + 2p + 1$ using gradient descent:
from random import random

def f(p):
    return p**2 + 2*p + 1

def grad_f(p):
    # gradient of f with respect to p
    return ______________________

print(f(3)) # ______
print(grad_f(3)) # ______

# fill in the blank to minimize
p = random()
for i in range(100):
    p = ______________________
What should we put in the blanks to minimize the function?
Notebooks
Compute gradients using PyTorch
(u06n2-compute-grad-pytorch.ipynb)