Students who complete this unit will demonstrate that they can:
Describe the basic structure of a machine learning model.
Describe the overall approach of Stochastic Gradient Descent: how does it use information from a batch of data to improve the model's performance on that batch and on other data?
Describe the parameters of a linear layer and how they are used to compute its output.
Identify the following loss functions: Mean Squared Error and Mean Absolute Difference (also known as Mean Absolute Error, or L1 loss).
Trace the execution of a basic image classifier model using a fully-connected network.
Apply automatic differentiation (as implemented in PyTorch) to compute the gradients of programs. (See the sketch after this list, which ties the last four objectives together.)
(Note that we’re focusing on regression models this week; next week we’ll add classification.)
For this week, focus on how things are used rather than the underlying math, especially for tensors (which have several different definitions) and derivatives (which we’ll get to shortly).
The book uses “rank” to refer to the number of axes of a tensor, but “rank” means something different in linear algebra. To avoid confusion, let’s call it “number of axes”, or perhaps “number of dimensions” (abbreviated “ndim” in PyTorch).
For example, a length-5 column vector times a length-4 row vector gives a matrix (a tensor with two axes, i.e., 2-dimensional) with shape (5, 4) but rank 1 in the linear algebra sense. See this notebook.