```mermaid
flowchart LR
    A1[("Training Data (X and y)")] --> B{{fit}}
    A2[Model Object] --> B
    B --> FM[Fitted Model]
    FM --> C{{Predict}}
    B2[(New data X)] --> C
    C --> D[("predicted y's")]
```
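A minimal sketch of this workflow in scikit-learn (the `LinearRegression` model and the toy data are illustrative assumptions, not part of the diagram):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Training data: X (features) and y (labels)
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([2.1, 3.9, 6.2, 8.1])

model = LinearRegression()        # model object
model.fit(X_train, y_train)       # fit: learn parameters from (X, y)

X_new = np.array([[5.0], [6.0]])  # new data X
y_pred = model.predict(X_new)     # predicted y's
print(y_pred)
```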
Team up with one or two other people near you.
Discuss with your partners:
Figures from *Understanding Deep Learning* by Simon J.D. Prince, used with permission.
- Labels are continuous numbers (regression).
- Labels are discrete categories, so outputs are probabilities (classification; see the sketch below).
- No explicit labels (a type of unsupervised learning).
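To make the "outputs are probabilities" point concrete, here is a toy classification sketch (the data and the `LogisticRegression` model are assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy binary classification data: labels are discrete categories (0 or 1)
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
print(clf.predict([[2.5]]))        # a single predicted category
print(clf.predict_proba([[2.5]]))  # a probability for each class
```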
X holds the independent variables and y the dependent variable (this terminology is more common in a statistics setting); equivalently, X holds the features or predictors and y the target.

MAE is like the median (robust to outliers); MSE/RMSE/R² is like the mean (cares about the magnitude of errors).
All of these are also valid loss functions (i.e., we can use them to train a model).
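A small numerical illustration of that median/mean analogy (the toy values are assumptions): one large outlier barely moves MAE but blows up MSE and RMSE.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
small_errors = np.array([1.1, 2.1, 2.9, 4.1, 4.9])   # every prediction off by ~0.1
one_outlier  = np.array([1.1, 2.1, 2.9, 4.1, 15.0])  # same, but one big miss

for name, y_pred in [("small errors", small_errors), ("one outlier", one_outlier)]:
    mae = mean_absolute_error(y_true, y_pred)
    mse = mean_squared_error(y_true, y_pred)
    print(f"{name}: MAE={mae:.2f}  MSE={mse:.2f}  RMSE={np.sqrt(mse):.2f}")
```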
Answer as a probability distribution.
Suppose A and B are playing chess. Model M gives them equal odds (50-50), Model Q gives A an 80% win chance.
| Player | Model M win prob | Model Q win prob |
|---|---|---|
| A | 50% | 80% |
| B | 50% | 20% |
Now we let them play 5 games, and A wins each time. (data = AAAAA)
What is P(data given model) for each model?
- Model M: 0.5 * 0.5 * 0.5 * 0.5 * 0.5 = (0.5)^5 = 0.03125
- Model Q: 0.8 * 0.8 * 0.8 * 0.8 * 0.8 = (0.8)^5 = 0.32768

Which model was better able to predict the outcome?
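The same arithmetic in code:

```python
p_M = 0.5   # Model M: probability A wins a single game
p_Q = 0.8   # Model Q: probability A wins a single game
n_wins = 5  # observed data: AAAAA

print(p_M ** n_wins)  # 0.03125
print(p_Q ** n_wins)  # 0.32768
```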
Likelihood: probability that a model assigns to the data. (The P(AAAAA) we just computed.)
Assumption: data points are independent and order doesn't matter (i.i.d.). So P(AAAAA) = P(A) * P(A) * P(A) * P(A) * P(A) = P(A)^5.
Log likelihood of data for a model: log P(data | model). Under the i.i.d. assumption this is the sum of the log probabilities of the individual data points, e.g. log P(AAAAA) = 5 * log P(A).
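Continuing the chess example (just the numbers from above):

```python
import math

# log P(AAAAA | model) = 5 * log P(A | model), using the i.i.d. assumption
print(5 * math.log(0.5))  # ≈ -3.47 for Model M
print(5 * math.log(0.8))  # ≈ -1.12 for Model Q
```

The better model has the higher (less negative) log likelihood.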
Technical note: minimizing MSE is equivalent to minimizing cross-entropy (i.e., maximizing likelihood) if you model the data as Gaussian with fixed variance.
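A quick sketch of why (the fixed-variance Gaussian is the modeling assumption here): the negative log of a Gaussian likelihood is

$$
-\log p(y \mid \hat{y}) = \frac{(y - \hat{y})^2}{2\sigma^2} + \frac{1}{2}\log(2\pi\sigma^2),
$$

so averaging it over the data is, up to an additive constant and a scale factor, the MSE between y and ŷ.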
For technical details, see Goodfellow et al., *Deep Learning*, Chapters 3 (information theory background) and 5 (application to loss functions).
Use cross-entropy when the data is categorical (i.e., a classification problem).
Definition: the average of the negative log of the probability assigned to the correct class.
(Usually use natural log, so units are nats.)
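A small sketch of that definition in code, checked against scikit-learn's `log_loss` (the toy labels and predicted probabilities are assumptions):

```python
import numpy as np
from sklearn.metrics import log_loss

# True classes and the model's predicted probability of class 1
y_true   = np.array([1, 0, 1, 1])
p_class1 = np.array([0.9, 0.2, 0.6, 0.4])

# Probability the model assigned to the *correct* class of each example
p_correct = np.where(y_true == 1, p_class1, 1 - p_class1)

# Cross-entropy: average of the negative natural log of those probabilities (in nats)
print(np.mean(-np.log(p_correct)))  # ≈ 0.44
print(log_loss(y_true, p_class1))   # same value via scikit-learn
```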