Week 5

Ken Arnold

Monday

Lab Review

  1. Draw a diagram of the model we built in Step 2 (linear regression).
  2. Draw a diagram of the model we built in Step 5 (logistic regression). What’s the difference?
  3. How is the softmax operation useful in classification?
  4. Suppose an interviewer asks you “What’s the difference between linear regression and logistic regression?” Describe at least two differences you could mention.

Review Exercise

Suppose we give our digit classifier an image of a 3, and it outputs a score (logit) of 1 for every digit.

  1. Compute the predicted probabilities.
  2. Compute the (categorical) cross-entropy loss.
  3. What would have changed if the logits had been 0 for every digit instead?
  4. What if the logit for the 3 had been 2 instead?
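One way to check your hand calculations for this exercise is a short script. This is a minimal sketch in plain Python, not the lab's code; the helper names `softmax` and `cross_entropy` are made up here for illustration.

```python
import math

def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    # Subtracting the largest logit avoids overflow and does not change the result.
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, true_class):
    """Categorical cross-entropy: -log of the probability given to the true class."""
    return -math.log(softmax(logits)[true_class])

all_ones = [1.0] * 10    # a logit of 1 for every digit 0-9
all_zeros = [0.0] * 10   # a logit of 0 for every digit
boosted = [1.0] * 10
boosted[3] = 2.0         # only the logit for the digit 3 is raised to 2

probs_ones = softmax(all_ones)
loss_ones = cross_entropy(all_ones, true_class=3)
loss_zeros = cross_entropy(all_zeros, true_class=3)
prob_boosted = softmax(boosted)[3]
loss_boosted = cross_entropy(boosted, true_class=3)

print(probs_ones[3])   # 0.1: equal logits give a uniform distribution
print(loss_ones)       # ln(10), about 2.30
print(loss_zeros)      # also ln(10): adding a constant to every logit changes nothing
print(prob_boosted)    # about 0.23
print(loss_boosted)    # about 1.46, lower because the true class got more probability
```

The second and third results match because softmax depends only on the differences between logits, not on their absolute values.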

Nonlinear Features

ReLU

ReLU chops off the negative part of its input: negative values become 0, and positive values pass through unchanged.

y = max(0, x)

(The gradient is 1 for positive inputs and 0 for negative inputs. At exactly 0 it is undefined, so implementations pick a value by convention, commonly 0.)
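The definition and its gradient can be sketched in a few lines of plain Python. The function names are illustrative, and using 0 for the gradient at exactly x = 0 is an assumption of this sketch, not something the slides specify.

```python
def relu(x):
    """ReLU: pass positive inputs through, replace negative inputs with 0."""
    return max(0.0, x)

def relu_grad(x):
    """Slope of ReLU at x: 1 on the positive side, 0 on the negative side."""
    # Assumption: the gradient at exactly 0 is taken to be 0.
    return 1.0 if x > 0 else 0.0

inputs = [-2.0, -0.5, 0.0, 0.5, 2.0]
outputs = [relu(x) for x in inputs]
grads = [relu_grad(x) for x in inputs]

print(outputs)  # [0.0, 0.0, 0.0, 0.5, 2.0]
print(grads)    # [0.0, 0.0, 0.0, 1.0, 1.0]
```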

Why is ReLU Useful?

In 2D

Interactive Activity

ReLU interactive (notebook: u04n00-relu.ipynb)

Going Deeper

Logistic Regression

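from keras.models import Sequential
from keras.layers import Input, Dense

# One dense layer maps the 784 pixel values directly to 10 class probabilities.
model = Sequential([
    Input(shape=(784,)),
    Dense(10, activation='softmax')
])
model.compile(loss='crossentropy')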

Multilayer Perceptron (MLP)

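from keras.models import Sequential
from keras.layers import Input, Dense

# Same as logistic regression, plus one hidden layer of 800 ReLU units.
model = Sequential([
    Input(shape=(784,)),
    Dense(800, activation='relu'),
    Dense(10, activation='softmax')
])
model.compile(loss='crossentropy')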

Consider first the logistic regression, then the MLP. For each:

  1. Write the shape of the weights and biases for each layer.
  2. How many parameters does the model have?
  3. Write the forward pass using matrix multiplication and addition.
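After working these out by hand, you can check the shapes, the parameter counts, and the structure of the forward pass with a sketch like the one below. It uses plain Python lists rather than Keras, and every helper name (`dense_shapes`, `count_params`, `affine`, `mlp_forward`, and so on) is hypothetical. The tiny 4-3-2 network at the end is a stand-in with random weights, used only so the forward pass runs quickly.

```python
import math
import random

def dense_shapes(n_in, n_out):
    """Weight and bias shapes of a dense layer; Keras stores weights as (n_in, n_out)."""
    return (n_in, n_out), (n_out,)

def count_params(layer_sizes):
    """Total weights plus biases for a stack of dense layers, e.g. [784, 800, 10]."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def affine(x, W, b):
    """Compute x @ W + b for one input vector x, with W stored as a list of rows."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]

def relu(v):
    return [max(0.0, a) for a in v]

def softmax(v):
    largest = max(v)
    exps = [math.exp(a - largest) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

def logistic_forward(x, W, b):
    """Logistic regression: probs = softmax(x @ W + b)."""
    return softmax(affine(x, W, b))

def mlp_forward(x, W1, b1, W2, b2):
    """MLP: hidden = relu(x @ W1 + b1), then probs = softmax(hidden @ W2 + b2)."""
    hidden = relu(affine(x, W1, b1))
    return softmax(affine(hidden, W2, b2))

def random_matrix(rng, rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

logreg_params = count_params([784, 10])     # 784*10 + 10
mlp_params = count_params([784, 800, 10])   # (784*800 + 800) + (800*10 + 10)
print(dense_shapes(784, 10), logreg_params)
print(dense_shapes(784, 800), dense_shapes(800, 10), mlp_params)

# Tiny stand-in network (4 inputs, 3 hidden units, 2 classes) with random weights.
rng = random.Random(0)
x = [0.5, -1.0, 2.0, 0.0]
W1, b1 = random_matrix(rng, 4, 3), [0.0] * 3
W2, b2 = random_matrix(rng, 3, 2), [0.0] * 2
probs = mlp_forward(x, W1, b1, W2, b2)
print(probs)
```

Almost all of the MLP's parameters sit in the first layer, because that is where the 784 inputs meet the 800 hidden units.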

Variations on Classification

Logistic Classification: Two variations

  1. “Pick the best answer”: one big softmax
  2. “Choose all that apply”: a separate sigmoid for each class (equivalent to a two-way softmax between “present” and “absent”)

What’s the difference between the two?

  • For a “yes”/“no” question, they’re equivalent.
  • Pick the best: zero-sum. Increasing probability of one class decreases probability of all others.
  • Choose all that apply: each class is chosen independently.
  • Which to choose: match your situation. Use “pick the best” when exactly one class applies and “choose all that apply” when several can. Rarely makes a big difference.
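The first three points can be checked numerically. The sketch below uses plain Python with arbitrarily chosen logits; it shows that a two-way softmax over [z, 0] matches sigmoid(z), that raising one logit under softmax lowers the other probabilities, and that per-class sigmoids leave the other classes untouched.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    largest = max(logits)
    exps = [math.exp(v - largest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Yes/no question: softmax over [yes_logit, 0] gives the same "yes" probability as a sigmoid.
yes_logit = 1.5
p_yes_softmax = softmax([yes_logit, 0.0])[0]
p_yes_sigmoid = sigmoid(yes_logit)

# Raise the first class's logit from 2 to 3 and see what happens to the others.
logits_before = [2.0, 1.0, 0.0]
logits_after = [3.0, 1.0, 0.0]

# "Pick the best": one softmax over all classes (zero-sum).
best_before = softmax(logits_before)
best_after = softmax(logits_after)

# "Choose all that apply": an independent sigmoid per class.
each_before = [sigmoid(v) for v in logits_before]
each_after = [sigmoid(v) for v in logits_after]

print(p_yes_softmax, p_yes_sigmoid)  # equal up to floating-point rounding
print(best_before, best_after)       # classes 2 and 3 lose probability
print(each_before, each_after)       # classes 2 and 3 are unchanged
```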

Thresholds

At what probability do you decide that a class is present?

  • Medical alarm example: may want to signal a potential problem even if uncertain
  • Trade-off between false positives and false negatives
  • Applies to both “pick the best” and “choose all that apply”
  • Can be different for each class
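To make the trade-off concrete, here is a small sketch of per-class thresholds and of counting false positives and false negatives at two different thresholds. The class names, probabilities, and labels are all made up for illustration, and the helper names `flag_classes` and `count_errors` are hypothetical.

```python
def flag_classes(probs, thresholds, default=0.5):
    """Return the classes whose probability meets that class's own threshold."""
    return [name for name, p in probs.items()
            if p >= thresholds.get(name, default)]

def count_errors(scores, labels, threshold):
    """Count (false positives, false negatives) for one class at a given threshold."""
    false_pos = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    false_neg = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    return false_pos, false_neg

# Made-up model outputs for a single patient reading.
probs = {"cardiac_event": 0.30, "sensor_fault": 0.40}
# A low threshold for the class where a miss is costly, a higher one elsewhere.
thresholds = {"cardiac_event": 0.2, "sensor_fault": 0.6}
flagged = flag_classes(probs, thresholds)

# Made-up scores and true labels for one class across six examples.
scores = [0.1, 0.3, 0.4, 0.6, 0.8, 0.9]
labels = [False, True, False, False, True, True]
low = count_errors(scores, labels, threshold=0.2)
high = count_errors(scores, labels, threshold=0.7)

print(flagged)  # ['cardiac_event']
print(low)      # (2, 0): more false alarms, no misses
print(high)     # (0, 1): no false alarms, one miss
```

Lowering the threshold trades misses for false alarms. Which side to favor depends on the cost of each kind of error in your situation.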