Diagnose and Probe an Image Classifier¶

Today we'll:

  • Look at the images that have the highest loss (does that necessarily mean that the classifier got them wrong?)
  • Run the output layer (the linear classifier) by hand to see how to interpret it as comparing features with prototypes for each class.
  • Compute the cross-entropy loss by hand and check if we match Keras's output.

Setup¶

In [ ]:
# Check versions of Keras and Tensorflow
!pip list | egrep 'keras|tensorflow '
keras                                    2.15.0
keras-core                               0.1.7
keras-cv                                 0.8.1
keras-nlp                                0.7.0
keras-tuner                              1.4.6
tensorflow                               2.15.0
In [ ]:
import os
# Results are better with the TensorFlow backend; this is probably a bug in Keras 3 but I haven't tracked it down.
os.environ["KERAS_BACKEND"] = "tensorflow"

from IPython.display import display, HTML
import io
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path
import keras
import keras_cv
import tensorflow as tf
import tensorflow_datasets as tfds
print(f"Keras version: {keras.__version__}, backend: {keras.backend.backend()}")
num_gpus = len(tf.config.list_physical_devices('GPU'))
print(f"GPUs: {num_gpus}")
if num_gpus == 0:
    display(HTML("No GPUs available. Training will be slow. <b>Please enable an accelerator.</b>"))
2024-02-17 15:34:51.788905: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-02-17 15:34:51.789025: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-02-17 15:34:51.923008: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
Using TensorFlow backend
Keras version: 2.15.0, backend: tensorflow
GPUs: 1
In [ ]:
def show_image_grid(images, titles=None, rows=None, cols=3, title_fontsize=8, figsize=(10, 10)):
    if rows is None:
        rows = (len(images) + (cols - 1)) // cols

    # squeeze=False guarantees axs is a 2D array even when rows == 1 or cols == 1
    fig, axs = plt.subplots(rows, cols, figsize=figsize, squeeze=False)
    for ax in axs.flatten():
        ax.axis('off')
    for i, ax in enumerate(axs.flatten()):
        if i >= len(images):
            break
        ax.imshow(np.array(images[i]).astype('uint8'))
        if titles is not None:
            ax.set_title(titles[i], fontsize=title_fontsize)

def get_images_from_dataset(dataset, indices):
    if hasattr(dataset, 'file_paths'):
        # FIXME: hardcoded options
        img_loader_opts = dict(target_size=(256, 256), keep_aspect_ratio=True)
        items_by_idx = {idx: keras.utils.load_img(dataset.file_paths[idx], **img_loader_opts) for idx in indices}
    else:
        items_by_idx = {idx: item for idx, (item, label) in enumerate(dataset.unbatch()) if idx in indices}
    return [items_by_idx[idx] for idx in indices]    

Configure our experiments¶

In [ ]:
class config:
    seed = 123
    learning_rate = 1e-3
    epochs = 1
    batch_size = 16
    image_size = (256, 256)
    model_preset = "efficientnetv2_b0_imagenet"
In [ ]:
# Reproducibility
# See https://keras.io/examples/keras_recipes/reproducibility_recipes/
#
# Set a seed so that the results are the same every time this is run.
keras.utils.set_random_seed(config.seed)

# If using TensorFlow, this will make GPU ops as deterministic as possible,
# but it will affect the overall performance, so be mindful of that.
tf.config.experimental.enable_op_determinism()

Load the data¶

We'll use a dataset of flower images for this example, but you can later switch this out for another dataset as long as you keep the file-and-folder structure.
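
For reference, the loader we use below expects one subfolder per class, with the image files inside each class folder. The flowers archive follows this layout:

    flower_photos/
      daisy/
      dandelion/
      roses/
      sunflowers/
      tulips/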

The details of the code in this section are not important at this time; just run these cells.

In [ ]:
path_to_downloaded_file = keras.utils.get_file(
    origin="https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz",
    extract=True,
)
Downloading data from https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz
228813984/228813984 [==============================] - 1s 0us/step

Let's see what just got downloaded.

In [ ]:
data_path = Path(path_to_downloaded_file).parent / 'flower_photos'
!ls {data_path}
LICENSE.txt  daisy  dandelion  roses  sunflowers  tulips

We'll use a Keras helper function to load the data.

Docs: https://keras.io/api/data_loading/image/#imagedatasetfromdirectory-function

In [ ]:
# Define which classes we want to use, in what order.
class_names = ['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips']

# Create training and validation datasets
train_dataset, val_dataset = keras.utils.image_dataset_from_directory(
    data_path,
    validation_split=0.2,
    labels='inferred',
    class_names=class_names,
    label_mode='int',
    batch_size=config.batch_size,
    image_size=config.image_size,
    shuffle=True,
    seed=128,
    subset='both',
    crop_to_aspect_ratio=True   
)
Found 3670 files belonging to 5 classes.
Using 2936 files for training.
Using 734 files for validation.

Let's show some example images.

In [ ]:
[[example_images, example_labels]] = train_dataset.take(1)
show_image_grid(
    example_images,
    titles=[f"{label} ({class_names[label]})" for label in example_labels])
[Image: grid of example training images, each titled with its integer label and class name]

Train a model¶

In [ ]:
# Create a model using a pretrained backbone
# See https://keras.io/api/keras_cv/models/tasks/image_classifier/ for options
model = keras_cv.models.ImageClassifier.from_preset(
    config.model_preset,
    num_classes=len(class_names))

# Freeze the feature extractor so it doesn't get updated during training
model.backbone.trainable = False

# Set up the model for training
model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer=keras.optimizers.Adam(learning_rate=config.learning_rate),
    metrics=['accuracy']
)
model.summary(show_trainable=True)

# Train the model. (Note: this may show some warnings, and it may stop without showing
# progress for up to a minute while it translates the model to run on the GPU.)
history = model.fit(
    train_dataset,
    validation_data=val_dataset,
    epochs=config.epochs
)
Attaching 'config.json' from model 'keras/efficientnetv2/keras/efficientnetv2_b0_imagenet/2' to your Kaggle notebook...
Attaching 'model.weights.h5' from model 'keras/efficientnetv2/keras/efficientnetv2_b0_imagenet/2' to your Kaggle notebook...
/opt/conda/lib/python3.10/site-packages/keras_cv/src/models/backbones/backbone.py:44: UserWarning: `Model.state_updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  return id(getattr(self, attr)) not in self._functional_layer_ids
/opt/conda/lib/python3.10/site-packages/keras_cv/src/models/backbones/backbone.py:44: UserWarning: `layer.updates` will be removed in a future version. This property should not be used in TensorFlow 2.0, as `updates` are applied automatically.
  return id(getattr(self, attr)) not in self._functional_layer_ids
Model: "image_classifier"
____________________________________________________________________________
 Layer (type)                Output Shape              Param #   Trainable  
============================================================================
 input_1 (InputLayer)        [(None, None, None, 3)]   0         N          
                                                                            
 efficient_net_v2b0_backbon  (None, None, None, 1280   5919312   N          
 e (EfficientNetV2Backbone)  )                                              
                                                                            
 avg_pool (GlobalAveragePoo  (None, 1280)              0         Y          
 ling2D)                                                                    
                                                                            
 predictions (Dense)         (None, 5)                 6405      Y          
                                                                            
============================================================================
Total params: 5925717 (22.60 MB)
Trainable params: 6405 (25.02 KB)
Non-trainable params: 5919312 (22.58 MB)
____________________________________________________________________________
2024-02-17 15:35:31.630171: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:961] layout failed: INVALID_ARGUMENT: Size of values 0 does not match size of permutation 4 @ fanin shape inimage_classifier/efficient_net_v2b0_backbone/block2b_/block2b_drop/dropout/SelectV2-2-TransposeNHWCToNCHW-LayoutOptimizer
  3/184 [..............................] - ETA: 5s - loss: 1.6046 - accuracy: 0.2917   
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1708184134.657989     120 device_compiler.h:186] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
184/184 [==============================] - 19s 45ms/step - loss: 0.8729 - accuracy: 0.7674 - val_loss: 0.5492 - val_accuracy: 0.8392

Top Losses¶

The following cells compute the model's predictions on the validation set and extract the corresponding correct labels. We'll use these to compute the loss for each image.

In [ ]:
val_predicted_probs = model.predict(val_dataset)
val_predicted_probs.shape
46/46 [==============================] - 3s 22ms/step
Out[ ]:
(734, 5)

Quick Check: what do the two numbers in that shape mean?

your answer here

In [ ]:
# Get the labels from the dataset (to check whether the model got them right)
val_labels = np.array([int(label) for img, label in val_dataset.unbatch()])
In [ ]:
# Compute the loss for each sample individually (reduction='none' disables averaging)
loss_func = keras.losses.SparseCategoricalCrossentropy(reduction='none')
val_losses = loss_func(val_labels, val_predicted_probs).numpy()
In [ ]:
def plot_top_losses(dataset, predictions, losses, labels, class_names, n=9, **kw):
    top_n_indices = np.argsort(losses)[-n:][::-1].tolist()
    titles = []
    for idx in top_n_indices:
        label = labels[idx]
        pred = predictions[idx]
        titles.append(f"label={class_names[label]}\npred={class_names[np.argmax(pred)]}\nprobs[lbl]={pred[label]:.3f}, loss={losses[idx]:.2f}")
    images = get_images_from_dataset(dataset, top_n_indices)
    show_image_grid(images, titles, **kw)

plot_top_losses(val_dataset, val_predicted_probs, val_losses, val_labels, class_names, n=9)
# To show more:
# plot_top_losses(val_dataset, val_predicted_probs, val_losses, val_labels, class_names, n=100, figsize=(30, 20), cols=10)
[Image: grid of the 9 highest-loss validation images, titled with label, prediction, probs[label], and loss]

Observations

  1. What trend do you observe in the loss values shown above the images as you move from top left to bottom right?
  2. What trend do you observe in the probs[label] values shown?
  3. If you know flowers: were the labels correct? Could the loss help us identify mislabeled images?
  4. Compute the cross-entropy loss for the bottom-right image by hand and check that it matches the loss value shown. (Note: Keras uses log base e, i.e., the "natural" logarithm, called "ln" on some calculators. math.log and np.log both use this base; a minimal worked pattern is sketched below.)
  5. Could an image show up on this grid if it was classified correctly? Why or why not? (Try passing n=100, figsize=(30, 20), and cols=10.)
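
As a hint for question 4: the per-sample cross-entropy is just the negative natural log of the predicted probability for the correct class. A minimal sketch of the arithmetic, using a made-up probability rather than the actual value from your plot:

In [ ]:
import math
p_correct = 0.05             # hypothetical probs[label] value; read yours off the plot
print(-math.log(p_correct))  # about 3.00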

Your answer here

Manual Last Layer¶

We'll now run the last layer of the model by hand to see how it compares features with prototypes for each class.

The following code will compute the outputs of the feature extractor (the input to the last layer of the model) for all of the images in the validation set.

In [ ]:
last_linear_layer = model.layers[-1]
feature_extractor = keras.Model(inputs=model.inputs, outputs=last_linear_layer.input)
val_features = feature_extractor.predict(val_dataset)
print("\nvalidation features shape:", val_features.shape)
46/46 [==============================] - 3s 20ms/step

validation features shape: (734, 1280)

Observe

  1. What do those two numbers in the shape mean?
  2. How many features did the feature extractor produce for each image?

your answer here

The following code will extract the weights and biases of the last layer of the model.

In [ ]:
weights, bias = last_linear_layer.get_weights()
print("weights shape:", weights.shape)
print("bias shape:", bias.shape)
weights shape: (1280, 5)
bias shape: (5,)

Observe:

  1. How does the shape of weights compare to your answer to the previous question?
  2. How many parameters does this layer have? Check your answer against the summary table that Keras showed when you trained the model. (The last row of the table corresponds to this layer.)

your answer here

As we discussed in class, we can interpret the columns of weights as "prototypes" for each class. Since we're now working in a 1280-dimensional feature space, we can't visualize these prototypes directly. But we can characterize them by finding which images align with them most strongly.

Let's start by extracting the prototype for one class. Quick NumPy reference:

  • Extract a row of an array: arr[i]
  • Extract a column of an array: arr[:, j]
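
For example, on a small 2x3 toy array (made up just to illustrate the indexing):

In [ ]:
toy = np.array([[1, 2, 3],
                [4, 5, 6]])
print(toy[0])     # first row: [1 2 3]
print(toy[:, 1])  # second column: [2 5]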

Exercise: Extract the prototype for the "roses" class. Check the shape of the resulting array!

In [ ]:
class_names
Out[ ]:
['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips']
In [ ]:
rose_class_index = ...
rose_prototype = ...
rose_prototype.shape
Out[ ]:
(1280,)

Now let's compute how much each image in the validation set aligns with this prototype. We'll do this by computing the dot product between the prototype and the feature vector for each image.

In [ ]:
rose_scores = [
  feature_vec @ rose_prototype
  for feature_vec in val_features
]

This is exactly the same as the matrix-vector product of the feature array with the rose prototype vector:

In [ ]:
rose_scores = val_features @ rose_prototype
rose_scores.shape
Out[ ]:
(734,)

We'll use np.argsort to find the images that have the highest and lowest scores. We'll identify an image by its index in the validation set.
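
If np.argsort is new to you, here's a toy example: it returns the indices that would sort the array in ascending order.

In [ ]:
print(np.argsort([30, 10, 20]))  # [1 2 0]: index 1 holds the smallest value, index 0 the largest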

In [ ]:
images_by_rosiness = np.argsort(rose_scores)
print("images_by_rosiness shape:", images_by_rosiness.shape)
print("Least rosy image:", images_by_rosiness[0])
print("Rosiest image:", images_by_rosiness[-1])
images_by_rosiness shape: (734,)
Least rosy image: 587
Rosiest image: 196

Now, show the rosiest images (note that [::-1] is a Python idiom to reverse a list).
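
A one-line illustration of the idiom, on a toy list:

In [ ]:
print([10, 20, 30][::-1])  # [30, 20, 10]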

In [ ]:
show_image_grid(
  get_images_from_dataset(
    val_dataset, images_by_rosiness[::-1][:9]))
[Image: grid of the 9 validation images whose features align most strongly with the roses prototype]

Exercise: Show the least rosy images.

In [ ]:
# your code here
[Image: expected output, a grid of the 9 validation images whose features align least with the roses prototype]

Now, let's do this for all of the classes. We could loop over all of the classes and do the dot products above...or we could realize that this is exactly what the matrix multiplication of the feature array with the weights matrix does!
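
If the equivalence isn't obvious, here's a toy check (arbitrary small shapes, unrelated to our model) that stacking per-row dot products gives the same result as a single matrix multiplication:

In [ ]:
toy_features = np.arange(6.0).reshape(2, 3)  # 2 "images" with 3 features each
toy_weights = np.ones((3, 4))                # 3 features, 4 "classes"
by_loop = np.stack([f @ toy_weights for f in toy_features])
assert np.array_equal(by_loop, toy_features @ toy_weights)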

In [ ]:
print("Features matrix shape:", val_features.shape)
print("Weights shape:", weights.shape)

# Compute the logits by a forward pass through the linear layer
# using the validation features (val_features), weights, and bias
logits = ...
print("Logits shape:", logits.shape)
Features matrix shape: (734, 1280)
Weights shape: (1280, 5)
Logits shape: (734, 5)

Softmax and Cross-Entropy¶

The last steps in doing by hand what Keras was doing for us are:

  1. Apply softmax to get the predicted probabilities
  2. Compute the cross-entropy loss

Let's do each of those.

First, softmax. For numerical stability, we subtract the maximum value from each row before taking the exponentials. This doesn't change the result -- think about why. Then fill in the missing code to compute the softmax.
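
If you'd like to verify that claim numerically before reasoning it through, here's a quick check against scipy's reference implementation (this assumes scipy is available, which it is in most notebook environments):

In [ ]:
from scipy.special import softmax as scipy_softmax
z = np.array([1.0, 2.0, 3.0])
# Shifting all logits by the same constant leaves the softmax output unchanged
assert np.allclose(scipy_softmax(z), scipy_softmax(z - z.max()))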

In [ ]:
logits -= np.max(logits, axis=1, keepdims=True)
exp_logits = np.exp(logits)
sum_exp_logits = np.sum(exp_logits, axis=1, keepdims=True)
val_predicted_probs_manual = ...
In [ ]:
assert np.allclose(val_predicted_probs, val_predicted_probs_manual, atol=1e-3)

Now the cross-entropy. To get the negative log of the predicted probability for the correct class, we'll compute the log of all of the predicted probabilities, multiply by the one-hot encoded correct labels (which zeros out every class except the correct one), then sum across classes and negate. Fill in the missing code to compute the cross-entropy loss.
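
If keras.utils.to_categorical is new to you, here is what it does to a toy label vector:

In [ ]:
print(keras.utils.to_categorical([0, 2, 1], num_classes=3))
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]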

In [ ]:
logprobs = np.log(val_predicted_probs_manual)
print("logprobs shape:", logprobs.shape) # num images by num classes
one_hot_labels = keras.utils.to_categorical(val_labels, num_classes=len(class_names)) # num images by num classes

loss_per_sample = -np.sum(one_hot_labels * logprobs, axis=...)
logprobs shape: (734, 5)

Now find the average of the cross-entropy loss for the entire validation set (using np.mean). Does it match the loss that Keras computed for us during training?

In [ ]:
# your code here
Out[ ]:
0.5492409

your answer here