Image Classification: Losses and Feature Extraction
Today we'll:
- Look at the images that have the highest loss (does that necessarily mean that the classifier got them wrong?)
- Explain why good features matter for neural network models.
- Explain how the dot product can be interpreted as a similarity measure.
Setup
# Check versions of Keras and Tensorflow
!pip list | egrep 'keras|tensorflow '
import os
# Results are better with the TensorFlow backend; this is probably a bug in Keras 3 but I haven't tracked it down.
os.environ["KERAS_BACKEND"] = "tensorflow"
from IPython.display import display, HTML
import io
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path
import keras
import keras_cv
import tensorflow as tf
#import tensorflow_datasets as tfds
print(f"Keras version: {keras.__version__}, backend: {keras.backend.backend()}")
num_gpus = len(tf.config.list_physical_devices('GPU'))
print(f"GPUs: {num_gpus}")
if num_gpus == 0:
display(HTML("No GPUs available. Training will be slow. <b>Please enable an accelerator.</b>"))
Keras version: 3.8.0, backend: tensorflow
GPUs: 1
def show_image_grid(images, titles=None, rows=None, cols=3, title_fontsize=8, figsize=(10, 10)):
    # Display a grid of images, with optional per-image titles; unused cells stay blank.
    if rows is None:
        rows = (len(images) + (cols - 1)) // cols
    fig, axs = plt.subplots(rows, cols, figsize=figsize)
    for ax in axs.flatten():
        ax.axis('off')
    for i, ax in enumerate(axs.flatten()):
        if i >= len(images):
            break
        ax.imshow(np.array(images[i]).astype('uint8'))
        if titles is not None:
            ax.set_title(titles[i], fontsize=title_fontsize)
def get_images_from_dataset(dataset, indices):
    # Return the images at the given indices, in the given order.
    if hasattr(dataset, 'file_paths'):
        # Datasets built by image_dataset_from_directory remember their source files,
        # so we can reload the original images from disk.
        # FIXME: hardcoded options
        img_loader_opts = dict(target_size=(256, 256), keep_aspect_ratio=True)
        items_by_idx = {idx: keras.utils.load_img(dataset.file_paths[idx], **img_loader_opts) for idx in indices}
    else:
        # Otherwise, walk through the unbatched dataset and keep only the requested items.
        items_by_idx = {idx: item for idx, (item, label) in enumerate(dataset.unbatch()) if idx in indices}
    return [items_by_idx[idx] for idx in indices]
Configure our experiments
class config:
seed = 123
learning_rate = 1e-3
epochs = 1
batch_size = 16
image_size = (256, 256)
model_preset = "efficientnetv2_b0_imagenet"
# Reproducibility
# See https://keras.io/examples/keras_recipes/reproducibility_recipes/
#
# Set a seed so that the results are the same every time this is run.
keras.utils.set_random_seed(config.seed)
# If using TensorFlow, this will make GPU ops as deterministic as possible,
# but it will affect the overall performance, so be mindful of that.
tf.config.experimental.enable_op_determinism()
Load the data
We'll use a dataset of flower images for this example, but you can later switch this out for another dataset as long as you keep the file-and-folder structure.
The details of the code in this section are not important at this time; just run these cells.
path_to_downloaded_file = keras.utils.get_file(
fname="flower_photos",
origin="https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz",
extract=True,
)
Let's see what just got downloaded.
path_to_downloaded_file
'/Users/ka37/.keras/datasets/flower_photos'
data_path = Path(path_to_downloaded_file).parent / 'flower_photos'
if (data_path / 'flower_photos').exists(): # work around a strange nested folder structure
data_path = data_path / 'flower_photos'
!ls {data_path}
LICENSE.txt daisy/ dandelion/ roses/ sunflowers/ tulips/
We'll use a Keras helper function to load the data.
Docs: https://keras.io/api/data_loading/image/#imagedatasetfromdirectory-function
# Define which classes we want to use, in what order.
class_names = ['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips']
# Create training and validation datasets
train_dataset, val_dataset = keras.utils.image_dataset_from_directory(
data_path,
validation_split=0.2,
labels='inferred',
class_names=class_names,
label_mode='int',
batch_size=config.batch_size,
image_size=config.image_size,
shuffle=True,
seed=128,
subset='both',
crop_to_aspect_ratio=True
)
Found 3670 files belonging to 5 classes. Using 2936 files for training. Using 734 files for validation.
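Before we plot anything, it can help to confirm what each batch contains. This is just a quick inspection sketch of the tf.data pipeline built above:
# Each dataset element is an (images, labels) pair of batched tensors.
# Expect image batches of shape (None, 256, 256, 3), float32, and integer labels of shape (None,).
print(train_dataset.element_spec)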
Let's show some example images.
[[example_images, example_labels]] = train_dataset.take(1)
show_image_grid(
example_images,
titles=[f"{label} ({class_names[label]})" for label in example_labels])
Train a model
# Create a model using a pretrained backbone
# See https://keras.io/api/keras_cv/models/tasks/image_classifier/ for options
model = keras_cv.models.ImageClassifier.from_preset(
config.model_preset,
num_classes=len(class_names))
# Freeze the feature extractor so it doesn't get updated during training
model.backbone.trainable = False
# Set up the model for training
model.compile(
loss='sparse_categorical_crossentropy',
optimizer=keras.optimizers.Adam(learning_rate=config.learning_rate),
metrics=['accuracy']
)
model.summary(show_trainable=True)
# Train the model. (Note: this may show some warnings, and it may stop without showing
# progress for up to a minute while it translates the model to run on the GPU.)
print("Training the model...")
history = model.fit(
train_dataset,
validation_data=val_dataset,
epochs=config.epochs
)
Model: "image_classifier"
Layer (type)                         Output Shape                 Param #   Trainable
input_layer (InputLayer)             (None, None, None, 3)              0   -
efficient_net_v2b0_backbone          (None, None, None, 1280)   5,919,312   N
  (EfficientNetV2Backbone)
avg_pool (GlobalAveragePooling2D)    (None, 1280)                        0   -
predictions (Dense)                  (None, 5)                       6,405   Y
Total params: 5,925,717 (22.60 MB)
Trainable params: 6,405 (25.02 KB)
Non-trainable params: 5,919,312 (22.58 MB)
Training the model...
184/184 ━━━━━━━━━━━━━━━━━━━━ 23s 92ms/step - accuracy: 0.6332 - loss: 1.1413 - val_accuracy: 0.8351 - val_loss: 0.5568
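A quick arithmetic check on that summary: only the final Dense layer is trainable, and its parameter count is 1,280 weights per class times 5 classes, plus 5 biases.
# 1280 features x 5 classes of weights, plus one bias per class:
print(1280 * 5 + 5)  # 6405 -- matches "Trainable params: 6,405" above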
Top Losses
The following code will compute the model's predictions on the validation set and extract the corresponding correct labels. We'll use this to compute the loss for each image.
val_predicted_probs = model.predict(val_dataset)
val_predicted_probs.shape
46/46 ━━━━━━━━━━━━━━━━━━━━ 7s 102ms/step
(734, 5)
Quick Check: what do the two numbers in that shape mean?
your answer here
# Get the labels from the dataset (to check whether the model got them right)
val_labels = np.array([int(label) for img, label in val_dataset.unbatch()])
# compute loss for each sample
loss_func = keras.losses.SparseCategoricalCrossentropy(reduction='none')
val_losses = loss_func(val_labels, val_predicted_probs).numpy()
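As an optional sanity check, the average of these per-image losses should be close to the validation loss that model.fit reported at the end of training:
# Mean of the per-image losses; compare with the val_loss printed during training.
print("mean per-image loss:", val_losses.mean())
# (model.evaluate(val_dataset) would report the same quantity, along with accuracy.)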
def plot_top_losses(dataset, predictions, losses, labels, class_names, n=9, **kw):
top_n_indices = np.argsort(losses)[-n:][::-1].tolist()
titles = []
for idx in top_n_indices:
label = labels[idx]
pred = predictions[idx]
titles.append(f"label={class_names[label]}\npred={class_names[np.argmax(pred)]}\nprobs[lbl]={pred[label]:.3f}, loss={losses[idx]:.2f}")
images = get_images_from_dataset(dataset, top_n_indices)
show_image_grid(images, titles, **kw)
plot_top_losses(val_dataset, val_predicted_probs, val_losses, val_labels, class_names, n=9)
# To show more:
# plot_top_losses(val_dataset, val_predicted_probs, val_losses, val_labels, class_names, n=100, figsize=(30, 20), cols=10)
Observations
- What trend do you observe about the loss values shown above the images, as you move from top left to bottom right?
- What trend do you observe about the probs[label] values shown?
- If you know flowers: were the labels correct? Could the loss help us identify mis-labeled images?
- Compute the cross-entropy loss for the bottom-right image by hand and check whether it matches the loss value shown. (Note: Keras uses log base e, i.e., the "natural" logarithm, called "ln" on some calculators; math.log and np.log both use this base.) A small verification sketch follows this list.
- Could an image show up on this grid if it was classified correctly? Why or why not? (Try passing n=100 and figsize=(30, 20) and cols=6.)
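The hand computation is the point of the exercise, but here is a small verification sketch of the relationship it relies on, using the arrays computed above: the cross-entropy loss for a single example is just the negative natural log of the probability the model assigned to the true class.
# For any validation image i: loss_i == -ln(predicted probability of the true class).
i = int(np.argmax(val_losses))  # the highest-loss image, shown top-left in the grid
print(-np.log(val_predicted_probs[i, val_labels[i]]), "should match", val_losses[i])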
Your answer here
Model as Feature Extractor
The following code will compute the outputs of the feature extractor (the input to the last layer of the model) for all of the images in the validation set.
# Build a model whose output is the *input* to the final Dense layer,
# i.e., the 1280-dimensional pooled feature vector for each image.
last_linear_layer = model.layers[-1]
feature_extractor = keras.Model(inputs=model.inputs, outputs=last_linear_layer.input)
val_features = feature_extractor.predict(val_dataset)
print("\nvalidation features shape:", val_features.shape)
46/46 ━━━━━━━━━━━━━━━━━━━━ 8s 119ms/step
validation features shape: (734, 1280)
Observe
- What do those two numbers in the shape mean?
- How many features did the feature extractor produce for each image?
your answer here
Let's look at three example images: two in the same class and one in a different class.
class_names
['daisy', 'dandelion', 'roses', 'sunflowers', 'tulips']
sunflowers = [idx for idx, label in enumerate(val_labels) if label == 3]
tulips = [idx for idx, label in enumerate(val_labels) if label == 4]
example_image_indices = [sunflowers[0], sunflowers[2], tulips[0]]
show_image_grid(
get_images_from_dataset(
val_dataset, example_image_indices))
The model computed a feature vector for each of those images.
example_features = val_features[example_image_indices]
vec0, vec1, vec2 = example_features
vec0.shape, vec1.shape, vec2.shape
((1280,), (1280,), (1280,))
The dot product between a pair of vectors is a rough measure of similarity: the more two vectors point "in the same direction", the higher their dot product.
vec0 @ vec1
np.float32(8.949706)
vec0 @ vec2
np.float32(7.814732)
vec1 @ vec2
np.float32(8.189959)
We can look at all pairwise comparisons to get an overview of similarity.
example_features @ example_features.T
array([[20.31578 , 8.949707, 7.814732],
[ 8.949707, 16.917578, 8.189959],
[ 7.814732, 8.189959, 24.84832 ]], dtype=float32)
Note that this dot-product similarity depends on the magnitudes of both vectors: all else being equal, longer vectors look more similar to everything else. So we typically divide each vector by its magnitude first; the result is called "cosine similarity". Think about what the maximum and minimum values could be.
normalized_features = keras.utils.normalize(example_features)
normalized_features @ normalized_features.T
array([[1. , 0.4827507 , 0.34781542],
[0.4827507 , 1. , 0.39945152],
[0.34781542, 0.39945152, 0.9999999 ]], dtype=float32)
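keras.utils.normalize defaults to L2 normalization along the last axis, so we can reproduce the same matrix with plain NumPy (a quick sketch using the arrays above):
# Divide each feature vector by its length, then take all pairwise dot products.
unit_features = example_features / np.linalg.norm(example_features, axis=1, keepdims=True)
print(unit_features @ unit_features.T)  # should match the matrix above, up to float error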
Let's show the images that are most and least similar to the first example image.
# Normalize the validation features. (vec0 itself isn't normalized here; that only
# scales every similarity by the same constant, so the ranking is unchanged.)
similarity_to_vec0 = keras.utils.normalize(val_features) @ vec0
similarity_to_vec0.shape
(734,)
images_by_similarity = np.argsort(similarity_to_vec0)
show_image_grid(
get_images_from_dataset(
val_dataset, images_by_similarity[::-1][:9]))
Exercise: show the least similar images (copy-paste the previous code block and just remove something).
# your code here
Importance of Feature Extractors
For contrast, let's look at the similarities in pixel space.
val_images = np.array([img for img, label in val_dataset.unbatch()])
val_images.shape
(734, 256, 256, 3)
We'll flatten each image into a single big vector.
val_raw_vecs = val_images.reshape(len(val_images), 256 * 256 * 3)
val_raw_vecs.shape
(734, 196608)
As before, we'll compute similarities, this time between raw pixel vectors.
first_image_raw_vec = val_raw_vecs[example_image_indices[0]]
similarity_to_first_image_raw = val_raw_vecs @ first_image_raw_vec
similarity_to_first_image_raw.shape
(734,)
images_by_similarity = np.argsort(similarity_to_first_image_raw)
show_image_grid(
get_images_from_dataset(
val_dataset, images_by_similarity[::-1][:9]))
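As an optional aside, raw-pixel dot products are heavily influenced by overall brightness: brighter images have longer pixel vectors. If you want a fairer comparison, you can normalize the pixel vectors just like we did for the features. Here is a sketch using the variables defined above; compare its output to the unnormalized grid.
# Cosine similarity in raw pixel space.
unit_raw_vecs = keras.utils.normalize(val_raw_vecs)
cos_sim_raw = unit_raw_vecs @ unit_raw_vecs[example_image_indices[0]]
show_image_grid(
    get_images_from_dataset(
        val_dataset, np.argsort(cos_sim_raw)[::-1][:9].tolist()))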
Check-In
These check-in questions were AI-generated
We saw two ways to represent images:
- Raw pixels (256 * 256 * 3 = 196,608 numbers)
- Feature vectors (1,280 numbers)
For each representation:
- What information does it preserve?
- What information does it discard?
- When would similarity in this space be (or not be) meaningful?
The model makes predictions in three steps:
- Convert image to feature vector (backbone)
- Compute similarity with learned patterns (final layer weights)
- Convert to probabilities (softmax)
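We can reproduce those steps directly from the pieces we already have: the feature vectors from the backbone and the learned weights of the final Dense layer (which applies the softmax itself). A sketch, relying on the same assumption as the cells above that the validation dataset iterates in a fixed order:
# Step 2: dot product of each feature vector with the learned weight vector for each class.
W, b = last_linear_layer.get_weights()   # W: (1280, 5), b: (5,)
scores = val_features @ W + b
# Step 3: softmax turns the 5 per-class scores into probabilities.
exp_scores = np.exp(scores - scores.max(axis=1, keepdims=True))
probs = exp_scores / exp_scores.sum(axis=1, keepdims=True)
print(np.allclose(probs, val_predicted_probs, atol=1e-4))  # should be True, up to float error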
For an image that the model misclassifies, which step(s) might have gone wrong? How could you investigate this?
your thoughtful answers here