- Analogy: an embedding is like a point on a map: each object has its own coordinates, and similar objects are neighbors
- Definition: a vector representation of an object, constructed to be useful for some task (not necessarily human-interpretable)
- Examples: words, sentences, images, movies, users, etc.
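To make the map analogy concrete, here is a minimal sketch of looking up an object's nearest neighbor in embedding space. The 3-dimensional word vectors are made up for illustration; real embeddings are learned and have hundreds of dimensions:

```python
import numpy as np

# Hypothetical 3-dimensional embeddings for a few words (made-up values).
embeddings = {
    "cat": np.array([0.9, 0.1, 0.0]),
    "dog": np.array([0.8, 0.2, 0.1]),
    "car": np.array([0.1, 0.9, 0.3]),
}

def nearest_neighbor(query, table):
    """Return the word whose embedding has the highest cosine similarity to `query`."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(table, key=lambda word: cos(query, table[word]))

print(nearest_neighbor(np.array([0.95, 0.05, 0.0]), embeddings))  # cat
```

"Similar objects are neighbors" just means their vectors point in nearly the same direction, which is exactly what cosine similarity measures.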
Key Concepts
- Body and Head: A pretrained model can be split into a feature extractor (body) that produces embeddings and a classifier (head) that maps embeddings to predictions
- Similarity: Dot products and cosine similarity measure how close two embeddings are — similar items should have similar embeddings
- Embeddings vs raw pixels: Learned embeddings capture semantic meaning (what’s in the image), while raw pixels capture surface appearance (color, brightness)
- Prototypes: The classifier’s weight matrix contains a prototype vector for each class; prediction is just comparing an embedding to each prototype
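The body/head split and the prototype view of the classifier can be sketched as follows. This is a toy illustration, not a real model: the embedding dimension, number of classes, and all values are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Body": stand-in for a feature extractor mapping an input to a D-dim embedding.
D, num_classes = 4, 3
embedding = rng.normal(size=D)  # pretend this is body(image)

# "Head": a weight matrix with one prototype vector per class.
prototypes = rng.normal(size=(num_classes, D))  # shape (classes, D)

# Prediction = compare the embedding to every prototype via dot products.
scores = prototypes @ embedding       # one score (logit) per class
predicted_class = int(np.argmax(scores))

# Cosine similarity makes the same comparison, normalized for vector length.
cosine = scores / (np.linalg.norm(prototypes, axis=1) * np.linalg.norm(embedding))
```

Each row of the head's weight matrix acts as a prototype, and the class whose prototype aligns best with the embedding wins.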
Slides (including graphics)
Notebooks
Another Example: Sentence Embeddings
(notebook: u08s1-sentence-embeddings.ipynb; open in Colab)
Further Exploration
We’ll discuss this much more in CS 376, but here are some ideas for further exploration:
- Try the SigLIP demo that embeds images and text together. Try computing the dot products between a few texts that you write by hand. Does the dot product reflect the similarity of the texts? Repeat with images. What do you find? (Use Colab for this one.)
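If you want to see offline how a dot product can reflect text similarity before trying the SigLIP demo, here is a toy stand-in that embeds texts as unit-normalized bag-of-words vectors. Learned embeddings capture far more than shared words, but the comparison mechanics are the same:

```python
import numpy as np

def bow_embed(text, vocab):
    """Toy text embedding: unit-normalized bag-of-words counts over a fixed vocabulary."""
    words = text.lower().split()
    v = np.array([words.count(w) for w in vocab], dtype=float)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

texts = ["the cat sat on the mat", "a cat sat on a mat", "stock prices fell sharply"]
vocab = sorted({w for t in texts for w in t.lower().split()})
embs = [bow_embed(t, vocab) for t in texts]

# For unit-length vectors, dot product equals cosine similarity.
print(embs[0] @ embs[1])  # higher: the two cat sentences share most words
print(embs[0] @ embs[2])  # lower: no words in common
```

A learned model like SigLIP would also score paraphrases with no shared words as similar, which is exactly what this bag-of-words stand-in cannot do.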