Warning: This content has not yet been fully revised for this year.
In this lab, you’ll trace through parts of the implementation of a Transformer language model, focusing on the self-attention mechanism. We’ll compare the performance of a Transformer model with a baseline that only uses a feedforward network (MLP).
This lab addresses the following course objectives:
- [NC-Embeddings]
- [NC-SelfAttention]
- [NC-TransformerDataFlow]
- [MS-LLM-Generation]
- [MS-LLM-Tokenization]
It could also be used to address the following course objectives:
- [MS-LLM-Train]
- [MS-LLM-Compute]
- [NC-Scaling]
- [LM-SelfSupervised]
- [CI-Topic-History]
- [CI-LLM-Failures]
Task
Start with this notebook:
Implementing self-attention (u10n1-implement-transformer.ipynb), which you can preview or open in Colab.
You may find it helpful to refer to Jay Alammar's The Illustrated GPT-2 (Visualizing Transformer Language Models).
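Before opening the notebook, it may help to see the core computation in one place. Below is a minimal sketch of single-head causal self-attention in PyTorch; the class and variable names are illustrative assumptions, not the notebook's actual code.

```python
# A minimal sketch of single-head causal self-attention.
# Names and shapes are illustrative, not taken from the notebook.
import math
import torch
import torch.nn as nn

class CausalSelfAttention(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        # Separate linear projections for queries, keys, and values.
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        # Scaled dot-product scores: (batch, seq_len, seq_len)
        scores = q @ k.transpose(-2, -1) / math.sqrt(x.size(-1))
        # Causal mask: each position attends only to itself and earlier positions.
        seq_len = x.size(1)
        mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device),
            diagonal=1,
        )
        scores = scores.masked_fill(mask, float("-inf"))
        weights = scores.softmax(dim=-1)
        # Output is a weighted sum of the value vectors: (batch, seq_len, d_model)
        return weights @ v

# Quick shape check.
attn = CausalSelfAttention(d_model=64)
out = attn(torch.randn(2, 10, 64))
print(out.shape)  # torch.Size([2, 10, 64])
```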
Extension ideas
- Measure how much this network speeds up when you move it to a GPU (you may need to `torch.compile` it first); a rough timing sketch follows this list.
- Other extensions are described on the Architectural Experimentation page.
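One way to structure the GPU timing measurement is sketched below. It assumes a trained model object named `model` and an input batch `x`; both names are placeholders for whatever your notebook produces, and the measured speedup will depend on model and batch size.

```python
# A rough timing sketch for the GPU extension.
# `model` and `x` are assumed placeholders for your trained model and a sample batch.
import time
import torch

def time_forward(model, x, n_iters=50):
    with torch.no_grad():
        # Warm-up runs trigger compilation and CUDA kernel loading.
        for _ in range(5):
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()  # wait for queued GPU work before timing
        start = time.perf_counter()
        for _ in range(n_iters):
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_iters

cpu_time = time_forward(model, x)
gpu_model = torch.compile(model.cuda())  # compilation tends to help most on GPU
gpu_time = time_forward(gpu_model, x.cuda())
print(f"CPU: {cpu_time*1e3:.2f} ms  GPU: {gpu_time*1e3:.2f} ms  "
      f"speedup: {cpu_time/gpu_time:.1f}x")
```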