Lab 376.3: Implementing Self-Attention

In this lab, you’ll trace through parts of the implementation of a Transformer language model, focusing on the self-attention mechanism. We’ll compare the performance of a Transformer model with a baseline that only uses a feedforward network (MLP).
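As a reminder of the core computation you'll be tracing, here is a minimal, self-contained sketch of single-head causal self-attention. This is not the notebook's implementation; the tensor and weight names (`x`, `W_q`, `W_k`, `W_v`) are illustrative only.

```python
# Minimal sketch of single-head causal self-attention (illustrative names, not the notebook's code).
import math
import torch

def causal_self_attention(x, W_q, W_k, W_v):
    """x: (seq_len, d_model); W_q, W_k, W_v: (d_model, d_head)."""
    q = x @ W_q                                  # queries, (seq_len, d_head)
    k = x @ W_k                                  # keys,    (seq_len, d_head)
    v = x @ W_v                                  # values,  (seq_len, d_head)

    # Scaled dot-product scores: how strongly each position attends to each other position.
    scores = q @ k.T / math.sqrt(q.shape[-1])    # (seq_len, seq_len)

    # Causal mask: a language model must not attend to future positions.
    seq_len = x.shape[0]
    mask = torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool()
    scores = scores.masked_fill(mask, float("-inf"))

    weights = torch.softmax(scores, dim=-1)      # each row sums to 1
    return weights @ v                           # (seq_len, d_head)

# Tiny usage example with random weights.
d_model, d_head, seq_len = 16, 8, 5
x = torch.randn(seq_len, d_model)
W_q, W_k, W_v = (torch.randn(d_model, d_head) for _ in range(3))
out = causal_self_attention(x, W_q, W_k, W_v)
print(out.shape)  # torch.Size([5, 8])
```

The MLP baseline lacks exactly this step: it transforms each position independently, so it has no way to mix information across positions in the sequence.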

This lab addresses the following course objectives:

It could also be used to address the following course objectives:

Task

Start with this notebook:

Implementing self-attention (u10n1-implement-transformer.ipynb); open it in Colab.

You may find it helpful to refer to Jay Alammar's The Illustrated GPT-2 (Visualizing Transformer Language Models).

Extension ideas

Optional Extension: Architectural Experimentation
Exercise 376.2: Perplexity
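If you try the perplexity extension, recall that perplexity is the exponential of the average per-token cross-entropy (negative log-likelihood). A minimal sketch, with a made-up loss value for illustration:

```python
# Perplexity = exp(mean per-token negative log-likelihood).
# The loss value below is invented for illustration; use your model's held-out loss.
import math

mean_nll = 4.2                  # average cross-entropy in nats over held-out tokens
perplexity = math.exp(mean_nll)
print(perplexity)               # ~66.7: roughly as uncertain as a uniform choice over ~67 tokens
```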