Warning: This content has not yet been fully revised for this year.
In this lab, you’ll trace through parts of the implementation of a Transformer language model, focusing on the self-attention mechanism. We’ll compare the performance of a Transformer model with a baseline that only uses a feedforward network (MLP).
This lab addresses the following course objectives:
- [NC-Embeddings]
- [NC-SelfAttention]
- [NC-TransformerDataFlow]
- [MS-LLM-Generation]
- [MS-LLM-Tokenization]
It could also be used to address the following course objectives:
- [MS-LLM-Train]
- [MS-LLM-Compute]
- [NC-Scaling]
- [LM-SelfSupervised]
- [CI-Topic-History]
- [CI-LLM-Failures]
Task
Start with this notebook:
Implementing self-attention (u10n1-implement-transformer.ipynb), which you can preview or open in Colab.
You may find it helpful to refer to Jay Alammar's The Illustrated GPT-2 (Visualizing Transformer Language Models).
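Before opening the notebook, it may help to see the core computation in one place. Below is a minimal sketch of single-head causal self-attention in PyTorch; the class and variable names are illustrative assumptions, not the notebook's actual code.

```python
# A minimal sketch of single-head causal self-attention.
# Names and shapes are illustrative, not taken from the notebook.
import math
import torch
import torch.nn as nn

class CausalSelfAttention(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        # Separate linear projections for queries, keys, and values.
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        # Scaled dot-product scores: (batch, seq_len, seq_len)
        scores = q @ k.transpose(-2, -1) / math.sqrt(x.size(-1))
        # Causal mask: each position attends only to itself and earlier positions.
        seq_len = x.size(1)
        mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device),
            diagonal=1,
        )
        scores = scores.masked_fill(mask, float("-inf"))
        weights = scores.softmax(dim=-1)
        # Output is a weighted sum of the value vectors: (batch, seq_len, d_model)
        return weights @ v

# Quick shape check.
attn = CausalSelfAttention(d_model=64)
out = attn(torch.randn(2, 10, 64))
print(out.shape)  # torch.Size([2, 10, 64])
```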
Extension ideas
- Measure how much this network speeds up when you move it to a GPU (you may need to `torch.compile` it first); a rough timing sketch follows this list.
- Other extensions are described on the Architectural Experimentation page.
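One way to structure the GPU timing measurement is sketched below. It assumes a trained model object named `model` and an input batch `x`; both names are placeholders for whatever your notebook produces, and the measured speedup will depend on model and batch size.

```python
# A rough timing sketch for the GPU extension.
# `model` and `x` are assumed placeholders for your trained model and a sample batch.
import time
import torch

def time_forward(model, x, n_iters=50):
    with torch.no_grad():
        # Warm-up runs trigger compilation and CUDA kernel loading.
        for _ in range(5):
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()  # wait for queued GPU work before timing
        start = time.perf_counter()
        for _ in range(n_iters):
            model(x)
        if x.is_cuda:
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_iters

cpu_time = time_forward(model, x)
gpu_model = torch.compile(model.cuda())  # compilation tends to help most on GPU
gpu_time = time_forward(gpu_model, x.cuda())
print(f"CPU: {cpu_time*1e3:.2f} ms  GPU: {gpu_time*1e3:.2f} ms  "
      f"speedup: {cpu_time/gpu_time:.1f}x")
```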