This lab is designed to help you make progress towards the following course objectives:
- [MS-LLM-Tokenization] I can explain the purpose, inputs, and outputs of tokenization.
- [MS-LLM-TokenizationImpact] I can analyze how tokenization choices affect the performance of an LLM.
Work through the following notebook. (No accelerator is needed. Either Kaggle or Colab is fine; if you use Colab, remember to “Copy to Drive”.)
- Tokenization (name: u08n1-tokenization.ipynb; show preview, open in Colab)
If you finish, you may get started on next week’s notebook:
- Logits in Causal Language Models (name: u09n1-lm-logits.ipynb; show preview, open in Colab)