Choose one of the following notebooks, or do the Reinforcement Learning activities at the bottom of this page.
Neural Net Architecture
- Why so big? Counting parameters in sequence models (u13n1-count-params.ipynb; see the first sketch below)
- Models for Sequence Data (u13n2-seq-models.ipynb)
- Programming with Self-Attention (u13n3-self-attention.ipynb; see the second sketch below)
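For a taste of the first notebook, here is a minimal sketch of where sequence-model parameter counts come from, using the standard formulas for a dense layer and an LSTM layer (my own illustration; the notebook may use different layer sizes or conventions):

```python
# Rough parameter counts for one layer, assuming input size d and
# hidden size h (standard formulas, not the notebook's code).
def dense_params(d, h):
    return d * h + h                      # weight matrix + bias vector

def lstm_params(d, h):
    # 4 gates, each with input weights, recurrent weights, and a bias.
    # (Some frameworks, e.g. PyTorch, use two bias vectors per gate,
    # so their counts come out slightly higher.)
    return 4 * (d * h + h * h + h)

d, h = 256, 512
print("dense:", dense_params(d, h))       # 131,584
print("LSTM: ", lstm_params(d, h))        # 1,574,912
```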
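And for the third notebook, a minimal NumPy sketch of scaled dot-product self-attention (illustrative only; the notebook’s implementation and naming may differ):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq, Wk, Wv: (d_model, d_head)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (seq_len, seq_len)
    weights = softmax(scores)                 # each row sums to 1
    return weights @ V                        # (seq_len, d_head)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                   # 5 tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)    # -> (5, 4)
```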
Reinforcement Learning
Policy, Value, and Q functions
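For reference, the standard definitions of the three functions, where gamma is the discount factor and r_t the reward at step t (general RL definitions, not anything specific to this playground):

```latex
\begin{align*}
\pi(s) &: \text{the action the agent chooses in state } s \\
V^{\pi}(s) &= \mathbb{E}\Big[\sum_{t \ge 0} \gamma^{t} r_t \;\Big|\; s_0 = s\Big]
  \quad \text{(expected discounted return, following } \pi\text{)} \\
Q^{\pi}(s, a) &= \mathbb{E}\Big[\sum_{t \ge 0} \gamma^{t} r_t \;\Big|\; s_0 = s,\; a_0 = a\Big]
  \quad \text{(same, but taking action } a \text{ first)}
\end{align*}
```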
Open up the Observable RL Playground.
- Read through “Strategically Making Mistakes”.
- What does a low epsilon do? What does a high epsilon do?
- Try editing the `maze =` definition to change the environment. What does it take to get the agent to tolerate a short-term negative reward in order to reach a higher long-term reward? (See the sketch after this list for a minimal version of this trade-off.)
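Both questions show up in miniature in the following hedged sketch (my own toy example, not the playground’s code): tabular Q-learning with epsilon-greedy exploration on a five-cell corridor. The far-right exit pays +10 but the cell just before it costs -5; the near-left exit pays only +1.

```python
# Tabular Q-learning on a tiny 1-D corridor (toy sketch, not the
# playground's code). States 0..4, start at 2. Entering state 0 pays +1
# and ends the episode; entering state 3 costs -5; entering state 4
# pays +10 and ends the episode.
import random

N_STATES, START = 5, 2
REWARD = {0: 1.0, 3: -5.0, 4: 10.0}
TERMINAL = {0, 4}

def train(gamma, epsilon=0.2, alpha=0.5, episodes=2000, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]  # actions: 0 = left, 1 = right
    for _ in range(episodes):
        s = START
        while s not in TERMINAL:
            if rng.random() < epsilon:
                a = rng.randrange(2)                # explore: random action
            else:
                a = 0 if Q[s][0] >= Q[s][1] else 1  # exploit: greedy action
            s2 = s - 1 if a == 0 else s + 1
            r = REWARD.get(s2, 0.0)
            target = r if s2 in TERMINAL else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])   # Q-learning update
            s = s2
    return Q

for gamma in (0.3, 0.95):
    Q = train(gamma)
    best = "right, toward +10" if Q[START][1] > Q[START][0] else "left, toward +1"
    print(f"gamma={gamma}: learned policy at the start state goes {best}")
```

With gamma = 0.95 the penalty is worth paying (-5 + 0.95 * 10 = 4.5 beats 0.95 * 1); with gamma = 0.3 it isn’t (-5 + 0.3 * 10 = -2 loses to 0.3 * 1). The playground’s maze poses the same trade-off in two dimensions.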
Older activity that doesn’t work anymore
Go to the “Playground” at the bottom of this article.
- Change the Algorithm to Q-Learning. We won’t look at the others at this time.
- Try each of the “Visualization” options. What does each one show? Each one plots a different function that the agent is learning (see the sketch after this list for how they relate).
- Add one agent. How does completing an episode affect each of the functions that the agent is learning?
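Assuming the visualizations correspond to the usual tabular quantities, the value function and the policy can be read directly off the learned Q table, along these lines (a sketch, not the playground’s actual code):

```python
# How the visualized functions relate, for a tabular Q as in the
# corridor sketch above:
def state_value(Q, s):
    return max(Q[s])                  # V(s) = value of the best action in s

def greedy_policy(Q, s):
    # pi(s) = the action with the highest Q-value in s
    return max(range(len(Q[s])), key=lambda a: Q[s][a])
```

Completing an episode propagates reward information backward along the visited states, so values near the goal firm up first and then spread toward the start.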
Exploration
- Set the Explore-Exploit slider all the way to Explore. What do you notice about the agent’s behavior?
- Set it all the way to Exploit. What do you notice now?
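The slider maps onto epsilon in epsilon-greedy action selection; the extremes look roughly like this (same convention as the sketches above):

```python
import random

def choose_action(Q, s, epsilon, rng=random):
    # epsilon = 1.0 (all the way to Explore): every action is random,
    # so the agent wanders and never cashes in what it has learned.
    # epsilon = 0.0 (all the way to Exploit): the agent always repeats
    # the best-looking action, and can get stuck on the first rewarding
    # path it happened to find, never discovering a better one.
    if rng.random() < epsilon:
        return rng.randrange(len(Q[s]))
    return max(range(len(Q[s])), key=lambda a: Q[s][a])
```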
This environment isn’t rich enough for exploration to help much, so go to a different playground (such as the Observable RL Playground above), where we can actually edit the environment and see what the agent learns.