Choose one of the following notebooks, or do the Reinforcement Learning activities at the bottom of this page.
Neural Net Architecture
- Why so big? Counting parameters in sequence models (u13n1-count-params.ipynb; see the first sketch below)
- Models for Sequence Data (u13n2-seq-models.ipynb)
- Programming with Self-Attention (u13n3-self-attention.ipynb; see the second sketch below)
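For a taste of the first notebook, here is a minimal sketch of where sequence-model parameter counts come from, using the standard formulas for a dense layer and an LSTM layer (my own illustration; the notebook may use different layer sizes or conventions):

```python
# Rough parameter counts for one layer, assuming input size d and
# hidden size h (standard formulas, not the notebook's code).
def dense_params(d, h):
    return d * h + h                      # weight matrix + bias vector

def lstm_params(d, h):
    # 4 gates, each with input weights, recurrent weights, and a bias.
    # (Some frameworks, e.g. PyTorch, use two bias vectors per gate,
    # so their counts come out slightly higher.)
    return 4 * (d * h + h * h + h)

d, h = 256, 512
print("dense:", dense_params(d, h))       # 131,584
print("LSTM: ", lstm_params(d, h))        # 1,574,912
```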
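And for the third notebook, a minimal NumPy sketch of scaled dot-product self-attention (illustrative only; the notebook’s implementation and naming may differ):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq, Wk, Wv: (d_model, d_head)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (seq_len, seq_len)
    weights = softmax(scores)                 # each row sums to 1
    return weights @ V                        # (seq_len, d_head)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                   # 5 tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)    # -> (5, 4)
```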
Reinforcement Learning
Policy, Value, and Q functions
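For reference, the standard definitions of the three functions, where gamma is the discount factor and r_t the reward at step t (general RL definitions, not anything specific to this playground):

```latex
\begin{align*}
\pi(s) &: \text{the action the agent chooses in state } s \\
V^{\pi}(s) &= \mathbb{E}\Big[\sum_{t \ge 0} \gamma^{t} r_t \;\Big|\; s_0 = s\Big]
  \quad \text{(expected discounted return, following } \pi\text{)} \\
Q^{\pi}(s, a) &= \mathbb{E}\Big[\sum_{t \ge 0} \gamma^{t} r_t \;\Big|\; s_0 = s,\; a_0 = a\Big]
  \quad \text{(same, but taking action } a \text{ first)}
\end{align*}
```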
Open up the Observable RL Playground.
- Read through “Strategically Making Mistakes”.
- What does a low epsilon do? What does a high epsilon do?
- Try editing the `maze =` definition to change the environment. What does it take to get the agent to tolerate a short-term negative reward in order to reach a higher long-term reward? (See the sketch after this list for a minimal version of this trade-off.)
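Both questions show up in miniature in the following hedged sketch (my own toy example, not the playground’s code): tabular Q-learning with epsilon-greedy exploration on a five-cell corridor. The far-right exit pays +10 but the cell just before it costs -5; the near-left exit pays only +1.

```python
# Tabular Q-learning on a tiny 1-D corridor (toy sketch, not the
# playground's code). States 0..4, start at 2. Entering state 0 pays +1
# and ends the episode; entering state 3 costs -5; entering state 4
# pays +10 and ends the episode.
import random

N_STATES, START = 5, 2
REWARD = {0: 1.0, 3: -5.0, 4: 10.0}
TERMINAL = {0, 4}

def train(gamma, epsilon=0.2, alpha=0.5, episodes=2000, seed=0):
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]  # actions: 0 = left, 1 = right
    for _ in range(episodes):
        s = START
        while s not in TERMINAL:
            if rng.random() < epsilon:
                a = rng.randrange(2)                # explore: random action
            else:
                a = 0 if Q[s][0] >= Q[s][1] else 1  # exploit: greedy action
            s2 = s - 1 if a == 0 else s + 1
            r = REWARD.get(s2, 0.0)
            target = r if s2 in TERMINAL else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])   # Q-learning update
            s = s2
    return Q

for gamma in (0.3, 0.95):
    Q = train(gamma)
    best = "right, toward +10" if Q[START][1] > Q[START][0] else "left, toward +1"
    print(f"gamma={gamma}: learned policy at the start state goes {best}")
```

With gamma = 0.95 the penalty is worth paying (-5 + 0.95 * 10 = 4.5 beats 0.95 * 1); with gamma = 0.3 it isn’t (-5 + 0.3 * 10 = -2 loses to 0.3 * 1). The playground’s maze poses the same trade-off in two dimensions.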
Older activity that doesn’t work anymore
Go to the “Playground” at the bottom of this article.
- Change the Algorithm to Q-Learning. We won’t look at the others at this time.
- Try each of the “Visualization” options. What does each one show? Each one plots a different function that the agent is learning (see the sketch after this list for how they relate).
- Add one agent. How does completing an episode affect each of the functions that the agent is learning?
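Assuming the visualizations correspond to the usual tabular quantities, the value function and the policy can be read directly off the learned Q table, along these lines (a sketch, not the playground’s actual code):

```python
# How the visualized functions relate, for a tabular Q as in the
# corridor sketch above:
def state_value(Q, s):
    return max(Q[s])                  # V(s) = value of the best action in s

def greedy_policy(Q, s):
    # pi(s) = the action with the highest Q-value in s
    return max(range(len(Q[s])), key=lambda a: Q[s][a])
```

Completing an episode propagates reward information backward along the visited states, so values near the goal firm up first and then spread toward the start.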
Exploration
- Set the Explore-Exploit slider all the way to Explore. What do you notice about the agent’s behavior?
- Set it all the way to Exploit. What do you notice now?
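The slider maps onto epsilon in epsilon-greedy action selection; the extremes look roughly like this (same convention as the sketches above):

```python
import random

def choose_action(Q, s, epsilon, rng=random):
    # epsilon = 1.0 (all the way to Explore): every action is random,
    # so the agent wanders and never cashes in what it has learned.
    # epsilon = 0.0 (all the way to Exploit): the agent always repeats
    # the best-looking action, and can get stuck on the first rewarding
    # path it happened to find, never discovering a better one.
    if rng.random() < epsilon:
        return rng.randrange(len(Q[s]))
    return max(range(len(Q[s])), key=lambda a: Q[s][a])
```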
This environment isn’t rich enough for exploration to help much, so go to a different playground (such as the Observable RL Playground above), where we can actually edit the environment and see what the agent learns.