Q&A Week 12

Warning: This content has not yet been fully revised for this year.

This covers both the RL unit and Human-Centered AI part 1.

RL

Different Approaches to RL

The main difference is which functions we learn: model-free methods learn a value function and/or a policy directly from experience, while model-based methods learn a model of the environment and use it to plan.

What’s the “loss” (or target) in RL?

That’s what makes it hard! E.g., in Q-learning we try to minimize the temporal-difference error: how much the reward we actually get differs from the reward we predicted, where the predicted reward is the current-state value minus the discounted next-state value. But that’s a difference involving two predictions; if we were wrong, which of those two predictions was wrong?
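
Here’s a minimal sketch of the tabular Q-learning update to make that concrete (the state/action counts and hyperparameters are made up for illustration):

```python
import numpy as np

n_states, n_actions = 16, 4   # made-up sizes for illustration
alpha, gamma = 0.1, 0.99      # learning rate and discount factor
Q = np.zeros((n_states, n_actions))

def q_update(s, a, r, s_next):
    """One Q-learning step after observing (s, a, r, s_next)."""
    # Target: the observed reward plus the discounted value we predict for the next state
    target = r + gamma * Q[s_next].max()
    # TD error: a difference of two predictions (Q[s, a] and Q[s_next].max()),
    # so a large error alone can't tell us which prediction was at fault
    td_error = target - Q[s, a]
    Q[s, a] += alpha * td_error
    return td_error
```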

In general, we’re hoping to learn something about all the situations that could arise and all the actions we could take, from data covering only a fraction of the situations we actually encountered and the actions we actually took.

They’re good at different things: model-free methods need no model of the world but tend to need a lot of experience, while model-based methods can plan ahead but are only as good as their model.

So, unsurprisingly, the state of the art often combines both! See, e.g., MuZero, which plans with a learned model alongside learned value and policy functions.

Can an agent trained in simulation be trusted in the real world?

Hm. Pro: simulation lets the agent get far more experience, far more cheaply and safely, than it ever could in the real world.

Con: no simulator matches reality exactly, so an agent can learn to exploit quirks of the simulation that don’t hold in the real world (the “sim-to-real gap”).

Do human newborns learn by RL?

Maybe somewhat, but not really: newborns don’t receive anything like a single scalar reward signal, they learn from far less trial-and-error than RL algorithms need, and much of what they learn comes from observation and built-in priors rather than from rewards.

Interpretable AI

Why can we ever trust a model if we can’t see how it’s making its decisions?

We routinely trust human doctors, even though, despite decades of effort by cognitive scientists, we still know very little about how people actually make their decisions.

Is there always a trade-off between understandability and accuracy?

No. For many tasks, especially with structured/tabular data, simple interpretable models (decision trees, rule lists, sparse linear models) can match the accuracy of black-box models.
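
One rough way to check this on a particular dataset is to compare a simple interpretable model against a black-box one; here’s a minimal sketch using scikit-learn’s built-in breast-cancer data (the dataset and models are just illustrative choices, not a benchmark):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

models = {
    "logistic regression (interpretable)": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random forest (black box)": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Cross-validated accuracy for each model; on many tabular datasets the
# interpretable model is competitive with the black-box one
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} ± {scores.std():.3f}")
```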

So why are we only learning about this now? Good question…

What’s CART?

CART (Classification and Regression Trees) is the classic algorithm for learning decision trees: at each node it greedily picks the feature and threshold whose split most reduces impurity (e.g., Gini impurity for classification, variance for regression), then recurses until a stopping criterion is hit.
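
For illustration, scikit-learn’s DecisionTreeClassifier (which is based on CART) can fit a small, readable tree; a minimal sketch on the built-in iris data:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0
)

# Keep the tree shallow so the learned rules stay readable
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.2f}")

# Print the tree as nested if/else rules, one of the perks of an interpretable model
print(export_text(clf, feature_names=iris.feature_names))
```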

Other

What’s dropout?
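
Dropout is a regularization technique for neural networks: during training, each unit’s activation is randomly zeroed with some probability p (and the survivors are rescaled so the expected activation stays the same), which keeps the network from relying too heavily on any single unit; at test time nothing is dropped. A minimal sketch of this “inverted dropout” in NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p=0.5, training=True):
    """Zero each unit with probability p during training; scale survivors by 1/(1-p)
    so the expected activation is unchanged. A no-op at test time."""
    if not training or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p   # keep each unit with probability 1 - p
    return x * mask / (1.0 - p)

x = np.ones((2, 6))
print(dropout(x))                  # roughly half the entries zeroed, rest scaled up to 2.0
print(dropout(x, training=False))  # unchanged at test time
```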

How do you get bitwise determinism?
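
Fixing random seeds is necessary but not sufficient: you also need deterministic implementations of every operation (some GPU kernels and parallel reductions are not, since floating-point addition isn’t associative), the same library versions, and the same hardware, all run in the same order. A rough sketch of the usual knobs in PyTorch, assuming that’s the framework in use (exact requirements vary by version):

```python
import os
import random

import numpy as np
import torch

def make_deterministic(seed: int = 0) -> None:
    # Fix every source of randomness we control
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    # Some cuBLAS kernels need this set (ideally before any CUDA work) to be deterministic
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    # Error out if an op only has a nondeterministic implementation
    torch.use_deterministic_algorithms(True)
    # Disable cuDNN autotuning, which can pick different kernels from run to run
    torch.backends.cudnn.benchmark = False
```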

Discussion: Fans and Skeptics, Optimists and Pessimists