cs344 → reinforcement → homework10

For this homework, do the following things:

Do the following for the MDP (Markov decision process) specified in text exercise 17.8:
1. Implement the MDP using the tools from the lab. Run it with the four r values listed in the exercise, and add a case for r == -1 as well.
2. Compute the first value-iteration updates for cell (2,3) (i.e., the top middle cell) and then for cell (2,2) (i.e., the middle cell), where r == -1.
3. For the optimal policy learned with r == -1, apply the passive ADP learning used in the lab. How close are the learned utilities to those you derived using value iteration in part 2?
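
As a way to check hand computations for part 2, the Bellman update behind value iteration can be sketched in plain Python. This is not the lab's code, and everything about the world is an assumption to verify against the text: a 3x3 grid with cells addressed as (x, y) and (1, 1) at the bottom left, reward r in the top-left cell, a +10 terminal in the top-right cell, -1 elsewhere, a discount factor of 0.99, and moves that succeed with probability 0.8 and slip to either right angle with probability 0.1. The names make_rewards, bellman_update, and value_iteration are hypothetical helpers, not functions from the lab tools.

```python
# Minimal value-iteration sketch for a small grid MDP.
# ASSUMED world (verify against the text): 3x3 grid, (1, 1) at bottom left,
# reward r at (1, 3), +10 terminal at (3, 3), -1 elsewhere, gamma = 0.99.

GAMMA = 0.99
TERMINALS = {(3, 3)}
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
# Directions at right angles to each intended action (assumed slip model).
SLIPS = {
    "up": ("left", "right"),
    "down": ("left", "right"),
    "left": ("up", "down"),
    "right": ("up", "down"),
}


def make_rewards(r):
    """Build the reward table R(s) for a given value of r."""
    rewards = {(x, y): -1.0 for x in range(1, 4) for y in range(1, 4)}
    rewards[(1, 3)] = float(r)
    rewards[(3, 3)] = 10.0
    return rewards


def move(state, action):
    """Apply an action; bumping into the boundary leaves the state unchanged."""
    dx, dy = ACTIONS[action]
    nxt = (state[0] + dx, state[1] + dy)
    return nxt if 1 <= nxt[0] <= 3 and 1 <= nxt[1] <= 3 else state


def transitions(state, action):
    """Return (probability, next_state) pairs for the assumed 0.8/0.1/0.1 model."""
    side_a, side_b = SLIPS[action]
    return [
        (0.8, move(state, action)),
        (0.1, move(state, side_a)),
        (0.1, move(state, side_b)),
    ]


def bellman_update(state, U, rewards):
    """One update: U'(s) = R(s) + gamma * max_a sum_s' P(s'|s,a) * U(s')."""
    if state in TERMINALS:
        return rewards[state]
    best = max(
        sum(p * U[s2] for p, s2 in transitions(state, a)) for a in ACTIONS
    )
    return rewards[state] + GAMMA * best


def value_iteration(rewards, epsilon=1e-6):
    """Sweep all states until the largest change falls below the threshold."""
    U = {s: 0.0 for s in rewards}
    while True:
        U_next = {s: bellman_update(s, U, rewards) for s in rewards}
        delta = max(abs(U_next[s] - U[s]) for s in rewards)
        U = U_next
        if delta < epsilon * (1 - GAMMA) / GAMMA:
            return U
```

Starting from a table of zeros and calling bellman_update once per sweep for (2, 3) and (2, 2) with make_rewards(-1) gives per-iteration values to compare with a hand calculation; value_iteration(make_rewards(r)) gives converged utilities for any r. If the text uses a different layout, discount, or zero-based cell indexing, only the constants and make_rewards should need to change.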

Final project suggestion: Consider studying game theory and comparing/contrasting it with AI in some problem domain; Peter Norvig provides a nice overview in his AI class lecture.

Checking in

Submit the files specified above in Moodle under homework 10. We will grade your work according to the following criteria:

Exercise 1: Turn in the specified code and explanations. The exercises are weighted equally.

The revised project proposal is submitted and graded separately; upload it here: submission site