For this homework, complete the following tasks for the MDP specified in
exercise 17.8 of the text:
- Implement the MDP using the tools from the lab. Run it for the four r values listed in the exercise, and add a case
for r == -1 as well.
- Compute the first iterative values for cell (2,3) (i.e.,
the top middle cell) and then for cell (2,2) (i.e., the middle cell),
using r == -1.
- For the optimal policy learned with r == -1, apply the
passive ADP learning used in the lab. How close are the learned
utilities to those you derived using value iteration in part b?
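
As a rough illustration of the pieces involved, here is a minimal standalone Python sketch of value iteration on a 3x3 grid. It does not use the lab tools, and several details are assumptions that should be checked against the text: cells are indexed (column, row) with (1,1) at the bottom left, the reward is r at (1,3), +10 at a terminal cell (3,3), and -1 elsewhere, the discount factor is 0.99, and moves succeed with probability 0.8 and slip to either side with probability 0.1 each. All function names (`make_rewards`, `bellman_update`, `rms_difference`, and so on) are hypothetical.

```python
# Sketch only: the layout, rewards, discount, and transition model below are
# assumptions to verify against the textbook exercise, not the lab's code.
GAMMA = 0.99
TERMINALS = {(3, 3)}
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
# Directions at right angles to each intended action.
SLIPS = {"up": ("left", "right"), "down": ("left", "right"),
         "left": ("up", "down"), "right": ("up", "down")}


def make_rewards(r):
    """Reward for each cell: r at (1,3), +10 at (3,3), -1 elsewhere."""
    rewards = {(x, y): -1.0 for x in range(1, 4) for y in range(1, 4)}
    rewards[(1, 3)] = float(r)
    rewards[(3, 3)] = 10.0
    return rewards


def move(state, action):
    """Deterministic result of a move; bumping into a wall leaves the agent in place."""
    dx, dy = ACTIONS[action]
    nxt = (state[0] + dx, state[1] + dy)
    return nxt if 1 <= nxt[0] <= 3 and 1 <= nxt[1] <= 3 else state


def expected_utility(state, action, U):
    """Expected utility of the successor state under the 0.8/0.1/0.1 model."""
    side_a, side_b = SLIPS[action]
    return (0.8 * U[move(state, action)]
            + 0.1 * U[move(state, side_a)]
            + 0.1 * U[move(state, side_b)])


def bellman_update(state, U, rewards):
    """One Bellman update for a single cell, given the current estimates U."""
    if state in TERMINALS:
        return rewards[state]
    best = max(expected_utility(state, a, U) for a in ACTIONS)
    return rewards[state] + GAMMA * best


def value_iteration(r, epsilon=1e-4):
    """Repeat synchronous Bellman updates from U = 0 until the change is small."""
    rewards = make_rewards(r)
    U = {s: 0.0 for s in rewards}
    while True:
        U_new = {s: bellman_update(s, U, rewards) for s in rewards}
        delta = max(abs(U_new[s] - U[s]) for s in rewards)
        U = U_new
        if delta < epsilon * (1 - GAMMA) / GAMMA:
            return U


def greedy_policy(U):
    """Best action in each non-terminal cell with respect to U."""
    return {s: max(ACTIONS, key=lambda a: expected_utility(s, a, U))
            for s in U if s not in TERMINALS}


def rms_difference(U_a, U_b):
    """Root-mean-square gap between two utility tables over the same cells."""
    return (sum((U_a[s] - U_b[s]) ** 2 for s in U_a) / len(U_a)) ** 0.5
```

Calling `bellman_update` on a single cell with hand-built utility tables is one way to check hand computations of the early iterations, and `rms_difference` gives one possible measure of how close the ADP-learned utilities are to the value-iteration ones; other error measures would be equally reasonable.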
Final project suggestion: Consider studying game theory and
comparing/contrasting it with AI in some problem domain; Peter Norvig
provides a nice overview in his AI class lecture.
Checking in
Submit the files specified above in Moodle under homework 10. We will grade your
work according to the following criteria:
- Exercise 1: Turn in the specified code and explanations. The
exercises are weighted equally.
The revised project proposal is submitted and graded separately;
upload it here: submission site