Understand Policy Iteration

Read my demo code here to familiarize yourself with policy iteration.

Implement Value Iteration

Feel free to heavily adapt the policy iteration code to implement value iteration.
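As a starting point, here is a minimal sketch of value iteration on a small made-up MDP. It is not the demo code and not the assignment environment. The 3-state MDP, the function names value_iteration and greedy_policy, and the parameters gamma and theta are all assumptions chosen for illustration. The transition table follows the Gym-style layout P[state][action] = [(probability, next_state, reward, done), ...], which may differ from what your environment exposes.

```python
# Hypothetical 3-state MDP (an assumption for illustration only).
# Layout: P[state][action] = [(probability, next_state, reward, done), ...]
# State 2 is terminal: every action keeps the agent there with zero reward.
P = {
    0: {0: [(1.0, 0, 0.0, False)],
        1: [(0.5, 1, 0.0, False), (0.5, 0, 0.0, False)]},
    1: {0: [(1.0, 0, 0.0, False)],
        1: [(1.0, 2, 1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)],
        1: [(1.0, 2, 0.0, True)]},
}


def action_values(P, V, s, gamma):
    """Return the one-step lookahead value of each action in state s."""
    return [
        sum(p * (r + gamma * (0.0 if done else V[s2]))
            for p, s2, r, done in P[s][a])
        for a in sorted(P[s])
    ]


def value_iteration(P, gamma=0.9, theta=1e-10):
    """Apply the Bellman optimality backup until the largest change is below theta."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in sorted(P):
            best = max(action_values(P, V, s, gamma))
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update: later states see the new value
        if delta < theta:
            return V


def greedy_policy(P, V, gamma=0.9):
    """Pick, for each state, the action with the highest lookahead value."""
    policy = []
    for s in sorted(P):
        q = action_values(P, V, s, gamma)
        policy.append(q.index(max(q)))
    return policy


V = value_iteration(P)
policy = greedy_policy(P, V)
print(V, policy)
```

Unlike policy iteration, this loop never runs a separate policy evaluation step: each sweep takes the maximum over actions directly, and the policy is extracted once at the end.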

Verify

Assuming your final value function is stored in an array or list named V, copy the following assertions into your code to verify that the value function is computed correctly.

assert all(abs(v - 0.82352941) < 1e-5 for v in V[0:5]), "Value function is not calculated correctly"
assert all(abs(V[s] - 0) < 1e-5 for s in [5, 7, 11, 12, 15]), "Value function is not calculated correctly"

Write Report

Questions to answer in the report:

  • (10 pts) Based on your understanding and experimental observation of the two methods, what are the pros and cons of value iteration compared to policy iteration?

Deliverables and Rubrics

Overall, you need to complete the environment installation and be able to run the demo code. You need to submit:

  • (90 pts) PDF (exported from a Jupyter notebook) and Python code. Your code must pass the assert statements given above.
  • (10 pts) Reasonable answers to the questions.