Lab 02-1: Policy/Value Iteration
Understand Policy Iteration
Check my demo code here to familiarize yourself with policy iteration.
Implement Value Iteration
Feel free to heavily adapt the policy iteration code to implement value iteration.
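As a starting point, here is a minimal sketch of value iteration on a tiny hand-built MDP, not on the lab environment. It assumes the transition model is a nested mapping P[s][a] holding a list of (probability, next_state, reward, done) tuples, which is the layout used by Gym's toy-text environments; the demo code may organize things differently. The names value_iteration, toy_P, gamma, and theta are hypothetical and chosen only for illustration.

```python
import numpy as np

def value_iteration(P, n_states, n_actions, gamma=0.9, theta=1e-8):
    """Sweep Bellman optimality backups until the largest update is below theta."""
    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            # One-step lookahead: expected return of each action from state s.
            # Successors flagged done contribute no future value.
            q = [
                sum(p * (r + gamma * V[s2] * (not done))
                    for p, s2, r, done in P[s][a])
                for a in range(n_actions)
            ]
            best = max(q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update, no separate policy evaluation loop
        if delta < theta:
            return V

# Hypothetical 2-state MDP (an assumption, not the lab environment):
# state 0: action 0 stays in place with reward 0, action 1 ends the episode with reward 1
# state 1: terminal, loops onto itself with reward 0
toy_P = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 1.0, True)]},
    1: {0: [(1.0, 1, 0.0, True)], 1: [(1.0, 1, 0.0, True)]},
}
V_toy = value_iteration(toy_P, n_states=2, n_actions=2)
```

Compared with policy iteration, the max over actions is folded into every sweep, so there is no alternating evaluation/improvement loop. A greedy policy can be read off afterwards by taking the argmax of the same one-step lookahead.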
Verify
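If your code is correct and your final value function is stored in a NumPy array named V, copy the following code into your code to verify that the value function is computed correctly. The second assertion indexes V with a list of indices, which only works on a NumPy array, so convert a plain Python list with numpy.asarray(V) first.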
assert all(abs(v - 0.82352941) < 1e-5 for v in V[0:5]), "Value function is not calculated correctly"
assert all(abs(v - 0) < 1e-5 for v in V[[5,7,11,12,15]]), "Value function is not calculated correctly"
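If you prefer a reusable check, the two assertions can be wrapped in a small helper. This is only a sketch: the name check_value_function and the tol parameter are hypothetical, and the expected values are copied from the assertions above. Converting with np.asarray lets the helper accept either a list or an array.

```python
import numpy as np

def check_value_function(V, tol=1e-5):
    """Run the same two checks as the assertions above on a list or array V."""
    V = np.asarray(V, dtype=float)  # makes index-list lookup valid for plain lists
    assert np.all(np.abs(V[0:5] - 0.82352941) < tol), \
        "Value function is not calculated correctly"
    assert np.all(np.abs(V[[5, 7, 11, 12, 15]]) < tol), \
        "Value function is not calculated correctly"
    return True
```

Call check_value_function(V) once your value iteration has converged.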
Write Report
Questions to answer in the report:
- (10 pts) Based on your understanding and experimental observation of the two methods, what are the pros and cons of value iteration compared to policy iteration?
Deliverables and Rubrics
Overall, you need to complete the environment installation and be able to run the demo code. You need to submit:
- (90 pts) PDF (exported from Jupyter notebook) and Python code. Your code must pass the assert statements I give above.
- (10 pts) Reasonable answers to the questions.