Lab 01: Cross-Entropy Method
Environment Setup
Install VSCode
Download and install this fantastic editor
Install Extensions:
- Python
- Jupyter
- Remote - WSL (if you use WSL2)
Install Anaconda
Go to https://www.anaconda.com/products/individual#Downloads and find the installer link for Linux. Don’t click it, because that would download the installer to your Windows system. Instead, right-click the link and copy its address. Then open your Ubuntu terminal and run the following command:
wget <THE LINK YOU COPIED>
For the 2021-22 Fall quarter, the command looks like this:
wget https://repo.anaconda.com/archive/Anaconda3-2021.05-Linux-x86_64.sh
Alternatively, go to https://repo.anaconda.com/archive/ to find the correct script for your system.
After downloading the installer, run it with
bash THE_INSTALLER_FILENAME
Then go through the prompts to finish the installation.
Create a new virtual environment:
conda update conda
conda create -n drl python=3.8
Install Libs
Activate the conda environment by typing
source activate drl
or
conda activate drl
- Install PyTorch: find more PyTorch installation options on the official website. If you have a GPU, make sure to install the matching version.
- Install other libraries:
pip install tqdm gym pyglet==1.5.11 seaborn==0.11.0
- Install OpenGL:
sudo apt install python-opengl
- Install jupyter:
conda install jupyter notebook
Install X server for GUI apps if you haven’t.
- Download VcXsrv from https://sourceforge.net/projects/vcxsrv/ and install it
- Locate the XLaunch shortcut in the Start Menu, and click it
- Install GLUT with
sudo apt-get install freeglut3-dev
- Run this command
export DISPLAY=$(cat /etc/resolv.conf | grep nameserver | awk '{print $2}'):0
- Go to Windows Security -> Firewall & network protection -> Allow an app through firewall. Make sure VcXsrv has both public and private checked.
- Launch VcXsrv with “Disable access control” ticked
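The export DISPLAY command above takes the nameserver address from /etc/resolv.conf (by default on WSL2 this points at the Windows host) and appends :0. As a rough sketch, the same logic can be written in Python, for example to set DISPLAY at the top of a notebook. The function name display_from_resolv_conf is hypothetical and not part of the demo code.

```python
import os
from pathlib import Path


def display_from_resolv_conf(text: str) -> str:
    """Return '<nameserver>:0' built from the first nameserver line in the text.

    Note: the shell one-liner prints every nameserver line; this sketch
    assumes there is only one and uses the first it finds.
    """
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            return f"{parts[1]}:0"
    raise ValueError("no nameserver entry found")


# Usage inside WSL2 (left commented out so the sketch has no side effects):
# os.environ["DISPLAY"] = display_from_resolv_conf(Path("/etc/resolv.conf").read_text())
```

Setting DISPLAY this way only affects the current Python process, so it has to run before any window is opened.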
Verify your setup by running the code
Download the demo code HERE and try to run it on your computer.
Remember to launch XLaunch first if you are using WSL2, because the code opens a GUI window.
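Before running the demo, it may help to confirm that the libraries installed above are visible from the drl environment. The sketch below only checks that each package can be found; it does not test the GPU or the X server. The REQUIRED list is an assumption based on the install commands in this handout (PyTorch imports as torch), and missing_packages is a hypothetical helper name.

```python
from importlib import util


def missing_packages(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if util.find_spec(name) is None]


# Assumed import names for the libraries installed earlier in this lab.
REQUIRED = ["torch", "gym", "tqdm", "pyglet", "seaborn", "notebook"]

print("Missing packages:", missing_packages(REQUIRED) or "none")
```

If anything is listed as missing, check that the drl environment is activated in the terminal or selected as the kernel in VSCode.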
Write Report
Each lab, you’re required to write a report to
- demonstrate your code result
- answer a few questions to show your understanding of the content
Please go through the Submission Guideline to familiarize yourself with the overall process.
For this lab, you should include the following items in the report:
- Answer to the question: What are the pros and cons of CEM for RL?
- A picture of the learning curve of CEM on CartPole-v1
Extra Thing for Full Credit
So far, most of your effort has gone into setting up the environment. Ideally, I want you to learn more about RL and CEM through this lab.
CEM for Continuous Control
The demo shows you how to use CEM on a discrete task, meaning the action is categorical (either 0 or 1 in CartPole-v1). Now, I want you to modify the code to run a continuous task, Pendulum-v0, in which the action is a real number ranging from -2 to 2.
There are several places you need to change:
- To get the dimension of the action space, use env.action_space.shape[0].
- Change the parameters self.W and self.b so that the output is a single scalar number from -2 to 2.
  - To bound (or squash) the range, feel free to wrap the output in 2*np.tanh().
Debug: you will encounter a bug caused by an incorrect data dimension. Use the opportunity to practice your Python debugging skills.
Hint: squeeze() is a very common function to remove redundant dimensions.
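To make the changes above concrete, here is a minimal sketch of a linear policy for a continuous action. It is not the demo's code: the class name LinearPolicy, the method act, and the zero initialization are assumptions for illustration. Only self.W, self.b, the 2*tanh squashing, and the use of squeeze come from the instructions above.

```python
import numpy as np


class LinearPolicy:
    """Hypothetical linear policy; the demo's class may be organized differently."""

    def __init__(self, state_dim, action_dim, max_action=2.0):
        # One weight column per action dimension (Pendulum-v0 has one).
        self.W = np.zeros((state_dim, action_dim))
        self.b = np.zeros(action_dim)
        self.max_action = max_action

    def act(self, state):
        # squeeze() drops redundant axes, e.g. a (1, 3) state becomes (3,).
        state = np.atleast_1d(np.squeeze(np.asarray(state, dtype=np.float64)))
        raw = state @ self.W + self.b          # shape (action_dim,)
        return self.max_action * np.tanh(raw)  # bounded to [-max_action, max_action]
```

With a gym environment, state_dim would come from env.observation_space.shape[0] and action_dim from env.action_space.shape[0]. Printing the shape of each intermediate array is a quick way to locate the dimension bug mentioned above.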
When you have finished this part, add the learning curve of Pendulum-v0 to your report and briefly share your thoughts on it: what the performance looks like, and why it is this way.
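If the raw curve is too noisy to read, a moving average can make the trend easier to discuss. The sketch below assumes you have collected one reward value per training iteration in a list. The names moving_average and plot_learning_curve are hypothetical, and the demo may already provide its own plotting.

```python
import numpy as np


def moving_average(rewards, window=10):
    """Mean of each run of `window` consecutive rewards (window is clamped to the data length)."""
    rewards = np.asarray(rewards, dtype=np.float64)
    window = max(1, min(window, len(rewards)))
    return np.convolve(rewards, np.ones(window) / window, mode="valid")


def plot_learning_curve(rewards, title, window=10, path=None):
    """Plot raw and smoothed rewards; optionally save the figure to `path`."""
    import matplotlib.pyplot as plt  # imported lazily; assumed installed alongside seaborn

    smoothed = moving_average(rewards, window)
    offset = len(rewards) - len(smoothed)  # align smoothed values with the end of each window
    fig, ax = plt.subplots()
    ax.plot(rewards, alpha=0.3, label="reward per iteration")
    ax.plot(np.arange(len(smoothed)) + offset, smoothed, label="moving average")
    ax.set_xlabel("iteration")
    ax.set_ylabel("reward")
    ax.set_title(title)
    ax.legend()
    if path is not None:
        fig.savefig(path)
    return fig
```

For example, plot_learning_curve(rewards, "CEM on Pendulum-v0") inside the notebook embeds the figure in the exported PDF.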
Deliverables and Rubrics
Overall, you need to complete the environment installation and be able to run the demo code. You need to submit:
- (90 pts) A PDF exported from running the demo code in Jupyter Notebook, with the learning curve picture embedded.
- (10 pts) If you finish the continuous control part, submit the additional PDF from running the modified code (also with the learning curve picture).