Environment Setup

Install VSCode

Download and install this fantastic editor

Install Extensions:

  • Python
  • Jupyter
  • Remote - WSL (only if you use WSL2)

Install Anaconda

Go to the website https://www.anaconda.com/products/individual#Downloads and find the installer link for Linux. Don’t click it to download, because that saves the file to your Windows system. Instead, right-click the link and copy it. Then open your Ubuntu terminal and run the following command:

wget <THE LINK YOU COPIED>

For the Fall quarter of the 2021-22 academic year, the command looks like this:

wget https://repo.anaconda.com/archive/Anaconda3-2021.05-Linux-x86_64.sh

Alternatively, go to https://repo.anaconda.com/archive/ to find the correct script for your system.

After downloading the installer, run it with

bash THE_INSTALLER_FILENAME

Then answer the installer's prompts to finish the installation.

Create a new virtual environment:

conda update conda
conda create -n drl python=3.8

Install Libs

Activate the conda environment by typing

conda activate drl

or, on conda versions older than 4.4,

source activate drl

  • Install PyTorch: Find more PyTorch installation options on the official website. If you have a GPU, make sure to install the matching GPU (CUDA) build.
  • Install other libraries: pip install tqdm gym pyglet==1.5.11 seaborn==0.11.0
  • Install OpenGL: sudo apt install python-opengl
  • Install jupyter: conda install jupyter notebook
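
Once the packages are installed, a quick sanity check from inside the activated environment can save debugging time later. The sketch below is not part of the course material, and the helper names (missing_packages, env_problems) are made up for illustration. It assumes the import names torch, tqdm, gym, pyglet, seaborn and OpenGL for the packages above, and it relies on conda setting the CONDA_DEFAULT_ENV variable while an environment is active.

```python
import importlib.util
import os
import sys

# Import names assumed for the packages installed above.
REQUIRED = ["torch", "tqdm", "gym", "pyglet", "seaborn", "OpenGL"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def env_problems(env_name, version, expected_env="drl", expected_version=(3, 8)):
    """Compare the active conda env and Python version with what this lab expects."""
    problems = []
    if env_name != expected_env:
        problems.append(f"active conda env is {env_name!r}, expected {expected_env!r}")
    if tuple(version[:2]) != tuple(expected_version):
        wanted = f"{expected_version[0]}.{expected_version[1]}"
        problems.append(f"Python is {version[0]}.{version[1]}, expected {wanted}")
    return problems


if __name__ == "__main__":
    # CONDA_DEFAULT_ENV is unset (None) when no conda environment is active.
    for line in env_problems(os.environ.get("CONDA_DEFAULT_ENV"), sys.version_info):
        print("WARNING:", line)
    for name in missing_packages(REQUIRED):
        print("MISSING:", name)
```

If the script prints nothing, the environment name, the Python version and all six imports look as expected.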

Install an X server for GUI apps if you haven’t already.

  • Download VcXsrv (https://sourceforge.net/projects/vcxsrv/) and install it
  • Locate the XLaunch shortcut in the Start Menu, and click it
  • Install GLUT with sudo apt-get install freeglut3-dev
  • In the Ubuntu terminal, run this command: export DISPLAY=$(cat /etc/resolv.conf | grep nameserver | awk '{print $2}'):0
  • Go to Windows Security -> Firewall & network protection -> Allow an app through firewall. Make sure VcXsrv has both public and private checked.
  • Launch VcXsrv with “Disable access control” ticked
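
The export command above takes the nameserver address from /etc/resolv.conf (under WSL2 this is typically the Windows host) and appends :0 to it. As a rough Python equivalent, the sketch below does the same parsing. The helper name display_from_resolv_conf and the sample address are made up for illustration. Unlike the shell pipeline, it only looks at lines that start with "nameserver" and uses the first one it finds.

```python
def display_from_resolv_conf(text):
    """Build a DISPLAY value from the first nameserver line of resolv.conf text."""
    for line in text.splitlines():
        fields = line.split()
        # A nameserver line looks like: "nameserver 172.22.0.1"
        if len(fields) >= 2 and fields[0] == "nameserver":
            return f"{fields[1]}:0"
    raise ValueError("no nameserver entry found")


# Sample file content (the address is a placeholder, not a real setting).
sample = "# generated by WSL\nnameserver 172.22.0.1\n"
print(display_from_resolv_conf(sample))  # prints 172.22.0.1:0
```

To use this from a notebook, one option is to read /etc/resolv.conf, pass its contents to the helper, and assign the result to os.environ["DISPLAY"] before any GUI code runs.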

Verify your setup by running the code

Download the demo code HERE and try to run it on your computer.

Remember to launch XLaunch first if you are using WSL2, because the code opens a GUI window.

Write Report

For each lab, you’re required to write a report to

  • demonstrate your code result
  • answer a few questions to show your understanding of the content

Please go through this Submission Guideline to familiarize yourself with the overall process.

For this lab, you should include the following items in the report:

  • Answer to the question: What are the pros and cons of CEM for RL?
  • A picture of the learning curve of CEM on CartPole-v1
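
When thinking about the pros-and-cons question, it may help to see the bare CEM loop stripped of any RL code. The following is a minimal sketch on a toy objective, not the demo's implementation. The function name cem, its parameters (n_iters, n_samples, n_elite, init_std) and the quadratic objective are all assumptions chosen for illustration.

```python
import numpy as np


def cem(score_fn, dim, n_iters=40, n_samples=50, n_elite=10, init_std=2.0, seed=0):
    """Maximize score_fn with a diagonal-Gaussian cross-entropy method."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)
    std = np.full(dim, init_std)
    for _ in range(n_iters):
        # 1. Sample candidate parameter vectors around the current mean.
        samples = mean + std * rng.standard_normal((n_samples, dim))
        # 2. Score every candidate (in RL this would be an episode return).
        scores = np.array([score_fn(s) for s in samples])
        # 3. Keep the best-scoring candidates (the elites).
        elites = samples[np.argsort(scores)[-n_elite:]]
        # 4. Refit the sampling distribution to the elites.
        mean = elites.mean(axis=0)
        std = elites.std(axis=0)
    return mean


# Toy objective: the score is highest when x equals the target.
target = np.array([1.0, -2.0])
best = cem(lambda x: -np.sum((x - target) ** 2), dim=2)
print(best)
```

In the RL setting, score_fn would run one or more episodes with the candidate parameters and return the total reward.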

Extra Thing for Full Credit

So far, most of your effort has gone into setting up the environment. Ideally, I want you to learn more about RL and CEM through this lab.

CEM for Continuous Control

The demo shows you how to use CEM on a discrete task, meaning the action is categorical (either 0 or 1 in CartPole-v1). Now, I want you to modify the code to run a continuous task, Pendulum-v0, in which the action is a real number between -2 and 2.

There are several places you need to change:

  • To get the dimension of the action space, use env.action_space.shape[0].
  • Change the parameters self.W and self.b so that the output is a single scalar number from -2 to 2.
    • To bound (or squash) the range, feel free to put a 2*np.tanh() around the output.
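
The changes listed above can be sketched roughly as follows. This is an illustration under assumptions, not the demo's code: the class name BoundedLinearPolicy, the act method and the max_action parameter are hypothetical, and the dimensions are hard-coded (Pendulum-v0 has a 3-dimensional observation and a 1-dimensional action) so the block runs without gym. In the lab code the action dimension would come from env.action_space.shape[0].

```python
import numpy as np


class BoundedLinearPolicy:
    """Linear policy whose output is squashed into [-max_action, max_action]."""

    def __init__(self, obs_dim, act_dim, max_action=2.0, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        self.W = rng.standard_normal((obs_dim, act_dim))  # one weight column per action dim
        self.b = np.zeros(act_dim)                        # one bias per action dim
        self.max_action = max_action

    def act(self, obs):
        obs = np.asarray(obs, dtype=np.float64).reshape(-1)  # flatten to (obs_dim,)
        # tanh maps any real number into (-1, 1); scaling gives the action range.
        return self.max_action * np.tanh(obs @ self.W + self.b)


policy = BoundedLinearPolicy(obs_dim=3, act_dim=1, rng=np.random.default_rng(0))
action = policy.act([1.0, 0.0, 0.5])
print(action.shape)  # prints (1,)
```

Because of the tanh, even very large observations cannot push the action outside the range from -2 to 2.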

Debug: you will encounter a bug caused by mismatched array dimensions. Use this opportunity to practice your Python debugging skills.

Hint: squeeze() is a very common function to remove redundant dimensions.
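
As a small illustration of the hint, the snippet below shows what squeeze() does to array shapes. The arrays are placeholders and are not taken from the demo. Note that squeezing every axis of a (1, 1) array yields a 0-dimensional array, which may or may not be what the surrounding code expects, so passing an explicit axis is sometimes safer.

```python
import numpy as np

column = np.zeros((5, 1))
print(column.shape)                   # prints (5, 1)
print(column.squeeze().shape)         # prints (5,)

single = np.zeros((1, 1))
print(single.squeeze().shape)         # prints ()   (all length-1 axes removed)
print(single.squeeze(axis=0).shape)   # prints (1,) (only the first axis removed)
```

Printing .shape at each step of the policy and the environment loop is often the fastest way to locate a dimension bug.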

When you have finished this part, add the learning curve for Pendulum-v0 to your report and briefly share your thoughts on it: what the performance looks like, and why.
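
If the raw curve is too noisy to read, one possible way to draw it is sketched below. The function names moving_average and plot_learning_curve are hypothetical, and the sketch assumes that the returns are collected in a plain list with at least window entries and that matplotlib is available (it is installed as a dependency of seaborn).

```python
import numpy as np


def moving_average(values, window):
    """Average each run of `window` consecutive values."""
    values = np.asarray(values, dtype=float)
    return np.convolve(values, np.ones(window) / window, mode="valid")


def plot_learning_curve(returns, window=10):
    """Plot raw returns and their moving average; returns the figure."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, ax = plt.subplots()
    ax.plot(returns, alpha=0.3, label="per-iteration return")
    smoothed = moving_average(returns, window)
    # The first smoothed point summarizes iterations 0 .. window-1.
    ax.plot(np.arange(window - 1, len(returns)), smoothed,
            label=f"{window}-point average")
    ax.set_xlabel("CEM iteration")
    ax.set_ylabel("Return")
    ax.legend()
    return fig
```

In a notebook, calling plot_learning_curve(returns) in a cell should display the figure inline so that it ends up in the exported PDF.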

Deliverables and Rubrics

Overall, you need to complete the environment installation and be able to run the demo code. You need to submit:

  • (90 pts) A PDF exported from running the demo code in Jupyter Notebook, with the learning curve picture embedded.
  • (10 pts) If you finish the continuous control part, submit the additional PDF from running the modified code (also with the learning curve picture).