Part 1: Paper Understanding

1.1 How to Read Papers

Watch this excellent video lecture by Dr. Andrew Ng. It offers valuable advice not only on reading research papers but also on building a research career.

Here is a summary article of the lecture.

1.2 Paper Pool

Below is a curated list of research papers suitable for this course. In addition to the original papers, related articles and tutorials are provided to support your learning.

Phasic Policy Gradient

Dueling Network Architectures for Deep Reinforcement Learning

Curiosity-driven Exploration by Self-supervised Prediction

Unifying Count-Based Exploration and Intrinsic Motivation

Model-Based Value Expansion for Efficient Model-Free Reinforcement Learning

Model-Based Reinforcement Learning via Meta-Policy Optimization

Feel free to suggest RL papers you find interesting and well-written.

1.3 Paper Presentation

Each group will be assigned one paper from the pool above. You are expected to read the paper carefully (most likely over multiple passes) and prepare a slide deck to present it to the class.

Format: 20 minutes + discussion. All group members must participate in the presentation. Divide the presentation time equally among members (e.g., in a group of 4, each member should present for about 5 minutes).

Your presentation should address the following questions:

  • What problem does the paper aim to solve?
  • Why is the proposed method novel compared to prior approaches?
  • What is the motivation or intuition behind the proposed idea?
  • What experiments does the paper conduct to evaluate the proposed method?
  • What additional experiments would be interesting or necessary that the paper does not include?
  • What are your critiques of the paper?

1.4 Paper Presentation Rubrics

All presentations will be evaluated by your peers. The rubric below serves as guidance for both presenters and the audience.

Item | Points
Motivation: Are the problem and its significance clearly presented? | 20
Clarity: Is the general idea (non-mathematical aspects) of the paper understandable from the presentation? | 30
Depth: Does the presentation clearly explain the mathematical models involved? | 30
Discussion: Do the presenters provide accurate and thoughtful answers to audience questions? | 20

Part 2: Paper Implementation

2.1 Implementation Requirements

Each team is required to implement the core algorithm described in their assigned paper and evaluate it in a chosen application or environment. This implementation serves as your baseline.

Beyond the baseline, each team must design and implement two enhancements that have meaningful scientific value and demonstrably improve performance over the baseline. The enhancements should reflect genuine methodological contributions — trivial modifications do not count. For example, simply changing the neural network architecture (e.g., adding layers or switching activation functions) is not considered scientifically novel. You must discuss your proposed enhancements with the instructor to confirm they are non-trivial before proceeding.

For a team of four, it is acceptable to divide into two sub-teams of two, with each sub-team responsible for one enhancement.

All enhancements must be rigorously evaluated and compared against the baseline using appropriate metrics (e.g., cumulative reward, convergence speed, stability). Present clear visualizations (learning curves, tables) that highlight the differences.
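As a rough illustration of the kind of comparison described above, the sketch below summarizes per-episode returns from several random seeds into a final-return mean and spread (a proxy for stability) and the number of episodes needed to reach a target return (a proxy for convergence speed). The function names, the data layout (one list of episode returns per seed), the smoothing window, and the threshold are all assumptions made for this example, and the numbers are toy values rather than experimental results. Adapt the metrics to whatever your paper and environment call for.

```python
from statistics import mean, stdev


def moving_average(returns, window):
    """Smooth per-episode returns with a trailing window of the given size."""
    smoothed = []
    for i in range(len(returns)):
        start = max(0, i - window + 1)
        smoothed.append(mean(returns[start:i + 1]))
    return smoothed


def episodes_to_threshold(returns, threshold, window=3):
    """Index of the first episode whose smoothed return reaches the threshold.

    Returns None if the threshold is never reached.
    """
    for i, value in enumerate(moving_average(returns, window)):
        if value >= threshold:
            return i
    return None


def summarize(runs, threshold, last_k=3):
    """Summarize several seeds of one method.

    runs: list of per-seed lists of episode returns (hypothetical layout).
    """
    # Final performance per seed: average of the last `last_k` episodes.
    finals = [mean(run[-last_k:]) for run in runs]
    speeds = [episodes_to_threshold(run, threshold) for run in runs]
    reached = [s for s in speeds if s is not None]
    return {
        "final_return_mean": mean(finals),
        # Spread across seeds; needs at least two seeds to be defined.
        "final_return_std": stdev(finals) if len(finals) > 1 else 0.0,
        "mean_episodes_to_threshold": mean(reached) if reached else None,
        "seeds_reaching_threshold": len(reached),
    }


# Toy numbers for illustration only; these are not experimental results.
baseline_runs = [[0, 1, 2, 3, 4, 5], [0, 0, 2, 2, 4, 4]]
enhanced_runs = [[0, 2, 4, 5, 5, 5], [1, 3, 4, 4, 5, 5]]

for name, runs in [("baseline", baseline_runs), ("enhanced", enhanced_runs)]:
    print(name, summarize(runs, threshold=3.0))
```

Reporting the spread across seeds alongside the mean helps the audience judge whether a gap between the baseline and an enhancement is larger than the run-to-run variation.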

2.2 Final Presentation

The final presentation focuses on your implementation results. The presentation time is 15 minutes + discussion.

Your presentation should address the following:

  • Implementation details:
    • What neural networks are involved?
    • What are the loss functions used to optimize each network?
    • How are the policy and value networks updated?
  • Challenges: What were the major difficulties in implementing and debugging the code?
  • Baseline performance:
    • How does your implementation of the original paper’s algorithm perform in your chosen environment?
  • Enhancement results:
    • How does each enhancement compare to the baseline (the original paper’s algorithm)?
    • Given the experimental results (e.g., losses, rewards), how do you interpret the differences or similarities?
  • Future directions: What further improvements or research directions do you envision?

2.3 Final Presentation Rubrics

The final presentation will be evaluated by the instructor using the following rubric.

Item | Points
Implementation Effort: How challenging was it to implement the method? | 20
Correctness: Are the models and neural networks correctly trained? | 40
Performance: Do the experimental results match expectations? | 30
Vision: Does the presentation propose reasonable future work? | 10