Project Guide
Part 1: Paper Understanding
1.1 How to Read Papers
Watch this excellent video lecture by Dr. Andrew Ng. It offers valuable insight not only into reading research papers but also into career development.
Here is a summary article of the lecture.
1.2 Paper Pool
Below is a curated list of research papers suitable for this course. In addition to the original papers, related articles and tutorials are provided to support your learning.
- Phasic Policy Gradient
- Dueling Network Architectures for Deep Reinforcement Learning
- Curiosity-driven Exploration by Self-supervised Prediction
- Unifying Count-Based Exploration and Intrinsic Motivation
- Model-Based Value Expansion for Efficient Model-Free Reinforcement Learning
- Model-Based Reinforcement Learning via Meta-Policy Optimization
Feel free to suggest RL papers you find interesting. Good candidate papers should be highly cited, published in top-tier conferences (NeurIPS, ICML, ICLR), and well-written with clear motivation, method, and experiments.
1.3 Paper Presentation
Each group will be assigned one paper from the pool above. You are expected to read the paper carefully (most likely multiple passes) and prepare a slide deck to present it to the class.
Format: 15 minutes + discussion. All group members must participate in the presentation. Divide the presentation time equally among members (e.g., in a group of 4, each member presents for about 4 minutes).
Your presentation should address the following questions:
- What problem does the paper aim to solve?
- Why is the proposed method novel compared to prior approaches?
- What is the motivation or intuition behind the proposed idea?
- What experiments does the paper conduct to evaluate the proposed method?
- What additional experiments would be interesting or necessary that the paper does not include?
- What are your critiques of the paper?
Paper Presentation Rubrics
All presentations will be evaluated by your peers. The rubric below serves as guidance for both presenters and the audience.
| Item | Points |
|---|---|
| Motivation: Are the problem and its significance clearly presented? | 20 |
| Clarity: Is the general idea (non-mathematical aspects) of the paper understandable from the presentation? | 30 |
| Depth: Does the presentation clearly explain the mathematical models involved? | 30 |
| Discussion: Do the presenters provide accurate and thoughtful answers to audience questions? | 20 |
Part 2: Paper Implementation
2.1 Implementation Requirements
Each team is required to implement the core algorithm described in their assigned paper and evaluate it in a chosen application or environment. This implementation serves as your baseline.
Beyond the baseline, each team must design and implement two enhancements that have meaningful scientific value and demonstrably improve performance over the baseline. The enhancements should reflect genuine methodological contributions — trivial modifications do not count. For example, simply changing the neural network architecture (e.g., adding layers or switching activation functions) is not considered scientifically novel. You must discuss your proposed enhancements with the instructor to confirm they are non-trivial before proceeding.
For a team of four, it is acceptable to divide into two sub-teams of two, with each sub-team responsible for one enhancement.
All enhancements must be rigorously evaluated and compared against the baseline using appropriate metrics (e.g., cumulative reward, convergence speed, stability). Present clear visualizations (learning curves, tables) that highlight the differences.
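As an illustration of the metrics named above, the sketch below summarizes per-seed learning curves. It assumes each run is logged as a list of episode returns keyed by its random seed. The definitions are assumptions, not prescriptions: "convergence speed" is taken as the first episode at which a moving-average return reaches a chosen threshold, and "stability" as the standard deviation of final performance across seeds. All function names are hypothetical; pick definitions that suit your environment and state them explicitly in your report.

```python
import statistics
from typing import Dict, List, Optional, Sequence


def moving_average(returns: Sequence[float], window: int) -> List[float]:
    """Trailing moving average; the first entries average over fewer episodes."""
    out = []
    for i in range(len(returns)):
        chunk = returns[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def episodes_to_threshold(returns: Sequence[float], threshold: float,
                          window: int = 10) -> Optional[int]:
    """Convergence speed (assumed definition): first 1-based episode whose
    moving-average return reaches `threshold`, or None if it never does."""
    for i, avg in enumerate(moving_average(returns, window)):
        if avg >= threshold:
            return i + 1
    return None


def summarize(runs: Dict[int, Sequence[float]], threshold: float,
              final_k: int = 10, window: int = 10) -> dict:
    """Summarize per-seed episode-return curves (dict maps seed -> returns)."""
    finals = [statistics.mean(r[-final_k:]) for r in runs.values()]
    return {
        # Final performance: mean return over the last `final_k` episodes, averaged over seeds.
        "final_mean": statistics.mean(finals),
        # Stability (assumed definition): spread of final performance across seeds.
        "final_std": statistics.stdev(finals) if len(finals) > 1 else 0.0,
        # Cumulative reward: total return over the whole run, averaged over seeds.
        "cumulative_mean": statistics.mean([sum(r) for r in runs.values()]),
        # Convergence speed per seed (None means the threshold was never reached).
        "episodes_to_threshold": [
            episodes_to_threshold(r, threshold, window) for r in runs.values()
        ],
    }


if __name__ == "__main__":
    # Toy, made-up curves purely to exercise the code; not results from any paper.
    baseline = {0: [0, 1, 2, 3, 4, 5], 1: [0, 0, 1, 2, 3, 4]}
    enhanced = {0: [1, 2, 4, 6, 8, 9], 1: [0, 2, 3, 5, 7, 9]}
    for name, runs in [("baseline", baseline), ("enhanced", enhanced)]:
        print(name, summarize(runs, threshold=4, final_k=2, window=2))
```

Keying runs by seed makes it easy to check that the baseline and each enhancement were evaluated on exactly the same seeds.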
2.2 Final Presentation
The final presentation focuses on your implementation results, with particular emphasis on the baseline experiment, the design rationale of your enhancements, and a rigorous evaluation of the enhancements against the baseline. The presentation time is 15 minutes + discussion.
Your presentation should address the following:
- Implementation details:
  - What neural networks are involved?
  - What loss functions are used to optimize each network?
  - How are the networks (e.g., policy and value networks) updated?
- Challenges: What were the major difficulties in implementing and debugging the code?
- Baseline experiment (central focus):
  - What environment did you choose, and why is it appropriate for evaluating this algorithm?
  - What hyperparameters, training budget (episodes/steps), and evaluation protocol did you use?
  - How does your implementation of the original paper’s algorithm perform? Report learning curves, final performance, and variance across multiple random seeds.
  - How do you know your baseline is correctly implemented? (e.g., comparison with published results, sanity checks, ablations)
- Enhancement design (central focus):
  - What is the motivation for each enhancement? What specific limitation of the baseline does it address?
  - What is the scientific hypothesis behind each enhancement (i.e., why do you expect it to improve performance)?
  - What are the design choices (algorithmic, architectural, or training-related), and what alternatives did you consider?
  - How did you confirm the enhancement is non-trivial (per the discussion with the instructor)?
- Evaluation against the baseline (central focus):
  - Use the same environment, evaluation protocol, and random seeds as the baseline so that comparisons are fair and controlled.
  - Report appropriate metrics: cumulative reward, convergence speed, sample efficiency, stability/variance, and any algorithm-specific metrics (e.g., loss curves, entropy, KL divergence).
  - Present clear visualizations (learning curves with confidence intervals, comparison tables) that highlight where the enhancement helps, hurts, or has no effect.
  - Interpret the results: do they confirm or refute your hypothesis? What does this tell you about the underlying algorithm?
- Future directions: What further improvements or research directions do you envision?
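For the seed-level comparisons requested above (learning curves with confidence intervals, and the same seeds for baseline and enhancement), one option is a percentile bootstrap over seeds. The sketch below assumes you have already logged, for each seed, evaluation returns at a shared set of checkpoints; the function names are hypothetical, and plotting (e.g., drawing the interval as a shaded band) is left to whatever library you use. With only a handful of seeds the bootstrap interval is rough, so report the number of seeds alongside it. Reusing seeds does not make two different algorithms see identical randomness, but pairing runs by seed still controls for seed-dependent variation such as environment initialization.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple


def bootstrap_ci(values: Sequence[float], n_resamples: int = 2000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float, float]:
    """Return (mean, lower, upper): a percentile-bootstrap CI for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the reported interval is reproducible
    n = len(values)
    # Resample with replacement, record each resample's mean, then read off percentiles.
    means = sorted(statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(values), lower, upper


def curve_with_ci(curves: Sequence[Sequence[float]],
                  **kwargs) -> List[Tuple[float, float, float]]:
    """curves[s][t] is the evaluation return of seed s at checkpoint t.
    Returns one (mean, lower, upper) triple per checkpoint."""
    length = len(curves[0])
    if any(len(c) != length for c in curves):
        raise ValueError("all seeds must be evaluated at the same checkpoints")
    return [bootstrap_ci([c[t] for c in curves], **kwargs) for t in range(length)]


def paired_difference_ci(baseline_final: Dict[int, float],
                         enhanced_final: Dict[int, float],
                         **kwargs) -> Tuple[float, float, float]:
    """CI for the mean per-seed improvement (enhanced minus baseline).
    Both dicts map random seed -> final performance and must cover the same seeds."""
    if set(baseline_final) != set(enhanced_final):
        raise ValueError("baseline and enhancement must be run on the same seeds")
    diffs = [enhanced_final[s] - baseline_final[s] for s in sorted(baseline_final)]
    return bootstrap_ci(diffs, **kwargs)
```

If the interval from `paired_difference_ci` excludes zero, that is evidence (not proof) that the enhancement changes final performance on these seeds; an interval straddling zero is consistent with "no effect".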
2.3 Final Presentation Rubrics
The final presentation will be evaluated by the instructor using the following rubric.
| Item | Points |
|---|---|
| Implementation Effort & Correctness: Are the models, neural networks, and training procedures implemented correctly? | 20 |
| Baseline Experiment: Is the baseline rigorously implemented and evaluated (appropriate environment, protocol, multiple seeds, sanity-checked against the paper)? | 25 |
| Enhancement Design: Is each enhancement well-motivated, scientifically meaningful, and clearly explained (hypothesis, design choices, alternatives considered)? | 25 |
| Evaluation Against Baseline: Is the comparison fair and controlled? Are appropriate metrics, visualizations, and interpretations provided? | 20 |
| Vision: Does the presentation propose reasonable future work grounded in the experimental findings? | 10 |