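Offline Learning
If we already have a large amount of experimental data, how can we learn the best policy from it without collecting any new data? This is the problem that offline learning addresses.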
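To make the setting concrete, here is a minimal sketch of learning a policy purely from a fixed batch of logged transitions. It assumes a tiny tabular problem. The function names `offline_q_learning` and `greedy_policy`, the 3-state chain dataset, and the hyperparameters are all hypothetical choices for illustration, not taken from any of the courses or papers listed on this page. It uses plain Q-learning replayed over the batch, not a conservative method such as CQL.

```python
def offline_q_learning(dataset, n_states, n_actions, gamma=0.9, lr=0.5, sweeps=200):
    """Tabular Q-learning over a fixed batch of (s, a, r, s_next, done) transitions.

    No environment is queried: the only source of experience is `dataset`.
    """
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(sweeps):
        for s, a, r, s_next, done in dataset:
            # Bootstrapped target; terminal transitions use the reward alone.
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += lr * (target - q[s][a])
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each state (ties go to the lowest index)."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]


# Hypothetical logged data from a 3-state chain (assumption for illustration):
# action 1 moves right, action 0 stays put; entering state 2 pays reward 1 and ends.
dataset = [
    (0, 0, 0.0, 0, False),
    (0, 1, 0.0, 1, False),
    (1, 0, 0.0, 1, False),
    (1, 1, 1.0, 2, True),
]

q = offline_q_learning(dataset, n_states=3, n_actions=2)
policy = greedy_policy(q)
print(policy[:2])  # greedy actions for the two non-terminal states
```

In this toy dataset every state-action pair of the non-terminal states appears in the batch, so the naive approach works. When the data covers only part of the action space, the `max` in the target can pick actions whose values were never corrected by data, which is the kind of issue that conservative methods aim to mitigate.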
Course Materials
- Waterloo CS885 RL PPT
- Berkeley CS285 Lec 15: Offline Reinforcement Learning (Part 1), slides, YouTube video
- Berkeley CS285 Lec 16: Offline Reinforcement Learning (Part 2), slides, YouTube video
- Shanghai Jiao Tong University (SJTU) Boyu Reinforcement Learning Lec 11: Offline Reinforcement Learning
Papers
Stanford CS 224r papers: Offline RL
- Conservative Q-Learning for Offline Reinforcement Learning. Kumar et al. (2020)
- COMBO: Conservative Offline Model-Based Policy Optimization. Yu et al. (2021)
- Offline Reinforcement Learning with Implicit Q-Learning. Kostrikov et al. (2021)
Waterloo papers
- Levine, Kumar, Tucker, Fu (2021) Offline reinforcement learning: Tutorial, review, and perspectives on open problems, arXiv.
- Kumar, Zhou, Tucker, Levine (2020) Conservative Q-Learning for Offline Reinforcement Learning, NeurIPS.
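Exercises
- SJTU Boyu Reinforcement Learning exercise: 第18章-离线强化学习.ipynb (Chapter 18: Offline Reinforcement Learning)
- CS886 Exercise 2: Conservative Q-Learning
- Berkeley CS285 HW 5: Exploration and Offline Reinforcement Learning, website
- Stanford CS224R DRL HW 3: Offline MuJoCo
Textbook Materials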
N/A