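Challenges
Many problems in reinforcement learning remain unsolved. The main challenges are:
- We have great methods that can learn from large amounts of data
- We have great optimization methods for reinforcement learning
- We do not (yet) have great methods that combine data and reinforcement learning
- Humans learn very quickly, but deep reinforcement learning methods are usually slow
- Humans reuse past knowledge, but transfer learning in reinforcement learning remains an open problem
- It is unclear what the reward function should be
- It is unclear what role prediction should play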
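Theorem proofs
- Shanghai Jiao Tong University (Boyu) Reinforcement Learning, Lec 16: Proofs of selected reinforcement learning theorems
Large models
- Shanghai Jiao Tong University (Boyu) Reinforcement Learning, Lec 15: Large models for decision intelligence (in English)
Parameterized action spaces
- Shanghai Jiao Tong University (Boyu) Reinforcement Learning, Lec 12: Parameterized action spaces
Batch reinforcement learning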
- Stanford CS234 RL Lecture 13/14/15: Batch Reinforcement Learning
Current state of research
- DeepMind x UCL (Hado van Hasselt) RL 2021, Lec 12: Deep RL 1
- DeepMind x UCL (Hado van Hasselt) RL 2021, Lec 13: Deep RL 2
Hierarchical RL and Skill Discovery
Stanford CS 224R slides and papers:
- Data-Efficient Hierarchical Reinforcement Learning. Nachum et al. (2018)
- Diversity is All You Need: Learning Skills without a Reward Function. Eysenbach et al. (2018)
- Dynamics-Aware Unsupervised Discovery of Skills. Sharma et al. (2019)
- Relay Policy Learning: Solving Long-Horizon Tasks via Imitation and Reinforcement Learning. Gupta et al. (2019)
Bayesian RL
University of Waterloo CS885 RL slides
Papers:
- Michael O’Gordon Duff’s PhD Thesis (2002)
- Vlassis, Ghavamzadeh, Mannor, Poupart, Bayesian Reinforcement Learning (Chapter in Reinforcement Learning: State-of-the-Art), Springer Verlag, 2012
Maximum entropy RL
University of Waterloo CS885 RL slides
Papers:
- Haarnoja, Tang, Abbeel, Levine (2017) Reinforcement Learning with Deep Energy-Based Policies, ICML.
- Haarnoja, Zhou, Abbeel, Levine (2018) Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor, ICML.
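Soft policy iteration
- Shanghai Jiao Tong University (Boyu) Reinforcement Learning, exercise notebook: 第14章-SAC算法.ipynb (Chapter 14, SAC algorithm)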
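For orientation before opening the notebook, here is a minimal tabular sketch of soft policy iteration in the spirit of Haarnoja et al. (2018). It is not taken from the notebook or the paper's code. The function name `soft_policy_iteration`, the array layout of `P` and `R`, the toy MDP, and all hyperparameter values are assumptions made for illustration. It alternates soft policy evaluation (a Bellman backup with an entropy bonus weighted by the temperature `alpha`) with soft policy improvement (a softmax over `Q / alpha`).

```python
import numpy as np


def soft_policy_iteration(P, R, gamma=0.9, alpha=0.5, n_iters=50, n_eval_sweeps=200):
    """Tabular soft policy iteration (illustrative sketch).

    P: array of shape (S, A, S), P[s, a, s2] = transition probability.
    R: array of shape (S, A), R[s, a] = expected immediate reward.
    Returns the policy pi (S, A) and its soft Q-values (S, A).
    """
    n_states, n_actions = R.shape
    # Start from the uniform policy, stored as log-probabilities for stability.
    log_pi = np.full((n_states, n_actions), -np.log(n_actions))
    Q = np.zeros((n_states, n_actions))

    for _ in range(n_iters):
        pi = np.exp(log_pi)
        # Soft policy evaluation: repeat the soft Bellman backup for the current policy.
        for _ in range(n_eval_sweeps):
            # V(s) = sum_a pi(a|s) * (Q(s, a) - alpha * log pi(a|s))
            V = np.sum(pi * (Q - alpha * log_pi), axis=1)
            # Q(s, a) = R(s, a) + gamma * sum_s2 P(s2|s, a) * V(s2)
            Q = R + gamma * (P @ V)
        # Soft policy improvement: pi_new(a|s) proportional to exp(Q(s, a) / alpha).
        logits = Q / alpha
        logits = logits - logits.max(axis=1, keepdims=True)
        log_pi = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    return np.exp(log_pi), Q


# Hypothetical toy MDP: 2 states, 2 actions, uniform transitions,
# and action 1 always pays reward 1 while action 0 pays 0.
P = np.full((2, 2, 2), 0.5)
R = np.array([[0.0, 1.0], [0.0, 1.0]])
pi, Q = soft_policy_iteration(P, R)
print(pi)
```

In this toy MDP the next-state term is identical for both actions, so the soft Q-values differ by exactly the reward gap of 1, and the resulting policy picks action 1 with probability sigmoid(1 / alpha), about 0.88 for `alpha=0.5`. Lowering `alpha` pushes the policy toward the greedy one, while raising it pushes toward uniform.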
Berkeley CS285
- Lec 17: Reinforcement Learning Theory Basics, slides, YouTube video
- Lec 18: Variational Inference and Generative Models, slides, YouTube video
- Lec 19: Connection between Inference and Control, slides, YouTube video
- Lec 23: Challenges and Open Problems, slides, YouTube video