Distributional Learning

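In distributional learning, the model learns the full probability distribution of the return for each state-action pair, rather than only its expected value (the Q-value). This is not the same as a stochastic policy, which returns a probability for each action in a set instead of a single action.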
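As a minimal sketch of the idea, following the categorical formulation of Bellemare, Dabney and Munos (2017), the return of each action can be stored as a probability vector over a fixed grid of return values ("atoms"). The atom grid, the action names `safe` and `risky`, and the probabilities below are made-up assumptions for illustration only. The example shows two actions with the same expected return whose distributions differ, which is the information a plain Q-value discards.

```python
# Hypothetical support: 5 atoms evenly spaced on [V_MIN, V_MAX].
V_MIN, V_MAX, N_ATOMS = 0.0, 4.0, 5
DELTA_Z = (V_MAX - V_MIN) / (N_ATOMS - 1)
ATOMS = [V_MIN + i * DELTA_Z for i in range(N_ATOMS)]  # 0, 1, 2, 3, 4

# Made-up return distributions for two actions in one state.
return_probs = {
    "safe": [0.0, 0.0, 1.0, 0.0, 0.0],   # return is always 2
    "risky": [0.5, 0.0, 0.0, 0.0, 0.5],  # return is 0 or 4, equally likely
}


def expected_return(probs, atoms=ATOMS):
    """Collapse a return distribution to its mean (the usual Q-value)."""
    return sum(p * z for p, z in zip(probs, atoms))


def return_variance(probs, atoms=ATOMS):
    """Variance of the return, which the mean alone cannot express."""
    mean = expected_return(probs, atoms)
    return sum(p * (z - mean) ** 2 for p, z in zip(probs, atoms))


for action, probs in return_probs.items():
    print(action, expected_return(probs), return_variance(probs))
```

Both actions have the same mean, so a value-based agent could not tell them apart, while the learned distributions make the difference in risk explicit.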

Course materials

Waterloo CS885 RL slides: Distributional RL

Papers

Waterloo course papers

Bellemare, Dabney, Munos. A distributional perspective on reinforcement learning. ICML. 2017.
Bellemare, Dabney, Rowland. Distributional Reinforcement Learning. MIT Press. 2023.

Exercises

CS885 exercise 3: categorical (C51) distributional RL algorithm
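The exercise statement is not reproduced here, so the following is only a rough sketch of the core step of the categorical (C51) algorithm as described by Bellemare, Dabney and Munos (2017), not the course's reference solution. Each atom of the next-state return distribution is shifted by the Bellman update, clipped to the support, and its probability is split between the two nearest atoms. The function names, argument names, and the pure-Python list representation are assumptions chosen for readability; a real implementation would typically be batched in a tensor library.

```python
import math


def project_distribution(next_probs, reward, gamma, done, v_min, v_max):
    """Project the Bellman-updated distribution back onto the fixed atoms."""
    n_atoms = len(next_probs)
    delta_z = (v_max - v_min) / (n_atoms - 1)
    projected = [0.0] * n_atoms
    for j, p in enumerate(next_probs):
        z_j = v_min + j * delta_z
        # Bellman update of one atom; terminal transitions keep only the reward.
        tz = reward if done else reward + gamma * z_j
        tz = min(v_max, max(v_min, tz))
        # Fractional index of tz on the atom grid, guarded against rounding.
        b = min(max((tz - v_min) / delta_z, 0.0), float(n_atoms - 1))
        lower, upper = math.floor(b), math.ceil(b)
        if lower == upper:
            projected[lower] += p  # tz lands exactly on an atom
        else:
            projected[lower] += p * (upper - b)
            projected[upper] += p * (b - lower)
    return projected


def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Training loss: cross-entropy between projected target and prediction."""
    return -sum(t * math.log(max(q, eps))
                for t, q in zip(target_probs, predicted_probs))


# Atoms at 0, 1, 2; the next state returns 1 for sure; reward 0.5, gamma 1.
target = project_distribution([0.0, 1.0, 0.0], reward=0.5, gamma=1.0,
                              done=False, v_min=0.0, v_max=2.0)
print(target)  # [0.0, 0.5, 0.5]
```

In the usage example the shifted return 1.5 falls halfway between the atoms 1 and 2, so its probability mass is divided equally between them.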

Textbook materials

N/A