Episodic Control through Meta-Reinforcement Learning

Abstract

Recent research has placed episodic reinforcement learning (RL) alongside model-free and model-based RL on the list of processes centrally involved in human reward-based learning. In the present work, we extend the unified account of model-free and model-based RL developed by Wang et al. (2017) to further integrate episodic learning. In this account, a generic model-free "meta-learner" learns to deploy and coordinate all of these RL algorithms. The meta-learner is trained on a broad set of novel tasks with limited exposure to each task, such that it learns to learn about new tasks. We show that when equipped with an episodic memory system inspired by theories of reinstatement and gating, the meta-learner learns to use the episodic and model-based RL algorithms observed in humans in a task designed to dissociate among the influences of these learning algorithms. We discuss implications and predictions of the model.
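To make the reinstatement-and-gating idea concrete, below is a minimal sketch (an assumption on our part, not the authors' released code) of a recurrent meta-RL agent whose LSTM is augmented with an episodic store: past cell states are saved under context keys, and on retrieval a learned gate blends the recalled cell state back into the current one. The class name EpisodicLSTMAgent, the cosine-similarity lookup, and the gate parameterization are all hypothetical illustrations of the general scheme.

import torch
import torch.nn as nn
import torch.nn.functional as F


class EpisodicLSTMAgent(nn.Module):
    """Meta-RL agent with an episodic memory of past LSTM cell states (sketch)."""

    def __init__(self, obs_dim: int, hidden_dim: int, n_actions: int):
        super().__init__()
        self.lstm = nn.LSTMCell(obs_dim, hidden_dim)
        self.policy_head = nn.Linear(hidden_dim, n_actions)
        self.value_head = nn.Linear(hidden_dim, 1)
        # Gate deciding how strongly a retrieved memory is reinstated
        # into the current cell state (hypothetical parameterization).
        self.gate = nn.Linear(obs_dim + hidden_dim, hidden_dim)
        self.mem_keys = []    # stored context embeddings
        self.mem_cells = []   # stored LSTM cell states

    def step(self, obs, h, c):
        # Retrieve the stored cell state whose context key best matches obs.
        if self.mem_keys:
            keys = torch.stack(self.mem_keys)                     # (N, obs_dim)
            sims = F.cosine_similarity(keys, obs.unsqueeze(0), dim=1)
            retrieved = self.mem_cells[int(sims.argmax())]
            # Reinstatement: the gate blends the recalled cell state into c.
            g = torch.sigmoid(self.gate(torch.cat([obs, retrieved])))
            c = g * retrieved + (1 - g) * c
        h, c = self.lstm(obs.unsqueeze(0), (h.unsqueeze(0), c.unsqueeze(0)))
        h, c = h.squeeze(0), c.squeeze(0)
        # Store the current context and resulting cell state for later recall.
        self.mem_keys.append(obs.detach())
        self.mem_cells.append(c.detach())
        return self.policy_head(h), self.value_head(h), h, c


# Usage sketch: a single step on random data.
agent = EpisodicLSTMAgent(obs_dim=8, hidden_dim=32, n_actions=4)
h, c = torch.zeros(32), torch.zeros(32)
logits, value, h, c = agent.step(torch.randn(8), h, c)

In this reading, the meta-learner's usual recurrent dynamics carry within-task (model-based-like) learning, while the gated reinstatement of stored cell states supplies the episodic contribution; training the gate end-to-end lets the agent learn when recall should dominate ongoing processing.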

