Although there has been considerable debate about the existence of metarepresentational capacities in non-human animals and about their scope in humans, the well-confirmed temporal difference reinforcement learning models of reward-guided decision making have been largely overlooked in that debate. This paper argues that the reward prediction error signals that are postulated by temporal difference models, and that have been discovered empirically through single-unit recording and neuroimaging, do have metarepresentational contents.