Dopamine neuron responses during reward-based learning reflect the temporal shift of the temporal difference model

Feb 17, 2023

Title: A gradual temporal shift of dopamine responses mirrors the progression of temporal difference error in machine learning

Journal: Nature Neuroscience (2022) 25:1082–1092. Link: https://www.nature.com/articles/s41593-022-01109-2

Comments: Anticipating future reward is very important for an animal's survival. To do so, animals learn to predict the different outcomes of their environment, and dopamine neurons play an important role in this associative learning. As an animal learns to associate a reward with a specific cue, the amplitude of dopamine neuron activity changes: the neurons gradually decrease their responses to the reward and increase their responses to the cue. Because this pattern resembles the prediction error term in animal learning models, dopamine neurons are thought to transmit reward prediction errors. This type of learning resembles the algorithms used in machine learning, one of the most influential of which is the temporal difference (TD) algorithm. The TD model computes a prediction error at each moment within a trial from the change in predicted value and the reward received. A distinctive feature of TD learning is that, over the course of learning, the timing of the TD error moves gradually backward from the time of the reward to the time of the cue. Despite these similarities, previous studies could not observe this backward shift in dopamine neuron activity.
In this study, the authors used an optical technique (fiber photometry) and dopamine biosensors to examine whether a gradual backward shift occurs in the dopamine response. They recorded the activity of dopamine neuron axons in the ventral striatum while the animals learned to associate odor cues with a water reward. During learning, they observed an increase in cue responses and a decrease in reward responses in the dopamine neurons. Using the dopamine biosensor, they also observed a gradual shift in the timing of the activity peak across trials. In addition, they performed similar experiments with reversal learning and again observed a temporal shift in dopamine responses.
With these results, the authors demonstrated that a gradual backward shift in the timing of dopamine neuron activity can be detected in learning paradigms. These results also provide evidence supporting a TD error account of dopamine activity.
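
The backward shift of the TD error described above can be illustrated with a small simulation. The following is only a minimal sketch of tabular TD(0) learning on a simplified cue-reward trial, not the model or analysis used in the paper. The function names, the number of time steps, the cue and reward positions, the learning rate `alpha` and the discount factor `gamma` are all assumptions chosen for illustration.

```python
def simulate_td_errors(n_trials=300, n_steps=20, cue_step=5, reward_step=15,
                       alpha=0.2, gamma=0.98):
    """Return a list of per-trial TD error traces (hypothetical toy task)."""
    # values[t] is the predicted future reward at time step t within a trial.
    values = [0.0] * n_steps
    td_errors = []
    for _ in range(n_trials):
        trial_errors = [0.0] * n_steps
        for t in range(1, n_steps):
            reward = 1.0 if t == reward_step else 0.0
            # TD error on arriving at step t: delta = r + gamma * V(t) - V(t-1)
            delta = reward + gamma * values[t] - values[t - 1]
            trial_errors[t] = delta
            # Assumption: cue onset is unpredictable, so steps before the cue
            # keep a value of zero and only later steps are updated.
            if t - 1 >= cue_step:
                values[t - 1] += alpha * delta
        td_errors.append(trial_errors)
    return td_errors


def peak_step(trial_errors):
    """Time step at which the TD error is largest within one trial."""
    return max(range(len(trial_errors)), key=lambda t: trial_errors[t])


td_errors = simulate_td_errors()
peak_steps = [peak_step(errors) for errors in td_errors]
for trial in (0, 10, 30, 60, 299):
    print(f"trial {trial:3d}: peak TD error at step {peak_steps[trial]}")
```

In this sketch the largest TD error occurs at the reward step on the first trial and at the cue step on the last trial, passing through intermediate time steps in between. This is the kind of gradual shift in peak timing that the authors looked for in the dopamine signal.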

Laura Ayaka Noguera Oishi

Graduate student