Department News

Understanding Dopamine Neurons on Multiple Timescales

A new paper in Nature (PDF) presents intriguing evidence on how dopamine neurons operate across multiple timescales to guide learning and decision-making. The study provides compelling experimental support for a computational model that could reshape our understanding of dopamine-based reinforcement learning in the brain. This research was co-led by Paul Masset, formerly of Naoshige Uchida's MCB lab and now an Assistant Professor at McGill University, in collaboration with the group of Alexandre Pouget at the University of Geneva.

A Paradigm Shift in Reinforcement Learning

Reinforcement learning, a framework widely used in both neuroscience and artificial intelligence, explains how agents—biological or artificial—learn from rewards. Traditionally, these models assume that future rewards are discounted at a constant rate, meaning that the value of a reward diminishes exponentially over time. However, real-world decision-making often appears to follow a pattern known as hyperbolic discounting, where immediate rewards are discounted more steeply than those in the far future.
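One way to see the difference between the two schemes: under exponential discounting, an agent's preference between a smaller-sooner and a larger-later reward never flips when both delays are pushed further into the future, whereas under hyperbolic discounting it can. A minimal sketch (illustrative reward amounts and rates, not values from the paper):

```python
import math

def exp_value(r, t, k=0.3):
    """Exponential discounting: value decays at a constant rate k."""
    return r * math.exp(-k * t)

def hyp_value(r, t, k=1.0):
    """Hyperbolic discounting: steep near-term decay, shallower far-term decay."""
    return r / (1.0 + k * t)

# Choice: reward 1 after delay d, or reward 2 after delay d + 5.
for d in (0, 10):
    exp_prefers_later = exp_value(2, d + 5) > exp_value(1, d)
    hyp_prefers_later = hyp_value(2, d + 5) > hyp_value(1, d)
    print(f"delay shift {d}: exponential prefers later = {exp_prefers_later}, "
          f"hyperbolic prefers later = {hyp_prefers_later}")
```

With these rates, the exponential agent always takes the sooner reward, while the hyperbolic agent takes the sooner reward when it is imminent but switches to the later, larger reward once both options are distant — the preference reversal characteristic of hyperbolic discounting.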

The research led by Masset and colleagues provides strong evidence that each dopamine neuron follows exponential discounting with a fixed rate, but that the discount rates differ across dopamine neurons. “Instead of representing the value of things at a single timescale, we show that different dopamine neurons apply different discount rates in parallel,” Masset explains.

Uchida adds, “Previous theoretical studies suggested that the sum of exponential discounting with different discount factors might generate hyperbolic discounting. Our results demonstrating diverse discount factors across dopamine neurons seem to support such an idea.”
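The idea Uchida describes can be checked numerically. If each unit discounts exponentially with its own rate k, and the rates are spread across the population — here, an exponential distribution over k with mean 1, an illustrative choice rather than the paper's fitted distribution — the population-average discount curve works out to exactly 1/(1 + t), a hyperbolic form:

```python
import math

def population_discount(t, lam=1.0, kmax=50.0, n=20000):
    """Average of exponential discounts e^(-k*t) over rates k drawn from
    an exponential distribution with density lam * e^(-lam * k),
    computed by midpoint-rule numerical integration."""
    dk = kmax / n
    total = 0.0
    for i in range(n):
        k = (i + 0.5) * dk
        total += math.exp(-k * t) * lam * math.exp(-lam * k) * dk
    return total

# The mixture matches the hyperbolic curve 1 / (1 + t / lam).
for t in (0.5, 1.0, 2.0, 5.0):
    print(f"t = {t}: mixture = {population_discount(t):.4f}, "
          f"hyperbolic = {1.0 / (1.0 + t):.4f}")
```

This is only a toy demonstration of the mathematical point — a population of fixed-rate exponential discounters with diverse rates can collectively behave hyperbolically — not a model of the recorded neurons.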

Experimental Proof and Computational Implications

To test this hypothesis, the team performed neural recordings in mice, analyzing dopamine neuron activity across two behavioral tasks. These experiments revealed a diversity in how individual neurons discount value over time, i.e., whether they are short- or far-sighted. “We found that some neurons primarily care about immediate outcomes, while others track rewards over much longer periods,” says Masset. “This variability was previously thought to be noise, but we now believe it’s a fundamental feature of how the brain optimizes learning.”

The computational side of the study, carried out in collaboration with Pablo Tano, a graduate student in Pouget’s research team, explored the advantages of this multi-timescale reinforcement learning model in artificial neural networks. The researchers demonstrated that reinforcement learning systems incorporating multiple discounting rates were more efficient at solving complex learning tasks than those relying on a single discount factor. “We used artificial agents to show that multi-timescale agents have inherent computational advantages, particularly in environments where reward contingencies shift over time,” Masset notes.
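The parallel-discounting scheme such agents use can be sketched with tabular TD(0) learning run side by side at several discount factors. The chain task and discount factors below are hypothetical, chosen only to illustrate the idea, and are not the paper's environments:

```python
GAMMAS = [0.5, 0.9, 0.99]  # hypothetical discount factors, one per "channel"
N_STATES = 5               # states 0..4; reward of 1 delivered on leaving state 4

def run_td(gamma, alpha=0.1, episodes=2000):
    """Tabular TD(0) value learning on a deterministic chain with one
    terminal reward; converges to V[s] = gamma ** (N_STATES - 1 - s)."""
    V = [0.0] * N_STATES
    for _ in range(episodes):
        for s in range(N_STATES):
            r = 1.0 if s == N_STATES - 1 else 0.0
            v_next = V[s + 1] if s + 1 < N_STATES else 0.0
            V[s] += alpha * (r + gamma * v_next - V[s])  # TD(0) update
    return V

# Learn the same task in parallel at every discount factor.
values = {g: run_td(g) for g in GAMMAS}
for g in GAMMAS:
    print(f"gamma = {g}: V = {[round(v, 3) for v in values[g]]}")
```

After learning, each channel's value at a state is gamma raised to the distance to the reward, so a short-sighted channel (gamma = 0.5) assigns value only to states near the reward while a far-sighted one (gamma = 0.99) values early states too. Comparing the channels thus encodes not just whether a reward is coming but roughly how far away it is — one intuition for why multiple timescales carry more information than a single discount factor.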

Beyond the Lab: Implications for Behavior and Disease

The implications of this discovery extend beyond basic neuroscience into fields such as psychology, economics, and medicine. One particularly intriguing application is in understanding impulsivity and addiction. “Changes in discounting behavior are often observed in individuals with substance use disorders,” Masset explains. “By identifying the neural mechanisms that generate different timescales of valuation, we might uncover new therapeutic targets for conditions where decision-making is impaired.”

Another critical direction is determining how these timescales are adjusted in response to environmental uncertainty. “If you’re in an unpredictable environment, it makes sense to prioritize immediate rewards,” Masset notes. “Conversely, in stable environments, planning further into the future is beneficial. Understanding how the brain flexibly adapts to these conditions is one of our next research questions.”

A Collaborative Effort in Scientific Discovery

Notably, this study was part of a unique collaborative effort, with a complementary paper from the research group of Joseph Paton at the Champalimaud Center in Lisbon published alongside it. “Rather than competing, we coordinated our work, ensuring that our findings built upon each other,” says Masset. “Both studies begin with similar foundational observations but diverge into distinct computational and behavioral analyses, offering a more comprehensive picture of multi-timescale learning.” Uchida appreciates the convergence of independent studies, saying, “The two groups performed experiments independently and arrived at the same main conclusions, essentially replicating one another. This is very satisfying and I hope people will appreciate this as a good model of doing science.”

With reinforcement learning models playing a growing role in artificial intelligence and neurobiology alike, these findings could influence not just our understanding of the brain but also the development of more sophisticated AI systems capable of human-like learning adaptability.

The team continues to collaborate to build upon these findings. “We’ve demonstrated that the brain has the machinery for multi-timescale learning,” says Masset. “The next step is to understand how it leverages this machinery in real-world decision-making.”

(l to r) HyungGoo Kim, Athar N. Malik, Paul Masset, and Nao Uchida