Nao Uchida (left) and Neir Eshel
Say you’re at a supermarket, staring at two cartons of ice cream: chocolate and caramel. Before making your choice, you try to predict which will be more delicious. Wasn’t the caramel a bit too sweet last time? Wait, wasn’t the chocolate a little bitter? You hem and haw, and then choose the one you expect to be better.
Our new study demonstrates how the brain makes this type of prediction and uses it to optimize decisions.
We recorded from neurons deep in the brain while mice performed simple tasks. The animals had to learn the association between different odors and different rewards. Rather than ice cream, we used water, which was rewarding to the thirsty mice. Usually, the mice would receive the reward they expected. Occasionally, however, the reward would be bigger or smaller. In those cases, when the outcome was different from what was predicted, the chemical dopamine became especially important. If the reward was bigger than predicted, dopamine neurons increased their activity. If the reward was smaller than predicted, dopamine neurons decreased their activity. And if the reward was the same as predicted, the neurons did not change their activity. In this way, dopamine neurons calculated the difference between expected and actual reward.
This pattern of responses is called ‘reward prediction error’, and dopamine neurons have been known to calculate it for over 20 years. It is thought that this signal is crucial for animals, including humans, to improve their predictions over time, allowing us to maximize reward (and the chance for a truly delicious ice cream dessert). However, it was never known how dopamine neurons make this calculation. In particular, how do dopamine neurons know how much reward to expect?
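To make the idea concrete, here is a minimal Python sketch of a prediction error and of how it could be used to improve a prediction over time. It is a textbook-style illustration, not the analysis from our paper: the function names, the learning rate of 0.1, and the reward sizes are all assumptions chosen for the example.

```python
def prediction_error(actual_reward, expected_reward):
    """Difference between the reward received and the reward predicted."""
    return actual_reward - expected_reward


def update_expectation(expected_reward, actual_reward, learning_rate=0.1):
    """Move the expectation a fraction of the way toward the outcome.

    The learning rate is an illustrative value, not a measured one.
    """
    error = prediction_error(actual_reward, expected_reward)
    return expected_reward + learning_rate * error


def learn(rewards, initial_expectation=0.0, learning_rate=0.1):
    """Return the expectation after experiencing each reward in turn."""
    expectation = initial_expectation
    for reward in rewards:
        expectation = update_expectation(expectation, reward, learning_rate)
    return expectation
```

A bigger-than-expected reward gives a positive error, a smaller one gives a negative error, and a fully predicted reward gives zero. Calling `learn` on a long run of identical rewards moves the expectation toward that reward size, so the error shrinks as the prediction improves.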
In our paper, published this week in the journal Nature, we discovered that a group of neurons intermingled with dopamine neurons provides the expectation signal. A previous paper from our lab had shown that when reward was expected, these inhibitory neurons (called GABA neurons) became active. But it was unknown whether dopamine neurons use this signal to calculate prediction error. In the new study, we artificially increased the activity of GABA neurons, using a technique called optogenetics that makes neurons sensitive to light delivered through an optical fiber in the brain. When we did so, we found that dopamine neuron activity was reduced, as if reward were expected, even though it was not. Conversely, if we artificially decreased the activity of the GABA neurons, dopamine neuron activity was increased, as if the previously expected reward had become surprising. In other words, shifting the level of activity in GABA neurons appeared to shift the level of expectation reflected by the dopamine neurons.
These manipulations also affected mouse behavior. When we artificially increased GABA neuron activity on both sides of the brain, thereby artificially increasing the level of expectation, mice acted as if they were disappointed by the reward they got. The same reward that used to cause high levels of anticipation no longer elicited any anticipation when GABA activity was increased.
Finally, we designed an experiment to understand exactly how this prediction error calculation is made. We gave the mice different sizes of reward and plotted how dopamine neurons responded to these different sizes. Then we taught the mice to expect reward, and watched how expectation shifted the dopamine response. It turns out that dopamine neurons simply subtract the expectation signal, which we now know comes from GABA neurons. This is consistent with classic learning theories, but actually quite surprising in the brain. There are very few other examples where neurons seem capable of pure addition or subtraction; instead, the brain generally works through multiplication or division. In this case, though, subtraction allows for a precise and consistent calculation, and appears to be exactly what the brain evolved to do.
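The difference between subtraction and division can be sketched in Python. This is a toy model, not our recorded data: the saturating response curve, its parameters, the expectation of 3.0, and the gain of 2.0 are all assumptions made up for illustration.

```python
def unexpected_response(reward_size, max_response=10.0, half_saturation=4.0):
    """Hypothetical dopamine response to an unpredicted reward.

    The saturating shape and both parameters are illustrative assumptions.
    """
    return max_response * reward_size / (reward_size + half_saturation)


def subtractive_response(reward_size, expectation):
    """Expectation removes a fixed amount from the response.

    The result can go negative, standing in for a dip below baseline.
    """
    return unexpected_response(reward_size) - expectation


def divisive_response(reward_size, gain):
    """Expectation scales the response down by a constant factor."""
    return unexpected_response(reward_size) / gain


def drops(expected_response, reward_sizes, amount):
    """How much expectation lowers the response at each reward size."""
    return [
        unexpected_response(size) - expected_response(size, amount)
        for size in reward_sizes
    ]


reward_sizes = [1.0, 2.0, 4.0, 8.0]
subtractive_drops = drops(subtractive_response, reward_sizes, 3.0)
divisive_drops = drops(divisive_response, reward_sizes, 2.0)
```

In this sketch the subtractive drop is the same at every reward size, so the whole response curve shifts down without changing shape. The divisive drop grows with reward size, so the curve is squashed instead. A uniform downward shift is the pattern that subtraction predicts.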
Together, our experiments demonstrate how a small circuit deep in the brain makes a simple calculation that enables a crucially important behavior: learning what’s good and what isn’t.