Predicting future rewards depends critically on beliefs about the current state of the world. For example, suppose you go to a restaurant with two chefs, one who you’ve heard makes excellent food and one who you’ve heard makes terrible food, but you don’t know which chef is working at the moment. If the appetizer is delicious, then you can look forward excitedly to the rest of the meal, whereas if it is disgusting then you may await future courses with dread. But what happens if you get a passable appetizer? Now you have uncertainty about the chef, and hence your expectation about the rest of the meal lies somewhere between excitement and dread.
Assuming you don’t have any prior experience with this restaurant, you may wish to update your expectations about the chefs based on your visit. Reinforcement learning algorithms suggest a simple approach: calculate a prediction error (the discrepancy between observed and expected reward), and use it to update your expectations. Specifically, positive prediction errors mean that you got more than you expected, so you should increase your expectations, whereas negative prediction errors mean that you got less than you expected, so you should decrease your expectations. A substantial body of experimental work suggests that the firing of midbrain dopamine neurons reports these prediction errors.
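The update rule described above can be written in a few lines. This is a minimal Rescorla-Wagner-style sketch, not the study's model; the learning rate and reward values are illustrative numbers chosen for the example.

```python
def update_expectation(expected, observed, alpha=0.1):
    """Nudge the expectation toward the observed reward by a
    fraction alpha of the prediction error."""
    prediction_error = observed - expected  # the dopamine-like signal
    return expected + alpha * prediction_error

# A reward better than expected (positive prediction error)
# raises the expectation; a worse one would lower it.
expectation = update_expectation(5.0, 8.0)  # 5.0 + 0.1 * (8.0 - 5.0) = 5.3
```

With repeated visits, the expectation converges toward the average reward the restaurant actually delivers.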
The critical question addressed in our work was whether dopamine prediction error signals reflect state uncertainty: as your beliefs about the hidden state (e.g., which chef is cooking) change, will your dopamine signals change accordingly? Intuitively, if you think the bad chef is cooking, then a mediocre appetizer will be pleasantly surprising (a positive prediction error, signaled by a dopamine burst), whereas the same appetizer will be unpleasantly surprising (a negative prediction error, signaled by a dopamine pause) if you think the good chef is cooking.
In lieu of a functioning rodent restaurant, we tested this hypothesis by training mice to expect the delivery of either small or big drops of sugar water, two seconds after they smelled an odor cue. The same odor predicted both outcomes, so the two states (small or big drop) remained ambiguous to the mice until the drop actually arrived. We then tested the mice's ability to infer the identity of the upcoming state by presenting intermediate-sized drops. Theory predicted a non-monotonic relationship between reward magnitude and dopamine activity: if mice use the two training volumes to infer, with some probability, which of the two states they are in, then intermediate drops close to the small training volume should be better than the expected small reward (producing a positive prediction error), whereas intermediate drops close to the big training volume should be worse than the expected big reward (producing a negative prediction error).
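The non-monotonic prediction this reasoning yields can be sketched numerically. The following toy model is an assumption-laden illustration, not the paper's fitted model: the trained volumes (2 and 12), the Gaussian perception noise, and its width are all made-up parameters chosen to make the shape visible.

```python
import math

SMALL, BIG, SIGMA = 2.0, 12.0, 2.0  # illustrative, not the study's values

def likelihood(r, mean, sigma=SIGMA):
    """Unnormalized Gaussian likelihood of observing drop size r
    given the state's trained volume."""
    return math.exp(-0.5 * ((r - mean) / sigma) ** 2)

def prediction_error(r):
    """Prediction error after inferring which state (small or big
    reward) an intermediate drop r most likely came from."""
    p_small = likelihood(r, SMALL)
    p_big = likelihood(r, BIG)
    p_big /= (p_small + p_big)           # posterior belief in the "big" state
    expected = (1 - p_big) * SMALL + p_big * BIG
    return r - expected
```

Drops just above the small training volume are attributed to the small state and produce a positive error, while drops just below the big training volume are attributed to the big state and produce a negative error, so the error first rises and then falls as drop size grows.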