a, To understand how dopamine neurons compute reward prediction error, we first determined how dopamine neurons respond to various sizes of unexpected reward (schematized as orange curves). We then taught the mice to expect reward and observed how expectation shifted this dose–response (black curves). We modelled four types of shift: output subtraction (top left), input subtraction (bottom left), output division (top right), and input division (bottom right). Output subtraction was consistently the best fit. For equations, see Methods. Analysis adapted from a previous study (ref. 39). b–e, Results from dopamine identification experiment. f–i, Results from GABA stimulation experiment. b, c, Results from all putative dopamine neurons (n = 84). ***P < 0.001, bootstrap. d, e, Results from light-identified dopamine neurons (n = 40). ***P < 0.001, bootstrap. f, g, Results from putative dopamine neurons in the GABA stimulation experiment (n = 45). *P < 0.05, bootstrap. h, i, Results from putative dopamine neurons in the GABA stimulation experiment, subtracting the 500 ms period immediately before reward delivery. This subtraction accounts for the laser-induced baseline shift in dopamine responses. *P < 0.05, bootstrap. b, d, f, h, Average responses (mean ± s.e.m. across neurons) to different sizes of reward, with fits for output subtraction (solid line) and output division (dotted line). c, e, g, i, Results of bootstrapping analysis. For each resample, we compared the mean squared error for the subtractive fit with the mean squared error for the divisive fit. Negative numbers favour subtraction. P values were calculated as the proportion of resamples in which division was a better fit than subtraction.
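
The bootstrap comparison in panels c, e, g and i can be illustrated with a minimal sketch. It is not the fitting procedure from Methods: it assumes, for illustration only, that output subtraction is `expected = unexpected - c` and output division is `expected = k * unexpected`, applied directly to the mean response at each reward size rather than to a fitted dose–response curve. The function names, the resampling of neurons with replacement, and the default of 1,000 resamples are likewise assumptions.

```python
import numpy as np


def fit_subtraction(unexpected, expected):
    """Least-squares fit of expected = unexpected - c. Returns (c, mse)."""
    c = np.mean(unexpected - expected)
    mse = np.mean((unexpected - c - expected) ** 2)
    return c, mse


def fit_division(unexpected, expected):
    """Least-squares fit of expected = k * unexpected (k = 1 / divisor).

    Returns (k, mse).
    """
    k = np.dot(unexpected, expected) / np.dot(unexpected, unexpected)
    mse = np.mean((k * unexpected - expected) ** 2)
    return k, mse


def bootstrap_compare(unexp, exp, n_resamples=1000, seed=0):
    """Compare subtractive and divisive fits by resampling neurons.

    unexp, exp: arrays of shape (n_neurons, n_reward_sizes) holding each
    neuron's response to unexpected and expected reward (hypothetical layout).
    Returns (diffs, p), where diffs[i] = MSE_subtraction - MSE_division for
    resample i (negative favours subtraction) and p is the proportion of
    resamples in which division fit better than subtraction.
    """
    rng = np.random.default_rng(seed)
    n_neurons = unexp.shape[0]
    diffs = np.empty(n_resamples)
    for i in range(n_resamples):
        # Resample neurons with replacement, then average across neurons.
        idx = rng.integers(0, n_neurons, size=n_neurons)
        u = unexp[idx].mean(axis=0)
        e = exp[idx].mean(axis=0)
        diffs[i] = fit_subtraction(u, e)[1] - fit_division(u, e)[1]
    p = np.mean(diffs > 0)
    return diffs, p
```

Calling `bootstrap_compare(unexp, exp)` on two response matrices returns the per-resample differences in mean squared error, which correspond to the kind of distribution plotted in the bootstrapping panels, together with the P value as defined in the legend.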