Adolescents exhibit reduced Pavlovian biases on instrumental learning

Multiple learning systems allow individuals to flexibly respond to opportunities and challenges present in the environment. An evolutionarily conserved “Pavlovian” learning mechanism couples valence and action, promoting a tendency to approach cues associated with reward and to inhibit action in the face of anticipated punishment. Although this default response system may be adaptive, these hard-wired reactions can hinder the ability to learn flexible “instrumental” actions in pursuit of a goal. Such constraints on behavioral flexibility have been studied extensively in adults. However, the extent to which these valence-specific response tendencies bias instrumental learning across development remains poorly characterized. Here, we show that while Pavlovian response biases constrain flexible action learning in children and adults, these biases are attenuated in adolescents. This adolescent-specific reduction in Pavlovian bias may promote unbiased exploration of approach and avoidance responses, facilitating the discovery of rewarding behavior in the many novel contexts that adolescents encounter.


Logistic regression
To investigate the contributions of Pavlovian and instrumental learning across development, we assessed accuracy for the four trial types (i.e., Go to Win, Go to Avoid Losing, No-Go to Win, and No-Go to Avoid Losing). We fit a mixed-effects logistic regression predicting correct choice from valence (reward or punishment), action (go or no-go), and age (z-score transformed) using the lme4 package (Version 1.1-13) for R software¹,². The logistic regression was then repeated with both age and age-squared (both z-score transformed) as predictors to test which model provided a better fit. All interactions with age and age-squared were included as fixed effects, and subject-specific random intercepts were specified in the model. The terms of interest were the valence-by-action interaction (i.e., the degree of Pavlovian facilitation or interference) and the valence-by-action-by-age interaction (i.e., developmental changes in the interaction between Pavlovian and instrumental behaviors). Including age-squared as a predictor of correct choice significantly improved the model fit (χ²(4) = 39.464, p = 5.59 × 10⁻⁸). Participants' accuracy was greater when Pavlovian reactive biases were aligned with correct instrumental responses than when they were in conflict (valence-by-action interaction; Supplementary Table S1). This Pavlovian bias effect also interacted with age and age-squared (valence-by-action-by-age and valence-by-action-by-age-squared interactions; Supplementary Table S1), revealing differential Pavlovian influences on instrumental action across development.
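The model comparison above appears to be a likelihood-ratio test between nested fits (in R, `anova()` on two lme4 models reports this statistic). Adding age-squared and its three interactions with valence and action adds four fixed effects, which is consistent with a χ² on 4 degrees of freedom. The Python sketch below counts the terms in each model and converts a χ² statistic into a p-value; the term names, the assumption of one random-intercept variance per model, and the helper functions are illustrative and not taken from the authors' code.

```python
import math
from itertools import product


def fixed_effect_terms(age_terms):
    """Fixed-effect terms for valence x action fully crossed with the given age terms.

    Each term is a tuple of predictor names; the empty tuple is the intercept,
    e.g. ("valence", "action", "age") is the valence-by-action-by-age interaction.
    """
    base = [(), ("valence",), ("action",), ("valence", "action")]
    age_levels = [()] + [(term,) for term in age_terms]
    return [b + a for b, a in product(base, age_levels)]


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) of a chi-square distribution with even df.

    Closed form: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)**k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form only valid for positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(loglik_reduced, loglik_full, k_reduced, k_full):
    """Return (statistic, df, p) for nested models fit by maximum likelihood."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    df = k_full - k_reduced
    return stat, df, chi2_sf_even_df(stat, df)


# Assumed: each model also estimates one random-intercept variance.
k_linear = len(fixed_effect_terms(["age"])) + 1               # 8 fixed + 1 = 9
k_quadratic = len(fixed_effect_terms(["age", "age_sq"])) + 1  # 12 fixed + 1 = 13
df = k_quadratic - k_linear                                   # 4

# Tail probability of the reported statistic under chi-square(4).
print(df, chi2_sf_even_df(39.464, df))
```

The even-df closed form keeps the sketch dependency-free; `scipy.stats.chi2.sf` would handle any degrees of freedom.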

Response time analyses
We examined whether responses were faster when Pavlovian expectancies were aligned with the instrumental requirement to press a button than when they were in opposition. Response times for all Go responses, regardless of accuracy, were log-transformed before being entered into linear mixed-effects regressions. As in the logistic regression of correct choice, the response time analysis included cue valence (reward or punishment), correct action (go or no-go), and age (z-score transformed) as independent predictors, fit using the lme4 package (Version 1.1-13) for R software¹,², with subject-specific random intercepts. Including age as a predictor of response time significantly improved the fit compared to a model without an age term (χ²(4) = 20.06, p = .0005). Participants responded faster to reward-associated stimuli, when the correct response was Go, and with increasing age (Supplementary Table S4).
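A minimal sketch of the response time preprocessing described above, assuming trial-level records with hypothetical field names and natural logarithms (the log base is not stated). Go responses are kept whether or not Go was the correct action, so a press on a No-Go trial still contributes a response time. The resulting rows could then be passed to a mixed model such as `lmer(log_rt ~ valence * correct_action * age_z + (1 | subject))` in lme4; that full-factorial formula is an assumption consistent with the 4-df age comparison, not the authors' reported specification.

```python
import math


def log_go_rts(trials):
    """Log-transform response times of all Go responses, correct or incorrect.

    `trials` is a list of dicts with hypothetical keys:
      "subject", "age_z", "valence" ("reward"/"punishment"),
      "correct_action" ("go"/"nogo"), "responded" (bool), "rt_ms" (float or None).
    Trials without a button press (No-Go responses) have no response time and
    are dropped.
    """
    rows = []
    for trial in trials:
        if not trial["responded"]:
            continue
        row = {key: value for key, value in trial.items() if key != "rt_ms"}
        row["log_rt"] = math.log(trial["rt_ms"])  # natural log (assumed base)
        rows.append(row)
    return rows


trials = [
    {"subject": 1, "age_z": -0.4, "valence": "reward", "correct_action": "go",
     "responded": True, "rt_ms": 512.0},
    {"subject": 1, "age_z": -0.4, "valence": "punishment", "correct_action": "nogo",
     "responded": True, "rt_ms": 640.0},   # incorrect Go press, still kept
    {"subject": 1, "age_z": -0.4, "valence": "reward", "correct_action": "nogo",
     "responded": False, "rt_ms": None},   # correct No-Go, no response time
]
for row in log_go_rts(trials):
    print(row["valence"], row["correct_action"], round(row["log_rt"], 3))
```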

Correlation between Pavlovian parameter estimates and Pavlovian performance bias values
To verify that the Pavlovian parameter values estimated from the computational model were aligned with the Pavlovian performance bias values, we computed a non-parametric Spearman's rank correlation between the two variables. Because we hypothesized a positive relationship between them, we used a one-sided test. As expected, this analysis revealed a robust positive correlation between the two measures of Pavlovian bias (ρ(59) = .531, p < .001).
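The one-sided Spearman test can be sketched in Python as follows. Tied values receive average ranks, and ρ is the Pearson correlation of those ranks. For the p-value this sketch swaps in a permutation test (shuffling one variable's ranks and counting how often the shuffled ρ is at least as large as the observed one); the authors' p-value method is not stated, and R's `cor.test(x, y, method = "spearman", alternative = "greater")` is a common alternative. The reported ρ(59) implies 61 participants if the usual n − 2 convention for degrees of freedom was used. Function names and the example data are hypothetical.

```python
import random


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def spearman_one_sided(x, y, n_perm=10000, seed=0):
    """Observed rho and permutation p-value for H1: positive association."""
    rng = random.Random(seed)
    rx, ry = average_ranks(x), average_ranks(y)
    observed = pearson(rx, ry)
    shuffled = list(ry)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson(rx, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


model_bias = [0.1, 0.4, 0.35, 0.8, 0.5, 0.9, 0.2, 0.7]   # hypothetical values
performance_bias = [0.05, 0.3, 0.4, 0.6, 0.45, 0.8, 0.1, 0.5]
rho, p = spearman_one_sided(model_bias, performance_bias, n_perm=5000)
print(f"rho = {rho:.3f}, one-sided p = {p:.4f}")
```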

Relationship between Reward Invigoration and Punishment Suppression
The Pavlovian performance bias reflects both a reward-based invigoration of action and a punishment-based suppression of action. To assess whether the Pavlovian performance bias was driven by reflexive responses to both reward and punishment, we computed a Pearson correlation between the reward-based invigoration score and the punishment-based suppression score. This analysis revealed a strong positive correlation between the degree to which reward invigorates action and the degree to which punishment suppresses action (Supplementary Figure S2; r = .753, t(59) = 8.794, p = 2.522 × 10⁻¹²). This relationship remained significant in a partial correlation controlling for age (r = .763, t(58) = 8.983, p = 1.414 × 10⁻¹²), suggesting that the coupling between these two components of Pavlovian bias is not explained by their shared relationship with age.
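The t statistics above follow from the correlation coefficients via t = r·√(df / (1 − r²)), with df = n − 2 for a simple correlation and df = n − 3 for a partial correlation with one covariate. The Python sketch below, with hypothetical function names, computes these alongside the standard first-order partial correlation formula used to control for a single covariate such as age; it illustrates the textbook formulas and is not the authors' analysis code.

```python
import math


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation of x and y after removing the linear effect of z from both."""
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1.0 - rxz ** 2) * (1.0 - ryz ** 2))


def t_from_r(r, df):
    """t statistic for testing H0: r = 0 with the given degrees of freedom."""
    return r * math.sqrt(df / (1.0 - r ** 2))


n = 61  # implied by t(59) for the simple correlation
print(round(t_from_r(0.753, n - 2), 2))  # simple correlation, df = n - 2
print(round(t_from_r(0.763, n - 3), 2))  # partial correlation, one covariate
```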

Posterior Predictions
Data simulated from each model were used to predict accuracy on the task. Individual subject parameter estimates from each model were used to simulate the choice data, from which accuracy was calculated for each age group and trial type. Model 8, which provided the best fit and includes Go and Pavlovian biases as well as a reinforcement sensitivity parameter, closely approximated participant behavior (Supplementary Figure S4).
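Posterior predictive checks of this kind can be sketched by simulating an agent with fixed parameters and scoring accuracy per trial type. The Python sketch below assumes a common form of the go/no-go reinforcement-learning model with the five parameters named for Model 8: action values Q are updated with a learning rate toward ρ·r (ρ = reinforcement sensitivity, `rho`), a stimulus value V is learned the same way, the Go weight is Q(go) + Go bias + Pavlovian bias × V, and a lapse rate mixes the softmax choice probability with random responding. These update equations, the 80% feedback validity (`p_valid`), and the trial count are assumptions for illustration, not the authors' specification.

```python
import math
import random

# Trial types: (cue valence, correct action); +1 = reward cue, -1 = punishment cue.
TRIAL_TYPES = {
    "go_to_win": (1, "go"),
    "go_to_avoid_losing": (-1, "go"),
    "nogo_to_win": (1, "nogo"),
    "nogo_to_avoid_losing": (-1, "nogo"),
}


def simulate_accuracy(lr, lapse, go_bias, pav_bias, rho,
                      n_trials=60, p_valid=0.8, seed=0):
    """Simulate one agent and return the proportion correct for each trial type."""
    rng = random.Random(seed)
    accuracy = {}
    for name, (valence, correct_action) in TRIAL_TYPES.items():
        q = {"go": 0.0, "nogo": 0.0}  # instrumental action values
        v = 0.0                       # Pavlovian stimulus value
        n_correct = 0
        for _ in range(n_trials):
            w_go = q["go"] + go_bias + pav_bias * v
            w_nogo = q["nogo"]
            # Two-option softmax, with the exponent clamped to avoid overflow.
            diff = max(-50.0, min(50.0, w_nogo - w_go))
            p_go = 1.0 / (1.0 + math.exp(diff))
            p_go = (1.0 - lapse) * p_go + lapse / 2.0
            action = "go" if rng.random() < p_go else "nogo"
            correct = action == correct_action
            n_correct += correct
            # The favourable outcome follows a correct choice with probability
            # p_valid and an incorrect choice with probability 1 - p_valid.
            good = correct if rng.random() < p_valid else not correct
            if valence > 0:
                outcome = 1.0 if good else 0.0   # win vs. nothing
            else:
                outcome = 0.0 if good else -1.0  # nothing vs. loss
            q[action] += lr * (rho * outcome - q[action])
            v += lr * (rho * outcome - v)
        accuracy[name] = n_correct / n_trials
    return accuracy


# Hypothetical parameter values for one simulated participant.
print(simulate_accuracy(lr=0.2, lapse=0.05, go_bias=0.5, pav_bias=1.0, rho=4.0))
```

Averaging such simulations over each participant's fitted parameters, grouped by age, would give predicted accuracies to set against the observed ones.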

Parameter Recovery
Because the free parameters of the best-fitting model showed some covariance (Supplementary Table S3), we tested whether these parameters were recoverable. We simulated 10,000 sets of parameter values for the best-fitting model (Model 8) by bootstrapping the parameter estimates derived from participants' choice behavior. These parameter sets were used to generate choice data, which were then fit with Model 8 to assess how well the simulated parameters could be recovered. We used Spearman correlations to assess the relationship between simulated and recovered values. All parameters were recovered well, as reflected in high correlation coefficients (learning rate: ρ(9998) = .774; lapse rate: ρ(9998) = .718; Go bias: ρ(9998) = .811; Pavlovian bias: ρ(9998) = .773; reinforcement sensitivity: ρ(9998) = .661; all ps < .001).
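The bootstrap step and the recovery summary can be sketched in Python as below. The sketch assumes that bootstrapping means resampling whole participant parameter vectors with replacement, which preserves the covariance between parameters noted above; the source does not say whether parameters were resampled jointly or independently. The simulate-and-refit step with Model 8 is passed in as a function (`simulate_and_refit` is a hypothetical name). In the demonstration it is replaced by a noisy stand-in so the code runs, so the printed correlations say nothing about the real model.

```python
import random


def bootstrap_parameter_sets(estimates, n_sets, seed=0):
    """Draw n_sets parameter vectors by resampling participants with replacement."""
    rng = random.Random(seed)
    return [dict(rng.choice(estimates)) for _ in range(n_sets)]


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman_rho(x, y):
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


def recovery_correlations(true_sets, simulate_and_refit):
    """Spearman correlation between generating and recovered values, per parameter."""
    recovered = [simulate_and_refit(params) for params in true_sets]
    return {
        name: spearman_rho([t[name] for t in true_sets], [r[name] for r in recovered])
        for name in true_sets[0]
    }


# Demonstration only: hypothetical estimates and a noisy stand-in for refitting.
fitted = [{"learning_rate": 0.1 + 0.01 * i, "pavlovian_bias": 0.5 + 0.05 * (i % 7)}
          for i in range(20)]
noise = random.Random(1)


def noisy_refit(params):
    return {name: value + noise.gauss(0.0, 0.05) for name, value in params.items()}


sets = bootstrap_parameter_sets(fitted, n_sets=1000)
print(recovery_correlations(sets, noisy_refit))
```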