Introduction

Attention-deficit/hyperactivity disorder (ADHD) is characterized by symptoms of inattention, hyperactivity, and impulsivity that negatively affect psychosocial functioning, education, and self-esteem [1,2,3]. While ADHD is typically considered to be a childhood-onset disorder that can persist into adulthood [4], ADHD may also have an adult onset [5] and ultimately affects ~2.5% of the U.S. adult population [6]. More than 90% of adults with ADHD report moderate to severe lifetime impairment related to their ADHD symptoms [7].

ADHD is associated with cognitive deficits on measures of response inhibition, vigilance, and working memory [8,9,10]. For example, individuals with ADHD tend to show more reaction time variability and make more omission errors on sustained attention tests, suggesting impaired vigilance [9]. ADHD has also been associated with altered motivation and reward sensitivity, such as a preference for immediate over delayed rewards [11,12,13,14]. In particular, reinforcement contingencies have a stronger effect on improving attentional performance among individuals with ADHD, suggesting low levels of intrinsic motivation and/or elevated reward thresholds [15].

Both cognitive performance and motivated behavior have been linked to catecholamine signaling, primarily dopamine (DA), in the mesocorticolimbic brain pathway [16]. DA plays a role in responding to cues that predict reward, which motivates exploration or exploitation in search of that reward; DA also modulates flexible cognitive control processes that are sensitive to changes in motivation [16]. ADHD is associated with lower DA transporter availability and lower D2/D3 receptor densities in the mesocorticolimbic pathway [17,18,19]. Methylphenidate (MPH), a common psychostimulant prescribed for ADHD, blocks the reuptake of DA and norepinephrine (NE) [20] and improves ADHD symptoms and cognitive deficits [21,22,23,24].

DA and NE are also believed to modulate the explore/exploit trade-off, which is the decision between choosing a familiar option with the highest expected reward value and choosing an unfamiliar option with an unknown or uncertain reward value [25]. Exploration provides useful information about the environment, whereas exploitation maximizes rewards; a flexible balance between exploration and exploitation is therefore needed for advantageous performance [26,27,28]. This trade-off is particularly relevant to ADHD, since several neurobiological theories of ADHD predict alterations in explore/exploit decision making [29]. In general, they predict that ADHD will be associated with less reward-driven and more exploratory decisions, usually as a result of faster reaction times and more impulsive/random selections [12, 29,30,31,32]. Explore/exploit decisions can be measured with an n-armed bandit task. Choices on the bandit task are often modeled using reinforcement learning, which summarizes players' strategies in a small number of parameters, such as a learning rate. Learning rates represent how quickly expected reward value is updated, and these rates can vary depending on environmental stability. In a rapidly changing environment, reward values should be updated quickly, resulting in higher learning rates and more exploration. Such models are thus useful tools for understanding the mechanisms that underlie normal and abnormal decision making.

Despite an abundance of theoretical models proposing abnormal reinforcement learning as part of the etiology of ADHD [12, 13, 15, 30,31,32,33], relatively little empirical research has been conducted. One study reported that adolescents with ADHD made more exploratory choices, which was not related to differences in learning rates or random selections [34]. Another study reported that adults with ADHD made more novel choices (i.e., preferred previously unseen options) and had lower learning rates than controls, and that DAergic medication improved ADHD performance and increased their learning rates [35]. Here, we tested the DA/NE modulation of explore/exploit decisions in relation to ADHD status. Medication-free adults with and without ADHD completed a 6-armed bandit task (6ABT) at baseline and after a single dose of MPH (40 mg) or placebo in counterbalanced order. We hypothesized that ADHD participants would make more exploratory decisions than controls, and that MPH would reduce group differences.

Methods

Participants

Participants were recruited from the Durham, North Carolina (n = 12 ADHD, 9 control) and Little Rock, Arkansas (n = 14 ADHD, 14 control) communities via social media, flyers, and word of mouth. Participants completed a phone interview and an in-person screening session to determine eligibility. Eligible participants were 18–45 years of age and were not currently taking stimulant medications. To be eligible, ADHD participants had to have T-scores ≥ 65 for inattentive and/or hyperactive-impulsive symptoms on the Conners' Adult ADHD Rating Scale (CAARS) [36] and had to meet criteria for a primary diagnosis of ADHD based on the Conners' Adult ADHD Diagnostic Interview for DSM-IV [37]. Controls had to have CAARS T-scores < 55 for inattentive, hyperactive-impulsive, and total symptoms.

Participants were excluded if they reported serious health problems (e.g., uncontrolled cardiovascular disease) or neurological problems (e.g., seizure disorder or traumatic brain injury); met criteria for a psychiatric disorder other than ADHD (except for symptoms of depression or anxiety co-morbid with ADHD) based on the MINI International Neuropsychiatric Interview [38]; reported drug or alcohol dependence in the past 12 months (other than tobacco); reported daily use of medication for ADHD in the past 6 months; had hypertension (i.e., blood pressure > 140/90 mmHg); or had contraindications for MPH (e.g., motor tics). Participants were also excluded if they tested positive for drugs (iCup, Alere Toxicology Services, Portsmouth, VA), alcohol (Alco-Sensor III, Intoximeters Inc., St. Louis, MO), or pregnancy (QuickVue+, Quidel Corporation, San Diego, CA).

Seventy-nine individuals consented and were screened for the study, and 28 did not proceed because they did not meet ADHD/control criteria (n = 11), had hypertension (n = 6), had a positive drug screen (n = 4), had another Axis I diagnosis (n = 3), or withdrew before the study day (n = 4). Of the 51 participants who met eligibility criteria and began the study, 49 completed the baseline 6ABT and were included in the data analysis. Participants provided written informed consent, and the protocol was approved by the Institutional Review Boards of Duke University and the University of Arkansas for Medical Sciences.

6-Armed bandit task (6ABT)

This version of the "restless bandit" task was adapted from previous studies [39,40,41,42,43,44,45] and has been published previously [46]. On each trial, six bandit options were depicted on a computer screen, and participants selected one to play by pressing the corresponding number on the keypad. Following the selection, the number of points awarded was displayed on the screen for 500 ms. The number of points paid out by each option changed gradually from trial to trial, independently of the other bandit options. See Fig. 1. The point values were calculated as follows: bandit options began with an initial point value of 50 on the first trial, and subsequent values were drawn from a Gaussian distribution with a standard deviation (σ) of 2.8 around a moving mean and rounded to the nearest integer. Point values were randomly adjusted according to a biased random walk,

$$r_{i,t+1} = (1 - \lambda)\left(r_{i,t} - \theta\right) + \theta + \eta, \tag{1}$$

where $r_{i,t}$ is the reward value of the ith option on trial t, θ is the asymptotic mean reward value (equal to 50), and λ is a central tendency parameter that represents the tendency of r to drift back toward θ. η is a Gaussian random variable with mean zero and standard deviation σ. We used parameter values of λ = 0.015 and σ = 2.8 because these values yielded payouts that were variable enough to encourage exploration while making it unlikely that a single option would remain the most profitable for the entire task. The number of points awarded by option i on trial t was allowed to range between 0 and 100 (the resulting range of awarded points was −4 to 105). A single version of the task was administered to all participants, and all participants received the same pattern of point values.
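The payout schedule described above can be sketched in Python as follows. This is a minimal illustration, not the authors' Matlab code. It assumes the per-trial decay form $r_{i,t+1} = (1-\lambda)(r_{i,t}-\theta) + \theta + \eta$ (our reading, under which λ = 0.015 produces a slow drift), clips the hidden mean to 0–100, and adds a separate Gaussian draw (SD = σ) around that mean before rounding to obtain the displayed points. The function name `simulate_payouts` and its defaults are hypothetical.

```python
import random


def simulate_payouts(n_trials=900, n_options=6, theta=50.0, lam=0.015,
                     sigma=2.8, start=50.0, lower=0.0, upper=100.0, seed=1):
    """Sketch of a restless-bandit payout schedule (hypothetical helper).

    Assumption: each hidden mean follows
        r[t+1] = (1 - lam) * (r[t] - theta) + theta + eta,  eta ~ N(0, sigma),
    clipped to [lower, upper]; displayed points are that mean plus Gaussian
    noise (SD = sigma), rounded to the nearest integer.
    """
    rng = random.Random(seed)
    means = [[start] * n_options]
    points = [[round(start)] * n_options]  # every option pays 50 on trial 1
    for _ in range(n_trials - 1):
        row = []
        for r in means[-1]:
            drifted = (1 - lam) * (r - theta) + theta + rng.gauss(0.0, sigma)
            row.append(min(upper, max(lower, drifted)))  # assumed bound on the mean
        means.append(row)
        points.append([round(rng.gauss(m, sigma)) for m in row])
    return means, points


means, points = simulate_payouts()
best_on_trial_10 = max(range(6), key=lambda i: points[9][i])
```

Because the displayed points are rounded noisy samples around a bounded mean, values slightly outside 0–100 can still appear in this sketch, which is one way a range such as −4 to 105 could arise.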

Fig. 1: The 6-Armed bandit task.

a. Trial structure of the 6-armed bandit task. On each trial, six bandit options were displayed. A number pad was used to select a single bandit, and the selected bandit was outlined in white. Then, the reward value of the selected bandit on that trial was displayed onscreen. b. The hidden reward values during the first 20 trials of the 6-armed bandit task. The y-axis shows the value of each of the 6 bandit options per trial. Each bandit option began with an initial value of 50, and values for subsequent trials were randomly adjusted by a biased random walk. A bandit's value is revealed only when the player selects it. The selections made by a single control participant are shown as black squares.

Participants were told that the goal of the task was to earn as many points as possible. Each time they played the task, participants could earn up to an additional $5 based on the ratio of the number of points they earned to the total number of points possible (up to $15 total across the three sessions). Each task lasted approximately 15 min and consisted of 900 trials. The bandit task was programmed in Matlab (MathWorks, Inc., Natick, MA) using the Psychophysics Toolbox [47].

Procedure

After consent and eligibility evaluation, participants completed a baseline 6ABT. Participants were then scheduled for 2 more study visits. These visits occurred within 2 weeks of each other but at least 48 h apart. For each participant, both study visits occurred either in the morning or in the afternoon. Participants were instructed to skip the meal prior to the study visit (i.e., either breakfast or lunch). Participants were administered either immediate-release methylphenidate (MPH: 40 mg) or a matching placebo (PLA) under double-blind conditions and in counterbalanced order. Drugs were ordered and compounded through a pharmacy, and the placebo consisted of lactose. After drug administration, participants were given two cereal bars, a fruit cup, and 8 oz of water and rested for 1 h to allow for drug absorption. The study visit lasted a total of 3 h, and the 6ABT was completed approximately 2 h after drug administration. At the end of the visit, participants rated to what extent they felt a drug effect on a scale from 1 (not at all) to 10 (extremely). The protocol included other tasks and questionnaires, which have been described previously [48].

Modeling of the bandit task

Choices made in the bandit task were classified as exploratory or exploitative according to a model-based account of participants' individual choices (previously described in [39, 44, 46, 49]). Four reinforcement learning models, each of which calculates the estimated bandit option pay-offs differently, were initially fit to the participants' data and compared using the Bayesian Information Criterion (BIC). The BIC measures how well a reinforcement learning model predicts the data while penalizing the number of free parameters (smaller values represent better fit). The results from the best-fitting model are reported here; see Supplementary information for a description of the other three models. On each trial, selection of the option with the highest expected value (based on previously seen options) was coded as exploitative, and all other choices as exploratory.
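As an illustration of the model comparison step, the standard BIC can be computed from a model's maximized log-likelihood, its number of free parameters k, and the number of observations n as BIC = k ln(n) − 2 ln L. The sketch below assumes each candidate model has already been fit and summarized as a (negative log-likelihood, k) pair; the model names and numbers are hypothetical, and how the authors aggregated BIC across participants is not specified here.

```python
import math


def bic(neg_log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: k * ln(n) + 2 * NLL (lower is better)."""
    return n_params * math.log(n_obs) + 2.0 * neg_log_likelihood


def best_model(fits: dict, n_obs: int) -> str:
    """Return the name of the model with the smallest BIC.

    `fits` maps a model name to (negative log-likelihood, number of free parameters).
    """
    return min(fits, key=lambda name: bic(fits[name][0], fits[name][1], n_obs))


# Hypothetical fits to one participant's 900 choices.
fits = {"model_a": (600.0, 2), "model_b": (598.0, 5)}
winner = best_model(fits, n_obs=900)
```

Here the model with more parameters fits slightly better in raw likelihood, but the k ln(n) penalty outweighs that gain, so the simpler model has the lower BIC.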

As in previous studies, the best fitting model valued the bandit options according to a softmax rule and Kalman filter [39, 50]. The softmax rule describes how individuals select among multiple options, specifically, how individuals choose bandit options probabilistically based on their expected reward values:

$$P(i \mid \beta, Q_i) = \frac{e^{\beta Q_i}}{\sum_j e^{\beta Q_j}}, \tag{2}$$

where $P(i \mid \beta, Q_i)$ is the probability of choosing option i, and β is the so-called softmax decision temperature parameter. Because β multiplies the expected values, it acts as an inverse temperature: a lower value of β flattens the choice probabilities and typically leads to a higher percentage of explore decisions. The Kalman filter [50] is a Bayes-optimal filtering process used to predict the values of options available for future selection based on the values of options previously chosen. Here, the posterior estimates of the option values took the form of normal distributions, with the mean and variance of every option updated on each trial according to a drift rule:
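A minimal Python sketch of the softmax rule in Eq. (2) is shown below. It only illustrates how β changes choice probabilities; the expected values in the example are hypothetical, and the helper names are ours.

```python
import math
import random


def softmax_probs(q_values, beta):
    """Eq. (2): P(i) = exp(beta * Q_i) / sum_j exp(beta * Q_j), for beta >= 0."""
    q_max = max(q_values)
    # Subtracting the largest Q avoids overflow and leaves the ratios unchanged.
    weights = [math.exp(beta * (q - q_max)) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def sample_choice(q_values, beta, rng):
    """Draw one option index according to the softmax probabilities."""
    probs = softmax_probs(q_values, beta)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


q = [52.0, 50.0, 48.0, 50.0, 49.0, 51.0]  # hypothetical expected values
p_high_beta = softmax_probs(q, 0.5)   # sharper: favors the best option
p_low_beta = softmax_probs(q, 0.05)   # flatter: more choices away from the best option
choice = sample_choice(q, 0.5, random.Random(0))
```

With β = 0 every option is equally likely; as β grows, the probability of choosing the highest-valued option (an exploit choice) rises toward 1.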

$$\mu_i \leftarrow (1 - \zeta)\,\mu_i + \zeta\,\theta, \tag{3}$$
$$\sigma_i^2 \leftarrow (1 - \zeta)^2\,\sigma_i^2 + D^2, \tag{4}$$

where $\mu_i$ and $\sigma_i$ are the mean and standard deviation of the previous estimate of each option's value, ζ is a central tendency parameter describing the drift of the options toward an asymptotic mean reward value θ, and D reflects the growing variance in an unchosen option's value over time due to drift. Because the option values change randomly over time, the uncertainty about unchosen options grows on each trial, and their mean values decay slowly back toward a subject-specific asymptotic value. Note that participants did not know the true value of the central tendency, so ζ ≠ λ in general. In addition, for the chosen option, we calculated learning parameters as follows:
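The drift rule in Eqs. (3) and (4) can be sketched in Python as below. The parameter values in the example are hypothetical, not fitted estimates. The second helper gives the variance an option would approach if it were never chosen again, obtained by solving Eq. (4) for its fixed point, $\sigma^2 = D^2 / \left(1 - (1-\zeta)^2\right)$.

```python
def drift_step(mu, var, zeta, theta, d):
    """Eqs. (3)-(4): decay every mean toward theta and inflate every variance."""
    new_mu = [(1 - zeta) * m + zeta * theta for m in mu]
    new_var = [(1 - zeta) ** 2 * v + d ** 2 for v in var]
    return new_mu, new_var


def stationary_variance(zeta, d):
    """Fixed point of Eq. (4): the uncertainty of an option left unchosen indefinitely."""
    return d ** 2 / (1 - (1 - zeta) ** 2)


# Hypothetical values: two options, zeta = 0.1, theta = 50, D = 2.
mu, var = drift_step([60.0, 40.0], [4.0, 4.0], zeta=0.1, theta=50.0, d=2.0)
```

Iterating `drift_step` on an option that is never chosen pulls its mean toward θ and drives its variance toward `stationary_variance(zeta, d)`, which is the sense in which uncertainty about unchosen options grows over time.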

$$\delta_i = r - \mu_i, \tag{5}$$
$$\alpha_i = \frac{\sigma_i^2}{\sigma_i^2 + \sigma_0^2}. \tag{6}$$

Here, r is the outcome on the current trial, $\mu_i$ is the mean of the chosen option, $\sigma_i^2$ is its current (pre-feedback) variance, and $\sigma_0$ is the standard deviation of the observation noise. As usual, δ is the reward prediction error and α is the learning rate, which are used to update the chosen option's value according to

$$\mu_i \leftarrow \mu_i + \alpha_i\,\delta_i, \tag{7}$$
$$\sigma_i^2 \leftarrow (1 - \alpha_i)\,\sigma_i^2. \tag{8}$$

As a result, each trial yields a single δ and α, along with vectors μ and σ. The learning rate is the rate at which the values of the options are updated (i.e., the sensitivity to the most recent reward value of each bandit option). Learning rates are higher in more variable environments and typically correlate positively with exploration.
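Putting Eqs. (3) through (8) together, a trial-by-trial pass over one participant's choices can be sketched as below, including the explore/exploit coding (a choice counts as exploitative if the chosen option had the highest expected value just before feedback). This is an illustrative sketch, not the authors' fitting code: all default parameter values are hypothetical placeholders rather than fitted estimates, $\sigma_0^2$ is treated as observation-noise variance (`obs_var`), and ties for the highest value are counted as exploit choices by assumption.

```python
def kalman_update(mu, var, chosen, reward, obs_var):
    """Eqs. (5)-(8) for the chosen option; returns (delta, alpha, mu, var)."""
    mu, var = list(mu), list(var)
    delta = reward - mu[chosen]                    # Eq. (5): reward prediction error
    alpha = var[chosen] / (var[chosen] + obs_var)  # Eq. (6): learning rate
    mu[chosen] += alpha * delta                    # Eq. (7): mean update
    var[chosen] *= 1 - alpha                       # Eq. (8): uncertainty shrinks
    return delta, alpha, mu, var


def code_choices(choices, rewards, n_options=6, mu0=50.0, var0=4.0,
                 zeta=0.01, theta=50.0, d=2.8, obs_var=16.0):
    """Label each trial explore (True) or exploit (False); all defaults hypothetical."""
    mu, var = [mu0] * n_options, [var0] * n_options
    explore, alphas = [], []
    for chosen, reward in zip(choices, rewards):
        # Eqs. (3)-(4): every option drifts before the choice is made.
        mu = [(1 - zeta) * m + zeta * theta for m in mu]
        var = [(1 - zeta) ** 2 * v + d ** 2 for v in var]
        # Exploit = chosen option had the highest expected value (ties count as exploit).
        explore.append(mu[chosen] < max(mu))
        _, alpha, mu, var = kalman_update(mu, var, chosen, reward, obs_var)
        alphas.append(alpha)
    return explore, alphas


explore, alphas = code_choices(choices=[0, 0, 1], rewards=[60, 58, 40])
pct_explore = 100.0 * sum(explore) / len(explore)
```

In this toy sequence the first two choices of option 0 are exploitative, and switching to option 1 after option 0 has paid well is coded as exploratory.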

Data analysis

Participant demographic data and CAARS T-scores were analyzed using independent-samples t-tests and chi-square tests. Age tended to correlate negatively with the percentage of exploratory decisions (e.g., for baseline performance, r = −0.280, p = 0.051). Because the groups differed in age, age was included as a covariate of no interest in all subsequent analyses.

The 6ABT consisted of 900 trials. The main dependent variable was the percentage of trials coded as "exploratory." Average reaction time, within-subject reaction time variability (i.e., standard deviation), and reward points (percentage of total points possible) were also measured. In addition, to investigate qualitative differences in bandit performance between groups, we examined two model-derived variables: the softmax decision temperature parameter and the learning rate.
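The per-participant summary measures listed above can be sketched as follows. The helper name is hypothetical, and defining "total points possible" as the sum of the best available payout on every trial is our assumption; the section does not spell out that definition.

```python
import statistics


def summarize_session(explore_flags, rts_ms, earned_points, points_matrix):
    """Per-participant 6ABT summary measures (hypothetical helper)."""
    pct_explore = 100.0 * sum(explore_flags) / len(explore_flags)
    mean_rt = statistics.mean(rts_ms)
    rt_sd = statistics.stdev(rts_ms)  # within-subject RT variability (sample SD)
    # Assumption: "total points possible" = best available payout on every trial.
    possible = sum(max(row) for row in points_matrix)
    pct_points = 100.0 * sum(earned_points) / possible
    return {"pct_explore": pct_explore, "mean_rt": mean_rt,
            "rt_sd": rt_sd, "pct_points": pct_points}


# Toy 4-trial example with 2 options per trial.
summary = summarize_session(
    explore_flags=[True, False, False, True],
    rts_ms=[500, 600, 700, 800],
    earned_points=[50, 52, 48, 51],
    points_matrix=[[50, 49], [52, 50], [48, 55], [51, 51]],
)
```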

6ABT variables were natural-log (LN) transformed to correct for non-normal distributions and analyzed using univariate analysis of covariance (ANCOVA) and 2 (drug) × 2 (group) repeated-measures ANCOVA, both controlling for age. Associations between 6ABT exploratory choices and CAARS scores were tested using multiple regression, controlling for age. 6ABT data were missing for one ADHD and one control participant on the drug administration study days. Self-reported drug effect data were missing for one control participant. All analyses were performed with SPSS (Chicago, IL), with alpha set at 0.05.
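For readers without SPSS, the between-group test in a univariate ANCOVA with one covariate can be sketched with NumPy as a nested-model comparison: fit ln(y) on intercept + age, then on intercept + group + age, and test the drop in residual sum of squares. This is an illustrative equivalent for a two-level group factor, not the authors' SPSS syntax; the function name and the synthetic data are hypothetical, and the repeated-measures drug × group model is not covered. With n = 49 the error df is 49 − 3 = 46, matching the F(1,46) tests reported for baseline.

```python
import numpy as np


def ancova_group_effect(y, group, age):
    """Between-group F test on ln(y), controlling for age (one-way ANCOVA sketch).

    group is coded 0/1; returns (F, (df1, df2), partial eta squared).
    """
    y = np.log(np.asarray(y, dtype=float))  # natural-log (LN) transform
    group = np.asarray(group, dtype=float)
    age = np.asarray(age, dtype=float)
    ones = np.ones_like(y)
    x_full = np.column_stack([ones, group, age])
    x_reduced = np.column_stack([ones, age])

    def sse(x):
        coef, *_ = np.linalg.lstsq(x, y, rcond=None)
        resid = y - x @ coef
        return float(resid @ resid)

    sse_full, sse_reduced = sse(x_full), sse(x_reduced)
    df1, df2 = 1, y.size - x_full.shape[1]
    ss_group = sse_reduced - sse_full  # variance uniquely explained by group
    f_stat = (ss_group / df1) / (sse_full / df2)
    partial_eta_sq = ss_group / (ss_group + sse_full)
    return f_stat, (df1, df2), partial_eta_sq


# Synthetic example: 26 "ADHD" (1) and 23 "control" (0) participants.
rng = np.random.default_rng(0)
group = np.array([1] * 26 + [0] * 23)
age = rng.uniform(18, 45, 49)
y = np.exp(3.0 + 0.3 * group + 0.01 * age + rng.normal(0, 0.2, 49))
f_stat, dfs, eta = ancova_group_effect(y, group, age)
```

A p-value can then be obtained from the F distribution, for example with `scipy.stats.f.sf(f_stat, *dfs)` if SciPy is available.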

Results

Participants

A total of 26 ADHD participants (14 men) and 23 controls (10 men) were included in the analysis. Participant demographics are shown in Table 1. Groups did not differ in sex ratio or years of education. ADHD participants were older (independent-samples t-test: t(47) = 2.2, p = 0.034). As expected, ADHD participants had higher CAARS T-scores for inattentive symptoms (t(47) = 20.8, p < 0.001), hyperactivity symptoms (t(47) = 14.8, p < 0.001), and DSM ADHD score (t(47) = 21.3, p < 0.001).

Table 1 Participant demographics for ADHD and control groups, mean ± standard deviation.

There was a trend toward participants feeling a greater drug effect after MPH (mean ± standard deviation: 4.8 ± 2.9) than after PLA (1.8 ± 1.5) (F(1,45) = 3.8, p = 0.057). There were no significant group or drug × group interaction effects on self-reported drug effect.

6ABT performance

Baseline performance

BIC values for the best-fitting reinforcement learning model were larger for ADHD participants (estimated marginal mean ± standard error: 1390 ± 119) than for controls (905 ± 127) (between-group effect: F(1,46) = 7.4, p = 0.009, partial η² = 0.139), indicating a better model fit for controls and greater unexplained variance among ADHD participants.

ADHD participants made more exploratory choices than controls across all task blocks (between-group effect: F(1,46) = 4.8, p = 0.034, partial η² = 0.094). ADHD participants earned fewer points (F(1,46) = 7.8, p = 0.008, partial η² = 0.145) and had lower learning rates (F(1,46) = 7.8, p = 0.008, partial η² = 0.145) than controls. There were no other significant differences in performance measures. See Fig. 2 for illustration, Table 2a for means and standard errors, and Supplementary Fig. S2 for box plots of baseline 6ABT performance data. A summary of the parameters from the best-fitting softmax rule and Kalman filter model is shown in Supplementary Table S1.
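For effects with one numerator degree of freedom, partial η² can be recovered from the reported F statistic and its error df using the standard identity partial η² = F·df₁ / (F·df₁ + df₂). The short check below applies it to the baseline statistics reported above; it is a consistency check on the reported values, not part of the original analysis.

```python
def partial_eta_squared(f_stat, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_stat * df_effect) / (f_stat * df_effect + df_error)


# Reported baseline effects: (F, df_effect, df_error, reported partial eta squared).
reported = [(7.4, 1, 46, 0.139), (4.8, 1, 46, 0.094), (7.8, 1, 46, 0.145)]
for f_stat, df1, df2, eta in reported:
    recovered = partial_eta_squared(f_stat, df1, df2)
    print(f"F({df1},{df2}) = {f_stat}: recovered {recovered:.3f}, reported {eta}")
```

Each recovered value, rounded to three decimals, matches the reported effect size.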

Fig. 2: Overall percent exploratory choices (estimated marginal means) for ADHD and controls across baseline, placebo (PLA) and methylphenidate (MPH) administration.

At baseline and across the two drug administration conditions, ADHD participants made more exploratory choices than controls (p's < 0.05). Error bars are standard errors of the mean.

Table 2 a. 6-Armed Bandit Task performance data at baseline for ADHD and control participants. ANCOVA analyses were performed on natural-log transformed data controlling for age. Shown in the table are antilog values of the estimated marginal means ± standard error (lower to upper 95% confidence intervals). b. 6-Armed Bandit Task performance data across placebo (PLA) and methylphenidate (MPH) administration for ADHD and control participants. ANCOVA analyses were performed on natural-log transformed data controlling for age. Shown in the table are antilog values of the estimated marginal means ± standard error (lower to upper 95% confidence intervals).

Methylphenidate versus placebo

Across the two drug administration study days, ADHD participants made more exploratory choices than controls (between-group effect: F(1,44) = 4.8, p = 0.034, partial η² = 0.098). See Fig. 2 and Table 2b. There were no significant drug or drug × group interaction effects on exploratory choices.

Reaction times were faster after MPH than after PLA (drug effect: F(1,44) = 4.8, p = 0.034, partial η² = 0.098) and showed a drug × group interaction (F(1,44) = 4.2, p = 0.046, partial η² = 0.087). Follow-up analyses showed a trend toward slower reaction times after MPH among controls (drug effect: F(1,20) = 3.8, p = 0.064) but no significant effect among ADHD participants. There was no significant between-group effect on reaction time.

Across both study days, ADHD participants had greater reaction time variability than controls (between-group effect: F(1,44) = 4.7, p = 0.036, partial η² = 0.097). ADHD participants also earned fewer points than controls (between-group effect: F(1,44) = 8.6, p = 0.005, partial η² = 0.163). There were no other significant differences in performance measures.

Associations between ADHD symptoms and 6ABT performance

In a multiple regression model with CAARS hyperactive T-scores, inattentive T-scores, and age as predictors, baseline 6ABT percent exploratory choices (LN-transformed data) were positively associated with hyperactive T-scores across ADHD and control participants combined (β = 0.031, standard error = 0.013, p = 0.019). The association between 6ABT percent exploratory choices and inattentive T-scores was not significant (β = −0.015, standard error = 0.011, p > 0.1); see Fig. 3. Within each group, the coefficients for hyperactive T-scores were larger than those for inattentive T-scores, although neither association was significant in these smaller samples (ADHD: hyperactive T-scores β = 0.030, standard error = 0.016, p = 0.074; inattentive T-scores β = −0.015, standard error = 0.018, p > 0.4; controls: hyperactive T-scores β = 0.048, standard error = 0.031, p > 0.1; inattentive T-scores β = −0.031, standard error = 0.029, p > 0.2).
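The regression above reports unstandardized coefficients with standard errors. A minimal NumPy sketch of an ordinary least squares fit that returns both is given below; it is not the authors' SPSS procedure, and the synthetic data, variable names, and "true" coefficients are hypothetical, chosen only to resemble the scale of the reported estimates.

```python
import numpy as np


def ols_with_se(y, predictors):
    """Ordinary least squares with an intercept.

    Returns unstandardized coefficients (intercept first) and their standard errors.
    """
    y = np.asarray(y, dtype=float)
    x = np.column_stack([np.ones_like(y)] + [np.asarray(p, dtype=float) for p in predictors])
    coef, *_ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ coef
    dof = y.size - x.shape[1]
    sigma2 = float(resid @ resid) / dof     # residual variance
    cov = sigma2 * np.linalg.inv(x.T @ x)   # covariance matrix of the coefficients
    return coef, np.sqrt(np.diag(cov))


# Synthetic stand-in for ln(percent exploratory) ~ hyperactive + inattentive + age.
rng = np.random.default_rng(1)
n = 49
hyper = rng.normal(60, 10, n)
inatt = rng.normal(60, 10, n)
age = rng.uniform(18, 45, n)
ln_explore = 2.5 + 0.03 * hyper - 0.015 * inatt - 0.01 * age + rng.normal(0, 0.05, n)
coef, se = ols_with_se(ln_explore, [hyper, inatt, age])
t_values = coef / se
```

Two-sided p-values could be computed from `t_values` with the t distribution on `n - 4` degrees of freedom, for example via `scipy.stats.t.sf` if SciPy is available.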

Fig. 3: Scatterplot between CAARS hyperactive T-score (raw data values) and baseline 6ABT percent exploratory choices (LN transformed data).

Multiple regression analysis indicated a significant association between hyperactive T-scores and exploratory choices, controlling for age and inattentive T-scores (β = 0.031, standard error = 0.013, p = 0.019).

Discussion

In summary, unmedicated adults with and without ADHD completed the 6ABT, a computerized measure of explore/exploit decision making, at baseline and then after methylphenidate (MPH, 40 mg) and PLA on separate occasions. In support of our first hypothesis, ADHD participants made more exploratory choices and earned fewer points. Across all participants, the percentage of exploratory choices was positively associated with hyperactivity symptoms. These results are consistent with theoretical models of increased exploratory decisions in ADHD [12, 29,30,31,32,33]. Contrary to our second hypothesis, MPH did not affect exploratory choices: ADHD participants continued to make more exploratory choices and earn fewer points than controls in both drug administration sessions. The results of the present study suggest that individuals with ADHD consistently explore low-value options at the expense of maximizing their rewards. An inability to suppress actions with little or no reward value may be a key feature of hyperactive ADHD symptoms. The lack of an MPH effect is consistent with other studies showing no effects of MPH on some higher-order cognitive processes [51, 52]. For example, MPH reliably improves performance on measures of eye movement control, attention/vigilance, and inhibitory control, but it is less effective on measures of working memory/divided attention, potentially because these tasks engage multiple cognitive processes over which MPH modulation of DA/NE signaling has less direct influence [51].

While many theoretical models have proposed increased exploratory decisions in ADHD, potential explanations have varied. These results challenge some straightforward explanations. For example, ADHD participants may make faster, more impulsive decisions and have more lapses in attention, resulting in shorter reaction times and more reaction time variability [53, 54]. Ultimately, this can manifest as less reward-driven/more exploratory decisions [12, 29, 31]. However, we report similar reaction times and reaction time variability between groups at baseline, in spite of differences in exploratory choices. This suggests that cognitive processing times and attentional performance were similar between groups and did not contribute to differences in explore/exploit decisions.

The advantage of reinforcement learning models is that they can provide insight into decision-making sub-functions and reveal impairments not always evident in gross behavioral measures, such as reaction time. The reinforcement learning model used here consists of two complementary components, a Kalman filter that describes how the expected values of bandit options are updated based on the experienced reward history, and a softmax rule that describes how options are selected. The equations for these components provide sub-function values that can inform how differences in explore/exploit decision making may occur.

The learning rate is a sub-function of the Kalman filter and is the rate at which the values of the bandit options are updated. Higher learning rates result in fast learning from recent experience but also fast forgetting. Lower learning rates lead to slow adaptation but also less influence from random variations in feedback. According to Ziegler et al., several neurobiological models of ADHD predict lower learning rates for rewards [29,30,31]. At baseline, ADHD participants had significantly lower learning rates, which is surprising because higher learning rates tend to be associated with more exploratory decisions. However, if exploration occurs in a way that does not track value, the estimated learning rate will be lower and the model will smooth over more of the variability in option selection. These group differences in learning rate disappeared in the subsequent drug administration sessions. Previous computational studies have shown mixed effects of ADHD status on learning rate. Sethi et al. recently reported that, in a placebo session, adults with ADHD had lower learning rates, earned fewer points, and made more novel selections on a 3-armed bandit task than controls [35]. Conversely, Hauser et al. reported no differences in learning rate, although adolescents with ADHD were more exploratory during a probabilistic reversal learning task [34]. Similar learning rates indicate that ADHD participants learned the reward contingencies and provide further evidence that increased exploratory decision making is not simply more random selection [34]. Altogether, this suggests that differences in learning rates may not have caused the differences in explore/exploit decisions.
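The trade-off between fast adaptation and sensitivity to noise can be made concrete with a simplified delta rule that uses a constant learning rate α. This is a deliberate simplification for illustration: in the Kalman filter model above, α varies from trial to trial according to Eq. (6). The first helper counts how many trials are needed to cover 90% of a sudden change in an option's value; the second gives the steady-state variance of the estimate per unit of outcome noise variance, α/(2 − α), the standard result for an exponentially weighted average of white noise. Both helper names are hypothetical.

```python
def trials_to_adapt(alpha, fraction=0.9):
    """Trials a constant-alpha delta rule needs to cover `fraction` of a step change."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    estimate, trials = 0.0, 0
    while estimate < fraction:
        estimate += alpha * (1.0 - estimate)  # delta rule tracking a new value of 1
        trials += 1
    return trials


def noise_passed_through(alpha):
    """Steady-state variance of the estimate per unit of outcome noise variance."""
    return alpha / (2.0 - alpha)


for alpha in (0.2, 0.8):
    print(alpha, trials_to_adapt(alpha), round(noise_passed_through(alpha), 3))
```

With α = 0.2 the estimate needs 11 trials to close 90% of a change but passes on only about 11% of the outcome noise variance; with α = 0.8 it adapts in 2 trials but passes on about 67%.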

The decision temperature parameter is a sub-function of the softmax rule that influences whether a choice will be coded as explore or exploit. In the model used here, lower temperature values are expected to correspond with more exploratory decisions. Many neurobiological models predict different temperature values and more exploratory decisions among individuals with ADHD [12, 29,30,31,32]. Unexpectedly, we report similar temperature values at baseline, and nonsignificantly lower values among ADHD participants during the drug administration sessions. Similarly, Sethi et al. reported that ADHD participants on placebo had nonsignificantly lower temperature values [35]. In contrast, Hauser et al. reported that differences in the decision temperature parameter accounted for more exploratory decisions among adolescents with ADHD [34].

It is unclear why the expected relationships between reinforcement learning model sub-functions and explore/exploit decisions did not occur across groups. It may be that decision rules based on typical goal-directed decisions do not fully explain the atypical decisions shown in ADHD. ADHD participants had more unexplained variance in their choices, which potentially decreased the signal-to-noise ratio, resulting in a lower learning rate. This may be why the ADHD participants had a significantly larger BIC, indicating that the model did not fit their data as well as it fit the control participants' data. One potential explanation for the unexplained variance in ADHD choices is that ADHD is akin to being in a low gain state [34, 55, 56]. Gain refers to the degree to which the salience of specific information in the environment can be enhanced and acted on immediately or be suppressed and acted on later. In a low gain state, no single piece of information dominates, and less important information is monitored and acted on, which can lead to more variable, exploratory decisions. This is a recognized phenomenon in ADHD, and the CAARS queries it with the item "Sometimes my attention narrows so much that I'm oblivious to everything else; other times it's so broad that everything distracts me" [36]. Anecdotally, individuals with ADHD are less able to allocate their attention appropriately and may be hyperfocused on one activity while struggling to maintain attention on another. An inability to suppress the salience of less valuable, alternate bandit options while exploiting the most valuable option may be a reasonable explanation for increased exploratory decisions. This may also explain the significant association between exploratory choices and hyperactive ADHD symptoms.

We had hypothesized that MPH would reduce group differences in exploratory choices based on extensive preclinical and clinical evidence that DA and NE modulate explore/exploit decisions and reinforcement learning sub-functions (reviewed in [57]). For example, elevated tonic DA levels in DA-transporter knock-down mice are associated with smaller decision temperature values, indicating more exploratory decisions [58], and reduced DA transmission in Parkinson's disease has been associated with lower learning rates, which are increased by DAergic medications [59]. However, very few human-subject studies have tested the catecholaminergic modulation of explore/exploit decisions. A recent study administered a 4-armed bandit task to healthy men after L-dopa (a DA precursor), haloperidol (a DA antagonist), and placebo [60]. Compared to placebo, L-dopa reduced uncertainty-based exploration (i.e., trials in which the option with the highest exploration bonus was chosen), whereas haloperidol had no effect [60]. Another study administered a 3-armed bandit task with a novelty manipulation to adults with and without ADHD [35]. DAergic medication increased points earned and learning rates in ADHD participants compared to controls, but there was no group or drug effect on decision temperature values [35]. The NE system has also been implicated in explore/exploit decision making [49]; however, administration of a NE reuptake inhibitor did not increase exploratory decisions as hypothesized [45]. More research is needed to understand these inconsistencies in the context of phasic versus tonic catecholamine signaling, and striatal versus prefrontal regulation of explore/exploit decisions.

A strength of this study is its placebo-controlled, counterbalanced design. Limitations include the group difference in age and the lack of a validated measure of real-world exploratory decisions or personality traits. We used a dose of MPH that our team has previously administered to adults with and without ADHD and that has been shown to produce behavioral effects [61,62,63]. However, 40 mg MPH is a relatively large dose for treatment-naïve individuals. MPH has different effects on behavior at low, medium, and high doses, and future studies should compare effects across doses. Lastly, given the racial disparities in ADHD diagnosis and treatment, future studies should pay special attention to the recruitment of minorities.

In support of several theoretical models of ADHD [12, 29,30,31,32], these results indicate that adults with ADHD make more exploratory decisions at the expense of maximizing rewards. Future studies could investigate whether this increased exploratory decision making is related to striatal or prefrontal function using neuroimaging. These results have clinical implications. Reinforcement learning models can help elucidate higher-order cognitive impairments and provide a more nuanced explanation of symptoms. In particular, the processes that underlie exploratory decisions on the 6ABT may be driving hyperactive symptoms, and a better understanding of such processes could help guide therapy. For instance, clinicians may want to be especially attuned to the decision-making capabilities of their patients with greater levels of hyperactivity. In addition, new therapeutic methods that emphasize top-down regulation of attention and conflict detection could be useful in reducing this particular impairment [64].

Funding and disclosure

This work was supported by the Brain and Behavior Research Foundation (Grant #23703, PI: Addicott). MDW reports nonfinancial support from Purdue Pharma CA, personal fees and nonfinancial support from Takeda, personal fees and nonfinancial support from Global Medical Education, personal fees from Huron Consulting, personal fees and nonfinancial support from Rhodes, personal fees and nonfinancial support from MHS, personal fees from Mundipharma, personal fees from Johns Hopkins University Press, nonfinancial support from Eunethydis, nonfinancial support from World Federation of ADHD, nonfinancial support from Israeli Foundation for ADHD, nonfinancial support from Canadian Attention Deficit Resource Alliance, nonfinancial support from American Professional Association for ADHD, nonfinancial support from Purdue Pharma US, personal fees and nonfinancial support from Cingulate, nonfinancial support from Children and Adults with ADD, personal fees from Boston Children's Hospital, and personal fees from Tris. The other authors declare that they have no conflict of interest pertaining to this manuscript.