Introduction

Substance-use disorders (SUDs) affect hundreds of millions of individuals and are responsible for a substantial global burden of disease1. To improve translational research, as well as treatment and prevention, researchers and clinicians need a better understanding of the underlying neurocognitive mechanisms of SUDs2. There is also a need for better brain-based biomarkers to study the course and treatment response in SUDs3. A powerful method for investigating brain function among people with SUDs is task-based functional magnetic resonance imaging (fMRI) of drug cue reactivity (FDCR) paradigms4. In FDCR studies, subjects are exposed to drug-associated cues in one or more sensory modalities while undergoing fMRI. fMRI cue-reactivity paradigms are popular among researchers, and on the basis of a systematic review, 370 published studies (through April 30, 2021) have used this paradigm (based on a database available at ref. 5). The results of these studies can help in understanding the neurobiology of SUDs, diagnostic classification of people with SUDs, discovering intervention targets, understanding the temporal evolution of the disease process, and monitoring the effectiveness of treatments and treatment outcomes; for more details, see refs. 6,7,8. An overview of typical procedures in an FDCR study is presented in Fig. 1.

Fig. 1: Schematic representation of key reportable aspects of an fMRI drug cue reactivity study.

1. Participants are recruited on the basis of explicit criteria, and baseline data are collected on participant demographics, handedness, psychiatric history and substance use history. 2. Participants undergo fMRI scanning with carefully selected hardware and software parameters, and data are analyzed through specified preprocessing and analysis pipelines for statistical inference. 3. Participants engage with drug and neutral cues during fMRI scanning, with cues of specified durations presented in events and/or blocks with a chosen temporal architecture. 4. These cues stimulate one or more sensory modalities and are typically matched in terms of psychological characteristics, such as induced arousal or valence, and/or physical characteristics, such as saturation and hue for pictorial cues. 5. and 6. Participants provide craving self-reports outside and/or inside the scanner, using various short and long-form instruments and hardware such as response boxes or joysticks. 7. In addition to pre-scanning sources of between-study variance such as task instructions and scanner familiarization, there are important post-scanning safety procedures such as craving-management interventions and additional assessments before participants leave the imaging center.

Despite the promising results of FDCR studies, the field has been plagued by important limitations. Most studies are cross-sectional5 rather than longitudinal, which makes it difficult to obtain information about cue-induced circuitry changes associated with the many factors that influence drug cue reactivity. In common with other fMRI research, the FDCR literature also suffers from small sample sizes and insufficient power9,10. All fMRI experiments can be influenced by random noise that affects study results11. It has also been suggested that the low reproducibility of task-based fMRI studies in general12 might be due to a combination of methodological factors, which, if addressed, could improve reproducibility13,14. Further complicating the picture are the sheer methodological complexity of FDCR and researcher discretion in the specification of hypotheses, participant recruitment, FDCR task design, choice of fMRI hardware, analysis pipelines and more. Unless these choices are explicitly and consistently reported across studies, unknown methodological heterogeneities can limit rigor and reproducibility. In turn, this will hinder knowledge production and clinical translation by undermining generalizability and the ability to optimally conduct comparative reports and meta-analyses7.

Many sources of potentially significant methodological heterogeneity probably affect FDCR results, including participants’ characteristics, types of cues, durations of cue exposure and analysis methods. The field would therefore benefit from the establishment of best/standardized practices for methods reporting to inform the generalizability of specific FDCR study outcomes and guide future research.

There are multiple ways to achieve greater clarity, interpretability and replicability across FDCR studies. They include the following:

  1. Preregistered replicable protocols. Study protocols define the structure of a study and can include the sequence of different imaging sessions, data acquisition settings and other methodological details15,16.

  2. Published drug cue databases. Drug cues in FDCR studies can be validated and standardized in terms of their average effects on arousal and valence, including affect and craving, and activations in relevant brain areas/networks. They can also be matched to control stimuli in multiple respects. One way of achieving this goal would be the sharing and utilization of standardized cue databases17,18,19,20,21. For example, the first openly accessible database with 360 cues is a recently validated methamphetamine and opioid cue database19.

  3. Data-analysis guides and pre-registered and standardized analysis pipelines. Preprocessing and analysis pipelines have significant effects on fMRI study results22. Researchers can use credible recommendations (e.g., by the Committee on Best Practice in Data Analysis and Sharing (COBIDAS)23). Pre-registration and open sharing of pipelines would also help in this regard, and moving towards consistent software and toolboxes is recommended24.

  4. Extant checklists. Many itemized checklists and recommendations have been developed to address different elements of research design and reporting in fMRI studies in general, with differing degrees of specificity (e.g., see refs. 25,26,27,28,29,30,31,32,33,34,35). Regarding fMRI analysis specifically, the COBIDAS proposes a checklist with the goal of enhancing the reporting of MRI studies23. However, no checklist with clear recommendations for FDCR research design and reporting exists.

Most authoritative research checklists and guidelines represent consortium efforts. This expert consensus development helps to elucidate the research process and its various aspects and clarify opinion on the importance of these aspects. Furthermore, consortium involvement substantiates the claim of the checklist to represent a diversity of opinions in the field36. One of the most common methods of achieving expert consensus is the Delphi technique. In the Delphi process, experts in the field approach consensus on a matter by participating in a series of commenting and/or item rating rounds with feedback37. An example of the use of this method in addiction sciences is a 2019 study to determine the significance of Research Domain Criteria (RDoC) in addiction medicine38.

The purpose of the present study was to develop and validate an itemized checklist of methodological parameters for FDCR researchers to use to clarify methods in future studies. The checklist would include items that are most important in study design and reporting to facilitate the interpretation of study results and data sharing, enable future meta-analyses, increase replicability and validity and improve the transparency of FDCR studies37. Using the Delphi consensus technique, we aimed to develop this checklist through an international consensus of FDCR experts. Furthermore, this paper represents the views of experts who participated in the Delphi process, exploring why and how various categories within the checklist affect FDCR research. It should be specifically noted that this checklist does not aim at prescribing the specific methods used in the design of FDCR studies. Instead, it is meant to help researchers explicitly consider and report the study design parameters and methodological decisions that may importantly affect the results of their study, both when designing FDCR research and when reporting its results.

Methods

Scope of the checklist

The items included in the checklist were predominantly those identified as being methods parameters that are specific to FDCR studies, such as sensory modality of cues. This checklist was developed to act as a standalone tool for describing methodological details considered to influence results of FDCR studies. The authors also detailed additional recommendations for each item that should be considered to increase the quality of reporting. The checklist can be used to increase transparency, support replicability, improve quality of data acquisition, facilitate future data sharing between laboratories and make increasingly sophisticated meta-analyses possible.

Contributors

The contributions to this project were organized on two levels: a steering committee (SC) and a larger expert panel (EP). This structure was chosen because it enabled a small and collaborative group of leaders (the SC) to flexibly and rapidly make decisions, resolve conflicts and lead the project to fruition. It also ensured that the voices of a much broader and more diverse group of international experts meaningfully affected the consensus process.

Steering committee

The SC consisted of 14 individuals: Anna Rose Childress, Hamed Ekhtiari, Rita Goldstein, Andreas Heinz, Amy Janes, Jane Joseph, Hedy Kober, F. Joseph McClernon, Martin Paulus, Lara Ray, Rajita Sinha, Elliot Stein, Reagan Wetherill and Anna Zilverstand. This group grew out of the Enhanced NeuroImaging Genetics through Meta-Analyses (ENIGMA) Addiction working group (https://www.enigmaaddictionconsortium.com) after a series of meetings in which substantial heterogeneity in FDCR studies, poor reporting of methods (insufficient for replication) and disagreements over the importance of various methodological parameters were discussed along with strategies to amend the situation. These discussions led to formation of a group called ENIGMA Addiction Cue Reactivity Initiative (ACRI). Furthermore, the initial members of the SC were asked to identify additional members chosen on the basis of their scientific expertise and contributions to the FDCR literature.

The SC members outlined the scope of the Delphi project39 and its important questions, developed and approved the initial checklist of important methodological parameters, processed the comments and revisions and led the authorship of this paper, all based on consensus.

Expert panel

The panel of experts for this Delphi study was chosen primarily on the basis of 318 addiction-related FDCR studies published by the end of 2019, from the database of a systematic review5. The main inclusion criteria were (i) appearing among the authors of at least four papers in the systematic review database and (ii) holding first, last or corresponding authorship position in at least one of the 318 papers. In addition, the members of the SC were asked to nominate candidates in the field of FDCR for inclusion within the EP. All SC members agreed on the list of experts before the invitation process.

All chosen experts received an email briefly outlining the importance, structure and goals of this Delphi study and were asked to state whether they wished to participate. Each candidate was contacted by email, and if there was no answer, two reminders were sent at roughly 2-week intervals. Those who decided to enroll received a further email with more details about how their feedback would be collected and used in the Delphi study, and then they formally entered the Delphi process. A total of 76 EP candidates were contacted by email: 21 did not respond, 6 had incorrect email addresses, 4 explicitly declined to participate and 45 agreed to join the EP. Providing the study participants with information is not necessary for Delphi studies, which do not rely on explicit information or published data37,40. Therefore, in this study, participants were asked to rely primarily on their prior knowledge of FDCR task design and methodology during the Delphi process, although they were provided with the list of the 318 studies included in the aforementioned systematic review so that they could view the relevant articles if needed.

Procedure

A general schematic of the methodology and its various stages is depicted in Fig. 2.

Fig. 2: A schematic of the entire Delphi study methodology.

The process has been roughly divided into distinct stages: the selection of the SC (in black) using the results of an earlier mentioned systematic review to choose the initial checklist items and expert committee candidates (in pink), checklist development phase (in red), expert panel selection (in purple), checklist commenting and revision phase (in green), checklist rating phase (in yellow) and data analysis and Delphi process finalization (in blue). The number of contributors to each section is displayed by ‘n =’. To the left of the main graph, an overview of the structure of the checklist at each stage is presented. recom, recommendations.

Checklist development phase

To simplify consensus development and facilitate the process of finalizing a comprehensive but concise list of important methodological aspects of FDCR studies, the SC decided to begin the feedback rounds after developing a basic set of categories, items and their associated recommendations. Each item included one concise point of an aspect in the category in which it appeared (the final list of categories and items is available in Tables 1–6 in Results). There could also be some additional recommendations associated with each item. This basic structure evolved on the basis of the initial feedback of the SC and a consideration of the methodological parameters commonly observed to be important to the studies included in the aforementioned systematic review. Upon completion, the items in the checklist questionnaire were pilot-tested by rating five randomly selected FDCR papers with Yes/No ratings on whether the item was reported in the paper or not. Using data from the pilot-testing analysis, the SC reworded and/or combined items that could not be easily given a Yes/No rating for inclusion in the revision phase.

Table 1 Items to report and recommendations in the Participants’ Characteristics category (category 1) of the checklist
Table 2 Items to report and recommendations in the General fMRI Information category (category 2) of the checklist
Table 3 Items to report and recommendations in the General Task Information category (category 3) of the checklist
Table 4 Items to report and recommendations in the Cue Information category (category 4) of the checklist
Table 5 Items to report and recommendations in the Craving Assessment categories (categories 5 and 6) of the checklist
Table 6 Items to report and recommendations in the Pre- and Post-Scanning Considerations category (category 7) of the checklist

Checklist revision phase

In the revision phase, 45 EP and 14 SC members were sent the checklist and were asked to add comments and suggest revisions to the existing items and their associated additional recommendations. They were also asked to suggest new items that they felt were overlooked, along with an explanation of why they thought the item should be included. They were also informed that there was no limit to the number of new items they could suggest. Forty-one members of the EP responded, and 10 SC members also added comments in this phase. Overall, we reached a response rate of 86% (51 of 59) across all participants (EP and SC).

In this revision phase, members of the EP and SC answered a short questionnaire41 assessing their basic demographic information (age, sex, highest academic degree, country of residence and primary affiliation/place of work), primary field of research (e.g., psychiatry, psychology, pharmacology, neuroscience, cognitive science), primary place of work (e.g., university, hospital, business, independent research institute), length of time spent in addiction medicine and length of time spent specifically researching FDCR. These questions were asked to ensure that we included a diverse field of experts (Supplementary Table 1).

Comments for each item were processed by the SC. During processing, repetitive comments were removed, items with unclear meaning were reworded and those outside the scope of the study were removed42 so that a list of clear and unique single-point notes extracted from the comments was obtained.

The notes obtained after the processing of comments were of three kinds: first, proposed changes to an existing item or its associated recommendations; second, adding or removing items; and third, general changes or critiques regarding the checklist. The decisions to apply or reject each note were made by the SC.

The modified version was sent once more to the SC and EP, and the members were asked to comment on the new changes. After receiving and applying their comments, the final version was approved by the SC members.

Checklist rating phase

In the second round, participants from the SC and EP were sent the edited checklist along with the newly added items. The participants were asked to rate each item in terms of importance in the methodology of FDCR studies, from 1 to 5 (87.5% completed the entire survey). The exact question was: ‘To facilitate visibility, replication and data sharing, how important is it to report this item?’. In addition, for each additional recommendation, we asked: ‘Do you support the inclusion of this additional note as a recommendation to be considered in fMRI drug cue reactivity studies?’. Out of 59 members of the SC and EP, 49 (83%) participated in the rating phase.

To avoid a neutral center rating and encourage deliberation, ratings were termed ‘not important’, ‘slightly important’, ‘moderately important’, ‘highly important’ and ‘extremely important’. The participants were allowed to skip an item if they chose not to rate it. The inclusion of each additional recommendation for each item could be rated ‘Yes’ or ‘No’.

Data analysis

All statistical analyses were conducted using RStudio (RStudio version 3.4.1). For the rating phase, the average rating and the number of responses were calculated. On the basis of the distribution of the ratings, it was calculated whether items passed either of two importance thresholds. The more-stringent threshold was a rating of 4 or 5 by ≥80% of participants (threshold 2, preregistered43), and the less-stringent threshold was a rating of ≥3 by ≥70% of participants (threshold 1) (dotted lines in Fig. 3). It was decided that items that do not pass the less-stringent threshold would be removed from the checklist, whereas items that pass the less-stringent threshold but not the more-stringent one are included but considered less important than items that pass both thresholds. For additional recommendations, we defined those with a ‘Yes’ rating by >50% of respondents as key ENIGMA ACRI checklist recommendations.
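
The two importance thresholds and the majority rule for additional recommendations can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in the study: the function names are hypothetical, and the sketch assumes that skipped ratings (represented here as `None`) are dropped from the denominator, which the text does not state explicitly.

```python
from typing import Optional, Sequence


def share_at_least(ratings: Sequence[Optional[int]], cutoff: int) -> float:
    """Fraction of submitted ratings (1-5) that are >= cutoff.

    Assumption: skipped ratings (None) are excluded from the denominator.
    """
    given = [r for r in ratings if r is not None]
    if not given:
        raise ValueError("item received no ratings")
    return sum(r >= cutoff for r in given) / len(given)


def classify_item(ratings: Sequence[Optional[int]]) -> str:
    """Apply the two thresholds described in the text to one checklist item."""
    # Threshold 2 (more stringent): rated 4 or 5 by >= 80% of raters.
    if share_at_least(ratings, 4) >= 0.80:
        return "threshold 2"
    # Threshold 1 (less stringent): rated >= 3 by >= 70% of raters.
    if share_at_least(ratings, 3) >= 0.70:
        return "threshold 1"
    # Items failing threshold 1 would be removed from the checklist.
    return "removed"


def is_key_recommendation(votes: Sequence[Optional[bool]]) -> bool:
    """True if strictly more than 50% of submitted Yes/No votes are 'Yes'."""
    given = [v for v in votes if v is not None]
    if not given:
        raise ValueError("recommendation received no votes")
    return sum(given) / len(given) > 0.50
```

Any item that passes threshold 2 necessarily passes threshold 1, so checking the stricter rule first yields a single label per item.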

Fig. 3: Ratings for 38 items in seven categories.

This figure depicts the rating of 49 raters (11 from the steering committee and 38 from the expert panel) for the checklist items. Each item was rated from 1 to 5 (not important to extremely important). All the items met threshold 1 and were rated as moderately, highly or extremely important by >70% of the raters. In addition, 24 items reached the more-stringent threshold 2 of being rated as either highly or extremely important by 80% of raters (the ones that did not reach this threshold are marked with ‘†’). Items are represented by their summary in the figure. Full text of the items is provided in Tables 1–6.

Reporting state of the checklist items

The state of reporting of the checklist items was assessed among 108 articles (ranging from January 1, 2017 to December 30, 2020) identified through a systematic review5. Rating was done by three independent raters (M.Z.-B., A.K.Z., and P.G.A.). An initial pilot rating of 19 articles was conducted and supervised by M.Z.-B., A.S. and H.E. to train the raters. After pilot rating, the remaining 89 articles were assessed by the three raters. Conflicts between raters were resolved by M.Z.-B., A.S. and H.E. in two group meetings, with all raters and supervisors reaching agreement on the final scores. The overall state of the reporting of the checklist items for each of the 108 studies (‘reporting score’) was calculated as the number of reported items divided by the total number of checklist items, excluding those with a ‘not applicable’ rating for each study. The inter-rater reliability of the checklist was also assessed on the basis of the three ratings for the 89 articles, using Fleiss’ Kappa43. To assess whether papers with a better reporting status appear in journals with higher impact factors, whether the reporting status has improved across recent years and whether word-count limitations have an impact on reporting status, the correlations of reporting score with journal word limit, article word count and journal impact factor were also assessed. A number of example papers reporting each item are presented in Supplementary Table 6.
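
The reporting score and the agreement statistic described above can be illustrated with a short Python sketch. The function names and the 'yes'/'no'/'na' labels are assumptions made for illustration. The kappa function follows the standard Fleiss formula for a fixed number of raters per rated unit; it is not the study's own code, and the choice of rated unit (for example, one article-item pair) is likewise an assumption.

```python
from collections import Counter
from typing import List, Sequence


def reporting_score(item_ratings: Sequence[str]) -> float:
    """Reported items divided by applicable items for one article.

    Each entry is 'yes' (reported), 'no' (not reported) or 'na'
    (not applicable); 'na' entries are excluded from the denominator.
    """
    applicable = [r for r in item_ratings if r != "na"]
    if not applicable:
        raise ValueError("no applicable items")
    return sum(r == "yes" for r in applicable) / len(applicable)


def fleiss_kappa(table: List[Sequence[str]]) -> float:
    """Fleiss' kappa; each row holds the labels given by all raters to one unit."""
    n_units = len(table)
    n_raters = len(table[0])
    if n_raters < 2 or any(len(row) != n_raters for row in table):
        raise ValueError("every unit needs the same number (>= 2) of ratings")
    counts = [Counter(row) for row in table]
    # Observed agreement: mean over units of the share of agreeing rater pairs.
    p_bar = sum(
        (sum(c * c for c in unit.values()) - n_raters) / (n_raters * (n_raters - 1))
        for unit in counts
    ) / n_units
    # Chance agreement: sum of squared overall category proportions.
    totals = Counter()
    for unit in counts:
        totals.update(unit)
    p_e = sum((c / (n_units * n_raters)) ** 2 for c in totals.values())
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1.0 - p_e)
```

For example, an article rated `["yes", "yes", "no", "na"]` has a reporting score of 2/3, because the 'na' item does not count against it.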

To support the potential utility of the checklist, a list of papers that demonstrate how each checklist item might affect the results of an FDCR study and its importance for interpretability and generalizability is also provided in Supplementary Table 6.

Ethical considerations

To ensure informed autonomy, all contributors were informed about the study’s aims and methods in the invitation email. Further notes within the questionnaire and emails during each round provided extra details, although the general study design and purpose remained unchanged. Members of both the SC and EP were invited to view the study’s evolving Open Science Framework (OSF) page43. All contributors were informed that they could terminate their participation whenever they wished. To ensure confidentiality, contributors were kept anonymous during both rounds of the Delphi survey, and comments and ratings were anonymized to all except the lead authors. Neither responding to the basic information collected nor commenting on and rating the checklist items was deemed to require the disclosure of personal information.

Results

Characteristics of SC and EP and response rates

Of the original 14 SC members and 45 EP members who accepted the invitation, 51 (86.4%) respondents completed the revision round of the ENIGMA ACRI Delphi questionnaire. In the rating phase, 49 (83%) sent back complete responses. Four members of the EP responded to neither the revision nor the rating phase and were therefore removed from the EP.

The characterization of the SC and EP is provided in Supplementary Table 1, which shows that SC members were older on average than EP members, although the difference was not significant (mean ± s.d.: 51.1 ± 9.1 versus 45.3 ± 9.4); 60% (5 SC and 28 EP) of respondents were male. Most respondents held a PhD (79% SC and 80% EP) or both MD and PhD degrees (21% SC and 10% EP) and reported their primary field of research as predominantly neuroscience (29% SC and 44% EP) or psychiatry (43% SC and 34% EP). The professional affiliations of respondents were primarily universities (57% SC and 80% EP), hospitals (21% SC and 10% EP) and independent research institutes (14% SC and 10% EP). EP and SC members’ research involved cue-reactivity studies of many SUD cohorts (e.g., methamphetamine, cocaine, opioid, alcohol, tobacco and gambling).

Delphi process results

A schematic of the entire study process and checklist development stages can be viewed in Fig. 2.

Checklist development phase

After the systematic review of 318 articles, an initial list of suggestions for the overall structure of the checklist and important items was developed. This list consisted of 42 items in 5 categories: 13 General Task Information items, 9 Drug Cue Information items, 9 Control-Cue Information items, 6 Craving Assessment Inside Scanner items and 5 Craving Assessment Outside Scanner items. After the discussions within the SC members, this initial draft was developed into a checklist with 7 categories and 37 items: 8 Participants’ Characteristic items, 4 General fMRI Information items, 5 General Task Information items, 6 Cue Information items, 5 Craving Assessment Inside Scanner items, 4 Craving Assessment Outside Scanner items and 5 Pre- and Post-Scanning Considerations items. In addition, on the basis of the SC inputs, a column with 27 additional recommendations corresponding to the different items was added to this checklist.

Revision phase

On the basis of SC and EP comments on the checklist, one Participants’ Characteristic item, one Cue Information item, one Craving Assessment Inside Scanner item, one Craving Assessment Outside Scanner item and two Pre- and Post-Scanning Considerations items were excluded. New items were refined and added to the ENIGMA ACRI checklist following suggestions made by respondents to the ‘please suggest extra variable’ question. Additional Participants’ Characteristic items were ‘Psychiatric Profile’ and ‘Substance Use Profile-Main Drug’. The additional General Task Information items were about ‘Temporal Information of the Event/Block Duration’ and ‘Data and Resource-Sharing’. The additional Pre- and Post-Scanning Considerations item was about ‘Other Tasks and Procedures in the Imaging Session’. In addition, one item was split into two items: item 4 (Advanced Demographics I) and item 5 (Advanced Demographics II). Thus, in the rating round, there were 11 Participants’ Characteristic items, 4 General fMRI Information items, 7 General Task Information items, 5 Cue Information items, 4 Craving Assessment Inside Scanner items, 3 Craving Assessment Outside Scanner items and 4 Pre- and Post-Scanning Considerations items. The 27 additional recommendations were also expanded to 75, of which 69 were item-specific recommendations and 6 were category-specific recommendations. All the comments received in the revision phase are provided in an anonymized database on the project’s OSF page43.

Rating phase

Rating phase results can be viewed in Fig. 3. Respondents had a high rate of agreement on most checklist items. All items reached the less-stringent threshold (>70% of participants selected the ‘extremely important’, ‘highly important’ or ‘moderately important’ rating), so no item was excluded for failing to reach the thresholds. Most of the items also met the more-stringent consensus threshold (>80% of participants selected the ‘extremely important’ or ‘highly important’ rating). The following items (marked with † in Fig. 3) did not reach the more-stringent a priori consensus threshold: Advanced Demographics I, Advanced Demographics II, Handedness, Substance Use Profile-Main Drug, Substance Use Profile-Other Drug, Data and Resource-Sharing, Sources of Cues-Development, Drug and Neutral/Control Cue Content, Neutral/Control Matching to Drug Cues for Physical Features, Craving Assessment Inside Scanner-Technology, Craving Assessment Outside Scanner-Time Points, Pre-scanning Training and Familiarization, Other Tasks and Procedures in the Imaging Session and Post-scanning Craving Management. The results of the ‘Yes/No’ rating of the 75 additional recommendations are presented in Fig. 4. The results show that 69 (92%) recommendations reached the 50% threshold, but the following 6 (8%) did not: Interviewer Qualification, Motivation to Quit, Socio-economic Status, Body Mass Index, Menstrual Status and Sleepiness/Alertness. With the exception of revisions for minor grammatical and typographical errors, the checklist was not changed in the rating phase, and no item or category changes were made, as planned a priori43. The average ratings of the ENIGMA ACRI checklist items and the frequency of ‘Yes’ ratings for additional recommendations are presented in Tables 1–6.

Fig. 4: Ratings for 75 additional recommendations in seven categories.

This figure depicts the rating of 49 raters (11 from the steering committee and 38 from the expert panel) for the checklist additional recommendations. Each additional recommendation was rated either ‘Yes’ or ‘No’ on the question of whether it should be included as a recommendation. Recommendations are represented by their summary in the figure. Full text of the recommendations is provided in Tables 1–6.

The short form of the checklist is available in Table 7. The other checklist forms, including both the items and the additional recommendations, are available as PDF or Excel files in Supplementary Tables 2–5.

Table 7 ENIGMA ACRI Checklist, short form

Reporting state of the checklist items

The consistency of the raters’ responses between the three raters resulted in a Fleiss’ Kappa of 0.799, indicating that the consistency is between ‘substantial agreement’ and ‘almost perfect agreement’43. The Kappa indices for all individual items except ‘Other Tasks and Procedures in the Imaging Session’ and ‘Substance-Use Profile-Other Drugs’ items were higher than 0.4, indicating at least a ‘moderate agreement’ among the raters. The Fleiss’ Kappa for each individual item can be found in Extended Data Fig. 1. The reporting status of the ENIGMA-ACRI checklist items ranged from near-universal reporting (99%; Basic Demographic Data) to almost not-reported (8%; post-scanning craving management). Articles also varied widely in terms of their overall reporting score, ranging from reporting only 27% of the checklist items to reporting 92%. On average, 70.4% ± 10.5% (mean ± s.d.) of checklist items were reported by the papers in our database (Fig. 5). Overall, the ‘General fMRI Information’ section had the highest average reporting across the 108 studies at 90.5% reporting, and the ‘pre- and post-scanning considerations’ section had the lowest reporting at 44.7%. The highest reporting score was 91.7%, and 10 articles had a score of higher than 80%. The lowest reporting score was 27.3%, and only 6 studies failed to meet a reporting threshold of 50%.
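
The verbal labels attached to kappa values here match the widely used Landis and Koch bands. The sketch below maps a kappa value to its band. The function name is hypothetical, and treating these bands as the ones intended in the text is an assumption of this sketch.

```python
def kappa_label(kappa: float) -> str:
    """Return the Landis and Koch agreement label for a kappa value."""
    if kappa < 0:
        return "poor"
    # Upper bounds of the bands: 0.00-0.20, 0.21-0.40, 0.41-0.60, 0.61-0.80.
    bands = (
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
    )
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"
```

Under these bands, the overall value of 0.799 falls at the very top of the 'substantial' band, just below the cut-off for 'almost perfect' agreement.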

Fig. 5: State of reproducibility/transparency in fMRI drug cue reactivity research in the context of the ENIGMA-ACRI checklist.

Assessments by three independent raters on the basis of 108 FDCR articles. a, Percentage of articles that reported each checklist item. Note that the percentages are calculated out of applicable items for each article. For example, craving-rating technology was not applicable for an article without craving rating. b, Percentage of overall reporting status of articles.

The correlations of study reporting status with journal word limit, article word count and journal impact factor were not significant, and relevant graphs are presented in Extended Data Fig. 2.
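
The text does not state which correlation coefficient was used. As one possibility, a Pearson coefficient between per-article reporting scores and a journal-level variable could be computed as in the following sketch; the function name is hypothetical, and no significance test is included.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

Here `x` would hold one reporting score per article and `y` the matching journal word limit, article word count or impact factor. A rank-based coefficient may be preferable if these variables are strongly skewed.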

Discussion

We developed a checklist resulting from a consensus process that represents the views of participating scientists regarding what they presumed to be important methodological aspects of conducting an FDCR study that would merit universal inclusion as methods details. We also investigated the state of the reporting of these checklist items in the FDCR literature. Key methodological aspects include seven distinct categories of core items and additional recommendations, as enumerated below.

Participants’ Characteristics

The Participants’ Characteristics section covers data about subjects’ demographics, psychiatric profile, handedness, substance-use profile, abstinence status and treatment status. All the items listed in this category were considered important by the experts (Fig. 3 and Table 1), although some such as race or ethnicity and handedness are not frequently reported in the literature (Fig. 5).

Age and sex/gender passed our more-stringent consensus threshold. In terms of age, FDCR studies can typically be divided into two major categories, those involving adolescents/emerging adults (e.g., refs. 44,45) and those involving adults (e.g., refs. 46,47). This distinction is important in part because of the development of the cortical circuitry that provides top-down control over bottom-up limbic systems that continue to mature throughout adolescence to early adulthood48. In addition, it is likely that age is correlated with years of substance use49, and neurocircuitry adaptations also occur over time, leading to potential confounding. Moreover, although FDCR studies often include participants in specified developmental stages, not much is known about the association of age (in years) with FDCR in each developmental category, perhaps partly due to restriction of participant age range. In addition, older adults have been routinely excluded from MRI studies that do not focus on aging and the shared neurodegenerative impacts of addiction and biological aging50, and there is relatively little known about FDCR among the elderly. In terms of sex/gender, multiple studies have demonstrated sex-/gender-related differences in FDCR, particularly in participants who smoke cigarettes51,52, individuals with cocaine dependence53,54 and those with gambling55 and gaming disorders56,57,58, which may depend, in part, on menstrual cycle phase in women59.

Additional demographics that passed the less-stringent consensus threshold included education/intelligence, handedness and race/ethnicity. These were rated as relatively less important than age and sex/gender partly because of a lack of published evidence for their association with FDCR.

It is perhaps not surprising that education/intelligence has not been found to be reliably associated with FDCR, given the often-low cognitive demands of a typical FDCR task (i.e., passively perceiving sensory stimuli). However, education/intelligence might be an important factor in FDCR in populations with intellectual disabilities60. Seventy-two percent of the assessed studies reported a measure of intelligence or education. Although handedness can be a critical consideration in fMRI studies of cognition (e.g., language and memory61), it does not appear to play a major role in the lateralization of FDCR, and only 41% of the 108 FDCR studies reported a measure of handedness.

In the case of race and ethnicity, it is possible that the literature as a whole has not provided sufficient opportunity to detect associations between FDCR and participant ethnicity or race (which could be driven entirely by unmodeled environmental/contextual variables), because studies have historically contained too few non-white/Hispanic participants to provide adequate statistical power to detect such associations. Only 40% of the reviewed FDCR studies reported participants’ race or ethnicity. Some racial and ethnic differences in brain activation during fear processing62 and social evaluation63 have been noted in the literature, but the importance of these differences in FDCR remains largely unknown.

In terms of clinical characteristics, the pattern/severity of substance use, addiction treatment status, last use and abstinence status, psychiatric profile and study inclusion/exclusion criteria passed our more-stringent consensus threshold. All of these items were reported in ≥75% of the assessed FDCR studies, with the exception of abstinence status, which was reported in only 59% of the studies. The importance of all of these items has been discussed previously. For example, in people who use cocaine, greater FDCR has been positively associated with addiction severity8,46,64 and could be predictive of relapse8,65,66. Perhaps unsurprisingly, self-reported craving has also been associated with FDCR across various drugs8,16.

Although both treatment seekers and non-treatment seekers demonstrate similar activation to drug cues in the ventral striatum67, treatment seekers have lower activation to drug cues in various non-limbic (e.g., frontal, cingulate and temporal) brain regions than non-treatment seekers49. This difference may be attributable to the expected availability of drug reward after cue exposure68,69, an additional variable of potential interest to consider for future consensus checklists.

Abstinence has also been associated with increased drug cue reactivity (e.g., in dorsolateral PFC and occipital cortex) in cigarette smokers70 and (e.g., in the midbrain) in individuals with cocaine use disorder71 but needs further study. Although individuals with acute psychiatric illness co-occurring with SUDs are typically excluded in FDCR studies, studies could collect information on lifetime histories of psychiatric illness and present subclinical symptoms of psychiatric disorders like depression and anxiety and investigate the interaction of past psychopathology or present subclinical symptoms on FDCR72,73,74. Researchers should consider explicitly stating whether individuals were assessed for the existence of subclinical symptoms of psychiatric disease, even if the assessment was performed as part of the inclusion or exclusion criteria. If individuals with subclinical symptoms are included, the impact of psychiatric symptoms on FDCR parameters and the sensitivity of the analyses to their presence may be estimated.

Finally, all study inclusion/exclusion criteria, including those already discussed, must be carefully considered. As just one example, psychiatric medications have been shown to alter FDCR75; information concerning psychiatric medications should be provided to readers in a standardized manner (e.g., in chlorpromazine equivalents for neuroleptic medication), and attempts should be made to prevent or at least examine the potential impact of all medication classes on FDCR via appropriate randomization and/or analytic strategies.

Additional clinical characteristics that passed our less-stringent consensus threshold included substance administration method and the co-occurring use of other drugs.

FDCR studies often isolate participants by route of drug administration either purposefully or through convenience sampling (e.g., demographic homogeneity due to geographic location of participant recruitment). Nonetheless, care (e.g., in cue representation and covariate analysis) should be taken when combining groups of individuals who use the same drug (e.g., opioids) but self-administer it via different routes (e.g., intravenous versus oral76) within the same sample or study. In our sample of FDCR studies, 75% reported the route of drug administration, although this is partly because some substances commonly investigated in FDCR studies (such as alcohol) have only a single plausible administration route, and in these cases the studies were not required to explicitly report the administration route for a ‘Yes’ rating.

Although researchers typically aim to isolate a single or ‘primary’ drug in FDCR studies, the use of other drugs should also be considered, because sensory cues of the ‘primary’ drug may nonetheless trigger neurobehavioral responses to multiple drugs, particularly when such drugs are commonly used simultaneously (e.g., cannabis and alcohol77). Only 17% of studies failed to report the use of other drugs.

Another potentially important participants’ characteristic is genetics. This factor was not considered important for inclusion in this checklist by our participating experts, perhaps because the influence of genes on various aspects of FDCR remains understudied. Nonetheless, polymorphisms in dopaminergic, GABAergic, glutamatergic, cholinergic, opioidergic and other genes may affect FDCR results (e.g., refs. 78,79,80,81,82,83,84,85,86,87,88,89,90,91). As FDCR methods are harmonized and more data sharing can occur, we suggest that FDCR studies consider banking subject DNA for future genotyping so that DNA will be available to support analyses such as those involving polygenic risk scoring. Prospective use of genetic data could involve explicit informed consent or a waiver of informed consent from independent review boards to use deidentified data.

General fMRI Information

This section covers the reporting of fMRI acquisition details (hardware and software), data analytic procedures and scanning results in FDCR studies (Fig. 3 and Table 2). These items were considered extremely important to report by >80% of raters, and the category overall had the highest mean rating of all seven reporting categories. Similarly, for additional recommendation items (Fig. 4 and Table 2), the General fMRI Information category had the highest proportion of elements (89%) recommended by ≥75% of raters. This strong consensus is not surprising because these FDCR elements robustly influence data quality and variability. Nearly all of the 108 assessed studies reported every item in this category except the more specific ‘fMRI data reporting’ item, the requirements for which were met in 65% of the studies (Fig. 5). Below, we discuss selected items in each subcategory (acquisition, preprocessing, processing and reporting) to illustrate key points.

It was recommended with near unanimity that FDCR data acquisition details be reported using detailed checklists (e.g., COBIDAS Report23 and/or ref. 92). Detailed reporting can increase experimental design consistency, assist investigators new to the field in implementing robust methods, increase FDCR replicability, and enable data sharing and meta-analyses. For example, it is very important to report hardware details that could affect fMRI signals in different ways across the brain, such as the number of head-coil channels (e.g., 32 versus 8).

Indeed, a ‘coil-bias’ effect has been documented by several studies: one study determined that a 32-channel coil was more sensitive than an 8-channel coil for detecting cortical surface signals during a finger-tapping paradigm but less sensitive for detecting subcortical activations93. A more recent and comprehensive study investigating coil bias determined that head-coil channel number affects volumetric and diffusion measures as well as resting-state BOLD signal measures, with channel number strongly affecting BOLD signals in posterior visual and default mode network areas94.

In addition, although most current FDCR studies are conducted on 3-Tesla (T) systems, other factors will need to be considered in future as more studies are conducted at higher magnetic field strengths. For example, a preliminary (bioRxiv) communication compared fMRI results on a monetary incentive task in eight subjects scanned both at 7 and 3 T95. The study reported that 7-T scans yielded higher effects than 3-T scans in small subcortical nuclei relevant to FDCR studies, including the substantia nigra, ventral tegmentum and locus coeruleus.

Detailed reporting of preprocessing parameters using the structured checklists noted above was unanimously endorsed. Preprocessing parameters such as the spatial smoothing Full-Width Half Maximum value should be reported because they affect statistical inferences. In this regard, a meta-analysis of fMRI tasks involving rewarding stimuli revealed that the spatial smoothing value affects apparent nucleus accumbens volumes and anatomical positions96.
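
One practical reason to state the smoothing kernel explicitly is that a Gaussian kernel can be specified either by its Full-Width Half Maximum or by its standard deviation, and the two differ by a fixed factor of about 2.355. The following Python sketch shows the standard conversion; the function names and the example values (a 6-mm kernel and 3-mm voxels) are illustrative assumptions, not values taken from any of the reviewed studies.

```python
import math

# For a Gaussian, FWHM = 2 * sqrt(2 * ln 2) * sigma (about 2.355 * sigma).
FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))


def fwhm_to_sigma(fwhm_mm):
    """Convert a Gaussian kernel FWHM (mm) to its standard deviation (mm)."""
    return fwhm_mm / FWHM_PER_SIGMA


def sigma_to_fwhm(sigma_mm):
    """Convert a Gaussian kernel standard deviation (mm) to FWHM (mm)."""
    return sigma_mm * FWHM_PER_SIGMA


# Hypothetical example: a 6-mm FWHM kernel applied to 3-mm isotropic voxels.
sigma_mm = fwhm_to_sigma(6.0)
sigma_voxels = sigma_mm / 3.0
print(round(sigma_mm, 3))  # 2.548
```

Reporting the value together with its unit and its meaning (FWHM or standard deviation, in mm or in voxels) removes this ambiguity for readers who wish to replicate the preprocessing.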

There was near unanimity in the endorsement of reporting of artifact detection methods and motion thresholds for data exclusion.

There was substantial but lower agreement (79%) regarding reporting of group motion parameters during FDCR drug- versus neutral-cue blocks, which, if differing by group, could confound data analyses. This version of the checklist did not explicitly include denoising protocols, which when applied can affect task-related fMRI data by reducing noise and signal97. Future checklist versions might consider including denoising procedures, which hopefully will evolve to more selectively attenuate noise.
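
As one possible way to summarize motion separately for drug- and neutral-cue periods, the Python sketch below computes a commonly used framewise displacement measure and averages it by condition. It assumes six realignment parameters per volume (three translations in mm followed by three rotations in radians) and a 50-mm head radius for converting rotations to millimetres; the parameter order and units differ between software packages, and the function names and toy values are hypothetical.

```python
def framewise_displacement(motion, radius_mm=50.0):
    """Framewise displacement per volume (the first volume is set to 0).

    motion: list of 6-tuples (tx, ty, tz in mm, then rx, ry, rz in radians).
    Rotations are converted to mm as arc length on a sphere of radius_mm.
    """
    fd = [0.0]
    for prev, cur in zip(motion, motion[1:]):
        trans = sum(abs(c - p) for p, c in zip(prev[:3], cur[:3]))
        rot = sum(abs(c - p) for p, c in zip(prev[3:], cur[3:]))
        fd.append(trans + radius_mm * rot)
    return fd


def mean_fd_by_condition(fd, labels):
    """Average framewise displacement over volumes sharing a condition label."""
    totals, counts = {}, {}
    for value, label in zip(fd, labels):
        totals[label] = totals.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: totals[label] / counts[label] for label in totals}


# Hypothetical toy run: three volumes with small movements.
toy_motion = [
    (0.0, 0.0, 0.0, 0.0, 0.0, 0.0),
    (0.1, 0.0, 0.0, 0.0, 0.0, 0.0),
    (0.1, 0.0, 0.0, 0.002, 0.0, 0.0),
]
toy_labels = ["neutral", "drug", "drug"]

fd = framewise_displacement(toy_motion)
fd_by_condition = mean_fd_by_condition(fd, toy_labels)
```

Per-condition summaries of this kind could then be compared between groups to check whether motion differs systematically during drug-cue versus neutral-cue blocks.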

For data processing pipeline procedures, there was near unanimity (98–100%) for most elements, including recommendations to report on single-subject and group-level processing steps, nature of GLM analyses (random, mixed and fixed), whether covariates or demeaning are used, software tools used, multiple comparisons corrections applied and regions of interest specifications, if applicable (e.g., manually drawn, atlas-based or dataset-determined).

Reporting of the pre-registration of data-processing methods and reporting of effect sizes were considered important but with lower priorities. This lower priority does not mean that the checklist contributors considered effect size reporting unimportant. Rather, the survey focused on the consideration and reporting of methodological factors, not details of the results, which might explain why effect sizes were de-prioritized by survey respondents. The sample sizes commonly used in task-based fMRI research tend to generate small-to-medium effect sizes (Cohen’s d < 0.8)98. It seems likely that effect size reporting will be considered a higher priority in the future.
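
For illustration, Cohen’s d for a comparison of two independent groups can be computed from region-of-interest contrast estimates as in the Python sketch below. The pooled-standard-deviation form is assumed, the function name is hypothetical and the values are invented toy numbers, not data from any study.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical drug-minus-neutral contrast estimates for two small groups.
group_a = [2.0, 4.0, 6.0]
group_b = [1.0, 3.0, 5.0]
d = cohens_d(group_a, group_b)
print(d)  # 0.5
```

By Cohen’s conventional benchmarks (0.2 small, 0.5 medium, 0.8 large), this toy result would count as a medium effect. Within-subject contrasts would call for a different variant of the statistic, so the formula used should be stated alongside the reported value.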

There was greater variability across fMRI data-reporting elements, with >80% of raters endorsing detailed reporting of second-level maps or activation foci within groups, whole-brain contrasts, beta-weights during craving and neutral conditions and inclusion of whole-brain maps even in studies not using standard analytic methods, to facilitate data comparisons across studies.

Other reporting elements were considered somewhat lower priorities, including providing non-thresholded statistical maps and stating whether data have been or will be deposited in publicly available repositories, which can be challenging given inconsistencies in repository reporting requirements. Most (78%) raters recommended that reporting go beyond the use of checklists by providing as much experimental detail as possible. Undoubtedly, over time, as more data are aggregated in meta-analyses and as additional factors are determined to affect FDCR data effect sizes, such factors will be added to the reporting checklist.

General Task Information

While FDCR tasks are often straightforward cue-presentation paradigms, an adequate description of the task design, task components, requested subject engagement and precise temporal information is essential to assess the appropriateness of analytical procedures and interpret the results. As such, it is not surprising that experts considered this category to be almost as important as the ‘Participants’ Characteristics’ and ‘General fMRI Information’ sections (Figs. 3 and 4 and Table 3), and three of the seven items were reported by almost all of the assessed FDCR studies (Fig. 5). Because of its fundamental implications for modeling and design efficiency, it is necessary to report the exact temporal structure of the task, specifically the order, the onset, the spacing and the duration of stimuli, and it is not sufficient to merely report whether stimuli were presented in blocks or an event-related or mixed design was used. The temporal pattern of stimulation also significantly influences the amplitude of the evoked hemodynamic response.

In addition to simple cue-presentation experiments, sophisticated tasks with complex trial structures are increasingly used to investigate the interactions between various affective and cognitive trial components, such as attentional bias99 or response inhibition during the presentation of drug cues49. In these cases, a detailed description of the timing of stimulus presentations and participant responses within trials and blocks and the related modeling approach can be especially necessary to understand and assess the experimental procedure. To optimally sample hemodynamic responses in event-related designs and also decrease the predictability of stimulus presentation, the interstimulus interval (ISI) is often jittered, resulting in random ISIs across the task duration. The formulations used to obtain jittered intervals and the distribution of the resulting ISIs are important to assess design efficiency and should be described in detail100,101.
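
As an example of the kind of detail that can be reported, the Python sketch below draws jittered ISIs from a truncated exponential distribution and summarizes the result. The choice of distribution, the bounds, the mean and the seed are assumptions made only for illustration; studies use a variety of jittering schemes, and the function names are hypothetical.

```python
import random
import statistics


def jittered_isis(n_trials, min_isi=2.0, mean_extra=2.0, max_isi=8.0, seed=0):
    """Draw ISIs (in seconds) as min_isi plus an exponential jitter, capped at max_isi.

    Draws above max_isi are rejected and redrawn, which truncates the tail.
    A fixed seed makes the sequence reproducible and therefore reportable.
    """
    rng = random.Random(seed)
    isis = []
    while len(isis) < n_trials:
        isi = min_isi + rng.expovariate(1.0 / mean_extra)
        if isi <= max_isi:
            isis.append(round(isi, 2))
    return isis


def describe_isis(isis):
    """Summary statistics suitable for a methods section."""
    return {
        "n": len(isis),
        "min": min(isis),
        "max": max(isis),
        "mean": statistics.mean(isis),
        "sd": statistics.stdev(isis),
    }


isis = jittered_isis(n_trials=40)
summary = describe_isis(isis)
```

Stating the generating rule (distribution, bounds and seed) together with the summary of the realized ISIs would allow readers to reconstruct the design and assess its efficiency.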

Beyond this micro-timing information, details such as the overall duration of the scanner session, the duration of the experimental paradigm, its start time relative to the onset of the scanning session and its position within the order of possible additional paradigms are also of interest because multi-paradigm fMRI experiments are known to be prone to carryover and order effects16.

Reporting should further mention whether and how the order and timing of stimulus presentation were optimized. If appropriate, all of this information could be provided in compact and understandable ways by means of graphic displays (e.g., see refs. 44,102,103,104). Most of the assessed FDCR studies report at least some information regarding these items, with the least frequently reported item being the ‘Temporal Information of the Task’ item at 80% reporting. In the interest of a complete description of the experimental setup, we also suggest that the technical details of stimulation procedures and parameters and the equipment used be reported, especially if a less-common sensory modality was targeted. For example, studies using gustatory cues (e.g., alcoholic beverages) could report substance concentration and temperature, whether cues were preceded with another stimulus, potential latencies in substance delivery and the equipment and material that were used.

Whether participants are instructed to interact passively or actively with the cue, to allow or to regulate craving, is an important component of instruction, influencing the experimental setting. To enable the reader to judge the clarity of the instruction, the verbatim instructions given to the participant should be included. Especially in passive tasks, additional processes such as mind wandering and attentional drift could occur105, potentially harming the specificity of statistical analyses. Therefore, the chosen activity level and possible attempts to quantify participants’ compliance, attention and vigilance should be described in detail. For instance, some studies include trials to assess participant attention or use eye-tracking technologies (e.g., see refs. 106,107,108). Over 39% of the rated studies failed to report this crucial item.

Although 58% of the panel experts were of the opinion that the task code and stimuli-sharing item (Table 3) should be included in the checklist, its importance was rated lower (3.31) compared to the other items. This is particularly surprising given the intense contemporary discussion about reproducibility in fMRI research98. In our opinion, authors should still report whether they have used an open scientific platform to provide task-related data (stimuli and software) to the imaging community. Therefore, the manuscript should include, where appropriate, information on access points and conditions of access (e.g., see refs. 109,110), in accordance with the FAIR principles for data exchange (https://www.force11.org/fairprinciples). This item was the least frequently included in the rated FDCR studies, with only 6% of the 108 papers sharing their task-related data and resources.

Cue Information

The drug and control cues used in FDCR research span a number of different sensory modalities. They can be developed and parametrized depending on modality and are preferably validated and matched in terms of their important characteristics. This checklist category includes information regarding important features of the utilized cues and their origin, validity and content, and several items and recommendations received near-unanimous support (Figs. 3 and 4 and Table 4). Item rating means ranged from 4.07 (for the description of the validation extent of the cues) to 4.77 (for the description of the sensory modality of cues).

Multiple drug- or control-cue–related aspects of FDCR studies may affect study outcomes16. The most important factor may be the description of the sensory modality of drug and control cues, which was also reported in 97% of the rated FDCR studies (Fig. 5). Although cues in different sensory modalities often induce distinct brain activation profiles111, some studies do not clearly describe the sensory modality of their utilized cues. Depending on the sensory modality, there are various parameters that may need to be further considered and specified for drug cues and control stimuli. For instance, for pictorial cues, it is recommended that authors provide details regarding picture luminance, complexity (including human presence), hue and saturation. For auditory cues, it is important to consider factors such as intensity and frequency (loudness and pitch)19,21,112. Only half of the 108 assessed FDCR studies reported their choices regarding cue matching (i.e., trying to control for both physical features like size and color and content features in the substance and control cues).

Furthermore, these parameters may be used to ‘match’ drug cues and control stimuli (or those belonging to other cue categories in a study). Matching is done to minimize the effects of these other factors on the differential activation patterns elicited by different cue types. In addition, cues can be matched on the basis of their standardized arousal, valence or craving induction scores19,112,113.

Another important but often overlooked factor limiting replicability and interpretation of FDCR studies is confusion over the sources of utilized cues, how they were obtained or developed and whether they have been validated (i.e., shown to elicit a certain range of arousal, affective or craving-related responses in individuals). Experts considered providing cue-validation details to be very important, but the reporting of cue-development details was not rated as highly. Nevertheless, there was near-unanimous support that researchers should consider reporting the exact source of their cues and how their cues were developed from this source, where applicable, which suggests that the participating experts broadly considered this a significant aspect of an FDCR study. Even in cases where authors are using cues developed or validated in another published study, it is still desirable to provide minimal development and validation details in addition to references. A notable gap between the aggregated expert opinion and reporting status in the reviewed literature was also observed, with 72% of FDCR studies containing information on cue development but only 28% reporting any cue-validation processes.

Although not always optimal, using cues from already validated and widely used cue databases may save researchers considerable resources and improve consistency across studies. There have been recent attempts to develop large pictorial cue databases to address these issues19,112. These databases include cues that have been developed in a methodologically consistent manner and whose craving and arousal elicitation effects have been formally studied. The best FDCR cue databases include neutral stimuli as well as drug cues that are matched according to various characteristics21,114. Newer databases with a greater focus on drug cue–reactivity studies have become available in recent years17, and large developing cue banks may even contain multiple drug cues and control stimuli types19.

The exact content of cues can also influence multiple dimensions of cue reactivity. Drug cues may depict the drugs themselves, drug paraphernalia, individuals preparing or using drugs or spaces where drug use is likely. Differences in the content of cues (drug versus drug-use tools versus drug-use actions) may recruit different brain areas, and this may have implications for how these cues link to drug-seeking behavior115. It may be important to consider this aspect of cue selection when designing studies, because certain cue contents may be more appropriate for testing some, but not all, hypotheses.

In addition, among recommendations in this category, there was widespread agreement on the importance of describing substance-delivery methods in studies in which a substance is administered as a cue, prior cue exposure, and cue tailoring. Studies in which a substance is directly administered (usually in small amounts) remain relatively rare in the field of FDCR as a whole. However, given the popularity of these paradigms in some fields (such as in tobacco use disorder and alcohol use disorder) and the large variety of substance-delivery mechanisms used, it is recommended that researchers describe their delivery mechanisms in detail and cite the relevant literature when possible116,117,118. Prior exposure of participants to cues is also important. Some brain regions may rapidly habituate to specific drug cues, decreasing their reactivity to them, even in the absence of a reduction in self-reported craving119. Lastly, personalized tailoring of cues presents unique challenges and opportunities in FDCR studies. Although it potentially leads to maximal cue reactivity in all participants, it also leads to heterogeneous cues that present problems for generalizability and interpretation. It is recommended that authors specify whether tailoring was conducted (if there is room for misunderstanding) and present precise details for how tailoring was conducted for each participant. Although all individual cues in a study may be tailored120, tailoring can be particularly applied on the basis of the participant drug of choice in samples of individuals who use multiple drugs109. Tailoring of drug-related messages meant to encourage drug-use cessation is another possibility121. Tailoring for gender/race/ethnicity is another area that is not well explored yet.

Task-Related Assessments

This section includes items regarding the inside- and outside-scanner assessment of the subject’s craving, including when and how the craving was assessed. Integration of self-report, behavioral or physiological measures as part of FDCR is commonplace122,123,124. Yet, perhaps because fMRI is the primary focus of these papers, the methodological details of other task-related assessments (e.g., self-reported drug use and craving/urge) that would be standard to report in behavioral research papers are sometimes excluded. Details of items, ratings and recommendations are presented in Figs. 3 and 4 and Table 5. A recent review of opioid-craving measurement identified many different questionnaires for assessing opioid craving; however, many had not been tested for reliability and validity125. Harmonization and validation of the questionnaires used for subjective reporting of drug craving should be considered a priority in the field. As an example, a systematic review is ongoing to develop an extensive map of every instrument used to assess craving in clinical trials126.

The timing of additional task-related assessments received high ratings of importance overall, with universal agreement that reporting the time period considered for in-scanner tasks (i.e., urges while viewing the image versus afterward) is important. Assessment time points were reported by ~90% of the rated FDCR studies for craving assessments both inside and outside the scanner. This information is critical for proper interpretation of the nature and magnitude of the response. There is evidence that the effects of imagery-based cue procedures on urge may persist for extended periods of time (e.g., 15–30 min)127,128, but the duration of effects from the brief image presentations commonly used in FDCR is largely unknown. Indeed, given that many FDCR paradigms rely on random/pseudo-random presentation of interleaved images from varying categories, an implicit assumption of most research is that the duration of these effects is brief. Continued research on this topic examining the validity of this assumption is critical and could conceivably lead to the development of formal guidelines for such assessments depending upon the nature of the study, the cue modality used and the specific question being asked.

As with timing, there was near-universal agreement that detailed reporting of the contents of both in-scanner and out-of-scanner assessments is important. This is perhaps particularly critical for in-scanner assessments, for which research has historically relied more heavily on single-item measures that may not have been subjected to the same rigorous examination of psychometric properties common for traditional self-report measures129,130,131. Although the general construct is frequently reported (e.g., urge or liking), reporting the exact phrasing is less common despite long-standing recognition that subtle differences in wording can affect participant interpretation and study outcomes132,133. This issue will be particularly important as research continues to explore covariation of constructs with brain activation. Indeed, research has already shown that patterns of activation may be at least partly dependent upon urge strength134. It should be recognized, however, that subjective ‘craving/urge’ is highly variable and situation specific (e.g., scanner versus bar). Brain activation to cues during fMRI might be less variable, and this was, in fact, one of the reasons for the initial development of FDCR paradigms.

There was also agreement about the importance of reporting hardware (e.g., button box and response pad) used for collection of these assessments. This may be particularly critical for research in which response time is examined as a primary or secondary outcome. An extensive body of literature documents the existence of substantial variability in the accuracy of data-collection devices outside the scanner135,136,137. To our knowledge, no similar evaluation of variability in the accuracy of common MRI-compatible devices has been conducted. However, the importance of reporting utilized hardware in fMRI research138 and using similar and calibrated hardware in multi-site fMRI studies139 has been noted in the literature.

Comparatively fewer experts (61%) recommended the inclusion of other physiological measures relative to other topics under consideration. One likely reason is that to date, these measures have rarely been included in FDCR studies. Nonetheless, examination of heart rate, skin conductance and other peripheral physiological measures is standard in the broader drug cue reactivity literature140. It is certainly plausible that changes in peripheral physiology could influence findings, particularly for certain types of imaging (e.g., arterial spin labeling). Moreover, inclusion of peripheral signals as covariates is becoming standard in resting scans in light of evidence showing it can alter connectivity maps141, and there is little reason to believe that these concerns should not extend to task-based scans. Although it may be premature to make formal recommendations for inclusion of peripheral measures at this time, continued exploration of this topic is critical and may reveal a need for inclusion in later instances.

Pre- and Post-Scanning Considerations

This section covers the items that have to be considered before and after the scanning session, which include training and familiarization, pre-scanning substance consumption, other tasks and procedures besides cue reactivity and post-scanning craving management. Of the pre-/post-scanning considerations, pre-scanning drug and smoking consumption was the only metric rated as moderately to extremely important by all reviewers (Figs. 3 and 4 and Table 6). This is probably because of the impact that both abstinence and recent substance use can have on cue-induced craving and brain function. The length of abstinence also matters, because studies generally support the idea that short-term abstinence enhances cue reactivity relative to satiety142,143,144,145,146, which mirrors preclinical findings147. In contrast, longer-term abstinence is associated with reduced cue reactivity146. Furthermore, deprivation and cue presentations may have independent, interactive effects on subjective reports of craving148, supporting the need to clearly indicate the conditions under which cue reactivity is evaluated. There is also a need to report the recency of other substance use and medications because they may influence subjective cue responses and the physiology underlying the fMRI signal, but this was reported by only 54% of the 108 rated FDCR studies.

Other recommendations include indicating whether participants have had prior cue exposure in the context of the study. This is important because habituation to emotionally evocative stimuli has been identified in specific brain regions149, yet not in all participant groups, particularly those who may be more reactive to the cue content150. While within-session habituation is a potential confounder119,149, cues continue to elicit subjective craving and comparable brain activity patterns over repeated sessions separated by longer durations (2–3 weeks)151,152,153. However, this finding has not been replicated in all studies119, which underscores the need to clearly report details surrounding previous cue presentations. Reporting drug expectancy is also recommended, because recent work suggests that participant expectations influence cue reactivity and related circuitry154,155,156.

Several elements of the pre-/post-scanning considerations did not reach a stringent consensus. Pre-scanning training and familiarization were ranked as highly important by only ~60% of respondents. Some reviewers felt this was such a fundamental aspect of good scientific procedures that it could be assumed that study participants were familiarized in some way with the task; even so, only 25% of the assessed FDCR studies reported this item. In addition, most cue reactivity tasks involve passive exposure to cues, which, unlike complex behavioral tasks, does not require extensive pre-scan training. However, such familiarization may also affect potential habituation and expectancy, which, given the discussion points above, would support the need to report it. The need to report other tasks and procedures in the imaging session was similarly ranked and did not reach a stringent consensus. It is plausible that the lack of reporting of other tasks may imply a singular focus on cue reactivity, with no potential influence from other tasks. That said, reporting tasks that have the potential to influence cue reactivity is considered best practice. Post-scanning craving management was the lowest-rated element, with <35% of the respondents ranking it as extremely/highly important, perhaps because it is viewed as more of an ethical consideration that would be considered by local institutional review boards rather than a factor that would affect cue reactivity directly. Given the potential ethical importance of craving management, it may be concerning that it was reported in only 8% of the FDCR study sample. However, the ethical implications of this element depend on the nature of the specific study, because the consequences of inducing craving are more profound when assessing a cohort in treatment for opiate use disorder than when assessing a community sample of nicotine-dependent individuals not seeking treatment.

Conclusion and future directions

As demonstrated by the consensus of the experts participating in this study and the review of the literature, FDCR studies have a vast methodological parameter space in which many impactful choices regarding study design and reporting can be made. The lack of methodological transparency complicates replication and generalizability and hampers data synthesis and clinical translation, necessitating further harmonization in reporting methodological details. Focusing primarily on representing expert opinion on best reporting practices in the field, this initial checklist is envisioned as a starting point to gain further empirical insight into the effect of methodological details in FDCR research. Importantly, this checklist was derived from FDCR researchers' estimations of which methodological parameters are likely to substantially affect FDCR study results. Uniform and thorough reporting of these parameters in future studies is therefore necessary to enable sensitivity analyses (e.g., meta-analyses) that can confirm or refute the ostensible importance of these factors, yielding critical mechanistic insights into cue reactivity in the process. We hope that the development of this checklist will set an initial standard for research practices and encourage scientific authorities in other areas of task-based fMRI to promote harmonization and transparency in reporting methodological details across different areas of functional human brain mapping38. As a secondary effect, journal reviewers and editors may consider aspects of this checklist during the peer review of relevant FDCR articles.

This paper presents the results of an international effort to develop an initial checklist of important items and recommendations that FDCR researchers can use to plan future studies or assess past work. The itemized and hierarchical structure of the checklist is meant to help researchers read and consider various parts as needed, and the ratable format makes it possible to use the checklist to score an FDCR study. In addition, a list of papers that appropriately report checklist items is provided in the supplementary materials and can be consulted when using the checklist. Our ultimate hope is that this checklist will be used widely within the field to foster transparency in FDCR research and facilitate data syntheses. Crucially, the checklist is not meant to limit variance and flexibility in study design, but rather to invite attention to various methodological aspects of an FDCR study, in particular under-reported elements such as abstinence status/recent drug use, participant task familiarization and compliance/attention, cue validation and matching and how they bear on the obtained results, wherever they might be applicable in the context of a particular project.

This is merely the first iteration of the checklist. Considering the rapid rate of progress in the field and based on feedback from the FDCR academic community, the checklist will be revised in later editions and is now an open-source project at https://osf.io/gwrh6/ for public commenting and discussion. To ensure the feasibility of the checklist application, we suggest considering and reporting the ‘items’ as a ‘must’ in FDCR studies and the use of ‘additional recommendations’ as suggestions to improve the methodological design and reporting of FDCR studies. The extent to which the checklist is adopted by journal editors/reviewers and FDCR researchers around the world will determine its influence in the long term.