Abstract
Psychiatric neuroimaging faces challenges to rigour and reproducibility that prompt reconsideration of the relative strengths and limitations of study designs. Owing to high resource demands and varying inferential goals, current designs differentially emphasise sample size, measurement breadth, and longitudinal assessments. In this overview and perspective, we provide a guide to the current landscape of psychiatric neuroimaging study designs with respect to this balance of scientific goals and resource constraints. Through a heuristic data cube contrasting key design features, we discuss a resulting trade-off between small-sample, precision longitudinal studies (e.g., individualised studies and cohorts) and large-sample, minimally longitudinal, population studies. Precision studies support tests of within-person mechanisms via intervention and tracking of longitudinal course. Population studies support tests of generalisation across multifaceted individual differences. A proposed reciprocal validation model (RVM) aims to recursively leverage these complementary designs in sequence to accumulate evidence, optimise relative strengths, and build towards improved long-term clinical utility.
Introduction
The “traditional” model in psychiatric neuroimaging, characterised by small, cross-sectional, and observational studies, has historically dominated research endeavours. Over the last decade, numerous challenges to this model have been presented, including reproducibility, inference, reliability, and power [1,2,3,4,5,6,7,8]. These challenges have raised concern about the potential clinical utility of neuroimaging (particularly widely used non-invasive techniques like structural and functional MRI [fMRI]) for advancing understanding of the aetiology, or the treatment, of psychiatric disorders. As a result, there have been recent calls for a shift toward either population-level “big data” or more targeted precision studies, often longitudinal in nature [9,10,11]. Another perspective on this shift is to acknowledge that there is no “free lunch” in any study design, as practical (e.g., participant recruitment) and financial constraints (e.g., costs associated with neuroimaging studies) prevent researchers from achieving the largest possible participant numbers, longitudinal time points, and breadth of assessments [9, 10, 12]. While these are important considerations for neuroimaging, the need to balance ideal study designs against resource constraints likewise drives many other areas of psychiatric and medical research [13]. Investigators should carefully consider these trade-offs to achieve robust and reproducible psychiatric neuroimaging and ultimately increase potential clinical utility.
In this overview and perspective, we provide a guide to the current landscape of psychiatric neuroimaging study designs with respect to the balance of scientific goals and resource constraints. We outline methodological considerations among common designs, highlighting an emerging global trade-off between within-person precision and between-person generalisability. To conclude, we propose a reciprocal validation model (RVM) that aims to leverage small-sample precision studies and large-sample population studies in a recursive sequence towards evidence accumulation and long-term clinical utility.
Study designs
Common designs in psychiatric neuroimaging can be visualised using a heuristic data cube [14, 15] (Fig. 1). In this illustration, each dimension represents a design choice: study sample size, measurement breadth (e.g., cognitive tasks, self-report scales), and measurement time points (e.g., longitudinal assessments). Owing to the balance of scientific goals and resource constraints, intensive longitudinal studies, for example, tend to have smaller samples (e.g., single-participant, cohort studies), whereas large-sample population studies often have very few, or just a single, assessment time point. Likewise, population studies typically focus on a broad set of potential measures or constructs, whereas targeted single-participant or cohort studies emphasise more precise estimates (i.e., higher reliability and convergent validity) of fewer variables or constructs. Thus, while any combination of design features is, in principle, possible, finite resources lead to prototypical examples that differentially emphasise study design features.
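To make this resource constraint concrete, the sketch below works through a purely hypothetical scanner-time budget; the budget, session lengths, and session counts are illustrative assumptions rather than figures from any of the studies discussed, but they show how fixing two axes of the cube largely determines the third.

```python
# Illustrative sketch only: a hypothetical fixed budget of scanner hours,
# showing why sample size, time points, and per-session depth trade off.
# All numbers are invented for illustration.

BUDGET_HOURS = 500  # hypothetical total scanner time available to one project

def affordable_participants(hours_per_session: float, sessions_per_person: int) -> int:
    """Number of participants the budget can support for a given design."""
    return int(BUDGET_HOURS // (hours_per_session * sessions_per_person))

# Individualised/precision emphasis: long sessions, many time points, few people
print(affordable_participants(hours_per_session=1.5, sessions_per_person=30))  # 11

# Cohort emphasis: moderate session length and a few time points
print(affordable_participants(hours_per_session=1.0, sessions_per_person=3))   # 166

# Population emphasis: one short session per person, thousands of people
print(affordable_participants(hours_per_session=0.25, sessions_per_person=1))  # 2000
```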
When examined in isolation or through the lens of a specific research question, relative strengths and weaknesses of a given study design may appear as a much-needed emphasis or a fundamental flaw. Recent perspectives have highlighted the importance of large sample sizes, underscoring the value of more inclusive samples and generalisability towards real-world complexity [16,17,18]. Others have recognised that investing in such samples may critically come at the cost of measurement precision and carefully targeted assessments [9, 19] and that even in recent large-scale population samples such inclusivity is not guaranteed [20]. Critically, common designs should support distinct but complementary research questions and inferential goals. While misalignment between research questions and study designs precipitates statistical issues and leads to inappropriate inferences, proper alignment is possible for all designs. Further, acknowledging the high resource demands of neuroimaging, leveraging the relative strengths and complementary insights from multiple designs is essential for improvements to reproducibility and evidence building. To illustrate these points, we consider prototypical examples from three psychiatric neuroimaging designs: single-participant or individualised studies, cohort or group studies, and population-level “big data” studies. These are not meant to exhaustively capture the current state of the literature, but rather serve as an overview of common designs.
The focus here is on psychiatric neuroimaging, operationalised as studies examining neural markers of treatment, longitudinal course, or correlates of disorders/symptoms. Where relevant, studies of normative brain variability and development are also highlighted as they provide context for clinical presentations and broader consideration of brain-behaviour methods. Further, this work emphasises study design and corresponding methodological and inferential trade-offs, though it necessarily discusses related issues of measurement reliability, validity, and prediction. We discuss these in context and refer the reader to existing literature [18,19,20,21,22,23,24,25].
Single-participant and individualised studies
Single-participant and individualised designs analyse a single participant or a series of individuals over repeated sessions, and often multiple contexts [21, 22]. In this way, they prototypically aim to maximise the time point dimension (z-axis) of the theoretical data cube (Fig. 1—left). This design is akin to a case study in psychiatry or medicine that describes clinical phenomena and their longitudinal course. Key to individualised designs is that the unit of analysis is a single participant that often “serves as their own control” across time, contexts or experimental manipulations. This importantly differs from longitudinal cohorts (see below) where inferences are made at the group level or “on average”.
The major advantages of this design are a potentially high degree of experimental control and flexibility, as measurements and manipulations can be precisely tailored to the individual or phenomena under study. This makes individualised studies potentially amenable both to hard-to-recruit patient populations and to more complex study protocols. To this end, the emphasis on intensive longitudinal assessments aligns this design well with deep behavioural phenotyping or precision functional neuroimaging approaches that collect large amounts of data within an individual to maximise measurement reliability (i.e., the relative stability or consistency of assessments) [22,23,24] (see also [26]). Together, these strengths make individualised studies particularly well suited for investigating within-person mechanistic insights by establishing interventional and longitudinal links between neuroimaging metrics and symptom course. An investigator interested in brain changes relevant to depression treatment, for example, might use an individualised design with precision neuroimaging before, during, and after treatment using transcranial magnetic stimulation (TMS) to investigate person-specific functional connectivity changes. This is in contrast to a cohort study, where inferences might be drawn about average effects among a group of patients receiving TMS (note that group effects do not necessarily translate to individual-level findings; see [8] and the sections below for details), or a population study that might examine variation among typical treatments in standard care (i.e., not determined by the study protocol).
While individualised studies offer the upside of drawing longitudinal inferences at the level of an individual participant, methodological challenges such as low statistical power for inferential statistics and difficulties with interpreting single-subject effect sizes [21] often mean that generalisation to a broader population needs to be considered. That is, an emphasis on the time point dimension often comes at the cost of the larger sample size required to determine whether the results would vary among other patients with the same presentation or as a function of salient sociodemographic or psychiatric factors (Fig. 2). Not all individualised studies will aim for such inferences at the outset (e.g., functional mapping of a specific patient for surgery), but subsequent clinical translation will require demonstrations of generalisability for the developed approach (e.g., person-specific TMS targeting; forecasting model of symptom severity) or mechanism (e.g., longitudinal covariation between brain function and symptom course).
Recent empirical examples illustrating the utility of individualised designs in neuroimaging include studies that followed a single individual weekly for 18 months to demonstrate changes in connectivity [25, 27] or daily to study functional changes throughout the menstrual cycle [28]. Most relevant to psychiatry, individualised precision neuroimaging studies have shown that brain network organisation varies significantly between individuals [22] and suggest plasticity even after large lesions early in life [29]. These reports also illustrate a more general claim that holds irrespective of a particular study design: large amounts of functional neuroimaging data are needed to reliably estimate individual brain topography that is otherwise obscured by averaging across individuals [8, 22]. Ongoing work using multi-echo acquisition suggests opportunities to shorten the requisite time for delineating patient-specific functional neuroanatomy, although it still utilises acquisitions longer than what is typically acquired [30].
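One way to see why long within-person acquisitions pay off is the classical Spearman–Brown projection of reliability as a function of the amount of data collected. The sketch below uses an assumed baseline reliability for a short scan (a hypothetical value, not an estimate from the cited studies) purely to illustrate the general pattern.

```python
# Hedged illustration using the Spearman-Brown prophecy formula: projected
# reliability of a within-person estimate as scan time is lengthened.
# The baseline reliability of 0.40 for a hypothetical 5-minute scan is an
# assumed value chosen for illustration only.

def spearman_brown(rel_single: float, k: float) -> float:
    """Projected reliability when the amount of data is multiplied by k."""
    return (k * rel_single) / (1 + (k - 1) * rel_single)

rel_5min = 0.40
for minutes in (5, 15, 30, 60, 120):
    k = minutes / 5
    print(f"{minutes:3d} min -> projected reliability {spearman_brown(rel_5min, k):.2f}")
# 5 min -> 0.40, 30 min -> 0.80, 120 min -> 0.94: individual-level estimates
# stabilise sharply with longer acquisitions, with diminishing returns.
```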
When precision individualised designs are paired with an experimental manipulation, as opposed to an observational design, they can offer within-person mechanistic insights. For example, a recent study [31] examined functional neuroimaging alterations during and after immobilisation of the upper extremity in three participants. These findings suggest that spontaneous activity pulses that emerged during arm immobilisation may contribute to maintaining connectivity within disused motor regions. Within psychiatry, recent investigations using densely sampled participants have demonstrated the impact of inter-subject variability in functional organisation on the effectiveness of TMS, illustrating that coil placement guided by group-average maps will stimulate different functional networks across subjects [32].
Cohort and group studies
Cohort studies (also referred to as group designs) are arguably the predominant psychiatric neuroimaging design (Fig. 1, middle). In contrast to individualised designs, cohort studies make inferences about averages within a well-defined group (e.g., participants with a specific mental health presentation). Cohort studies therefore necessarily have a broader “sampling frame” (i.e., the population drawn from to create the study sample: e.g., patients treated at a local clinic) and “target population” (i.e., a larger group to which inferences derived from the study ought to generalise: e.g., young adults with depression) [33, 34] than individualised studies. Typical sample sizes for this design range from tens to hundreds of participants (cf., openneuro.org), which is partly related to the considerable variety of potential cohort designs, including longitudinal cohorts of a single diagnostic group, case-control studies, and those comparing treatments within and/or between individuals [35]. For the current overview and guide to common trade-offs across psychiatric neuroimaging designs, we emphasise a prototypical case that can be viewed as moderately large and moderately longitudinal relative to other designs (individualised and population studies; see Fig. 1).
Many traditional statistical approaches were developed with group comparisons in mind and afford greater power in a cohort study than in an individualised study when matched for longitudinal assessments and precision [21]. However, owing to resource constraints, cohort studies and more standard neuroimaging designs have not historically achieved the same measurement precision and longitudinal time points [36]. Finally, while cohort studies offer higher generalisability than single-participant studies, they lack the very large sample sizes and real-world sociodemographic or psychiatric complexity targeted in large population studies (see below).
A major advantage of cohort studies for psychiatric neuroimaging lies in a potential balance of relatively high within-person precision (e.g., measurement reliability, internal validity, and potential for experimental control) while retaining some degree of between-person generalisability beyond a single-participant design (Fig. 2). Cohort studies can thus be well suited to leverage this balance to characterise varied clinically relevant processes with careful longitudinal assessments, sample selection, and experimental conditions. Like individualised designs, cohort studies can investigate within-person mechanisms by means of experimental manipulations (see [37,38,39,40] for detailed discussion). Longitudinal and/or intervention-based designs (e.g., lesion mapping, TMS, pharmacological interventions, or psychosocial treatments) are particularly well suited for this, as they can determine temporal precedence and track common clinical or biological change. Following the example above, an investigator interested in neural circuits of depression treatment might employ a cohort design with longitudinal neuroimaging to compare treatment and control groups before and after a TMS intervention. Critically, such a study should aim to balance within-person precision and overall sample size.
Cohort designs have been widely used, with varying success, to investigate circuit-level “mechanisms of action” underlying psychiatric treatments such as TMS [41], deep brain stimulation (DBS; [42, 43]) and pharmacological intervention [44]. Cohort designs have likewise been invaluable for characterising average changes in normative brain development during the lifespan period of adolescence when psychiatric disorders first emerge [45,46,47,48,49]. For example, structural neuroimaging in longitudinal developmental cohorts has demonstrated and replicated changes in multiple properties (grey and white matter volume, cortical thickness) by leveraging key design strengths in longitudinal data: relatively high reliability of structural neuroimaging measures (compared to other neuroimaging measures [50]) and large magnitude changes (see [51]). However, we note that to concurrently estimate both average developmental change and variability in development, as in normative modelling approaches, very large population samples are required [52,53,54]. Cohort studies can also be well suited to examine very focused questions among patient groups. For example, owing to the relatively low prevalence rate of many psychiatric disorders within a general population [55], a targeted cohort study, if appropriately powered, may be better equipped to assess heterogeneity within a given diagnostic group, or trajectories over time compared to population-based samples. Especially when paired with high precision, this can allow for individualised insights into underlying neural patterns [56]. Taken together, strengths of existing psychiatric neuroimaging cohort studies most often arise from leveraging longitudinal data with high-precision measurements to track common clinically relevant change (e.g., treatment) or biological change (e.g., development, ageing) [9, 10]. We note, however, that collecting such precision imaging and phenotypic data in clinical cohorts is currently rare [57].
Cohort studies have been considerably less successful at providing reproducible and robust associations between individual differences in brain function or structure and mental health (e.g., associations between functional connectivity and depression symptom severity), or individual differences more generally. This is likely driven by the underlying effect sizes of brain-behaviour associations across the population, which have been shown to be small relative to traditional effect size thresholds [3, 58, 59]. Moreover, given the inherent complexity of both human behaviour and common non-invasive neuroimaging techniques (e.g., MRI/fMRI), such small effects are not unreasonable. As a result, these challenges in cohort studies may be viewed as a misalignment between the study design and the question of multifaceted individual differences, which likely requires high sample complexity and generalisability (see below). Even if clearly labelled as exploratory, correlation or prediction analyses in traditional single-site cohort studies with tens of participants are likely to be underpowered and will consequently produce false-positive or inflated findings by chance. Cross-sectional individual difference analyses should, therefore, nearly always be avoided in such studies and instead be conducted in a larger dataset using appropriate methods [60]. Nevertheless, the appropriateness of individual difference analyses within cohort studies will depend on the range of inter-individual variability in the behavioural and brain metrics captured in the cohort. Therefore, investigators must consider the study sampling frame and whether results are expected to generalise beyond the specific sample or clinical group to the broader population when choosing a study design and interpreting brain-behaviour relationships.
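A brief simulation can make this sampling-variability point concrete. The sketch below assumes a small true population correlation (r = 0.10, an illustrative value in line with the small brain-behaviour effects discussed above, not an estimate from any specific study) and shows how widely observed correlations scatter at cohort-scale versus population-scale sample sizes.

```python
# Illustrative simulation: stability of an observed brain-behaviour correlation
# when the true population effect is small. The true effect size and the sample
# sizes below are assumptions chosen for illustration.
import numpy as np

rng = np.random.default_rng(0)
TRUE_R = 0.10
N_STUDIES = 1000  # number of simulated studies per sample size

def simulated_correlations(n: int) -> np.ndarray:
    """Observed correlations across repeated simulated studies of size n."""
    cov = [[1.0, TRUE_R], [TRUE_R, 1.0]]
    estimates = np.empty(N_STUDIES)
    for i in range(N_STUDIES):
        x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
        estimates[i] = np.corrcoef(x, y)[0, 1]
    return estimates

for n in (25, 200, 2000):
    r_hat = simulated_correlations(n)
    lo, hi = np.percentile(r_hat, [5, 95])
    print(f"n={n:5d}  median r={np.median(r_hat):+.2f}  5th-95th pct=[{lo:+.2f}, {hi:+.2f}]")
# At n=25 the observed correlation ranges roughly from -0.2 to +0.4, so chance
# alone can produce inflated or sign-flipped "findings"; at n=2000 estimates
# cluster tightly around the small true effect.
```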
Population studies
Described as the intersection of neuroscience and epidemiology, population (or population-based, sometimes referred to as “big data”) neuroimaging studies have the broadest sampling frame among designs, including a wide range of individuals with various psychiatric presentations (and/or risk factors) and sociodemographic backgrounds [16, 17]. Ideally, population studies accurately represent a complex target population (e.g., nationally representative of adolescents in the United States) through inclusive and well-planned sampling (cf., [34, 61]). Population studies thus emphasise sample size (typically thousands of participants or more) and measurement breadth (the x- and y-axes of the heuristic data cube; Fig. 1, right) and can have higher sample diversity and generalisability (see also [62]). Owing to the recurring theme of balancing scientific goals with resource constraints, however, this often comes at the expense of the number of longitudinal time points and measurement precision relative to other designs (Fig. 2). Likewise, while population studies can, in theory, be interventional (e.g., a phase 3 clinical trial in medicine), in neuroimaging they have most often been observational.
Large and relatively more complex samples make population studies particularly well-suited to study multifaceted psychiatric, sociodemographic, and neuroimaging inter-individual differences. In contrast, the typical use of observational designs with few longitudinal assessments renders these studies less suitable for studying within-person mechanisms of longitudinal symptom course or treatment. As a result, population designs in psychiatric neuroimaging often emphasise observational, and frequently cross-sectional, correlations with psychiatric phenotypes or behavioural traits. An investigator might use a population sample to develop a neuroimaging-based diagnostic biomarker for depression, leveraging the larger and more inclusive (relative to other designs) sample to accurately integrate and validate multiple behavioural and neuroimaging risk factors.
The use of population samples towards the potential development of diagnostic biomarkers and cross-sectional prediction of behavioural traits has rapidly increased with the relatively recent arrival of multiple large-scale open neuroimaging datasets (for an overview see: [63, 64]). While most of these efforts are new, and several studies have shown critical challenges with leveraging population neuroimaging in psychiatry, some approaches have shown early promise. For instance, results from an international biomarker challenge indicate that patients with Autism Spectrum Disorder can be reliably classified using structural and functional imaging with accuracy many-fold higher than with genotyping [65]. Outside of diagnostic biomarker discovery, prediction approaches can be utilised to uncover robust multivariate associations between neuroimaging and behaviour [18, 66, 67]. Data aggregation efforts integrating population samples, as well as independent investigator-initiated studies, have also significantly advanced neural models of multiple psychiatric diagnoses (e.g., [68, 69]). The relatively broad and more inclusive sampling in population studies further aligns well with a focus on continuous transdiagnostic dimensions that span normative to clinical ranges [70, 71]. Perhaps most notably, large population samples have provided unparalleled resources for methods development and opportunities for replication and reproducibility studies (e.g., [3, 18, 51, 66, 72,73,74,75,76,77,78,79]).
Despite the strengths of population studies, they are, of course, not without limitations. Favouring measurement breadth and sample size for several phenomena (e.g., all mental health disorders) often leads to a relative loss of within-participant measurement precision and depth for specific phenomena (e.g., carefully curated convergent indicators of depression). This raises concerns for measurement reliability and validity, as well as for efforts to directly apply inferences from population studies to more targeted, individualised or cohort designs. For example, observable population-based effects (e.g., from studies sampling individuals without consideration for a given psychiatric trait) may be smaller than those found with pre-defined “extreme group” comparisons where participants are selected based on pre-screening [80]. Thus, effects from standard population studies, with characteristically broad sampling frames (i.e., independent of a specific psychiatric symptom) that span normative to clinical ranges, may not directly generalise to patient cohort studies. However, this raises questions of inclusivity for the broader population outside of extreme groups, and whether relatively increased effects in extreme groups should be seen as an initial building block for evidence accumulation [80, 81] or as a potential artefact of dichotomising dimensions (see [82, 83]). A further challenge for population studies is that confound control at the level of study design, which is typical in cohort studies (e.g., by matching control groups), is not feasible in large observational studies, which instead require careful post hoc confound control [84]. Similarly, heterogeneity within diagnostic categories [85] likely becomes more evident in large datasets with population sampling, as participant selection is intentionally broad [16, 17]. Finally, a focus on individual differences in population studies may magnify more general challenges to symptom measurement in psychiatry (cf., [86, 87]), as current population studies generally incorporate existing scales that often lack validation for such population sampling. Additionally, many cognitive and behavioural measures have been optimised for minimal individual differences and maximal group or context (e.g., task contrast) differences, which compromises their reliability for individual difference research (e.g., [88]). As a consequence, such assessments are not ideal for assessing inter-individual differences in brain function and behaviour, owing to attenuation by low reliability, even with large population samples [89, 90].
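The attenuation argument can be summarised with the classical correction for attenuation; the reliabilities plugged in below are assumed values for illustration only:

$$ r_{\text{observed}} \;=\; r_{\text{true}} \sqrt{\rho_{xx}\,\rho_{yy}}, \qquad \text{e.g.,}\quad 0.30 \times \sqrt{0.6 \times 0.5} \approx 0.16, $$

where $\rho_{xx}$ and $\rho_{yy}$ denote the reliabilities of the brain and behavioural measures. Under these hypothetical values, even a moderate true association is roughly halved in the observed data, and no increase in sample size can recover the attenuated portion; only improved measurement can.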
A reciprocal validation model for building evidence across complementary designs
Even when carefully considering the relative strengths and weaknesses of individualised, cohort, and population study designs, clarity surrounding the decision to prioritise one over another can be elusive. Furthermore, comparing inferences across designs can be challenging, as group or population-level associations do not necessarily translate to individual-level mechanisms, and changes observed at the individual level may not correspond to (or may even go in the opposite direction from) those at the level of the whole sample (cf. ecological fallacy [91]). As a field, psychiatric neuroimaging currently lacks codified stages of evidence accumulation that shape such design choices and inferential goals in other resource-demanding areas. Clinical trials in broader medicine, as well as biomarker development in cancer research, for example, follow formal stages (cf., nih.gov; [92]) in which new study designs, analogous to the psychiatric neuroimaging designs reviewed here, are sequentially required to accumulate complementary evidence following prior success. Conversely, in less resource-demanding research areas (e.g., the behavioural sciences), hybrid study designs that emphasise multiple axes of the data cube, and multiple simultaneous inferences (including those requiring multiple designs), are more feasible. In this final section, we propose a reciprocal validation model (RVM) for how common psychiatric neuroimaging designs may be sequenced to accumulate evidence, leverage relative strengths, and build towards improved long-term clinical utility (Fig. 3).
An RVM sequence can start with observations from individualised and cohort studies that often prioritise intensive longitudinal data from relatively small samples. While there are key distinctions among these designs (see preceding sections), both can provide within-person mechanistic insights that are essential for evidence building. For example, individualised and cohort designs that collect large amounts of neuroimaging data and/or many time points may identify novel longitudinal brain-behaviour associations (e.g., neural changes in brain region “A” track depression treatment). However, the extent to which the observed results generalise to larger, more complex samples may remain unclear. Therefore, subsequently testing aspects of this observation (e.g., the link between brain region “A” and depression symptoms) in a new or existing large population study (in parallel and/or in addition to replication with a similar design) can clarify generalisability across key participant-level factors (e.g., attenuation of the association by parental income). This “test of generalisation” might then mirror stages of evidence accumulation in related fields, such as clinical trial development, where stage 3 identifies an effect in a well-controlled cohort study of hundreds, while stage 4 tests the effect in a large population sample of thousands. Recent work in broader neuroimaging has demonstrated the utility of this approach, with observations from targeted individualised and cohort studies being tested for replication and generalisation in large population datasets [93]. Similarly, existing big datasets may be utilised as a bridge between group and population designs, as they can allow for subgroup analyses of a specific psychiatric condition as well as population-level individual differences.
An RVM sequence can also start with population studies, which often prioritise large sample sizes at the cost of (relative to other designs) minimally longitudinal designs. For example, population samples may identify a neuroimaging feature or (more likely) a set of features that relate to individual differences in psychiatric symptoms in a cross-sectional sample (e.g., whole-brain multi-network connectivity with depression). Emphasising the large number of participants, these observations may have relatively high generalisability across participant-level features at the cost of within-person precision (measurement reliability and potential experimental control) and clearer evidence towards causality (temporal precedence and modifiability). Subsequently testing findings from population studies in intensive longitudinal, and potentially interventional, individualised and cohort designs offers complementary inference towards temporal precedence and modifiability of the neural correlate. For example, an investigator may examine whether a neuroimaging spatial pattern predictive of depression symptoms, derived from a population study, changes in the context of depression treatment in a targeted cohort. Following a recursive sequence, this process could continue with further reciprocal testing between designs.
Recent concerns for rigour and reproducibility in psychiatric neuroimaging underscore a goal to move from relatively early stages of evidence accumulation, “exploration”, to more advanced stages of “validation” (Fig. 3) or “confirmation” (see also [94, 95] for discussion). Use of the same general design with a new sample or new measures represents a clear example of such validation (“replication”; cf., [96] for a longer discussion of this terminology), as does assessing associations out-of-sample and, even more importantly, out-of-dataset in clinical prediction studies. Importantly, evidence accumulation in psychiatric neuroimaging will critically benefit from reciprocal validation and leveraging complementary designs in a recursive sequence. While reciprocal validation may not be feasible or warranted for all results (e.g., those with marginal significance or low replicability across studies within a given design [see Fig. 3]), an established framework via the RVM can provide clarity on the relative degree of evidence accumulation to date.
Taken together, complementary designs with reciprocal validation can guide evidence accumulation towards long-term clinical utility. Resource constraints and varying inferential goals prevent a single study design or research team from simultaneously emphasising all axes of the data cube and advancing neuroimaging clinical utility in all manners of intervention, individualised tracking of longitudinal symptom course, or population-level clinical prediction. Nevertheless, appropriate alignment of such translational goals to specific study designs, each with inherent methodological trade-offs, and subsequent reciprocal validation with alternate approaches can provide a clearer path forward. While the long-term clinical utility of psychiatric neuroimaging remains unknown, a formalised evidence accumulation framework, like the proposed reciprocal validation model, is essential to organise these efforts and quantify progress.
Conclusion
Psychiatric neuroimaging faces challenges in rigour and reproducibility that prompt reconsideration of the relative strengths and limitations of current designs. As reviewed through a heuristic data cube, a balance of scientific goals and resource constraints leads common psychiatric neuroimaging designs to differentially emphasise sample size, the number of measures or constructs, and the number of time points assessed. Investigators must be familiar with such trade-offs to ensure an appropriate alignment between research questions, designs and analyses. We emphasise a resulting global trade-off among common designs in within-person precision (relatively high in individualised, moderate in cohort studies, low in population studies) and between-person generalisability (relatively high in population studies, moderate in cohort studies, low in individualised studies). A proposed reciprocal validation model (RVM) aims to recursively leverage complementary designs in sequence to accumulate evidence, optimise relative strengths, and build towards improved long-term clinical utility.
References
Noble S, Spann MN, Tokoglu F, Shen X, Constable RT, Scheinost D. Influences on the test–retest reliability of functional connectivity mri and its relationship with behavioral utility. Cereb Cortex. 2017;27:5415–29.
Milham MP, Vogelstein J, Xu T. Removing the reliability bottleneck in functional magnetic resonance imaging research to achieve clinical utility. JAMA Psychiatry. 2021;78:587–8.
Marek S, Tervo-Clemmens B, Calabro FJ, Montez DF, Kay BP, Hatoum AS, et al. Reproducible brain-wide association studies require thousands of individuals. Nature 2022;603:654–60.
Nour MM, Liu Y, Dolan RJ. Functional neuroimaging in psychiatry and the case for failing better. Neuron 2022;110:2524–44.
Karvelis P, Paulus MP, Diaconescu AO. Individual differences in computational psychiatry: a review of current challenges. Neurosci Biobehav Rev. 2023;148:105137.
Botvinik-Nezer R, Holzmeister F, Camerer CF, Dreber A, Huber J, Johannesson M, et al. Variability in the analysis of a single neuroimaging dataset by many teams. Nature 2020;582:84–88.
Woo C-W, Chang LJ, Lindquist MA, Wager TD. Building better biomarkers: brain models in translational neuroimaging. Nat Neurosci. 2017;20:365–77.
Kraus B, Zinbarg R, Braga RM, Nusslock R, Mittal VA, Gratton C. Insights from personalized models of brain and behavior for identifying biomarkers in psychiatry. Neurosci Biobehav Rev. 2023;152:105259.
Gratton C, Nelson SM, Gordon EM. Brain-behavior correlations: two paths toward reliability. Neuron 2022;110:1446–9.
Tervo-Clemmens B, Marek S, Barch DM. Tailoring psychiatric neuroimaging to translational goals. JAMA Psychiatry. 2023;80:765–6.
Laumann TO, Zorumski CF, Dosenbach NU. Precision neuroimaging for localization-related psychiatry. JAMA Psychiatry. 2023;80:763–4.
Ooi LQR, Orban C, Nichols TE, Zhang S, Tan TWK, Kong R, et al. MRI economics: balancing sample size and scan duration in brain wide association studies. 2024:2024.02.16.580448.
March JS, Silva SG, Compton S, Shapiro M, Califf R, Krishnan R. The case for practical clinical trials in psychiatry. AJP 2005;162:836–46.
Revelle W. Personality structure and measurement: the contributions of Raymond Cattell. Br J Psychol. 2009;100:253–7.
De Ribaupierre A, Lecerf T. On the importance of intraindividual variability in cognitive development. J Intell. 2018;6:17.
Tiemeier H, Muetzel R. Population Neuroscience. In: Taylor E, Verhulst FC, Wong J, Yoshida K, Nikapota A, editors. Mental Health and Illness of Children and Adolescents, Singapore: Springer; 2020. p. 1–22.
Paus T. Population neuroscience: why and how. Hum Brain Mapp. 2010;31:891–903.
Tervo-Clemmens B, Marek S, Chauvin RJ, Van AN, Kay BP, Laumann TO, et al. Reply to: Multivariate BWAS can be replicable with moderate sample sizes. Nature 2023;615:E8–E12.
Rosenberg MD, Finn ES. How to establish robust brain–behavior relationships without thousands of individuals. Nat Neurosci. 2022;25:835–7.
Ricard JA, Parker TC, Dhamala E, Kwasa J, Allsop A, Holmes AJ. Confronting racially exclusionary practices in the acquisition and analyses of neuroimaging data. Nat Neurosci. 2023;26:4–11.
Smith JD. Single-case experimental designs: a systematic review of published research and current standards. Psychol Methods. 2012;17:510.
Gordon EM, Laumann TO, Gilmore AW, Newbold DJ, Greene DJ, Berg JJ, et al. Precision functional mapping of individual human brains. Neuron 2017;95:791–807.
Gratton C, Laumann TO, Nielsen AN, Greene DJ, Gordon EM, Gilmore AW, et al. Functional brain networks are dominated by stable group and individual factors, not cognitive or daily variation. Neuron 2018;98:439–52.
Schmiedek F, Lövdén M, Lindenberger U. Hundred days of cognitive training enhance broad cognitive abilities in adulthood: findings from the COGITO study. Front Aging Neurosci. 2010;2:27.
Poldrack RA, Laumann TO, Koyejo O, Gregory B, Hover A, Chen M-Y, et al. Long-term neural and physiological phenotyping of a single human. Nat Commun. 2015;6:8885.
Demeter DV, Greene DJ. The promise of precision functional mapping for neuroimaging in psychiatry. Neuropsychopharmacol. 2024. https://doi.org/10.1038/s41386-024-01941-z.
Laumann TO, Gordon EM, Adeyemo B, Snyder AZ, Joo SJ, Chen M-Y, et al. Functional system and areal organization of a highly sampled individual human brain. Neuron 2015;87:657–70.
Pritschet L, Santander T, Taylor CM, Layher E, Yu S, Miller MB, et al. Functional reorganization of brain networks across the human menstrual cycle. NeuroImage 2020;220:117091.
Laumann TO, Ortega M, Hoyt CR, Seider NA, Siegel JS, Nguyen AL, et al. Brain network reorganisation in an adolescent after bilateral perinatal strokes. Lancet Neurol. 2021;20:255–6.
Lynch CJ, Power JD, Scult MA, Dubin M, Gunning FM, Liston C. Rapid precision functional mapping of individuals using multi-echo fMRI. Cell Reports. 2020;33.
Newbold DJ, Laumann TO, Hoyt CR, Hampton JM, Montez DF, Raut RV, et al. Plasticity and spontaneous activity pulses in disused human brain circuits. Neuron 2020;107:580–9.
Lynch CJ, Elbau IG, Ng TH, Wolk D, Zhu S, Ayaz A, et al. Automated optimization of TMS coil placement for personalized functional network engagement. Neuron 2022;110:3263–77.
Krause M, Lutz W, Boehnke JR. The role of sampling in clinical trial design. Psychother Res. 2011;21:243–51.
Tyrer S, Heyman B. Sampling in epidemiological research: issues, hazards and pitfalls. BJPsych Bull. 2016;40:57–60.
Samet JM, Muñoz A. Evolution of the cohort study. Epidemiol Rev. 1998;20:1–14.
Noble S, Scheinost D, Constable RT. A decade of test-retest reliability of functional connectivity: a systematic review and meta-analysis. Neuroimage 2019;203:116157.
Marinescu IE, Lawlor PN, Kording KP. Quasi-experimental causality in neuroscience and behavioural research. Nat Hum Behav 2018;2:891–8.
Vaidya AR, Pujara MS, Petrides M, Murray EA, Fellows LK. Lesion studies in contemporary neuroscience. Trends Cogn Sci. 2019;23:653–71.
Siddiqi SH, Kording KP, Parvizi J, Fox MD. Causal mapping of human brain function. Nat Rev Neurosci. 2022;23:361–75.
Ross LN, Bassett DS. Causation in neuroscience: keeping mechanism meaningful. Nat Rev Neurosci. 2024;25:81–90.
Philip NS, Barredo J, Aiken E, Carpenter LL. Neuroimaging mechanisms of therapeutic transcranial magnetic stimulation for major depressive disorder. Biol Psychiatry Cogn Neurosci Neuroimaging. 2018;3:211–22.
Ashkan K, Rogers P, Bergman H, Ughratdar I. Insights into the mechanisms of deep brain stimulation. Nat Rev Neurol. 2017;13:548–54.
Hollunder B, Ostrem JL, Sahin IA, Rajamani N, Oxenford S, Butenko K, et al. Mapping dysfunctional circuits in the frontal cortex using deep brain stimulation. Nat Neurosci. 2024;27:573–86.
Wall MB, Harding R, Zafar R, Rabiner EA, Nutt DJ, Erritzoe D. Neuroimaging in psychedelic drug development: past, present, and future. Mol Psychiatry. 2023;28:3573–80.
Shulman EP, Smith AR, Silva K, Icenogle G, Duell N, Chein J, et al. The dual systems model: review, reappraisal, and reaffirmation. Developmental Cogn Neurosci. 2016;17:103–17.
Luna B, Wright C. Adolescent brain development: Implications for the juvenile criminal justice system. 2016.
Casey BJ, Getz S, Galvan A. The adolescent brain. Dev Rev. 2008;28:62–77.
Steinberg L. A dual systems model of adolescent risk-taking. Dev Psychobiol. 2010;52:216–24.
Tervo-Clemmens B, Quach A, Calabro FJ, Foran W, Luna B. Meta-analysis and review of functional neuroimaging differences underlying adolescent vulnerability to substance use. NeuroImage 2020;209:116476.
Hedges EP, Dimitrov M, Zahid U, Vega BB, Si S, Dickson H, et al. Reliability of structural MRI measurements: the effects of scan session, head tilt, inter-scan interval, acquisition sequence, FreeSurfer version and processing stream. Neuroimage 2022;246:118751.
Bethlehem RA, Seidlitz J, White SR, Vogel JW, Anderson KM, Adamson C, et al. Brain charts for the human lifespan. Nature 2022;604:525–33.
Rutherford S, Kia SM, Wolfers T, Fraza C, Zabihi M, Dinga R, et al. The normative modeling framework for computational psychiatry. Nat Protoc. 2022;17:1711–34.
Bučková BR, Fraza C, Rehák R, Kolenič M, Beckmann C, Španiel F, et al. Using normative models pre-trained on cross-sectional data to evaluate longitudinal changes in neuroimaging data. 2023:2023.06.09.544217.
Tervo-Clemmens B, Calabro FJ, Parr AC, Fedor J, Foran W, Luna B. A canonical trajectory of executive function maturation from adolescence to adulthood. Nat Commun. 2023;14:1–17.
Steel Z, Marnane C, Iranpour C, Chey T, Jackson JW, Patel V, et al. The global prevalence of common mental disorders: a systematic review and meta-analysis 1980-2013. Int J Epidemiol. 2014;43:476–93.
Seitzman BA, Gratton C, Laumann TO, Gordon EM, Adeyemo B, Dworetsky A, et al. Trait-like variants in human functional brain networks. Proc Natl Acad Sci USA. 2019;116:22851–61.
Lynch CJ Jr, Elbau I, Ng T, Ayaz A, Zhu S, Manfredi N, et al. Expansion of a frontostriatal salience network in individuals with depression. bioRxiv. 2023:2023–08.
Owens MM, Potter A, Hyatt CS, Albaugh M, Thompson WK, Jernigan T, et al. Recalibrating expectations about effect size: A multi-method survey of effect sizes in the ABCD study. PloS One. 2021;16:e0257535.
Liu S, Abdellaoui A, Verweij KJ, van Wingen GA. Replicable brain–phenotype associations require large-scale neuroimaging data. Nat Hum Behav. 2023;7:1344–56.
Varoquaux G, Poldrack RA. Predictive models avoid excessive reductionism in cognitive neuroimaging. Curr Opin Neurobiol. 2019;55:1–6.
Heeringa SG, Berglund PA. A guide for population-based analysis of the Adolescent Brain Cognitive Development (ABCD) Study baseline data. bioRxiv. 2020.
Marek S, Laumann TO. Replicability and generalizability in population psychiatric neuroimaging. Neuropsychopharmacol. 2024. https://doi.org/10.1038/s41386-024-01960-w.
Laird AR. Large, open datasets for human connectomics research: considerations for reproducible and responsible data use. Neuroimage 2021;244:118579.
Jahanshad N, Lenzini P, Bijsterbosch J. Current best practices and future opportunities for reproducible findings using large-scale neuroimaging in psychiatry. Neuropsychopharmacol. 2024. https://doi.org/10.1038/s41386-024-01938-8.
Traut N, Heuer K, Lemaître G, Beggiato A, Germanaud D, Elmaleh M, et al. Insights from an autism imaging biomarker challenge: Promises and threats to biomarker discovery. NeuroImage 2022;255:119171.
Spisak T, Bingel U, Wager TD. Multivariate BWAS can be replicable with moderate sample sizes. Nature 2023;615:E4–E7.
Eickhoff SB, Langner R. Neuroimaging-based prediction of mental traits: road to utopia or Orwell? PLoS Biol. 2019;17:e3000497.
Thompson PM, Stein JL, Medland SE, Hibar DP, Vasquez AA, Renteria ME, et al. The ENIGMA Consortium: large-scale collaborative analyses of neuroimaging and genetic data. Brain Imaging Behav. 2014;8:153–82.
Norman LJ, Sudre G, Price J, Shaw P. Subcortico-cortical dysconnectivity in ADHD: a voxel-wise mega-analysis across multiple cohorts. AJP. 2024:appi.ajp.20230026.
Cuthbert BN, Insel TR. Toward the future of psychiatric diagnosis: the seven pillars of RDoC. BMC Med. 2013;11:126.
Kotov R, Krueger RF, Watson D, Achenbach TM, Althoff RR, Bagby RM, et al. The hierarchical taxonomy of psychopathology (HiTOP): a dimensional alternative to traditional nosologies. J Abnorm Psychol. 2017;126:454–77.
Greene AS, Constable RT. Clinical promise of brain-phenotype modeling: a review. JAMA Psychiatry. 2023;80:848–54.
Dhamala E, Yeo BTT, Holmes AJ. One size does not fit all: methodological considerations for brain-based predictive modeling in psychiatry. Biol Psychiatry. 2022. https://doi.org/10.1016/j.biopsych.2022.09.024.
Easley T, Chen R, Hannon K, Dutt R, Bijsterbosch J. Population modeling with machine learning can enhance measures of mental health - Open-data replication. Neuroimage: Rep. 2023;3:100163.
Hermosillo RJ, Moore LA, Feczko E, Miranda-Domínguez Ó, Pines A, Dworetsky A, et al. A precision functional atlas of personalized network topography and probabilities. Nat Neurosci. 2024;27:1000–13.
Byington N, Grimsrud G, Mooney MA, Cordova M, Doyle O, Hermosillo RJ, et al. Polyneuro risk scores capture widely distributed connectivity patterns of cognition. Dev Cogn Neurosci. 2023;60:101231.
He T, An L, Chen P, Chen J, Feng J, Bzdok D, et al. Meta-matching as a simple framework to translate phenotypic predictive models from big to small data. Nat Neurosci. 2022;25:795–804.
Greene AS, Shen X, Noble S, Horien C, Hahn CA, Arora J, et al. Brain–phenotype models fail for individuals who defy sample stereotypes. Nature 2022;609:109–18.
Winter NR, Leenings R, Ernsting J, Sarink K, Fisch L, Emden D, et al. Quantifying deviations of brain structure and function in major depressive disorder across neuroimaging modalities. JAMA Psychiatry. 2022;79:879–88.
Kang K, Seidlitz J, Bethlehem RA, Xiong J, Jones MT, Mehta K, et al. Study design features that improve effect sizes in cross-sectional and longitudinal brain-wide association studies. bioRxiv. 2023.
Amanat S, Requena T, Lopez-Escamez JA. A systematic review of extreme phenotype strategies to search for rare variants in genetic studies of complex disorders. Genes 2020;11:987.
Preacher KJ, Rucker DD, MacCallum RC, Nicewander WA. Use of the extreme groups approach: a critical reexamination and new recommendations. Psychol Methods. 2005;10:178.
Fisher JE, Guha A, Heller W, Miller GA. Extreme-groups designs in studies of dimensional phenomena: Advantages, caveats, and recommendations. J Abnorm Psychol. 2020;129:14.
Komeyer V, Eickhoff SB, Grefkes C, Patil KR, Raimondo F. A framework for confounder considerations in AI-driven precision medicine. 2024:2024.02.02.24302198.
Feczko E, Fair DA. Methods and challenges for assessing heterogeneity. Biol Psychiatry. 2020;88:9–17.
Flake JK, Fried EI. Measurement schmeasurement: questionable measurement practices and how to avoid them. Adv Methods Pr Psychological Sci. 2020;3:456–65.
Fried EI, Flake JK, Robinaugh DJ. Revisiting the theoretical and methodological foundations of depression measurement. Nat Rev Psychol 2022;1:358–68.
Hedge C, Powell G, Sumner P. The reliability paradox: Why robust cognitive tasks do not produce reliable individual differences. Behav Res. 2018;50:1166–86.
Gell M, Eickhoff SB, Omidvarnia A, Küppers V, Patil KR, Satterthwaite TD, et al. the burden of reliability: how measurement noise limits brain-behaviour predictions. 2024:2023.02.09.527898.
Nikolaidis A, Chen AA, He X, Shinohara R, Vogelstein J, Milham M, et al. Suboptimal phenotypic reliability impedes reproducible human neuroscience. 2022:2022.07.22.501193.
Piantadosi S, Byar DP, Green SB. The ecological fallacy. Am J Epidemiol. 1988;127:893–904.
Pepe MS, Etzioni R, Feng Z, Potter JD, Thompson ML, Thornquist M, et al. Phases of biomarker development for early detection of cancer. J Natl Cancer Inst. 2001;93:1054–61.
Gordon EM, Chauvin RJ, Van AN, Rajesh A, Nielsen A, Newbold DJ, et al. A somato-cognitive action network alternates with effector regions in motor cortex. Nature 2023;617:351–9.
Tukey JW. We need both exploratory and confirmatory. Am Statistician. 1980;34:23–25.
Fife DA, Rodgers JL. Understanding the exploratory/confirmatory data analysis continuum: moving beyond the “replication crisis”. Am Psychol. 2022;77:453.
Goodman SN, Fanelli D, Ioannidis JPA. What does research reproducibility mean? Sci Transl Med. 2016;8:341ps12.
Funding
National Institutes of Health: DA057486 (Tervo-Clemmens), MH129616 (Laumann), MH130894 (Noble). American Psychological Foundation Visionary Grant (Tervo-Clemmens). University of Minnesota Institute for Translational Neuroscience (Tervo-Clemmens). Taylor Family Institute for Innovative Psychiatric Research (Laumann). Deutsche Forschungsgemeinschaft (DFG, German Research Foundation): 269953372/GRK2150, EI816/11-1 (Gell).
Author information
Contributions
Conceptualisation: BT-C, MG. Writing – Original Draft: MG, BT-C, SN, TOL, SMN. Writing – Review & Editing: BT-C, MG, SN, TOL, SMN. Supervision: BT-C.
Ethics declarations
Competing interests
TOL holds a patent for taskless mapping of brain activity licensed to Sora Neurosciences and a patent for optimising targets for neuromodulation, implant localisation, and ablation is pending. TOL is a consultant for Turing Medical Inc. which commercialises Framewise Integrated Real-Time Motion Monitoring (FIRMM) software. These interests have been reviewed and managed by Washington University in St. Louis in accordance with its Conflict of Interest policies. SMN is a consultant for Turing Medical Inc. which commercialises Framewise Integrated Real-Time Motion Monitoring (FIRMM) software. This interest has been reviewed and managed by the University of Minnesota in accordance with its Conflict of Interest policies. The other authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Gell, M., Noble, S., Laumann, T.O. et al. Psychiatric neuroimaging designs for individualised, cohort, and population studies. Neuropsychopharmacol. (2024). https://doi.org/10.1038/s41386-024-01918-y
DOI: https://doi.org/10.1038/s41386-024-01918-y