The modern Diagnostic and Statistical Manual of Mental Disorders recognizes at least 256 unique symptom presentations that meet criteria for Major Depressive Disorder (MDD) [1]. For most of recorded history, scholars have recognized that depression is a disabling syndrome that comes in many forms [2,3,4]. Multiple factors contribute to diagnostic heterogeneity. Depression frequently presents with anxiety disorders, substance use disorders, ADHD, and other psychiatric co-morbidities, and exhibits strong sexual dimorphism in its prevalence and molecular correlates [5,6,7]. Individuals also present with markedly varying temporal dynamics in the frequency and duration of their depressive episodes and the stability of remission [8].

Lacking better alternatives, most studies to date have treated depression as a unitary disease entity. More recently, investigators have recognized that this approach may be an obstacle to progress in research. This is especially true for efforts to identify the neurobiological mechanisms that give rise to depression, which may not be the same for all individuals, and for efforts to optimize existing antidepressants and develop fundamentally new ones, since distinct pathophysiological mechanisms could warrant different types of treatment [9,10,11]. Indeed, diagnostic heterogeneity may be an important obstacle to achieving better outcomes in a clinical setting. On average, 11 years elapse between the onset of mental illness and the initiation of treatment [12, 13]. The majority of patients do not achieve full remission after an initial treatment, and a substantial portion are treatment-resistant [14,15,16]. Depression affected 17.7 million people in the U.S. in 2018 [17], and untreated depression has substantial personal and societal costs, including an estimated 1 trillion USD in lost productivity to the global economy annually [18]. Depression is a leading cause of disability worldwide [19], and is associated with a 40% higher lifetime risk of developing cardiovascular and metabolic diseases [20] and with worse outcomes for comorbid medical conditions [21,22,23,24]. In 2018, suicide was the second leading cause of death among 10- to 34-year-olds and the tenth leading cause of death overall in the United States [25]. Depression is a leading risk factor for suicide [26, 27], and over two-thirds of suicides in the United States occur in patients with depression [28].

Thus, there is a pressing need for studies aimed at understanding the neurobiological basis of heterogeneity in depression, delineating subtype-specific mechanisms, and developing new treatment strategies. Historically, most efforts to define depression subtypes have searched for clusters of symptoms that tend to co-occur in subgroups of patients and tested for neurophysiological correlates. More recently, investigators have attempted a converse approach, subgrouping or dimensionally stratifying patients on the basis of neuroimaging and other biological measures and validating the resulting models based on their ability to predict clinical symptoms, treatment outcomes, or other clinical variables. With the advent of extremely large genome-wide association studies in depression, geneticists have been able to identify multiple genetic variants, each conferring a small increase in depression risk (with many odds ratios <1.2) [29,30,31,32,33]. This highly polygenic inheritance pattern could be another important contributor to heterogeneity.

The goals of this work are to highlight some of the major causes of heterogeneity in depression; to consider how heterogeneity could impede progress in elucidating the neurobiological mechanisms of depression; to review progress on biologically informed subtyping approaches focused primarily on functional neuroimaging; and to discuss efforts from related fields to integrate neuroimaging analyses with genomic and transcriptomic data. We begin by reviewing clinical approaches to subtyping depression and consider how subtyping based on quantitative neuroimaging measures could complement these approaches and generate testable mechanistic hypotheses. We provide a brief overview of dimensional, categorical, and hybrid approaches to parsing diagnostic heterogeneity in depression based on functional neuroimaging data, and refer readers to a recent review from our group [34] and to others [35, 36] for more detailed accounts of this topic. Next, we review several approaches to integrating data from neuroimaging and genetics in order to advance our understanding of how genetic risk variants and molecular signaling pathways contribute to pathophysiology and heterogeneity in depression. Many of these methods have only recently been applied in efforts to parse depression heterogeneity and are fraught with potential technical obstacles. Therefore, we have attempted to highlight important technical challenges, along with emerging views on how to overcome them. We conclude by considering especially promising areas for future research, including the neurobiological basis of sex differences and episodic temporal dynamics and the potential for reverse translational approaches to test subtype-specific mechanisms generated from human neuroimaging studies.

Symptom-based subtypes of depression

Historical Perspective

For as long as scholars have been studying and writing about depression, they have recognized that it comes in different forms and may have many distinct causes. Hippocrates identified a “melancholic temperament” associated with symptoms of moroseness, despondency, and fear. It was thought to be caused by an excess of black bile, one of the four humors (blood, yellow bile, black bile, phlegm) that were thought to underlie most medical illnesses at the time [37]. Robert Burton’s The Anatomy of Melancholy, another seminal historical work, underscored distinct psychosocial causes of melancholy and depression [4]. In the 19th century, psychiatrists began to identify different forms of depression: Kahlbaum differentiated dysthymia from the fluctuating mood of cyclothymia, and Kraepelin differentiated depression from manic-depressive psychosis and dementia praecox [38, 39]. One of the first modern attempts to standardize depression nosology came with the introduction of “neurotic depression” in the DSM-II (1968), followed by the recognition in the DSM-III (1980) that episodic depression was not a personality disorder and was distinct from dysthymia [40, 41]. At the same time, motivated in part by an effort to devise diagnostic categories with greater reproducibility and inter-rater reliability, the DSM-III and -IV defined broad categories of depression, including the modern definition of MDD, which requires a patient to present with five or more of nine possible symptoms. This definition yields 256 unique symptom combinations (126 five-symptom combinations, 84 six-symptom combinations, 36 seven-symptom combinations, 9 eight-symptom combinations, and 1 nine-symptom combination) and comprises a very large and highly heterogeneous diagnostic category that encompasses >17 million people annually in the United States alone [17].
With increasing interest in psychopharmacological treatments, investigators in the 1980s began to systematically evaluate the causes of heterogeneity in depression and their neurobiological substrates.
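The figure of 256 follows from summing binomial coefficients, C(9, k) for k = 5 through 9. The short Python sketch below checks that arithmetic. As an aside, it also shows that additionally requiring at least one of the two cardinal symptoms (depressed mood or loss of interest/pleasure), as the DSM criteria do, lowers the count to 227; assigning indices 0 and 1 to the cardinal symptoms is an arbitrary labeling choice made only for this illustration.

```python
from itertools import combinations
from math import comb

N_SYMPTOMS = 9     # the nine MDD criterion symptoms
MIN_SYMPTOMS = 5   # at least five must be present

# Number of distinct symptom sets of each admissible size k = 5..9.
counts = {k: comb(N_SYMPTOMS, k) for k in range(MIN_SYMPTOMS, N_SYMPTOMS + 1)}
total = sum(counts.values())
print(counts)  # {5: 126, 6: 84, 7: 36, 8: 9, 9: 1}
print(total)   # 256

# Illustrative labeling: index 0 = depressed mood, index 1 = anhedonia.
# Keep only symptom sets that contain at least one of these two.
with_core = sum(
    1
    for k in range(MIN_SYMPTOMS, N_SYMPTOMS + 1)
    for combo in combinations(range(N_SYMPTOMS), k)
    if 0 in combo or 1 in combo
)
print(with_core)  # 227
```

The 29 excluded sets are those drawn entirely from the seven non-cardinal symptoms (21 + 7 + 1).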

Sources of heterogeneity

Investigators now recognize multiple sources of depression heterogeneity. First, as noted above, there is symptom heterogeneity: there are at least 256 unique symptom combinations that meet the criteria for MDD, and some of those criteria are themselves heterogeneous. For example, MDD patients could be sleeping too little or too much; losing weight or gaining weight; and suffering from psychomotor agitation or retardation—opposing symptoms that suggest distinct mechanisms.

Second, like most psychiatric disorders, depression commonly co-occurs with other conditions. About 75% of MDD patients have at least one comorbid neuropsychiatric illness, most commonly generalized anxiety disorder, post-traumatic stress disorder, various phobias, and substance use disorders, but also including obsessive compulsive disorder, attention-deficit hyperactivity disorder and personality disorders [6, 7, 16]. These high co-morbidity rates suggest that transdiagnostic mechanisms may contribute to the biology underlying heterogeneous symptom presentations not only in depression, but also in other closely related disorders. Importantly, depression and other psychiatric diagnoses are also unstable over time: in one recent study [42] involving 1037 participants tracked from birth to age 45, 85% of participants diagnosed with at least one psychiatric disorder also had multiple comorbidities, and there was substantial flux not only between closely related psychiatric conditions (e.g. MDD and generalized anxiety disorder), but also between seemingly disparate families of disorders (e.g. internalizing and externalizing disorders). Remarkably, among the 1037 participants, this report identified 692 unique psychiatric disorder lifetime trajectories, and 605 (87.4%) were unique to just one person. These findings underscore the need for human neuroimaging studies to recruit samples that include a range of related diagnoses and to follow subjects longitudinally. However, as discussed in more detail below, it is also critical to think carefully about subject recruitment strategies and to quantify current mood state, as the mechanisms that give rise to specific symptoms and behaviors and enable their persistence may fluctuate with mood state.

Third, patients differ widely in variables related to the longitudinal course of their illness, including the age of onset (adolescence vs. young adulthood vs. late-life), the frequency of episodes, and the temporal dynamics of episodic changes in depression and euthymia (dysthymia, ultradian cycling, episode duration, durability of remission, “kindling” and cycle acceleration, etc.)—all of which could have a neurobiological basis [8, 43,44,45,46,47]. For example, the age of onset is associated with depression severity and recurrence rate and may be linked to different biological processes [46, 48,49,50,51,52,53,54]. Fourth, sex is one of the most important risk factors for depression and may contribute to diagnostic heterogeneity [5, 6], but the mechanisms are not yet well defined. Fifth, the mechanisms underlying “spontaneous” depressive episodes may also differ from those that can be linked to exposure to a specific antecedent psychosocial stressor such as a history of adversity, trauma, or abuse. The brain’s response to psychosocial stress reflects the allostatic load (lifetime accumulation of the physiological costs of biological responses to stressors) and the long-term psychobiological effects of stressors [55,56,57]. It follows that the same stressor may elicit heterogeneous neurobiological effects in different individuals as a function of their lifetime history, including exposure to known risk factors such as early childhood neglect or abuse, financial insecurity, discrimination, income inequality, and other socioeconomic disparities [58,59,60,61,62]. Table 1 outlines the relative risks associated with some of the most important risk factors.

Table 1 Risk factors for major depressive disorder (MDD).

Subtyping on clinical symptom profiles

Pioneering early efforts to define depression subtypes and parse diagnostic heterogeneity tended to subgroup patients with distinct co-occurring symptom patterns and then test for neurobiological correlates of these subgroups, which could potentially be used as biomarkers for aiding diagnosis or predicting treatment outcomes [63, 64]. The most clinically significant example is of course the distinction between unipolar and bipolar depressive episodes, which are often clinically indistinguishable but are associated with distinct risk factors, clinical trajectories, and treatment response profiles [65]. Some of the most influential examples of unipolar subtypes include melancholic depression, atypical depression, seasonal depression, and psychotic depression. Melancholic depression is characterized by anhedonia, psychomotor disturbances, and persistent nonreactivity of mood, and has been associated with more severe depressive episodes and a history of childhood abuse [66, 67]. Atypical depression is defined by heightened mood reactivity and rejection sensitivity and the presence of other “atypical” symptoms, including hyperphagia, hypersomnia, and “leaden paralysis” [68]. Seasonal depression, a highly influential concept that has seeped into the public lexicon, is characterized by depressive episodes coinciding with different seasonal transitions, most commonly in winter. It is associated with specific disturbances in sleep, circadian rhythms, and melatonin cycling and may be especially responsive to phototherapy and melatonin derivatives [69,70,71]. Psychotic depression is a severe form of unipolar depression thought to affect ~5% of patients, characterized by hallucinations, delusions, guilty rumination, and feelings of worthlessness [72, 73] and associated with alterations in circadian glucocorticoid rhythms and a potentially increased responsiveness to glucocorticoid-targeting augmentation strategies [74, 75].
A recent meta-analysis supports the distinction of psychotic depression from nonpsychotic depression, showing that the rate of psychotic episodes was consistently and stably elevated in the former group across twelve studies between 1981 and 2015 [76].

These pioneering efforts to define subtypes based on clinical symptoms yielded multiple important insights into the neurobiological basis of heterogeneity in depression, and they have also raised several unexpected and important questions. First, some studies suggest that the specific clinical symptoms that a patient experiences in one episode tend to vary over time, and consequently, patients may transition between multiple subtypes over the course of a lifetime and possibly within a single episode [34, 77,78,79,80,81,82]. Some studies report that the melancholic, agitated, and atypical depression subtypes are less stable over time than psychotic and severity-based subtypes [76,77,78,79]. Second, some promising leads notwithstanding [69, 75], it has been challenging to identify neuroimaging, neuroendocrine, or other blood-based biomarkers that can be used for diagnosing these subtypes on an individual level.

Third, despite some promising findings for psychotic and seasonal depression [70, 74], it has been difficult overall to use these subtypes to reliably predict who will respond best to which antidepressants. For example, the iSPOT-D Trial found that there were no differences in treatment response between melancholic, atypical, anxious, and mixed subtypes and found significant overlap between the criteria used to classify individuals into different subtypes [83]. Thus, while these symptom-based subtypes have been the focus of important research and some have been described in the current and previous versions of the DSM, they are not widely used in general clinical practice for informing treatment decisions. This limitation applies equally to the initial results of brain-derived subtype discovery efforts described below: widespread clinical utilization will require robust evidence and prospective replication of the ability to reliably inform clinical decision making.

Of note, recent data-driven approaches to symptom-based clustering and treatment prediction have achieved some promising early results. In one such study involving Quick Inventory of Depressive Symptomatology and Hamilton Depression Rating Scale data from >4700 patients [84], the authors identified three symptom clusters associated with insomnia, core emotional symptoms, and atypical depressive symptoms (n.b., this study identified clusters of symptoms, not clusters of patients grouped by their symptoms). They replicated this clustering solution in a second sample and found that antidepressants were most effective for core emotional symptoms. Interestingly, high-dose duloxetine was superior to escitalopram in treating core emotional symptoms, with an effect size of 2.3 HAM-D points. In another large-scale study [85], the authors identified seven treatment response clusters comprising patients with differing trajectories of antidepressant response over time (e.g. fast and complete response vs. slow and incomplete response). These subgroups were predictable from baseline clinical symptoms, personality traits, mood episode duration, and life events. Data mining of medical records, which leverages breakthroughs in natural language processing and extremely large sample sizes, is another promising symptom-based approach to discovering novel subtypes of depression [86], predicting disease severity [87], and potentially informing treatment selection.
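To make the distinction between clustering symptoms and clustering patients concrete, the sketch below groups questionnaire items (the columns of a patients × items score matrix) rather than patients (its rows). It is a generic illustration, assuming average-linkage hierarchical clustering on a 1 − r correlation distance between items; the function name is hypothetical, and ref. [84] may have used a different procedure.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_items(scores, n_clusters):
    """Group questionnaire items (columns) that tend to co-vary across patients.

    scores: patients x items matrix of symptom ratings.
    Distance between two items is 1 - r, so strongly positively
    correlated items end up close together.
    """
    r = np.corrcoef(scores, rowvar=False)        # item-by-item correlations
    dist = np.clip(1.0 - r, 0.0, None)           # guard against tiny negative values
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)   # condensed form expected by linkage
    tree = linkage(condensed, method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```

Clustering patients instead would compute distances between rows of the same matrix, which answers a different question: which people resemble each other, rather than which symptoms travel together.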

Toward brain-based depression subtypes

Motivated in part by the challenges described above, recent studies have sought to parse heterogeneity in depression with a “brain-first” approach, developing novel dimensional rating and subtyping models based on neurobiological measures and then validating them by testing their ability to predict clinical symptoms, behaviors, and treatment outcomes. One way to achieve this aim, inspired by the NIMH’s Research Domain Criteria Initiative (RDoC), is to search for multivariate combinations of brain measures that explain individual differences in clinical symptoms across a continuous spectrum—a dimensional approach to parsing heterogeneity. A second option is to search for data-driven subgroups of patients that differ with respect to depression-related brain measures—a categorical approach. A third option is a hybrid approach that integrates dimensional modeling of individual differences with data-driven clustering on these dimensions. Here, we provide a brief review of dimensional, categorical, and hybrid approaches to parsing diagnostic heterogeneity (Fig. 1), focusing on functional neuroimaging data. For additional details on these three approaches, see a recent review from our group [34]. For comprehensive reviews of other methods, see [35, 36].

Fig. 1: Approaches to parsing heterogeneity in depression.

Schematic illustrating two data-driven approaches to parsing heterogeneity in depression described in the text using a “top-down” symptom-based approach (right, blue) or a “bottom-up” brain-based mechanistic approach (left, green). The goals of both approaches are to advance our understanding of neurobiological mechanisms underlying depression-related symptoms and behaviors and to develop new tools for informing diagnosis and treatment decisions.

Dimensional approaches

The NIMH RDoC Initiative was designed to overcome some of the obstacles inherent in studying psychiatric disorders that exhibit high comorbidity rates and have a weak correspondence to their biological substrates by integrating multiple units of analysis, including self-reported symptoms, objective behaviors, neurophysiology, circuit function, cellular and molecular neurobiology, and genetics [10, 11]. Instead of searching for neurobiological features that differentiate a heterogeneous group of MDD patients from healthy controls, investigators seek to develop models that predict individual differences in a clinically relevant domain (e.g. “anhedonia”) across multiple diagnostic groups. In accord with this perspective, dimensional approaches to parsing heterogeneity in depression develop models for predicting individual differences in specific clinical symptom domains across a continuous spectrum, based on neuroimaging measures of brain structure and function. In this vein, Xia et al. identified brain-behavior dimensions of psychopathology based on fMRI measures of functional connectivity that predicted specific combinations of psychiatric symptoms and behaviors [88]. The study included a broad sample of 663 participants aged 8–22 years from the Philadelphia Neurodevelopmental Cohort (PNC). They performed sparse canonical correlation analysis (sCCA) on functional connectivity measures and clinical items from a structured clinical interview (see overview schematic in Fig. 2a). CCA is a multivariate dimensionality reduction technique that identifies linear combinations of functional connectivity features that maximally correlate with linear combinations of clinical features. In sCCA, elastic net regularization is used to prevent overfitting and to yield a more interpretable model based on a sparse set of brain features. Their sCCA analysis identified four brain-behavior dimensions, predicting individual differences in 37 of the 111 clinical symptoms studied (Fig. 2b–i). Although the analysis was purely data-driven (i.e. not guided by a priori hypotheses), the four dimensions mapped onto clinically intuitive combinations of mood, psychosis, fear, and externalizing symptoms. Interestingly, each brain-behavior dimension described coherent clinical features and was correlated with symptoms from several diagnostic categories. The results support the hypothesis that shared patterns of atypical functional circuitry underlie multiple neuropsychiatric disorders.
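The core idea of sCCA can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the pipeline of ref. [88]: it follows the penalized matrix decomposition formulation of sparse CCA, treating each within-set covariance as the identity and alternating soft-thresholded updates of the brain weights u and the clinical weights v. It uses fixed L1 penalties (`lam_x`, `lam_y`, hypothetical names) rather than tuned bounds and omits the L2 part of an elastic net. Larger penalties zero out more features; a penalty larger than every entry of the cross-covariance would zero out all of them.

```python
import numpy as np


def soft_threshold(a, lam):
    """Shrink each entry toward zero by lam (the L1 proximal operator)."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def normalize(a):
    """Scale to unit Euclidean norm; an all-zero vector is returned unchanged."""
    norm = np.linalg.norm(a)
    return a / norm if norm > 0 else a


def sparse_cca(X, Y, lam_x=0.2, lam_y=0.2, n_iter=100):
    """One pair of sparse canonical weights (simplified sketch).

    X: subjects x brain features (e.g., connectivity edges).
    Y: subjects x clinical items.
    Within-set covariances are treated as the identity, so the problem
    reduces to a sparse rank-1 decomposition of the cross-covariance.
    """
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    C = Xs.T @ Ys / Xs.shape[0]                       # brain x clinical cross-covariance
    v = np.linalg.svd(C, full_matrices=False)[2][0]   # start from leading right singular vector
    u = np.zeros(C.shape[0])
    for _ in range(n_iter):
        u = normalize(soft_threshold(C @ v, lam_x))    # sparse brain weights
        v = normalize(soft_threshold(C.T @ u, lam_y))  # sparse clinical weights
    r = np.corrcoef(Xs @ u, Ys @ v)[0, 1]             # correlation of the two composite scores
    return u, v, r
```

On simulated data in which a shared latent variable drives a handful of brain features and a handful of items, the nonzero weights concentrate on those features, which is the interpretability benefit that sparsity is meant to provide.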

Fig. 2: Transdiagnostic psychopathology brain connectivity-behavior dimensions.

a Schematic of the analytical pipeline depicting the calculation of functional connectivity matrices from the Pearson correlations between the average BOLD fMRI signals for each of 264 spherical regions of interest (ROI) and every other ROI. Sparse canonical correlation analysis (sCCA) was used to define linear combinations of clinical symptoms across a range of psychiatric diagnoses (lower panel) that were maximally correlated with linear combinations of functional connectivity. Psychopathology domains: psychotic and subthreshold symptoms (PSY), depression (DEP), mania (MAN), suicidality (SUI), attention-deficit hyperactivity disorder (ADD), oppositional defiant disorder (ODD), conduct disorder (CON), obsessive-compulsive disorder (OCD), separation anxiety (SEP), generalized anxiety disorder (GAD), specific phobias (PHB), mental health treatment (TRT), panic disorder (PAN), post-traumatic stress disorder (PTSD). ROI communities: somatosensory/motor network (SMT), cingulo-opercular network (COP), auditory network (AUD), default mode network (DMN), visual network (VIS), fronto-parietal network (FPT), salience network (SAL), subcortical network (SBC), ventral attention network (VAT), dorsal attention network (DAT), cerebellar. b–e Four brain-behavior dimensions of psychopathology were identified that captured individual differences in clinical symptoms. Scatter plots depict brain connectivity and clinical dimension scores, which are linear combinations of functional connectivity features and psychiatric symptoms. b Dimension 1 described mood-related symptoms, e.g. feeling sad. c Dimension 2 described psychosis-related symptoms, e.g. auditory hallucinations. d Dimension 3 described fear-related symptoms, e.g. fear of traveling. e Dimension 4 described externalizing behavior and related symptoms, e.g. trouble following instructions. f–i Network module connectivity patterns associated with each brain connectivity-behavior dimension.
Heatmaps depict the magnitude and direction of correlation change to each brain-behavior dimension score (positively- or negatively correlated) in the following functional networks: default mode network (DMN), visual network (VIS), fronto-parietal network (FPT), salience network (SAL), ventral attention network (VAT), dorsal attention network (DAT). Abbreviations: a.u., arbitrary units. Figure adapted with permission from ref. [88].
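The connectivity computation in panel a can be sketched as follows, assuming ROI-averaged BOLD time series have already been extracted and preprocessed. The function name is hypothetical, and the Fisher r-to-z transform is a common convention added here for illustration, not necessarily a step of ref. [88].

```python
import numpy as np


def connectivity_features(timeseries):
    """Pearson functional connectivity from ROI-averaged BOLD signals.

    timeseries: array of shape (n_timepoints, n_rois), one column per ROI.
    Returns the n_rois x n_rois correlation matrix and the vector of unique
    off-diagonal edges (upper triangle), Fisher z-transformed.
    """
    fc = np.corrcoef(timeseries, rowvar=False)   # ROI-by-ROI Pearson correlations
    rows, cols = np.triu_indices_from(fc, k=1)   # each edge counted once
    edges = np.arctanh(fc[rows, cols])           # Fisher r-to-z (assumed convention)
    return fc, edges
```

With 264 ROIs, each subject contributes 264 × 263 / 2 = 34,716 unique edges, which is why sparsity or other regularization is needed when these edges are related to a much smaller number of subjects.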

In a second example of this approach, Mihalik et al. [89] investigated how psychopathology dimensions relate to changes in patterns of functional connectivity during brain development in a mostly healthy sample of adolescents and young adults. The study included 306 participants aged 14–24, most of whom had relatively few clinical symptoms; only 25 were clinically depressed. They used CCA to identify two brain-behavior modes of covariation. The first mode described externalizing/internalizing behaviors and was strongly associated with sex. The second mode described emotional well-being vs. distress and was independent of sex. Interestingly, both brain-behavior modes were associated with age, but in opposite directions, and each mode was correlated with distinct patterns of functional connectivity related to adolescent brain development. The first mode was positively correlated with age and with connectivity in attentional and frontoparietal networks, but negatively correlated with limbic and intra-subcortical connections. The second mode was negatively correlated with age and with connectivity in frontoparietal control and attention networks, but positively correlated with limbic and subcortical connectivity. With respect to understanding depression heterogeneity, one limitation is that the small number of depressed participants could bias the results. Still, the results provide insight into the neurobiological mechanisms underlying developing psychopathology in adolescents and young adults, and they correspond qualitatively with some of the findings from the study by Xia et al. [88].

Importantly, these are just two examples of a dimensional approach to understanding diagnostic heterogeneity by modeling neurobiological predictors of individual differences in behavior. This approach is anchored in a much larger body of literature that has sought to identify neurobiological correlates at the group level of specific behaviors and symptoms that transcend conventional diagnostic boundaries. Multiple transdiagnostic meta-analyses are now available. For example, Zhang et al. tested for transdiagnostic correlates of three anhedonia subdomains [90]. They found that in patients with MDD or schizophrenia, consummatory and anticipatory anhedonia were linked to decreased activity in the basal ganglia and abnormalities in fronto-striatal networks including the dorsal anterior cingulate and dorsolateral prefrontal cortex, respectively. Other meta-analyses have identified transdiagnostic substrates of dysfunction in cognitive control [91], cognitive reappraisal [92], emotion regulation [93], and meta-cognition [94].

Categorical approaches

While the studies above parsed diagnostic heterogeneity across a continuous spectrum of psychopathology, an alternative strategy is to cluster subjects into categorical subgroups, which may have some advantages for clinicians, who tend to think in terms of categorical heuristics. Price et al. [95] used task-based functional MRI scans from 68 depressed patients and 24 never-depressed control subjects and clustered participants using a community detection algorithm based on brain activity during a positive mood induction task. Their analysis focused on measures derived from the default mode network, cognitive control network, and ventral affective network. They identified two subgroups distinguished by functional properties of the ventral affective network and other ventral cortical areas and by diagnostic composition: the subgroup with hyperactivity in ventral regions included 81% of depressed patients but only 50% of never-depressed controls. In a follow-up study, Price and colleagues used resting state functional MRI scans from the same 68 depressed subjects and an additional 12 depressed patients [96]. As in the first study, they performed network analysis of functional connectivity patterns and used community detection to cluster patients into subgroups. They identified two depression subgroups that differed with respect to sex, as well as patterns of atypical functional connectivity in the default mode network. Interestingly, the abnormal default mode network connectivity detected in subgroup B was not evident when all depressed subjects were compared to healthy controls, highlighting the utility of clustering patients into subgroups with more homogeneous neurobiological profiles. A separate study by Feder and colleagues [97] clustered depressed individuals and healthy controls on the basis of functional connectivity and also identified two clusters, though the subgroups did not differ strongly in clinical symptoms.
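Community detection treats subjects as nodes in a graph whose edges encode similarity between their brain measures, and then searches for densely interconnected groups. As a minimal stand-in, and not the specific algorithm used in refs. [95, 96], the sketch below builds a subject-by-subject graph from positive Pearson correlations between feature profiles and splits it in two with the leading-eigenvector method for modularity maximization. The function name and the choice to discard negative correlations are illustrative assumptions.

```python
import numpy as np


def modularity_bipartition(features):
    """Split subjects into two communities by spectral modularity maximization.

    features: subjects x features (e.g., task activation or connectivity
    summaries). Edge weights are positive Pearson correlations between
    subjects' feature profiles; the sign of the leading eigenvector of the
    modularity matrix B = A - k k^T / (2m) assigns each subject to a group.
    """
    A = np.corrcoef(features)           # subject-by-subject similarity
    np.fill_diagonal(A, 0.0)            # no self-loops
    A = np.clip(A, 0.0, None)           # keep positive edges only
    k = A.sum(axis=1)                   # weighted degree of each subject
    two_m = k.sum()                     # total edge weight, counted twice
    B = A - np.outer(k, k) / two_m      # modularity matrix
    _, eigvecs = np.linalg.eigh(B)      # eigenvalues sorted ascending
    leading = eigvecs[:, -1]
    labels = (leading > 0).astype(int)
    s = np.where(labels == 1, 1.0, -1.0)
    Q = s @ B @ s / (2 * two_m)         # modularity of the two-way split
    return labels, Q
```

A positive modularity Q indicates that within-group similarity exceeds what the degree distribution alone would predict; recursive splitting or other algorithms can yield more than two communities.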

Hybrid approaches

Importantly, dimensional and categorical approaches are not mutually exclusive. In a hybrid approach, Drysdale and colleagues used a large-scale multi-site resting state fMRI dataset (N = 458 depressed patients and N = 730 healthy controls) to identify two brain-behavior dimensions that predicted individual differences in specific depressive symptoms and then clustered patients with treatment-resistant depression along these two dimensions [98]. The first dimension predicted anhedonia symptoms based on frontostriatal and orbitofrontal connectivity, while the second dimension predicted anxiety and insomnia based on amygdala, ventral hippocampus, ventral striatum, and lateral prefrontal cortex connectivity. Hierarchical clustering on these two dimensions identified four patient subgroups that differed in patterns of atypical connectivity (Fig. 3a) and in their clinical symptom profiles (Fig. 3b, c). Subgroup differences in anhedonia, anxiety, and other clinical symptoms were confirmed in an independent replication sample. Subsequent work [99] in a more clinically heterogeneous sample (N = 187 patients with a history of mixed mood or anxiety disorders, 64.2% in remission) showed that this approach is prone to overfitting, a problem that can be addressed by using L2 regularization to improve the robustness and stability of canonical correlations in held-out data. The approach may also be sensitive to subjects’ mood state at the time of scanning [100] (see Box 1 for additional details). Another important form of validation came from a subsequent analysis testing for subgroup differences in antidepressant response. This analysis showed that the four clusters predicted treatment response to repetitive transcranial magnetic stimulation (TMS) targeting the dorsomedial prefrontal cortex, and that TMS response was correlated with functional connectivity patterns involving the left amygdala, left dorsolateral prefrontal cortex, and dorsomedial prefrontal cortex (Fig. 3d–f).
These results define just one solution to parsing diagnostic heterogeneity in depression that will surely be improved upon in future studies employing larger samples and richer clinical phenotyping. As discussed below, it will also be critical to evaluate whether these treatment predictive algorithms are sufficiently accurate to improve treatment outcomes in a prospective randomized controlled trial.
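The clustering step of this hybrid approach can be sketched with SciPy, assuming each patient already has a score on each brain-behavior dimension (for example, from a canonical correlation model). The function name is hypothetical, and Ward linkage, Euclidean distance, and z-scoring of each dimension are assumptions made for illustration; ref. [98] should be consulted for the settings actually used.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_on_dimensions(scores, n_clusters=4):
    """Hierarchically cluster patients on their brain-behavior dimension scores.

    scores: patients x dimensions (e.g., two canonical connectivity scores).
    Each dimension is z-scored so that neither dominates the distances;
    Ward linkage then merges patients into a tree that is cut into
    n_clusters groups. Returns integer labels 1..n_clusters.
    """
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    tree = linkage(z, method="ward")    # Euclidean Ward agglomeration
    return fcluster(tree, t=n_clusters, criterion="maxclust")
```

The labels are arbitrary integers; subgroups are characterized afterwards by comparing their connectivity and symptom profiles, as in Fig. 3.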

Fig. 3: Brain connectivity-behavior dimensions of depression define novel depression subtypes that predict treatment response to TMS.

a Four rsfMRI-based subtypes of depression, identified through hierarchical clustering on latent brain-behavior dimensions (canonical connectivity–symptom components), exhibit distinct patterns of atypical functional connectivity. Heatmaps depict the z score from a Wilcoxon rank sum test for differences between the functional connectivity of depressed subjects in each subtype and of healthy controls. b The four subtypes were associated with distinct clinical symptom profiles as indexed by item-level responses to the Hamilton Depression Rating Scale. c Boxplots depicting subtype differences in depression severity. d, e Patients in Subtypes 1 and 3 were more likely to respond to rTMS targeting the dorsomedial prefrontal cortex, compared to patients in Subtypes 2 and 4. f Distinct functional connectivity patterns prior to treatment in rTMS responders vs. nonresponders. Heatmap depicts functional connectivity features that were significantly different in responders, including connectivity between the dorsomedial prefrontal target and the left dorsolateral prefrontal cortex and left amygdala. Figure adapted with permission from Ref. [98]. ACC anterior cingulate cortex; amyg amygdala; a.u. arbitrary units; COTC cingulo-opercular task-control network; DAN dorsal attention network; DLPFC dorsolateral prefrontal cortex; DMN default-mode network; DMPFC dorsomedial prefrontal cortex; FPTC frontoparietal task-control network; GP globus pallidus; HAMD Hamilton Depression Rating Scale; HC hippocampus; lat PFC lateral prefrontal cortex; LIMB limbic; M1 primary motor cortex; NAcc nucleus accumbens; OFC orbitofrontal cortex; PCC posterior cingulate cortex; PPC posterior parietal cortex; precun precuneus; rTMS repetitive transcranial magnetic stimulation; rsfMRI resting state functional MRI; SM primary sensorimotor cortex (M1 or S1); SS1 primary somatosensory cortex; SN salience network; subC subcortical; thal thalamus; VAN ventral attention network; vis visual cortex; VLPFC ventrolateral prefrontal cortex; vStr ventral striatum; n.s. not significant.

In a second study utilizing a hybrid approach, Tokuda et al. [101] applied unsupervised Bayesian co-clustering to a high-dimensional, multimodal dataset from 134 subjects (N = 67 depressed patients). The dataset included measures of functional connectivity, gene expression, clinical symptoms, and a number of other biological variables. The authors identified three subtypes that differed in functional connectivity patterns, especially in the angular gyrus and other default mode network regions, as well as in trauma history and in treatment response to SSRIs. This study is notable not only for these subgroup differences but also because it is among the first to attempt to integrate gene expression data, a promising future direction reviewed in more detail below.

Technical challenges and potential solutions

While these results highlight the potential of neuroimaging data for parsing diagnostic heterogeneity and identifying novel subtypes of depression, it is also important to recognize key technical challenges inherent in these approaches. First, it is challenging to directly compare the studies reviewed above and identify clear points of convergence because of differences in methodology, patient samples, and clustering criteria. Enabling such comparisons will be a critical goal for future studies as this area of research matures. To aid in the synthesis of convergent findings across studies, Table 2 summarizes the results of the studies discussed above and identifies two preliminary areas of convergence in the clinical correlates of data-driven subtypes. Second, functional neuroimaging data are susceptible to various artifacts, particularly those related to head motion, respiratory patterns, and local changes in blood gases [102,103,104,105,106,107]. Third, efforts to study diagnostic heterogeneity and discover novel subtypes invariably benefit from large samples, which often require integrating data from multiple scanners and can thereby introduce additional artifacts that are difficult to control [108,109,110]. Fourth, while there are obvious benefits to analyzing a heterogeneous sample and adopting an inclusive recruitment strategy, it is still critical to think carefully about how clinical sample characteristics influence the results of a specific study. Fifth, complex multivariate analyses of high-dimensional datasets are prone to pitfalls, including overfitting and false positives, and benefit from stringent controls and other strategies for minimizing these problems. In Box 1, we discuss potential solutions to these technical challenges. For a more detailed discussion, see ref. [34].
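Head motion is typically quantified volume by volume from the six rigid-body realignment parameters. As one concrete example of artifact control, the sketch below computes framewise displacement (FD) following the widely used definition of Power and colleagues: the sum of absolute volume-to-volume changes in the three translations and three rotations, with rotations converted to millimetres of arc on a 50 mm sphere. The column order of the parameters, the toy motion trace, and the 0.2 mm censoring threshold are assumptions for illustration; realignment packages order their outputs differently and published cutoffs vary.

```python
import numpy as np


def framewise_displacement(motion_params, head_radius_mm=50.0):
    """Framewise displacement (FD) per volume.

    motion_params: array of shape (n_volumes, 6). Assumed column order:
    three translations (mm) followed by three rotations (radians).
    """
    params = np.asarray(motion_params, dtype=float).copy()
    # Convert rotations to millimetres of arc on a sphere of the given radius
    params[:, 3:] *= head_radius_mm
    # Sum of absolute volume-to-volume changes across all six parameters
    deltas = np.abs(np.diff(params, axis=0))
    # The first volume has no predecessor, so its FD is defined as 0
    return np.concatenate([[0.0], deltas.sum(axis=1)])


motion = np.array([
    [0.0, 0.0, 0.0, 0.000, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.000, 0.0, 0.0],  # 0.1 mm translation in x
    [0.1, 0.0, 0.0, 0.002, 0.0, 0.0],  # 0.002 rad rotation, about 0.1 mm of arc
    [0.6, 0.0, 0.0, 0.002, 0.0, 0.0],  # 0.5 mm jump in x
])
fd = framewise_displacement(motion)
censor = fd > 0.2  # illustrative threshold for flagging high-motion volumes
```

Flagged volumes can then be censored or modeled as nuisance regressors; whichever strategy is chosen, reporting it alongside the subtyping results makes cross-study comparison easier.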

Table 2 Neurophysiological and clinical correlates of candidate brain-based depression subtypes.

Toward molecular mechanisms: integrating neuroimaging and genomic data

As noted above, genetic factors are another likely contributor to diagnostic heterogeneity in depression, and there is increasing interest in integrating genetic and neuroimaging data to understand how genetic risk variants influence brain structure and function (Fig. 4). These efforts are often motivated by the premise that depression-related genetic variants may manifest in endophenotypes: neurobiological changes that can be measured with neuroimaging and other tools [111]. These intermediate neuroimaging endophenotypes may be more closely linked to genetic risk variants than the highly heterogeneous depression diagnosis is, and could therefore provide a promising avenue for understanding how genetic factors modulate pathophysiological processes. Family, adoption, and twin studies, which can be used to separate the influence of genetic versus environmental factors, have shown that depression is moderately heritable: ~35–40% of the population variance in risk for unipolar depression is attributable to genetic variation [112, 113], and bipolar depression is ~60–85% heritable [114]. Like most psychiatric disorders, depression exhibits a highly polygenic inheritance pattern, such that an individual’s risk is thought to be determined by complex interactions between a large number of genetic variants and factors like early life adversity, psychosocial stress, and other environmental variables [111, 115,116,117]. Further complicating these efforts, the lifetime prevalence of depression is an order of magnitude higher than that of psychiatric conditions such as schizophrenia and autism, such that much larger sample sizes are required to identify robust and replicable effects [30]. Recently, extremely large-scale genome-wide association (GWA) studies have identified multiple genetic variants conferring increased risk with genome-wide significance [31, 33, 118, 119].
A recent meta-analysis involving 135,458 cases and 344,901 controls identified 44 independent loci corresponding to 153 genes associated with excitatory neurotransmission, synapse function, and dendritic spines [31]. Of note, six of these loci were shared with the 108 risk loci identified in schizophrenia GWA studies, supporting transdiagnostic overlap in the genetic etiology of these diseases [31, 120]. An even larger meta-analysis involving 414,055 cases and 892,299 controls identified 102 risk variants, 87 of which were significant after correcting for multiple comparisons, and its findings supported the previously observed transdiagnostic overlap with schizophrenia [33]. Although twin studies estimate the heritability of depression at ~37%, current GWA studies are estimated to capture only ~9% of genetic variation [29]. Furthermore, because MDD is more prevalent than other psychiatric illnesses such as schizophrenia (81% heritability) [121] and considerably less heritable, some studies estimate that much larger sample sizes, on the order of 100,000 to 1 million MDD cases and matched controls, may be needed for sufficient power to detect the small increments of risk that individual genetic variants confer [29, 30, 122]. Mirroring developments in depression genetics, efforts to understand how genetic risk variants influence brain structure and function initially focused on candidate genes and more recently on confirmed GWA risk variants and transcriptomic data. Below, we review findings from each approach, as well as emerging consensus views on statistical challenges and best practices for implementing these analyses.
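The sample-size argument above can be made concrete with a rough power calculation. The sketch below deliberately uses a simplified model: a quantitative (liability-scale) trait, a single variant explaining a hypothetical 0.01% of variance, and a 1-df test whose statistic is approximately noncentral chi-square with noncentrality N·q²/(1 − q²). A full case-control calculation for MDD would also depend on disease prevalence, case/control ratio, and allele frequency, so the numbers are illustrative only and are not taken from [29, 30, 122].

```python
from scipy import stats

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def gwas_power(n_samples, variance_explained, alpha=GENOME_WIDE_ALPHA):
    """Approximate power of a 1-df association test for one variant.

    Simplifying assumption: quantitative trait, test statistic ~ noncentral
    chi-square(1) with noncentrality N * q^2 / (1 - q^2).
    """
    ncp = n_samples * variance_explained / (1.0 - variance_explained)
    critical = stats.chi2.isf(alpha, df=1)  # chi-square cutoff at alpha
    return stats.ncx2.sf(critical, df=1, nc=ncp)


# Hypothetical variant explaining 0.01% of liability variance
for n in (10_000, 100_000, 500_000, 1_000_000):
    print(f"N = {n:>9,}: power = {gwas_power(n, 1e-4):.3f}")
```

Under these assumptions, power is negligible at N = 100,000 and approaches 1 only as N reaches the hundreds of thousands, which is the qualitative pattern behind the sample-size estimates cited above.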

Fig. 4: Integrating neuroimaging and genetic data to uncover intermediate endophenotypes and novel depression subgroups.

Schematic of how combining neuroimaging with genetic data can be used to parse heterogeneity in depression and uncover subgroups within the depressed population. Polygenic variation may manifest in intermediate behavior-related brain circuits that can give rise to distinct depression subgroups. The polygenic effects of risk variants on circuit dysfunction, cognition, behavior, and clinical symptoms may interact with one another. The genes shown are examples of candidate depression risk genes from Table 3; each gene’s known variant and genomic locus are indicated and depicted in yellow on the chromosome strand. 1. Examples of brain circuits known to be dysfunctional in depression. 2. Examples of cognitive processes and behaviors that are altered in depression, possibly as a direct or indirect consequence of genetic risk variants. 3. Clinical symptoms of depression that may result directly or indirectly from dysfunction in depression-related brain circuits. Double-headed arrows indicate the bidirectional relationships among 1, 2, and 3 that may modulate the expression of intermediate phenotypes.

Candidate gene studies

Prior to the discovery of confirmed risk variants in large-scale GWA studies, early attempts to integrate genetic and neuroimaging data focused on candidate genes that were hypothesized to confer depression susceptibility. Excellent reviews are available elsewhere [123,124,125,126]. Here, we focus on highlighting findings from the two most frequently studied examples of this approach: common polymorphisms associated with the serotonin transporter (5-HTTLPR) and brain derived neurotrophic factor (BDNF) genes.

The 5-HTTLPR “long” and “short” alleles are degenerate repeat polymorphisms in a regulatory region linked to the serotonin transporter gene promoter. The short allele is associated with reduced expression of the serotonin transporter and reduced serotonin uptake in lymphoblasts [127]. It has also been associated with neuroticism [128] and with increased susceptibility to depression in response to stressful life events in some epidemiological cohorts and studies [129, 130] but not in others [131]. In multiple functional neuroimaging studies, this polymorphism has been found to modulate emotion regulation, social cognition, amygdala reactivity to emotionally salient stimuli, and functional interactions between the anterior cingulate cortex and amygdala [123, 132,133,134]. Structural neuroimaging studies have reported genotype differences in hippocampal volume and in white matter microstructure in frontolimbic areas [135, 136].

The BDNF Val66Met polymorphism is a single nucleotide substitution in codon 66 of the BDNF prodomain that disrupts activity-dependent BDNF release [137]. BDNF has been implicated in stress resilience and in ketamine’s antidepressant mechanism of action [138,139,140], and the Val66Met polymorphism is associated with increased anxiety-related behavior that is resistant to fluoxetine treatment in transgenic knock-in mice [141]. In human neuroimaging studies, Met carriers exhibit modest deficits in episodic memory performance and hippocampal activity [137], and reduced hippocampal volume in some [142] but not all studies [143].

Investigators have also recognized several important limitations of the candidate gene approach in neuroimaging genetics [124]. Methodological differences across studies complicate efforts to identify replicated findings. Two recent meta-analyses of structural neuroimaging studies involving the 5-HTTLPR and BDNF Val66Met polymorphisms found that most changes in hippocampal volume did not reach statistical significance across studies [125, 126]. However, reduced hippocampal volume in BDNF Met allele carriers with bipolar disorder was a consistently replicated finding [125], and exploratory secondary analyses identified consistent increases in activity and gray matter volume in the prefrontal cortex [126]. Likewise, carriers of the 5-HTTLPR “short” allele exhibited consistent alterations in white matter microstructure in the corpus callosum and superior longitudinal fasciculus. The authors of these meta-analyses also identified multiple factors that could contribute to inconsistent results across studies, including gender distribution, medication status, and, perhaps most importantly, relatively small sample sizes (typically ~35–45 subjects per genotype). Another important limitation is that recent studies indicate that many candidate gene associations may be false positives: in very large cohorts, they do not reach genome-wide significance for association with depression [126, 130, 144].
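To see why samples of ~35–45 subjects per genotype are a problem, the sketch below computes the power of a two-sided, two-sample t-test for a genotype-group difference in an imaging measure. The effect size (Cohen's d = 0.3) is a hypothetical small-to-moderate effect chosen for illustration, not an estimate from the meta-analyses cited above.

```python
import numpy as np
from scipy import stats


def two_group_power(n_per_group, effect_size_d, alpha=0.05):
    """Power of a two-sided, two-sample t-test with equal group sizes."""
    df = 2 * n_per_group - 2
    noncentrality = effect_size_d * np.sqrt(n_per_group / 2.0)
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    # Probability that |T| exceeds the critical value under the alternative
    return (stats.nct.sf(t_crit, df, noncentrality)
            + stats.nct.cdf(-t_crit, df, noncentrality))


for n in (40, 100, 400):
    print(n, round(two_group_power(n, 0.3), 2))
```

With roughly 40 subjects per group, a true effect of this size would be detected only about a quarter of the time, so inconsistent results across small studies are expected even when the effect is real, and significant findings tend to overestimate its size.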

Polygenic risk

More recently, neuroimaging studies have begun to leverage findings from large-scale depression GWA studies. As noted above, at least 44 genetic variants associated with 153 genes have been identified [31], but none is highly penetrant. Understanding how these common SNPs interact to modulate depression pathophysiology is thus a major challenge for the field. One approach to this problem is to investigate how polygenic risk modulates brain structure and function by calculating a polygenic risk score (PRS), which measures the risk for a phenotype attributable to the accumulation of multiple genetic variants [145,146,147]. (For an excellent review of polygenic risk scores, with detailed guidelines on how to calculate PRS, quality control measures, and other best practices, see ref. [148].) While multiple studies have used this approach for other diseases [149,150,151,152], its application in depression is relatively new. However, the early results are promising. For example, motivated by the fact that depressed patients frequently present with working memory deficits, Yüksel et al. investigated whether polygenic risk for depression also modulates working memory performance and brain activity in 137 healthy individuals [153]. Brain activation patterns in the right prefrontal cortex and other working memory-related brain areas were modulated by polygenic risk for depression (Fig. 3a–c). This suggests that depression-related genetic variants could modulate depression risk by acting in part on working memory functions, a potential endophenotype. Alternatively, polygenic risk could modulate depression risk directly, and working memory deficits could emerge as one possible, but not inevitable, consequence of depression pathophysiology.
Dissociating these two models of causality would require additional data on whether working memory deficits are more or less heritable than depression; whether they precede the onset of depression and persist in a remitted mood state; and whether they occur in first-degree relatives of depressed probands [154].
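At its core, a PRS is a weighted sum: each subject's count of effect alleles at each SNP, multiplied by that SNP's effect size from a discovery GWAS, summed over SNPs that pass a p-value threshold. The sketch below shows only this arithmetic (the "clumping and thresholding" idea). The function name and toy values are hypothetical, and it assumes that SNPs have already been LD-clumped, quality-controlled, and allele-aligned to the GWAS; dedicated software such as PLINK or PRSice handles those steps in practice, and ref. [148] describes best practices.

```python
import numpy as np


def polygenic_risk_score(dosages, betas, pvalues, p_threshold=5e-8):
    """Weighted sum of effect-allele counts over SNPs passing a p-value threshold.

    dosages: (n_subjects, n_snps) effect-allele counts (0, 1, 2) or imputed dosages
    betas:   per-SNP effect sizes from a discovery GWAS (e.g. log odds ratios)
    pvalues: per-SNP GWAS p-values; only SNPs below p_threshold contribute
    """
    dosages = np.asarray(dosages, dtype=float)
    betas = np.asarray(betas, dtype=float)
    keep = np.asarray(pvalues) < p_threshold
    return dosages[:, keep] @ betas[keep]


dosages = np.array([[0, 1, 2],
                    [2, 0, 1]])
betas = np.array([0.10, -0.20, 0.05])
pvalues = np.array([1e-9, 1e-3, 1e-10])
prs_strict = polygenic_risk_score(dosages, betas, pvalues)         # keeps the 1st and 3rd SNPs
prs_lenient = polygenic_risk_score(dosages, betas, pvalues, 0.01)  # keeps all three SNPs
```

The p-value threshold trades signal for noise: lenient thresholds include many weakly associated variants, which often improves prediction for highly polygenic traits such as depression, so studies commonly report results across several thresholds.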

A second study tested whether polygenic risk for anhedonia modulates brain structure and function using the UK Biobank dataset [155]. The authors first performed a GWAS of anhedonia in 375,724 UK Biobank participants and identified 11 novel loci. Using the anhedonia GWAS results, they tested for genetic overlap with GWAS results from other psychiatric disorders [156] and found that polygenic risk for anhedonia was correlated with polygenic risk for major depressive disorder, schizophrenia, and bipolar disorder, but not for OCD or Parkinson’s disease. Next, they tested whether anhedonia PRS predicted individual differences in brain structure and in functional activity during an fMRI emotional face processing task. Increased polygenic risk for anhedonia was not associated with functional activity, but it was associated with reduced volume in the orbitofrontal cortex and other brain areas involved in reward processing, and with alterations in white matter microstructure as indexed by diffusion tractography (Fig. 5a–c). These promising findings notwithstanding, how genetic risk variants modulate depression-related pathophysiology remains largely unknown. Table 3 synthesizes findings from recent efforts to link candidate genes and GWA-confirmed risk variants with specific depression-related behaviors and neurophysiological functions. As above, it will be critical to replicate these results in prospective cohorts.
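Testing whether a PRS predicts regional brain measures usually involves regressing each measure on the PRS while adjusting for covariates, then correcting for the number of regions tested, for example by controlling the false discovery rate (FDR). The sketch below shows that generic logic on simulated data: ordinary least squares per region, followed by the Benjamini-Hochberg procedure. It is not the model used in [155]; the covariates, effect size, and data are hypothetical.

```python
import numpy as np
from scipy import stats


def prs_region_associations(prs, covariates, volumes):
    """Regress each regional measure on PRS plus covariates (ordinary least squares).

    prs: (n,) scores; covariates: (n, k), e.g. age, sex, genetic principal
    components; volumes: (n, n_regions). Returns the PRS coefficient and
    two-sided p-value for each region.
    """
    n = len(prs)
    X = np.column_stack([np.ones(n), prs, covariates])
    beta, *_ = np.linalg.lstsq(X, volumes, rcond=None)
    resid = volumes - X @ beta
    dof = n - X.shape[1]
    sigma2 = (resid ** 2).sum(axis=0) / dof
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])  # SE of the PRS coefficient
    t_stat = beta[1] / se
    return beta[1], 2 * stats.t.sf(np.abs(t_stat), dof)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values) for FDR control."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value
    q_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q


rng = np.random.default_rng(1)
n, n_regions = 500, 10
prs = rng.normal(size=n)
covariates = rng.normal(size=(n, 2))
volumes = rng.normal(size=(n, n_regions))
volumes[:, 0] -= 0.5 * prs  # simulated: higher PRS, smaller region 0
coef, pvals = prs_region_associations(prs, covariates, volumes)
qvals = benjamini_hochberg(pvals)
```

FDR control is less conservative than a Bonferroni correction when many correlated regions are tested, at the cost of tolerating a controlled fraction of false discoveries among the reported regions.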

Fig. 5: Polygenic risk scores for anhedonia predict psychiatric neuroimaging phenotypes and spatial patterns of gene expression for schizophrenia risk genes predict schizotypy-associated myelination.

a–c Polygenic risk scores (PRS) for anhedonia were associated with a regional volumes of cortical and subcortical regions of interest, b tract-specific fractional anisotropy (a measure of axonal integrity and myelination), and c tract-specific mean diffusivity (a measure of structural integrity in the intra- and extracellular space, neuropil, and global CSF). d Partial least squares (PLS) analysis identified linear combinations of genes whose spatial expression patterns co-localized with schizotypy-related myelination patterns, as indexed by an MRI magnetization transfer measure. Genes with larger “PLS1 weight” values were more important predictors of the spatial distribution of SRM myelination. e Positively weighted PLS1 genes were associated with genes known to be down-regulated in schizophrenia (“Gandal Down-Reg” and “Fromer Down-Reg”), neuron cell types, and increased SRM myelination. f Negatively weighted PLS1 genes were associated with genes known to be up-regulated in schizophrenia (“Gandal Up-Reg” and “Fromer Up-Reg”), decreased SRM myelination, and astrocyte, microglia, and neuron cell types. FDR false discovery rate; PLS partial least squares; SRM schizotypy-related magnetization; Up/Down-Reg Up/Down-Regulated. Figure adapted with permission from refs. [155] (a–c) and [160] (d–f).

Table 3 Functional roles for candidate risk genes for major depressive disorder (MDD).

Gene expression and transcriptomics

A third approach that is becoming increasingly common is to investigate how regional differences in gene expression predict disease-related changes in brain structure and function. Although gene expression data are obtained postmortem and therefore cannot be acquired alongside in vivo human neuroimaging, public databases are now making gene expression data available to the scientific community. The Allen Human Brain Atlas (AHBA) is one prominent example, comprising brain-wide microarray data from 3702 spatially distinct tissue samples collected postmortem from six neurotypical adult brains (N = 1 female, N = 5 male). Four brains contain samples from only one hemisphere, while two have samples from both hemispheres as well as RNA-seq data. Recent work indicates that regional differences in gene expression in the AHBA predict human functional network architecture [157].
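Before AHBA data can be compared with an imaging map, the sample-level microarray data must be collapsed into a region-by-gene matrix that matches the imaging parcellation. The sketch below shows one simplified version of that step: each gene is z-scored within donor to reduce donor-level offsets, then samples are averaged within each parcel. The function name, data layout, and toy values are hypothetical, and within-donor z-scoring is a simple stand-in for the more robust normalizations used in published work; dedicated toolboxes such as abagen implement probe selection, sample-to-parcel matching, and normalization far more carefully.

```python
import numpy as np
import pandas as pd


def region_by_gene_matrix(expr, donor, region):
    """Collapse sample-level expression into a region x gene matrix.

    expr:   DataFrame, one row per tissue sample, one column per gene
    donor:  donor ID for each sample
    region: parcel label for each sample (after mapping samples to an atlas)
    """
    donor = np.asarray(donor)
    region = np.asarray(region)
    # Within-donor normalization: z-score each gene across that donor's samples
    z = expr.groupby(donor).transform(lambda g: (g - g.mean()) / g.std(ddof=0))
    # Average all samples (across donors) that fall within each parcel
    return z.groupby(region).mean()


expr = pd.DataFrame({"GENE_A": [1.0, 3.0, 10.0, 30.0],
                     "GENE_B": [5.0, 4.0, 2.0, 1.0]})
donor = ["donor1", "donor1", "donor2", "donor2"]
region = ["parcel1", "parcel2", "parcel1", "parcel2"]
regional = region_by_gene_matrix(expr, donor, region)
```

In this toy example, donor2's tenfold higher raw GENE_A values do not dominate the regional averages, because each donor's values are rescaled before averaging.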

To date, no studies have used AHBA gene expression data in combination with functional neuroimaging measures in depressed subjects, but this promising approach has been applied to functional and structural imaging in other psychiatric disorders [158,159,160]. In one elegant example, Romero and colleagues tested whether regional differences in the expression of schizophrenia genetic risk variants predict regional differences in brain structure and schizotypal traits [160]. Their analysis combined gene expression data from the AHBA with an MRI magnetization transfer measure of intracortical myelination in 2135 healthy individuals (aged 14–25 years), who also completed a schizotypy self-report questionnaire. First, they quantified schizotypy-related magnetization by correlating schizotypal personality scores with age-corrected magnetization transfer measurements from 68 brain regions of interest. To compare the regional map of schizotypy-related magnetization with the spatial distribution of gene expression, they used a multivariate technique called partial least squares (PLS) analysis. PLS was used to define linear combinations of genes whose brain-wide expression patterns in the AHBA maximally covaried with schizotypy-related magnetization across regions. They identified two significant brain structure/gene expression components, and an enrichment analysis found that genes positively co-located with schizotypy-related magnetization were enriched for genes downregulated in schizophrenia and for neuronal genes, confirming a significant overlap with genes known to confer schizophrenia risk even in this healthy normative sample (Fig. 5d, e).
Other examples of this approach have begun to examine whether regional differences in gene expression explain structural abnormalities in psychosis [159], functional changes associated with early life trauma and conversion disorder [158], adolescent brain development [161], functional network organization [162], and structural changes in autism [163]. Together, these studies demonstrate the potential for neuroimaging gene expression analyses to reveal insights into how genetic risk variants influence brain structure and function in depression.

Technical challenges and potential solutions

Efforts to integrate genetic and neuroimaging data are a promising avenue for future research, but they can be logistically complicated because they benefit from access to large datasets. They can therefore be hindered by limited access to datasets that include a large number of subjects with functional MRI data, behavioral measures, and genotypes. Functional MRI, genomic, and transcriptomic datasets typically include tens of thousands of variables and a comparatively small number of subjects. Functional MRI scans can comprise over one million voxels acquired over many time points, and GWA datasets often include millions of SNPs. In contrast, until recently, datasets that include both genotype and fMRI data have typically included only a few dozen subjects. However, the growing availability of collaborative, large-scale, publicly accessible datasets like the UK Biobank will help to mitigate this problem [164, 165]. Likewise, multiple gene expression databases now exist, including the BrainSpan database and the Allen Human Brain Atlas of microarray gene expression data [166, 167]. Another important obstacle is that these analyses can be statistically and technically complicated. In Box 2, we highlight important statistical considerations for preprocessing gene expression data and implementing partial least squares (PLS) analyses that integrate neuroimaging and gene expression data.
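The mismatch between the number of variables and the number of subjects is the root of the overfitting problem noted earlier. The toy example below (simulated data; the dimensions are illustrative assumptions) shows that an unregularized linear model with more features than subjects can fit pure noise perfectly in-sample while predicting nothing in held-out folds.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
n_subjects, n_features = 50, 5000  # far more features (voxels, SNPs) than subjects
X = rng.normal(size=(n_subjects, n_features))
y = rng.normal(size=n_subjects)    # outcome is pure noise: there is nothing to find

model = LinearRegression()
in_sample_r2 = model.fit(X, y).score(X, y)  # essentially 1.0: the model memorizes noise
cv = KFold(n_splits=5, shuffle=True, random_state=0)
held_out_r2 = cross_val_score(model, X, y, cv=cv, scoring="r2").mean()  # near or below 0
print(in_sample_r2, held_out_r2)
```

In-sample fit statistics are therefore uninformative in this regime; regularization, dimensionality reduction, and evaluation on held-out subjects are the usual safeguards.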

Conclusions and future research directions

Depression is a highly heterogeneous psychiatric syndrome bearing a weak correspondence to its neurobiological and genetic substrates. Efforts to parse diagnostic heterogeneity in terms of continuous brain-behavior dimensions and to discover neurobiologically driven subtypes of depression hold substantial promise. These studies have the potential to uncover clinically useful dimensions and subtypes and to provide a framework for developing novel approaches to diagnosing depression subtypes, predicting clinical outcomes, and selecting treatments.

Above, we reviewed various approaches to integrating neuroimaging and genetic datasets, with a focus on candidate gene studies, polygenic risk scores, and transcriptomic analyses. There are a number of other approaches to combining gene expression data with functional neuroimaging data, including univariate approaches and sparse canonical correlation analysis. With all these approaches, it is important to recognize that these are discovery-oriented exploratory analyses, and thus descriptive, not predictive [168]. Replication studies in independent datasets are critical [124]. Future studies will benefit from datasets involving larger samples and incorporating information about brain-based depression dimensions and subtypes. With the growing availability of public databases, open-source statistical methods, and increasing computational power, neuroimaging genetics will increasingly be able to reveal insights into the molecular machinery that gives rise to depression pathophysiology. These findings, in turn, could be used for formulating mechanistic hypotheses and testing them in animal models [100] or in human subjects using innovative lesion mapping approaches and non-invasive brain stimulation [169,170,171,172]. In the long term, these methods could potentially translate to personalized pharmacological and circuit-based therapeutics. We conclude by highlighting five especially promising priorities for future research.

Defining sexually dimorphic mechanisms

Women are about twice as likely as men to be diagnosed with depression [6, 16, 173], suggesting that sex may be an important contributor to diagnostic heterogeneity and that pathophysiological mechanisms may not be identical in men and women [174]. Some existing studies support this hypothesis. For example, a recent study examined the transcriptional correlates of depression in postmortem brain tissue from men and women and tested for sex-specific convergence in a chronic stress model in mice [5]. Stress susceptibility in mice and depression in humans were associated with largely non-overlapping transcriptional signatures in males vs. females. Dusp6, which encodes dual specificity phosphatase 6 (a regulator of the ERK signaling pathway), was identified as a hub gene among differentially expressed genes in the ventromedial prefrontal cortex (vmPFC) of both female MDD patients and stressed female mice. Downregulating Dusp6 in the vmPFC was sufficient to increase stress susceptibility in female mice, but not in males. This study suggests that sex-specific pathophysiological mechanisms underlie depression in men and women. Future studies will surely benefit from examining how sexually dimorphic processes contribute to depression heterogeneity.

Longitudinal studies of disease course

Depression is a fundamentally episodic mental illness, but most studies to date have been cross-sectional. There is substantial heterogeneity in the temporal dynamics of depressive episodes across individuals [8], but the neurobiological bases of these differences are not well understood. Longitudinal studies, aided by advances in neuroimaging [175,176,177], will be critical for investigating the mechanisms mediating mood state transitions, rapid cycling, and highly recurrent depressions vs. single depressive episodes.

Reverse translation: testing mechanistic hypotheses in animal models

Neuroimaging models of depression heterogeneity and PLS approaches can be used to formulate hypotheses about how distinct neurobiological mechanisms contribute to specific symptoms and behaviors, at both the molecular and neural circuit levels [159, 160, 178]. Optogenetics, chemogenetics, and viral manipulations of specific signaling pathways, in turn, could be used to test subtype-specific mechanisms in animal models [100, 179,180,181].

Validating novel subtyping solutions

With renewed interest in rethinking our approach to diagnosing depression, it will be essential to validate candidate subtyping solutions and identify points of convergence. Initial analyses will be optimized for inference: i.e. identifying genetic, transcriptomic, and neurophysiological features that interact to explain individual differences in depression-related symptoms and behaviors. As the field matures, it will be important to refine and replicate these models and optimize them for prediction: i.e. optimizing model parameters to allow for accurate predictions of symptoms, behaviors, or clinical outcomes in held-out data and in prospective replication samples. Well validated models would be reproducible in new datasets contingent on predetermined criteria related to data quality, subject samples, and other issues (see Box 1), and they should either advance our understanding of depression neurobiology or have some clinical utility. Clinically useful models might facilitate treatment decisions (e.g. selecting between two antidepressants or predicting treatment response); enable the identification of a biomarker that could be used in diagnosis; or inform prognostic predictions about the course of illness in a given individual.
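The shift from inference to prediction has a concrete methodological counterpart: any parameter tuned to the data (here, a regularization penalty) must be chosen without touching the subjects used to estimate predictive accuracy. The sketch below illustrates nested cross-validation on simulated data, with a ridge model as an assumed example of a predictive model; it is a generic scikit-learn pattern rather than a method proposed in this review, and prospective replication samples remain the stronger test.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_subjects, n_features = 200, 50
X = rng.normal(size=(n_subjects, n_features))
true_weights = np.zeros(n_features)
true_weights[:5] = 1.0  # only five informative features (simulated)
y = X @ true_weights + rng.normal(size=n_subjects)

# Inner loop: choose the ridge penalty using the training folds only
inner = GridSearchCV(
    make_pipeline(StandardScaler(), Ridge()),
    param_grid={"ridge__alpha": np.logspace(-3, 3, 13)},
    cv=KFold(n_splits=5, shuffle=True, random_state=1),
)
# Outer loop: score each tuned model on subjects never used for fitting or tuning
outer = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(inner, X, y, cv=outer, scoring="r2")
print(scores.mean())
```

Keeping scaling and tuning inside the pipeline ensures that no information from the outer test fold leaks into model selection, a common source of inflated accuracy estimates.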

Personalized medicine and treatment prediction

Investigating individual differences in pathophysiology will be critical for developing personalized treatments. Existing treatments are effective, but not for everyone [14]. To the extent that distinct pathophysiological processes are at play in subsets of depressed patients, heterogeneity may contribute to treatment resistance and treatment failures. In the long term, parsing diagnostic heterogeneity in depression and defining subtype-specific mechanisms could give rise to individualized treatment strategies targeting personalized molecular signaling pathways, brain circuits, and behaviors. Multiple groups are already investigating how neuroimaging tools could be used to facilitate personalized treatment selection [83, 182,183,184,185] and to identify the optimal target area for brain stimulation interventions [98, 171, 186]. As always, efforts to identify treatment-predictive biomarkers must be followed up with rigorous prospective tests not only of their accuracy but also of their utility for improving clinical outcomes, preferably in the setting of a randomized controlled trial.

Funding and disclosure

This work was supported by grants from the National Institute of Mental Health (MH118388, MH114976, MH123154, MH118451, MH109685, MH109685-04S1), the National Institute on Drug Abuse (DA047851), the Hope for Depression Research Foundation, the Pritzker Neuropsychiatric Disorders Research Consortium, the Klingenstein–Simons Foundation Fund, the One Mind Institute, the Rita Allen Foundation, the Dana Foundation, the Foundation for OCD Research, and the Brain and Behavior Research Foundation (NARSAD). CL is listed as an inventor for Cornell University patent applications on neuroimaging biomarkers for depression that are pending or in preparation. The authors declare no competing interests.