Main

How does the complex anatomical and functional organization of the human brain develop from the expression of over 20,000 genes1, and how does this process go awry in neurodevelopmental disorders? In the past 10 years, whole-brain, whole-genome transcriptional atlases, such as the Allen Human Brain Atlas (AHBA)2, have suggested that healthy brain organization may depend on ‘transcriptional programs’ representing the coordinated expression of large numbers of genes over development3,4,5,6,7.

In 2012, Hawrylycz et al.2 showed that principal components of the AHBA dataset capture distinct features of cortical anatomy. In 2018, Burt et al. argued that the first principal component of cortical gene expression (C1) reflects an anterior-to-posterior ‘neuronal hierarchy’, defined in macaque tract-tracing data by feedforward and feedback axonal connections between cortical areas8,9,10 and indexed in humans by the ratio of T1-weighted and T2-weighted (T1w/T2w) magnetic resonance imaging (MRI) signals, a putative marker of cortical myelination8. These discoveries echoed previous findings from studies of embryonic development of chick, mouse and human brains where spatially patterned transcriptional gradients were shown to organize neurodevelopmental processes, such as areal differentiation, axonal projection and cortical lamination6,11,12,13. Single-cell RNA sequencing (RNA-seq) data have also revealed an anterior-to-posterior gradient in the gene expression of inhibitory interneurons, which is conserved across multiple species, including humans14. It is, therefore, likely that the principal component of gene expression in the adult human cortex represents a transcriptional program key to its normative development.

However, it is not clear that C1 is the only component of spatially patterned and neurodevelopmentally coordinated gene expression in the human brain. Hawrylycz et al.2 suggested that principal component analysis (PCA) of a restricted set of 1,000 genes in one of the six brains of the AHBA dataset revealed multiple biologically relevant components (Supplementary Fig. 1). Later, Goyal et al.15 used nonlinear dimension reduction across whole-genome spatial expression, again from only one of the six AHBA brains, to show that aerobic glycolysis was associated with a second transcriptional component. To our knowledge, more recent studies using all available AHBA data have reliably found only C1 (refs. 8,16). This first component has been linked to a general ‘sensorimotor-association axis’ (S-A axis) of brain organization10 derived from several macroscale brain phenotypes, including, among others, the principal gradient of functional connectivity17, maps of brain metabolism and blood flow15 and the map of human cortical expansion compared to other primates18. Although it is parsimonious to assume that such diverse brain phenotypes could all be determined by a single transcriptional program, it seems more realistic to expect that multiple transcriptional programs are important for human brain development, as is generally the case for brain development in other species19.

Here we present two higher-order components of human cortical gene expression, C2 and C3, that likely represent additional transcriptional programs distinct from the C1 component already reliably described8. These higher-order components emerged only when optimized data-filtering and dimension-reduction methods were applied to the AHBA dataset. We found that C2 and C3 are each specifically enriched for biologically relevant gene sets and spatially co-located with distinct clusters of neuroimaging phenotypes or macroscale brain maps. Leveraging independent RNA-seq datasets on single-cell and developmental gene expression, we further demonstrate that all three components are generalizable to other datasets, representative of coordinated transcription within cells of the same class, and dynamically differentiated over the course of fetal, childhood and adolescent brain development. Finally, by triangulating evidence across case–control neuroimaging, differential gene expression and genome-wide association studies (GWASs), we demonstrate that components C1 and C2 are specifically associated with autism spectrum disorder (ASD) and C3 with schizophrenia. Although previous studies used the AHBA to derive gene sets correlated with disorder-related MRI phenotypes20,21,22,23,24,25, this disorder-first, ‘imaging transcriptomics’26,27,28 approach is susceptible to identifying genes whose co-location with MRI phenotypes reflects secondary associations or consequences of a disorder, such as behavioral changes (for example, smoking and alcohol use), physical health disorders (for example, obesity and diabetes) or pharmacological treatment29,30,31. What is of most interest for neurodevelopmental disorders is to understand the pathogenic provenance of a clinically diagnosable disorder—to ask ‘what developed differently?’ rather than merely ‘what is different?’. Our approach sought to distinctively address the question of what ‘develops differently’ based on an understanding of ‘normal development’, by linking genetic risks and atypical phenotypes to a generalizable transcriptional architecture of healthy brain development.

Results

Three components pattern cortical gene expression

We first applied PCA to the entire AHBA dataset of six adult brains2. Microarray measurements of relative mRNA levels were processed to represent mean expression of approximately 16,000 genes at each of the 180 regions of the left hemispheric cortex defined by the HCP-MMP parcellation32,33,34 (Methods). We initially found that higher-order components (C2 and C3) estimated by PCA of the resulting {180 × 16,000} data matrix were not robust to sampling variation of the six donor brains, with low generalizability, g, compared to C1: gC1 = 0.78, gC2 = 0.09, gC3 = 0.14 (Methods). However, two data processing improvements were found to enhance the generalizability of higher-order components. First, we optimized the tradeoff involved in excluding noisy data—by filtering spatially inconsistent genes (with low differential stability35) and under-sampled brain regions—while seeking to maximize the anatomic and genomic scope of the data matrix (Extended Data Fig. 1). Second, we used the nonlinear dimension reduction technique of diffusion map embedding (DME), instead of linear PCA, to identify coordinated gene expression patterns from the matrix. DME is robust to noise and more biologically plausible than PCA in this context because of its less strict orthogonality constraints (Methods). We found that, although PCA and DME both identified the same components from the filtered gene expression matrix (Extended Data Fig. 1d), using DME was necessary to achieve high generalizability, g, while also retaining sufficient genes for downstream enrichment analyses.

We applied DME to the {137 × 7,937} filtered AHBA data matrix comprising the expression of the 50% most stable genes measured in the 137 cortical areas with data available from at least three brains. The generalizability of the first three components was substantially increased—gC1 = 0.97, gC2 = 0.72, gC3 = 0.65—whereas the generalizability of even higher-order components remained low—for example, gC4 = 0.28 (Fig. 1a). We found that the cortical maps of C2 and C3 derived from DME on filtered data were more spatially smooth than the corresponding PCA-derived maps on unfiltered data (Fig. 1b), consistent with the interpretation that higher generalizability indicates less contamination by spatially random noise. C1–C3 were also robust to variations in parameters for processing the AHBA, including choice of parcellation template (Extended Data Fig. 2). Finally, the transcriptional patterns represented by C1–C3 in the AHBA dataset were reproducible in an independent PsychENCODE dataset comprising bulk RNA-seq measurements of gene expression at 11 cortical regions from n = 54 healthy controls36 (regional correlation: rC1 = 0.85, rC2 = 0.75, rC3 = 0.73; Extended Data Fig. 3 and Supplementary Table 5).

Fig. 1: Three generalizable components of human cortical gene expression were enriched for biological processes, cytoarchitecture and cognitive capacity.
figure 1

a, To identify robust components of cortical gene expression, we split the six-brain AHBA dataset into two disjoint triplets of three brains, applied PCA to each triplet and correlated the resulting matched components (C1, C2, C3…) (Methods). For each component, the median absolute correlation over all 10 permutations of triplet pairs was a proxy for its generalizability, g. Using PCA and previously published best practices for processing the AHBA dataset33,34, generalizability decreased markedly beyond the first component: gC1 = 0.78, gC2 = 0.09, gC3 = 0.14. Using DME with the top 50% most stable genes, and the 137 regions with data available from at least three brains, the generalizability of the first three components substantially increased: gC1 = 0.97, gC2 = 0.72, gC3 = 0.65. b, Cortical maps of brain regional scores of components C1–C3 estimated by DME on the filtered AHBA dataset displayed smooth spatial gradients (right; Moran’s I 0.48, 0.58 and 0.21 for C1–C3, respectively), unlike those of PCA on the unfiltered data (left; Moran’s I 0.50, 0.09 and 0.07). c, GO biological process enrichments for C1–C3 showed that the number of significant enrichments was greater for higher-order components, illustrating that they were more biologically specific. C2-positive genes were enriched for metabolism, whereas C2-negative genes were enriched for regulatory processes. C3-positive genes were enriched for synaptic plasticity and learning, whereas C3-negative genes were enriched for immune processes. d, C1–C3 were distinctively enriched for marker genes of six cortical layers and white matter (WM)37. e, C1–C3 were also distinctively enriched for marker genes of cell types and synapses44. f, All three components were significantly enriched for genes mapped to common variants associated with educational attainment in previous GWAS data39. g, C2 and C3 (but not C1) were significantly enriched for genes mapped to common variation in intelligence and cognition across four independent GWAS studies40,41,42,43. For dg, significance was computed by two-sided permutation tests (Methods) and FDR-corrected across all tests in each panel; *P < 0.05, **P < 0.01, ***P < 0.001 .

The first three DME components, C1–C3, explained 38%, 10% and 6.5%, respectively, of the total variance of the filtered AHBA dataset (Methods). The proportion of variance explained was related to the number of genes that were strongly weighted (absolute correlation |r | ≥ 0.5) on each component: 4,867 genes (61%) were strongly weighted on C1, 967 genes (12%) on C2 and 437 genes (5.5%) on C3 (Supplementary Fig. 2). The three components also had distinct axial alignments in anatomical space, and the co-expression network of cortical regions displayed clear anatomical structure even when the highest-variance C1 component was regressed out (Extended Data Fig. 4). These findings demonstrate that these three expression patterns shared across hundreds to thousands of genes are likely to be biologically relevant.

To interpret the DME-derived components in more detail, we first used enrichment analyses of the weights of the 7,973 genes on each component (Methods). Many more Gene Ontology (GO) biological process terms were significantly enriched (with false discovery rate (FDR) = 5%) for C2 (59 GO terms) and C3 (111 GO terms) than for C1 (15 GO terms) (Fig. 1c).

Although C1 was enriched for relatively few, functionally general biological processes, it precisely matched the first principal component previously reported (r = 0.96)8. The same interneuron marker genes (SST, PVALB, VIP and CCK) and glutamatergic neuronal genes (GRIN and GABRA) were strongly weighted with opposite signs (positive or negative) on C1 (Supplementary Fig. 3).

For genes positively weighted on C2, 23 of 36 enrichments were for metabolic processes, and, for negatively weighted genes, 19 of 23 enrichments were for epigenetic processes (Fig. 1c and Supplementary Table 2). Whereas, for genes positively weighted on C3, 19 of 27 enrichments were related to synaptic plasticity or learning, and, for negatively weighted genes, 33 of 84 enrichments involved the immune system. We further analyzed enrichment for genes identified as markers of specific cortical layers37 (Fig. 1e) and cell types38 (Fig. 1f) and, in each case, observed distinct enrichment profiles for C1–C3. For example, genes positively weighted on C3 were enriched for marker genes of neurons, synapses and cortical layers 2 and 3 (L2 and L3), whereas genes negatively weighted on C3 were enriched for glial (especially oligodendroglial) marker genes.

We also explored the biological relevance of the three components by enrichment tests for genes associated with variation in adult cognitive capacity. We found that all three components, C1–C3, were enriched for genes significantly associated with educational attainment39 (Fig. 1f). Across four independent GWASs of intelligence and cognition40,41,42,43, genes strongly weighted on C1 were not significantly enriched, but genes negatively weighted on C2 were enriched for genetic variants associated with intelligence in three of the four studies, and genes positively weighted on C3 were enriched for genes identified by all four previous GWASs of intelligence (Fig. 1g).

Neuroimaging maps align to three transcriptional components

Previous work linked gene transcription to a multimodal S-A axis10 of brain organization, defined as the composite of 10 brain maps, comprising C1 and nine other MRI or positron emission tomography (PET) neuroimaging maps that were selected to differentiate sensorimotor and association cortices. We first aimed to build on this work by analyzing the correlation matrix of the same set of nine brain maps together with the three transcriptional components derived from DME of the filtered AHBA dataset. Data-driven cluster analysis of this {12 × 12} correlation matrix identified three clusters, each including one of the orthogonal transcriptional components (Fig. 2a and Methods). C1 was clustered together with two MRI maps: the T1w/T2w myelination marker44 and cortical thickness45; C2 was clustered with five maps: aerobic glycolysis46, cerebral blood flow47, cortical expansion in humans relative to non-human primates18, inter-areal allometric scaling48 and external pyramidal cell density49; and C3 was clustered with two maps: the principal gradient of functional MRI (fMRI) connectivity17 and first principal component of cognitive terms meta-analyzed by Neurosynth50. Although some maps were specifically aligned to one component—for example, aerobic glycolysis rC2 = 0.66 (Pspin = 0.004, FDR < 5%)—others were moderately correlated with multiple transcriptional components—for example, for cerebral blood flow: rC1 = 0.25, rC2 = 0.28, rC3 = 0.33. This clustering analysis suggests that it is overly parsimonious to align all nine neuroimaging phenotypes with just one transcriptional component (C1) as part of a singular S-A cortical axis.

Fig. 2: Neuroimaging and macroscale maps of brain structure, function and development were distinctively co-located with three components of cortical gene expression.
figure 2

a, Correlation matrix of intrinsic transcriptional components C1–C3 together with the nine neuroimaging-derived and physiologically derived maps that Sydnor et al.10 combined with C1 to define S-A axis of brain organization. Many of the maps were not highly correlated to each other (median |r| = 0.31), and data-driven clustering of the matrix revealed three distinct clusters around each of the mutually orthogonal transcriptional components C1–C3, demonstrating that all three components are relevant for understanding macroscale brain organization. b, Distributions of regional scores of C1–C3 in histologically defined regions of laminar cytoarchitecture51. C1 distinguished idiotypic (P = 0.005) and paralimbic (P = 0.002) regions, whereas C3 distinguished idiotypic (P = 0.002) and heteromodal (P = 0.01) regions. *P < 0.05, FDR-adjusted two-sided permutation test as the percentile of the mean z-score relative to null spin permutations, with adjustment for multiple comparisons across all 12 tests. c, Degree of fMRI connectivity52,53 was significantly aligned to C1 (r = 0.78, Pspin < 0.001). Blue/yellow highlighted points correspond to idiotypic/paralimbic cytoarchitectural regions as in b. d, MEG-derived theta power54 was significantly aligned to C2 (r = 0.78, Pspin = 0.002). e, Regional change in myelination over adolescence55,56 was significantly aligned to C3 (r = 0.43, Pspin = 0.009). Blue/red highlighted points correspond to idiotypic/heteromodal cytoarchitectural regions as in b. In c and d, *P < 0.05, **P < 0.01, ***P < 0.001, FDR-corrected two-sided spin-permutation test, with corrections for multiple comparisons of all maps in c and d being compared with all of C1–C3.

We also found that the three transcriptional components were associated with a wider range of cellular, functional and developmental phenotypes than the nine neuroimaging maps above and that these associations were again distinct for the three components. For example, at cellular scale, histologically defined regions of laminar cytoarchitectural differentiation51 were co-located with C1 and C3 but not with C2 (ANOVA, P < 0.001; Fig. 2b). In fMRI and magnetoencephalography (MEG) data, we found that weighted nodal degree of cortical regions in an fMRI network52,53 was strongly correlated with C1 (rC1 = 0.78, Pspin < 0.001, FDR = 5%; Fig. 2c) but not with C2 or C3 (rC2 = −0.01, rC3 = 0.00); across all canonical frequency intervals of MEG data54, an FDR-significant association was observed between theta band (4–7 Hz) oscillations and C2 (rC2 = 0.78, Pspin = 0.002, FDR = 5%; Fig. 2d) but not C1 or C3 (rC2 = −0.18, rC3 = −0.02) (see Extended Data Fig. 5 for other MEG results). In addition, in support of the hypothetical prediction that adult brain transcriptional programs are neurodevelopmentally relevant, we found that a previous map of adolescent cortical myelination, as measured by change in magnetization transfer between 14 years and 24 years (ΔMT)55,56, was significantly co-located with C3 (rC3 = 0.43, Pspin = 0.009; Fig. 2e) but not with C1 or C2 (rC2 = 0.17, rC3 = 0.15).

C1–C3 are distinctly developing intracellular programs

We next used two additional RNA-seq datasets to investigate the consistency of AHBA-derived components with gene co-expression in single cells—for example, neurons or glia—and to explore the developmental phasing of gene transcription programs represented by C1–C3.

First, for single-cell RNA-seq data comprising 50,000 nuclei sampled from five cortical regions of three donor brains57, the total weighted expression of the C1–C3 gene weights in each sample was computed separately for genes positively and negatively weighted in each component (Methods). We reasoned that if the components derived from bulk tissue microarray measurements in the AHBA dataset were merely reflective of regional differences in cellular composition—for example, neuron–glia ratio—then genes weighted positively and negatively on each component should not have anti-correlated expression across cells of the same class. However, we observed that genes weighted positively and negatively on the same component had strongly anti-correlated expression at the single-cell level (Fig. 3a), whereas genes that were positively and negatively weighted on different components were not anti-correlated (Supplementary Fig. 5). The anti-correlation of genes positively and negatively weighted on C1 or C2 was stronger within each class of cells than across multiple cell classes, and was even stronger when the single-cell data were stratified by subclasses of cells in specific cortical layers—for example, L2 VIP-expressing interneurons (Fig. 3a, inset). By contrast, for C3, the anti-correlation of positively and negatively weighted genes was stronger across cell classes than within each class, although there was still evidence for significantly coupled expression across cells of the same class or subclass.

Fig. 3: Transcriptional components represent intracellular coordination of gene expression programs with distinct developmental trajectories.
figure 3

a, For each of approximately 50,000 single-cell RNA-seq samples, the weighted average expression of the negatively weighted genes of each AHBA component C1–C3 is plotted against that of the positively weighted genes (Methods). Samples are colored by cell type, demonstrating that genes positively and negatively weighted on C1–C3 have correlated expression within each major class of brain cells. Astro, astrocytes; Endo, endothelial cells; Micro, microglia; N-Ex, excitatory neurons; N-In, inhibitory neurons; Oligo, oligodendrocytes; OPC, oligodendrocyte precursor cell. Inset, a subset of samples from L2 VIP interneurons, illustrating that C1–C3 weighted genes were transcriptionally coupled even within a fine-grained, homogeneous group of cells. b, Cortical maps representing the regional scores of components C1–C3 for each of 11 regions with transcriptional data available in the BrainSpan cohort of adult brains (left) and C1–C3 component scores for the matching subset of regions in the AHBA (right). c, Scatter plots of matched regional C1–C3 scores from b, demonstrating that the three transcriptional components defined in the AHBA had consistent spatial expression in BrainSpan. d, Correlations between AHBA C1–C3 scores and BrainSpan C1–C3 scores (as in c) for each of three age-defined subsets of the BrainSpan dataset. C1 and C2 component scores were strongly correlated between datasets for all age subsets, whereas C3 component scores were strongly correlated between datasets only for the 18–40-year subset of BrainSpan. This indicates that C1 and C2 components were expressed in nearly adult forms from the earliest measured phases of brain development, whereas C3 was not expressed in adult form until after adolescence. e, Developmental trajectories of brain gene expression as a function of age (−0.5 years to 40 years; x axis, log scale) were estimated for each gene (Methods) and then averaged within each decile of gene weights for each of C1–C3; fitted lines are color-coded by decile. Genes weighted positively on C3 were most strongly expressed during adolescence, whereas genes weighted strongly on C1 or C2 were most expressed in the first 5 years of life. Dots above the x axis represent the postmortem ages of the donor brains used to compute the curves. RPKM, reads per kilobase per million mapped reads.

Second, to explore the developmental trajectories of the transcriptional components, we used BrainSpan, an independent dataset where gene expression was measured by RNA-seq of bulk tissue samples from 4–14 cortical regions for each of 35 donor brains ranging in age from −0.5 years (mid-gestation) to 40 postnatal years6. We first asked if the gene weights for each of the components derived from the AHBA dataset would exhibit similar spatial patterns in the BrainSpan dataset. We projected the C1–C3 gene weights from the AHBA onto the subset of adult brains (18–40 years, n = 8) in BrainSpan (Fig. 3b and Methods) and found that the resulting cortical maps of component scores in the BrainSpan data were highly correlated with the corresponding cortical maps derived from the AHBA dataset (rC1 = 0.96, rC2 = 0.88, rC3 = 0.84; Fig. 1d). This indicated that the three components defined in the AHBA were generalizable to the adult brains in the BrainSpan dataset (for a full replication of C1–C3 in independent data, see Extended Data Fig. 3). We then similarly compared the cortical component maps derived from the AHBA dataset to the corresponding maps calculated for subsets of the BrainSpan cohort from two earlier developmental stages (prebirth, n = 20, and birth to 13 years, n = 14). We observed that, for C1 and C2, AHBA component scores were almost as highly correlated with BrainSpan component scores in fetal (prebirth) and childhood (birth to 13 years) brains as in the adult (18–40 years) brains (birth to 13 years, rC1 = 0.87, rC2 = 0.91; prebirth, rC1 = 0.74, rC2 = 0.66; Fig. 3d). However, C3 scores in the AHBA dataset were not so strongly correlated with C3 scores in the fetal and childhood subsets of the BrainSpan dataset (prebirth, rC3 = 0.29; birth to 13 years, rC3 = 0.47). These results suggest that C3 may only emerge developmentally during adolescence, whereas the C1 and C2 have nearly-adult expression from the first years of life.

We tested this hypothesis by further analysis of the BrainSpan dataset, modeling the nonlinear developmental trajectories of each gene over the age range of −0.5 years to 40 years (Methods) and then averaging trajectories over all genes in each decile of the distributions of gene weights on each of the three components. We found that genes in the top few deciles of C3 gene weights became more strongly expressed during and after adolescence, whereas genes in the top few (C2) or bottom few (C1) deciles of gene weights on the other two components were most strongly expressed in the first 5 years of life and then declined or plateaued during adolescence and early adulthood (Fig. 3e). These results confirmed that components C1–C3 have distinct neurodevelopmental trajectories, with genes positively weighted on C3 becoming strongly expressed after the first postnatal decade.

Autism and schizophrenia have specific links to C1/C2 and C3

Finally, we explored the clinical relevance of C1–C3 by analysis of previous neuroimaging, differential gene expression and GWAS associations for ASD, major depressive disorder (MDD) and schizophrenia.

First, we leveraged the BrainChart neuroimaging dataset of more than 125,000 MRI scans58, in which atypical deviation of regional cortical volumes in psychiatric cases was quantified by centile scores relative to the median growth trajectories of normative brain development over the lifecycle (Fig. 4a). Using the Desikan–Killiany parcellation of 34 cortical regions necessitated by alignment with this dataset (Methods), we found that cortical shrinkage in ASD was significantly associated with both C1 and C2 (rC1 = 0.49, Pspin = 0.0002, FDR < 5%; rC2 = −0.28, Pspin = 0.0006, FDR < 5%), whereas shrinkage in schizophrenia was specifically associated with C3 (rC3 = 0.43, Pspin = 0.008, FDR < 5%) (Fig. 4b).

Fig. 4: Genetics, transcriptomics and neuroimaging of autism and schizophrenia were consistently and specifically linked to normative transcriptional programs.
figure 4

a, First row: cortical volume shrinkage in ASD, MDD and schizophrenia (SCZ) cases. Red indicates greater shrinkage, computed as z-scores of centiles from normative modeling of more than 125,000 MRI scans. Second row: AHBA components projected into the same Desikan–Killiany parcellation. b, Spatial correlations between volume changes and AHBA components, C1–C3. Significance was tested by two-sided FDR-adjusted spatially autocorrelated spin permutations and corrected for multiple comparisons. c, Enrichments in C1–C3 for consensus lists of DEGs in postmortem brain tissue of donors with ASD, MDD and SCZ compared to healthy controls (Methods). Significance was assessed as percentile of mean weight of DEGs in each component relative to randomly permuted gene weights and corrected for multiple comparisons; two-sided FDR-adjusted P values. d, Enrichment in C1–C3 for GWAS risk genes for ASD66, MDD67 and SCZ68, tested for significance as in c, demonstrating alignment with both spatial associations to volume changes and enrichments for DEGs. e, Venn diagrams showing the lack of overlap of DEGs and GWAS risk genes reported by the primary studies summarized in c and d. f, DEGs and GWAS risk genes for each disorder were filtered for only C3-positive genes and then tested for enrichment with marker genes for each cortical layer37. Significance was tested by one-sided Fisher’s exact test and corrected for multiple comparisons across all 42 tests. C3-positive DEGs and GWAS genes for SCZ (but not ASD or MDD) were both enriched for L2 and L3 marker genes, despite the DEGs and GWAS gene sets having nearly no overlap for each disorder (see Extended Data Fig. 6 for more detail). g, Convergent with L2/L3 enrichment in the C3-positive SCZ-associated DEGs and GWAS genes, a cortical map of supragranular-specific cortical thinning in SCZ72 was significantly and specifically co-located with C3 (r = 0.55, two-sided spin-permutation P = 0.002); each point is a region, and color represents C3 score. *P < 0.05, **P < 0.01, ***P < 0.001.

Second, we compiled consensus lists of differentially expressed genes (DEGs) from RNA-seq measurements of dorsolateral prefrontal cortex (DLPFC) tissue in independent studies of ASD36,59,60, MDD61 and schizophrenia60,62,63,64,65 (Methods). We found that genes differentially expressed in ASD were specifically enriched in both C1 and C2 (but not in C3), whereas genes differentially expressed in schizophrenia were enriched in C3 (but not in C1 or C2), and genes differentially enriched in MDD were enriched only in C1 (Fig. 4b). Corroborating the enrichments of ASD DEGs, case–control differences in expression at 11 cortical regions for ASD cases compared to healthy controls showed that the positively weighted genes on C1 and C2 were significantly less strongly expressed in ASD cases than in controls (Extended Data Fig. 3).

Third, using data from the most recent GWASs of ASD66, MDD67 and schizophrenia68, we found that genetic variants significantly associated with ASD were enriched in both C1 and C2 (but not in C3), whereas genes associated with schizophrenia were enriched in C3 (but not in C1 or C2) (Fig. 4d). Genes associated with MDD were not significantly enriched in any transcriptional component. These associations were replicated when using alternative methods (MAGMA69 and H-MAGMA70) to test the association between GWAS-derived P values for the association of each gene with ASD, MDD or schizophrenia and the C1–C3 gene weights without requiring an explicit prioritization of GWAS-associated genes (Supplementary Fig. 6). This pattern of results for autism and schizophrenia GWAS associations evidently mirrored the pattern of previous results from analysis of case–control neuroimaging (Fig. 4b) and differential gene expression studies (Fig. 4c), with ASD consistently linked to components C1 and C2 and schizophrenia consistently linked to C3.

Notably, this consistency of association between disorders and specific transcriptional components was observed despite minimal overlap between the DEGs and GWAS risk genes identified as significant by the primary studies of each disorder71 (Fig. 4e). However, motivated by the association of C3 with regions of greatest laminar differentiation (Fig. 2b), we found that the subsets of the schizophrenia-associated DEG and GWAS gene sets that were positively weighted on C3 were both significantly enriched for marker genes of L2 and L3 (Fig. 4g and Extended Data Fig. 6). These shared laminar associations between the non-overlapping DEG and GWAS gene sets were present only when subsetting to C3-positive genes and were specific to schizophrenia (that is, C3-positive subsets of ASD and MDD genes did not show the same L2/L3 enrichments). Convergent with C3 revealing an L2/L3 association in schizophrenia-associated genes from DEG and GWAS gene sets, we found that the cortical map of C3 was significantly co-located with an MRI-derived map of specifically supragranular, L2/L3-predominant thinning in schizophrenia72 (rC3 = 0.55, Pspin = 0.002, FDR < 1%; Fig. 4g).

Discussion

Our results offer a new perspective on how the brain’s macroscale organization develops from the microscale transcription of the human genome. Through optimized processing of the AHBA and replication in PsychENCODE, we show that the transcriptional architecture of the human cortex comprises at least three generalizable components of coordinated gene expression. The two higher-order components (C2 and C3) had not previously been robustly demonstrated, although the initial AHBA paper identified similar components to C1 and C2 by applying PCA to one of the six AHBA brains and filtering for only 1,000 genes2 (Supplementary Fig. 1). In the present study, we derived C2 and C3 from all six AHBA brains and show here that they each represent the coordinated expression of hundreds of genes (Supplementary Fig. 2). Broadly, the C2 genes were enriched for ‘metabolic’ and ‘epigenetic’ processes, whereas the C3 genes were enriched for ‘synaptic’ and ‘immune’ processes (Fig. 1c). Both higher-order components were significantly enriched for genes associated with intelligence and educational attainment (Fig. 1f,g), indicating their relevance to the brain’s ultimate purpose of generating adaptive behavior. The brain maps corresponding to each of the components were also distinctively co-located with multiple neuroimaging or other macroscale brain phenotypes (Fig. 2). These co-locations were often convergent with the gene enrichment results, triangulating evidence for C2 as a metabolically specialized component and for C3 as a component specialized for synaptic and immune processes underpinning adolescent plasticity (Table 1). Together, these convergent results expand on the proposal of a single S-A axis10,73 by demonstrating that macroscale brain organization emerges from multiple biologically relevant transcriptional components.

Table 1 Summary of convergent results on the biological and clinical relevance of three human brain transcriptional programs

The discovery of these biologically relevant, higher-order transcriptional components in the AHBA dataset raised further questions. (1) Do the components reflect coordinated gene expression within cells or only variation in cell composition? (2) When do the components emerge during brain development? (3) How do they intersect with neurodevelopmental disorders? We addressed these questions using additional RNA-seq datasets (Supplementary Table 5). First, we found that genes positively or negatively weighted on the components derived from the AHBA bulk tissue samples had consistently coupled co-expression across RNA-seq measurements in single cells—for example, individual neurons and glia (Fig. 3a). This indicated that C1–C3 represent transcriptional programs coordinated at the intracellular level, not merely regional variation in the proportion of different cell types. Second, we found that C1–C3 have differentially phased developmental trajectories of expression—for example, that the positive pole of C3 becomes strongly expressed only during adolescence, convergent with its spatial co-location with a map of adolescent cortical myelination (Fig. 3b,c). Finally, we established that these transcriptional programs are not only critical for healthy brain development but, as might be expected, are also implicated in the pathogenesis of neurodevelopmental disorders (Fig. 4).

The pattern of results for disorders was strikingly convergent across multiple data modalities: C1 and C2 were both enriched for genes implicated by both GWAS and DEG data on ASD, whereas C3 was specifically enriched for genes implicated by both GWAS and DEG data on schizophrenia (Table 1). We observed a similar pattern of significant co-location between C1–C3 maps and MRI phenotypes: developmentally normalized scores on reduced cortical volume in ASD were correlated with maps of C1 and C2 and, for schizophrenia, with the map of C3 (Fig. 4a,b). In contrast, there was no evidence for enrichment of C1–C3 by genes associated with risk of Alzheimer’s disease74 (Supplementary Fig. 6). An intuitive generalisation of these results is that the developmental processes that give rise to these three components of gene expression in the healthy adult brain are pathogenically more relevant for neurodevelopmental disorders than for neurodegenerative disease.

Overall, our results were strongly supportive of the motivating hypothesis that the transcriptional architecture of the human cortex represents developmental programs crucial both to the brain’s healthy organization and to the emergence of neurodevelopmental disorders. For example, when interpreting C3 as a transcriptional program mediating adolescent plasticity (Table 1), our finding that C3 represents coupled transcription of synapse-related and immune-related genes within cells (Fig. 3a) is consistent with previous work indicating that the neuronal expression of immune-related, typically glial genes can play a mechanistic role in synaptic pruning75 and, vice versa, that neuronal genes associated with synapse and circuit development can also be expressed in glial cells76. Although atypical synaptic pruning has long been hypothesized to be a mechanistic cause of schizophrenia77,78,79, previous results on the biology of schizophrenia have shown limited consistency, both among the primary data modalities of GWAS, postmortem expression and neuroimaging80,81 and even among DEG studies71. Here we demonstrate that the C3 transcriptional program offers a unifying link between these disparate previous results. When parsed by the C3-positive genes, the otherwise non-overlapping GWAS and DEG gene sets for schizophrenia display a shared enrichment for supragranular marker genes (Fig. 4e,f), and, convergently, C3 was spatially associated with supragranular-specific thinning in schizophrenia (Fig. 4g). Supragranular layers have dense cortico-cortical connections82 and are expanded in humans relative to other species83,84,85, mature latest in development86, have been linked to intelligence87 and have previously been linked to schizophrenia88,89,90. This triangulation of evidence strongly suggests that the third component of the brain’s gene expression architecture represents the transcriptional program coordinating the normative, neuro-immune processes of synaptic pruning and myelination in adolescence55, such that atypical expression of C3 genes due to schizophrenia genetic risk variants can result in atypical development of supragranular cortical connectivity, leading to the clinical emergence of schizophrenia.

Clearly, there are limits to what can be learned from RNA measurements of bulk tissue samples from six healthy adult brains. In the present study, we explicitly identified the limits of the AHBA dataset by optimizing data processing against an unbiased measure of generalizability, g, which yielded three components. The architecture of human brain gene expression likely involves more than three components; however, our analysis suggests that their discovery will rely on additional high-granularity transcriptional data. In particular, gene expression varies with sex, age, genetics and environment91, so we expect that future data will reveal additional components that are more individually variable and demographically diverse than the three that we characterize here. In addition, the code and data that supplement our results can help future research to leverage our work with the unique AHBA resource.

Methods

AHBA data and donor-level parcellation images

Probe-level gene expression data with associated spatial coordinates were obtained from the Allen Institute website (https://human.brain-map.org), which collected the data after obtaining informed consent from the deceased’s next of kin. HCP-MMP1.0 parcellation images matched to the individual native MRI space of each donor brain (n = 6) were obtained from Arnatkevičiūtė et al. (https://figshare.com/articles/dataset/AHBAdata/6852911)33. The use of native donor parcellation images (rather than a standard parcellation image with sample coordinates mapped to MNI space) was chosen as it optimized the triplet generalizability metric (see the following).

AHBA processing parameters

To correctly match AHBA samples to regions in native donor space parcellation images using published processing pipelines, we recommend the use of either (1) abagen version 0.1.3 or greater (for Python)34 or (2) the version of the AHBAprocessing pipeline updated in June 2021 or later (for MATLAB)33.

In the present study, we processed the AHBA with the abagen package, with one modification: we filtered the AHBA samples for only those annotated as cortical samples before subsequent processing steps. This was done such that subcortical and brainstem samples did not influence the intensity filter and probe aggregation steps. This modification was chosen as it optimized the triplet generalizability metric (see the following). The code used to apply the modification is available in the code/processing_helpers.py file at https://github.com/richardajdear/AHBA_gradients.

Other than this modification, abagen was run using the following parameters, which follow published recommendations33 unless otherwise specified:

  • Hemisphere: The right hemisphere samples that are present for two of the six donors were reflected along the midline and processed together with the left hemisphere samples of those donor datasets to increase sample coverage.

  • Intensity-based filter: Probes were filtered to retain only those exceeding background noise (as defined by the binary flag provided with the data by the Allen Institute) in at least 50% of the samples33.

  • Probe aggregation: Probes were aggregated to genes by differential stability, meaning that, for each gene, the probe with the highest mean correlation across donor pairs was used.

  • Distance threshold: Samples were matched to regions with a tolerance threshold of 2 mm, using the voxel-mass algorithm in the abagen package.

  • Sample normalization: Before aggregating over donors, samples were normalized across all genes, using the scaled robust sigmoid method (a sigmoid transformation that is robust to outliers33).

  • Gene normalization: Before aggregating over donors, genes were normalized across all samples, again using the scaled robust sigmoid.

To ensure robustness, the primary analysis of computing components of the AHBA was repeated in a series of sensitivity analyses varying all of the processing parameters above—for example, not mirroring right hemisphere samples to the left hemisphere, different or no intensity filter for genes and different methods for aggregating and normalizing probes. Sensitivity analyses also included running the pipeline with alternative parcellation templates: HCP-MMP1.0 (ref. 32), Schaefer-400 (ref. 96) Desterieux97 and Desikan–Killiany98 (Extended Data Fig. 2).

Gene filtering by differential stability

Genes were filtered for those that showed more similar spatial patterns of expression across the six donors using the metric of differential stability as previously described by Hawrylycz et al.35. For each gene, differential stability was calculated as the average correlation of that gene’s regional expression vector between each donor pair (15 pairs with all six brains or three pairs in the triplets analysis; see below). Genes were ranked by differential stability, and then only the top 50% percent of genes were retained. The 50% threshold was chosen on the basis of a grid search (in combination with the region filter to optimize for generalizability) where the threshold for differential stability was varied between 10% and 100% (Extended Data Fig. 1).

Filtering regions by donors represented

Regions were filtered for those that included samples from at least three of the six AHBA donor brains, which, in the HCP-MMP1.0 parcellation, retained 137 of 180 regions. Note that, in the triplets analysis (see below), this means that only brain regions with samples from all three donors in the triplet were retained. The choice to filter for representation of three of the six donors was made on the basis of a grid search in combination with the differential stability gene filter to optimize for generalizability (Extended Data Fig. 1).

Triplets analysis: disjoint triplet correlation as a proxy for generalizability

To test for generalizability, we separated the six AHBA brains into pairs of disjoint triplets (for example, donor brains 1,2,3 in one triplet and 4,5,6 in another). We applied our full analysis pipeline (including all processing steps—for example, probe aggregation, normalization and filters) independently to each of the 20 possible combinations of triplets and correlated the regional scores for each DME or PCA component between each of the 10 disjoint pairs (Pearson’s r). When filtering for consistently sampled regions, the retained regions were different for each triplet of donor brains, so correlations were performed on only the intersection of regions retained in both triplets of each pair.

As the order of principal components can vary across different triplets, we used a matching algorithm in which the full correlation matrix was computed among the top five principal components of both triplets (for example, C1 from triplet A was correlated with each of C1–C5 of triplet B). The highest absolute correlation value in the matrix was then identified as representing two matched components and removed from the matrix, with the process repeated until all components were matched. The components were then ranked by the mean variance explained in each matched pair.

The median absolute correlation across all 10 disjoint triplet pairs represented the generalizability, g, of the AHBA components processed using the given set of parameters. Processing parameters, in particular the filters for regions and donors, were optimized so as to maximize g while retaining as many genes and regions as possible (Extended Data Fig. 1).

Dimension reduction methods

Dimension reduction was performed using both PCA and DME, the latter having been described for use in spatial gradient analysis of brain imaging data by Margulies et al.17. For DME, the normalized cosine function was used as the kernel for the affinity matrix. No sparsity was added, and the alpha parameter was set at 1. These parameters were chosen as they optimized the inter-triplet correlation metric for generalizability. Both PCA and DME methods were implemented using the BrainSpace package99. See Supplementary Methods for further explanation on DME and its benefits over PCA and other alternatives (for example, independent component analysis).

Component gene weights

For each component, gene weights were computed as the Pearson’s correlation of each gene’s individual spatial expression vector with the regional scores of the component. For PCA, these correlations are equivalent to the PCA loadings (eigenvectors) multiplied by the square root of the variance explained by the component (eigenvalues).

Variance explained

For PCA, variance explained is given directly by the squared eigenvalues of the singular value decomposition. For DME, eigenvalues do not represent variance explained as the gene expression matrix is first converted to an affinity matrix using a kernel (here, the normalized cosine). Therefore, variance explained was calculated as the difference in the total variance of the region-by-gene expression matrix before and after regressing the matrix on each component’s region scores.

That is, defining the residual regional expression vector of gene g after regressing out i components as \({\bf {e}}_{g,i}\), the total variance \({V}_{i}\) of the residualized region-by-gene expression matrix is

$$\begin{array}{l}{V}_{i}=\sum _{g}{\rm{Var}}\left({\bf{e}}_{g,i}\right)\end{array}$$

and, for each component \({C}_{i}\), variance explained \({{\rm{VE}}}_{i}\) is given by

$${{\rm{VE}}}_{i}={V}_{i-1}-{V}_{i}$$

GO enrichment analysis for biological processes

Biological process enrichments of the gene weights for each component were computed using the ‘proteins with values/ranks’ function of the online software STRING100, which tests whether the mean weight of each annotated gene list is significantly higher or lower than random permutations of the same gene weights (the ‘aggregate fold change’ method100,101) and includes a Benjamini–Hochberg adjustment of the FDR.

The aggregate fold change method was chosen as it does not require thresholding the gene weights of the components to define ‘target’ versus ‘background’ gene lists (as in, for example, Fisher’s exact test). That is, rather than setting a threshold for which genes are ‘in’ or ‘out’ of each component, we took the weighted gene list where all genes can have some contribution to each component and, for each component, tested whether each GO gene list was, in aggregate, more positively or negatively weighted than chance.

Layer and cell type enrichment analyses

The gene lists for cortical layer marker genes were obtained from published analyses of laminar enrichment in spatial transcriptomic data from human postmortem tissue in the DLPFC37 (columns Q–W of supplementary table 4b in Maynard et al.37).

Cell type gene lists were obtained from Seidlitz et al.22, who compiled lists of significantly differentially expressed genes from five independent single-cell RNA-seq studies38,102,103,104,105. The gene list for synaptic marker genes was the unfiltered gene list from SynaptomeDB106.

All enrichments for layers and cell types were computed by the same aggregate fold change method101 as in the STRING software100, whereby the mean gene weight of each gene list was computed for both the true set of gene weights of each component and for 5,000 random permutations of the weights. The z-scores and permutation test P values for significance testing of enrichment were corrected for multiple comparisons with the Benjamini–Hochberg FDR.

GWAS enrichment analyses for educational attainment and intelligence

Genes associated with cognitive capacity by GWAS were obtained from:

  • Lee et al.39, supplementary table 7 (educational attainment)

  • Davies et al.40, supplementary table 6

  • Savage et al.42, supplementary table 15

  • Hill et al.41, supplementary table 5

  • Hatoum et al.43, supplementary table 16

Enrichment tests were performed by the aggregate fold change method101, as above.

Neuroimaging and other macroscale brain maps (Fig. 2)

Neuroimaging and other macroscale maps were obtained as follows:

  • The nine neuroimaging and macroscale maps in the clustering analysis (Fig. 2a) were obtained from the neuromaps package107 and are also available in Sydnor et al.10.

  • The regions of cytoarchitectural differentiation (Fig. 2b) were obtained from Paquola et al.108 and averaged into the HCP-MMP parcellation using the neuromaps package107.

  • The map of fMRI degree (Fig. 2c) was obtained from Paquola et al.49 and was originally computed from the HCP S900 release109.

  • The maps of MEG power bands (Fig. 2d and Extended Data Fig. 5) were obtained from the neuromaps package107.

  • The map of adolescent change in cortical myelination was obtained from Váša et al.56.

All maps were aggregated into the HCP-MMP parcellation and are provided in Supplementary Table 3.

Spatial associations between maps and the transcriptional components were computed by Pearsonʼs correlations and tested for significance using spin permutation tests (5,000 spins) by the Cornblath method110, leveraging tools from neuromaps107, and tested for significance with FDR correction for multiple testing.

For the regions of cytoarchitectural differentiation, the mean component scores in each architectonic class were tested for differences between class mean scores using ANOVA against spin-permuted null models, followed by correction for FDR. The associations between individual cytoarchitectural regions and each component were computed by the z-score of the mean component score in each region normalized by a spin permutation distribution of the regional mean component score with significance testing corrected for FDR.

Single-cell co-variation analysis (Fig. 3a)

Single-cell RNA-seq data were obtained from the Allen Cell Types Database (https://portal.brain-map.org/atlases-and-data/rnaseq)57.

Single-cell gene expression was filtered for the 7,873 genes in the optimally filtered AHBA dataset. To perform the analysis in Fig. 3a, the positive and negative gene weights were separated for each of C1–C3, and the dot product was taken with the gene expression matrix of single-cell samples. This produced a vector of six numbers, representing the weighted total expression of C1+, C1, C2+, C2, C3+ and C3 genes, respectively, for each of the 50,000 single-cell samples.

That is, given the gene expression vector sj of each single-cell sample j, we computed t he total weighted positive and negative expression \({ {{\mathrm{s}}^{+}}_{{\mathrm{j}},{\mathrm{Ci}}}}\) and \({ {{\mathrm{s}}^{-}}_{{\mathrm{j}},{{\mathrm{Ci}}}}}\) from the C1–C3 gene weights as:

$${{\mathrm{s}}^{+}}_{{\mathrm{j}},{{\mathrm{Ci}}}}={{\mathrm{s}}}_{{\mathrm{j}}}\cdot {{{\mathrm{u}}}^{+}}_{{{\mathrm{Ci}}}}\quad{\mathrm{and}}\quad{\mathrm{{{s}}}^{-}}_{\mathrm{{j}},{{\mathrm{Ci}}}}={{\mathrm{s}}}_{{\mathrm{j}}}\cdot {{{\mathrm{u}}}^{-}}_{{{\mathrm{Ci}}}}$$

where \(\begin{array}{l}{ {{\mathrm{u}}^{+}}_{{{\mathrm{Ci}}}}}=\max \left\{{{\mathrm{u}}}_{{{\mathrm{Ci}}}},{\mathrm{0}}\right\}\end{array}\) and \(\begin{array}{l}{ {{\mathrm{u}}^{-}}_{{{\mathrm{Ci}}}}}=\min \left\{{{\mathrm{u}}}_{{{\mathrm{Ci}}}},{\mathrm{0}}\right\}\end{array}\).

BrainSpan developmental gene expression processing (Fig. 3b–d)

BrainSpan data were obtained directly from the Allen Institute website6 (http://brainspan.org) and processed as follows:

  1. 1.

    The 11 cortical regions in the BrainSpan data were manually matched to the HCP-MMP1.0 parcellation regions according to the descriptions in the BrainSpan documentation. This mapping is provided at https://github.com/richardajdear/AHBA_gradients.

  2. 2.

    Exon-level expression data were filtered for only the matched BrainSpan regions.

  3. 3.

    Donor brains from which fewer than four regions were sampled were dropped.

  4. 4.

    Within each donor, expression of each gene was z-normalized over regions.

  5. 5.

    Donors were aggregated into three age ranges (pre-birth, birth to 13 years and 18–40 years), and expression was averaged for each gene.

AHBA-BrainSpan developmental consistency analysis (Fig. 3b–d)

Consistency between the AHBA components and BrainSpan was evaluated as follows:

  1. 1.

    Processed BrainSpan data were filtered for only the 7,973 genes retained in the filtered AHBA dataset (top 50% by differential stability; see above).

  2. 2.

    The dot product of the gene weights for C1–C3 were taken against the BrainSpan data, resulting in ‘BrainSpan scores’ for each of C1–C3, for each of the 11 BrainSpan regions, at each age range (pre-birth, birth to 13 years and 18–40 years).

  3. 3.

    In each of the 11 BrainSpan regions, ‘AHBA scores’ were computed as the mean of the matching HCP-MMP region scores from the original C1–C3 maps derived from the AHBA.

  4. 4.

    The ‘BrainSpan scores’ and ‘AHBA scores’ were correlated over the 11 BrainSpan regions (Pearson’s r) for each of C1–C3 and for each age bucket of the BrainSpan data.

As further clarification: given gene weights \({\mathrm{u}}_{\mathrm{i}}\) for AHBA component \({C}_{i}\) and the vector of expression over genes \({\mathrm{b}}_{\mathrm{j}}\) for each BrainSpan sample j (with a given age and region), the ‘BrainSpan score’ is

$${\mathrm{y}}_{\mathrm{j}},{\mathrm{i}}={\mathrm{b}}_{\mathrm{j}}\cdot {\mathrm{u}}_{\mathrm{i}}$$

and the consistency was tested as the correlation across the matched regions of the AHBA scores x and the mean of the BrainSpan scores \(\underline{y}\) of BrainSpan donors in each age range.

BrainSpan developmental trajectory modeling (Fig. 3e)

The developmental trajectories of each decile of C1–C3 were computed as follows:

  1. 1.

    The ages in the BrainSpan data were converted to post-conception days on a log10 scale.

  2. 2.

    For each gene, a generalized additive model was fitted using the GLMGam function in the statsmodels Python package with alpha = 1 and 12 3rd-degree basis splines as a smoothing function (df = 12, degree = 3 in the BSplines function). Sex and brain region were included as covariates.

  3. 3.

    Developmental curves were plotted from the fitted models for each gene, sex and region and then averaged by decile of gene weight for each of C1–C3.

Disorder spatial associations (Fig. 4a,b)

Maps of the regional centile score differences in cortical volume for ASD, MDD and schizophrenia were obtained from the BrainCharts project by Bethlehem et al.58, in which normative models were computed for multiple brain phenotypes across the human lifespan from a harmonized dataset of more than 125,000 total MRI scans (ncontrols = 38,839, nASD = 381, nMDD = 3,861, nSCZ = 315). As these data were in the Desikan–Killiany parcellation, the AHBA components in the HCP-MMP parcellation were mapped to a vertex-level surface map (FreeSurfer’s 41k fsaverage atlas) and then re-averaged into the Desikan–Killiany parcellation. Pearsonʼs correlations with cortical maps of C1–C3 scores were computed, and significance was assessed by spin permutation tests and corrected for FDR across all nine tests (three disorders by three components).

These disorder maps are provided in Supplementary Table 4.

Disorder DEG associations (Fig. 4c)

DEGs (FDR < 5%) from RNA-seq of postmortem brain tissue were obtained from the following case–control studies for each of ASD, MDD and schizophrenia:

  • ASD:

  • Gandal et al.36, supplementary table 3, WholeCortex_ASD_FDR < 0.05

  • Gandal et al.60, supplementary table 1, ASD.fdr < 0.05

  • Parikshak et al.59, supplementary table 2, FDR-adjusted P value, ASD versus CTL < 0.05

    • MDD

  • Jaffe et al.61, supplementary table 2, Cortex_adjPVal_MDD < 0.05

    • Schizophrenia

  • Fromer et al.62, supplementary table 16, FDR estimate < 0.05

  • Gandal et al.60, supplementary table 1, SCZ.fdr < 0.05

  • Jaffe et al.65, supplementary table 9, fdr_qsva <0.05

  • Collado-Torres et al.64, supplementary table 11), adj.P.Val <0.05 & region == ‘DLPFC’

A consensus list of DEGs was compiled for each disorder (except MDD where only one study was included) by including only those genes identified in at least two studies.

Enrichments for these gene sets in each disorder were computed by the aggregate fold change method101—that is, computing the percentile of the mean weight of the DEGs in C1–C3 relative to the 5,000 random permutations of the gene labels.

Disorder-associated genes from GWAS (Fig. 4d)

Genes significantly associated with ASD, MDD and schizophrenia by GWAS were obtained from:

  • ASD: Matoba et al.66, supplementary table 7

  • MDD: Howard et al.67, supplementary table 9

  • Schizophrenia: Trubetskoy et al.68, extended GWAS: https://figshare.com/articles/dataset/scz2022/19426775?file=35775617

    Associations with GWAS were calculated using three methods (Supplementary Fig. 6):

  • Enrichment of the prioritized genes identified in each of the specific studies, using the aggregate fold change method101 as described above.

  • MAGMA69, a regression technique that tests for association between each of the components C1–C3 and the P values for each gene’s association with ASD, MDD or schizophrenia (from corresponding primary GWASs) without requiring a threshold to be applied to the GWAS-derived P values to define a prioritized subset of genes for enrichment analysis. MAGMA additionally accounts for gene length and gene–gene correlations. The COVAR function of MAGMA was used to test for association of the GWAS P values with the C1–C3 gene weights as a continuous variable. For standard MAGMA, a single-nucleotide polymorphism (SNP)-to-gene mapping window of +35 kb to −10 kb was used.

  • H-MAGMA70, an extension of MAGMA where SNP-to-gene mapping is performed using Hi-C chromatin measurements from postmortem brain tissue so as to capture trans-regulatory effects. We used the Hi-C mapping from adult brain DLPFC, available online from the original H-MAGMA authors.

Laminar enrichments shared across DEG and GWAS gene sets (Fig. 4f)

Enrichments for the marker genes of each cortical layer37 were computed for the disorder-associated gene lists from DEGs and GWASs using Fisher’s exact test. These enrichments were computed both with and without filtering for only genes with positive C3 weights.

Schizophrenia supragranular-specific cortical thinning (Fig. 4g)

The MRI-derived map of supragranular cortical thinning in schizophrenia was obtained from Wagstyl et al.72 (n = 90 subjects, n = 46 cases) and parcellated using HCP-MMP1.0 parcellation. Pearson’s correlations were computed with C1–C3, and significance was assessed by spin permutation tests, corrected for FDR.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.