A concerted neuron–astrocyte program declines in ageing and schizophrenia

Human brains vary across people and over time; such variation is not yet understood in cellular terms. Here we describe a relationship between people’s cortical neurons and cortical astrocytes. We used single-nucleus RNA sequencing to analyse the prefrontal cortex of 191 human donors aged 22–97 years, including healthy individuals and people with schizophrenia. Latent-factor analysis of these data revealed that, in people whose cortical neurons more strongly expressed genes encoding synaptic components, cortical astrocytes more strongly expressed distinct genes with synaptic functions and genes for synthesizing cholesterol, an astrocyte-supplied component of synaptic membranes. We call this relationship the synaptic neuron and astrocyte program (SNAP). In schizophrenia and ageing—two conditions that involve declines in cognitive flexibility and plasticity1,2—cells divested from SNAP: astrocytes, glutamatergic (excitatory) neurons and GABAergic (inhibitory) neurons all showed reduced SNAP expression to corresponding degrees. The distinct astrocytic and neuronal components of SNAP both involved genes in which genetic risk factors for schizophrenia were strongly concentrated. SNAP, which varies quantitatively even among healthy people of similar age, may underlie many aspects of normal human interindividual differences and may be an important point of convergence for multiple kinds of pathophysiology.

Human brains vary across people and over time; such variation is not yet understood in cellular terms.Here we describe a relationship between people's cortical neurons and cortical astrocytes.We used single-nucleus RNA sequencing to analyse the prefrontal cortex of 191 human donors aged 22-97 years, including healthy individuals and people with schizophrenia.Latent-factor analysis of these data revealed that, in people whose cortical neurons more strongly expressed genes encoding synaptic components, cortical astrocytes more strongly expressed distinct genes with synaptic functions and genes for synthesizing cholesterol, an astrocyte-supplied component of synaptic membranes.We call this relationship the synaptic neuron and astrocyte program (SNAP).In schizophrenia and ageing-two conditions that involve declines in cognitive flexibility and plasticity 1,2 -cells divested from SNAP: astrocytes, glutamatergic (excitatory) neurons and GABAergic (inhibitory) neurons all showed reduced SNAP expression to corresponding degrees.The distinct astrocytic and neuronal components of SNAP both involved genes in which genetic risk factors for schizophrenia were strongly concentrated.SNAP, which varies quantitatively even among healthy people of similar age, may underlie many aspects of normal human interindividual differences and may be an important point of convergence for multiple kinds of pathophysiology.
In natural, non-laboratory settings-in which individuals have diverse genetic inheritances, environments and life histories, as humans doalmost all aspects of biology exhibit quantitative variation across individuals 3 .Natural variation makes it possible to observe a biological system across many contexts and potentially learn underlying principles that govern its function 4,5 .
Here we sought to recognize changes that multiple cell types in the human brain characteristically implement together.The need to be able to recognize tissue-level gene-expression programs comes from a simple but important idea in the physiology of the brain and other tissues: cells of different types collaborate to perform essential functions, working together to construct and regulate structures such as synaptic networks.
We analysed the prefrontal cortex of 191 human brain donors using single-nucleus RNA sequencing (snRNA-seq) and developed a computational approach, based on latent-factor analysis, to recognize commonly recurring multicellular gene-expression patterns in such data.Tissue-level programs of which the expression varies across individuals could provide new ways to understand healthy brain function and also brain disorders, as disease processes probably act through endogenous pathways and programs in cells and tissues.
A longstanding challenge in genetically complex brain disorders is to identify the aspects of brain biology on which disparate genetic effects converge; here we applied this idea to try to better understand schizophrenia.

snRNA-seq analysis of the dlPFC
We analysed the dorsolateral prefrontal cortex (dlPFC; Brodmann area 46), which serves working memory, attention, executive functions and cognitive flexibility 6 , abilities that decline in schizophrenia and with advancing age 1,2 .Analyses included frozen post-mortem dlPFC samples from 191 donors (aged 22-97 years, median 64 years), including 97 without known psychiatric conditions and 94 affected by schizophrenia (Extended Data Fig. 1 and Supplementary Table 1).To generate data that were well controlled across donors and therefore amenable to integrative analysis, we processed a series of 20-donor sets of dlPFC tissue, each as a single pooled sample (or village 7 ; Fig. 1a) and then, during computational analysis, we used combinations of many transcribed single-nucleotide polymorphisms (SNPs) to identify the source donor of each nucleus (Fig. 1a,b and Extended Data Fig. 2).

Article
Each of the 1,217,965 nuclei was classified into one of seven cell types-glutamatergic neurons (43% of all nuclei), GABAergic neurons (20%), astrocytes (15%), oligodendrocytes (12%), polydendrocytes (oligodendrocyte progenitor cells, 5.5%), microglia (3.6%) and endothelial cells (1.3%) (Fig. 1c and Supplementary Fig. 1)-as well as neuronal subtypes defined in earlier taxonomies (Fig. 1d,e and Supplementary Figs. 2 and 3).Each donor contributed nuclei of all types and subtypes (Supplementary Figs. 1, 4 and 5), although subsequent analyses excluded 11 atypical samples (Supplementary Fig. 1d).CT, corticothalamic; ET, extratelencephalic; IT, intratelencephalic; NP, nearprojecting.f, Latent factor analysis.Cell-type-resolution expression data from all donors and cell types were combined into a single analysis.Latent factor analysis identified constellations of gene-expression changes that consistently appeared together.g, The cell type specificity of the latent factors inferred from 180 donors, shown as the cell type distributions of the 1,000 most strongly loading gene-cell type combinations per factor.Factors 4-7 and 10 are strongly driven by gene-expression co-variation spanning multiple cell types.h, The association of schizophrenia (SCZ) with interindividual variation in the expression levels of the ten latent factors in g, shown as a quantile-quantile plot comparing the observed schizophrenia associations with the ten factors (−log 10 [P]) to the distribution of association statistics expected by chance; only LF4 significantly associated with schizophrenia.See also Supplementary Fig. 6. i, The relationship between quantile-normalized LF4 donor expression levels and age (Spearman's ρ; n = 180 donors).The shaded regions represent the 95% confidence intervals.j, Quantile-normalized LF4 donor scores (n = 93 controls, 87 cases), adjusted for age.The P value was calculated using a two-sided Wilcoxon rank-sum test.For the violin plot, the box limits show the interquartile range, the whiskers show 1.5× the interquartile interval, the centre lines show the median values and the notches show the confidence intervals around the median values.

Inference of multicellular gene programs
The data revealed substantial interindividual variation in cell-typespecific gene expression levels, with highly expressed genes in each cell type exhibiting a median coefficient of variation (across donors) of about 15%.
Interindividual variation in gene expression almost certainly arises from cell-type-specific gene-expression programs, and could in principle also be shaped by concerted changes in multiple cell types.To identify such relationships, we applied latent factor analysis, a form of machine learning that infers underlying factors from the tendency of many measurements to fluctuate together 8 .Critically, we analysed cell-type-resolution data from all cell types at once, using interindividual variation to enable the recognition of relationships between expression patterns in different cell types (Fig. 1f).Each inferred factor was defined by a set of gene-by-cell-type loadings (revealing the distinct genes it involves in each cell type) and a set of expression levels (of the factor) in each donor (Fig. 1f).
Ten latent factors together explained 30% of interindividual variation in gene expression levels; these factors appeared to be independent of one another in their gene use patterns (loadings) and their expression levels across the individual donors (Extended Data Fig. 3a-d).Interindividual variation in the factors' inferred expression levels arose from interindividual variation within each 20-donor experimental set (Extended Data Fig. 3e).Each factor was primarily driven by gene expression in one or a few cell types (Fig. 1g).
Schizophrenia was associated with just one of these latent factors (LF4) (Fig. 1h, Extended Data Fig. 4a-e and Supplementary Table 2)-a factor that was also associated with donor age (Fig. 1i).Donors with and without schizophrenia both exhibited the decline in LF4 with age (Fig. 1i and Extended Data Fig. 1c,d).Joint regression analysis confirmed independent decreases in LF4 expression by age and in schizophrenia, and detected no effect of sex (Supplementary Table 3).
Factors similar to LF4 emerged in all analyses testing LF4's robustness to analysis parameters (Supplementary Fig. 6).The LF4 expression scores of individuals also did not correlate with medication use, time of day at death, post-mortem interval or sequencing depth (Extended Data Fig. 4f-k).We also found evidence that the LF4 constellation of gene-expression changes manifests at the protein level (Supplementary Fig. 7).

Neuronal and astrocyte genes driving LF4
Of the 1,000 gene/cell-type expression traits with the strongest LF4 loadings, 99% involved gene expression in glutamatergic neurons (610), GABAergic neurons (125) or astrocytes (253) (Fig. 1g).LF4 involved similar genes and expression effect directions in glutamatergic and GABAergic neurons, but a distinct set of genes and effect directions in astrocytes (Fig. 2a and Extended Data Fig. 4l).To identify biological processes in LF4, we applied gene set enrichment analysis (GSEA) 9 to the LF4 gene loadings, separately for each cell type.
In both glutamatergic and GABAergic neurons, LF4 involved increased expression of genes with synaptic functions (Fig. 2b, Extended Data Fig. 4m and Supplementary Table 4).The most strongly enriched synaptic annotations for both glutamatergic and GABAergic neurons involved the synaptic vesicle cycle and the presynaptic compartment; the core genes driving these enrichments encoded components of the SNARE complex and their interaction partners (STX1A, SNAP25 and SYP), effectors and regulators of synaptic vesicle exocytosis (SYT11, RAB3A and RPH3A) and other synaptic vesicle components (SV2A and SYN1).In glutamatergic neurons, LF4 also appeared to involve genes encoding postsynaptic components, including signalling proteins (PAK1, GSK3B and CAMK4) and ion channels and receptors (CACNG8, KCNN2, CHRNB2, GRM2 and GRIA3).
People with schizophrenia and people of advanced age exhibited reduced levels of synapse-related gene expression by cortical neurons of all types (Fig. 2c and Extended Data Fig. 5).
In astrocytes, LF4 involved gene-expression effects distinct from those in neurons (Fig. 2a and Extended Data Fig. 4l).Gene sets with roles in fatty acid and cholesterol biosynthesis and export, including genes encoding the SREBP1 and SREBP2 transcription factors and their regulators and targets, were positively correlated with LF4 and underexpressed in the cortical astrocytes of donors with schizophrenia (Fig. 2d and Supplementary Table 4) or advanced age (Extended Data Fig. 6a).These effects appeared to be specific to astrocytes relative to other cell types (Extended Data Fig. 7).

Concerted neuron-astrocyte expression
To understand these results in terms of specific biological activities, we focused on gene sets corresponding to neuronal synaptic components and three kinds of astrocyte activities: adhesion to synapses, uptake of neurotransmitters and cholesterol biosynthesis (see the 'Selected gene sets' section of the Methods).
The proportion of astrocyte gene expression devoted to each of these three astrocyte activities was strongly correlated with the proportion of neuronal gene expression devoted to synaptic components (Fig. 2e and Supplementary Fig. 8), even after adjusting for age and casecontrol status (Extended Data Fig. 8).Donors with schizophrenia, as well as donors with advanced age, tended to have reduced expression of these genes (Fig. 2e and Extended Data Fig. 6).
As this gene expression program involves concerted effects on the expression of (distinct) genes for synaptic components in neurons and astrocytes, we call it SNAP, although it also involves genes with unknown functions and involves more modest expression effects in additional cell types.We used the LF4 expression scores of donors to measure SNAP expression.

Astrocyte gene programs and SNAP
To better appreciate the astrocytic contribution to SNAP, we further analysed the RNA-expression data from 179,764 individual astrocytes.The analysis readily recognized a known, categorical distinction among three subtypes of adult cortical astrocytes: protoplasmic astrocytes, which populate the grey matter and were the most abundant subtype; fibrous astrocytes; and interlaminar astrocytes (Fig. 3a and Extended Data Fig. 9a-d).Neither schizophrenia nor age were associated with variation in the relative abundances of these subtypes (Extended Data Fig. 9e,f).
We next identified latent factors that collectively explained 25% of quantitative gene-expression variation among individual astrocytes (using consensus non-negative matrix factorization (cNMF) 10 , which better scaled to the single-cell-level data) (Extended Data Fig. 10a,b).The factors appeared to capture diverse biological activities, including translation (cNMF1); zinc and cadmium ion homeostasis (cNMF7); and inflammatory responses (cNMF8) (Supplementary Table 5).One factor (cNMF2) corresponded to the astrocyte component of SNAP (Extended Data Fig. 10c-e and Supplementary Table 6); the strong co-expression relationships in SNAP were therefore robust to the computational approach used (Extended Data Fig. 10c-e and Supplementary Fig. 9).
As cNMF2 is informed by variation in the single-astrocyte expression profiles, we consider it a more precise description of the astrocyte-specific gene-expression effects in SNAP, and refer to it here as SNAP-a.Across donors, the average astrocyte expression of SNAP-a was associated even more strongly with schizophrenia case-control status and with age (Fig. 3b-e and Extended Data Fig. 10f-i).
The strongest positive gene-set associations to SNAP-a involved adhesion to synaptic membranes and intrinsic components of synaptic membranes (Supplementary Table 5).The 20 genes most strongly Article 0.8 0.9 1.0   (left).Right, the proportion of astrocytic gene expression devoted to the annotated cholesterol biosynthesis genes shown, across 180 donors.The P value was calculated using a two-sided Wilcoxon rank-sum test.e, Concerted gene-expression variation in neurons and astrocytes.The relationships (across 180 donors) between astrocytic gene expression related to three biological activities (synapse adhesion, neurotransmitter uptake and cholesterol biosynthesis) and neuronal gene expression related to synapses (Spearman's ρ).Quantities plotted are the fraction of all detected nuclear mRNA transcripts (UMIs) derived from these genes in each donor's astrocytes (x axis) or neurons (y axis) relative to the median expression among control donors.The shaded regions represent the 95% confidence intervals for the estimated slopes.For the box plots nested within the violin plots in c and d, the box limits show the interquartile range, the whiskers show 1.5× the interquartile interval, the centre line shows the median value and the notches show the confidence intervals around the median values.
We sought to relate SNAP-a to an emerging appreciation of astrocyte heterogeneity and its basis in gene expression 13 .An earlier analysis of  d, The relationship between donor quantile-normalized SNAP-a expression scores and age (Spearman's ρ).n = 180 donors.The shaded regions represent the 95% confidence intervals.e, The distributions of SNAP-a donor scores (age adjusted and quantile normalized) for people with and without schizophrenia.n = 93 controls, 87 cases.The P value was calculated using a two-sided Wilcoxon rank-sum test.For the box plots, the box limits show the interquartile range, the whiskers show 1.5× the interquartile interval, the centre line shows the median value and the notches show the confidence intervals around the median values.f-j, Similar plots to those in a-e, respectively, but for the L5 IT glutamatergic neuron contribution to SNAP (SNAP-n).n = 75,929 nuclei.Exc, excitatory neuron subtype.k, Variation in the expression levels across 180 individual persons (columns, ordered from left to right by SNAP expression levels) of a select set of strongly SNAP-recruited genes (rows) in astrocytes (left panel) and L5 IT glutamatergic neurons (right panel) of the 180 brain donors.One set of genes (SNAP-a; top) exhibits co-regulation in astrocytes; and a distinct set of genes (SNAP-n; bottom) exhibits co-regulation in neurons.Genes indicated by asterisks and hashes are at genomic loci associated with common and rare genetic variation in schizophrenia, respectively 22 .The grey bars indicate that regulon activity was not detected.astrocyte molecular and morphological diversity in mice identified gene-expression modules based on their co-expression relationships 14 .SNAP-a exhibited the strongest overlap (P = 3.5 × 10 −4 , q = 0.015, gene set enrichment analysis (GSEA)) (Supplementary Table 5) with the module that had correlated most closely with the size of the territory covered by astrocyte processes (the turquoise module in ref. 14, with overlap driven by genes including EZR and NTM).A potential interpretation is that SNAP-a supports these perisynaptic astrocytic processes 15 .
Earlier studies identified reactive astrocyte states that are induced by strong experimental perturbations and injuries, and were described as polarized cell states 16 .We found that more than half of the human orthologues of markers for these states were expressed at levels that correlated negatively and in a continuous, graded manner with SNAP-a expression (Extended Data Fig. 11).At the single-astrocyte level, SNAP-a expression exhibited continuous, quantitative variation rather than discrete state shifts (Extended Data Fig. 10f,g), consistent with observations of abundant astrocyte biological variation less extreme than experimentally polarized states 17 .
We performed an analogous cNMF analysis on the RNA-expression profiles of 75,929 glutamatergic neurons, focusing on a single, abundant subtype so that the variation among individual cells would be driven primarily by dynamic cellular programs rather than by subtype identity (Fig. 3f).One factor corresponded to the neuronal geneexpression effects of SNAP; we refer to this factor as SNAP-n (Fig. 3g-j and Supplementary Table 7).Like SNAP-a, the average expression of SNAP-n was associated with age and with schizophrenia (Fig. 3i,j).SNAP-n and SNAP-a were associated with each other still more strongly, even in a control-only age-adjusted analysis, highlighting the close coupling of neuronal and astrocyte gene expression (Extended Data Fig. 12).Although SNAP-n was associated with synaptic gene sets, the specific genes driving these enrichments were distinct from those driving SNAP-a (Fig. 3k, Supplementary Fig. 11 and Supplementary Table 8).
Expression of SNAP-a and SNAP-n was associated with the expression of many transcription factors and their predicted targets, and engaged distinct pathways in astrocytes and neurons (Fig. 3k and Extended Data Figs.12c and 13b): for example, SREBP1 and its well-known transcriptional targets 18 in astrocytes, and JUNB (AP-1) and its well-known targets 19,20 in neurons (Extended Data Fig. 14) (the latter may reflect average neuronal activity levels in the PFC, which neuroimaging has found to decline (hypofrontality) in schizophrenia 21 ).SNAP-a expression in astrocytes was also associated with a RORB regulon (underexpressed in SNAP low donors) and a KLF6 regulon (overexpressed in SNAP low donors) (Fig. 3k and Extended Data Fig. 13b); common genetic variation at RORB and KLF6 is associated with schizophrenia 22 .

Schizophrenia genetics and SNAP
A key question when studying disease through human post-mortem tissue is whether observations involve disease-causing/diseaseexacerbating processes, or reactions to disease circumstances such as medications.We found no relationship between SNAP expression and donor use of antipsychotic medications (Extended Data Fig. 4j,k), or between cholesterol-biosynthesis gene expression in astrocytes and donor statin intake (Extended Data Fig. 7b), but this does not exclude the possibility that astrocytes are primarily reacting to disease-associated synaptic hypofunction in neurons, as opposed to contributing to such hypofunction.
Human genetic data provide more powerful evidence, as inherited alleles affect risk or exacerbate disease processes rather than being caused by disease.We therefore sought to evaluate the extent to which SNAP-a and SNAP-n involved genes and alleles implicated by genetic studies of schizophrenia.
Previous research [22][23][24] found that genes expressed most strongly by neurons (relative to other cell types), but not genes expressed most strongly by glia, are enriched for the genes implicated by genetic analyses in schizophrenia [22][23][24] ; we replicated these findings in our data (Fig. 4a and Supplementary Note).However, such analyses treat cell types as fixed levels of gene expression (cell identities), rather than as collections of dynamic transcriptional activities; SNAP-a involves a great many genes that are also strongly expressed in other cell types.
We found that the genes that are dynamically recruited by SNAP-a in astrocytes were enriched in genetic signals for schizophrenia: they were 14 times more likely than other protein-coding genes to reside at genomic loci implicated by common genetic variation in schizophrenia (P = 5 × 10 −25 , 95% confidence interval = 8.7-24, logistic regression) and 7 times more likely to have strong evidence from rare variants in schizophrenia (95% confidence interval = 2.3-21, P = 5 × 10 −4 , logistic regression) (Supplementary Note).
To evaluate whether common variation in the genes recruited by SNAP-a contributes more broadly to schizophrenia risk, beyond these strongest associations, we used gene-level association statistics from the largest schizophrenia genome-wide association study to date 22,25 .As expected, the strongest neuron-identity genes (as defined in the earlier work) exhibited elevated schizophrenia association, whereas the strongest astrocyte-identity genes did not (Fig. 4a and Supplementary Note).However, in the same analysis, the genes most strongly associated with SNAP-a and SNAP-n were highly significant as additional predictive factors, particularly the genes associated with SNAP-a (Fig. 4a).Analysis by linkage disequilibrium (LD) score regression 26 also confirmed enrichment of schizophrenia risk factors among SNAP-a genes (Supplementary Fig. 12).
Polygenic risk involves thousands of common alleles across the genome, of which the effects converge on unknown biological processes.A polygenic risk score for schizophrenia was associated with reduced expression of SNAP but not with the other latent factors (Fig. 4b and Supplementary Fig. 13).Higher polygenic risk was also associated with a greater decrease in SNAP among people with schizophrenia (Fig. 4b).
To better understand such relationships, we examined the relationship between SNAP-a and genetic risk through two specific genes: neurexin-1 (NRXN1) and complement component 4 (C4).
Exonic deletions within NRXN1 greatly increase the risk for schizophrenia 27,28 .Our data indicate that astrocytic, but not neuronal, NRXN1 expression was reduced in people with schizophrenia and among people aged over 70 years (Fig. 4c and Extended Data Fig. 15a,b).Interindividual variation in astrocytic NRXN1 expression was strongly associated with SNAP-a (Fig. 4d).
An increased copy number of the complement component 4 (C4A) gene more modestly increases the risk for schizophrenia 29 ; however, far more interindividual variation in C4 gene expression (>80%) arises from unknown, dynamic effects on C4 expression 29,30 .We found that astrocytes, rather than neurons or microglia, are the main site of C4 (including C4A and C4B) RNA expression in the human prefrontal cortex (Fig. 4e and Extended Data Fig. 15c).Donors with lower-than-average expression of SNAP-a tended to have greatly increased C4 expression: such donors included 43 out of the 44 donors with the highest C4 expression levels, and their astrocytes expressed 3.2-fold more C4 compared with astrocytes of donors with above-average expression of SNAP-a (Fig. 4f).C4 expression was also greatly increased among donors aged over 70 years (Extended Data Fig. 15d,e).

Discussion
Here we identified SNAP-concerted gene-expression programs implemented by cortical neurons and astrocytes to corresponding degrees in the same individuals.SNAP expression varied even among unaffected control brain donors and may be a core axis of human neurobiological variation, with potential implications for cognition and plasticity that will be important to understand.SNAP appears to involve many genes that contribute to synapses and to astrocyte-synapse interactions 31,32 (Figs. 2 and 3k, Supplementary Table 9 and Supplementary Figs. 10 and 11).The genes associated with SNAP-a suggested a potential role in supporting perisynaptic astrocyte processes, motile, morphologically plastic astrocyte projections whose interactions with synapses can promote synaptic stability 15 .Diverse lines of study increasingly reveal a key role for astrocytes in regulating the ability of synaptic networks to acquire and learn new information, for example, by lowering thresholds for activity and synaptic plasticity 33,34 .
A notable aspect of SNAP involved the astrocytic regulation of genes with roles in fatty acid and cholesterol biosynthesis and cholesterol export, which strongly correlated (across donors) with expression of synaptic-component genes by neurons (Fig. 2d,e).Earlier research has defined a potential rationale for this neuron-astrocyte coordination: synapses and dendritic spines-synapse-containing morphological structures-require large amounts of cholesterol, which astrocytes supply 35 .Decreases in cholesterol biosynthesis have previously been noted in mouse models of brain disorders 36,37 that (like schizophrenia and ageing) involve cognitive losses, cortical thinning and reduction in neuropil.
Schizophrenia and ageing both brought substantial reductions in SNAP expression (Fig. 1i,j).Neuropsychological, neuroimaging and neuronal microstructural studies have long noted similar changes in schizophrenia and ageing 1,2,[38][39][40][41][42][43][44][45][46][47] .Inherited genetic risk for schizophrenia is associated with decreased measures of cognition in older individuals 48,49 , and schizophrenia greatly increases the risk of dementia later in life 50 .Our results suggest that these relationships between schizophrenia and ageing arise from shared cellular and molecular changes.
Underexpression of SNAP could, in principle, underlie longstanding microstructural observations [41][42][43][44][45][46][47] of reduced numbers of dendritic spines on cortical neurons in older humans and primates and in people with schizophrenia.These microstructural observations appear to arise from highly plastic thin spines and may therefore reflect reduced rates of continuous synapse formation and stabilization (rather than pruning of mature synapses) [42][43][44][45][46][47] .The gene-expression changes that we observed in the human dlPFC (Fig. 2c) suggest that cortical neurons of all types, including glutamatergic and GABAergic neurons, may be affected by such changes.
It is intriguing to consider whether pharmacotherapies or other interventions could be developed to promote SNAP as a way to address cognitive symptom domains in schizophrenia and ageing such as cognitive flexibility, working memory and executive function deficits, continuous and disabling features that are typically not improved by available treatments 1 .
An important future direction will be to determine the extent to which SNAP is present in other brain areas, and the relationship of SNAP with molecular and physiological changes in dendrites, synapses and perisynaptic astrocyte processes.Additional questions involve the molecular mechanisms that accomplish neuron-astrocyte coordination and the extent to which SNAP supports learning and/or cognitive flexibility.
SNAP was made visible by human interindividual biological variation.Although controlled laboratory experiments usually try to eliminate genetic and environmental variation, natural variation may be able to reveal cell-cell coordination and regulatory programs in many tissues and biological contexts, offering new ways to identify pathophysiological processes within and beyond the human brain.

Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-024-07109-5.

Ethical compliance
Brain donors were recruited by the Harvard Brain Tissue Resource Center/NIH NeuroBioBank (HBTRC/NBB), in a community-based manner, across the United States.Human brain tissue was obtained from the HBTRC/NBB.The HBTRC procedures for informed consent by the donor's legal next-of-kin and distribution of de-identified post-mortem tissue samples and demographic and clinical data for research purposes are approved by the Mass General Brigham Institutional Review Board.Post-mortem tissue collection followed the provisions of the United States Uniform Anatomical Gift Act of 2006 described in the California Health and Safety Code section 7150 and other applicable state and federal laws and regulations.Federal regulation 45 CFR 46 and the associated guidance indicate that the generation of data from de-identified post-mortem specimens does not constitute human participant research that requires institutional review board review.

Donors for snRNA-seq
Donor information with anonymized donor IDs is available in Supplementary Table 1.Consensus diagnosis of schizophrenia was performed by retrospective review of medical records and extensive questionnaires concerning social and medical history provided by family members.Several regions from each brain were examined by a neuropathologist.We excluded participants with evidence for gross and/or macroscopic brain changes, or with clinical history consistent with cerebrovascular accident or other neurological disorders.Participants with Braak stage III or higher (modified Bielchowsky stain) were excluded.None of the participants had substantial reported history of substance dependence within 10 or more years from death, as further corroborated by negative toxicology reports.The absence of recent substance abuse is typical for samples from the HBTRC, which receives exclusively community-based tissue donations.
Exposure to psychotropic and neurotropic medications was assessed on the basis of medical records.Estimated daily milligram doses of antipsychotic drugs were converted to the approximate equivalent of chlorpromazine as a standard comparator 51 .These values are reported as lifetime, as well as last six months of life, grams per patient.Exposure to other classes of psychotropic drugs was reported as present or absent.

Single-nucleus library preparation and sequencing
We analysed the dlPFC (Brodmann area 46 (BA46)), which exhibits functional and microstructural abnormalities in schizophrenia 52,53 and in ageing 46 .Frozen tissue blocks containing BA46 were obtained from the HBTRC.We used snRNA-seq rather than single-cell RNA-seq to avoid effects of cell morphology on ascertainment, and because nuclear (but not plasma) membranes remain intact in frozen post-mortem tissue.Nuclear suspensions from frozen tissue were generated according to a protocol that we have made available at Protocols.io(https:// doi.org/10.17504/protocols.io.4r3l22e3xl1y/v1).To ensure that batch compositions were balanced, researchers were not blinded to the batch allocation or processing order of each specimen.To maximize the technical uniformity of the snRNA-seq data, we processed sets of 20 brain specimens (each consisting of affected and control donors) at once as a single pooled sample.Specimens were allocated into batches of 20 specimens per batch, ensuring that the same number of cases and age-matched controls (10 per group), and men and women (10 per group) were included in each batch.Some donors were resampled across multiple batches to enable quality-control analyses (Extended Data Fig. 2).Specimens from cases and age-matched controls were also processed in alternating order within each batch.Researchers had access to unique numerical codes assigned to the donor-of-origin of each specimen as well as basic donor metadata (for example, casecontrol status, age, sex).
From each donor, 50 mg of tissue was dissected from the dlPFCsampling across the cortical layers and avoiding visible concentrations of white matter-and used to extract nuclei for analysis.Generation of gel beads -in-emulsion and library preparation was performed according to the 10x Chromium Single Nuclei 3′ v3.1 protocol (version CG000204_ChromiumNextGEMSingleCell3'v3.1_RevD).We encapsulated nuclei into droplets using approximately 16,500 nuclei per reaction, understanding that about 95% of all doublets (cases in which two nuclei were encapsulated in the same droplet) would consist of nuclei from distinct donors and therefore be recognized by the Dropulation analysis 7 as containing combinations of SNP alleles from distinct donors.cDNA amplification was performed using 13 PCR cycles.
Raw sequencing reads were aligned to the hg38 reference genome using the standard Drop-seq (v.2.4.1) 54 workflow, modified so that reads from C4 transcripts would not be discarded as multi-mapping (see the 'MetaGene discovery' section below).Reads were assigned to annotated genes if they mapped to exons or introns of those genes.Ambient/ background RNA was removed from digital gene expression (DGE) matrices using CellBender (v.0.1.0) 55remove-background.

Genotyping and donor assignment from snRNA-seq data
We used combinations of hundreds of transcribed SNPs to assign each nucleus to its donor of origin using Dropulation (v.2.4.1) 7 .Previous Dropulation analyses of stem cell experiments used whole-genome sequencing (WGS) data on the individual donors for such analyses 7 .For this study, we developed a cost-efficient approach based on SNP array data with imputation.Genomic DNA from the individual brain donors was genotyped by SNP array (Illumina GSA).
Raw Illumina IDAT files from the GSAMD-24v1-0_20011747 array (2,085 samples) and GSAMD-24v3-0-EA_20034606 array (456 samples) were genotyped using GenCall (v.3.0.0) 56 and genotypes were phased using SHAPEIT4 (v.4.2.2) 57 by processing the data through the MoChA workflow (v.2022-12-21) 58,59 (https://github.com/freeseek/mochawdl) using the default settings and aligning markers against the GRCh38 genome.APOE genotypes for marker rs429358 were removed due to unreliable genotypes.To improve phasing, genotypes from the McLean cohort were combined with genotypes from the Genomic Psychiatry Cohort with IDAT files available also from the GSAMD-24v1-0_20011747 array (5,689 samples) 60 .After removing 128 samples recognized as duplicates, phased genotypes were then imputed using IMPUTE5 (v.1.1.5) 61by processing the output data from the MoChA workflow using the MoChA imputation workflow and using the high-coverage 1000 Genomes reference panel for GRCh38 62 , including 73,452,470 non-singleton variants across all the autosomes and chromosome X.Only SNPs with imputation quality INFO > 0.95 were used for donor assignments.Using this approach, we found that 99.6% of nuclei could be assigned confidently to a donor (Extended Data Fig. 2a).
To evaluate the accuracy of this method of donor assignment, we genotyped a pilot cohort of 11 donors using both WGS and SNP array.Importantly, the two methods had 100% concordance on the assignment of individual nuclei to donors, validating both our computational donor-assignment method and the sufficiency of the SNPs-plus-imputation approach (Extended Data Fig. 2c).SNP data for the individual donors are available at NeMO (https://assets.nemoarchive.org/dat-bmx7s1t).
After donor assignment, DGE matrices from all libraries in each batch (7 to 8 libraries per batch) were merged for downstream analyses.
Cell types.Model training.The classification model used for celltype assignments was trained on the DGE matrix from batch 6 (BA46_2019-10-16), which was annotated as follows.Nuclei with fewer than 400 detected genes and 100 detected transcripts were removed from the DGE matrix from this batch.After normalization and variable gene selection, the DGE matrix was processed through an initial clustering analysis using independent component analysis (ICA, using fastICA (v.1.2-1)) 71as previously described 72 .This analysis produced clustering solutions with 43 clusters of seven major cell types (astrocytes, endothelial cells, GABAergic neurons, glutamatergic neurons, microglia, oligodendrocytes and polydendrocytes) that could be identified based on expression of canonical marker genes (markers in Supplementary Fig. 1) (note that around 9% of cells within clusters annotated as endothelial cells do not express canonical endothelial cell markers but, rather, those of pericytes; these ~1,400 cells have been grouped together with endothelial cells for downstream analyses).scPred was trained on this annotated DGE matrix, and the resulting model was subsequently used to make cell-type assignments for the remaining batches' DGE matrices.Filtering.After an initial cell-type classification using the above model, the DGE matrices were filtered further to remove any remaining heterotypic doublets missed by scPred.First, raw DGE matrices from each of the 11 batches were subsetted to form separate DGE matrices for each of the 7 major cell types (77 subsetted DGE matrices total).Each subsetted DGE matrix was normalized using sctransform (v.0.3.1) 64with 7,000 variable features, scaling and centring.For each cell type, normalized DGE matrices from the 11 batches were merged and clustered together in Scanpy (v.1.9.1) 70 using 50 principal components, batch correction by donor using BBKNN (v.1.5.1) 73 and Leiden clustering using a range of resolutions.The most stable clustering resolution for each cell type was selected using clustree (v.0.4.4) 74 .Clusters expressing markers of more than one cell type were determined to be heterotypic doublets; cell barcodes in these clusters were discarded from the above DGE matrices, and these filtered DGE matrices were then carried forward for integrated analyses across batches.
Neuronal subtypes.Classification models for neuronal subtypes were trained using DGE matrices from a previous study 75 that were subsetted to glutamatergic or GABAergic neuron nuclei in middle temporal gyrus (MTG).Although a similar dataset exists for human brain nuclei from the primary motor cortex (M1) 76 , we trained the model only on the MTG dataset as the M1 lacks a traditional layer 4 (L4), whereas BA46 does have a L4.
We made the following adjustments to the MTG annotations before model training.First, as subtype-level annotations (for example, L5 IT, as used previously 76 for M1) were not available for the MTG dataset, we inferred these based on M1/MTG cluster correspondences (from extended data figure 10 in ref. 76).Second, we reassigned the following glutamatergic neuron types in the MTG from the L4 IT subtype (as inferred by integration with M1 in ref. 76) to the L2/3 IT subtype: Exc L3−5 RORB FILIP1L, Exc L3−5 RORB TWIST2 and Exc L3−5 RORB COL22A1.This was done on the basis of their properties described in other studies-for example, the Exc L3−5 RORB COL22A1 type has been described as a deep L3 type by Patch-seq 77 -and by the expression of their marker genes on a two-dimensional projection of the RNA-expression profiles of glutamatergic neuron nuclei (Supplementary Fig. 2).

Donor exclusion
Donors were excluded on the basis of unusual gene-expression profiles and/or cell-type proportions (potentially related to agonal events) as outlined below.
Expression.Donors with fewer than 1,000 total UMIs in any cell type were first excluded.Next, for each cell type, gene-by-donor expression matrices comprising the remaining donors were scaled to 100,000 UMIs per donor and filtered to the top expressing genes (defined as having at least 10 UMIs per 100,000 for at least one donor; these were among the top 12-19% of expressed genes).These filtered expression matrices by cell type were merged into a single expression matrix that was used to calculate each donor's pairwise similarity to the other donors (Pearson correlations of log 10 -scaled expression values across genes).The median of these pairwise correlation values was determined to be the conformity score for each donor.To identify outliers, these donor conformity scores were converted to modified z scores (M i ) for each donor as described previuously 79 : where x i is the donor's conformity score, ∼ x is the median of donor con- formity scores and MAD is the median absolute deviation of donor conformity scores.Donors whose modified z scores had absolute values of >5 were excluded.This approach flagged a total of five donors (one who had low UMI counts and four who were outliers on the basis of expression).
Cell-type proportions.Each donor's pairwise similarity to the other donors was determined on the basis of cell-type proportions (that is, the values plotted in Supplementary Fig. 1c,d).Donor conformity scores and modified z scores based on these values were calculated for each donor using the same approach described above for expression values.Donors whose modified z scores had absolute values of >15 were excluded.This approach flagged a total of nine donors, two of whom were also flagged as expression outliers.
Between the two approaches, in total, 11 unique donors were flagged as outliers (4 control, 7 schizophrenia) and excluded from downstream analyses.
Latent factor analysis snRNA-seq data.Our approach was to (1) create a gene-by-donor matrix of expression measurements for each of seven cell types; (2) concatenate these matrices into a larger matrix in which each gene is represented multiple times (once per cell type); and (3) perform latent factor analysis 8,80 on this larger matrix.We selected probabilistic estimation of expression residuals (PEER) 81 over other approaches (such as principal component analysis (PCA)) for inferring latent variables as it is more sensitive and less dependent on the number of factors modelled.A major pitfall to avoid when performing latent factor analysis is obtaining highly correlated factors due to overfitting.The latent factors that we have inferred are independent from each other when we compare their gene loadings (Extended Data Fig. 3c), enabling us to proceed with downstream analyses based on these factors.
Raw, filtered DGE matrices from each of the 11 batches were subsetted to form separate DGE matrices for each of the 7 major cell types (77 subsetted DGE matrices total).For each subsetted DGE matrix, cell barcodes from outlier donors were excluded, the DGE matrix was normalized using sctransform (v.0.3.1) 64with 3,000 variable features, and the output of Pearson residual expression values (with all input genes returned) was exported to a new DGE matrix.For each cell type, these new expression values in the 11 normalized DGE matrices were summarized across donors (taking the sum of residual expression values) to create a gene-by-donor expression matrix.Each of these expression matrices was filtered to the top 50% of expressed genes (based on feature counts scaled to 100,000 transcripts per donor), yielding expression matrices with approximately 16,000 to 18,000 genes per cell type.Within each expression matrix, each gene name was modified with a suffix to indicate the cell type of origin (for example, ACAP3 to ACAP3_astrocyte), and the seven expression matrices were combined to produce a single expression matrix with expression values from all seven cell types for each donor (a schematic is shown in Fig. 1f).This expression matrix was used as the input to latent factor analysis with PEER (v.1.0) 81using the default parameters and a range of requested factors k.
Although we looked for correlations between these factors and technical variables, these analyses were negative, with one exception: latent factor 2 (LF2) appeared to capture quantitative variation in the relative representation of deep and superficial cortical layers in each dissection (Extended Data Fig. 3f).
Latent factor donor expression values were adjusted for age by taking the residuals from a regression of the donor expression values against age.
To improve the visualization of latent factor donor expression values while leaving the results of statistical analyses unchanged, quantile-normalized values were calculated in R using the function qnorm(rank(x)/(length(x) + 1)).The figure legends indicate when these quantile-normalized values are used.
Proteomics data.Protein intensities from the LRRK2 Cohort Consortium (LCC) cohort of a previous study 82 were downloaded from the Pro-teomeXchange Consortium (PXD026491) and subset to those peptides that passed the q-value threshold in at least 25% of all analysed samples.These were further subset to intensities from control donors without the LRRK2(G2019S) mutation and without erythrocyte contamination (n = 22 donors).After normalization of the protein intensities using sctransform (v.0.3.1) 64, the output of Pearson residual expression values (with all input proteins returned) was exported to a new matrix.This matrix of normalized protein intensities was used as the input to latent factor analysis with PEER (v.1.0) 81using the default parameters.
For comparisons of CSF protein loadings to SNAP gene loadings in Supplementary Fig. 7, each gene in SNAP was represented by a single composite loading representing gene loadings from all cell types.This composite loading was determined for each gene by first calculating the median expression of each gene (in each cell type), then calculating a new loading onto SNAP weighted across cell types by these median expression values.

GSEA
For GSEA 9,86 of latent factors inferred by PEER, the C5 Gene Ontology collection (v.7.2) 87,88 from the Molecular Signatures Database 89,90 was merged with the SynGO (release 20210225) 91 biological process (BP) and cell component (CC) gene lists.Gene sets from this merged database that were enriched in each latent factor were identified with GSEAPreranked in GSEA (v.4.0.3) 9,86 using 10,000 permutations and gene loadings as the ranking metric.
For astrocyte latent factors inferred by cNMF 10 , GSEA was performed as described above with the addition of the following custom gene sets to the database: • PGC3_SCZ_GWAS_GENES_1TO2_AND_SCHEMA1_GENES: a gene set comprising genes implicated in human-genetic studies of schizophrenia, including genes at 1-2 gene loci from GWAS (PGC3) 22 and genes with rare coding variants (FDR < 0.05) 23 .• Gene sets for each of the seven astrocyte subclusters identified in ref.
14. • Gene sets for each of the 62 colour module eigengenes identified by WGCNA in ref. 14.  • Gene sets for each of the six astrocyte subcompartments analysed in ref. 92, comprising genes encoding the proteins that were unique to or enriched in these subcompartments.
For L5 IT glutamatergic neuron latent factors inferred by cNMF, GSEA was performed as described above with the addition of the following custom gene sets to the database: • PGC3_SCZ_GWAS_GENES_1TO2_AND_SCHEMA1_GENES: a gene set comprising genes implicated in human genetic studies of schizophrenia, including genes at 1-2 gene loci from GWAS (PGC3 (ref.22)) and genes with rare coding variants (FDR < 0.05) 23 .

Selected gene sets
On the basis of the results of the GSEA described above, we selected several of the top-enriched gene sets for further analyses.These are referred to in the figures with labels modified for brevity, but are described in further detail below.Lists of genes in each gene set are provided in Supplementary Table 9.
• Integral component of postsynaptic density membrane (Extended Data Figs.6 and 8 and Supplementary Fig. 8): core genes contributing to the enrichment of GO:0099061 (v.7.2, integral component of postsynaptic density membrane) in the glutamatergic neuron component of LF4 (SNAP).• Neurotransmitter reuptake transporters (Fig. 2e, Extended Data Figs.6 and 8 and Supplementary Fig. 8): genes from among the 100 genes most strongly recruited by cNMF2 (SNAP-a) with known functions as neurotransmitter-reuptake transporters.These include core genes contributing to the enrichment of GO:0140161 (v.7.2, monocarboxylate: sodium symporter activity) in SNAP-a.• Presynapse (Extended Data Figs.6 and 8 and Supplementary Fig. 8): core genes contributing to the enrichment of GO:0098793 (v.7.2, presynapse) in the GABAergic neuron component of LF4 (SNAP).• Regulation of cholesterol biosynthesis (Fig. 2d,e, Extended Data Figs.6-8 and 13d and Supplementary Fig. 8): core genes contributing to the enrichment of GO:0045540 (v.7.2, regulation of cholesterol biosynthetic process) in the astrocyte component of LF4 (SNAP).This enrichment is of interest as cholesterol is an astrocyte-supplied component of synaptic membranes 35,93,94 .Products of this biosynthetic pathway also include other lipids and cholesterol metabolites with roles at synapses, including 24S-hydroxycholesterol, a positive allosteric modulator of NMDA receptors 95 .Although we refer to this gene set by this label based on its annotation by GO, we note that subsets of these genes contribute to cholesterol export and/or to synthesis of additional fatty acids.
Gene sets displayed in Fig. 2b are the SynGO terms most strongly enriched in each top-level category (among biological processes: process in the presynapse, synaptic signalling, synapse organization, process in the postsynapse, transport and metabolism, respectively).

Analysis of astrocyte and glutamatergic L5 IT neuron geneexpression programs
Consensus non-negative matrix factorization.cNMF (v.1.2) 10 was performed on both astrocyte and glutamatergic L5 IT neurons.We used cNMF owing to its scalability to the astrocyte and glutamatergic L5 IT neuron datasets.The cNMF protocol detailed in the tutorial for PBMCs at GitHub (https://github.com/dylkot/cNMF/blob/master/Tutorials/analyze_pbmc_example_data.ipynb) was followed for the initial data filtering and analysis.For both datasets, data were filtered to remove cells with fewer than 200 genes or 200 UMIs.Genes expressed in fewer than 10 cells were removed.Factorization was run on raw counts data after filtering, with iterations of factorization run for each k (factors requested), with a k ranging from 3 to 30.
The astrocyte raw counts data contained 179,764 cells and 42,651 genes, of which 0 cells and 9,040 genes were excluded.On the basis of PCA of the gene expression matrix and the cNMF stability report, factorization with k = 11 was selected for further analysis.The 11 cNMF factors together explained 25% of variation in gene expression levels among single astrocytes.
The L5 IT raw counts data contained 75,929 cells and 42,651 genes, of which 0 cells and 8,178 genes were excluded.On the basis of the PCA of the gene expression matrix and the cNMF stability report, factorization with k = 13 was selected for further analysis.The 13 cNMF factors together explained 44% of variation in gene expression levels among single L5 IT glutamatergic neurons.To align the direction of interpretation across all three analyses (SNAP, SNAP-a, and SNAP-n), we took the negative of cNMF factor 6 (SNAP-n) cell scores, gene loadings and donor scores.
The latent factor usage matrix (cell by factor) was normalized before analysis to scale each cell's total usage across all factors to 1.

Co-varying neighbourhood analysis.
To further assess the robustness of the astrocyte gene-expression changes represented by SNAP and SNAP-a, we used a third computational approach-co-varying neighbourhood analysis (CNA, v.0.1.4) 96.The protocol provided in the CNA tutorial at GitHub (https://nbviewer.org/github/yakirr/cna/blob/master/demo/demo.ipynb)was followed for data preprocessing and analysis.
Pilot association tests to find transcriptional neighbourhoods associated with schizophrenia case-control status were first performed using the default value for N null .These pilot analyses evaluated the effects of batch correction (by batch or donor) and covariate correction (by age, sex, post-mortem interval, number of UMIs or number of expressed genes).Nearly all analyses yielded highly similar neighbourhoods associated with case-control status with the same global P value (P = 1 × 10 −4 ), with the exception of batch correction by donor which yielded P = 1.The final association test described in Supplementary Fig. 9 was performed with an increased value for N null (N null = 1,000,000) and without additional batch or covariate correction.

Regulatory network inference
The goal of pySCENIC 97,98 is to infer transcription factors and regulatory networks from single-cell gene-expression data.The pySCENIC (v0.11.2) protocol detailed in the tutorial for PBMCs at GitHub (https://github.com/aertslab/SCENICprotocol/blob/master/notebooks/PBMC10k_ SCENIC-protocol-CLI.ipynb) was followed for the initial data filtering and analysis.For both astrocytes and L5 IT glutamatergic neurons, data were filtered to remove cells with fewer than 200 genes, and genes with fewer than 3 cells.Cells with high MT expression (>15% of their total transcripts) were removed.
The gene regulatory network discovery adjacency matrix was inferred by running Arboreto on the gene counts matrix and a list of all transcription factors provided by the authors (https://resources.aertslab.org/cistarget/tf_lists/allTFs_hg38.txt) to generate an initial set of regulons.This set was further refined using ctx, which removes targets that are not enriched for a motif in the transcription factor using a provided set of human specific motifs (https://resources.aertslab.org/cistarget/motif2tf/motifs-v9-nr.hgnc-m0.001-o0.0.tbl) and cis targets (https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc9nr/gene_based).Finally, aucell was run to generate the per-cell enrichment scores for each discovered transcription factor.

Super-enhancer analysis
Preparation of input BAM files was performed as follows.FASTQ files of bulk H3K27ac HiChIP data from the middle frontal gyrus 99 were downloaded from the Gene Expression Omnibus (GEO: GSM4441830 and GSM4441833).Demultiplexed FASTQ files were trimmed with Trimmomatic (v.0.33) 100 using the parameter SLIDINGWINDOW:5:30.Trimmed reads were aligned to the hg38 reference genome with Bowtie2 (v2.2.4) 101 using the default parameters.Uniquely mapped reads were extracted with samtools (v.1.3.1) 102view using the parameters -h -b -F 3844 -q 10. Preparation of input constituent enhancers was performed as follows.FitHiChIP interaction files for H3K27ac from the middle frontal gyrus 99 were downloaded from the GEO (GSM4441830 and GSM4441833).These were filtered to interacting bins (at interactions with q < 0.01) that overlap bulk H3K27ac peaks in the one-dimensional HiChIP data in both replicates.Next, these bins were intersected with IDR-filtered single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) peaks in isocortical and unclassified astrocytes (peaks from clusters 13, 15 and 17, downloaded from the GEO (GSE147672)) 99 .Unique coordinates of these filtered regions were converted to GFF files.
Super-enhancers were called with ROSE (v.1.3.1) 103,104using the input files prepared above and the parameters -s 12500 -t 2500.Coordinates of promoter elements for Homo sapiens (December 2013 GRCh38/hg38) were downloaded from the Eukaryotic Promoter Database (EPD) 105 using the EPDnew selection tool (https://epd.expasy.org/epd/EPDnew_select.php) 106.Using these sets of coordinates, FitHi-ChIP loops that overlap bulk H3K27ac peaks and scATAC peaks in astrocytes were subset to those that contained a promoter in one anchor and a super-enhancer in the other anchor.Binomial smooth plots were generated as described previously 107 .

Heritability analyses
MAGMA.Summary statistics from ref. 22 were uploaded to the FUMA (v.1.5.6) 108 web server (https://fuma.ctglab.nl).Gene-level z scores were calculated using SNP2GENE with the 'Perform MAGMA' function (MAGMA v.1.08)and the default parameter settings.The reference panel population was set to '1000G Phase3 EUR'.The MHC region was excluded due to its unusual genetic architecture and LD.MAGMA z scores were then used for downstream analyses as described in the Supplementary Note.
Stratified LD score regression.To partition SNP heritability, we used stratified LD score regression (S-LDSC; v.1.0.1) 26 , which assesses the contribution of gene expression programs to disease heritability.First, for analysis of astrocyte-identity genes, we computed (within the BA46 region only), a Wilcoxon rank-sum test on a per-gene basis using presto (v.1.0.0) 109 between astrocytes and all other cell types; for analysis of astrocyte-activity genes (SNAP-a), we sorted all genes expressed in astrocytes by their SNAP-a loadings and took the top 2,000 genes.We then converted each gene set into annotations for S-LDSC by extending the window size to 100 kb (from the transcription start site and transcription end site), and ordered SNPs in the same order as the .bimfile (from phase 3 of the 1000 Genomes Project 110 ) used to calculate the LD scores.We then computed LD scores for annotations using a 1 cM window and restricted the analysis to Hapmap3 SNPs.We excluded the MHC region due to both its high LD and high gene density.We used LD weights calculated for HapMap3 SNPs for the regression weights.We then jointly modeled the annotations corresponding to our gene expression program, as well as all protein-coding genes, and the baseline model (baseline model v.1.2).We tested for enrichment of SNP heritability on the traits listed below.The LDSC script 'munge_sumstats.py'was used to prepare the summary statistics files.We used the resultant P values, which reflect a one-sided test that the coefficient (τ) is greater than zero, as a determinant as to whether our cell type gene expression programs are enriched for SNP-heritability of a given trait 111 .

C4
MetaGene discovery.Genes that have high sequence homology are typically difficult to capture using standard UMI counting methods.Reads from these regions map to multiple locations in the genome with low mapping quality, and are ignored by many gene expression algorithms.MetaGene discovery leverages that high sequence similarity by looking for UMIs that consistently map to multiple genes at low mapping quality consistently across many cells.
Each UMI is associated with a single gene if at least one read from the UMI uniquely maps to a single gene model.If all reads are mapped at low quality to multiple genes, then assignment of that UMI to a specific gene model is ambiguous, and that UMI is associated with all gene models.By surveying a large number of cells, a set of gene families are discovered where UMIs are consistently associated with sets of genes.This discovery process finds expected sets of gene families with high sequence homology directly from the mapping, such as C4A/C4B, CSAG2/CSAG3 and SERF1A/SERF1B.
These UMIs are then extracted in the counts matrix as a joint expression of all genes in each set.We prefer to calculate expression as the joint expression of all genes in the set because the priors in the data prevent confidently distributing these ambiguous UMIs.For example, C4A and C4B have very few UMIs that map uniquely to either gene in the set (8 UMIs, <0.5% of all UMIs captured for this set of genes), which is a weak prior to proportionally assign ambiguous UMIs to the correct model.
This approach was validated for C4 expression by generating a reference genome that contained only one copy of C4.This allowed each UMI to map uniquely to the single remaining copy of the gene using standard tools.The custom reference approach and joint expression of C4A/C4B on the basis of the metagene approach was concordant in 15,664 of 15,669 cells tested (Extended Data Fig. 15c).
Imputation of C4 structural variation.Phased copy-number calls for structural features of the C4 gene family were obtained by imputation using Osprey, a method for imputing structural variation.The total copy number of C4 genes, the number of copies of C4A and C4B, and the copy number of the polymorphic HERV element that distinguishes long from short forms of C4 29 were imputed into the McLean cohort using a reference panel based on 1000 Genomes 62 .
An imputation reference panel was constructed for GRCh38 using 2,604 unrelated individuals (out of 3,202 total) from 1000 Genomes.SNPs were included in the reference panel if (1) they were within the locus chromosome 6: 24000000-34000000 but excluding the copy-number variable region chromosome 6: 31980001-32046200; and (2) they were not multi-allelic and (3) they had an allele count (AC) of at least 3 when subset to the 2,604 reference individuals.
The imputation reference panel was merged with genotypes for the McLean cohort obtained from the GSA genotyping arrays.Markers not appearing in both datasets were dropped and the merged panel was phased with SHAPEIT4 (v.4.2.0) 57 using the default parameters plus --sequencing and the default GRCh38 genetic map supplied with SHAPEIT.
Reference copy numbers for the C4 structural features on GRCh38 were obtained for the 3,202 1000 Genomes samples using a custom pipeline based on Genome STRiP (v.2.0) 130 .The source code for this pipeline is available at Terra (http://app.terra.bio) 131.In brief, the pipeline uses Genome STRiP to estimate the total C4 copy number and HERV copy number from normalized read depth of coverage, then estimates the number of copies of C4A and C4B using maximum likelihood based on reads that overlap the C4 active site (coordinates, chromosome 6: 31996082-31996099 and chromosome 6: 32028820-32028837).These copy-number genotypes were then subset to the 2,604 unrelated individuals.
The structural features were imputed into the merged imputation panel using Osprey (v.0.1-9) 132,133 by running ospreyIBS followed by osprey using the default parameters plus '-iter 100', the SHAPEIT4 genetic map for GRCh38 chromosome 6 and a target genome interval of chromosome 6: 31980500-32046500.
The output from Osprey was post-processed using a custom R script (refine_C4_haplotypes.R) that enforces constraints between the copy-number features and recalibrates the likelihoods considering only possible haplotypes.The enforced constraints are that the C4A + C4B copies must equal the total C4 copy number and that the HERV copy number must be less than or equal to C4 copy number.

Source data and visualization
In addition to the software cited above, we used Colour Oracle (v.1.3) 134,135as well as the following packages to prepare the source data and figures in this manuscript.

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.Extended Data Fig. 4 | Properties of Latent Factor 4 (LF4).a, Expression of each latent factor by case-control status (n = 93 controls and 87 cases).P-values are from a two-sided Wilcoxon rank-sum test.Box plots show interquartile ranges; whiskers, 1.5x the interquartile interval; central lines, medians; notches, confidence intervals around medians.b, Expression of LF4 by casecontrol status, split by sex (female: n = 31 controls and 39 cases; male: n = 62 controls and 48 cases).P-values are from a two-sided Wilcoxon rank-sum test.Box plots show interquartile ranges; whiskers, 1.5x the interquartile interval; central lines, medians; notches, confidence intervals around medians.Note that the more-modest p-value for the females-only analysis relative to the malesonly analysis appears to represent the smaller sample (70 females vs. 110 males) rather than a weaker relationship to schizophrenia status; please see also Extended Data Fig. 10h. c Box plots show interquartile ranges; whiskers, 1.5x the interquartile interval; central lines, medians; notches, confidence intervals around medians.l, See also Fig. 2a.LF4 involves broadly similar gene-expression effects in glutamatergic and GABAergic neurons, and a distinct set of gene-expression effects in astrocytes.Genes plotted are the protein-coding genes that are expressed (at levels of at least 10 UMIs per 10 5 ) in both cell types (Spearman's ρ; n = 1,538, 1,067, and 1,131 genes respectively).m, Concentrations of the strongest enriched neuronal gene-expression changes in LF4 among synaptic functions as annotated by SynGO 91 .Plots show categories of SynGO biological processes.in each donor (from the indicated subtype) that are derived from these genes, normalized to the median expression among control donors.Shaded regions represent 95% confidence intervals.The observed decline in schizophrenia and aging was consistent with earlier observations that expression of genes for synaptic components is reduced in schizophrenia 168 and with advancing age 169 .(right) GABAergic neurons of genes strongly recruited by SNAP-a.These were enriched in genes encoding synaptic-adhesion proteins, intrinsic components of synaptic membranes such as transporters and receptors, as well as genes strongly implicated in human genetic studies of schizophrenia.Genes in the "Schizophrenia genetics" heatmap are from among the prioritized genes from 23 (FDR < 0.05) or 22 .Genes annotated by ^ are from among all genes at loci implicated by common variants in 22 , regardless of prioritization status.b, UMAP of regulon activity scores (as inferred by pySCENIC 98 ) from 179,764 astrocyte nuclei from 180 donors, using the same projection from Fig. 3a-c.Regulons plotted are the most strongly enriched in astrocytes with high versus low SNAP-a expression.(+) indicates that the targets of the indicated regulon are predicted to be upregulated in expression.c-d, Transcriptional investments (by donor, separated by schizophrenia case-control status) in (c) genes encoding synaptic receptors and transporters and (d) cholesterol biosynthesis genes, in subtypes of astrocytes.Quantities plotted are the fraction of all UMIs in each subtype that are derived from these genes.P-values from a two-sided Wilcoxon rank-sum test comparing the affected to the unaffected donors are reported at the top of each panel.Box plots show interquartile ranges; whiskers, 1.5x the interquartile interval; central lines, medians; notches, confidence intervals around medians.e, Relationship of SNAP-a expression to association with super-enhancers.Genes expressed in astrocytes were grouped based on whether their promoters were predicted to contact super-enhancers in astrocytes (using bulk H3K27ac HiChIP and scATAC-seq data from 99 ), and SNAP-a loadings were compared between the two groups.(involved in fatty acid biosynthesis) 18,182,183 to SNAP-a expression (Spearman's ρ).Target-gene expression levels in astrocytes are shown.c, Relationship of donors' expression levels of known JUNB target genes (that are late-response genes) 19,20,184  .P-value is from a two-sided Wilcoxon rank-sum test.Box plots show interquartile ranges; whiskers, 1.5x the interquartile interval; central lines, medians; notches, confidence intervals around medians.c, Validation of a metagene computational approach for identifying RNA transcripts (UMIs) from the C4 genes.Standard analysis approaches tend to discard sequence reads from C4A or C4B because these genes are almost identical in sequence, differing only at a few key positions (far from the 3' end), such that most reads are discarded due to low mapping quality.To measure expression of these genes, UMIs were either aligned to a custom reference genome that contained only one C4 gene (x-axis) or were processed through a custom pipeline that identified UMIs associated with sets of gene families with high sequence homology, including C4A/C4B (y-axis).The two approaches (custom reference approach and joint expression of C4A/C4B via the metagene approach) arrived at concordant C4 UMI counts in 15,664 of 15,669 cells tested.Note that these measurements do not distinguish between C4A and C4B.d, Relationship of C4 expression to age in astrocytes (Spearman's ρ).Shaded region represents 95% confidence interval.e, Expression of C4 in astrocytes in control donors, split by donor age (n = 56 donors younger than 70 years old and 37 donors 70 years old or older).P-value is from a two-sided Wilcoxon rank-sum test.Box plots show interquartile ranges; whiskers, 1.5x the interquartile interval; central lines, medians; notches, confidence intervals around medians.ń

6 Fig. 1 |
Fig. 1 | Identification of concerted multicellular gene-expression changes common to schizophrenia and ageing.a, Generation of snRNA-seq data, in a series of 20-donor 'villages'.The diagram was created using images by thekua (person icon), B. Lachner (laboratory tools) and pnx (brain exterior side view) under a Creative Commons licence CC0 1.0.b, Uniform manifold approximation and projection (UMAP; coloured by donor) analysis of the RNAexpression profiles of 1,217,965 nuclei analysed from 191 donors.c, Assignments of nuclei to cell types (same projection as in b).d,e, Assignments of nuclei to glutamatergic (n = 524,186) (d) and GABAergic (n = 238,311) (e) neuron subtypes.CT, corticothalamic; ET, extratelencephalic; IT, intratelencephalic; NP, nearprojecting.f, Latent factor analysis.Cell-type-resolution expression data from all donors and cell types were combined into a single analysis.Latent factor analysis identified constellations of gene-expression changes that consistently appeared together.g, The cell type specificity of the latent factors inferred from 180 donors, shown as the cell type distributions of the 1,000 most strongly

3 Fig. 3 |
Fig. 3 | Biological states and transcriptional programs of astrocytes and L5 IT glutamatergic neurons in schizophrenia.a-c, UMAP analysis of RNA expression patterns from 179,764 astrocyte nuclei from 180 donors.Nuclei are coloured by astrocyte subtype (a), schizophrenia affected/unaffected status (b) and expression of the astrocyte component of SNAP (SNAP-a) (c).d, The relationship between donor quantile-normalized SNAP-a expression scores and age (Spearman's ρ).n = 180 donors.The shaded regions represent the 95% confidence intervals.e, The distributions of SNAP-a donor scores (age adjusted and quantile normalized) for people with and without schizophrenia.n = 93 controls, 87 cases.The P value was calculated using a two-sided Wilcoxon rank-sum test.For the box plots, the box limits show the interquartile range, the whiskers show 1.5× the interquartile interval, the centre line shows the median

Fig. 4 |
Fig.4| The relationship between SNAP and schizophrenia genetics.a, Enrichment of schizophrenia genetic association (from common variants, using MAGMA to generate a schizophrenia association z score for each gene) in the 2,000 genes most preferentially expressed in glutamatergic neurons and astrocytes (cell identity gene expression, upper bars), or the 2,000 genes of which the expression is most strongly recruited by SNAP-n and SNAP-a (cellular programs, lower bars).Values plotted are −log 10 [P] from a joint regression analysis in which each gene set is an independent and competing predictive factor.See also the Supplementary Note.b, The relationship between donor SNAP expression (quantile normalized) and donor schizophrenia polygenic risk scores (Spearman's ρ; n = 180 donors; PGC3 GWAS from ref.22).The shaded regions represent the 95% confidence intervals.c, NRXN1 expression (per 10 5 detected nuclear transcripts) in each cell type in individual donors.n = 93 controls, 87 cases.P values were calculated using two-sided Wilcoxon rank-sum tests.For the box plots, the box limits show the interquartile range, the whiskers show 1.5× the interquartile interval, the centre line shows the median value and the notches show the confidence intervals around the median values d, NRXN1 expression in individual astrocytes (using the same projection as in Fig.3a-c) (left).The values represent Pearson residuals from variance stabilizing transformation.Right, the relationship between the 180 donors' NRXN1 expression in astrocytes and SNAP-a expression (Spearman's ρ).e,f, Similar plots to those in c and d, but for C4.

Extended Data Fig. 1 |
Ages of brain tissue donors.a, Distribution of the ages of brain donors (n = 191 donors).b, Distributions of donors' ages by schizophrenia status, displayed as a quantile-quantile plot that compares ages of unaffected control donors (n = 97 donors) to ages of donors with schizophrenia (n = 94 donors).c-d, Distributions of donors' ages separated by schizophrenia status (n = 97 unaffected and 94 affected), displayed as (c) histograms and (d) violin plots.e-f, Distributions of donors' ages, separated by sex (n = 75 women and 116 men), displayed as (e) histograms and (f) violin plots.Note that while female brain donors are on average older than male donors, expression of SNAP (LF4) did not associate with sex in either a naive or age-adjusted analysis (Extended Data Fig. 4d,e), nor in a simultaneous regression on age, sex, and schizophrenia affected/unaffected status (Supplementary Table

3
).Extended Data Fig.2| Single-donor assignment and sequencing metrics.a, Density plot showing the fraction of all nuclei that were determined to be "singlets" (containing alleles from just one donor); n = 1,262,765 assignable singlets out of 1,271,830).b, Density plot showing donor-assignment likelihoods (as false discovery rates, on a log scale) for the 1,271,830 singlet nuclei.c, Validation of the computational assignment of nuclei to individual brain donors whose genomes have been analysed (individually) by SNP array-genotyping plus imputation.The matrix displays the concordance of single-donor assignment between whole-genome sequencing (WGS) (y-axis) and SNP array + imputation (x-axis) for a pilot set of 11 donors whose genomes were analysed by both methods.(Accuracy of donor assignment when WGS data are available has been previously shown by)7 .Each row/column corresponds to one of the 11 donors, and each entry in the table displays the number of nuclei that were assigned to a given donor (at a false discovery rate of 0.05).d, Number of nuclei assigned to each donor in each of 11 batches or (rightmost panel) across all batches, separated by schizophrenia case-control status (n = 10 controls and 10 schizophrenia cases per batch).P-values from a two-sided Wilcoxon rank-sum test comparing the affected to the unaffected donors are reported at the top of each panel.Central lines represent medians.e, Median number of UMIs ascertained per donor in each batch or (rightmost panel) across all batches, separated by schizophrenia case-control status (n = 10 controls and 10 schizophrenia cases per batch).P-values from a two-sided Wilcoxon rank-sum test comparing the affected to the unaffected donors are reported at the top of each panel.Central lines represent medians.f, Relationship of median UMIs/nucleus (normalized to the median value of the donors in each donor's batch) to (top) post-mortem interval (PMI) and (bottom) RIN score (Spearman's ρ).Colours represent different batches.Shaded regions represent 95% confidence intervals.Extended Data Fig. 3 | Properties of the latent factors inferred from snRNAseq data.a, Total % variance in expression explained by latent factors with different numbers of requested factors k. b, Fraction of variance explained by each latent factor in an analysis with 10 requested factors.c-d, Independence of latent factors, visualized as Pearson correlation heatmaps of factors' (c) gene loadings (n = 125,437 gene/cell-type combinations) and (d) donor scores (n = 180 donors).e, Expression level of each latent factor (panels) in each donor (points), split by batch (n = 20 donors per batch).f, Relationship of latent factors to markers of superficial and deep cortical layers from 75 .Markers label dominant classes of glutamatergic neurons (superficial: LAMP5, LINC00507, RORB; deep: THEMIS, FEZF2) or spatially restricted subtypes (superficial: Exc L2 LAMP5 LTK, marked by CUX2 and LINC01500; deep: Exc L5-6 THEMIS C1QL3, marked by SATB2 and LINC00343).Factor 2 exhibits the most distinct segregation of these superficial and deep layer markers when genes are ranked by their loadings onto each factor.n = 18,830 genes expressed in glutamatergic neurons; coloured dots are plotted over the dots of genes not among the markers listed above (grey).

Fig. 5 |
Relationship of synaptic vesicle cycle gene expression in neuronal subtypes to advancing age.a−b, See also Fig. 2c.Neuronal expression of synaptic vesicle cycle genes in the most abundant subtypes of (a) glutamatergic and (b) GABAergic neurons (across 180 donors), plotted against donor age (Spearman's ρ).Expression values are the fraction of all UMIs

Fig. 6 |
Relationship of gene-set expression in astrocytes and neurons to advancing age and schizophrenia.a, Expression of gene sets enriched in the astrocyte and neuronal components of LF4 (across 180 donors), plotted against donor age (Spearman's ρ).Expression values are the fraction of all UMIs in each donor (from the indicated cell type) that are derived from these genes, normalized to the median expression among control donors.Shaded regions represent 95% confidence intervals.b, Expression (by donor, separated by schizophrenia case-control status; n = 180 donors) of gene sets enriched in the astrocyte and neuronal components of LF4.Expression values are the fraction of all UMIs in each donor (from the indicated cell type) that are derived from these genes, adjusted for donor age.P-values from a two-sided Wilcoxon rank-sum test comparing the affected to the unaffected donors are reported at the top of each panel.Box plots show interquartile ranges; whiskers, 1.5x the interquartile interval; central lines, medians; notches, confidence intervals around medians.Extended Data Fig. 7 | Expression of cholesterol-biosynthesis genes in cortical cell types.a, See also Fig. 2d.For each cortical cell type: (Left) Distributions of LF4 gene loadings for (black) all expressed genes and (blue) specifically for genes annotated by GO as having roles in cholesterol biosynthesis (core genes contributing to the enrichment of GO:0045540 ("cholesterol biosynthesis genes") in that cell type's component of LF4.(Right) Each cell type's expression of cholesterol biosynthesis genes (by donor, split by schizophrenia case-control status; n = 180 donors).Expression values are the fraction of all UMIs in each donor (from the indicated cell type) that are derived from these genes.P-values are from a two-sided Wilcoxon rank-sum test comparing the affected to the unaffected donors.Box plots show interquartile ranges; whiskers, 1.5x the interquartile interval; central lines, medians; notches, confidence intervals around medians.b, Expression in astrocytes of cholesterol biosynthesis genes by donor, separated by statin intake among donors with available medication data (n = 63 donors not taking statins and 16 donors taking statins).Expression values are the fraction of all UMIs in each donor's astrocytes that are derived from these genes.P-value is from a two-sided Wilcoxon rank-sum test.Box plots show interquartile ranges; whiskers, 1.5x the interquartile interval; central lines, medians; notches, confidence intervals around medians.Extended Data Fig. 12 | Biological states and transcriptional programs of L5 IT glutamatergic neurons in schizophrenia.a-b, Relationship of SNAP-a to SNAP-n (Spearman's ρ).Values plotted are (a) quantile-normalized and (b) donor age-adjusted, quantile-normalized donor scores for each factor.Shaded regions represent 95% confidence intervals.c, UMAP of regulon activity scores (as inferred by pySCENIC 98 ) from L5 IT glutamatergic neuron nuclei from 180 donors, using the same projection from Fig. 3f-h.Regulons plotted are the most strongly enriched in L5 IT glutamatergic neurons with high versus low SNAP-n expression.(+) indicates that the targets of the indicated regulon were found to be upregulated in expression.Extended Data Fig. 13 | Astrocyte gene-expression programs underlying SNAP-a.a, See also Fig. 3k.Concerted expression in (left) astrocytes and (Left) Distributions of SNAP-a gene loadings for (blue) 1,286 genes whose promoters are predicted to contact super-enhancers in astrocytes and (black) the set of 32,325 remaining expressed background genes.(Right) Binomial smooth results of scaled SNAP-a gene loadings versus log 10 -scaled mean expression values in astrocytes, shown separately for the two groups.Shaded regions represent 95% confidence intervals.Extended Data Fig. 14 | Expression of well-characterized transcriptional programs in SNAP-a and SNAP-n.a, Concerted expression in (left) astrocytes and (right) L5 IT glutamatergic neurons of target genes of known transcriptional programs specifically active in SNAP-a or SNAP-n.Genes are listed in decreasing order by their importance for each regulon as scored by pySCENIC.b, Relationship of donors' expression levels of known SREBP1 target genes to SNAP-n expression (Spearman's ρ).Target-gene expression levels in L5 IT glutamatergic neurons are shown.Extended Data Fig. 15 | Relationship of astrocytic NRXN1 and C4 expression to advancing age.a, Relationship of NRXN1 expression to age in astrocytes (Spearman's ρ).Shaded region represents 95% confidence interval.b, Expression of NRXN1 in astrocytes in control donors, split by donor age (n = 56 donors younger than 70 years old and 37 donors 70 years old or older)

,
Similar plots as in b, here displaying LF4 expression values adjusted for donor age.d, Expression of LF4 by sex, split by case-control status (controls: n = 31 females and 62 males; cases: n = 39 females and 48 males).P-values are from a two-sided Wilcoxon rank-sum test.Box plots show interquartile ranges; whiskers, 1.5x the interquartile interval; central lines, medians; notches, confidence intervals around medians.e, Similar plots as in d, here displaying LF4 expression values adjusted for donor age.f-k, Relationship of LF4 expression measurements to other available donor and tissue characteristics: (f) time of death in zeitgeber time (ZT), with rhythmicity analyses performed as in 83 ; (g) post-mortem interval; (h) number of nuclei sampled; (i) number of UMIs sampled; ( j) use of psychiatric medications (left column) across each donor's lifespan or (right column) in the last 6 months prior to death; and (k) use of clozapine.Correlation coefficients in g-j are Spearman's ρ.P-values in k are from a two-sided Wilcoxon rank-sum test.