Risk variants for schizophrenia affect more than 100 genomic loci, yet cell- and tissue-specific roles underlying disease liability remain poorly characterized. We have generated for two cortical areas implicated in psychosis, the dorsolateral prefrontal cortex and anterior cingulate cortex, 157 reference maps from neuronal, neuron-depleted and bulk tissue chromatin for two histone marks associated with active promoters and enhancers, H3-trimethyl-Lys4 (H3K4me3) and H3-acetyl-Lys27 (H3K27ac). Differences between neuronal and neuron-depleted chromatin states were the major axis of variation in histone modification profiles, followed by substantial variability across subjects and cortical areas. Thousands of significant histone quantitative trait loci were identified in neuronal and neuron-depleted samples. Risk variants for schizophrenia, depressive symptoms and neuroticism were significantly over-represented in neuronal H3K4me3 and H3K27ac landscapes. Our Resource, sponsored by PsychENCODE and CommonMind, highlights the critical role of cell-type-specific signatures at regulatory and disease-associated noncoding sequences in the human frontal lobe.
Recent progress in understanding the genetic basis of many psychiatric diseases has identified both rare and common variants responsible for genetic risk1. Integrating epigenomics data from disease-relevant cell types and tissues promises to enhance interpretation of these risk variants and the mechanisms by which they confer disease liability2. This includes the exploration of noncoding regulatory DNA and its epigenetic variation in mediating the effects of genetic risk variants2,3,4. Thus, the long-term goal of the PsychENCODE5,6 and CommonMind7 consortia is to generate a large-scale epigenomics resource for the human brain to serve as a foundation for integrative genomics in psychiatric research6. In this regard, nucleosomal histone modifications contribute to genome organization and function, with various histone methylation and acetylation markings, including H3K4me3 and H3K27ac, considered key regulators for active promoters and enhancers and other cis-regulatory noncoding sequences8. Importantly, molecular regulators for such types of open-chromatin-associated histone modifications rank as top-scoring biological pathways by genome-wide association in schizophrenia and bipolar disorder9, further underscoring the importance of fine-mapping histone landscapes in brain. However, only a few publicly available histone datasets and resources exist for the human brain5,10,11, all of which were created from bulk tissue homogenate. These tissue-homogenate-based resources have clearly contributed to a deeper understanding of the genetic risk architecture of common psychiatric disease. However, there is evidence that, even in the context of normal cortical development and aging, vast portions of the neuronal genome show a very different histone modification landscape in comparison to those of the surrounding glia and other non-neuronal cells12,13. 
Unfortunately, the degree to which cell-type- and region-specific epigenomic signatures mediate the influence of genetic risk factors for psychiatric disease remains largely unexplored.
Here we present the largest dataset to date of open-chromatin-associated histone modifications mapped separately in neurons versus the remaining neuron-depleted cell fraction from two higher order brain areas implicated in schizophrenia and other psychiatric diseases14: dorsolateral prefrontal cortex (PFC) and anterior cingulate cortex (ACC). Our publicly accessible resource, available at http://psychencode.org/ and https://www.synapse.org/#!Synapse:syn4566010, includes data, results and UCSC browser visualizations for cell-type-specific maps from N = 129 samples, complemented with another N = 28 maps from tissue homogenate from adult control subjects without known neurological or psychiatric disease (Supplementary Table 1). This epigenomics resource provides hitherto unexplored insights into cell- and region-specific histone methylation and acetylation landscapes, including sites with extraordinarily high inter-individual variability. We elucidate the influence of genetic regulation on chromatin state and identify thousands of significant histone quantitative trait loci (hQTLs). We report striking enrichments of risk variants for schizophrenia, educational attainment, neuroticism and depressive symptoms highly specific to neuronal chromatin, thereby critically confirming cell type as a key variable in the neurogenomic architecture of psychiatric disease.
Samples and sequencing
Nuclei were extracted from previously frozen gray matter collected from two frontal lobe areas implicated in higher order processing serving cognition and emotion: the dorsolateral prefrontal cortex (PFC), at the superior frontal gyrus, and the anterior cingulate cortex (ACC), positioned immediately dorso-anterior to the corpus callosum (Fig. 1a, left). Chromatin immunoprecipitation and sequencing (ChIP-seq) with anti-H3K4me3 and anti-H3K27ac antibodies followed by 100-base-pair paired end sequencing was performed for neuronal and non-neuronal nuclei separately after NeuN neuronal marker immunotagging and fluorescence-activated sorting (Fig. 1a, right). NeuN, broadly expressed in the vast majority of cortical excitatory and inhibitory neurons15, is a prototypical neuronal marker in adult human cortex16. We herein refer to the NeuN+ fraction as neuronal and the NeuN– fraction as neuron-depleted, while acknowledging that each of these two cell types is comprised of many different subpopulations17. Performing cell-type-specific ChIP-seq on two brain regions from each of 17 subjects (14 males and 3 females), we generated N = 129 cell-type-specific libraries (N = 63 H3K4me3; N = 66 H3K27ac), as well as N = 28 tissue-homogenate-based libraries from N = 19 additional controls (N = 11 H3K4me3, 4 female, 7 male; N = 17 H3K27ac, 8 female, 9 male) passing ENCODE quality controls (>10 million uniquely mapped reads, normalized strand coefficient (NSC) > 1 and PCR bottleneck coefficient > 0.8; Supplementary Fig. 1 and Supplementary Table 2).
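The quality-control thresholds quoted above can be expressed as a simple filter. The following is a minimal sketch rather than the pipeline used in this study: the `LibraryQC` record, its field names and the three example libraries are hypothetical, and only the three cut-offs (>10 million uniquely mapped reads, NSC > 1, PCR bottleneck coefficient > 0.8) are taken from the text.

```python
from dataclasses import dataclass

# Thresholds quoted in the text; everything else here is illustrative.
MIN_UNIQUE_READS = 10_000_000
MIN_NSC = 1.0
MIN_PBC = 0.8


@dataclass
class LibraryQC:
    """Hypothetical per-library QC record (field names are assumptions)."""
    sample_id: str
    unique_reads: int   # uniquely mapped reads
    nsc: float          # normalized strand coefficient
    pbc: float          # PCR bottleneck coefficient


def passes_qc(lib: LibraryQC) -> bool:
    """Return True only if all three thresholds are strictly exceeded."""
    return (
        lib.unique_reads > MIN_UNIQUE_READS
        and lib.nsc > MIN_NSC
        and lib.pbc > MIN_PBC
    )


# Made-up example libraries, not data from the study.
libraries = [
    LibraryQC("hypothetical_A", 25_000_000, 1.15, 0.92),
    LibraryQC("hypothetical_B", 8_000_000, 1.30, 0.95),   # too few reads
    LibraryQC("hypothetical_C", 30_000_000, 1.02, 0.70),  # low PBC
]
passing = [lib.sample_id for lib in libraries if passes_qc(lib)]
print(passing)
```

Strict inequalities are used here because the text states the thresholds with ">"; a library sitting exactly on a threshold is therefore rejected in this sketch.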
For downstream analysis, we consolidated multiple ChIP-seq datasets by cell type for each brain region and histone mark as (i) H3K4me3 PFC neuronal, (ii) H3K4me3 PFC neuron-depleted, (iii) H3K4me3 ACC neuronal, (iv) H3K4me3 ACC neuron-depleted, (v) H3K27ac PFC neuronal, (vi) H3K27ac PFC neuron-depleted, (vii) H3K27ac ACC neuronal and (viii) H3K27ac ACC neuron-depleted. Tissue homogenate samples for each histone mark were consolidated as (ix) H3K4me3 PFC HBCC homogenate and (x) H3K27ac PFC HBCC homogenate. (Although all our samples were acquired through the Human Brain Collection Core (HBCC) brain bank, the “HBCC” identifier was used for the homogenate samples alone in order to distinguish them from the Roadmap Epigenomics Project tissue homogenates in our subsequent analysis.) Supplementary Table 3 shows the list of samples in each of the ten consolidated datasets (see Methods for a detailed description of the consolidation steps). The average number of uniquely mapped and nonredundant reads for the consolidated datasets by cell type and brain region ranged from 13 to 41 million for H3K4me3 and 23 to 125 million for H3K27ac, reflecting that H3K27ac samples were sequenced at twice the coverage depth because H3K27ac peaks are broader (Supplementary Fig. 1a). The subsequent steps of peak calling, read quantification of each peak, exploration of technical and biological covariates, differential modification analysis and functional annotation of peak sets (see Supplementary Fig. 2 for workflow diagram) were performed on each consolidated dataset. Across all individuals and both histone marks, ~50–70% of consolidated peaks in the cell-type-specific data and ~20–40% in the tissue homogenate data had read coverage of at least 1 count per million (CPM) (Supplementary Fig. 3).
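To make the CPM criterion concrete, the sketch below converts raw per-peak read counts to counts per million and reports the fraction of peaks reaching 1 CPM. It assumes the library size is the total number of mapped reads for the sample; the text does not specify the normalization denominator, and the counts shown are invented for illustration.

```python
def counts_per_million(counts, library_size):
    """Scale raw per-peak read counts to counts per million (CPM).

    library_size is assumed to be the sample's total mapped reads.
    """
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return [c * 1_000_000 / library_size for c in counts]


def fraction_at_least(cpm_values, threshold=1.0):
    """Fraction of peaks whose CPM is at or above the threshold."""
    if not cpm_values:
        return 0.0
    return sum(v >= threshold for v in cpm_values) / len(cpm_values)


# Invented example: five peaks in a library of 20 million reads.
peak_counts = [0, 5, 20, 40, 200]
library_size = 20_000_000
cpm = counts_per_million(peak_counts, library_size)
covered = fraction_at_least(cpm)
print(cpm, covered)
```

With a library of 20 million reads, a peak needs at least 20 reads to reach 1 CPM, which is why three of the five example peaks qualify.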
To evaluate the specificity of our histone-modification maps, we compared the peak coordinates to published H3K4me3 and H3K27ac maps from the Roadmap Epigenomics Project (REP) covering 111 tissues5. The maximum similarity (estimated based on Jaccard’s J) was found when our consolidated subset was compared to the REP brain tissues, while overlap with non-neural and peripheral REP tissues was lower (Supplementary Fig. 4 and Supplementary Table 4). For both brain regions and epigenetic marks, our neuron-depleted samples, which overwhelmingly comprise non-neuronal cells, had a higher similarity with REP brain samples than neuronal samples. Likewise, our NeuN– H3K27ac landscapes displayed a higher similarity with H3K27ac and also histone H3 acetyl-Lys9 (H3K9ac) landscapes collected from bulk cortex tissue (homogenate) from independent brain cohorts10,11. These observations, taken together, likely reflect the fact that the majority of cells residing in cortical gray matter are indeed non-neuronal18.
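As an illustration of the similarity measure, the sketch below computes Jaccard's J between two peak sets on a single chromosome as the number of shared base pairs divided by the number of base pairs covered by either set. Whether the comparison in this study was made per base pair or per peak is not stated above, so the base-pair definition is an assumption, as are the half-open coordinates and the toy intervals.

```python
def merge(intervals):
    """Merge overlapping or touching half-open intervals [start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def total_length(intervals):
    """Total base pairs covered by a set of merged intervals."""
    return sum(end - start for start, end in intervals)


def intersection_length(a, b):
    """Base pairs shared by two merged, sorted interval lists."""
    i = j = 0
    overlap = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            overlap += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return overlap


def jaccard(peaks_a, peaks_b):
    """Base-pair Jaccard index: |A intersect B| / |A union B|."""
    a, b = merge(peaks_a), merge(peaks_b)
    inter = intersection_length(a, b)
    union = total_length(a) + total_length(b) - inter
    return inter / union if union else 0.0


# Toy peak sets on one chromosome (not real coordinates).
set_a = [(100, 200), (300, 400)]
set_b = [(150, 250), (390, 500)]
print(jaccard(set_a, set_b))
```

For genome-wide peak sets the same calculation would be repeated per chromosome, with intersections and unions summed before taking the ratio.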
Genome-wide analysis of H3K4me3 and H3K27ac peaks reveals cell type specificity
The cell-type-specific peak sets (peaks called on consolidated datasets i–viii above) varied by the fraction of the genome covered by peak regions, as well as by the degree of overlap with other subsets. As expected, the 61,000–95,000 narrow H3K4me3 peaks (range reflecting different cell types and cortical areas) covered a much smaller fraction of the genome than the 91,000–116,000 broader H3K27ac peaks (Supplementary Table 5). For example, in PFC neurons, H3K4me3 peaks covered 82 Mb (2.8%) of the genome, while H3K27ac covered 595 Mb (19.8%) in the same subset (Fig. 1b–e). Only minimal differences in the percentage of genomic coverage by H3K4me3 peaks (2.7–2.9%) were observed across cell types, whereas H3K27ac showed much higher genomic coverage for neuronal (19.8–20.4%) than neuron-depleted (15.5–16.9%) chromatin (Fig. 1c).
Principal component analysis revealed distinct clusters of neuronal, neuron-depleted (non-neuronal) and homogenate samples for both histone marks (Fig. 1f,g and Supplementary Fig. 5a); however, samples from the PFC and ACC clustered together (Supplementary Fig. 5b). This indicates a relatively high degree of epigenetic difference between neuronal and non-neuronal chromatin compared to a minimal difference between cortical areas. In contrast, chromatin from our PFC tissue homogenate samples and additional homogenate brain tissue from other sources5,10 fell between that of the FACS-sorted cells along the first principal component (Fig. 1f,g). Notably, the HBCC homogenate PFC samples were much more similar to the non-neuronal component, and the fraction of NeuN– nuclei in our tissue homogenates comprised, on average, 60–70% of the total population (Supplementary Fig. 6 and Supplementary Table 6), which is consistent with the fact that non-neuronal cells outnumber neurons by 1.6–2:1 in the human frontal lobe18. To further explore this similarity of PFC homogenate with non-neuronal cells, we quantified and analyzed the non-overlapping regions of PFC neuronal, PFC neuron-depleted and PFC HBCC homogenate peak sets. PFC neuronal chromatin included vast amounts of H3K27ac (369 Mb) and H3K4me3 (46 Mb) peak sequences not shared with either neuron-depleted or tissue homogenate, while only 245 Mb (H3K27ac) and 15 Mb (H3K4me3) of peak sequences were unique to non-neuronal chromatin and not shared with tissue chromatin extracts or neurons (Fig. 2a). Taken together, these characteristics illustrate a crucial advantage of cell-specific data over homogenate data. Functional enrichment of genes in close proximity to these non-overlapping modified peak regions using GREAT19 indicated distinct biological functions by cell type (Fig. 2b and Supplementary Table 7a–f). 
Neuron-specific H3K4me3 and H3K27ac peaks were enriched for ion channels, neurotransmitter signaling and synaptic genes, while genome regions marked in neuron-depleted and tissue homogenate peak sets showed enrichment for broader, less defined categories (Fig. 2b).
While this analysis has described large-scale trends, cell specificity of histone modification is readily visualized at the gene level. As representative examples, we consider CAMK2A and OLIG1, which are neuron- and non-neuron-specific genes, respectively (see Supplementary Fig. 7).
Collectively, our findings affirm that the neuronal epigenomic landscape is distinct from both non-neuronal and tissue homogenate landscapes. Although our findings indicate that chromatin maps from homogenate may omit critical neuron-specific epigenomic signatures, they do provide a better representation of non-neuronal chromatin. Indeed, analysis of published brain hQTLs from H3K27ac profiles in cortical homogenate10 showed modest enrichment for overlap with our NeuN– H3K27ac peaks, but a depletion for overlap with our NeuN+ H3K27ac peaks (Supplementary Fig. 8). This enrichment was highly specific, as these hQTLs were depleted for overlap with H3K4me3 peaks. Moreover, analysis of another type of H3-acetyl mark, H3K9ac, from brain tissue homogenate11 showed only depletion for overlap with the two marks from neuronal and neuron-depleted chromatin in this study (Supplementary Fig. 8).
Neuronal histone modification landscapes show strong enrichment of schizophrenia GWAS loci
Owing to the distinct histone modification landscapes between neuronal and non-neuronal cells in the frontal lobe, we wanted to better understand the role of cell- and region-specific epigenomic regulation associated with various psychiatric and nonpsychiatric traits. To this end, we used the linkage disequilibrium (LD)-score partitioned heritability method20 to examine the enrichment of common genetic variants identified by genome-wide association studies (GWAS) within genomic regions with cell-type-specific histone modifications. Altogether, 18 different types of brain- and non-brain-related diseases and conditions were included in these analyses (Fig. 3).
The strongest enrichment was found for schizophrenia-associated loci; weaker (but nonetheless significant) enrichments were found for the genetic architectures associated with education years, intelligence, neuroticism, depressive symptoms, body mass index, chronotype and sleep duration (Supplementary Table 8a,b). Strikingly, each of these enrichments was almost exclusively limited to the neuronal histone modification landscapes of the PFC and ACC, suggesting that the aforementioned GWAS datasets link disease-associated vulnerabilities specifically to neurons. Indeed, the neuron-specific enrichment for sequences implicated in schizophrenia risk was consistently more significant than the comparatively weaker enrichment for these risk sequences in the histone modification maps from brain tissue homogenate herein, as well as in published H3K27ac10 and H3K9ac11 maps from brain tissue homogenate (Fig. 3 and Supplementary Fig. 9). For non-brain-related traits such as height, coronary artery disease, Crohn’s disease and ulcerative colitis, we observed little enrichment for peaks from either neuronal or neuron-depleted chromatin. Furthermore, the strongest enrichment of brain-related traits was identified in the non-overlapping peak regions of PFC neurons compared with the PFC HBCC homogenate peak sets, which further corroborates the association of GWAS loci of neuropsychiatric diseases with neuron-specific chromatin regions (Supplementary Fig. 9 and Supplementary Table 8c,d). Finally, LD-score regression coefficients20 from the enrichment analysis of schizophrenia were significantly larger in neuronal as compared to neuron-depleted chromatin, and this effect was consistently observed for both histone marks in the two cortical regions, ACC and PFC (Supplementary Fig. 10). 
However, neither neuronal nor neuron-depleted PFC and ACC chromatin showed any significant overlap with Alzheimer’s disease-associated variants, consistent with the hypothesis that Alzheimer’s disease risk variants are enriched for regulatory sequences within cells of myeloid origin21,22,23.
Decomposing quantitative variation in histone modification into multiple components
Quantitative epigenetic variation could be attributed to biological variation across cell types, subjects, brain regions and sexes. To quantify the percentage of variation in histone modification in each peak region that is attributable to each of these four variables plus residual variation, we fit a linear mixed model using variancePartition24 (Fig. 4). Since variance percentages sum to 100%, these values can be easily compared across variables, peak regions and histone marks. The variance percentages are easily interpretable visually: a peak region with high variation across cell types shows distinct levels of histone modification in neuronal versus neuron-depleted chromatin (Fig. 4a). The genome-wide trend across all peak regions for each mark indicates that cell type was the strongest source of variation in histone modification, followed by subject (Fig. 4b,c). In contrast, variation across brain regions was very limited. Finally, as expected, variation across sexes was minimal genome-wide while exerting a strong effect on genes linked to chromosomes X and Y. To further clarify the extent to which epigenomic differences between male and female frontal cortex are driven by histone peaks located on the sex chromosomes, we conducted principal component analysis of our 83 H3K27ac samples (11 female, 72 male), including cell-type-specific and tissue homogenate datasets. Inclusion of regions on chromosomes X and Y indeed resulted in strong sex-specific clustering on the fourth principal component, while male and female brains completely intermixed when the analyses were repeated after excluding histone-tagged sequences specific to the X and Y chromosomes (Supplementary Fig. 11).
Finally, to interpret the peak regions with the highest variation across subjects, we computed the overlap of peak regions from the current dataset with regions that have genome-wide significant hQTLs identified in lymphoblastoid cell lines and human brains for the H3K4me325 and H3K27ac10 epigenetic marks, respectively. Peak regions with high variation across subjects were indeed enriched for regions that are hQTLs in lymphoblastoid cell lines and human brains for their respective histone mark (Fig. 4d,e). This is consistent with variation in histone modification across subjects being driven at least in part by genetic regulatory variation10,25,26. Importantly, this enrichment is limited to loci subject to epigenomic regulation that are common between neuronal and neuron-depleted chromatin (this study) and tissue extract and lymphoblast lines from previous studies10,25.
Genetic regulation of histone modification in neuronal and neuron-depleted fractions
To examine whether there is cell-type-specific genetic regulation of histone modifications (as has been observed previously for gene expression6,27), we applied RASQUAL (robust allele-specific quantitation and quality control), a QTL approach integrating allele-specific and between-subject differences28. Each of the four neuronal, neuron-depleted and bulk tissue chromatin preparations indeed harbored thousands of hQTLs, ranging from 6,695 to 8,042 for H3K27ac and 1,565 to 3,517 for H3K4me3 at a false discovery rate (FDR) < 0.05, depending on chromatin fraction (Supplementary Fig. 12 and Supplementary Table 9). Of note, H3K27ac-tagged chromatin showed unexpectedly strong enrichment for Gene Ontology GREAT biological processes such as neurofilament organization, regulation of synaptic plasticity, associative learning, catecholamine-dependent signaling and various other pathways highly relevant to the neurobiology of schizophrenia and other common psychiatric disease (Supplementary Fig. 13).
Since hQTL calling was slightly underpowered as a result of the small sample size (n = 36), we took the simple approach of comparing our cell-specific and bulk tissue hQTLs with data from GWAS for schizophrenia29. We took all associations with P < 5 × 10−8 that are in high LD (r2 > 0.8) with the lead single-nucleotide polymorphism (SNP) and evaluated their overlap with our hQTLs. Comparisons across bulk tissue, neuronal and neuron-depleted chromatin revealed strong cell-type-specific effects for many of these risk-associated loci. For example, H3K4me3 peaks near MIR137 showed stronger hQTLs in neuronal samples than in neuron-depleted and bulk tissue samples, with localization of lead SNP rs1702294. H3K27ac peaks showed even stronger cell-specific hQTL signal near the voltage-gated calcium channel CACNA1C in neuronal samples, whereas peaks for both histone marks near FURIN showed a single hQTL only in neuron-depleted samples (Fig. 5, Supplementary Fig. 12 and Supplementary Table 10).
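The selection of risk variants described above amounts to two filters followed by a set intersection. The sketch below illustrates that logic under stated assumptions: the record layout (`snp`, `p`, `r2_with_lead`), the SNP identifiers and all numeric values are hypothetical placeholders, and only the two thresholds (P < 5 × 10−8, r2 > 0.8) come from the text.

```python
# Thresholds from the text.
GENOME_WIDE_P = 5e-8
MIN_R2 = 0.8


def risk_snps(associations):
    """SNPs that are genome-wide significant and in high LD with the lead SNP."""
    return {
        a["snp"]
        for a in associations
        if a["p"] < GENOME_WIDE_P and a["r2_with_lead"] > MIN_R2
    }


def overlap_with_hqtls(associations, hqtl_snps):
    """Risk SNPs that are also hQTL SNPs, returned in sorted order."""
    return sorted(risk_snps(associations) & set(hqtl_snps))


# Hypothetical GWAS records at one locus (placeholder identifiers).
gwas = [
    {"snp": "snp_lead", "p": 1e-12, "r2_with_lead": 1.0},
    {"snp": "snp_a", "p": 3e-9, "r2_with_lead": 0.91},
    {"snp": "snp_b", "p": 2e-9, "r2_with_lead": 0.55},  # low LD
    {"snp": "snp_c", "p": 4e-6, "r2_with_lead": 0.95},  # not significant
]
# Hypothetical hQTL SNPs from one chromatin fraction.
neuronal_hqtls = {"snp_a", "snp_c", "snp_z"}
print(overlap_with_hqtls(gwas, neuronal_hqtls))
```

Running the same overlap separately against hQTL sets from neuronal, neuron-depleted and bulk tissue chromatin would give the kind of per-fraction comparison described in the text.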
Epigenomic variation between ACC and PFC
Cell type was the major source of quantitative variation in histone modification, with 55,628 H3K4me3 and 117,708 H3K27ac peak regions epigenetically different across cell types at FDR 5% (Fig. 6a and Supplementary Fig. 14a). Unsurprisingly, for each histone mark, functional enrichment by gene categories was highly specific for cell type (Fig. 6b). However, differences in histone modification between ACC and PFC were much smaller because of the similarity between the brain regions. Differential histone modification analysis between brain regions in neuronal cells identified 508 H3K4me3 and 10,797 H3K27ac peaks with increased modification in ACC, as well as 696 H3K4me3 and 10,665 H3K27ac peaks in PFC (Supplementary Figs. 14b and 15a–c and Supplementary Tables 11 and 12). Notably, there was minimal region-specific signal in neuron-depleted chromatin, with only 27 H3K4me3 and 18 H3K27ac peaks with increased modification in ACC and none in PFC (Supplementary Fig. 15d). These results indicate dramatically higher regional specificity for the population of neurons compared to their surrounding non-neuronal cells in the frontal lobe. It remains to be determined whether the differential histone acetylation landscapes in PFC vs. ACC neurons are reflective of differences in neurocognitive function between these cortical areas. For example, there is robust ACC activation with regard to reward processing, pain, affect and emotion30. In contrast, dorsolateral PFC is frequently implicated in the regulation of goal-directed behavior, including working memory and executive functions31. We note that multiple peak regions that were differentially modified between PFC and ACC neurons are proximal to neuropsychiatric risk genes (Supplementary Tables 13 and 14). 
These include the forkhead transcription factor FOXP1, which functions synergistically with a related molecule, FOXP2, to regulate cognition and speech32; the exocytosis regulator CADPS2, which is essential for axonal release of brain-derived nerve growth factor (BDNF)33; and the GRIK4 kainate receptor, relevant for a broad range of disorders on the autism, mood and psychosis spectrum34,35.
Transcriptional signatures of promoter-bound histone methylation and acetylation
While histone peaks from neuronal and neuron-depleted chromatin were bound to promoters, introns and intergenic elements (Supplementary Fig. 16), annotation of the H3K4me3 and H3K27ac peak sequences in the consolidated subsets revealed that a large majority of sequences (70–79% of H3K4me3-tagged and 57–68% of H3K27ac-tagged) were bound to promoters within 5 kb of annotated transcription start sites (TSS). We therefore examined whether levels of H3K4me3 and H3K27ac modification are associated with gene expression magnitude in an independent set of postmortem brain samples from dorsolateral PFC from the CommonMind Consortium7. To this end, we calculated the number of ChIP-seq reads aligned within 15 kb of the annotated TSS of genes in five gene sets grouped by expression magnitude. As expected from findings in peripheral cells and tissues8, both H3K4me3 and H3K27ac ChIP-seq reads were enriched around the TSS of genes with high levels of expression compared to genes with low levels of expression (Supplementary Fig. 17).
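The promoter annotation above can be illustrated with a small distance test. The sketch below flags a peak as promoter-bound when any TSS lies within 5 kb of the peak, for a single chromosome. Measuring distance from the peak edges rather than from the peak center or summit is an assumption, and the coordinates are invented.

```python
from bisect import bisect_left


def near_tss(peak_start, peak_end, sorted_tss, window=5_000):
    """True if any TSS lies within `window` bp of the peak [start, end].

    sorted_tss must be an ascending list of TSS positions on the
    same chromosome as the peak.
    """
    lo = peak_start - window
    hi = peak_end + window
    i = bisect_left(sorted_tss, lo)  # first TSS at or after lo
    return i < len(sorted_tss) and sorted_tss[i] <= hi


def promoter_fraction(peaks, tss_positions, window=5_000):
    """Fraction of peaks lying within `window` bp of a TSS."""
    if not peaks:
        return 0.0
    tss = sorted(tss_positions)
    hits = sum(near_tss(s, e, tss, window) for s, e in peaks)
    return hits / len(peaks)


# Invented coordinates on one chromosome.
tss_positions = [10_000, 50_000]
peaks = [(12_000, 13_000), (20_000, 21_000), (54_000, 55_000)]
print(promoter_fraction(peaks, tss_positions))
```

The binary search keeps the lookup logarithmic in the number of TSSs, which matters when tens of thousands of peaks are annotated against a full gene catalog.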
In our final analyses, we examined the association of the neuronal, neuron-depleted and homogenate chromatin landscape with gene expression magnitudes in multiple subtypes of neurons and glia recently identified by massively parallel profiling of single brain nuclei17. There were indeed strong cell-type-specific chromatin effects, with neuron-depleted chromatin showing strong enrichment for oligodendrocyte- and astrocyte-specific transcripts while conversely neuronal chromatin profiles were more strongly associated with transcripts of the various types of neurons as compared to glia (Fig. 7). In contrast, these enrichments showed no cell type specificity for chromatin fractions prepared from tissue homogenate (Fig. 7). Not limiting our analyses to cell types, we examined the association of neuronal, neuron-depleted and homogenate chromatin with differentially expressed transcripts in multiple cohorts of subjects diagnosed with autism, bipolar disorder or schizophrenia36. Both cell-type and homogenate chromatin fractions showed moderate levels of enrichment with these disease-related gene sets (Fig. 7).
Interpreting the functional consequences of recently identified genetic variants contributing to the risk of neuropsychiatric disease requires a deeper understanding of the epigenomic context of these variants in brain and other tissues2,3,4,6,10,11. We built a dataset of cell-type-specific reference maps for NeuN+ neuronal and NeuN– (overwhelmingly non-neuronal) histone modification landscapes for H3K4me3 and H3K27ac, which are typically associated with active promoter and enhancer regions, respectively. Notably, non-neuronal chromatin showed a high degree of concordance with epigenomic landscapes of cortical homogenates from multiple sources. In contrast, histone methylation and acetylation landscapes from ACC and PFC neurons showed considerable ‘epigenomic distance’ from neuron-depleted and tissue homogenate samples (Fig. 1f,g and Supplementary Fig. 5a), suggesting that homogenate-based maps are likely a poor surrogate for neuron-specific alterations in the context of cognitive function and neurological disease. Given that the differences between neuronal and non-neuronal H3K4me3 and H3K27ac landscapes are the major axis of epigenomic variation (Fig. 4), it will be essential for future studies to pursue more sample fractionation by cell type in order to capture the estimated 16 neuronal populations defined by single-cell RNA sequencing in human cerebral cortex37, as well as potentially similar degrees of heterogeneity in glia, as recently reported for mouse brain38. Such a higher resolution approach is expected to reveal vast numbers of genomic loci with an epigenomic signature unique to a specific type of neuron or glia, and provide deeper insight into the interrelation of transcriptome and histone modification landscapes. We also note the unexpectedly large quantitative H3K27ac differences between cell types, with a much larger genome coverage (20%) in neuronal chromatin decorated by histone acetylation versus only 15–16% genome coverage in neuron-depleted chromatin. 
The extended H3K27ac coverage broadly included intronic and intergenic sequences, in addition to many promoter-bound peaks (Supplementary Fig. 16). While the functional implications of the extended H3K27ac peak coverage in the neuronal genome remain to be explored, we note that drugs interfering with the regulation of histone acetylation, including histone deacetylase inhibitors and suppressors of histone acetylation reader proteins, show a surprisingly broad therapeutic profile, improving cognition and neuronal function in a wide range of neuropsychiatric disease models39,40,41. Furthermore, consistent with previous gene expression profiles in adult frontal cortex42, the transcriptional histone marks of the present study, H3K4me3 and H3K27ac, showed few sex-specific histone methylation and acetylation differences in the autosomal genome (Supplementary Fig. 11). However, previous DNA methylation profiling in cortical tissue homogenate from aged brains revealed sex-specific effects for approximately 10% of age-sensitive methyl-CpG marks43. At present, it is not known whether sex-specific regulation of histone modifications is increased in the aged brain.
One primary goal of the PsychENCODE Consortium is to explore regulatory noncoding DNA associated with the genetic risk architectures of common neuropsychiatric disorders6. Using LD-score regression to partition heritability20, we found strong, specific enrichments for schizophrenia and somewhat weaker association with depression, neuroticism and educational attainment in both H3K4me3 and H3K27ac peaks (Fig. 3). This effect was primarily if not exclusively driven by neuronal chromatin (Fig. 3 and Supplementary Figs. 9 and 10), with minimal or no contribution from neuron-depleted chromatin. Intriguingly, the strongest association with brain-region-specific peaks identifies risk variants for schizophrenia and educational attainment specifically in PFC neurons, consistent with the key role of the PFC in executive function. Taken together, these findings underscore the importance of epigenomic fine mapping with maximal region- and cell-type-specific resolution for the human brain in order to link the genetic risk architectures of neuropsychiatric disorders to selected cell populations or neural circuits.
Our cell-type-specific reference maps, accessible through the PsychENCODE Knowledge Portal and UCSC browser on Synapse (https://www.synapse.org/#!Synapse:syn4566010), are a resource that will empower future studies exploring the epigenetic foundations of cell-type-specific genome organization and function in human brain, with important implications for the neurobiology of common psychiatric disease.
All tissue donors of the present study were from the Human Brain Collection Core (HBCC) at the National Institutes of Health. None of the brains were affected by known neurological or psychiatric disease. All brains had undergone a detailed neuropathological exam (including Bielschowsky stain) and were considered normal by histopathology. Demographics of the brain cohort and toxicology and neuropathology reports are summarized in Supplementary Table 1. Sample size: no statistical methods were used to predetermine sample sizes, but our sample sizes exceeded those reported in previous publications focused on cell-type-specific histone profiling in human brain12,13,44 by several-fold.
Antibodies, ChIP-seq library preparation and sequencing
Nuclei were extracted from approximately 300-mg aliquots of frozen frontal (dorsolateral prefrontal and anterior cingulate gyrus) cortex tissue and immunotagged with an anti-NeuN–Alexa488 antibody (cat. no. MAB377X, EMD Millipore) that robustly stains human cortical neuron nuclei44,45 for subsequent fluorescence-activated nuclear sorting. Next, chromatin of sorted nuclei was digested with micrococcal nuclease and subsequently pulled down with anti-histone antibodies, followed by library preparation and sequencing. Two histone antibodies, anti-H3K4me3 (cat. no. 9751BC, lot no. 7; Cell Signaling, Danvers, MA) and anti-H3K27ac (cat. no. 39133, lot no. 01613007; Active Motif, Carlsbad, CA) were used for immunoprecipitation. Antibody specificity was tested using peptide binding assays and immunoblotting of nuclear extracts from human postmortem cortical tissue. A commercially available histone H3 peptide array (cat. no. 16-667; Millipore) containing 46 peptides representing 46 different histone H3 posttranslational modifications was used as previously described44. All procedures were performed as described in the recent PsychENCODE methods paper, providing a detailed description of the protocol44. For each cell-type-specific ChIP assay, a minimum of 400,000 sorted neuronal (NeuN+) or neuron-depleted non-neuronal (NeuN–) nuclei was required as starting material. ChIP-PCR was conducted for selected gene promoters to validate cell-type-specific peak profiles (Supplementary Fig. 18 and refs.44, 45). Furthermore, post-FACS quality controls for nuclei included visual inspection under the microscope as described44. Notably, due to our stringent FACS gating criteria maximizing specificity, not sensitivity (Supplementary Fig. 19), 100% of sorted nuclei in the neuronal fraction showed green fluorescence, confirming NeuN+ status, while 100% of sorted nuclei in the non-neuronal fraction showed only blue DAPI stain, confirming NeuN– status. 
For the PFC samples, we collected (mean ± s.d.) NeuN+ 667,675 ± 196,847 and NeuN– 611,025 ± 203,172 nuclei. For the ACC samples, we collected NeuN+ 490,585 ± 184,358 and NeuN– 653,743 ± 389,284 nuclei.
Additional ChIP-seq studies were conducted with homogenized dorsolateral PFC as input. To this end, frozen human postmortem brain tissue (approximately 20–200 mg) was homogenized in lysis buffer and the total nuclei were purified. The nuclei were resuspended in 300 μL of Dounce buffer and treated with 2 μL of micrococcal nuclease (0.2 U/μL) for 5 min at 28 °C, followed by 30 μL of 500 mM of EDTA to stop the reaction. After this initial procedure for nuclear preparation and digestion, the sample was processed in the same manner as described for the FACS-sorted nuclear samples.
Randomization and blinding
To avoid batch effects and other confounds, samples underwent repeated rounds of randomization, including (i) chromatin immunoprecipitation procedures and (ii) library preparation. Blinding was not relevant to this study; analysts were aware of data generation, processing and donor metadata.
Sequenced cell-specific and homogenate ChIP-seq FASTQ files were aligned to the Hg19 (Feb 2009, GRCh37) human genome using the Burrows-Wheeler Aligner (BWA-0.7.8-r455) method with default settings46. The output files were exported as BAM files.
Filtering and quality control
PCR duplicates in aligned BAM files were removed using the Picard 2.2.4 tool47. After filtering out duplicates, all BAM files were preprocessed to remove unmapped reads and any interchromosomal read pairs of length > 10 kb. The mapped reads were subsampled to the median number of paired-end reads of each dataset: H3K4me3, 13 M; H3K27ac, 23 M (Supplementary Fig. 1). Any sample with a sequencing depth < 10 M reads after duplicate removal (the threshold taken from ENCODE48) was flagged in this study. These uniformly subsampled files were used for further downstream analysis.
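The depth-equalization step can be illustrated with a minimal sketch. It assumes that each library is reduced to the median depth by keeping a random fraction of its reads and that libraries already below the median are left unchanged; the function name, sample labels and read counts are hypothetical and are not taken from the study.

```python
from statistics import median

def subsample_plan(mapped_reads, depth_floor=10_000_000):
    """Return the median depth and, per sample, the keep-fraction and a low-depth flag.

    mapped_reads: dict of sample name -> number of deduplicated mapped reads.
    depth_floor:  depth below which a sample is flagged (10 M in the text).
    """
    target = median(mapped_reads.values())
    plan = {}
    for sample, n_reads in mapped_reads.items():
        plan[sample] = {
            # Assumption: libraries shallower than the target are kept whole.
            "fraction": min(1.0, target / n_reads),
            "flagged": n_reads < depth_floor,
        }
    return target, plan

# Hypothetical read counts, for illustration only.
depths = {"S1": 9_000_000, "S2": 13_000_000, "S3": 20_000_000}
target, plan = subsample_plan(depths)
```

A keep-fraction of this kind could then be passed to a read-sampling tool; which tool and random seed were used is not stated in the text.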
Experimental design and statistical analyses
For a general overview of the bioinformatic analyses, see Supplementary Fig. 2. To determine the best peak calling method, we used the P value from irreproducible discovery rate (IDR) analysis, where the input was peaks called using the MACS2, PeakSeq and SPP methods. To identify differentially modified histone peaks across cell types and brain regions, we applied a quasi-likelihood negative binomial generalized log-linear model on the normalized CPM matrix. For multiple-testing correction of identified differential peaks, we used the Benjamini–Hochberg method on the P values to control false discovery rate. For pathway enrichment analysis of differentially modified peaks, we used P values from the hypergeometric test computed by GREAT and applied multiple-testing correction using the Bonferroni method. To test overlap of identified peaks with disease- and trait-associated genetic variants, we used the LDSR method, which takes P values of peak regions as an input. See below for additional details on statistical methods.
Variants were called from BAM files using GATK 3.5-049 to produce gVCF files. Variant concordance analysis was performed to identify any mislabeling. Variants on chr22 were merged using GATK’s CombineVCFs functionality. Variant concordance between all pairs of samples was evaluated with bcftools v1.350. Two mislabeled samples were identified and were relabeled appropriately for all downstream analyses.
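The idea behind the pairwise concordance check can be sketched as follows. This is not the bcftools implementation; it is a simplified illustration in which each sample is a dictionary of site to genotype call, and the sample names and genotypes are hypothetical.

```python
from itertools import combinations

def concordance(gt_a, gt_b):
    """Fraction of sites called in both samples that carry the same genotype."""
    shared = [s for s in gt_a
              if s in gt_b and gt_a[s] is not None and gt_b[s] is not None]
    if not shared:
        return float("nan")
    return sum(gt_a[s] == gt_b[s] for s in shared) / len(shared)

def pairwise_concordance(genotypes):
    """Concordance for every unordered pair of samples."""
    return {
        (a, b): concordance(genotypes[a], genotypes[b])
        for a, b in combinations(sorted(genotypes), 2)
    }

# Hypothetical calls at four chr22 sites; None marks a missing call.
genotypes = {
    "donor1_NeuN+": {"22:100": "0/1", "22:200": "1/1", "22:300": "0/0", "22:400": "0/1"},
    "donor1_NeuN-": {"22:100": "0/1", "22:200": "1/1", "22:300": "0/0", "22:400": None},
    "donor2_NeuN+": {"22:100": "0/0", "22:200": "0/1", "22:300": "0/0", "22:400": "1/1"},
}
result = pairwise_concordance(genotypes)
```

Samples from the same donor are expected to show near-complete concordance, so a library whose best match is a different donor is a candidate for mislabeling.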
Comparison of peak calling methods
For each histone mark, we consolidated BAMs across all individuals for the PFC neuronal set and subsampled three files. Our approach to determine best peak calling method was to derive the irreproducible discovery rate (IDR)51 after calling peaks using MACS (v.2.1.0)52, SPP (v.1.13)53 and PeakSeq54 methods. Afterward, the method that gave the maximum number of overlapping regions between subsamples at 5% IDR was used for peak calling on the full dataset. We found that MACS2 was the best peak caller method, with the maximum number of peak regions at 5% IDR for both marks (Supplementary Fig. 1d). ENCODE uses IDR on technical replicates of samples to determine the reproducibility of peaks51 while we have used it globally on our dataset. We applied the following parameters in MACS2: SE, SE no model, PE, PE no model and P = 0.01, 0.1, 0.5; SPP: FDR 0.01, 0.05, 0.99, background model = simulated, minimum interpeak distance = 150 and PeakSeq: target FDR = 0.01, 0.05, 0.99.
Consolidation of datasets
For cell-specific datasets, we consolidated uniformly processed BAMs by cell type for each brain region. For example, H3K4me3-modified ChIP-seq BAMs from neuronal cells from PFC brain region for all individuals (n = 17) were consolidated as the H3K4me3 PFC neuronal dataset. Consolidating the BAMs by cell type for each brain region produced eight large BAM files for both marks: (i) H3K4me3 PFC neuronal, (ii) H3K4me3 PFC neuron-depleted, (iii) H3K4me3 ACC neuronal, (iv) H3K4me3 ACC neuron-depleted, (v) H3K27ac PFC neuronal, (vi) H3K27ac PFC neuron-depleted, (vii) H3K27ac ACC neuronal and (viii) H3K27ac ACC neuron-depleted.
ChIP-seq BAMs for homogenate were generated from one brain region, and therefore all individuals' BAMs were consolidated into two large BAM files, one per mark: (i) H3K4me3 PFC HBCC homogenate and (ii) H3K27ac PFC HBCC homogenate. Similarly, input samples were consolidated separately for cell-specific and homogenate datasets. For details of the sets of individual files contributing to each consolidated dataset, see Supplementary Table 3. We used these cell-specific (n = 8) and homogenate (n = 2) consolidated BAMs for further downstream analysis.
Narrow peak regions were called for H3K4me3 histone mark datasets on each of the consolidated cell-specific and homogenate BAMs—(i) H3K4me3 PFC neuronal, (ii) H3K4me3 PFC neuron-depleted, (iii) H3K4me3 ACC neuronal, (iv) H3K4me3 ACC neuron-depleted and (v) H3K4me3 PFC HBCC homogenate—with Poisson P = 0.01 and with --keep-dup all --nomodel --extsize = fragment length. Broad peak regions were called for H3K27ac histone mark datasets on each of the consolidated BAMs—(vi) H3K27ac PFC neuronal, (vii) H3K27ac PFC neuron-depleted, (viii) H3K27ac ACC neuronal, (ix) H3K27ac ACC neuron-depleted and (x) H3K27ac PFC HBCC homogenate—using the same parameters. The consolidated cell-type and homogenate input control samples were used as control inputs for peak calling on cell-specific and homogenate datasets, respectively.
For downstream analysis, all called peaks were filtered to remove peaks in blacklisted regions48 and to retain peaks with –log10(P) > 3.05 (P value obtained from IDR analysis). For each mark, the coordinates for peaks for each set—PFC neuronal, PFC neuron-depleted, ACC neuronal, ACC neuron-depleted—used in this study are given for cell-specific H3K4me3 = syn11306591, homogenate H3K4me3 = syn11306589, cell-specific H3K27ac = syn9998643 and homogenate H3K27ac = syn11485660. Before calling peaks using MACS2, we first ran SPP to find the fragment length using maximum strand cross correlation (Supplementary Fig. 1b and Supplementary Fig. 2). For QC parameters (NSC, RSC, PBC and number of mapped reads) of uniformly reprocessed and consolidated ChIP-seq sets, we used phantompeakqualtools53. The NSC of all samples used in this study was above a threshold of 1.1 (Supplementary Fig. 1c). We provide summarized QC parameters of individual files (Supplementary Table 2) and consolidated (Supplementary Table 5) datasets.
Functional enrichment of non-overlapping cell- and tissue-specific histone peaks
To interpret the specificity of cell-type and homogenate data, we identified their respective unique or non-overlapping peak regions. Non-overlapping regions in a dataset are defined as all genomic regions except the ones that have at least 50% overlap with the dataset they are compared with. We examined the biological function of nearby genes for these non-overlapping peak regions using Genomic Regions Enrichment of Annotations Tool (GREAT)19. The settings used for GREAT were as follows: proximal, 5.0 kb upstream and 5.0 kb downstream; distal, up to 100 kb.
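The non-overlap definition can be made concrete with a short sketch. It assumes that "at least 50% overlap" means that a single peak in the comparison dataset covers at least half of the query peak's length; the text does not specify this detail, and the function names and coordinates below are hypothetical.

```python
def overlap_bp(a, b):
    """Base pairs shared by two (chrom, start, end) half-open intervals."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))

def non_overlapping(peaks, other, min_frac=0.5):
    """Peaks whose overlap with every peak in `other` is below min_frac of their length."""
    unique = []
    for peak in peaks:
        length = peak[2] - peak[1]
        covered = any(overlap_bp(peak, q) >= min_frac * length for q in other)
        if not covered:
            unique.append(peak)
    return unique

# Hypothetical peak coordinates.
neuronal = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 100, 200)]
homogenate = [("chr1", 150, 250), ("chr1", 390, 500)]
neuron_only = non_overlapping(neuronal, homogenate)
```

The quadratic scan is adequate for illustration; genome-scale peak sets would normally be handled with a sorted sweep or an interval tree.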
Gene set enrichment analysis based on single-cell RNA-seq
We next examined the difference between the ChIP-seq signal in cell-specific and homogenate datasets by measuring the enrichment of gene sets identified in neuronal and neuron-depleted chromatin subtypes by massively parallel profiling of single brain nuclei17. For neuronal subtypes we used identified gene sets for excitatory neurons (n = 24), pyramidal neurons from CA1 (n = 132), pyramidal neurons from CA2 (n = 111), pyramidal neurons from CA3 (n = 50), GABAergic interneurons (n = 145) and granule cells from the DG (n = 163), and for non-neuronal subtypes we used identified gene sets for radial glia (n = 10), myelin (n = 16), oligodendrocytes (n = 120), astrocytes (n = 155) and oligoprogenitor cells (n = 42).
ngsplot v2.6155 was used to quantify ChIP-seq read enrichment for the abovementioned neuronal and non-neuronal gene sets, as a function of distance up to 15 kb upstream and downstream from the TSS of each gene, in seven datasets for both marks: (i) PFC neuronal, (ii) PFC neuron-depleted, (iii) ACC neuronal, (iv) ACC neuron-depleted, (v) PFC HBCC homogenate, (vi) PFC REP homogenate and (vii) ACC REP homogenate. We calculated the area under each of these ChIP-seq read enrichment curves to examine the difference between the enrichment of cell-specific and homogenate datasets for neuronal and non-neuronal subtypes.
In addition to these gene sets, we measured the enrichment of ChIP-seq reads for neuropsychiatric disease signatures as well. We curated these gene sets for (i) CMC schizophrenia (n = 693) based on RNA-seq differential gene expression between cases and controls in the PFC region from 690 individuals (P ≤ 0.05) and for (ii) schizophrenia (n = 884), (iii) bipolar disorder (n = 179), (iv) major depressive disorder (n = 25) and (v) autism spectrum disorder (n = 933) based on differential gene expression in the cerebral cortex from microarray studies of 715 individuals (P ≤ 0.05 and log2FC ≥ 2).
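The area-under-the-curve summary of an enrichment profile can be sketched as below. The trapezoidal rule is an assumption, since the text does not state the integration method, and the profile values are hypothetical.

```python
def area_under_curve(positions, signal):
    """Trapezoidal area under an enrichment profile.

    positions: distances from the TSS in bp, in increasing order.
    signal:    mean read enrichment at each position.
    """
    if len(positions) != len(signal):
        raise ValueError("positions and signal must have the same length")
    area = 0.0
    for i in range(1, len(positions)):
        width = positions[i] - positions[i - 1]
        area += 0.5 * (signal[i] + signal[i - 1]) * width
    return area

# Hypothetical profile: a triangular peak centred on the TSS.
positions = [-15000, 0, 15000]
signal = [0.0, 2.0, 0.0]
auc = area_under_curve(positions, signal)
```

Comparing this area between a cell-specific and a homogenate profile for the same gene set gives a single number per dataset and gene set.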
Quantification of ChIP-seq signal in each peak
To determine the read coverage across the whole genome for a BAM file, we used featureCounts from subread 1.5.256. The data input to featureCounts consists of (a) uniformly processed BAM files and (b) a consensus peak file in simplified annotation format (SAF). The consensus peak signals for H3K4me3 and H3K27ac were generated by taking the union of MACS2 narrowPeak files of cell-specific and homogenate consolidated datasets that are (i) H3K4me3 PFC neuronal, (ii) H3K4me3 PFC non-neuronal, (iii) H3K4me3 ACC neuronal, (iv) H3K4me3 ACC neuron-depleted and (v) H3K4me3 PFC HBCC homogenate and the union of MACS2 broadPeak files of (vi) H3K27ac PFC neuronal, (vii) H3K27ac PFC neuron-depleted, (viii) H3K27ac ACC neuronal and (ix) H3K27ac ACC neuron-depleted, respectively. featureCounts quantifies the number of reads for each sample in every peak region of the consensus signal. The counts were put together in a matrix separately for H3K4me3 and H3K27ac marks, with 74 (cell-specific, 63; homogenate, 11) samples from 28 individuals (cell-specific, 17; homogenate, 11) as rows for H3K4me3 and 83 (cell-specific, 66; homogenate, 17) from 34 individuals (cell-specific, 17; homogenate, 17) as rows for H3K27ac and 107,480 and 152,590 peak regions as columns for H3K4me3 and H3K27ac, respectively. This matrix was converted into log2 counts per million (CPM) using TMM normalization57 to correct for the total number of reads. The log2 CPM matrix was used for downstream analysis.
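The conversion from raw counts to log2 CPM can be sketched as follows. This is a simplified illustration, not the edgeR implementation: TMM factors are accepted as an optional input rather than estimated, and the 0.5 count offset used to avoid taking the logarithm of zero is an assumption. Rows are samples and columns are peaks, as in the text; the counts are hypothetical.

```python
import math

def log2_cpm(counts, norm_factors=None, prior=0.5):
    """Convert a samples-by-peaks count matrix to log2 counts per million.

    counts:       list of rows, one row of peak counts per sample.
    norm_factors: optional per-sample scaling factors (e.g. from TMM);
                  defaults to 1.0, i.e. plain library-size scaling.
    prior:        small offset added to counts (an assumption of this sketch).
    """
    if norm_factors is None:
        norm_factors = [1.0] * len(counts)
    result = []
    for row, factor in zip(counts, norm_factors):
        effective_lib = sum(row) * factor
        result.append(
            [math.log2((c + prior) / (effective_lib + 1.0) * 1e6) for c in row]
        )
    return result

# Hypothetical counts for two samples across three peaks.
counts = [[10, 30, 60], [0, 50, 50]]
logcpm = log2_cpm(counts)
```

Scaling by the effective library size (library size times normalization factor) is what makes samples sequenced to different depths comparable.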
Decomposing variation into multiple components with variancePartition
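For the cell-specific dataset for each histone mark, the epigenetic variance of each peak was decomposed into variation attributable to cell type, subject, brain region and sex, plus the residual variation:
$$\sigma^2_{\mathrm{total}} = \sigma^2_{\mathrm{cell\ type}} + \sigma^2_{\mathrm{subject}} + \sigma^2_{\mathrm{brain\ region}} + \sigma^2_{\mathrm{sex}} + \sigma^2_{\mathrm{residual}}$$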
These four variables are categorical and so were modeled as random effects. The analysis was performed by modeling the log2 CPM with a linear mixed model implemented in variancePartition v1.4.124 and treating each variable as a random effect. Each peak was considered separately and the results for all peaks were aggregated afterwards. Results were summarized in terms of the fraction of total variation explained by each variable for each peak.
A variancePartition analysis was also performed on additional metadata variables, such as QC statistics (i.e., NSC, RSC, PCR PBC and NRF) and sample processing batches (library preparation date of chip, chip DNA volume, chip DNA amount (nanograms), total chip DNA in a library, library preparation operator, library AMpure bead lot, library PCR cycles number, library volume, library sequencing batch, library sequencing submission date, library preparation library batch). Continuous variables were modeled as fixed effects and categorical variables were modeled as random effects. The variation attributed to technical variables, such as experimental batches or QC statistics, was largely accounted for by the four major variables described above.
Principal component analysis
As a QC step, we performed principal component analysis on the log2 CPM matrix to identify outliers. Eight samples were identified as outliers (Supplementary Table 2), and these corresponded to samples that barely passed our previous QC cutoffs. These samples were excluded from further analysis.
Differential histone modification
For the cell-specific dataset for each mark, we performed differential analysis to identify peak regions with significant differences (i) across the cell types (neuronal and non-neuronal) and (ii) across brain regions (ACC neuronal and PFC neuronal, and ACC neuron-depleted and PFC neuron-depleted) using edgeR v.3.14.058. The CPM matrix was prefiltered to regions with CPM > 1 in at least five samples for both histone marks and normalized using the calcNormFactors function, which uses the trimmed mean of M-values (TMM)57. The edgeR software models the read count matrix as a negative binomial distribution using cell types and brain regions as covariates. We fit the normalized CPM matrix to a quasi-likelihood negative binomial generalized log-linear model using the glmQLFit function with the robust = TRUE option. The quasi-likelihood F-test was then applied to identify peak regions that are significantly different across cell types and brain regions (for both neuronal and non-neuronal cell types) using glmQLFTest (glmQLFit object, contrast = cell type or brain region). Multiple-testing correction was done by applying the Benjamini–Hochberg method to the P values to control the false discovery rate59. The total number of differential peaks was determined at an FDR of 5%. The coordinates of cell-specific peaks (neuronal, non-neuronal) are listed in Supplementary Table 11 and region-specific peaks (ACC, PFC) are listed in Supplementary Table 12 (ACC neuronal, PFC neuronal, ACC non-neuronal and PFC non-neuronal).
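The Benjamini–Hochberg adjustment applied to the per-peak P values can be written out in a few lines. The sketch below is a generic implementation of the step-up procedure rather than the code used in the study, and the P values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical P values for four peaks.
pvals = [0.01, 0.04, 0.03, 0.20]
qvals = benjamini_hochberg(pvals)
significant = [i for i, q in enumerate(qvals) if q < 0.05]
```

With a 5% FDR threshold, a peak is reported as differential when its adjusted P value is below 0.05.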
Comparison with the Roadmap Epigenomics Project
For each mark, we measured the similarity of genomic regions of cell-specific (four) and homogenate (one) consolidated datasets (ACC neuronal, ACC neuron-depleted (non-neuronal), PFC neuronal, PFC neuron-depleted (non-neuronal) and PFC HBCC homogenate) with Roadmap Epigenomics Project (REP) data from 111 tissues. We used bedtools jaccard -a sample bed file -b REP bed file60. This command outputs the Jaccard index parameter (see Supplementary Table 4), which is evaluated as
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
where A and B are the two sets of genomic regions and |·| denotes the number of base pairs covered.
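The Jaccard index between two region sets is the number of base pairs shared by both sets divided by the number of base pairs covered by either. The following sketch reproduces that quantity for small examples; it is an illustration of the statistic, not the bedtools implementation, and the intervals are hypothetical half-open (chrom, start, end) tuples.

```python
def merge(intervals):
    """Merge overlapping or touching intervals on the same chromosome."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return merged

def covered_bp(intervals):
    """Number of base pairs covered by a set of intervals."""
    return sum(end - start for _, start, end in merge(intervals))

def jaccard(a, b):
    """Base-pair Jaccard index of two interval sets."""
    union = covered_bp(list(a) + list(b))
    # Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|.
    intersection = covered_bp(a) + covered_bp(b) - union
    return intersection / union if union else 0.0

# Hypothetical peak sets.
sample_peaks = [("chr1", 0, 100)]
reference_peaks = [("chr1", 50, 150)]
j = jaccard(sample_peaks, reference_peaks)
```

A value of 1 indicates identical coverage and a value of 0 indicates no shared base pairs.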
Cell composition analysis
Cell-type proportions were quantified using R library CellMix61. Using our neuronal and non-neuronal ChIP-seq datasets, we generated cell-type signatures to run deconvolution on homogenate samples to quantify the proportion of each cell type for every sample. We first created the basis set for neuronal and non-neuronal cell types by taking the mean of RPKM values for each peak across neuronal samples and neuron-depleted samples, respectively, for both marks. We defined as our input matrix the HBCC homogenate samples’ RPKM matrix. We then used the lsfit method from the CellMix library for decomposition of the RPKM matrix to calculate the coefficients of neuronal and non-neuronal cell types.
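The deconvolution step amounts to expressing each homogenate profile as a weighted sum of the neuronal and non-neuronal basis profiles. A minimal least-squares sketch for the two-component case is given below. It is not the CellMix code: the absence of an intercept term and the rescaling of the coefficients to sum to 1 are assumptions of this sketch, and all values are hypothetical.

```python
def two_component_fit(mixture, neuronal, non_neuronal):
    """Least-squares weights (a, b) minimising ||mixture - a*neuronal - b*non_neuronal||^2,
    rescaled so that they sum to 1."""
    s11 = sum(x * x for x in neuronal)
    s22 = sum(y * y for y in non_neuronal)
    s12 = sum(x * y for x, y in zip(neuronal, non_neuronal))
    t1 = sum(x * m for x, m in zip(neuronal, mixture))
    t2 = sum(y * m for y, m in zip(non_neuronal, mixture))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("basis profiles are collinear")
    # Solve the 2x2 normal equations by Cramer's rule.
    a = (t1 * s22 - t2 * s12) / det
    b = (t2 * s11 - t1 * s12) / det
    total = a + b
    return a / total, b / total

# Hypothetical mean RPKM per peak for each basis profile.
neuronal = [10.0, 0.0, 5.0, 2.0]
non_neuronal = [0.0, 8.0, 5.0, 1.0]
# A synthetic homogenate sample mixed at 30% neuronal, 70% non-neuronal.
mixture = [0.3 * n + 0.7 * g for n, g in zip(neuronal, non_neuronal)]
prop_neuronal, prop_non_neuronal = two_component_fit(mixture, neuronal, non_neuronal)
```

Unconstrained least squares can return negative coefficients on noisy data, which is one reason constrained solvers are often preferred for real samples.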
We used ChIPseeker v.1.8.962 to annotate peaks to seven distinct categories: promoter, 5′ UTR, exons, introns, 3′ UTR, downstream (≤3 kb) and distal intergenic regions within 5 kb downstream and upstream of the transcription start site. The transcript database used for the annotation was ENSEMBL v75 for GRCh37.70.
Correlation of ChIP-seq reads counts with RNA-seq expression
We next examined the enrichment of ChIP-seq read counts around the transcription start site (TSS) region of protein-coding genes with RNA-seq expression from 537 individuals for 20,330 genes from PFC brain regions. We used ngsplot v2.6155 to plot ChIP-seq read enrichment of combined PFC neuronal datasets as a function of distance up to 15 kb upstream and downstream of the TSS for both marks. Enrichment plots were made for all protein-coding genes grouped into five categories sorted by the mean RPKM values across 537 subjects from the CommonMind RNA-seq dataset7.
Histone QTL enrichment analysis
The overlap between peak regions with hQTLs detected in lymphoblastoid cell lines (LCL)25 and peak regions exceeding a variance percentage cutoff for a particular variable for both marks was computed. This overlap was then compared to the overlap computed from randomly permuted variance percentages. Each peak region was assigned a value based on the percentage of variance explained by a particular variable in the variancePartition analysis. At each of 40 cutoff values, the overlap between peak regions with values exceeding this cutoff and the peak regions with a hQTL for the same histone mark in LCLs and PFC was evaluated using the Jaccard index.
The overlap was computed for the observed data and for 10,000 datasets with the variance percentages randomly permuted. At each cutoff, the enrichment is computed as
$$\mathrm{enrichment} = \frac{\mathrm{overlap}_{\mathrm{observed}}}{\mathrm{overlap}_{\mathrm{permuted}}}$$
The mean enrichment value and the 90% confidence interval are shown in the plot. Only regions on autosomes are considered, leaving 9,575 H3K4me3 hQTLs in LCLs and 1,912 H3K27ac hQTLs in postmortem PFC. Permutation and overlap calculations were performed using regioneR63.
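The permutation scheme can be sketched as follows. The study used regioneR on genomic coordinates; this simplified illustration works on peak identifiers instead, assumes that enrichment is the ratio of the observed to the permuted Jaccard index, and uses hypothetical variance percentages and hQTL peak labels.

```python
import random

def jaccard_sets(a, b):
    """Jaccard index of two sets of peak identifiers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def enrichment_at_cutoff(var_pct, hqtl_peaks, cutoff, n_perm=1000, seed=0):
    """Observed overlap and observed/permuted ratios at one variance cutoff.

    var_pct:    dict of peak id -> % variance explained by one variable.
    hqtl_peaks: set of peak ids that carry an hQTL.
    """
    rng = random.Random(seed)
    peaks = list(var_pct)
    values = [var_pct[p] for p in peaks]
    selected = {p for p in peaks if var_pct[p] > cutoff}
    observed = jaccard_sets(selected, hqtl_peaks)
    ratios = []
    for _ in range(n_perm):
        shuffled = values[:]
        rng.shuffle(shuffled)  # break the link between peaks and variance values
        permuted_sel = {p for p, v in zip(peaks, shuffled) if v > cutoff}
        permuted = jaccard_sets(permuted_sel, hqtl_peaks)
        if permuted > 0:  # skip permutations with no overlap to avoid division by zero
            ratios.append(observed / permuted)
    return observed, ratios

# Hypothetical data: 20 peaks, the five with the highest variance carry hQTLs.
var_pct = {f"peak{i}": float(i) for i in range(20)}
hqtl_peaks = {f"peak{i}" for i in range(15, 20)}
observed, ratios = enrichment_at_cutoff(var_pct, hqtl_peaks, cutoff=14.0)
mean_enrichment = sum(ratios) / len(ratios)
```

Repeating this over a grid of cutoffs, and summarizing the ratios by their mean and central interval, yields an enrichment curve of the kind described above.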
We ran a similar analysis to test the overlap between the peak regions with (a) hQTLs in H3K27ac-modified peak regions from PFC homogenate from Sun et al.10 and (b) hQTLs in H3K9ac-modified peak regions from PFC homogenate from Ng et al.11 with peak regions from (i) H3K27ac PFC neuronal, (ii) H3K27ac PFC non-neuronal, (iii) H3K27ac ACC neuronal and (iv) H3K27ac ACC non-neuronal datasets and (v) differentially modified neuronal and (vi) non-neuronal H3K27ac-modified peak regions.
Here $\mathrm{overlap}_{\mathrm{observed}}$ is the Jaccard index between hQTL regions a,b and datasets i–vi, and $\mathrm{overlap}_{\mathrm{permuted}}$ is the Jaccard index between the abovementioned datasets i–vi and 1,000 datasets obtained by randomly permuting x peak regions from (a) PFC homogenate from Sun et al.10 and (b) PFC homogenate from Ng et al.11 (x = length of hQTLs).
Pathway enrichment analysis
Genomic Regions Enrichment of Annotations Tool (GREAT)19 was used to interpret differentially modified peaks in terms of the biological function of nearby genes. We took the sets of peaks that showed significant (<5% FDR) differences across cell types (neurons and non-neurons) from edgeR analysis and tested for functional enrichment using the consensus peaks for each mark as a background (see Supplementary Tables 7 and 9). The settings for genomic regions used were as follows: proximal, 5.0 kb upstream and 5.0 kb downstream; distal, up to 100 kb. Since many of the gene sets from different databases are redundant, we only considered REACTOME, KEGG and PID pathways as our complete gene sets. Significance testing for the enrichment analysis was based on the binomial test computed by GREAT, using a Bonferroni cutoff of 4.7 × 10−5 based on these tests.
Overlap of identified peaks with disease- and trait-associated genetic variants
To assess whether the genomic regions carrying the two assayed histone marks in the different brain regions and cell types play a role in the various traits and diseases, we examined the overlap with common genetic variants identified by genome-wide association studies (GWAS). For this, we employed LD-score partitioned heritability20, which estimates whether common genetic variants in the genomic regions of interest explain more of the heritability of a given trait than genetic variants not overlapping the genomic regions of interest, normalized by the number of variants in either category. The algorithm allows for correction of the general genetic context of the annotation using a baseline model of broad genomic annotations (such as coding, intronic and conserved). By using this baseline model, the algorithm focuses on enrichments above those expected from the genetic context of the interrogated regions. We applied the method to a range of GWAS traits with presumed involvement of the brain29,64,65,66,67,68 and well-powered studies of traits not believed to involve the brain69,70,71. For the Alzheimer’s disease GWAS, see “Materials and methods for the Alzheimer’s disease GWAS” below. We used the European-only versions of the summary statistics when available. The coronary artery disease analysis was the only one remaining with a mixed ancestry (77% Europeans). We excluded the broad MHC region (chr6:25–35 MB) and otherwise used default parameters.
Allele-specific QTL analysis
We used RASQUAL28 to call cell- and tissue-specific cis-hQTLs in our datasets (PFC neuronal, ACC neuronal, PFC non-neuronal, ACC non-neuronal and PFC tissue homogenate) for each of the two histone marks, H3K27ac and H3K4me3. RASQUAL uses allele-specific read counts at heterozygous sites to increase power to detect cis-hQTLs correlated with quantitative variation in histone modification.
With RASQUAL, a feature in our ChIP-seq dataset is defined by the start and end coordinates of an identified peak for calling cis-hQTLs. RASQUAL requires a few data preprocessing steps before calling cis-hQTLs. (i) ChIP-seq read counts and offset matrices as text and bin files for each dataset and mark. We used the bedtools nuc command to obtain GC content for each identified peak region and used that as an input for the offset calculation from the counts matrix in the custom makeOffset.R script file. All text files were converted to bin files using the text2bin.R script. (ii) Covariate text and bin file for each dataset and mark. The confounding factors in ChIP-seq read counts were obtained by applying PCA to log FPKMs with and without permutation and retaining the first N components whose variances were greater than those from the permutation results. We used the makeCovariates.R script file and found five components as covariates for PFC neuronal, ACC neuronal, PFC non-neuronal and ACC non-neuronal datasets for H3K4me3 and H3K27ac marks, whereas PFC homogenate samples had three and four components for H3K4me3 and H3K27ac, respectively. (iii) Allele-specific count VCF file. The createASVF.sh script file was used to count allele-specific reads for every individual for a given SNP within a feature. We used whole-genome sequencing data from 17 individuals to generate allele-specific counts. WGS paired-end 150-bp reads were aligned to the GRCh37 human reference using the Burrows-Wheeler Aligner (BWA-MEM v0.78) and processed using a best-practices pipeline that included marking of duplicate reads with Picard tools (v1.83; http://broadinstitute.github.io/picard/), realignment around indels, and base recalibration via the Genome Analysis Toolkit (GATK v3.2.2). All individuals’ WGS data were merged into a single vcf file and used as one of the inputs to createASVF.sh.
We ran RASQUAL on 90,767 H3K4me3 peaks for PFC neuronal (n = 17), ACC neuronal (n = 14), PFC non-neuronal (n = 17), ACC non-neuronal (n = 15) and PFC homogenate (n = 11) datasets and 127,773 H3K27ac peaks from PFC neuronal (n = 17), ACC neuronal (n = 17), PFC non-neuronal (n = 17), ACC non-neuronal (n = 15) and PFC homogenate (n = 17) datasets. SNPs were tested within 10 kb of the cis region from peak start and end points. We used Benjamini–Hochberg q < 0.05 as a threshold to determine the significant cis-hQTLs.
To test the overlap of significant hQTLs (RASQUAL q < 0.05) with GWAS-identified schizophrenia loci, we took the list of lead SNPs and the SNPs in LD (R2 > 0.8) with them. The list was downloaded from https://www.med.unc.edu/pgc/results-and-downloads/downloads. We report the overlapping loci separately for cell-specific hQTLs (PFC NeuN+, PFC NeuN–, ACC NeuN+, ACC NeuN–) and PFC HBCC tissue homogenate. hQTLs for both marks are listed in Supplementary Table 1072.
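The cis window used to select candidate SNPs for each peak can be illustrated with a short sketch. It assumes that the window extends 10 kb beyond the peak start and end and includes SNPs inside the peak; the function name and coordinates are hypothetical.

```python
def snps_in_cis_window(peak, snps, flank=10_000):
    """SNPs lying within `flank` bp of a peak, including SNPs inside the peak.

    peak: (chrom, start, end)
    snps: iterable of (chrom, position, snp_id)
    """
    chrom, start, end = peak
    lower = max(0, start - flank)
    upper = end + flank
    return [s for s in snps if s[0] == chrom and lower <= s[1] <= upper]

# Hypothetical peak and SNP positions.
peak = ("chr1", 50_000, 51_000)
snps = [
    ("chr1", 39_999, "snpA"),   # just outside the upstream flank
    ("chr1", 40_000, "snpB"),   # on the upstream boundary
    ("chr1", 50_500, "snpC"),   # inside the peak
    ("chr1", 61_000, "snpD"),   # on the downstream boundary
    ("chr2", 50_500, "snpE"),   # wrong chromosome
]
tested = [s[2] for s in snps_in_cis_window(peak, snps)]
```

Restricting tests to this window keeps the number of SNP-peak pairs, and therefore the multiple-testing burden, manageable.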
Materials and methods for the Alzheimer’s disease GWAS
Summary statistics for Alzheimer’s disease were provided by the International Genomics of Alzheimer’s Project (IGAP). IGAP is a large, two-stage study based on genome-wide association studies (GWAS) on individuals of European ancestry. In stage 1, IGAP used genotyped and imputed data on 7,055,881 single-nucleotide polymorphisms (SNPs) to meta-analyze four previously published GWAS datasets consisting of 17,008 Alzheimer’s disease cases and 37,154 controls (the European Alzheimer’s Disease Initiative (EADI), the Alzheimer Disease Genetics Consortium (ADGC), the Cohorts for Heart and Aging Research in Genomic Epidemiology consortium (CHARGE) and the Genetic and Environmental Risk in AD consortium (GERAD)). In stage 2, 11,632 SNPs were genotyped and tested for association in an independent set of 8,572 Alzheimer’s disease cases and 11,312 controls. Finally, a meta-analysis was performed combining results from stages 1 and 2.
Data access instructions for ChIP-seq data presented in this paper: https://www.synapse.org/#!Synapse:syn4921369/wiki/235539. Data, results and visualizations for ChIP-seq data presented in this paper: https://www.synapse.org/#!Synapse:syn4566010. Psychiatric Genomics Consortium: http://med.unc.edu/pgc. International Genomics of Alzheimer’s Project: http://web.pasteur-lille.fr/en/recherche/u744. The Social Science Genetic Association Consortium: http://ssgac.org/. Sleep phenotypes: http://www.t2diabetesgenes.org/data. Genetic Investigation of Anthropometric Traits: http://portals.broadinstitute.org/collaboration/giant. Coronary Artery Disease: http://cardiogramplusc4d.org. International Inflammatory Bowel Disease Genetics Consortium: http://ibdgenetics.org/. CommonMind Consortium: http://commonmind.org/. Roadmap Epigenomics Project: http://www.roadmapepigenomics.org/. Grubert et al.25 hQTLs: http://mitra.stanford.edu/kundaje/portal//chromovar3d_old/.
The data analyzed for this article are available through the PsychENCODE Knowledge Portal (http://psychencode.org/). Access to the data is controlled by the NIMH Repository and Genomics Resources (NRGR), https://www.nimhgenetics.org/. See instructions in the PsychENCODE Knowledge Portal: https://www.synapse.org/#!Synapse:syn4921369. Data and results are at https://www.synapse.org/#!Synapse:syn4566010. The site includes a link to UCSC browser visualizations.
Geschwind, D. H. & Flint, J. Genetics and genomics of psychiatric disease. Science 349, 1489–1494 (2015).
Gandal, M. J., Leppa, V., Won, H., Parikshak, N. N. & Geschwind, D. H. The road to precision psychiatry: translating genetics into disease mechanisms. Nat. Neurosci. 19, 1397–1407 (2016).
Bernstein, B. E. et al. The NIH Roadmap Epigenomics Mapping Consortium. Nat. Biotechnol. 28, 1045–1048 (2010).
Farh, K. K.-H. et al. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature 518, 337–343 (2015).
Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
Akbarian, S. et al. The PsychENCODE project. Nat. Neurosci. 18, 1707–1712 (2015).
Fromer, M. et al. Gene expression elucidates functional impact of polygenic risk for schizophrenia. Nat. Neurosci. 19, 1442–1453 (2016).
Zhou, V. W., Goren, A. & Bernstein, B. E. Charting histone modifications and the functional organization of mammalian genomes. Nat. Rev. Genet. 12, 7–18 (2011).
Network and Pathway Analysis Subgroup of Psychiatric Genomics Consortium. Psychiatric genome-wide association study analyses implicate neuronal, immune and histone pathways. Nat. Neurosci. 18, 199–209 (2015).
Sun, W. et al. Histone acetylome-wide association study of autism spectrum disorder. Cell 167, 1385–1397.e1311 (2016).
Ng, B. et al. An xQTL map integrates the genetic architecture of the human brain’s transcriptome and epigenome. Nat. Neurosci. 20, 1418–1426 (2017).
Cheung, I. et al. Developmental regulation and individual differences of neuronal H3K4me3 epigenomes in the prefrontal cortex. Proc. Natl. Acad. Sci. USA 107, 8824–8829 (2010).
Shulha, H. P., Cheung, I., Guo, Y., Akbarian, S. & Weng, Z. Coordinated cell type-specific epigenetic remodeling in prefrontal cortex begins before birth and continues into early adulthood. PLoS Genet. 9, e1003433 (2013).
Charney, D. S., Sklar, P. B., Buxbaum, J. D. & Nestler, E. J. Charney & Nestler’s Neurobiology of Mental Illness (Oxford Univ. Press, New York, 2018).
Mancarci, B. O. et al. Cross-laboratory analysis of brain cell type transcriptomes with applications to interpretation of bulk tissue data. eNeuro 4, ENEURO.0212-17.2017 (2017).
Huttner, H. B. et al. The age and genomic integrity of neurons after cortical stroke in humans. Nat. Neurosci. 17, 801–803 (2014).
Habib, N. et al. Div-Seq: single-nucleus RNA-Seq reveals dynamics of rare adult newborn neurons. Science 353, 925–928 (2016).
Sherwood, C. C. et al. Evolution of increased glia-neuron ratios in the human frontal cortex. Proc. Natl. Acad. Sci. USA 103, 13606–13611 (2006).
McLean, C. Y. et al. GREAT improves functional interpretation of cis-regulatory regions. Nat. Biotechnol. 28, 495–501 (2010).
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
Huang, K. L. et al. A common haplotype lowers PU.1 expression in myeloid cells and delays onset of Alzheimer’s disease. Nat. Neurosci. 20, 1052–1061 (2017).
Gjoneska, E. et al. Conserved epigenomic signals in mice and humans reveal immune basis of Alzheimer’s disease. Nature 518, 365–369 (2015).
Raj, T. et al. Polarization of the effects of autoimmune and neurodegenerative risk alleles in leukocytes. Science 344, 519–523 (2014).
Hoffman, G. E. & Schadt, E. E. variancePartition: interpreting drivers of variation in complex gene expression studies. BMC Bioinformatics 17, 483 (2016).
Grubert, F. et al. Genetic control of chromatin states in humans involves local and distal chromosomal interactions. Cell 162, 1051–1065 (2015).
Waszak, S. M. et al. Population variation and genetic control of modular chromatin architecture in humans. Cell 162, 1039–1050 (2015).
GTEx Consortium. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015).
Kumasaka, N., Knights, A. J. & Gaffney, D. J. Fine-mapping cellular QTLs with RASQUAL and ATAC-seq. Nat. Genet. 48, 206–213 (2016).
Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014).
Posner, M. I., Rothbart, M. K., Sheese, B. E. & Tang, Y. The anterior cingulate gyrus and the mechanism of self-regulation. Cogn. Affect. Behav. Neurosci. 7, 391–395 (2007).
Moghaddam, B. & Homayoun, H. Divergent plasticity of prefrontal cortex networks. Neuropsychopharmacology 33, 42–55 (2008).
Le Fevre, A. K. et al. FOXP1 mutations cause intellectual disability and a recognizable phenotype. Am. J. Med. Genet. A 161A, 3166–3175 (2013).
Sadakata, T. et al. Reduced axonal localization of a Caps2 splice variant impairs axonal release of BDNF and causes autistic-like behavior in mice. Proc. Natl. Acad. Sci. USA 109, 21104–21109 (2012).
Griswold, A. J. et al. Evaluation of copy number variations reveals novel candidate genes in autism spectrum disorder-associated pathways. Hum. Mol. Genet. 21, 3513–3523 (2012).
Kawaguchi, D. M. & Glatt, S. J. GRIK4 polymorphism and its association with antidepressant response in depressed patients: a meta-analysis. Pharmacogenomics 15, 1451–1459 (2014).
Gandal, M. J. et al. Shared molecular neuropathology across major psychiatric disorders parallels polygenic overlap. Science 359, 693–697 (2018).
Lake, B. B. et al. Neuronal subtypes and diversity revealed by single-nucleus RNA sequencing of the human brain. Science 352, 1586–1590 (2016).
Zeisel, A. et al. Molecular architecture of the mouse nervous system. Preprint at bioRxiv https://doi.org/10.1101/294918 (2018).
Sullivan, J. M. et al. Autism-like syndrome is induced by pharmacological suppression of BET proteins in young mice. J. Exp. Med. 212, 1771–1781 (2015).
Penney, J. & Tsai, L. H. Histone deacetylases in memory and cognition. Sci. Signal. 7, re12 (2014).
Jakovcevski, M. & Akbarian, S. Epigenetic mechanisms in neurological disease. Nat. Med. 18, 1194–1204 (2012).
Kang, H. J. et al. Spatio-temporal transcriptome of the human brain. Nature 478, 483–489 (2011).
Yang, J. et al. Association of DNA methylation in the brain with age in older persons is confounded by common neuropathologies. Int. J. Biochem. Cell Biol. 67, 58–64 (2015).
Kundakovic, M. et al. Practical guidelines for high-resolution epigenomic profiling of nucleosomal histones in postmortem human brain tissue. Biol. Psychiatry 81, 162–170 (2017).
Jiang, Y., Matevossian, A., Huang, H. S., Straubhaar, J. & Akbarian, S. Isolation of neuronal chromatin from brain tissue. BMC Neurosci. 9, 42 (2008).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Wysoker, A., Tibbetts, K. & Fennell, T. Picard tools version 1.90 (2013).
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993 (2011).
Li, Q., Brown, J. B., Huang, H. & Bickel, P. J. Measuring reproducibility of high-throughput experiments. Ann. Appl. Stat. 5, 1752–1779 (2011).
Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).
Kharchenko, P. V., Tolstorukov, M. Y. & Park, P. J. Design and analysis of ChIP-seq experiments for DNA-binding proteins. Nat. Biotechnol. 26, 1351–1359 (2008).
Rozowsky, J. et al. PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. Nat. Biotechnol. 27, 66–75 (2009).
Shen, L., Shao, N., Liu, X. & Nestler, E. ngs.plot: quick mining and visualization of next-generation sequencing data by integrating genomic databases. BMC Genom. 15, 284 (2014).
Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
Robinson, M. D. & Oshlack, A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 11, R25 (2010).
Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Series B Stat. Methodol. 57, 289–300 (1995).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Gaujoux, R. & Seoighe, C. CellMix: a comprehensive toolbox for gene expression deconvolution. Bioinformatics 29, 2211–2212 (2013).
Yu, G., Wang, L.-G. & He, Q.-Y. ChIPseeker: an R/Bioconductor package for ChIP peak annotation, comparison and visualization. Bioinformatics 31, 2382–2383 (2015).
Gel, B. et al. regioneR: an R/Bioconductor package for the association analysis of genomic regions based on permutation tests. Bioinformatics 32, 289–291 (2016).
Lambert, J.-C. et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease. Nat. Genet. 45, 1452–1458 (2013).
Okbay, A. et al. Genome-wide association study identifies 74 loci associated with educational attainment. Nature 533, 539–542 (2016).
Okbay, A. et al. Genetic variants associated with subjective well-being, depressive symptoms, and neuroticism identified through genome-wide analyses. Nat. Genet. 48, 624–633 (2016).
Jones, S. E. et al. Genome-wide association analyses in 128,266 individuals identifies new morningness and sleep duration loci. PLoS Genet. 12, e1006125 (2016).
Locke, A. E. et al. Genetic studies of body mass index yield new insights for obesity biology. Nature 518, 197–206 (2015).
Wood, A. R. et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat. Genet. 46, 1173–1186 (2014).
Nikpay, M. et al. A comprehensive 1,000 Genomes-based genome-wide association meta-analysis of coronary artery disease. Nat. Genet. 47, 1121–1130 (2015).
Liu, J. Z. et al. Association analyses identify 38 susceptibility loci for inflammatory bowel disease and highlight shared genetic risk across populations. Nat. Genet. 47, 979–986 (2015).
Wetterstrand, K. DNA sequencing costs: data from the NHGRI Genome Sequencing Program (GSP). http://www.genome.gov/sequencingcosts (2016).
We thank M. Fromer, E. Stahl, L. Huckins, L. Shen, G. Senthil and T. Lehner for discussion. This paper is dedicated to the memory of Pamela Sklar. This work was supported in part through the computational resources and staff expertise provided by Scientific Computing at the Icahn School of Medicine at Mount Sinai. We are extremely grateful to J. Ochando, C. Bare and other personnel of the Icahn School of Medicine at Mount Sinai’s Flow Cytometry Core for providing and teaching cell sorting expertise. Data were generated as part of the PsychENCODE Consortium, supported by U01MH103339, U01MH103365, U01MH103392, U01MH103340, U01MH103346, R01MH105472, R01MH094714, R01MH105898, R21MH102791, R21MH105881, R21MH103877 and P50MH106934 awarded to S.A. (Icahn School of Medicine at Mount Sinai), G. Crawford (Duke), S. Dracheva (Icahn School of Medicine at Mount Sinai), P. Farnham (USC), M. Gerstein (Yale), D. Geschwind (UCLA), T. M. Hyde (LIBD), A. Jaffe (LIBD), J. A. Knowles (USC), C. Liu (UIC), D. Pinto (Icahn School of Medicine at Mount Sinai), N. Sestan (Yale), P.S. (Icahn School of Medicine at Mount Sinai), M. State (UCSF), P. Sullivan (UNC), F. Vaccarino (Yale), S. Weissman (Yale), K. White (UChicago) and P. Zandi (JHU). Data were generated as part of the CommonMind Consortium supported by funding from Takeda Pharmaceuticals Company Limited, F. Hoffman-La Roche Ltd and NIH grants R01MH085542, R01MH093725, P50MH066392, P50MH080405, R01MH097276, RO1-MH-075916, P50M096891, P50MH084053S1, R37MH057881, R37MH057881S1, HHSN271201300031C, AG02219, AG05138 and MH06692. Brain tissue for the study was obtained from the following brain bank collections: the Mount Sinai NIH Brain and Tissue Repository, the University of Pennsylvania Alzheimer’s Disease Core Center, the University of Pittsburgh NeuroBioBank and Brain and Tissue Repositories and the NIMH Human Brain Collection Core. CMC Leadership: P.S., J. Buxbaum (Icahn School of Medicine at Mount Sinai), B. Devlin, D. Lewis (University of Pittsburgh), R. Gur, C.-G. Hahn (University of Pennsylvania), K. Hirai, H. Toyoshiba (Takeda Pharmaceuticals Company Limited), E. Domenici, L. Essioux (F. Hoffman-La Roche Ltd), L. Mangravite, M.A.P. (Sage Bionetworks), T. Lehner and B.K.L. (NIMH). Data on coronary artery disease and myocardial infarction have been contributed by CARDIoGRAMplusC4D investigators. We also thank the International Genomics of Alzheimer’s Project (IGAP) for providing summary results data for these analyses. The investigators within IGAP contributed to the design and implementation of IGAP and/or provided data but did not participate in analysis or writing of this report. IGAP was made possible by the generous participation of the control subjects, the patients, and their families. The i–Select chips were funded by the French National Foundation on Alzheimer’s disease and related disorders. EADI was supported by the LABEX (Laboratory of Excellence Program Investment for the Future) DISTALZ grant, Inserm, Institut Pasteur de Lille, Université de Lille 2 and the Lille University Hospital. GERAD was supported by the Medical Research Council (grant no. 503480), Alzheimer’s Research UK (grant no. 503176), the Wellcome Trust (grant no. 082604/2/07/Z) and German Federal Ministry of Education and Research (BMBF): Competence Network Dementia (CND) grant no. 01GI0102, 01GI0711, 01GI0420. CHARGE was partly supported by NIH NIA grant R01 AG033193 and NIA AG081220 and AGES contract N01–AG–12100, NHLBI grant R01 HL105756, the Icelandic Heart Association, and the Erasmus Medical Center and Erasmus University. ADGC was supported by NIH NIA grants U01 AG032984, U24 AG021886 and U01 AG016976, and Alzheimer’s Association grant ADGC–10–196728.
The authors declare no competing financial interests.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Figures 1–19
Postmortem brain metadata
Quality control measurements for sorted nuclei and homogenate samples
Similarity (Jaccard) of consolidated datasets with REP and Sun et al. and Ng et al. data
QC measurements of all consolidated datasets for both marks
Cell composition of neuronal, neuron-depleted and bulk tissue samples
GREAT pathways enrichment of non-overlapping regions
LDSR score regression P values
Cell-specific and bulk tissue hQTLs
hQTLs overlap with GWAS SCZ loci
Brain region (ACC, PFC)-specific peaks in neurons and non-neurons
GREAT pathways enrichment of cell-specific peaks
GREAT pathways enrichment of brain region (ACC, PFC) peaks