Abstract
Identification of risk variants for neuropsychiatric diseases within enhancers underscores the importance of understanding population-level variation in enhancer function in the human brain. Besides regulating tissue-specific and cell-type-specific transcription of target genes, enhancers themselves can be transcribed. By jointly analyzing large-scale cell-type-specific transcriptome and regulome data, we cataloged 30,795 neuronal and 23,265 non-neuronal candidate transcribed enhancers. Examination of the transcriptome in 1,382 brain samples identified robust expression of transcribed enhancers. We explored gene–enhancer coordination and found that enhancer-linked genes are strongly implicated in neuropsychiatric disease. We identified expression quantitative trait loci (eQTLs) for both genes and enhancers and found that enhancer eQTLs mediate a substantial fraction of neuropsychiatric trait heritability. Inclusion of enhancer eQTLs in transcriptome-wide association studies enhanced functional interpretation of disease loci. Overall, our study characterizes the gene–enhancer regulome and genetic mechanisms in the human cortex in both healthy and diseased states.
Data availability
Processed data (epigenomic annotations, TEns, differential analysis result, read count matrix of TEns and genes, gene–enhancer links, eQTL summary, and TWAS weights) have been deposited on the Synapse platform at https://doi.org/10.7303/syn25716684. Download requires registering for a Synapse account (https://www.synapse.org/#!RegisterAccount:0).
The multi-omics cohort, including ATAC-seq, RNA-seq, H3K4me3 ChIP-seq, H3K27me3 ChIP-seq, H3K27ac ChIP-seq and Hi-C data, is available through the AD Knowledge Portal Super Ager EpiMap Study at https://adknowledgeportal.synapse.org/Explore/Studies/DetailsPage?Study=syn25672226. The AD Knowledge Portal is a platform for accessing data, analyses and tools generated by the AMP AD Target Discovery Program and other National Institute on Aging (NIA)-supported programs to enable open-science practices and accelerate translational learning. The data, analyses and tools are shared early in the research cycle without a publication embargo on secondary use. Data are available for general research use according to the requirements for data access and data attribution detailed at https://adknowledgeportal.org/DataAccess/Instructions.
The CMC data are available through the CMC Knowledge Portal. The CMC investigators are committed to the release of data and analysis results, with the anticipation that data sharing in a rapid and transparent manner will speed the pace of research to the benefit of the greater research community. Data are available for general research use according to the following requirements for data access and data attribution detailed at https://www.synapse.org//#!Synapse:syn2759792/wiki/197282.
Code availability
The TEn identification pipeline is available at https://zenodo.org/record/6845955 (ref. 109). Further information and requests for reagents may be directed to the corresponding author.
References
Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014).
Visscher, P. M. et al. 10 years of GWAS discovery: biology, function, and translation. Am. J. Hum. Genet. 101, 5–22 (2017).
Sullivan, P. F. & Geschwind, D. H. Defining the genetic, genomic, cellular, and diagnostic architectures of psychiatric disorders. Cell 177, 162–183 (2019).
Grove, J. et al. Identification of common genetic risk variants for autism spectrum disorder. Nat. Genet. 51, 431–444 (2019).
Mullins, N. et al. Genome-wide association study of more than 40,000 bipolar disorder cases provides new insights into the underlying biology. Nat. Genet. 53, 817–829 (2021).
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
Wang, D. et al. Comprehensive functional genomic resource and integrative model for the human brain. Science 362, eaat8464 (2018).
Hauberg, M. E. et al. Common schizophrenia risk variants are enriched in open chromatin regions of human glutamatergic neurons. Nat. Commun. 11, 5581 (2020).
Girdhar, K. et al. Cell-specific histone modification maps in the human frontal lobe link schizophrenia risk to the neuronal epigenome. Nat. Neurosci. 21, 1126–1136 (2018).
de la Torre-Ubieta, L. et al. The dynamic landscape of open chromatin during human cortical neurogenesis. Cell 172, 289–304 (2018).
Schoenfelder, S. & Fraser, P. Long-range enhancer–promoter contacts in gene expression control. Nat. Rev. Genet. 20, 437–455 (2019).
Long, H. K., Prescott, S. L. & Wysocka, J. Ever-changing landscapes: transcriptional enhancers in development and evolution. Cell 167, 1170–1187 (2016).
Heinz, S., Romanoski, C. E., Benner, C. & Glass, C. K. The selection and function of cell type-specific enhancers. Nat. Rev. Mol. Cell Biol. 16, 144–154 (2015).
Yap, E.-L. & Greenberg, M. E. Activity-regulated transcription: bridging the gap between neural activity and behavior. Neuron 100, 330–348 (2018).
Core, L. J. et al. Analysis of nascent RNA identifies a unified architecture of initiation regions at mammalian promoters and enhancers. Nat. Genet. 46, 1311–1320 (2014).
Tippens, N. D. et al. Transcription imparts architecture, function and logic to enhancer units. Nat. Genet. 52, 1067–1075 (2020).
Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).
Gu, B. et al. Transcription-coupled changes in nuclear mobility of mammalian cis-regulatory elements. Science 359, 1050–1055 (2018).
Catarino, R. R. & Stark, A. Assessing sufficiency and necessity of enhancer activities for gene expression and the mechanisms of transcription activation. Genes Dev. 32, 202–223 (2018).
Hou, T. Y. & Kraus, W. L. Spirits in the material world: enhancer RNAs in transcriptional regulation. Trends Biochem. Sci. 46, 138–153 (2021).
Kim, T.-K. et al. Widespread transcription at neuronal activity-regulated enhancers. Nature 465, 182–187 (2010).
Yao, L. et al. A comparison of experimental assays and analytical methods for genome-wide identification of active enhancers. Nat. Biotechnol. 40, 1056–1065 (2022).
Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
Wang, X. & Goldstein, D. B. Enhancer domains predict gene pathogenicity and inform gene discovery in complex disease. Am. J. Hum. Genet. 106, 215–233 (2020).
Umans, B. D., Battle, A. & Gilad, Y. Where are the disease-associated eQTLs? Trends Genet. 37, 109–124 (2021).
Murakawa, Y. et al. Enhanced identification of transcriptional enhancers provides mechanistic insights into diseases. Trends Genet. 32, 76–88 (2016).
Chen, H. et al. A pan-cancer analysis of enhancer expression in nearly 9000 patient samples. Cell 173, 386–399 (2018).
Mikhaylichenko, O. et al. The degree of enhancer or promoter activity is reflected by the levels and directionality of eRNA transcription. Genes Dev. 32, 42–57 (2018).
Hauberg, M. E. et al. Differential activity of transcribed enhancers in the prefrontal cortex of 537 cases with schizophrenia and controls. Mol. Psychiatry 24, 1685–1695 (2019).
Sartorelli, V. & Lauberth, S. M. Enhancer RNAs are an important regulatory layer of the epigenome. Nat. Struct. Mol. Biol. 27, 521–528 (2020).
Lake, B. B. et al. Integrative single-cell analysis of transcriptional and epigenetic states in the human adult brain. Nat. Biotechnol. 36, 70–80 (2018).
Halfon, M. S. Studying transcriptional enhancers: the founder fallacy, validation creep, and other biases. Trends Genet. 35, 93–103 (2019).
Yao, P. et al. Coexpression networks identify brain region-specific enhancer RNAs in the human brain. Nat. Neurosci. 18, 1168–1174 (2015).
Trubetskoy, V. et al. Mapping genomic loci implicates genes and synaptic biology in schizophrenia. Nature 604, 502–508 (2022).
Hoffman, G. E. et al. CommonMind Consortium provides transcriptomic and epigenomic data for schizophrenia and bipolar disorder. Sci. Data 6, 180 (2019).
Fromer, M. et al. Gene expression elucidates functional impact of polygenic risk for schizophrenia. Nat. Neurosci. 19, 1442–1453 (2016).
Pennacchio, L. A., Bickmore, W., Dean, A., Nobrega, M. A. & Bejerano, G. Enhancers: five essential questions. Nat. Rev. Genet. 14, 288–295 (2013).
Skene, N. G. et al. Genetic identification of brain cell types underlying schizophrenia. Nat. Genet. 50, 825–833 (2018).
Imbrici, P., Camerino, D. C. & Tricarico, D. Major channels involved in neuropsychiatric disorders and therapeutic perspectives. Front. Genet. 4, 76 (2013).
Dietz, A. G., Goldman, S. A. & Nedergaard, M. Glial cells in schizophrenia: a unified hypothesis. Lancet Psychiatry 7, 272–281 (2020).
Zeng, B. et al. Multi-ancestry eQTL meta-analysis of human brain identifies candidate causal variants for brain-related traits. Nat. Genet. 54, 161–169 (2022).
Grubert, F. et al. Genetic control of chromatin states in humans involves local and distal chromosomal interactions. Cell 162, 1051–1065 (2015).
GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
Sun, W. et al. Histone acetylome-wide association study of autism spectrum disorder. Cell 167, 1385–1397 (2016).
Bryois, J. et al. Evaluation of chromatin accessibility in prefrontal cortex of individuals with schizophrenia. Nat. Commun. 9, 3121 (2018).
Schmidt, E. M. et al. GREGOR: evaluating global enrichment of trait-associated variants in epigenomic features using a systematic, data-driven approach. Bioinformatics 31, 2601–2606 (2015).
Liu, X. et al. Functional architectures of local and distal regulation of gene expression in multiple human tissues. Am. J. Hum. Genet. 100, 605–616 (2017).
Yao, D. W., O’Connor, L. J., Price, A. L. & Gusev, A. Quantifying genetic effects on disease mediated by assayed gene expression levels. Nat. Genet. 52, 626–633 (2020).
Gusev, A. et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat. Genet. 48, 245–252 (2016).
Huckins, L. M. et al. Publisher Correction: Gene expression imputation across multiple brain regions provides insights into schizophrenia risk. Nat. Genet. 51, 1068 (2019).
Gusev, A. et al. Transcriptome-wide association study of schizophrenia and chromatin activity yields mechanistic disease insights. Nat. Genet. 50, 538–548 (2018).
Gandal, M. J. et al. Transcriptome-wide isoform-level dysregulation in ASD, schizophrenia, and bipolar disorder. Science 362, eaat8127 (2018).
Zhang, W. et al. Integrative transcriptome imputation reveals tissue-specific and shared biological mechanisms mediating susceptibility to complex traits. Nat. Commun. 10, 3834 (2019).
Wainberg, M. et al. Opportunities and challenges for transcriptome-wide association studies. Nat. Genet. 51, 592–599 (2019).
Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).
Giambartolomei, C. et al. A Bayesian framework for multiple trait colocalization from summary association statistics. Bioinformatics 34, 2538–2545 (2018).
Mancuso, N. et al. Probabilistic fine-mapping of transcriptome-wide association studies. Nat. Genet. 51, 675–682 (2019).
Wang, Q. et al. A Bayesian framework that integrates multi-omics data and gene networks predicts risk genes from schizophrenia GWAS data. Nat. Neurosci. 22, 691–699 (2019).
Liem, K. F. Jr, He, M., Ocbina, P. J. R. & Anderson, K. V. Mouse Kif7/Costal2 is a cilia-associated protein that regulates Sonic hedgehog signaling. Proc. Natl Acad. Sci. USA 106, 13377–13382 (2009).
Guo, J. et al. Developmental disruptions underlying brain abnormalities in ciliopathies. Nat. Commun. 6, 7857 (2015).
Baggelaar, M. P., Maccarrone, M. & van der Stelt, M. 2-Arachidonoylglycerol: a signaling lipid with manifold actions in the brain. Prog. Lipid Res. 71, 1–17 (2018).
Ogasawara, D. et al. Rapid and profound rewiring of brain lipid signaling networks by acute diacylglycerol lipase inhibition. Proc. Natl Acad. Sci. USA 113, 26–33 (2016).
Chen, H. & Liang, H. A high-resolution map of human enhancer RNA loci characterizes super-enhancer activities in cancer. Cancer Cell 38, 701–715 (2020).
Zhu, Y. et al. Predicting enhancer transcription and activity from chromatin modifications. Nucleic Acids Res. 41, 10032–10043 (2013).
Birnbaum, R. & Weinberger, D. R. Genetic insights into the neurodevelopmental origins of schizophrenia. Nat. Rev. Neurosci. 18, 727–740 (2017).
Bahl, E., Koomar, T. & Michaelson, J. J. cerebroViz: an R package for anatomical visualization of spatiotemporal brain data. Bioinformatics 33, 762–763 (2017).
Wang, M. et al. The Mount Sinai cohort of large-scale genomic, transcriptomic and proteomic data in Alzheimer’s disease. Sci. Data 5, 180185 (2018).
Hoffman, G. E. et al. Transcriptional signatures of schizophrenia in hiPSC-derived NPCs and neurons are concordant with post-mortem adult brains. Nat. Commun. 8, 2225 (2017).
Schrode, N. et al. Synergistic effects of common schizophrenia risk variants. Nat. Genet. 51, 1475–1485 (2019).
Harrow, J. et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 22, 1760–1774 (2012).
Wang, L., Wang, S. & Li, W. RSeQC: quality control of RNA-seq experiments. Bioinformatics 28, 2184–2185 (2012).
Amemiya, H. M., Kundaje, A. & Boyle, A. P. The ENCODE blacklist: identification of problematic regions of the genome. Sci. Rep. 9, 9354 (2019).
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Kuhn, M. Building predictive models in R using the caret package. J. Stat. Softw. 28, 1–26 (2008).
Sing, T., Sander, O., Beerenwinkel, N. & Lengauer, T. ROCR: visualizing classifier performance in R. Bioinformatics 21, 3940–3941 (2005).
Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
Robinson, M. D. & Oshlack, A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 11, R25 (2010).
Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
Hoffman, G. E. & Roussos, P. Dream: powerful differential expression analysis for repeated measures designs. Bioinformatics 37, 192–201 (2021).
Hoffman, G. E. & Schadt, E. variancePartition: interpreting drivers of variation in complex gene expression studies. BMC Bioinformatics 17, 483 (2016).
Storey, J. D. & Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA 100, 9440–9445 (2003).
Friedman, J. H., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010).
Demontis, D. et al. Discovery of the first genome-wide significant risk loci for attention deficit/hyperactivity disorder. Nat. Genet. 51, 63–75 (2019).
Bipolar Disorder and Schizophrenia Working Group of the Psychiatric Genomics Consortium. Genomic dissection of bipolar disorder and schizophrenia, including 28 subphenotypes. Cell 173, 1705–1715 (2018).
Wray, N. R. et al. Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression. Nat. Genet. 50, 668–681 (2018).
Karlsson Linnér, R. et al. Genome-wide association analyses of risk tolerance and risky behaviors in over 1 million individuals identify hundreds of loci and shared genetic influences. Nat. Genet. 51, 245–257 (2019).
Jansen, P. R. et al. Genome-wide analysis of insomnia in 1,331,010 individuals identifies new risk loci and functional pathways. Nat. Genet. 51, 394–403 (2019).
Nagel, M., Watanabe, K., Stringer, S., Posthuma, D. & van der Sluis, S. Item-level analyses reveal genetic heterogeneity in neuroticism. Nat. Commun. 9, 905 (2018).
Jansen, I. E. et al. Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer’s disease risk. Nat. Genet. 51, 404–413 (2019).
Nicolas, A. et al. Genome-wide analyses identify KIF5A as a novel ALS gene. Neuron 97, 1268–1283 (2018).
International Multiple Sclerosis Genetics Consortium. Multiple sclerosis genomic map implicates peripheral immune cells and microglia in susceptibility. Science 365, eaav7188 (2019).
Nalls, M. A. et al. Identification of novel risk loci, causal insights, and heritable risk for Parkinson’s disease: a meta-analysis of genome-wide association studies. Lancet Neurol. 18, 1091–1102 (2019).
Locke, A. E. et al. Genetic studies of body mass index yield new insights for obesity biology. Nature 518, 197–206 (2015).
CARDIoGRAMplusC4D Consortium. A comprehensive 1,000 Genomes-based genome-wide association meta-analysis of coronary artery disease. Nat. Genet. 47, 1121–1130 (2015).
Liu, J. Z. et al. Association analyses identify 38 susceptibility loci for inflammatory bowel disease and highlight shared genetic risk across populations. Nat. Genet. 47, 979–986 (2015).
Wood, A. R. et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat. Genet. 46, 1173–1186 (2014).
Traylor, M. et al. Genetic risk factors for ischaemic stroke and its subtypes (the METASTROKE Collaboration): a meta-analysis of genome-wide association studies. Lancet Neurol. 11, 951–962 (2012).
Okada, Y. et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature 506, 376–381 (2014).
Mahajan, A. et al. Fine-mapping type 2 diabetes loci to single-variant resolution using high-density imputation and islet-specific epigenome maps. Nat. Genet. 50, 1505–1513 (2018).
International HapMap Consortium. The International HapMap Project. Nature 426, 789–796 (2003).
1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Liberzon, A. et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics 27, 1739–1740 (2011).
Koopmans, F. et al. SynGO: an evidence-based, expert-curated knowledge base for the synapse. Neuron 103, 217–234 (2019).
de Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput. Biol. 11, e1004219 (2015).
Stegle, O., Parts, L., Piipari, M., Winn, J. & Durbin, R. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat. Protoc. 7, 500–507 (2012).
Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
Haeussler, M. et al. The UCSC Genome Browser database: 2019 update. Nucleic Acids Res. 47, D853–D858 (2019).
Barbeira, A. N. et al. Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. Nat. Commun. 9, 1825 (2018).
Dong, P. & Roussos, P. multi-omics identification of transcribed enhancers. Available at https://doi.org/10.5281/zenodo.6845955 (2022).
Fullard, J. F. et al. An atlas of chromatin accessibility in the adult human brain. Genome Res. 28, 1243–1252 (2018).
Stahl, E. A. et al. Genome-wide association study identifies 30 loci associated with bipolar disorder. Nat. Genet. 51, 793–803 (2019).
Acknowledgements
We acknowledge the computational resources and staff expertise provided by the Scientific Computing group of the Icahn School of Medicine at Mount Sinai. The CommonMind data sets were generated as part of the CommonMind Consortium supported by funding from Takeda Pharmaceuticals Company Limited; F. Hoffmann-La Roche Ltd; and NIH grants R01MH085542, R01MH093725, P50MH066392, P50MH080405, R01MH097276, RO1-MH-075916, P50M096891, P50MH084053S1, R37MH057881, AG02219, AG05138, MH06692, R01MH110921, R01MH109677, R01MH109897, U01MH103392, U01MH116442, project ZIC MH002903 and contract HHSN271201300031C through the NIMH Intramural Research Program (IRP). Brain tissue for the study was obtained from the following brain bank collections: the Mount Sinai/JJ Peters VA Medical Center NIH Brain and Tissue Repository, the University of Pennsylvania Alzheimer’s Disease Core Center, the University of Pittsburgh Brain Tissue Donation Program, and the NIMH Human Brain Collection Core. CMC leadership: P.R., J.D.B., A.C., S.A., V.H., B.D., D.A.L., R.G., C.-G.H., E.D., M.A.P., S.K.S., S.M., B.K.L. and F.J.M. This work is supported by the NIA through NIH grants R01-AG067025 (to P.R. and V.H.), R01-AG065582 (to P.R. and V.H.) and R01-AG050986 (to P.R.); by the NIMH through NIH grants R01-MH110921 (to P.R.), U01-MH116442 (to P.R. and V.H.), R01-MH125246 (to P.R.), R01-MH106056 (to P.R. and K.J.B.), R01-MH109897 (to P.R. and K.J.B.) and R01-MH121074 (to K.J.B.); and by the Veterans Affairs Merit grant BX002395 (to P.R.). P.D. was supported in part by NARSAD Young Investigator Grant 29683 from the Brain & Behavior Research Foundation. G.E.H. was supported in part by NARSAD Young Investigator Grant 26313 from the Brain & Behavior Research Foundation. J.B. was supported in part by NARSAD Young Investigator Grant 27209 from the Brain & Behavior Research Foundation.
Author information
Contributions
P.R. conceived of and designed the project. J.F.F. and P.R. designed experimental strategies for epigenome profiling of human postmortem tissue. J.F.F. prepared nuclei and performed FANS. J.F.F. and R.M. generated ATAC-seq data. P.A. generated the ChIP-seq and multi-omics RNA-seq data. V.H. dissected and provided multi-omics brain specimens. S.R. generated Hi-C data. S.R., J.M.V., M.B.F., K.G.T. and K.J.B. performed the CRISPR interference experiments. P.D. and P.R. designed analytical strategies. J.B., K.G. and P.D. conducted initial bioinformatics, sample processing and quality control for the multi-omics cohort. G.E.H. and W.Z. conducted initial bioinformatics, sample processing and quality control for the CMC cohort. P.D. developed the computational scheme and performed the downstream analysis. P.D. and G.V. performed the TWAS analysis. B.Z. performed the MMQTL analysis. P.D. and P.R. wrote the manuscript with input from all authors.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Genetics thanks the anonymous reviewers for their contribution to the peer review of this work.
Extended data
Extended Data Fig. 1 TEns identification.
a, Deconvolved RNA-seq cell-type distribution for each sample (N_neuron = 47, N_non-neuron = 46). Box plot indicates median, interquartile range (IQR) and 1.5 × IQR. b, Jaccard index between the multi-omics peaks and previous reports. Cell-type-specific ATAC-seq/H3K4me3/H3K27ac peaks were compared to our previous reports of the corresponding assays (refs. 9,110). H3K27me3 was compared to Roadmap H3K27me3 peaks (ref. 23). PFC indicates the prefrontal cortex. c, Empirical density distribution of FANTOM5 enhancer size; based on the curve, we chose 500 bp as the TEn size. d, Feature importance heatmap from random forest models. e, Distribution of TEn numbers per super-enhancer in the two cell types (N_neuron = 2,049, N_non-neuron = 1,946). Box plots as in a. f, Distance to the nearest TEn for every TEn within or outside of super-enhancer regions (N_neuron = 29,555, N_non-neuron = 23,260, P < 10^−16 for both cell types, two-sided Wilcoxon test). Box plots as in a. g, Epigenomic profiles of expressed TEns, and promoters of protein-coding genes and lincRNAs. The line represents the mean, and the shadow indicates a 95% confidence interval.
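As an illustration of the Jaccard index used in panel b, the following is a minimal sketch that compares two peak sets at base-pair resolution. The legend does not state whether the index was computed per base pair or per peak, so the base-pair definition, the half-open interval convention and the function names (merge, covered_bp, jaccard_bp) are all assumptions made for this example, not the authors' pipeline.

```python
from typing import List, Tuple

# Assumption: peaks are half-open intervals [start, end) on a single chromosome.
Interval = Tuple[int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval if this one overlaps or touches it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_bp(intervals: List[Interval]) -> int:
    """Number of base pairs covered by at least one interval."""
    return sum(end - start for start, end in merge(intervals))


def jaccard_bp(a: List[Interval], b: List[Interval]) -> float:
    """Base-pair Jaccard index: |A intersect B| / |A union B|."""
    union = covered_bp(list(a) + list(b))
    if union == 0:
        return 0.0  # convention chosen here for two empty peak sets
    # Inclusion-exclusion gives the intersection from the three coverages.
    intersection = covered_bp(a) + covered_bp(b) - union
    return intersection / union


if __name__ == "__main__":
    # Hypothetical peaks: 50 bp shared out of 150 bp covered in total.
    print(jaccard_bp([(0, 100)], [(50, 150)]))
```

In the hypothetical usage above, the two peaks share 50 bp of the 150 bp they jointly cover, so the index is one third. A genome-wide comparison would apply the same calculation per chromosome and sum the intersection and union lengths before dividing.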
Extended Data Fig. 2 Comparison of TEn identification methods.
a, Overlap between our model (RF) and the logistic regression (logit), ChromHMM + OCR (ChromHMM) and OCR-only (OCR) methods in neuronal and non-neuronal cells (Methods). b, Average profile of strand-specific CAGE tags for the shared and method-specific enhancers. In all three comparisons, our model-specific enhancers are enriched for CAGE tags, while the logit-, ChromHMM- and OCR-specific enhancers are depleted of CAGE tags. c,d, Radar plots show typical enhancer-related signals, including H3K4me3, H3K27ac, H3K27me3, super-enhancer and loop anchor occupancy, of OCR-only-specific, our model-specific and shared enhancers of the two methods in neuronal (c) and non-neuronal (d) cells. The values within parentheses indicate the odds ratio (OR) and P value between OCR-only-specific enhancers and the rest (one-sided Fisher exact test).
Extended Data Fig. 3 Differential expression/activity between neuronal and non-neuronal cells.
a, ChromHMM identified six chromatin states: EnhA (active enhancers), TssA (active promoters), TssBiv (bivalent promoters), TssFlnk (promoter flanking regions), ReprPC (polycomb-repressed regions) and Quies (other regions). b, Comparison of the cell-type-specific chromatin states with the Roadmap DLPFC result. c, Jaccard index between neuronal and non-neuronal chromatin states. Compared to active promoters (TssA) and polycomb-repressed regions (ReprPC), active enhancers (EnhA) differ markedly more between cell types. d, Overlap of expressed TEns between neuronal and non-neuronal cells. e, π1 statistics of differential activity between the two cell types for different molecular markers. f, Violin plot showing the variance explained by different factors for the five markers (N_ATACSeq = 98, N_TEns = N_gene = 93, N_H3K27ac = 96, N_H3K4me3 = 96). Box plot indicates median, interquartile range (IQR) and 1.5 × IQR.
Extended Data Fig. 4 GWAS association with different classes of enhancers.
a, Stratified LD score regression of different classes of enhancers across different classes of traits. A positive coefficient signifies enrichment in heritability (per base enrichment, mean ± standard error). '·': nominally significant (P < 0.05); '+': significant after FDR (Benjamini & Hochberg) correction (FDR < 0.05). b, Coverage (Mb) and numbers (k) of different classes of enhancers. c, Stratified LD score regression of cell-type-specific enhancers defined by TEns, the intersection of differential H3K27ac and noncoding OCR (Diff H3K27ac ∩ OCR), and the intersection of differential noncoding OCR and H3K27ac (Diff OCR ∩ H3K27ac) across different classes of traits as in a. d, Coverage (Mb) and numbers (k) of the three types of enhancers.
Extended Data Fig. 5 Enhancer–gene link.
a, Expression level of different classes of genes and enhancers (N_intergenic_enhancer = 4,626, N_intronic_enhancer = 37,358, N_lincRNA = 2,915, N_protein_coding_gene = 16,468). Box plot indicates median, interquartile range (IQR) and 1.5 × IQR. b, Pairwise Spearman correlation coefficient (SCC) between samples for different classes of genes/enhancers: discovery versus discovery (DD), discovery versus replicate (DR) and replicate versus replicate (RR) (N_intergenic_enhancer = 4,080, N_intronic_enhancer = 36,493, N_lincRNA = 2,665, N_protein_coding_gene = 16,175). Box plots as in a. c, Distribution of gene FPKM (fragments per kilobase of transcript per million mapped reads) and counts for linked enhancers. Linked genes have significantly higher expression (FPKM, two-sided Wilcoxon test, P < 10^−16). Box plots as in a. d, Distribution of enhancer CPM and counts for linked genes. Linked enhancers do not have higher expression (CPM, two-sided Wilcoxon test, P = 0.96). Box plots as in a. e, Distance between enhancers and genes for the linked group and background (N_link = 35,964, N_background = 241,040). f, Distance between enhancers and genes for the linked group and background; physically overlapping gene–enhancer pairs were excluded (N_link = 22,920, N_background = 220,062). Box plots as in a. g, Neuronal versus non-neuronal t statistics for linked and non-linked enhancers. Linked enhancers have significantly higher values (two-sided KS test, P < 10^−16).
Extended Data Fig. 6 Enhancer–gene link.
a, Scatter plot of pairwise Spearman correlation coefficients (SCC) between linked enhancers and genes for the discovery set (x axis) and replicate set (y axis); the two sets of correlations are highly consistent (SCC = 0.79, P < 10^−16, two-sided Spearman correlation test). b, Enrichment of neuropsychiatric diseases and behavioral traits, neurological diseases and non-brain-related traits with different classes of genes (determined by MAGMA). For BD and SCZ, we included different GWAS versions: PGC2 SCZ (ref. 1; N_case = 36,989, N_control = 113,075), PGC3 SCZ (ref. 34; N_case = 69,369, N_control = 236,642), PGC2 BD (ref. 111; N_case = 29,764, N_control = 169,118) and PGC3 BD (ref. 5; N_case = 41,917, N_control = 371,549). c, Pairwise comparison of the coefficients of significant traits for enhancer-linked neuronal genes (one-sided t-test). The color of the heatmap shows the one-sided t-test significance (row versus column). '·': nominally significant (P < 0.05); '+': significant after FDR (Benjamini & Hochberg) correction (FDR < 0.05).
Extended Data Fig. 7 Gene and enhancer eQTL.
a, Percentage of genes/enhancers that are cis-heritable (GCTA nominal P < 0.05 and cis-heritability > 0). b, Overlap of cis-heritable genes between ACC and DLPFC. c, Overlap of cis-heritable enhancers between ACC and DLPFC. d, Dot plots illustrate the first two genetic ancestry principal components (PCs) for individuals reported (report) to have European ancestry and the individuals selected (select) to have European ancestry based on standard deviation (s.d.) from the center. e, Replication of eQTLs from the discovery set in the replicate set across different P value cutoffs for the discovery set. π1 values (proportion of true positive P values) for the discovery set significant eQTLs in our replicate eQTLs. SCC values are the Spearman correlation coefficients of significant eQTL effect sizes between the two sets. f, The percentage of genes/enhancers with significant eQTLs that are cis-heritable across different P value cutoffs. g,h, Overlap of cis-heritable transcripts with transcripts that have eQTLs, for genes (g) and enhancers (h).
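The π1 statistic in panel e can be illustrated with a minimal sketch of Storey's estimator, π1 = 1 − π0, where π0 is the estimated proportion of true null hypotheses among the replication P values. The single fixed tuning value λ = 0.5 and the function name pi1_estimate are assumptions made for this example; the legend does not specify the exact estimator settings, and implementations such as the qvalue package typically smooth π0 over a range of λ values instead.

```python
from typing import Sequence


def pi1_estimate(p_values: Sequence[float], lam: float = 0.5) -> float:
    """Estimate pi1 = 1 - pi0 from P values at a fixed lambda.

    Under the null, P values are uniform, so the fraction of P values
    above lambda, rescaled by 1 / (1 - lambda), estimates pi0.
    Assumption: a single fixed lambda is used rather than a smoothed fit.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")
    m = len(p_values)
    if m == 0:
        raise ValueError("p_values must not be empty")
    n_above = sum(1 for p in p_values if p > lam)
    # Cap pi0 at 1 so that pi1 is never negative.
    pi0 = min(1.0, n_above / (m * (1.0 - lam)))
    return 1.0 - pi0


if __name__ == "__main__":
    # Hypothetical replication P values: 80 strong signals plus 20 null-like values.
    replication_p = [0.001] * 80 + [0.6, 0.7, 0.8, 0.9] * 5
    print(pi1_estimate(replication_p))
```

In this hypothetical input, 20 of 100 P values exceed 0.5, giving π0 = 20 / (100 × 0.5) = 0.4 and π1 of about 0.6. Values close to 1 indicate that most discovery-set eQTLs carry signal in the replicate set.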
Extended Data Fig. 8 Consistent genetic effects between linked genes and enhancers.
a, Distribution of genomic distances from eSNPs to the TSSs for different classes of transcripts. b, Replication of reported eQTLs in our analysis. π1 values (proportion of true positive P values) for reported significant GTEx eQTLs in our gene eQTLs. SCC values are the Spearman correlation coefficients of significant eQTL effect sizes between GTEx eQTLs and the corresponding pairs in our data. The bar color represents the number of unique genes used. c, For both physically overlapping and non-overlapping pairs, we subsampled from the non-linked pairs to generate backgrounds with enhancer–TSS distance distributions similar to those of linked pairs (numbers in parentheses indicate the number of pairs). Box plot indicates median, interquartile range (IQR) and 1.5 × IQR. d, The allelic genetic effect between genes and target enhancers, considering gene–enhancer physical overlap and enhancer–TSS effects. The background is controlled for enhancer–TSS distance.
Extended Data Fig. 9 Quantile–quantile plot of GWAS P values.
Quantile–quantile plot of GWAS P values across different GWAS traits. EeQTL-specific, eQTL-specific and shared SNPs are shown in comparison with genome-wide SNPs. GWAS SNPs were binarily annotated using SNPs within r^2 > 0.8 of the eSNP.
Extended Data Fig. 10 GWAS association.
a, Stratified LD score regression of neuronal and non-neuronal enhancer/gene eQTLs across different classes of traits. The gene/enhancer eQTLs are assigned to neuronal and non-neuronal groups based on the differential expression status between the two cell types. A positive coefficient signifies enrichment in heritability (per base enrichment, mean ± standard error). '·': nominally significant (P < 0.05); '+': significant after FDR (Benjamini & Hochberg) correction (FDR < 0.05). b, Gene TWAS Z scores compared to published reports (refs. 52,53). SCC represents the Spearman correlation coefficient (ρ) (all P < 10^−16, two-sided Spearman correlation test). c, For different types of transcripts, the TWAS Z scores between the two brain regions. SCC represents the Spearman correlation coefficient (ρ) (all P < 10^−16, two-sided Spearman correlation test). d, Aligned Manhattan plots of SCZ GWAS and EeQTLs at the enh41216 locus generated by LocusCompare. SNPs are colored by LD (r^2) with the lead EeQTL (rs3247233).
Supplementary information
Supplementary Information
Supplementary Methods and Supplementary Fig. 1.
Supplementary Tables
Supplementary Tables 1–8.
Cite this article
Dong, P., Hoffman, G.E., Apontes, P. et al. Population-level variation in enhancer expression identifies disease mechanisms in the human brain. Nat Genet 54, 1493–1503 (2022). https://doi.org/10.1038/s41588-022-01170-4
DOI: https://doi.org/10.1038/s41588-022-01170-4