Abstract
Noncoding variants of presumed regulatory function contribute to the heritability of neuropsychiatric disease. A total of 2,221 noncoding variants connected to risk for ten neuropsychiatric disorders, including autism spectrum disorder, attention deficit hyperactivity disorder, bipolar disorder, borderline personality disorder, major depression, generalized anxiety disorder, panic disorder, post-traumatic stress disorder, obsessive-compulsive disorder and schizophrenia, were studied in developing human neural cells. Integrating epigenomic and transcriptomic data with massively parallel reporter assays identified differentially-active single-nucleotide variants (daSNVs) in specific neural cell types. Expression-gene mapping, network analyses and chromatin looping nominated candidate disease-relevant target genes modulated by these daSNVs. Follow-up integration of daSNV gene editing with clinical cohort analyses suggested that magnesium transport dysfunction may increase neuropsychiatric disease risk and indicated that common genetic pathomechanisms may mediate specific symptoms that are shared across multiple neuropsychiatric diseases.
Data availability
All raw and processed sequencing data are available in GEO accession GSE182095. For ease of reference, processed TPM values for RNA-seq are provided in the data supplement (Supplementary Table 13). Tracks for ATAC and HiChIP were visualized on WashU Epigenome Browser. Raw and processed RNA, ATAC and HiChIP D0 and D2 Ngn2-derived H9 samples are referenced in ref. 85. All MPRA summary statistics and raw count results are provided (Supplementary Data 5) and processed data are available at https://arvid-data.shinyapps.io/neuropsychiatry/ or in Supplementary Data 3. Previously published GWAS data used as a basis for this work are noted in each published study and are annotated in online GWAS data resources, including https://www.ebi.ac.uk/gwas/ and http://www.nealelab.is/uk-biobank/. For LDSC scoring, the GWAS data used were available, preprocessed, from https://alkesgroup.broadinstitute.org/LDSCORE/all_sumstats/. For colocalization studies, available summary statistics are provided: https://zenodo.org/record/3518299#.XbMgFNF7m90. Additional publicly available data sets used include GTEx v7, Haploreg v4, ENCODE hg19, StringDB (https://string-db.org/), OMIM (https://www.omim.org/), The Drug Repurposing Hub (http://www.broadinstitute.org/repurposing), PsychENCODE (http://resource.psychencode.org/), HOCOMOCO v11 (https://hocomoco11.autosome.org/), UCSC browser (https://genome.ucsc.edu/), Brainmap SMART-seq cortical data (http://portal.brain-map.org/atlases-and-data/rnaseq/human-multiple-cortical-areas-smart-seq), SCHEMA (https://schema.broadinstitute.org/) and SNVlocs.Hsapiens.dbSNV142.GRCh37. VA cohort data have restricted access due to privacy concerns.
Code availability
Analyses were done in custom Jupyter Notebook or Rmarkdown scripts in Python 3.7.4 and R 3.6.1, locally or on the Stanford Sherlock computing cluster. Code to analyze transcriptomics and epigenomics data is available on GitHub (https://github.com/mguo123/pan_omics_psych.git; ref. 86). MPRA-based analysis scripts are available at https://github.com/mguo123/psych_mpra.git (ref. 87). Additional software used includes LDSC (LD Score) (v1.0.1), MPRAnalyze (v1.4.0), STAR aligner (v2.5.4b), RSEM (v1.3.0), ENCODE ATAC–seq pipeline (https://github.com/ENCODE-DCC/atac-seq-pipeline), Bowtie2 (v2.3.4.1), EnrichR (https://maayanlab.cloud/Enrichr/), ChIPSeeker (v1.22.0), motifBreakR (v2.10.2), rgt (https://github.com/CostaLab/reg-gen), ClusterProfiler (v3.14.0), RColorBrewer (v1.1.0), HiC-Pro (v2.11.1), Hichipper (v0.7.7), FitHiChIP (v7.0.0), diffloop (v1.10.0), DESeq2 (v1.26.0), Cytoscape (v3.7.2), ABC-Enhancer-Gene-Prediction (https://github.com/broadinstitute/ABC-Enhancer-Gene-Prediction), gatk (v4.1.9.0), picard (v2.24.0), MACS2 (v2.1.1), enloc (https://github.com/xqwen/integrative), PhenomeXcan (https://github.com/hakyimlab/phenomexcan), gkmSVM (v0.82.0), DeepSea (http://deepsea.princeton.edu/job/analysis/create/), pheatmap (v1.0.12), biothings (v0.2.6), GenomicRanges (v1.48.0) and Rsubread (v2.0.0).
References
Schizophrenia Working Group of the Psychiatric Genomics Consortium Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014).
PsychENCODE Consortium et al. The PsychENCODE project. Nat. Neurosci. 18, 1707–1712 (2015).
Ombrato, L. et al. Metastatic-niche labelling reveals parenchymal cells with stem features. Nature 572, 603–608 (2019).
Witt, S. H. et al. Genome-wide association study of borderline personality disorder reveals genetic overlap with bipolar disorder, major depression and schizophrenia. Transl. Psychiatry 7, e1155 (2017).
Meier, S. M. et al. Genetic variants associated with anxiety and stress-related disorders: a genome-wide association study and mouse-model study. JAMA Psychiatry 76, 924–932 (2019).
Tam, V. et al. Benefits and limitations of genome-wide association studies. Nat. Rev. Genet. 20, 467–484 (2019).
Zhang, F. & Lupski, J. R. Non-coding genetic variants in human disease. Hum. Mol. Genet. 24, R102–R110 (2015).
Won, H. et al. Chromosome conformation elucidates regulatory relationships in developing human brain. Nature 538, 523–527 (2016).
Gandal, M. J. et al. Shared molecular neuropathology across major psychiatric disorders parallels polygenic overlap. Science 359, 693–697 (2018).
Mulvey, B., Lagunas, T. & Dougherty, J. D. Massively parallel reporter assays: defining functional psychiatric genetic variants across biological contexts. Biol. Psychiatry 89, 76–89 (2021).
Tewhey, R. et al. Direct identification of hundreds of expression-modulating variants using a multiplexed reporter assay. Cell 165, 1519–1529 (2016).
Ulirsch, J. C. et al. Systematic functional dissection of common genetic variation affecting red blood cell traits. Cell 165, 1530–1545 (2016).
Lonsdale, J. et al. The genotype-tissue expression (GTEx) project. Nat. Genet. 45, 580–585 (2013).
Zhang, Y. et al. Rapid single-step induction of functional neurons from human pluripotent stem cells. Neuron 78, 785–798 (2013).
Ashuach, T. et al. MPRAnalyze: statistical framework for massively parallel reporter assays. Genome Biol. 20, 183 (2019).
Zhou, J. et al. Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nat. Genet. 50, 1171–1179 (2018).
Buenrostro, J. D., Wu, B., Chang, H. Y. & Greenleaf, W. J. ATAC–seq: a method for assaying chromatin accessibility genome-wide. Curr. Protoc. Mol. Biol. 109, 21.29.1–9 (2015).
Mumbach, M. R. et al. HiChIP: efficient and sensitive analysis of protein-directed genome architecture. Nat. Methods 13, 919–922 (2016).
Song, M. et al. Mapping cis-regulatory chromatin contacts in neural cells links neuropsychiatric disorder risk variants to target genes. Nat. Genet. 51, 1252–1262 (2019).
Zhang, S. et al. Allele-specific open chromatin in human iPSC neurons elucidates functional disease variants. Science 369, 561–565 (2020).
Inoue, F. et al. A systematic comparison reveals substantial differences in chromosomal versus episomal encoding of enhancer activity. Genome Res. 27, 38–52 (2017).
Herdegen, T. & Waetzig, V. AP-1 proteins in the adult brain: facts and fiction about effectors of neuroprotection and neurodegeneration. Oncogene 20, 2424–2437 (2001).
Chew, L. J. et al. Sox17 regulates a program of oligodendrocyte progenitor cell expansion and differentiation during development and repair. Cell Rep. 29, 3173–3186 (2019).
Ruiz i Altaba, A., Palma, V. & Dahmane, N. Hedgehog–GLI signaling and the growth of the brain. Nat. Rev. Neurosci. 3, 24–33 (2002).
Finucane, H. K. et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 50, 621–629 (2018).
Skene, N. G. et al. Genetic identification of brain cell types underlying schizophrenia. Nat. Genet. 50, 825–833 (2018).
Hook, P. W. & McCallion, A. S. Leveraging mouse chromatin data for heritability enrichment informs common disease architecture and reveals cortical layer contributions to schizophrenia. Genome Res. 30, 528–539 (2020).
Bulik-Sullivan, B. et al. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
Wang, D. et al. Comprehensive functional genomic resource and integrative model for the human brain. Science 362, eaat8464 (2018).
Kinney, D. K. et al. A unifying hypothesis of schizophrenia: abnormal immune system development may help explain roles of prenatal hazards, post-pubertal onset, stress, genes, climate, infections, and brain dysfunction. Med. Hypotheses 74, 555–563 (2010).
Shao, Z. et al. Dysregulated protocadherin-pathway activity as an intrinsic defect in induced pluripotent stem cell-derived cortical interneurons from subjects with schizophrenia. Nat. Neurosci. 22, 229–242 (2019).
Hoseth, E. Z. et al. Attenuated Notch signaling in schizophrenia and bipolar disorder. Sci. Rep. 8, 5349 (2018).
Onwordi, E. C. et al. Synaptic density marker SV2A is reduced in schizophrenia patients and unaffected by antipsychotics in rats. Nat. Commun. 11, 246 (2020).
Yabut, O. R., Fernandez, G., Huynh, T., Yoon, K. & Pleasure, S. J. Suppressor of fused is critical for maintenance of neuronal progenitor identity during corticogenesis. Cell Rep. 12, 2021–2034 (2015).
Nord, A. S., Pattabiraman, K., Visel, A. & Rubenstein, J. L. R. Genomic perspectives of transcriptional regulation in forebrain development. Neuron 85, 27–47 (2015).
Zhang, S. et al. OCT4 and PAX6 determine the dual function of SOX2 in human ESCs as a key pluripotent or neural factor. Stem Cell Res. Ther. 10, 122 (2019).
Hodge, R. D. et al. Conserved cell types with divergent features in human versus mouse cortex. Nature 573, 61–68 (2019).
Nichol, H., Amilhon, B., Manseau, F., Badrinarayanan, S. & Williams, S. Electrophysiological and morphological characterization of Chrna2 cells in the subiculum and Ca1 of the hippocampus: an optogenetic investigation. Front. Cell. Neurosci. 12, 32 (2018).
Giralt, A. et al. Pyk2 modulates hippocampal excitatory synapses and contributes to cognitive deficits in a Huntington’s disease model. Nat. Commun. 8, 15592 (2017).
Arjona, F. J. et al. CNNM2 mutations cause impaired brain development and seizures in patients with hypomagnesemia. PLoS Genet. 10, e1004267 (2014).
Li, M. et al. A human-specific AS3MT isoform and BORCS7 are molecular risk factors in the 10q24.32 schizophrenia-associated locus. Nat. Med. 22, 649–656 (2016).
Franken, G. A. C. et al. Cyclin M2 (CNNM2) knockout mice show mild hypomagnesaemia and developmental defects. Sci. Rep. 11, 8217 (2021).
Adam, M. P., Mirzaa, G. M., Pagon, R. A., Wallace, S. E., Bean, L. J. H., Gripp, K. W. & Amemiya, A. (eds.). GeneReviews (University of Washington, 1993).
Katsuki, A. et al. A single-nucleotide polymorphism influences brain morphology in drug-naïve patients with major depressive disorder. Neuropsychiatr. Dis. Treat. 15, 2425–2432 (2019).
Logan, T. T., Rusnak, M. & Symes, A. J. Runx1 promotes proliferation and neuronal differentiation in adult mouse neurosphere cultures. Stem Cell Res. 15, 554–564 (2015).
Singh, T., Neale, B. M., Daly, M. J. et al. Exome sequencing identifies rare coding variants in 10 genes which confer substantial risk for schizophrenia. Preprint at medRxiv https://doi.org/10.1101/2020.09.18.20192815 (2022).
Andrade, A. et al. Genetic associations between voltage-gated calcium channels and psychiatric disorders. Int. J. Mol. Sci. 20, 3537 (2019).
Davis, K. A. S. et al. Mental health in UK Biobank: development, implementation and results from an online questionnaire completed by 157 366 participants: a reanalysis. BJPsych Open 6, e18 (2020).
Subramanian, A. et al. A next generation connectivity map: L1000 platform and the first 1,000,000 profiles. Cell 171, 1437–1452 (2017).
Yaghoobi, M. M., Mowla, S. J. & Tiraihi, T. Nucleostemin, a coordinator of self-renewal, is expressed in rat marrow stromal cells and turns off after induction of neural differentiation. Neurosci. Lett. 390, 81–86 (2005).
Teng, X. et al. KCTD: a new gene family involved in neurodevelopmental and neuropsychiatric disorders. CNS Neurosci. Ther. 25, 887–902 (2019).
Imbrici, P., Camerino, D. C. & Tricarico, D. Major channels involved in neuropsychiatric disorders and therapeutic perspectives. Front. Genet. 4, 76 (2013).
Meng, Q. et al. Integrative analyses prioritize GNL3 as a risk gene for bipolar disorder. Mol. Psychiatry 25, 2672–2684 (2020).
Eby, G. A. & Eby, K. L. Rapid recovery from major depression using magnesium treatment. Med. Hypotheses 67, 362–370 (2006).
Myint, L. et al. Testing the regulatory consequences of 1,049 schizophrenia associated variants with a massively parallel reporter assay. Preprint at bioRxiv https://doi.org/10.1101/447557 (2019).
Calakos, N. et al. Functional evidence implicating a novel TOR1A mutation in idiopathic, late-onset focal dystonia. J. Med. Genet. 47, 646–650 (2010).
Buniello, A. et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47, D1005–D1012 (2019).
Landrum, M. J. et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 46, D1062–D1067 (2018).
Amberger, J. S., Bocchini, C. A., Schiettecatte, F., Scott, A. F. & Hamosh, A. OMIM.org: Online Mendelian Inheritance in Man (OMIM), an online catalog of human genes and genetic disorders. Nucleic Acids Res. 43, D789–D798 (2015).
Ward, L. D. & Kellis, M. HaploReg v4: systematic mining of putative causal variants, cell types, regulators and target genes for human complex traits and disease. Nucleic Acids Res. 44, D877–D881 (2016).
De Souza, N. The ENCODE project. Nat. Methods 9, 1046 (2012).
Rentzsch, P., Witten, D., Cooper, G. M., Shendure, J. & Kircher, M. CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res. 47, D886–D894 (2019).
Du, Z. W. et al. Generation and expansion of highly pure motor neuron progenitors from human pluripotent stem cells. Nat. Commun. 6, 6626 (2015).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Kim, D. et al. The dynamic, combinatorial cis-regulatory lexicon of epidermal differentiation. Nat. Genet. 53, 1564–1576 (2021).
Kim, H. K. et al. Deep learning improves prediction of CRISPR-Cpf1 guide RNA activity. Nat. Biotechnol. 36, 239–241 (2018).
Concordet, J. & Haeussler, M. CRISPOR: intuitive guide selection for CRISPR/Cas9 genome editing experiments and screens. Nucleic Acids Res. 46, W242–W245 (2018).
Gasperini, M. et al. A genome-wide framework for mapping gene regulation via cellular genetic screens. Cell 176, 377–390 (2019).
Labun, K., Montague, T. G., Krause, M., Cleuren, Y. N. T. & Valen, E. CHOPCHOP v3: expanding the CRISPR web toolbox beyond genome editing. Nucleic Acids Res. 47, W171–W174 (2019).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011).
Corces, M. R. et al. Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution. Nat. Genet. 48, 1193–1203 (2016).
Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259 (2015).
Lareau, C. A. & Aryee, M. J. Hichipper: a preprocessing pipeline for calling DNA loops from HiChIP data. Nat. Methods 15, 155–156 (2018).
Bhattacharyya, S., Chandra, V., Vijayanand, P. & Ay, F. FitHiChIP: identification of significant chromatin contacts from HiChIP data. Preprint at bioRxiv https://doi.org/10.1101/412833 (2018).
Li, Z. et al. Identification of transcription factor binding sites using ATAC–seq. Genome Biol. 20, 45 (2019).
Yu, G., Wang, L. G., Han, Y. & He, Q. Y. ClusterProfiler: an R package for comparing biological themes among gene clusters. OMICS 16, 284–287 (2012).
Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).
Donohue, L. K. H. et al. A cis-regulatory lexicon of DNA motif combinations mediating cell-type-specific gene regulation. Cell Genom. 2, 100191 (2022).
Szklarczyk, D. et al. STRING v11: protein–protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. 47, D607–D613 (2019).
Welch, J. D. et al. Single-cell multi-omic integration compares and contrasts features of brain cell identity. Cell 177, 1873–1887 (2019).
Chen, E. Y. et al. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinformatics 14, 128 (2013).
Singh, T. et al. Rare coding variants in ten genes confer substantial risk for schizophrenia. Nature 604, 509–516 (2022).
Ghandi, M., Lee, D., Mohammad-Noori, M. & Beer, M. A. Enhanced regulatory sequence prediction using gapped k-mer features. PLoS Comput. Biol. 10, e1003711 (2014).
Ang, C. E. et al. The dynamic interplay between homeodomain transcription factors and chromatin environment regulates proneural factor outcomes. Preprint at bioRxiv https://doi.org/10.1101/2020.12.02.398677 (2020).
mguo123. mguo123/pan_omics_psych: publication release. Zenodo. https://zenodo.org/record/8098407 (2023).
mguo123. mguo123/psych_mpra: publication release. Zenodo. https://zenodo.org/record/8098409 (2023).
Duncan, L. et al. Significant locus and metabolic genetic correlations revealed in genome-wide association study of anorexia nervosa. Am. J. Psychiatry 174, 850–858 (2017).
Demontis, D. et al. Discovery of the first genome-wide significant risk loci for attention deficit/hyperactivity disorder. Nat. Genet. 51, 63–75 (2019).
Grove, J. et al. Identification of common genetic risk variants for autism spectrum disorder. Nat. Genet. 51, 431–444 (2019).
Bipolar Disorder and Schizophrenia Working Group of the Psychiatric Genomics Consortium. Genomic dissection of bipolar disorder and schizophrenia, including 28 subphenotypes. Cell 173, 1705–1715 (2018).
Stahl, E. A. et al. Genome-wide association study identifies 30 loci associated with bipolar disorder. Nat. Genet. 51, 793–803 (2019).
Pardiñas, A. F. et al. Common schizophrenia alleles are enriched in mutation-intolerant genes and in regions under strong background selection. Nat. Genet. 50, 381–389 (2018).
Wray, N. R. et al. Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression. Nat. Genet. 50, 668–681 (2018).
Howard, D. M. et al. Genome-wide meta-analysis of depression identifies 102 independent variants and highlights the importance of the prefrontal brain regions. Nat. Neurosci. 22, 343–352 (2019).
Nagel, M. et al. Meta-analysis of genome-wide association studies for neuroticism in 449,484 individuals identifies novel genetic loci and pathways. Nat. Genet. 50, 920–927 (2018).
International League Against Epilepsy Consortium on Complex Epilepsies. Genetic determinants of common epilepsies: a meta-analysis of genome-wide association studies. Lancet Neurol. 13, 893–903 (2014).
Jansen, I. E. et al. Genome-wide meta-analysis identifies new loci and functional pathways influencing Alzheimer’s disease risk. Nat. Genet. 51, 404–413 (2019).
Marioni, R. E. et al. GWAS on family history of Alzheimer’s disease. Transl. Psychiatry 8, 99 (2018).
Corces, M. R. et al. Single-cell epigenomic analyses implicate candidate causal variants at inherited risk loci for Alzheimer’s and Parkinson’s diseases. Nat. Genet. 52, 1158–1168 (2020).
Ferreira, M. A. R. et al. Genetic architectures of childhood- and adult-onset asthma are partly distinct. Am. J. Hum. Genet. 104, 665–684 (2019).
de Lange, K. et al. Genome-wide association study implicates immune activation of multiple integrin genes in inflammatory bowel disease. Nat. Genet. 49, 256–261 (2017).
Acknowledgements
We thank S. Srinivasan for the creation of the web resource. We thank G. Rayant and K. Fields for their generous support and helpful discussions. We also thank M.P. Snyder, W.J. Greenleaf, D.F. Levinson and H.Y. Chang for the presubmission review and J. Engreitz and members of the Khavari and Altman laboratories for helpful discussions. This work was supported by the USVA Office of Research and Development and by the Atlas of Regulatory Variants in Disease (ARVID) project from NHGRI/NIH U24HG010856 and by NIAMS/NIH AR076965 (both to P.A.K.). S.B.M. was supported by R01AG066490 and R01MH125244. A.E.U. was supported by P50HG00773506. This work was supported by Bitscopic’s R&D budget and intramural funding from the Department of Veterans Affairs. The funders had no role in study design, data collection and analysis, the decision to publish or the preparation of the manuscript. We thank C.C. Lee, H. Parekh and J. Mewton for Praedico maintenance, assistance with data extraction and professional support.
Author information
Contributions
M.G.G., M.W. and P.A.K. conceptualized the project. M.G.G., D.L.R., C.E.A., Y.L., Y.Y., Y.Z., L.N.K., L.K.H.D., X.Y., L.M., T.F., I.E., A.H., Z.S., Y.P., V.B. and N.A. performed experiments and analyzed the data. M.G.G., P.A.K., M.W., R.B.A., L.E.D., A.E.U., S.B.M., P.E., M.H. and D.H.G. guided methodology development, experiments and data analysis. M.G.G. and P.A.K. wrote the manuscript with input from all authors.
Ethics declarations
Competing interests
V.B. and P.E. are employees of Bitscopic. The remaining authors declare no competing interests.
Peer review
Peer review information
Nature Genetics thanks Yin Shen and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 MPRA QC statistics.
(a) Bar chart showing the number of reads per MPRA sample on a log scale. Replicates are indicated by the number following the “R” prefix. Cell type abbreviations are as follows: AST = astrocytes; ES = hESC or human embryonic stem cell; A-NPC = anterior neural progenitor cell; P-NSC = posterior neural progenitor cell; N-DX = induced neuron of day X. Histograms showing barcodes per sequence in the (b) plasmid library (prior to lentiviral infection) and (c) RNA library (extracted post infection). (d) Power analysis for different numbers of barcodes at four log2-fold change thresholds (1.2, 1.5, 2, 3), for a total of 20 variants, simulated 5 times with 5, 10, 20, 50 and 100 barcodes each. (e) QQ plots showing the -log10 empirical versus theoretical p-values derived from MPRAnalyze for HEK293T as a cell-type example. The red line is y = x. (f) Histograms showing barcodes per sequence in the RNA library, by cell type. (g) Heatmap showing Pearson count correlation between replicates for all cell types and conditions.
Extended Data Fig. 2 Epigenetics study of the role of transcription regulation in neuropsychiatric diseases.
(a) Heatmap showing TF footprints that are enriched in cell types; color scale is normalized count values. (b) GO biological process dotplot depicting enrichment terms for genes closest to ATAC accessible peaks found across ES-derived neuronal differentiation. The size of the dot is the number of genes in the GO geneset and the color indicates FDR-adjusted p-values. (c) Bar chart showing frequency of loop types in promoters and promoter interaction anchor loops (putative enhancers) derived from HiChIP data. Type 1: an enhancer is linked to a distal gene and the nearest gene; Type 2: an enhancer is linked only to a distal gene; Type 3: an enhancer is looped to the closest gene. (d) Percentage of P-P (promoter-promoter) and P-PIR (promoter to promoter interaction region) loops per cell type found via HiChIP. (e) Cumulative distribution curves of distance between loop anchors for the different tissues. (f) Heatmap (left) showing normalized enrichment scores of motifs broken or gained by SNVs associated with different neuropsychiatric diseases derived from MotifBreakR, relative to a background of other neuropsychiatric diseases. The * refers to motifs that are significantly broken (p-value < 0.10, Fisher’s exact test) in daSNVs compared to non-daSNVs for a specific disease. The heatmap (right) shows the log TPM expression values of these transcription factors in different neuronal cell types and cell lines. (g) Scatterplot comparing log2 fold changes (n = 206 variants) for the MPRA dataset (y-axis) with an external allele-specific open chromatin dataset (ref. 20; x-axis), with a Pearson correlation of 0.48, p-value 1.7 × 10^−13.
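As an illustration of the comparison in panel (g), the Pearson correlation between two sets of per-variant log2 fold changes can be computed as in the minimal sketch below. The function name `pearson_r` and the toy values are hypothetical and are not taken from the study's code or data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of the centered values
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Toy example (made-up numbers): MPRA log2FC vs. external log2FC for 5 variants
mpra_log2fc = [0.8, -0.3, 1.2, -0.9, 0.1]
external_log2fc = [0.5, -0.1, 0.7, -1.1, 0.4]
r = pearson_r(mpra_log2fc, external_log2fc)
```

The associated p-value would additionally require a t-distribution with n − 2 degrees of freedom, which is omitted here to keep the sketch dependency-free.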
Extended Data Fig. 3 eGene network analysis of additional diseases.
eGene networks for the additional neuropsychiatric diseases with at least 20 eGenes (from left to right, top to bottom): MDD, BPD, OCD, ADHD, and GAD.
Extended Data Fig. 4 POU5F1/OCT4 vignette.
(a) Tracks for the POU5F1/OCT4 TF gene, where the peak tracks show the logFC change from cell-type specific MPRA for the daSNVs, and the bottom loop track shows the looping data for the N-D2 cell type. Boxplots depicting ratios of cDNA to plasmid counts for reference versus alternate allele for SNVs (b) rs28428768, (c) rs2442722, (d) rs35735140 and (e) rs3134944, where the center line is the median of each MPRA normalized ratio (n = 10 genomic instances each), box limits are the upper and lower quartiles, whiskers are the 1.5x interquartile range, and points shown are outliers. Ratios are normalized to the median reference value for each cell type. Significant associations found by MPRAnalyze (FDR < 0.05) are shown with an asterisk (*).
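The normalization described in this caption (cDNA-to-plasmid ratios, scaled to the median reference-allele ratio within a cell type) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and example counts are hypothetical, and instances with zero plasmid counts are simply skipped, which may differ from the study's actual filtering.

```python
from statistics import median

def count_ratios(cdna_counts, plasmid_counts):
    """cDNA/plasmid ratio per instance; instances with zero plasmid counts are skipped (assumption)."""
    return [c / p for c, p in zip(cdna_counts, plasmid_counts) if p > 0]

def normalize_to_reference(ref_ratios, alt_ratios):
    """Scale both alleles by the median reference ratio, so the reference median becomes 1."""
    ref_median = median(ref_ratios)
    if ref_median == 0:
        raise ValueError("median reference ratio is zero; cannot normalize")
    norm_ref = [r / ref_median for r in ref_ratios]
    norm_alt = [a / ref_median for a in alt_ratios]
    return norm_ref, norm_alt

# Toy counts for one SNV in one cell type (made-up numbers)
ref = count_ratios([10, 20, 30], [10, 10, 10])   # ratios 1.0, 2.0, 3.0
alt = count_ratios([40, 40, 80], [10, 10, 20])   # ratios 4.0, 4.0, 4.0
norm_ref, norm_alt = normalize_to_reference(ref, alt)
```

After this scaling, an alternate-allele median above 1 indicates higher reporter activity than the reference allele in that cell type, and below 1 indicates lower activity.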
Extended Data Fig. 5 Gene concordance for variant annotation approaches.
(a) Distribution of the number of daSNVs per GTEx eGene annotation, showing that eGenes are, on average, linked to five daSNVs. (b) Density plot showing the distribution of daSNV-to-eGene distance, with the mean depicted as a vertical red dotted line at 20 kb. (c) Pie chart showing gene annotation concordance between the different annotations of daSNVs, indicating that almost half of GWAS gene annotations do not match expression- or chromatin-based gene linkages. (d) Enrichment map made via ClusterProfiler showing GO Molecular Functions enriched in genes linked to daSNVs.
Extended Data Fig. 6 Association between serum magnesium levels and relative psychiatric disease incidence in a VA cohort.
(a) Relative disease prevalence for serum magnesium levels in the bottom and top deciles. The bottom decile (below the 10th percentile) of serum magnesium comprises values < 1.6 mg/dL and the top decile (above the 90th percentile) comprises values > 2.4 mg/dL. ** indicates a significant difference between the two proportions for a given disease, based on a two-sided two-proportion z-test (FDR-corrected p < 0.05). (b) Relative prevalence of diseases by serum magnesium levels in the VA cohort. The upper graph includes all patients aged 45–85, n = 846,795. The lower graph excludes all patients who were diagnosed with Alcohol Use Disorder, n = 618,692. The cohort was partitioned by serum magnesium levels into 6 quantiles and the prevalence of each disease was calculated within each quantile. Relative prevalence is calculated as the prevalence normalized to the disease prevalence in the entire cohort. Significance is determined by linear regression with the null hypothesis beta = 0, with p-values < 0.10 shown in solid. Abbreviations of disease are as follows: ADHD = attention deficit hyperactivity disorder, PD = panic disorder, GAD = generalized anxiety disorder, BPD = bipolar disorder, MDD = major depressive disorder, OCD = obsessive compulsive disorder, SCZ = schizophrenia, AD = Alzheimer’s disease, CKD = chronic kidney disease.
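The two quantities named in this caption, relative prevalence and the two-sided two-proportion z-test, can be sketched as below. This is an illustrative sketch only: the function names and counts are hypothetical, the z-test uses the standard pooled-variance form (an assumption, since the caption does not specify it), and the FDR correction step is not shown.

```python
import math

def relative_prevalence(cases_in_group, n_in_group, cases_total, n_total):
    """Prevalence within a group divided by prevalence in the entire cohort."""
    return (cases_in_group / n_in_group) / (cases_total / n_total)

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided two-proportion z-test with a pooled standard error.

    Returns (z, p_value), where p_value = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Toy example (made-up counts): disease cases in low- vs high-magnesium groups
z, p = two_proportion_z_test(60, 100, 40, 100)
rel = relative_prevalence(30, 100, 200, 1000)  # 0.30 / 0.20, i.e. about 1.5
```

A relative prevalence above 1 means the disease is more common in that magnesium group than in the cohort as a whole. With many diseases tested, the raw p-values would then be adjusted for multiple testing before applying the 0.05 threshold.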
Extended Data Fig. 7 RERE vignette.
(a) Tracks for gene RERE, where the MPRA peak tracks show the logFC change from cell-type specific MPRA for the daSNVs, and the bottom ATAC peak tracks show accessibility profiles for all cell types. Box-and-whiskers plots depicting ratios of cDNA to plasmid counts for reference versus alternate allele for daSNVs (b) rs301806, the SNV of interest, and (c) rs301807, as comparison, where the center line is the median of each MPRA normalized ratio (each point is a genomic instance with at least one count), box limits are the upper and lower quartiles, whiskers are the 1.5x interquartile range, and points shown are outliers. Ratios are normalized to the median reference value for each cell type. Additionally, MotifBreakR results are shown for (d) rs301806 (above) and rs301807 (below), depicting loss of the RUNX1 motif at rs301806 and no RUNX1 motif present at the rs301807 locus. (e) ChIP PCR for the transcription factor RUNX1 with n = 4 replicates; * indicates significance of a two-sided paired t-test between the reference and alternate allele for the two SNPs.
Extended Data Fig. 8 CMAP drug perturbation analysis.
Drug-eGene networks for (a) SCZ, (b) BPD and (c) MDD. A linkage between an eGene and a drug indicates that the drug significantly upregulates (red) or downregulates (blue) the expression of that gene in neuro-relevant cell lines in CMAP. Genes (diamonds) are outlined based on the MPRA log fold change direction (red: positive, blue: negative). Drugs (ellipses) are color coded by drug type. Drug-gene pairs towards the left side of the map indicate that the MPRA and expression vectors point in the same direction (putatively side effect causing variants); drug-gene pairs towards the right side of the map indicate that the MPRA and expression vectors point in opposite directions (putatively therapeutic effects).
Supplementary information
Supplementary Information
Supplementary Methods.
Supplementary Tables
Supplementary Tables 1–15.
Supplementary Data 1
LDSC analysis for heritability.
Supplementary Data 2
Literature-derived daSNV gene annotations.
Supplementary Data 3
daSNV summary statistics and annotations.
Supplementary Data 4
Networks of shared putative pathomechanisms in neuropsychiatric disorders.
Supplementary Data 5
MPRA cell condition-specific summary statistics.
About this article
Cite this article
Guo, M.G., Reynolds, D.L., Ang, C.E. et al. Integrative analyses highlight functional regulatory variants associated with neuropsychiatric diseases. Nat Genet 55, 1876–1891 (2023). https://doi.org/10.1038/s41588-023-01533-5
DOI: https://doi.org/10.1038/s41588-023-01533-5