Abstract
Despite the strong genetic basis of psychiatric disorders, the underlying molecular mechanisms are largely unmapped. RNA-binding proteins (RBPs) are responsible for most post-transcriptional regulation, from splicing to translation to localization. RBPs thus act as key gatekeepers of cellular homeostasis, especially in the brain. However, quantifying the pathogenic contribution of noncoding variants impacting RBP target sites is challenging. Here, we leverage a deep learning approach that can accurately predict the RBP target site dysregulation effects of mutations and discover that RBP dysregulation is a principal contributor to psychiatric disorder risk. RBP dysregulation explains a substantial amount of heritability not captured by large-scale molecular quantitative trait loci studies and has a stronger impact than common coding region variants. We share the genome-wide profiles of RBP dysregulation, which we use to identify DDHD2 as a candidate schizophrenia risk gene. This resource provides a new analytical framework to connect the full range of RNA regulation to complex disease.
Data availability
All predicted variant scores are available for download and through an interactive Web interface at https://hb.flatironinstitute.org/seqweaver.
Code availability
The code used in this study is available at https://hb.flatironinstitute.org/seqweaver/about.
References
Lee, P. H. et al. Genomic relationships, novel loci, and pleiotropic mechanisms across eight psychiatric disorders. Cell 179, 1469–1482.e11 (2019).
Black, D. L. Mechanisms of alternative pre-messenger RNA splicing. Annu. Rev. Biochem. 72, 291–336 (2003).
Krichevsky, A. M. & Kosik, K. S. Neuronal RNA granules: a link between RNA localization and stimulation-dependent translation. Neuron 32, 683–696 (2001).
Guhaniyogi, J. & Brewer, G. Regulation of mRNA stability in mammalian cells. Gene 265, 11–23 (2001).
Costa-Mattioli, M., Sossin, W. S., Klann, E. & Sonenberg, N. Translational control of long-lasting synaptic plasticity and memory. Neuron 61, 10–26 (2009).
Darnell, R. B. RNA protein interaction in neurons. Annu. Rev. Neurosci. 36, 243–270 (2013).
Schuman, E. M. mRNA trafficking and local protein synthesis at the synapse. Neuron 23, 645–648 (1999).
Verkerk, A. J. et al. Identification of a gene (FMR-1) containing a CGG repeat coincident with a breakpoint cluster region exhibiting length variation in fragile X syndrome. Cell 65, 905–914 (1991).
Lines, M. A. et al. Haploinsufficiency of a spliceosomal GTPase encoded by EFTUD2 causes mandibulofacial dysostosis with microcephaly. Am. J. Hum. Genet. 90, 369–377 (2012).
Bernier, F. P. et al. Haploinsufficiency of SF3B4, a component of the pre-mRNA spliceosomal complex, causes Nager syndrome. Am. J. Hum. Genet. 90, 925–933 (2012).
Messiaen, L. M. et al. Exhaustive mutation analysis of the NF1 gene allows identification of 95% of mutations and reveals a high frequency of unusual splicing defects. Hum. Mutat. 15, 541–555 (2000).
Xiong, H. Y. et al. The human splicing code reveals new insights into the genetic determinants of disease. Science 347, 1254806 (2015).
Li, Y. I. et al. RNA splicing is a primary link between genetic variation and disease. Science 352, 600–604 (2016).
Jaganathan, K. et al. Predicting splicing from primary sequence with deep learning. Cell 176, 535–548.e24 (2019).
Walker, R. L. et al. Genetic control of expression and splicing in developing human brain informs disease mechanisms. Cell 179, 750–771.e22 (2019).
Zhou, J. et al. Whole-genome deep-learning analysis identifies contribution of noncoding mutations to autism risk. Nat. Genet. 51, 973–980 (2019).
Ule, J. et al. CLIP identifies Nova-regulated RNA networks in the brain. Science 302, 1212–1215 (2003).
Iossifov, I. et al. The contribution of de novo coding mutations to autism spectrum disorder. Nature 515, 216–221 (2014).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Kim, Y. K. & Maquat, L. E. UPFront and center in RNA decay: UPF1 in nonsense-mediated mRNA decay and beyond. RNA 25, 407–422 (2019).
Hogg, J. R. & Goff, S. P. Upf1 senses 3′UTR length to potentiate mRNA decay. Cell 143, 379–389 (2010).
Anttila, V. et al. Analysis of shared heritability in common disorders of the brain. Science 360, eaap8757 (2018).
Laursen, T. M. et al. Family history of psychiatric illness as a risk factor for schizoaffective disorder: a Danish register-based cohort study. Arch. Gen. Psychiatry 62, 841–848 (2005).
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
Gazal, S. et al. Linkage disequilibrium-dependent architecture of human complex traits shows action of negative selection. Nat. Genet. 49, 1421–1427 (2017).
Aguet, F. et al. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).
Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Demontis, D. et al. Discovery of the first genome-wide significant risk loci for attention deficit/hyperactivity disorder. Nat. Genet. 51, 63–75 (2019).
Grove, J. et al. Identification of common genetic risk variants for autism spectrum disorder. Nat. Genet. 51, 431–444 (2019).
Ruderfer, D. M. et al. Genomic dissection of bipolar disorder and schizophrenia, including 28 subphenotypes. Cell 173, 1705–1715.e16 (2018).
Wray, N. R. et al. Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression. Nat. Genet. 50, 668–681 (2018).
Ripke, S. et al. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014).
Kelly, T. J., Suzuki, H. I., Zamudio, J. R., Suzuki, M. & Sharp, P. A. Sequestration of microRNA-mediated target repression by the Ago2-associated RNA-binding protein FAM120A. RNA 25, 1291–1297 (2019).
Balak, C. et al. Rare de novo missense variants in RNA helicase DDX6 cause intellectual disability and dysmorphic features and lead to P-body defects and RNA dysregulation. Am. J. Hum. Genet. 105, 509–525 (2019).
Hormozdiari, F. et al. Leveraging molecular quantitative trait loci to understand the genetic architecture of diseases and complex traits. Nat. Genet. 50, 1041–1047 (2018).
Fromer, M. et al. Gene expression elucidates functional impact of polygenic risk for schizophrenia. Nat. Neurosci. 19, 1442–1453 (2016).
Chen, L. et al. Genetic drivers of epigenetic and transcriptional variation in human immune cells. Cell 167, 1398–1414.e24 (2016).
Lam, M. et al. Comparative genetic architectures of schizophrenia in East Asian and European populations. Nat. Genet. 51, 1670–1678 (2019).
Schork, A. J. et al. A genome-wide association study of shared risk across psychiatric disorders implicates gene regulation during fetal neurodevelopment. Nat. Neurosci. 22, 353–361 (2019).
Pardiñas, A. F. et al. Common schizophrenia alleles are enriched in mutation-intolerant genes and in regions under strong background selection. Nat. Genet. 50, 381–389 (2018).
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
Karlsson Linnér, R. et al. Genome-wide association analyses of risk tolerance and risky behaviors in over 1 million individuals identify hundreds of loci and shared genetic influences. Nat. Genet. 51, 245–257 (2019).
Howard, D. M. et al. Genome-wide association study of depression phenotypes in UK Biobank identifies variants in excitatory synaptic pathways. Nat. Commun. 9, 1470 (2018).
de Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput. Biol. 11, e1004219 (2015).
Gandal, M. J. et al. Transcriptome-wide isoform-level dysregulation in ASD, schizophrenia, and bipolar disorder. Science 362, eaat8127 (2018).
Shi, Y. et al. Common variants on 8p12 and 1q24.2 confer risk of schizophrenia. Nat. Genet. 43, 1224–1227 (2011).
Åberg, K. et al. Human QKI, a new candidate gene for schizophrenia involved in myelination. Am. J. Med. Genet. B Neuropsychiatr. Genet. 141B, 84–90 (2006).
Bhalala, O. G., Nath, A. P., Inouye, M., Sibley, C. R. & UK Brain Expression Consortium. Identification of expression quantitative trait loci associated with schizophrenia and affective disorders in normal brain tissue. PLoS Genet. 14, e1007607 (2018).
Dunham, I. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Inloes, J. M. et al. The hereditary spastic paraplegia-related enzyme DDHD2 is a principal brain triglyceride lipase. Proc. Natl Acad. Sci. USA 111, 14924–14929 (2014).
Finkel, R. S. et al. Nusinersen versus sham control in infantile-onset spinal muscular atrophy. N. Engl. J. Med. 377, 1723–1732 (2017).
de la Torre-Ubieta, L., Won, H., Stein, J. L. & Geschwind, D. H. Advancing the understanding of autism disease mechanisms through genetics. Nat. Med. 22, 345–361 (2016).
Fromer, M. et al. De novo mutations in schizophrenia implicate synaptic networks. Nature 506, 179–184 (2014).
Darnell, J. C. et al. FMRP stalls ribosomal translocation on mRNAs linked to synaptic function and autism. Cell 146, 247–261 (2011).
Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).
Supek, F., Miñana, B., Valcárcel, J., Gabaldón, T. & Lehner, B. Synonymous mutations frequently act as driver mutations in human cancers. Cell 156, 1324–1335 (2014).
Darnell, R. B. & Posner, J. B. Paraneoplastic syndromes involving the nervous system. N. Engl. J. Med. 349, 1543–1554 (2003).
Häsler, R. et al. Alterations of pre-mRNA splicing in human inflammatory bowel disease. Eur. J. Cell Biol. 90, 603–611 (2011).
Cummings, B. B. et al. Improving genetic diagnosis in Mendelian disease with transcriptome sequencing. Sci. Transl. Med. 9, eaal5209 (2017).
Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Commun. ACM 60, 84–90 (2017).
Licatalosi, D. D. et al. HITS-CLIP yields genome-wide insights into brain alternative RNA processing. Nature 456, 464–469 (2008).
Yan, Q. et al. Systematic discovery of regulated and conserved alternative exons in the mammalian brain reveals NMD modulating chromatin regulators. Proc. Natl Acad. Sci. USA 112, 3445–3450 (2015).
Sorek, R. & Ast, G. Intronic sequences flanking alternatively spliced exons are conserved between human and mouse. Genome Res. 13, 1631–1637 (2003).
Lebedeva, S. et al. Transcriptome-wide analysis of regulatory interactions of the RNA-binding protein HuR. Mol. Cell 43, 340–352 (2011).
Lovci, M. T. et al. Rbfox proteins regulate alternative mRNA splicing through evolutionarily conserved RNA bridges. Nat. Struct. Mol. Biol. 20, 1434–1442 (2013).
Fortune, M. D. & Wallace, C. simGWAS: a fast method for simulation of large scale case-control GWAS summary statistics. Bioinformatics 35, 1901–1906 (2018).
Wang, D. et al. Comprehensive functional genomic resource and integrative model for the human brain. Science 362, eaat8464 (2018).
Villar, D. et al. Enhancer evolution across 20 mammalian species. Cell 160, 554–566 (2015).
Kang, H. J. et al. Spatio-temporal transcriptome of the human brain. Nature 478, 483–489 (2011).
Anney, R. J. L. et al. Meta-analysis of GWAS of over 16,000 individuals with autism spectrum disorder highlights a novel locus at 10q24.32 and a significant overlap with schizophrenia. Mol. Autism 8, 21 (2017).
Ripke, S. et al. A mega-analysis of genome-wide association studies for major depressive disorder. Mol. Psychiatry 18, 497–511 (2013).
Neale, B. M. et al. Meta-analysis of genome-wide association studies of attention-deficit/hyperactivity disorder. J. Am. Acad. Child Adolesc. Psychiatry 49, 884–897 (2010).
Sklar, P. et al. Large-scale genome-wide association analysis of bipolar disorder identifies a new susceptibility locus near ODZ4. Nat. Genet. 43, 977–983 (2011).
Hoffman, G. E. et al. CommonMind consortium provides transcriptomic and epigenomic data for schizophrenia and bipolar disorder. Sci. Data 6, 180 (2019).
Jaffe, A. E. et al. Developmental and genetic regulation of the human cortex transcriptome illuminate schizophrenia pathogenesis. Nat. Neurosci. 21, 1117–1125 (2018).
Harrow, J. et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 22, 1760–1774 (2012).
Skene, N. G. et al. Genetic identification of brain cell types underlying schizophrenia. Nat. Genet. 50, 825–833 (2018).
Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).
Berisa, T. & Pickrell, J. K. Approximately independent linkage disequilibrium blocks in human populations. Bioinformatics 32, 283–285 (2015).
Thorvaldsdóttir, H., Robinson, J. T. & Mesirov, J. P. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief. Bioinform. 14, 178–192 (2013).
Bray, N. L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525–527 (2016).
Pimentel, H., Bray, N. L., Puente, S., Melsted, P. & Pachter, L. Differential analysis of RNA-seq incorporating quantification uncertainty. Nat. Methods 14, 687–690 (2017).
Storey, J. D. & Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA 100, 9440–9445 (2003).
Acknowledgements
We thank Z. Zhang, A. Andersen and S. Lall for their help with the manuscript. We also thank all members of the Troyanskaya and Darnell laboratories for helpful discussions. This work is supported by National Institutes of Health grant nos. R01HG005998, U54HL117798 and R01GM071966, U.S. Department of Health and Human Services grant no. HHSN272201000054C and Simons Foundation grant no. 395506. O.G.T. is a senior fellow of the Genetic Networks program of the Canadian Institute for Advanced Research. We thank the Simons Foundation Autism Research Initiative, Simons Foundation and Flatiron Institute. A substantial portion of the work in this paper was performed at the Terascale Infrastructure for Groundbreaking Research in Science and Engineering high-performance computer center at Princeton University, which is jointly supported by the Princeton Institute for Computational Science and Engineering and the Princeton University Office of Information Technology’s Research Computing department.
Author information
Contributions
C.Y.P. and O.G.T. conceived the study. C.Y.P. designed the study, developed the computational methods and performed the analyses. J.Z., C.L.T. and R.B.D. contributed ideas and insights. A.K.W. and K.M.C. developed the Web interface/software. C.Y.P. and O.G.T. wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature Genetics thanks Amalio Telenti and Thomas Werge for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Population genetics reveals negative selection acting on RBP target site dysregulation.
a, Across the Seqweaver-profiled RBPs, we observe differential selection signatures for variants when segregated by their RBP target site dysregulation levels. Specifically, for gnomAD cohort noncoding variants (MAF bins, x axis), mean RBP dysregulation (y axis) shows an inverse relationship with allele frequency, consistent with significant negative selection acting on high-impact RBP-disrupting variants. b, The top RBPs previously implicated by their autism de novo mutation risk (Zhou, Park, Theesfeld et al.) all show significant negative selection signatures, consistent with selection impeding RBP-impacting variants from reaching high population prevalence. P-values are from a Wald test for slope. All inferred mean RBP dysregulation scores were normalized by subtracting the average predicted dysregulation score of common variants (MAF > 0.05) for comparison (95% CI shown).
Extended Data Fig. 2 Regions with peak childhood stage expression show the largest enrichment association with RBP dysregulation.
We test the overlap between differentially expressed regions of the prefrontal cortex and the RBP dysregulation SNPs (top 0.5%) associated with each disorder, relative to the genome-wide rate. We also plot the enrichment for the subset of regions whose expression was highest during the childhood stage. All ORs have an enrichment P-value < 2.2 × 10⁻¹⁶. Error bars are 95% CI.
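The odds ratios and 95% CI in this figure summarize a 2×2 overlap table (top-scoring SNPs versus background SNPs, inside versus outside the regions). The caption does not state how the intervals were computed, so the following is only a minimal sketch that assumes a Wald interval on the log odds ratio. The function name `odds_ratio_ci` and the example counts are hypothetical and are not taken from the study.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald confidence interval from a 2x2 table.

    a: top-scoring SNPs inside the regions
    b: top-scoring SNPs outside the regions
    c: background SNPs inside the regions
    d: background SNPs outside the regions
    z: normal quantile (1.96 gives an approximate 95% interval)

    Assumes all four counts are positive.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Illustrative counts only (not real data).
or_, lo, hi = odds_ratio_ci(20, 80, 10, 90)
print(f"OR = {or_:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

An interval whose lower bound exceeds 1 would indicate enrichment relative to the background rate under these assumptions.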
Extended Data Fig. 3 Cross-ancestry replication – RBP dysregulation effects replicate in an independent cohort.
Replication of estimated schizophrenia RBP dysregulation effect sizes (τ*, European Psychiatric Genomics Consortium (PGC)) when compared to estimates from an East Asian cohort (Lam et al.). P-value computed using the Spearman rank correlation test of RBP effect sizes.
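The replication comparison rests on a Spearman rank correlation between two vectors of per-RBP effect sizes. As a hedged illustration of what that statistic measures, the sketch below computes Spearman's rho with the standard library, using average ranks for ties. The helper names and the toy values are hypothetical, and this is not the authors' code.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy effect sizes for five hypothetical RBPs in two cohorts.
cohort_a = [0.8, 0.1, 0.5, 0.3, 0.9]
cohort_b = [0.7, 0.2, 0.4, 0.1, 0.6]
print(spearman_rho(cohort_a, cohort_b))
```

Because only ranks enter the calculation, the statistic is insensitive to differences in effect-size scale between cohorts.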
Extended Data Fig. 4 RBP dysregulation effects for cross-disorder risk replicate in iPSYCH cohort.
Replication of estimated cross-disorder RBP dysregulation effect sizes (τ*, Psychiatric Genomics Consortium cohort) when compared to estimates from the iPSYCH cohort. P-value computed using the Spearman rank correlation test of RBP effect sizes.
Extended Data Fig. 5 RBP dysregulation is a major contributor to human phenotypic variation.
The per-SNP heritability effect sizes (τ*) for each RBP dysregulation are plotted across a collection of psychiatric traits, brain-associated anthropometric traits and representative non-brain-related phenotypes previously examined by the Brainstorm Consortium study. The dashed line indicates RBP models below the FDR 0.05 threshold after multiple-hypothesis correction (block jackknife-based one-sided P-values; Benjamini-Hochberg correction).
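The Benjamini-Hochberg correction named in this caption converts a set of P-values into FDR-adjusted values that can be compared against the 0.05 threshold. The following is a minimal sketch of the standard step-up procedure, written with the standard library only. The function name and the example P-values are hypothetical and do not reproduce the study's results.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that adjusted values
    # stay monotone in the rank of the raw P-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Illustrative one-sided P-values for four hypothetical RBP models.
raw = [0.01, 0.04, 0.03, 0.5]
adj = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adj]
print(adj, significant)
```

In this toy example only the first model passes the 0.05 threshold after adjustment.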
Extended Data Fig. 6 Heatmap showing patterns of correlated GWAS effect sizes between psychiatric disorders and behavioral-cognitive phenotypes for variants affecting RBP dysregulation.
For each pair of disorder and phenotype (x, y), we extracted the top set of RBP dysregulation variants that influence disorder x, together with their GWAS effect sizes on both x and y. We then calculated the correlation between the GWAS effect sizes on x and those on y, and tested whether this correlation was significantly different from zero. Stars represent statistical significance: ***P < 0.001, **P < 0.01, *P < 0.05.
Extended Data Fig. 7 Heatmap showing patterns of correlated GWAS effect sizes between psychiatric disorders for variants affecting RBP dysregulation.
For each pair of disorders (x, y), we extracted the top set of RBP dysregulation variants that influence disorder x, together with their GWAS effect sizes on both x and y. We then calculated the correlation between the GWAS effect sizes on x and those on y, and tested whether this correlation was significantly different from zero. Stars represent statistical significance: ***P < 0.001, **P < 0.01, *P < 0.05.
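The procedure in this caption has three steps: select the top-scoring variants for disorder x, correlate their GWAS effect sizes on x and y, and test the correlation against zero. The caption does not name the correlation measure or the test, so the sketch below makes two labeled assumptions: a Pearson correlation, and a two-sided test based on the Fisher z-transformation with a normal approximation. All function names and the 0.5% default are illustrative choices, not the authors' implementation.

```python
import math
from statistics import NormalDist


def top_fraction_indices(scores, fraction=0.005):
    """Indices of the highest-scoring variants (at least one is kept)."""
    k = max(1, int(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def pearson_r(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def fisher_z_pvalue(r, n):
    """Two-sided P-value for H0: correlation = 0 (assumed test).

    Uses the Fisher z-transformation; requires n > 3 and |r| < 1.
    """
    if n <= 3 or abs(r) >= 1:
        raise ValueError("need n > 3 and |r| < 1")
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def significance_stars(p):
    """Map a P-value to the star notation used in the heatmap."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""
```

Given per-variant scores for disorder x and effect-size vectors `beta_x` and `beta_y` (hypothetical names), one would subset both vectors with `top_fraction_indices`, pass them to `pearson_r`, and annotate the heatmap cell with `significance_stars(fisher_z_pvalue(r, n))`.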
Extended Data Fig. 8 Heritability enrichment for the collective RBP dysregulation effects in comparison to QTL and genomic functional annotations for schizophrenia.
The top 0.1%, 0.5% and 1% of SNPs with the largest overall RBP dysregulation effects were compared to known molecular QTLs and gene/promoter annotations for their enrichment of heritability, using the PGC schizophrenia GWAS.
Extended Data Fig. 9 Estimated RBP dysregulation effects are robust after conditioning on conserved genomic elements.
The per-SNP heritability effect sizes (τ*) for each RBP dysregulation are plotted across the five major psychiatric disorders after adding vertebrate, mammal and primate conserved phastCons elements to the conditioning baseline annotation set (including QTL annotations). The dashed line indicates RBP models below the FDR 0.05 threshold after multiple-hypothesis correction (jackknife one-sided P-values; Benjamini-Hochberg correction).
Supplementary information
Supplementary Information
Supplementary Figs. 1–9
Supplementary Tables
Supplementary Tables 1–6
About this article
Cite this article
Park, C.Y., Zhou, J., Wong, A.K. et al. Genome-wide landscape of RNA-binding protein target site dysregulation reveals a major impact on psychiatric disorder risk. Nat Genet 53, 166–173 (2021). https://doi.org/10.1038/s41588-020-00761-3