A genetic etiology is identified for one-third of patients with congenital heart disease (CHD), with 8% of cases attributable to coding de novo variants (DNVs). To assess the contribution of noncoding DNVs to CHD, we compared genome sequences from 749 CHD probands and their parents with those from 1,611 unaffected trios. Neural network prediction of noncoding DNV transcriptional impact identified a burden of DNVs in individuals with CHD (n = 2,238 DNVs) compared to controls (n = 4,177; P = 8.7 × 10⁻⁴). Independent analyses of enhancers showed an excess of DNVs in associated genes (27 genes versus 3.7 expected, P = 1 × 10⁻⁵). We observed significant overlap between these transcription-based approaches (odds ratio (OR) = 2.5, 95% confidence interval (CI) 1.1–5.0, P = 5.4 × 10⁻³). CHD DNVs altered transcription levels in 5 of 31 enhancers assayed. Finally, we observed a DNV burden in RNA-binding-protein regulatory sites (OR = 1.13, 95% CI 1.1–1.2, P = 8.8 × 10⁻⁵). Our findings demonstrate an enrichment of potentially disruptive regulatory noncoding DNVs in a fraction of CHD at least as high as that observed for damaging coding DNVs.
Documentation, links, and availability of source code and select supplementary data are detailed at https://github.com/frichter/wgs_chd_analysis. The DNV identification pipeline is available at https://github.com/ShenLab/igv-classifier and https://github.com/frichter/dnv_pipeline. The HeartENN algorithmic framework is available at https://github.com/FunctionLab/selene/archive/0.4.8.tar.gz. HeartENN model weights and scripts for burden tests are available at https://github.com/frichter/wgs_chd_analysis. All source code is distributed under the Massachusetts Institute of Technology license.
van der Linde, D. et al. Birth prevalence of congenital heart disease worldwide. J. Am. Coll. Cardiol. 58, 2241–2247 (2011).
Pediatric Cardiac Genomics Consortium et al. The Congenital Heart Disease Genetic Network Study: rationale, design, and early results. Circ. Res. 112, 698–706 (2013).
Zaidi, S. et al. De novo mutations in histone-modifying genes in congenital heart disease. Nature 498, 220–223 (2013).
Homsy, J. et al. De novo mutations in congenital heart disease with neurodevelopmental and other congenital anomalies. Science 350, 1262–1266 (2015).
Jin, S. C. et al. Contribution of rare inherited and de novo variants in 2,871 congenital heart disease probands. Nat. Genet. 49, 1593–1601 (2017).
Fischbach, G. D. & Lord, C. The Simons Simplex Collection: a resource for identification of autism genetic risk factors. Neuron 68, 192–195 (2010).
Garrison, E. & Marth, G. Haplotype-based variant detection from short-read sequencing. Preprint at https://arxiv.org/abs/1207.3907v2 (2012).
Richter, F. et al. Whole genome de novo variant identification with FreeBayes and neural network approaches. Preprint at bioRxiv https://doi.org/10.1101/2020.03.24.994160 (2020).
Zhou, J. et al. Whole-genome deep-learning analysis identifies contribution of noncoding mutations to autism risk. Nat. Genet. 51, 973–980 (2019).
An, J.-Y. et al. Genome-wide de novo risk score implicates promoter variation in autism spectrum disorder. Science 362, eaat6576 (2018).
Jónsson, H. et al. Parental influence on human germline de novo mutations in 1,548 trios from Iceland. Nature 549, 519–522 (2017).
Goldmann, J. M. et al. Parent-of-origin-specific signatures of de novo mutations. Nat. Genet. 48, 935–939 (2016).
Seiden, A. H. et al. Elucidation of de novo small insertion/deletion biology with parent-of-origin phasing. Hum. Mutat. 41, 800–806 (2020).
Kent, W. J. et al. The human genome browser at UCSC. Genome Res. 12, 996–1006 (2002).
Bernstein, B. E. et al. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Mei, S. et al. Cistrome Data Browser: a data portal for ChIP–Seq and chromatin accessibility data in human and mouse. Nucleic Acids Res. 45, D658–D662 (2017).
He, A. et al. Dynamic GATA4 enhancers shape the chromatin landscape central to heart development and disease. Nat. Commun. 5, 4907 (2014).
Sayed, D., Yang, Z., He, M., Pfleger, J. M. & Abdellatif, M. Acute targeting of general transcription factor IIB restricts cardiac hypertrophy via selective inhibition of gene transcription. Circ. Heart Fail. 8, 138–148 (2015).
Stefanovic, S. et al. GATA-dependent regulatory switches establish atrioventricular canal specificity during heart development. Nat. Commun. 5, 3680 (2014).
Sayed, D., He, M., Yang, Z., Lin, L. & Abdellatif, M. Transcriptional regulation patterns revealed by high resolution chromatin immunoprecipitation during cardiac hypertrophy. J. Biol. Chem. 288, 2546–2558 (2013).
Zhang, L. et al. KLF15 establishes the landscape of diurnal expression in the heart. Cell Rep. 13, 2368–2375 (2015).
Anand, P. et al. BET bromodomains mediate transcriptional pause release in heart failure. Cell 154, 569–582 (2013).
Attanasio, C. et al. Tissue-specific SMARCA4 binding at active and repressed regulatory elements during embryogenesis. Genome Res. 24, 920–929 (2014).
Sakabe, N. J. et al. Dual transcriptional activator and repressor roles of TBX20 regulate adult cardiac structure and function. Hum. Mol. Genet. 21, 2194–2204 (2012).
Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
May, D. et al. Large-scale discovery of enhancers from human heart tissue. Nat. Genet. 44, 89–93 (2012).
Dickel, D. E. et al. Genome-wide compendium and functional assessment of in vivo heart enhancers. Nat. Commun. 7, 12923 (2016).
Nord, A. S. et al. Rapid and pervasive changes in genome-wide enhancer usage during mammalian development. Cell 155, 1521–1531 (2013).
Blow, M. J. et al. ChIP–Seq identification of weakly conserved heart enhancers. Nat. Genet. 42, 806–810 (2010).
Yue, F. et al. A comparative encyclopedia of DNA elements in the mouse genome. Nature 515, 355–364 (2014).
Shen, Y. et al. A map of the cis-regulatory sequences in the mouse genome. Nature 488, 116–120 (2012).
van den Boogaard, M. et al. Genetic variation in T-box binding element functionally affects SCN5A/SCN10A enhancer. J. Clin. Invest. 122, 2519–2530 (2012).
Zhou, J. & Troyanskaya, O. G. Predicting effects of noncoding variants with deep learning–based sequence model. Nat. Methods 12, 931–934 (2015).
Huang, Y.-F., Gulko, B. & Siepel, A. Fast, scalable prediction of deleterious noncoding variants from functional and population genomic data. Nat. Genet. 49, 618–624 (2017).
Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46, 310–315 (2014).
Rentzsch, P., Witten, D., Cooper, G. M., Shendure, J. & Kircher, M. CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res. 47, D886–D894 (2019).
Davydov, E. V. et al. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Comput. Biol. 6, e1001025 (2010).
Ritchie, G. R. S., Dunham, I., Zeggini, E. & Flicek, P. Functional annotation of noncoding sequence variants. Nat. Methods 11, 294–296 (2014).
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
Melnikov, A., Zhang, X., Rogov, P., Wang, L. & Mikkelsen, T. S. Massively parallel reporter assays in cultured mammalian cells. J. Vis. Exp. https://doi.org/10.3791/51719 (2014).
Werling, D. M. et al. An analytical framework for whole-genome sequence association studies and its implications for autism spectrum disorder. Nat. Genet. 50, 727–736 (2018).
Turner, T. N. et al. Genomic patterns of de novo mutation in simplex autism. Cell 171, 710–722.e12 (2017).
Yuen, R. K. C. et al. Whole genome sequencing resource identifies 18 new candidate genes for autism spectrum disorder. Nat. Neurosci. 20, 602–611 (2017).
Hamdan, F. F. et al. High rate of recurrent de novo mutations in developmental and epileptic encephalopathies. Am. J. Hum. Genet. 101, 664–685 (2017).
Peacock, J. D., Lu, Y., Koch, M., Kadler, K. E. & Lincoln, J. Temporal and spatial expression of collagens during murine atrioventricular heart valve development and maintenance. Dev. Dyn. 237, 3051–3058 (2008).
Kurosaka, S. et al. Arginylation regulates myofibrils to maintain heart function and prevent dilated cardiomyopathy. J. Mol. Cell. Cardiol. 53, 333–341 (2012).
Kleffmann, W. et al. 5q31 microdeletions: definition of a critical region and analysis of LRRTM2, a candidate gene for intellectual disability. Mol. Syndromol. 3, 68–75 (2012).
Mehta, G. et al. MITF interacts with the SWI/SNF subunit, BRG1, to promote GATA4 expression in cardiac hypertrophy. J. Mol. Cell. Cardiol. 88, 101–110 (2015).
Tshori, S. et al. Transcription factor MITF regulates cardiac growth and hypertrophy. J. Clin. Invest. 116, 2673–2681 (2006).
Nicholson, T. B. et al. A hypomorphic lsd1 allele results in heart development defects in mice. PLoS One 8, e60913 (2013).
Hamidi, T. et al. Identification of Rpl29 as a major substrate of the lysine methyltransferase Set7/9. J. Biol. Chem. 293, 12770–12780 (2018).
Siggs, O. M. et al. Mutation of Fnip1 is associated with B-cell deficiency, cardiomyopathy, and elevated AMPK activity. Proc. Natl Acad. Sci. USA 113, E3706–E3715 (2016).
Chen, C.-Y. et al. Accumulation of the inner nuclear envelope protein Sun1 is pathogenic in progeric and dystrophic laminopathies. Cell 149, 565–577 (2012).
Meinke, P. et al. Muscular dystrophy-associated SUN1 and SUN2 variants disrupt nuclear-cytoskeletal connections and myonuclear organization. PLoS Genet. 10, e1004605 (2014).
Röseler, S. et al. Lethal phenotype of mice carrying a Sept11 null mutation. Biol. Chem. 392, 779–781 (2011).
Guo, A. et al. E–C coupling structural protein junctophilin-2 encodes a stress-adaptive transcription regulator. Science 362, eaan3303 (2018).
Yamagishi, H. et al. A history and interaction of outflow progenitor cells implicated in “Takao Syndrome.” In Etiology and Morphogenesis of Congenital Heart Disease: From Gene Function and Cellular Interaction to Morphology (eds. Nakanishi, T. et al.) 201–209 (Springer, 2016).
Masuda, T. & Taniguchi, M. Congenital diseases and semaphorin signaling: overview to date of the evidence linking them. Congenit. Anom. (Kyoto). 55, 26–30 (2015).
Pierpont, M. E. et al. Genetic basis for congenital heart disease: revisited: a scientific statement from the American Heart Association. Circulation 138, e653–e711 (2018).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).
Van der Auwera, G. et al. From FastQ data to high‐confidence variant calls: the genome analysis toolkit best practices pipeline. Curr. Protoc. Bioinformatics 43, 11.10.1–11.10.33 (2013).
Kim, B.-Y., Park, J. H., Jo, H.-Y., Koo, S. K. & Park, M.-H. Optimized detection of insertions/deletions (INDELs) in whole-exome sequencing data. PLoS One 12, e0182272 (2017).
Bailey, J. A., Yavor, A. M., Massa, H. F., Trask, B. J. & Eichler, E. E. Segmental duplications: organization and impact within the current human genome project assembly. Genome Res. 11, 1005–1017 (2001).
Derrien, T. et al. Fast computation and applications of genome mappability. PLoS One 7, e30377 (2012).
Li, H. Toward better understanding of artifacts in variant calling from high-coverage samples. Bioinformatics 30, 2843–2851 (2014).
Ostrander, B. E. P. et al. Whole-genome analysis for effective clinical diagnosis and gene discovery in early infantile epileptic encephalopathy. NPJ Genom. Med. 3, 22 (2018).
Blake, J. A. et al. Mouse Genome Database (MGD)-2017: community knowledge resource for the laboratory mouse. Nucleic Acids Res. 45, D723–D729 (2017).
Chen, K. M., Cofer, E. M., Zhou, J. & Troyanskaya, O. G. Selene: a PyTorch-based deep learning library for sequence data. Nat. Methods 16, 315–318 (2019).
Price, A. L. et al. Pooled association tests for rare variants in exon-resequencing studies. Am. J. Hum. Genet. 86, 832–838 (2010).
Lian, X. et al. Directed cardiomyocyte differentiation from human pluripotent stem cells by modulating Wnt/β-catenin signaling under fully defined conditions. Nat. Protoc. 8, 162–175 (2013).
Buenrostro, J. D., Wu, B., Chang, H. Y. & Greenleaf, W. J. ATAC–seq: a method for assaying chromatin accessibility genome-wide. Curr. Protoc. Mol. Biol. 109, 21.29.1–21.29.9 (2015).
Corces, M. R. et al. An improved ATAC–seq protocol reduces background and enables interrogation of frozen tissues. Nat. Methods 14, 959–962 (2017).
Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
Yu, G., Wang, L.-G. & He, Q.-Y. ChIPseeker: an R/Bioconductor package for ChIP peak annotation, comparison and visualization. Bioinformatics 31, 2382–2383 (2015).
Spurrell, C. H. et al. Genome-wide fetalization of enhancer architecture in heart disease. Preprint at bioRxiv https://doi.org/10.1101/591362 (2019).
Sharma, A., Toepfer, C. N., Schmid, M., Garfinkel, A. C. & Seidman, C. E. Differentiation and contractile analysis of GFP-sarcomere reporter hiPSC-cardiomyocytes. Curr. Protoc. Hum. Genet. 96, 21.12.1–21.12.12 (2018).
Shah, A., Qian, Y., Weyn-Vanhentenryck, S. M. & Zhang, C. CLIP Tool Kit (CTK): a flexible and robust pipeline to analyze CLIP sequencing data. Bioinformatics 33, 566–567 (2017).
Feng, H. et al. Modeling RNA-binding protein specificity in vivo by precisely registering protein-RNA crosslink sites. Mol. Cell 74, 1189–1204.e6 (2019).
We are enormously grateful to the patients and families who participated in this research. We thank the following for patient recruitment: A. Julian, M. MacNeal, Y. Mendez, T. Mendiz-Ramdeen and C. Mintz (Icahn School of Medicine at Mount Sinai); N. Cross (Yale School of Medicine); J. Ellashek and N. Tran (Children’s Hospital of Los Angeles); B. McDonough, J. Geva and M. Borensztein (Harvard Medical School); K. Flack, L. Panesar and N. Taylor (University College London); E. Taillie (University of Rochester School of Medicine and Dentistry); S. Edman, J. Garbarini, J. Tusi and S. Woyciechowski (Children’s Hospital of Philadelphia); D. Awad, C. Breton, K. Celia, C. Duarte, D. Etwaru, N. Fishman, E. Griffin, M. Kaspakoval, J. Kline, R. Korsin, A. Lanz, E. Marquez, D. Queen, A. Rodriguez, J. Rose, J. K. Sond, D. Warburton, A. Wilpers and R. Yee (Columbia Medical School); D. Gruber (Cohen Children’s Medical Center, Northwell Health). These data were generated by the PCGC, under the auspices of the Bench to Bassinet Program (https://benchtobassinet.com) of the NHLBI. The results analyzed and published here are based in part on data generated by Gabriella Miller Kids First Pediatric Research Program projects phs001138.v1.p2/phs001194.v1.p2, and were accessed from the Kids First Data Resource Portal (https://kidsfirstdrc.org/) and/or dbGaP (www.ncbi.nlm.nih.gov/gap). This manuscript was prepared in collaboration with investigators of the PCGC and has been reviewed and/or approved by the PCGC. PCGC investigators are listed at https://benchtobassinet.com/?page_id=119. This work was supported in part through the computational resources and staff expertise provided by Scientific Computing at the Icahn School of Medicine at Mount Sinai. We are grateful to all of the families at the participating Simons Simplex Collection (SSC) sites, as well as the principal investigators (A. Beaudet, R. Bernier, J. Constantino, E. Cook, E. Fombonne, D. Geschwind, R. Goin-Kochel, E. Hanson, D. Grice, A. Klin, D. Ledbetter, C. Lord, C. Martin, D. Martin, R. Maxim, J. Miles, O. Ousley, K. Pelphrey, B. Peterson, J. Piggot, C. Saulnier, M. State, W. Stone, J. Sutcliffe, C. Walsh, Z. Warren and E. Wijsman). We appreciate the access obtained to phenotypic and/or genetic data on SFARI Base. Approved researchers can obtain the SSC population dataset described in this study (https://www.sfari.org/resource/simons-simplex-collection) by applying at https://base.sfari.org. This work was supported by the Mount Sinai Medical Scientist Training Program (5T32GM007280 to F.R.), National Institute of Dental and Craniofacial Research Interdisciplinary Training in Systems and Developmental Biology and Birth Defects (T32HD075735 to F.R.), Harvard Medical School Epigenetic and Gene Dynamics Award (S.U.M. and C.E.S.), American Heart Association Post-Doctoral Fellowship (S.U.M.), and Howard Hughes Medical Institute (C.E.S.). Research conducted at the E.O. Lawrence Berkeley National Laboratory was supported by National Institutes of Health (NIH) grants (UM1HL098166 and R24HL123879) and performed under Department of Energy Contract DE-AC02-05CH11231, University of California. O.T. is a CIFAR fellow and this work was partially supported by NIH grant R01GM071966. The PCGC program is funded by the NHLBI, NIH, US Department of Health and Human Services through grants UM1HL128711, UM1HL098162, UM1HL098147, UM1HL098123, UM1HL128761 and U01HL131003. The PCGC Kids First study includes data sequenced by the Broad Institute (U24 HD090743-01).
The authors declare no competing interests.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Overlaps with DNVs identified in 1,470 control trios with two other pipelines (refs. 9,10). Of note, a third analysis of these trios did not include de novo calls (ref. 42). For consistency with other pipelines, only SNVs were included and variants in LCRs, blacklists, segmental duplications, and repeats were excluded. Together, 94% of de novo SNVs were called by at least one other pipeline.
Multiple linear regression ($\beta_{\text{paternal\_age}}\,x + \beta_{\text{maternal\_age}}\,x + \beta_{\text{intercept}} + \varepsilon$) was fitted on 763 CHD and 1,611 unaffected individuals to estimate the associations of paternal and maternal age with the number of de novo SNVs, indels, and both combined. Regression coefficients and P values are shown, uncorrected for multiple hypotheses. Sequencing metric comparisons between the centers, colored by cases (n = 763) and controls (n = 1,611), found moderate bias in DNV quantity, so the background statistical parameter throughout the manuscript is the total number of DNVs. Box plots show medians and interquartile ranges.
The number of DNVs in 184 noncoding annotations (points) genome-wide and within 10 kb of TSSs for 6 gene sets (facets) was counted in CHD (n = 749) and Simons unaffected (n = 1,611) individuals. The P value threshold (1.5 × 10⁻⁴, horizontal blue line) is 0.05 divided by the product of the number of effective annotations (n = 47) and number of gene sets (n = 7). The P value (y-axis) was calculated with a two-sided Fisher’s exact test; the odds ratio (x-axis) was $\mathrm{DNVs}_{\text{annotation,CHD}}/\mathrm{DNVs}_{\text{total,CHD}}$ versus $\mathrm{DNVs}_{\text{annotation,unaffected}}/\mathrm{DNVs}_{\text{total,unaffected}}$. No annotations surpassed the P value threshold. CHD, congenital heart disease; HHE, high heart expression.
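The threshold arithmetic in this caption can be checked with a few lines of Python. This is a minimal sketch, not the authors' code (their burden-test scripts are in the repository listed under code availability); the function names are hypothetical, and the counts passed to `annotation_ratio` are made-up illustration values, not data from the study.

```python
def bonferroni_threshold(alpha, n_effective_annotations, n_gene_sets):
    """Per-test P value threshold: alpha divided by the number of tests."""
    return alpha / (n_effective_annotations * n_gene_sets)


def annotation_ratio(annot_chd, total_chd, annot_unaff, total_unaff):
    """Ratio of annotated-DNV fractions, following the caption's wording:
    (DNVs_annotation,CHD / DNVs_total,CHD) divided by
    (DNVs_annotation,unaffected / DNVs_total,unaffected)."""
    return (annot_chd / total_chd) / (annot_unaff / total_unaff)


# Values stated in the caption: alpha = 0.05, 47 effective annotations, 7 gene sets.
threshold = bonferroni_threshold(0.05, 47, 7)
print(f"{threshold:.1e}")  # 1.5e-04

# Hypothetical counts, for illustration only.
example_ratio = annotation_ratio(10, 100, 5, 100)
print(example_ratio)
```

The computed threshold, 0.05 / (47 × 7), rounds to the 1.5 × 10⁻⁴ quoted in the caption.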
HeartENN ROC AUC mean = 0.93 and AUPRC mean = 0.34. ROC AUC, receiver operator characteristics area under the curve; AUPRC, area under the precision recall curve.
a, Comparison of HGMD disease mutations (blue, n = 1,564) and polymorphism (gray, n = 642) DeepSEA absolute functional difference scores at varying functional cut-offs illustrates a similar distribution and functionally impactful range ≥0.1 (arrow) for disease mutations. No statistical significance testing was performed. b, The similarity of null distributions for DeepSEA (gray, downsampled to 184 features) and HeartENN (heart) HGMD polymorphism scores suggested that the DeepSEA functional score range was also applicable to HeartENN (gray and red n = 642). Scores of 0 set off to left (as 10⁻⁴).
For all DNVs (n = 170,171), overlap between HeartENN ≥0.1 (n = 6,415) and other noncoding scores was assessed with a two-sided Fisher’s exact test (left panel). Case–control burden for these other noncoding scores (right panel) was statistically significant for CADD ≥15 ($P_{\text{Bonferroni}}$ = 0.019) with a two-sided Fisher’s exact test (cases n = 56,164 and controls n = 114,065). For both panels, unadjusted P values are tabulated, and red indicates a Benjamini-Hochberg-adjusted P value false discovery rate (FDR) < 0.05.
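The Benjamini-Hochberg adjustment mentioned in this caption can be sketched as follows. This is a generic textbook implementation of the step-up procedure, assuming nothing about the authors' software; the function name and the input P values are hypothetical and chosen only to show the mechanics.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg-adjusted P values in the input order."""
    n = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical unadjusted P values, for illustration only.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adjusted]
```

A score would be flagged (shown in red in the figure) when its adjusted value falls below the 0.05 FDR cut-off.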
Extended Data Fig. 7 Relationship between sequence length inserted into the pMPRA1 plasmid and the transcript reads/plasmid copies in MPRAs.
The length of the sequences inserted into the pMPRA1 plasmid (x-axis) ranged from 300 to 1,600 bp. After transfection of four libraries (color coded as per key) into the iPSC–CMs, the resulting ratios of transcript reads (mRNA) per plasmid copies (DNA) are graphed on the y-axis, showing no systematic relationship between insert length and transcriptional level.
Box plots for two DNVs for which two MPRA replicates were significantly different but overall statistical significance across all replicates was not attained. Box plots show the median fold change (FC), first and third quartiles (lower and upper hinges), and range of values (whiskers and outlying points). Statistical significance was assessed with two-sided t-tests, and P values were Benjamini-Hochberg-adjusted. Each box plot has at least 3 independent experiments with 4 technical replicates each.
The fraction was calculated separately within CHD and unaffected subjects for each of the three methods (including overlaps) and the total number of variants in each group (right table).
a, Enrichment of DNVs with predicted functional impacts (score ≥0.1) for HeartENN (left) and DeepSEA (right) within phenotype subgroups. b, Enrichment of de novo SNVs with H3K36me3 marks implicated in RNA-binding protein disruption in different subgroups for the most significant (left) and highest effect size (right) hits. Both a and b were performed with a two-sided Fisher’s exact test (unadjusted P-values and 95% C.I.s shown) comparing the fraction of DNVs in each subgroup (HeartENN ≥ 0.1, DeepSEA ≥ 0.1, etc.) to the same control cohort. For HeartENN, there were n = 4,177 control DNVs with HeartENN ≥ 0.1 and n = 109,888 control DNVs with HeartENN < 0.1. NDD, neurodevelopmental disorder; ECA, extracardiac anomaly.
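The case–control comparison described in this caption is a 2 × 2 Fisher's exact test, which can be sketched in plain Python. This is an illustrative implementation, not the authors' burden-test script. The control counts (4,177 and 109,888) come from the caption; the case counts are an assumption assembled from other parts of the text: 2,238 case DNVs with HeartENN ≥ 0.1 (abstract) out of 56,164 case DNVs (noncoding-score comparison caption), leaving 53,926 below the cut-off. The resulting P value is not asserted to match any figure reported in the paper.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns (sample odds ratio, P value). Assumes b and c are nonzero.
    """
    row1, row2, col1 = a + b, c + d, a + c
    log_denom = log_comb(row1 + row2, col1)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    # Hypergeometric log-probability of every table with the same margins.
    log_probs = [
        log_comb(row1, k) + log_comb(row2, col1 - k) - log_denom
        for k in range(k_min, k_max + 1)
    ]
    log_observed = log_probs[a - k_min]
    # Sum tables that are no more probable than the observed one.
    p_value = sum(exp(lp) for lp in log_probs if lp <= log_observed + 1e-7)
    return (a * d) / (b * c), min(1.0, p_value)


# Rows: CHD, control. Columns: HeartENN >= 0.1, HeartENN < 0.1.
# Case row is an assumption derived from counts elsewhere in the text.
odds_ratio, p_value = fisher_exact_two_sided(2238, 56164 - 2238, 4177, 109888)
print(f"odds ratio = {odds_ratio:.3f}, two-sided P = {p_value:.2g}")
```

Working in log space with `lgamma` keeps the binomial coefficients from overflowing at these sample sizes; a library routine such as SciPy's `fisher_exact` would be the usual choice in practice.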
About this article
Cite this article
Richter, F., Morton, S.U., Kim, S.W. et al. Genomic analyses implicate noncoding de novo variants in congenital heart disease. Nat Genet 52, 769–777 (2020). https://doi.org/10.1038/s41588-020-0652-z