Allele-specific NKX2-5 binding underlies multiple genetic associations with human electrocardiographic traits

Abstract

The cardiac transcription factor (TF) gene NKX2-5 has been associated with electrocardiographic (EKG) traits through genome-wide association studies (GWASs), but the extent to which differential binding of NKX2-5 at common regulatory variants contributes to these traits has not yet been studied. We analyzed transcriptomic and epigenomic data from induced pluripotent stem cell-derived cardiomyocytes (iPSC-CMs) from seven related individuals and identified ~2,000 single-nucleotide variants with allele-specific effects (ASE-SNVs) on NKX2-5 binding. NKX2-5 ASE-SNVs were enriched for altered TF motifs, heart-specific expression quantitative trait loci and EKG GWAS signals. Using fine-mapping combined with iPSC-CM epigenomic data, we prioritized candidate causal variants for EKG traits, many of which were NKX2-5 ASE-SNVs. Experimental characterization of two NKX2-5 ASE-SNVs (rs3807989 and rs590041) showed that they modulate the expression of target genes via differential protein binding in cardiac cells, indicating that they are functional variants underlying EKG GWAS signals. Our results show that differential NKX2-5 binding at numerous regulatory variants across the genome contributes to EKG phenotypes.

Fig. 1: Generation and characterization of iPSCs and iPSC-CMs by gene expression and epigenetic profiling.
Fig. 2: Identification of coordinated ASEs in gene expression, H3K27 acetylation, chromatin accessibility and NKX2-5 binding in iPSCs and iPSC-CMs.
Fig. 3: TF binding motifs are altered by SNVs with ASEs in NKX2-5 ChIP-Seq.
Fig. 4: Enrichment of ChIP-Seq ASE variants for known QTLs.
Fig. 5: Enrichment of NKX2-5 SNVs at GWAS loci, and validation of rs590041 as a regulatory variant in the SSBP3 locus for P-wave duration.
Fig. 6: Prioritization of candidate causal variants at heart rate loci using fgwas.
Fig. 7: Functional characterization of rs3807989 as a candidate causal variant for PR interval and atrial fibrillation.

Data availability

All iPSC lines are available through the WiCell Research Institute (www.wicell.org; NHLBI Next Gen Collection). All genomic data are available through the database of Genotypes and Phenotypes (accessions phs000924 (RNA-Seq, ChIP-Seq, ATAC-Seq and Hi-C) and phs001325 (whole-genome-sequenced SNV and copy number variation genotypes)) and National Center for Biotechnology Information BioProject PRJNA285375. Processed data files are available through Gene Expression Omnibus accessions GSE125540 and GSE133833.

Code availability

Custom-written code is available via GitHub (https://github.com/frazer-lab/NKX2-5_ASE_iPSC-CM).

References

1. MacArthur, J. et al. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res. 45, D896–D901 (2017).
2. Gallagher, M. D. & Chen-Plotkin, A. S. The post-GWAS era: from association to function. Am. J. Hum. Genet. 102, 717–730 (2018).
3. Van den Boogaard, M. et al. A common genetic variant within SCN10A modulates cardiac SCN5A expression. J. Clin. Invest. 124, 1844–1852 (2014).
4. Wang, X. et al. Discovery and validation of sub-threshold genome-wide association study loci using epigenomic signatures. eLife 5, e10557 (2016).
5. Deplancke, B., Alpern, D. & Gardeux, V. The genetics of transcription factor DNA binding variation. Cell 166, 538–554 (2016).
6. Pai, A. A., Pritchard, J. K. & Gilad, Y. The genetic and mechanistic basis for variation in gene regulation. PLoS Genet. 11, e1004857 (2015).
7. Maurano, M. T. et al. Large-scale identification of sequence variants influencing human transcription factor occupancy in vivo. Nat. Genet. 47, 1393–1401 (2015).
8. He, A., Kong, S. W., Ma, Q. & Pu, W. T. Co-occupancy by multiple cardiac transcription factors identifies transcriptional enhancers active in heart. Proc. Natl Acad. Sci. USA 108, 5632–5637 (2011).
9. Schlesinger, J. et al. The cardiac transcription network modulated by Gata4, Mef2a, Nkx2.5, Srf, histone modifications, and microRNAs. PLoS Genet. 7, e1001313 (2011).
10. Luna-Zurita, L. et al. Complex interdependence regulates heterotypic transcription factor distribution and coordinates cardiogenesis. Cell 164, 999–1014 (2016).
11. Ang, Y. S. et al. Disease model of GATA4 mutation reveals transcription factor cooperativity in human cardiogenesis. Cell 167, 1734–1749.e22 (2016).
12. Kathiresan, S. & Srivastava, D. Genetics of human cardiovascular disease. Cell 148, 1242–1257 (2012).
13. Pfeufer, A. et al. Genome-wide association study of PR interval. Nat. Genet. 42, 153–159 (2010).
14. Verweij, N. et al. Genetic determinants of P wave duration and PR segment. Circ. Cardiovasc. Genet. 7, 475–481 (2014).
15. Den Hoed, M. et al. Identification of heart rate-associated loci and their effects on cardiac conduction and rhythm disorders. Nat. Genet. 45, 621–631 (2013).
16. Nielsen, J. B. et al. Genome-wide study of atrial fibrillation identifies seven risk loci and highlights biological pathways and regulatory elements involved in cardiac development. Am. J. Hum. Genet. 102, 103–115 (2018).
17. Van Setten, J. et al. PR interval genome-wide association meta-analysis identifies 50 loci associated with atrial and atrioventricular electrical activity. Nat. Commun. 9, 2904 (2018).
18. Panopoulos, A. D. et al. Aberrant DNA methylation in human iPSCs associates with MYC-binding motifs in a clone-specific manner independent of genetics. Cell Stem Cell 20, 505–517.e6 (2017).
19. Carcamo-Orive, I. et al. Analysis of transcriptional variability in a large human iPSC library reveals genetic and non-genetic determinants of heterogeneity. Cell Stem Cell 20, 518–532.e9 (2017).
20. Kilpinen, H. et al. Common genetic variation drives molecular heterogeneity in human iPSCs. Nature 546, 370–375 (2017).
21. DeBoever, C. et al. Large-scale profiling reveals the influence of genetic variation on gene expression in human induced pluripotent stem cells. Cell Stem Cell 20, 533–546.e7 (2017).
22. Banovich, N. E. et al. Impact of regulatory variation across human iPSCs and differentiated cells. Genome Res. 28, 122–131 (2018).
23. Pashos, E. E. et al. Large, diverse population cohorts of hiPSCs and derived hepatocyte-like cells reveal functional genetic variation at blood lipid-associated loci. Cell Stem Cell 20, 558–570.e10 (2017).
24. Schwartzentruber, J. et al. Molecular and functional variation in iPSC-derived sensory neurons. Nat. Genet. 50, 54–61 (2018).
25. He, J. Q., Ma, Y., Lee, Y., Thomson, J. A. & Kamp, T. J. Human embryonic stem cells develop into multiple types of cardiac myocytes: action potential characterization. Circ. Res. 93, 32–39 (2003).
26. D’Antonio-Chronowska, A. et al. Human iPSC gene signatures and X chromosome dosage impact response to WNT inhibition and cardiac differentiation fate. Stem Cell Rep. (in the press).
27. Burridge, P. W. et al. Chemically defined generation of human cardiomyocytes. Nat. Methods 11, 855–860 (2014).
28. Panopoulos, A. D. et al. iPSCORE: a resource of 222 iPSC lines enabling functional characterization of genetic variation across a variety of cell types. Stem Cell Rep. 8, 1086–1100 (2017).
29. Kilpinen, H. et al. Coordinated effects of sequence variation on DNA binding, chromatin structure, and transcription. Science 342, 744–747 (2013).
30. Dupays, L. et al. Sequential binding of MEIS1 and NKX2-5 on the Popdc2 gene: a mechanism for spatiotemporal regulation of enhancers during cardiogenesis. Cell Rep. 13, 183–195 (2015).
31. Prall, O. W. et al. An Nkx2-5/Bmp2/Smad1 negative feedback loop controls heart progenitor specification and proliferation. Cell 128, 947–959 (2007).
32. Degner, J. F. et al. DNase I sensitivity QTLs are a major determinant of human expression variation. Nature 482, 390–394 (2012).
33. Ward, L. D. & Kellis, M. HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucleic Acids Res. 40, D930–D934 (2012).
34. GTEx Consortium et al. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).
35. Roselli, C. et al. Multi-ethnic genome-wide association study for atrial fibrillation. Nat. Genet. 50, 1225–1233 (2018).
36. Nielsen, J. B. et al. Biobank-driven genomic discovery yields new insight into atrial fibrillation biology. Nat. Genet. 50, 1234–1239 (2018).
37. Christophersen, I. E. et al. Fifteen genetic loci associated with the electrocardiographic P wave. Circ. Cardiovasc. Genet. 10, e001667 (2017).
38. Pickrell, J. K. Joint analysis of functional genomic data and genome-wide association studies of 18 human traits. Am. J. Hum. Genet. 94, 559–573 (2014).
39. Eppinga, R. N. et al. Identification of genomic loci associated with resting heart rate and shared genetic predictors with all-cause mortality. Nat. Genet. 48, 1557–1563 (2016).
40. Butler, A. M. et al. Novel loci associated with PR interval in a genome-wide association study of 10 African American cohorts. Circ. Cardiovasc. Genet. 5, 639–646 (2012).
41. Sano, M. et al. Genome-wide association study of electrocardiographic parameters identifies a new association for PR interval and confirms previously reported associations. Hum. Mol. Genet. 23, 6668–6676 (2014).
42. Arking, D. E. et al. Genetic association study of QT interval highlights role for calcium signaling pathways in myocardial repolarization. Nat. Genet. 46, 826–836 (2014).
43. Holm, H. et al. Several common variants modulate heart rate, PR interval and QRS duration. Nat. Genet. 42, 117–122 (2010).
44. Ritchie, M. D. et al. Genome- and phenome-wide analyses of cardiac conduction identifies markers of arrhythmia risk. Circulation 127, 1377–1385 (2013).
45. Van den Boogaard, M. et al. Genetic variation in T-box binding element functionally affects SCN5A/SCN10A enhancer. J. Clin. Invest. 122, 2519–2530 (2012).
46. Greenwald, W. W. et al. Subtle changes in chromatin loop contact propensity are associated with differential gene regulation and expression. Nat. Commun. 10, 1054 (2019).
47. Christophersen, I. E. et al. Large-scale analyses of common and rare variants identify 12 new loci associated with atrial fibrillation. Nat. Genet. 49, 946–952 (2017).
48. Ramasamy, A. et al. Genetic variability in the regulation of gene expression in ten regions of the human brain. Nat. Neurosci. 17, 1418–1428 (2014).
49. Lappalainen, T. et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501, 506–511 (2013).
50. Samee, M. A. H., Bruneau, B. G. & Pollard, K. S. A de novo shape motif discovery algorithm reveals preferences of transcription factors for DNA shape beyond sequence motifs. Cell Syst. 8, 27–42.e6 (2019).
51. Afek, A., Schipper, J. L., Horton, J., Gordan, R. & Lukatsky, D. B. Protein–DNA binding in the absence of specific base-pair recognition. Proc. Natl Acad. Sci. USA 111, 17140–17145 (2014).
52. Slattery, M. et al. Absence of a simple code: how transcription factors read the genome. Trends Biochem. Sci. 39, 381–399 (2014).
53. Heinz, S. et al. Effect of natural genetic variation on enhancer selection and function. Nature 503, 487–492 (2013).
54. Johnson, A. D. et al. SNAP: a web-based tool for identification and annotation of proxy SNPs using HapMap. Bioinformatics 24, 2938–2939 (2008).
55. Hong, K. W. et al. Identification of three novel genetic variations associated with electrocardiographic traits (QRS duration and PR interval) in East Asians. Hum. Mol. Genet. 23, 6659–6667 (2014).
56. Van der Harst, P. et al. 52 genetic loci influencing myocardial mass. J. Am. Coll. Cardiol. 68, 1435–1448 (2016).
57. Evans, D. S. et al. Fine-mapping, novel loci identification, and SNP association transferability in a genome-wide association study of QRS duration in African Americans. Hum. Mol. Genet. 25, 4350–4368 (2016).
58. Ellinor, P. T. et al. Meta-analysis identifies six new susceptibility loci for atrial fibrillation. Nat. Genet. 44, 670–675 (2012).
59. Jeff, J. M. et al. Generalization of variants identified by genome-wide association studies for electrocardiographic traits in African Americans. Ann. Hum. Genet. 77, 321–332 (2013).
60. Bezzina, C. R. et al. Common variants at SCN5A-SCN10A and HEY2 are associated with Brugada syndrome, a rare disease with high risk of sudden cardiac death. Nat. Genet. 45, 1044–1049 (2013).
61. Ban, H. et al. Efficient generation of transgene-free human induced pluripotent stem cells (iPSCs) by temperature-sensitive Sendai virus vectors. Proc. Natl Acad. Sci. USA 108, 14234–14239 (2011).
62. Lian, X. et al. Directed cardiomyocyte differentiation from human pluripotent stem cells by modulating Wnt/β-catenin signaling under fully defined conditions. Nat. Protoc. 8, 162–175 (2013).
63. Tohyama, S. et al. Distinct metabolic flow enables large-scale purification of mouse and human pluripotent stem cell-derived cardiomyocytes. Cell Stem Cell 12, 127–137 (2013).
64. Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
65. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
66. Tarasov, A., Vilella, A. J., Cuppen, E., Nijman, I. J. & Prins, P. Sambamba: fast processing of NGS alignment formats. Bioinformatics 31, 2032–2034 (2015).
67. Tischler, G. & Leonard, S. biobambam: tools for read pair collation based algorithms on BAM files. Source Code Biol. Med. 9, 13 (2014).
68. DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).
69. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
70. Dobin, A. et al. STAR: ultrafast universal RNA-Seq aligner. Bioinformatics 29, 15–21 (2013).
71. Harrow, J. et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 22, 1760–1774 (2012).
72. Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011).
73. Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).
74. Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
75. Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
76. Machanick, P. & Bailey, T. L. MEME-ChIP: motif analysis of large DNA datasets. Bioinformatics 27, 1696–1697 (2011).
77. Buenrostro, J. D., Giresi, P. G., Zaba, L. C., Chang, H. Y. & Greenleaf, W. J. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat. Methods 10, 1213–1218 (2013).
78. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-Seq data with DESeq2. Genome Biol. 15, 550 (2014).
79. Van de Geijn, B., McVicker, G., Gilad, Y. & Pritchard, J. K. WASP: allele-specific software for robust molecular quantitative trait locus discovery. Nat. Methods 12, 1061–1063 (2015).
80. Van der Auwera, G. A. et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr. Protoc. Bioinformatics 43, 11.10.1–11.10.33 (2013).
81. Mayba, O. et al. MBASED: allele-specific expression detection in cancer tissues and cell lines. Genome Biol. 15, 405 (2014).
82. GTEx Consortium. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015).
83. Whitlock, M. C. Combining probability from independent tests: the weighted Z-method is superior to Fisher’s approach. J. Evol. Biol. 18, 1368–1373 (2005).
84. Schmidt, E. M. et al. GREGOR: evaluating global enrichment of trait-associated variants in epigenomic features using a systematic, data-driven approach. Bioinformatics 31, 2601–2606 (2015).
85. Stegle, O., Parts, L., Durbin, R. & Winn, J. A Bayesian framework to account for complex non-genetic factors in gene expression levels greatly increases power in eQTL studies. PLoS Comput. Biol. 6, e1000770 (2010).

Acknowledgements

This work was supported in part by California Institute for Regenerative Medicine (CIRM) grant GC1R-06673-B, NIH grants HG008118 and HL107442, and National Science Foundation grant 1728497. P.B. was supported by the Swiss National Science Foundation Postdoc Mobility fellowships P2LAP3-155105 and P300PA-167612. W.W.Y.G. was supported by the NHLBI under award number HL142151. C.D. was supported in part by the UCSD Genetics Training Program through an institutional training grant from the NIGMS under award number GM008666 and the CIRM Interdisciplinary Stem Cell Training Program at UCSD II (TG2-01154). Library preparation and sequencing services were conducted by K. Jepsen and M. Khosroheidari at the UCSD IGM Genomics Center, supported by NIH grant CA023100. N.S. was supported by NIH grants HL116747 and HL141989. K.J.G. was supported by NIH grant DK114650 and ADA grant 1-17-JDF-027. W.M., F.Y. and M.G.R. were supported by NIH grants DK018477 and DK039949. M.G.R. is an HHMI investigator. We are thankful to C.-A. Yen and N. Spann for assistance with the ChIP-Seq experiments, and to A. Schmitt for the Hi-C data. We thank A. Aguirre for performing immunofluorescence. We thank E. Farley and K. Olson for help with reporter assays. We thank many colleagues for helpful comments.

Author information

Contributions

P.B. designed the study, generated the ChIP-Seq and RNA-Seq data, and performed the statistical analyses. A.D’A.-C. generated the iPSC-CMs, ChIP-Seq, ATAC-Seq and RNA-Seq data, and performed the EMSA. W.M. generated the constructs for luciferase assay and CRISPRi, and performed the luciferase assays. F.Y. performed the CRISPRi experiments. W.W.Y.G. implemented the fgwas analysis pipeline. C.D. implemented the RNA-Seq, ATAC-Seq and ASE analysis pipelines. H.L. processed the WGS and ChIP-Seq data. F.D. and S.S. generated iPSC-CMs and contributed to data generation. M.K.R.D. and H.M. performed data processing and computational analyses. N.S. and J.v.S. provided summary statistics for the PR interval GWAS. K.J.G. supervised the EMSA experiments. M.D’A. and E.N.S. performed statistical analyses. M.G.R. supervised the experimental validation of the variants. K.A.F. conceived and oversaw the study. P.B., E.N.S. and K.A.F. prepared the manuscript.

Corresponding authors

Correspondence to Michael G. Rosenfeld or Kelly A. Frazer.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Characterization of iPSC-CMs by flow cytometry, immunofluorescence and gene expression.

(a) Percentage of cells positive for the cardiomyocyte-specific marker TNNT2, analyzed by flow cytometry in iPSC-CMs generated in this study and collected either at day 15 (n = 15 independent samples) or at day 25 after purification with L-lactate selection (n = 12 independent samples). The day 15 sample that was lactate purified is indicated. Plot lines indicate median, lower and upper quartiles. (b) Confocal microscopy of immunofluorescence for TNNT2 and MYL7 in one replicate of iPSCORE_2_1 day 15 iPSC-CMs at two different magnifications. Similar staining was observed for two other day 15 iPSC-CM samples (subjects iPSCORE_2_3 and iPSCORE_2_9, not shown). (c) Hierarchical clustering and heatmap of RNA-seq expression data from 61 selected cell-type-specific genes. In addition to the 56 samples from this study, Roadmap RNA-seq data from stem cell lines (H1, HUES64, iPS-20b and iPS-18) and human tissues (right ventricle, left ventricle, right atrium and fetal heart) are included as reference samples. Gene expression values are reported as vst-normalized read counts.

Supplementary Figure 2 Epigenetic profile differences between iPSCs and iPSC-CMs.

(a) Tag heatmap of sequence coverage at the TSSs of differentially expressed genes (n = 5,307 DEGs, minimum log2 fold change = 2, FDR < 0.05, DESeq2), listed in decreasing order of log2 ratio of expression in iPSCs versus iPSC-CMs. The density plot at the top shows the average normalized coverage (range 0-1) across iPSC-upregulated genes (2,444 genes, green) and iPSC-CM-upregulated genes (2,863 genes, purple). For a 2-kb window centered at the TSS, coverage values for 25-bp bins were obtained by combining all samples of each data type and normalized to a total of 10⁷ reads. (b-f) Heatmap and hierarchical clustering of similarity based on overlap between enhancer annotations (from 25-state ChromHMM) for 127 samples from Roadmap Epigenomics (Ernst, J. and M. Kellis, Nat. Methods 9, 215-216, 2012; Kundaje, A. et al. Nature 518, 317-330, 2015) and ATAC-seq or ChIP-Seq peaks from combined iPSC (b, c) or iPSC-CM (d-f) samples. Similarity was calculated using the Jaccard statistic (intersection/union of base pairs in each comparison), mean-centered across the 127 tissues and ordered by highest average enhancer similarity. A zoomed-in view of the top 10 Roadmap tissues with the highest average enhancer similarity is shown.
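The base-pair Jaccard statistic and the mean-centering described for panels b-f can be sketched in a few lines. This is a minimal illustration, assuming peaks and enhancer annotations are BED-like (chrom, start, end) tuples with 0-based, half-open coordinates, and assuming that mean-centering subtracts, for one query peak set, the mean Jaccard value across all Roadmap reference sets. All function names are hypothetical; in practice a tool such as `bedtools jaccard` would usually compute the raw statistic.

```python
from typing import Dict, List, Tuple

# (chrom, start, end), 0-based half-open coordinates (assumed BED-like convention)
Interval = Tuple[str, int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge overlapping or touching ones so each bp is counted once."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def total_bp(intervals: List[Interval]) -> int:
    return sum(end - start for _, start, end in intervals)


def intersect_bp(a: List[Interval], b: List[Interval]) -> int:
    """Base pairs shared by two merged, sorted interval lists (two-pointer sweep)."""
    i = j = shared = 0
    while i < len(a) and j < len(b):
        ca, sa, ea = a[i]
        cb, sb, eb = b[j]
        if ca < cb or (ca == cb and ea <= sb):
            i += 1  # a[i] lies entirely before b[j]
        elif cb < ca or (cb == ca and eb <= sa):
            j += 1  # b[j] lies entirely before a[i]
        else:
            shared += min(ea, eb) - max(sa, sb)
            if ea < eb:
                i += 1
            else:
                j += 1
    return shared


def jaccard(a: List[Interval], b: List[Interval]) -> float:
    """Jaccard statistic: shared bp divided by bp covered by either set."""
    a, b = merge(a), merge(b)
    inter = intersect_bp(a, b)
    union = total_bp(a) + total_bp(b) - inter
    return inter / union if union else 0.0


def centered_jaccard(query: List[Interval],
                     references: Dict[str, List[Interval]]) -> Dict[str, float]:
    """Jaccard of one query set against each reference, minus the mean over references."""
    raw = {name: jaccard(query, ivs) for name, ivs in references.items()}
    mean = sum(raw.values()) / len(raw)
    return {name: value - mean for name, value in raw.items()}
```

For example, calling `centered_jaccard` with a (hypothetical) list of iPSC-CM ATAC-seq peaks and a dictionary mapping each of the 127 Roadmap epigenomes to its enhancer intervals would yield one mean-centered row of such a heatmap.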

Supplementary Figure 3 Analysis of variation of molecular phenotypes across samples.

(a, c, e, g, i) Plots of the first two principal components (PCs) calculated on all genes/peaks identified in each RNA-seq dataset (a, iPSCs: 29 independent samples; e, iPSC-CMs: 26 independent samples and 1 technical replicate) or ChIP-seq dataset (c, H3K27ac in iPSCs: 17 independent samples and 4 technical replicates; g, H3K27ac in iPSC-CMs: 25 independent samples and 2 technical replicates; i, NKX2-5: 12 independent samples and 3 technical replicates). Samples are color-coded by subject, and symbols distinguish different batches for iPSCs (cultured, collected and sequenced at different times) or different differentiation protocols (day 15 vs. day 25) for iPSC-CMs. Samples from the same individual are considered independent if the iPSC lines were cultured (or the iPSC-CMs differentiated) at different times, and are considered technical replicates (indicated by A and B) if the assays were performed on cell material collected from the same sample. (b, d, f, h, j) Tables of adjusted r-squared values from ANOVA tests, used as a measure of association between PC1 to PC10 and different covariates in each dataset. Tables are color-coded according to –log10 of ANOVA P-values. (k-o) Plots of the average Spearman correlation coefficients between pairs of samples across the 1,000 most variable genes or peaks for the indicated molecular phenotypes. Each dot corresponds to the per-sample pairwise correlation coefficient averaged across samples from either the same subject or different subjects. Technical replicates were excluded for the comparisons between samples of the same subject. Plot lines indicate median, lower and upper quartiles. P-values from one-tailed Mann-Whitney tests are shown.
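One way to obtain an adjusted r-squared like those in panels b, d, f, h and j is to fit a one-way ANOVA of a PC's sample scores on a single categorical covariate (for example subject or batch). The sketch below assumes one PC and one categorical covariate at a time and uses only the standard library; the function name is hypothetical, and the ANOVA P-values used for color-coding are not computed here.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def anova_adjusted_r2(pc_scores: Sequence[float], labels: Sequence[str]) -> float:
    """Adjusted R^2 of a one-way ANOVA (PC score ~ categorical covariate)."""
    n = len(pc_scores)
    if n != len(labels):
        raise ValueError("pc_scores and labels must have the same length")
    groups: Dict[str, List[float]] = defaultdict(list)
    for score, label in zip(pc_scores, labels):
        groups[label].append(score)
    k = len(groups)  # covariate levels = model parameters including the intercept
    if n <= k:
        raise ValueError("need more samples than covariate levels")
    grand_mean = sum(pc_scores) / n
    ss_total = sum((x - grand_mean) ** 2 for x in pc_scores)
    if ss_total == 0:
        raise ValueError("PC scores have zero variance")
    # Variance explained by group means (between-group sum of squares)
    ss_between = sum(
        len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in groups.values()
    )
    r2 = ss_between / ss_total
    # Penalize for the k - 1 covariate degrees of freedom
    return 1 - (1 - r2) * (n - 1) / (n - k)
```

The matching F statistic is (SS_between/(k − 1))/(SS_within/(n − k)); if SciPy is available, `scipy.stats.f.sf(F, k - 1, n - k)` returns its upper-tail P-value. Adjusted r-squared can be negative when a covariate explains less variance than expected by chance.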

Supplementary Figure 4 Comparison of number and effects of ASE-SNVs identified in ChIP-seq and ATAC-seq.

(a) Number of uniquely mapped reads for each data type and subject (n = 7). Each open circle corresponds to merged reads from different samples of the same subject. (b) Average FrIP (fraction of reads in peaks) across different samples of the same subject in each data type. (c) Median number of reads at all heterozygous SNVs tested for ASE in each individual. (d,e) Scatterplots showing the increase in median SNV coverage as a function of the number of mapped reads (d) and of FrIP (e) in each subject (n = 7). Continuous lines indicate that the relation is significant (linear regression, P < 0.05). (f) Scatterplot showing the increase in the fraction of identified ASE-SNVs in each subject (FDR < 0.05) as a function of the median SNV coverage. Continuous lines indicate that the relation is significant (linear regression, P < 0.05). (g) Distribution of the mean SNV coverage between subjects, across all SNVs analyzed in each data type (number of SNVs in each distribution from left to right: 30,463, 26,201, 116,898, 123,151 and 19,371). Median values are shown as white dots within the violin plots and are indicated. (h) Distribution of the number of ASE-SNVs (FDR < 10%) in 100 samplings of 100 SNVs with the same coverage in the different data types. (i) Distribution of the mean ASE effect sizes (allele frequencies) of the 100 SNV samples shown in h. Median values are shown. Boxplot elements: median (thick line), lower and upper quartiles (box), maximum and minimum (whiskers). (j-l) Scatterplots showing correlation of ASE effects of the same SNV between peaks from different data types in iPSC-CMs: (j) NKX2-5 vs. ATAC-seq peaks; (k) NKX2-5 vs. H3K27ac peaks; (l) ATAC-seq vs. H3K27ac peaks. The union of significant ASE-SNVs in each pair of datasets is shown, with ASE effects expressed as the proportion of the reference allele at heterozygous SNVs. Gray dots denote ASE-SNVs significant (binomial test, FDR < 0.05) only in the peaks indicated on the x-axis, blue dots only in those on the y-axis, and green dots in both. Pearson correlation coefficient (r), number of SNVs (n) and P-values are indicated.
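The ASE calls in panels j-l (proportion of reference-allele reads, binomial test, FDR < 0.05) can be illustrated with a minimal sketch. It assumes a null reference-allele fraction of 0.5 at every heterozygous SNV, an exact two-sided binomial test, and Benjamini-Hochberg control of the FDR; it does not model reference-mapping bias or combine replicates, and the authors' pipeline (whose reference list includes WASP and MBASED) may differ. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def binom_test_half(k: int, n: int) -> float:
    """Exact two-sided binomial P-value for k reference reads out of n, null p = 0.5."""
    if n == 0:
        return 1.0
    observed = math.comb(n, k)
    # Under p = 0.5 each outcome i has probability comb(n, i) / 2**n, so outcomes
    # "at least as extreme" are those with comb(n, i) <= comb(n, k).
    tail = sum(c for c in (math.comb(n, i) for i in range(n + 1)) if c <= observed)
    return min(1.0, tail / 2 ** n)


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values (q-values), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    qvals = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        qvals[i] = running_min
    return qvals


def call_ase(counts: Sequence[Tuple[int, int]], fdr: float = 0.05):
    """counts: (ref_reads, alt_reads) per heterozygous SNV.

    Returns (reference-allele fraction, P, q, significant) for each SNV.
    """
    pvals = [binom_test_half(ref, ref + alt) for ref, alt in counts]
    qvals = benjamini_hochberg(pvals)
    return [
        (ref / (ref + alt) if ref + alt else float("nan"), p, q, q < fdr)
        for (ref, alt), p, q in zip(counts, pvals, qvals)
    ]
```

For example, `call_ase([(10, 0), (5, 5)])` flags only the first SNV: 10 of 10 reads on the reference allele gives P ≈ 0.002, whereas a 5/5 split gives P = 1.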

Supplementary Figure 5 Correlation between ASE and motif disruption in NKX2-5 ChIP-seq.

Scatterplots showing the relationship between the proportion of reads for the reference allele at ASE-SNVs in the NKX2-5 ChIP-seq data and the difference in motif strength between the reference and alternate alleles. Spearman correlation statistics (r and P-value) and the number of motif-altering ASE-SNVs (n) are indicated. The 12 most enriched motif families in NKX2-5 peaks (Supplementary Table 4) were tested. TFBS motifs that were strengthened (red) or weakened (blue) by the preferred allele of ASE-SNVs are indicated.
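The two quantities plotted here, a per-SNV motif-strength difference and its Spearman correlation with the reference-read fraction, could be computed along these lines. This sketch assumes a position weight matrix given as per-position base probabilities, scores every window on both strands that overlaps the SNV with log2-odds against a uniform background, and keeps the best window for each allele. The scoring actually used for this figure may differ, and all names (including the PWM input format) are hypothetical.

```python
import math
from typing import Dict, List, Sequence

PWM = List[Dict[str, float]]  # per-position base probabilities (hypothetical format)
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def log_odds(pwm: PWM, kmer: str, background: float = 0.25, pseudo: float = 1e-3) -> float:
    """Log2-odds score of a k-mer against a uniform background (pseudocount avoids log 0)."""
    return sum(math.log2((pos[base] + pseudo) / background) for pos, base in zip(pwm, kmer))


def best_score(pwm: PWM, seq: str, snv_index: int) -> float:
    """Best PWM score over all windows, on either strand, that contain the SNV."""
    width = len(pwm)
    rc = "".join(COMPLEMENT[b] for b in reversed(seq))
    best = -math.inf
    for start in range(max(0, snv_index - width + 1), min(snv_index, len(seq) - width) + 1):
        best = max(best, log_odds(pwm, seq[start:start + width]))
        rc_start = len(seq) - start - width  # the same window read on the reverse strand
        best = max(best, log_odds(pwm, rc[rc_start:rc_start + width]))
    return best


def motif_delta(pwm: PWM, ref_seq: str, snv_index: int, alt_base: str) -> float:
    """Motif strength of the reference allele minus that of the alternate allele."""
    alt_seq = ref_seq[:snv_index] + alt_base + ref_seq[snv_index + 1:]
    return best_score(pwm, ref_seq, snv_index) - best_score(pwm, alt_seq, snv_index)


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var = sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var)
```

A positive `motif_delta` means the reference allele scores higher; correlating these values with the reference-read fractions of the same ASE-SNVs via `spearman` corresponds to the two axes of each scatterplot. Sequences containing bases other than A, C, G and T would need to be filtered first.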

Supplementary Figure 6 Analysis of enrichment of ChIP-seq and ATAC-seq peaks for GWAS SNPs.

(a) Heatmap of enrichment for GWAS SNPs in ChIP-seq and ATAC-seq peaks from combined iPSC and iPSC-CM samples from this study, as well as in peaks from cardiac tissues from Roadmap (DHS of fetal heart, H3K27ac of right ventricle and right atrium). The heatmap is ordered by the most enriched GWAS traits on average in the cardiac datasets (iPSC-CMs and Roadmap tissues) and shows fold change values for significant enrichment at FDR-corrected P-value < 0.05. A total of 125 GWAS traits were tested for enrichment, and the corresponding number of independent SNPs is given in parentheses. The statistical enrichment test was performed using GREGOR software. (b-f) Volcano plots showing -log10 P-values (y-axis) and fold enrichment (x-axis) for the GWAS loci shown in a, indicating the position of the 6 electrocardiographic traits. Red symbols indicate significant enrichment at FDR-corrected P-value < 0.05. The iPSC-CM NKX2-5, H3K27ac and ATAC-seq enrichment plots are shown in Fig. 5a–c.

Supplementary Figure 7 Enrichment of GWAS signals within iPSC-CM functional annotations using fgwas single state models.

Fgwas natural log fold enrichment of iPSC-CM genomic annotations (y-axis) in heart rate (den Hoed, M. et al. Nat. Genet. 45, 621-631, 2013), atrial fibrillation (Christophersen, I. E. et al. Nat. Genet. 49, 946-952, 2017), and PR interval (van Setten, J. et al. Nat. Commun. 9, 2904, 2018) GWAS signals. Solid circles indicate significant enrichment (defined as 95% CI above zero) and the bars indicate the 95% confidence intervals. The genomic annotations include NKX2-5 ASE-SNVs, NKX2-5 peaks, ATAC-seq peaks, H3K27ac peaks and H3K27ac ASE-SNVs. Peaks were called by combining all samples from each data type. The number of SNPs analyzed for each GWAS was: 2,516,407 SNPs for heart rate, 11,779,664 SNPs for atrial fibrillation, and 2,712,310 SNPs for PR interval.

Supplementary Figure 8 Functional characterization of candidate causal variants at four loci associated with heart rate.

(a, d, f, h) For each of the four loci, the top panel shows the regional plot of association P-values with heart rate (den Hoed, M. et al. Nat. Genet. 45, 621-631, 2013); SNPs are color-coded based on r² values from the 1000 Genomes Project CEU population (Johnson, A. D. et al. Bioinformatics, 24, 2938-2939, 2008); lead GWAS variants in the locus are indicated by a diamond. The second panel shows the posterior probability of causality (PPA) of the variants in the locus calculated using fgwas, and panels three through five show epigenetic tracks from combined iPSC-CM samples (NKX2-5, ATAC-seq and H3K27ac). The bottom panel shows the Roadmap fetal heart ChromHMM states (conventional ChromHMM color code) and genes from the UCSC genome browser. For d and h, the bottom panel shows the locus at a lower scale. For d, f and h, the locations of Hi-C loops from iPSC-CMs are shown in red. For the candidate causal variants (turquoise lines), the allelic imbalance (pie chart) of NKX2-5 ASE and FDR-corrected P-values are shown; for a, the altered TF motif is shown. (c, e, g, i) Significant associations (P < 0.05, linear regression) between putative variant genotypes and normalized gene expression of candidate genes in iPSC-CMs from 128 different individuals from iPSCORE. Boxplot elements: median (thick line), lower and upper quartiles (box), maximum and minimum (whiskers). (b) EMSA with iPSC-CM nuclear extract using probes containing both allelic variants of rs7612445. An independent replicate is shown in Supplementary Figure 9.
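To give a rough sense of how a functional annotation can shift per-variant posteriors within a locus, the sketch below combines Wakefield approximate Bayes factors with prior weights that are multiplied by exp(log_enrichment) for annotated SNPs (for example, NKX2-5 ASE-SNVs), assuming exactly one causal variant per locus. It is only in the spirit of fgwas, not its implementation: the prior effect-size variance (0.04) and all function names are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def log_abf(z: float, se: float, prior_var: float = 0.04) -> float:
    """Natural-log Wakefield approximate Bayes factor (association vs. null)."""
    v = se ** 2
    r = prior_var / (v + prior_var)
    return 0.5 * math.log(1 - r) + r * z * z / 2


def locus_ppa(
    zs: Sequence[float],
    ses: Sequence[float],
    annotated: Sequence[bool],
    log_enrichment: float,
    prior_var: float = 0.04,
) -> List[float]:
    """Posterior probability that each SNP is the single causal variant in a locus.

    Annotated SNPs get prior weight exp(log_enrichment); unannotated SNPs get weight 1.
    """
    log_terms = [
        log_abf(z, se, prior_var) + (log_enrichment if a else 0.0)
        for z, se, a in zip(zs, ses, annotated)
    ]
    top = max(log_terms)  # subtract the maximum before exponentiating (log-sum-exp)
    weights = [math.exp(t - top) for t in log_terms]
    total = sum(weights)
    return [w / total for w in weights]
```

In this toy setting, two SNPs with identical association statistics split the posterior 0.8/0.2 when only one of them carries an annotation with a natural-log enrichment of log(4), which is how an annotation enriched in GWAS signals (as in Supplementary Figure 7) can help prioritize one variant among several in strong linkage disequilibrium.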

Supplementary Figure 9 Electrophoretic mobility shift assay (EMSA) for NKX2-5 ASE-SNVs.

(a-c) Second independent replicate of EMSA with iPSC-CM nuclear extract using probes containing both allelic variants of rs590041 (a), rs3807989 (b) and rs7612445 (c). (d) Original (uncropped) blots of all presented EMSAs. The main and supplementary figures showing the corresponding cropped versions are indicated.

Supplementary Figure 10 Experimental validation using luciferase assays and CRISPRi in iPSC-CMs.

(a) Representative fluorescence microscopy image showing that in the luciferase assays we achieved approximately 70% transfection efficiency of a GFP-over-expressing plasmid in iPSC-CMs. Efficiency was measured once. (b) Test of gRNA efficiency for the CRISPR system in HEK293T cells. HEK293T cells were transfected with two vectors: one containing two gRNAs targeting the indicated SNP (2sgRNA-ccdB-EF1a-Puromycin) and the other expressing Cas9 (Lenti-Cas9-Blast, Addgene #52962). The gRNAs for SNPs rs590041 (SSBP3 intron) and rs3807989 (CAV1 intron) showed additional bands corresponding to targeted deletions (arrows) and were used for CRISPRi experiments in iPSC-CMs. The gRNAs for SNPs rs7612445 (GNB4) and rs8044595 (MYH11) did not show additional bands and were not used for CRISPRi. Efficiency was tested once. (c,d) qPCR expression of SSBP3 (c) or CAV1 and CAV2 (d) in iPSC-CMs (id: iPSCORE_1_57) stably expressing dCas9-KRAB (CRISPRi) and either a control guide RNA (gCTL) or two guide RNAs targeting the region encompassing rs590041 (c) or rs3807989 (d). Bars and error bars represent the mean and the standard deviation from three qPCR measurements, respectively; two-tailed t-test P-values are shown. Similar results were obtained in an independent cell line presented in the main manuscript (Fig. 5i and Fig. 7h).

Supplementary Figure 11 UCSC genome browser screenshots showing quality of ChIP-seq and ATAC-seq data.

Bedgraph tracks are shown for one representative sample per individual: H3K27ac ChIP-seq in (a) iPSCs and (b) iPSC-CMs from 7 individuals; (c) ATAC-seq in iPSCs and iPSC-CMs from 3 individuals; and (d) NKX2-5 ChIP-seq in iPSC-CMs from 7 individuals.

Supplementary information

Supplementary Information

Supplementary Figs. 1–11, Tables 1, 3 and 7, and Note

Reporting Summary

Supplementary Table 2

Metadata and per-sample sequence data metrics

Supplementary Table 4

Motif enrichment analysis of ATAC-Seq and ChIP-Seq peaks

Supplementary Table 5

Annotation of SNVs showing ASEs in ChIP-Seq and ATAC-Seq datasets

Supplementary Table 6

Results from fgwas fine-mapping analysis of heart rate, atrial fibrillation and PR interval GWAS studies using iPSC-CM functional genomics data

About this article

Cite this article

Benaglio, P., D’Antonio-Chronowska, A., Ma, W. et al. Allele-specific NKX2-5 binding underlies multiple genetic associations with human electrocardiographic traits. Nat Genet 51, 1506–1517 (2019). https://doi.org/10.1038/s41588-019-0499-3
