Allele-specific NKX2-5 binding underlies multiple genetic associations with human electrocardiographic traits


The cardiac transcription factor (TF) gene NKX2-5 has been associated with electrocardiographic (EKG) traits through genome-wide association studies (GWASs), but the extent to which differential binding of NKX2-5 at common regulatory variants contributes to these traits has not yet been studied. We analyzed transcriptomic and epigenomic data from induced pluripotent stem cell-derived cardiomyocytes from seven related individuals, and identified ~2,000 single-nucleotide variants associated with allele-specific effects (ASE-SNVs) on NKX2-5 binding. NKX2-5 ASE-SNVs were enriched for altered TF motifs, for heart-specific expression quantitative trait loci and for EKG GWAS signals. Using fine-mapping combined with epigenomic data from induced pluripotent stem cell–derived cardiomyocytes, we prioritized candidate causal variants for EKG traits, many of which were NKX2-5 ASE-SNVs. Experimentally characterizing two NKX2-5 ASE-SNVs (rs3807989 and rs590041) showed that they modulate the expression of target genes via differential protein binding in cardiac cells, indicating that they are functional variants underlying EKG GWAS signals. Our results show that differential NKX2-5 binding at numerous regulatory variants across the genome contributes to EKG phenotypes.

Fig. 1: Generation and characterization of iPSCs and iPSC-CMs by gene expression and epigenetic profiling.
Fig. 2: Identification of coordinated ASEs in gene expression, H3K27 acetylation, chromatin accessibility and NKX2-5 binding in iPSCs and iPSC-CMs.
Fig. 3: TF binding motifs are altered by SNVs with ASEs in NKX2-5 ChIP-Seq.
Fig. 4: Enrichment of ChIP-Seq ASE variants for known QTLs.
Fig. 5: Enrichment of NKX2-5 SNVs at GWAS loci, and validation of rs590041 as a regulatory variant in the SSBP3 locus for P-wave duration.
Fig. 6: Prioritization of candidate causal variants at heart rate loci using fgwas.
Fig. 7: Functional characterization of rs3807989 as a candidate causal variant for PR interval and atrial fibrillation.

Data availability

All iPSC lines are available through the WiCell Research Institute (NHLBI Next Gen Collection). All genomic data are available through the database of Genotypes and Phenotypes (accessions phs000924 (RNA-Seq, ChIP-Seq, ATAC-Seq and Hi-C) and phs001325 (whole-genome-sequenced SNV and copy number variation genotypes)) and National Center for Biotechnology Information BioProject PRJNA285375. Processed data files are available through Gene Expression Omnibus accessions GSE125540 and GSE133833.

Code availability

Custom-written code is available via GitHub (




This work was supported in part by California Institute for Regenerative Medicine (CIRM) grant GC1R-06673-B, NIH grants HG008118 and HL107442, and National Science Foundation grant 1728497. P.B. was supported by the Swiss National Science Foundation Postdoc Mobility fellowships P2LAP3-155105 and P300PA-167612. W.W.Y.G. was supported by the NHLBI under award number HL142151. C.D. was supported in part by the UCSD Genetics Training Program through an institutional training grant from the NIGMS under award number GM008666 and the CIRM Interdisciplinary Stem Cell Training Program at UCSD II (TG2-01154). Library preparation and sequencing services were conducted by K. Jepsen and M. Khosroheidari at the UCSD IGM Genomics Center, supported by NIH grant CA023100. N.S. was supported by NIH grants HL116747 and HL141989. K.J.G. was supported by NIH grant DK114650 and ADA grant 1-17-JDF-027. W.M., F.Y. and M.G.R. were supported by NIH grants DK018477 and DK039949. M.G.R. is a HHMI investigator. We are thankful to C.-A. Yen and N. Spann for assistance with the ChIP-Seq experiments, and to A. Schmitt for the Hi-C data. We thank A. Aguirre for performing immunofluorescence. We thank E. Farley and K. Olson for help with reporter assays. We thank many colleagues for helpful comments.

Author information




P.B. designed the study, generated the ChIP-Seq and RNA-Seq data, and performed the statistical analyses. A.D’A.-C. generated the iPSC-CMs, ChIP-Seq, ATAC-Seq and RNA-Seq data, and performed the EMSA. W.M. generated the constructs for luciferase assay and CRISPRi, and performed the luciferase assays. F.Y. performed the CRISPRi experiments. W.W.Y.G. implemented the fgwas analysis pipeline. C.D. implemented the RNA-Seq, ATAC-Seq and ASE analysis pipelines. H.L. processed the WGS and ChIP-Seq data. F.D. and S.S. generated iPSC-CMs and contributed to data generation. M.K.R.D. and H.M. performed data processing and computational analyses. N.S. and J.v.S. provided summary statistics for the PR interval GWAS. K.J.G. supervised the EMSA experiments. M.D’A. and E.N.S. performed statistical analyses. M.G.R. supervised the experimental validation of the variants. K.A.F. conceived and oversaw the study. P.B., E.N.S. and K.A.F. prepared the manuscript.

Corresponding authors

Correspondence to Michael G. Rosenfeld or Kelly A. Frazer.

Ethics declarations

Competing interests

The authors declare no competing interests.

Integrated supplementary information

Supplementary Figure 1 Characterization of iPSC-CMs by flow cytometry, immunofluorescence and gene expression.

(a) Percentage of cells positive for the cardiomyocyte-specific marker TNNT2 analyzed by flow cytometry on iPSC-CMs generated in this study and collected at either day 15 (n = 15 independent samples) or day 25 and purified with L-lactate selection (n = 12 independent samples). The day 15 sample that was lactate purified is indicated. Plot lines indicate median, lower and upper quartiles. (b) Confocal microscopy of immunofluorescence for TNNT2 and MYL7 in one replicate of iPSCORE_2_1 day 15 iPSC-CMs at two different magnifications. Similar staining was observed for two other day 15 iPSC-CM samples (subjects iPSCORE_2_3 and iPSCORE_2_9, not shown). (c) Hierarchical clustering and heatmap of RNA-seq expression data from 61 selected cell-type specific genes. In addition to the 56 samples from this study, Roadmap RNA-seq data from stem cell lines (H1, HUES64, iPS-20b and iPS-18) and human tissues (right ventricle, left ventricle, right atrium and fetal heart) are included as reference samples. Gene expression values are reported as vst-normalized read counts.

Supplementary Figure 2 Epigenetic profile differences between iPSCs and iPSC-CMs.

(a) Tag heatmap of sequence coverage at the TSSs of differentially expressed genes (n = 5,307 DEGs, minimum log2 fold change = 2, FDR < 0.05, DESeq2), listed in decreasing order of log2 ratio of expression in iPSCs versus iPSC-CMs. The density plot at the top shows the average normalized coverage (range 0-1) across iPSC-upregulated genes (2,444 genes, green) and iPSC-CM-upregulated genes (2,863 genes, purple). For a 2-kb window centered at the TSS, coverage values for 25-bp bins were obtained by combining all samples of each data type and normalized to a total of 10^7 reads. (b-f) Heatmap and hierarchical clustering of similarity based on overlap between enhancer annotations (from 25-state ChromHMM) for 127 samples from Roadmap Epigenomics (Ernst, J. and M. Kellis, Nat. Methods 9, 215-216, 2012; Kundaje, A. et al. Nature 518, 317-330, 2015) and ATAC-seq peaks or ChIP-Seq peaks from iPSCs (b, c) or iPSC-CMs (d-f) combined samples. Similarity was calculated using the Jaccard statistic (intersection/union of base pairs in each comparison), mean-centered across the 127 tissues and ordered by highest average enhancer similarity. A zoomed-in view of the top 10 Roadmap tissues with highest average enhancer similarity is shown.
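The Jaccard statistic named in this legend (base pairs in the intersection divided by base pairs in the union) can be illustrated with a minimal Python sketch. It assumes half-open (start, end) intervals on a single chromosome. The function names are hypothetical, and this is not the authors' pipeline, which the legend does not describe. The mean-centering across tissues is not shown.

```python
def merge(intervals):
    """Merge overlapping or touching half-open intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval when the new one overlaps or touches it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def total_bp(intervals):
    """Total number of base pairs covered by a disjoint interval list."""
    return sum(end - start for start, end in intervals)


def jaccard_bp(a, b):
    """Base-pair Jaccard statistic: |A intersect B| / |A union B|."""
    a, b = merge(a), merge(b)
    union = total_bp(merge(a + b))
    # Inclusion-exclusion gives the intersection without a second sweep.
    intersection = total_bp(a) + total_bp(b) - union
    return intersection / union if union else 0.0
```

For example, `jaccard_bp([(100, 200), (300, 400)], [(150, 350)])` compares 100 shared base pairs against a 300-bp union, giving one third.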

Supplementary Figure 3 Analysis of variation of molecular phenotypes across samples.

(a, c, e, g, i) Plots of the first two principal components (PC) calculated on all genes/peaks identified in each RNA-seq (a, iPSCs: 29 independent samples; e, iPSC-CMs: 26 independent samples and 1 technical replicate) or ChIP-seq dataset (c, H3K27ac in iPSCs: 17 independent samples and 4 technical replicates; g, H3K27ac in iPSC-CMs: 25 independent samples and 2 technical replicates; i, NKX2-5: 12 independent samples and 3 technical replicates). Samples are color-coded by subject and distinguished by a different symbol representing a different batch for iPSCs (cultured, collected and sequenced at different times) or a different protocol (day 15 vs. day 25) for iPSC-CM differentiation. Samples from the same individual are considered independent if iPSC lines were cultured (or iPSC-CMs were differentiated) at different times, and are considered technical replicates (indicated by A and B) if the assays were performed on cell material collected from the same sample. (b, d, f, h, j) Tables showing adjusted r-squared values from ANOVA tests, reported as a measure of association between PC1 to PC10 and different covariates in each dataset. Tables are color-coded according to –log10 of ANOVA P-values. (k-o) Plots of the average Spearman correlation coefficients between pairs of samples across the 1,000 most variable genes or peaks for the indicated molecular phenotypes. Each dot corresponds to the per-sample pairwise correlation coefficient averaged across samples from either the same subject or different subjects. Technical replicates were excluded for the comparisons between samples of the same subject. Plot lines indicate median, lower and upper quartiles. P-values from one-tailed Mann-Whitney test are shown.

Supplementary Figure 4 Comparison of number and effects of ASE-SNVs identified in ChIP-seq and ATAC-seq.

(a) Number of uniquely mapped reads for each data type and subject (n = 7). Each open circle corresponds to merged reads from different samples of the same subject. (b) Average FrIP across different samples of the same subject in each data type. (c) Median number of reads at all heterozygous SNVs tested for ASE in each individual. (d,e) Scatterplot showing increase in the median SNV coverage as a function of the number of mapped reads (d) and of FrIP (e) in each subject (n = 7). Continuous lines indicate that the relation is significant (linear regression, P < 0.05). (f) Scatterplot showing increase in the fraction of identified ASE-SNVs in each subject (FDR < 0.05), as a function of the median SNV coverage. Continuous lines indicate that the relation is significant (linear regression, P < 0.05). (g) Distribution of the mean SNV coverage between subjects, across all SNVs analyzed in each data type (number of SNVs in each distribution from left to right: 30,463, 26,201, 116,898, 123,151 and 19,371). Median values are shown as white dots within the violin plot and are indicated. (h) Distribution of the number of ASE-SNVs (FDR < 10%) in 100 samplings of 100 SNVs with the same coverage in the different data types. (i) Distribution of the mean ASE effect sizes (allele frequencies) of the 100-SNV samplings shown in h. Median values are shown. Boxplot elements: median (thick line), lower and upper quartiles (box), maximum and minimum (whiskers). (j-l) Scatterplot showing correlation of ASE effects of the same SNV between peaks from different data types in iPSC-CMs: (j) NKX2-5 vs. ATAC-seq peaks; (k) NKX2-5 vs. H3K27ac peaks; (l) ATAC-seq vs. H3K27ac peaks. The union of significant ASE-SNVs in each pair of datasets is shown, with ASE effects expressed as the proportion of the reference allele at heterozygous SNVs. Gray dots denote ASE-SNVs significant (binomial test, FDR < 0.05) only in the peaks indicated in the x-axis, blue dots in the y-axis, and green dots in both. Pearson correlation coefficient (r), number of SNVs (n) and P-values are indicated.
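The legend reports ASE effects as the proportion of reference-allele reads and calls significance with a binomial test at a given FDR. The minimal Python sketch below illustrates that idea under stated assumptions: a null of equal allelic proportions (0.5), an exact two-sided test, and Benjamini-Hochberg as the FDR procedure. None of these details are specified in the legend, the function names are hypothetical, and the published pipeline may differ.

```python
from math import comb


def ref_fraction(ref_reads, alt_reads):
    """ASE effect expressed as the proportion of reads carrying the reference allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this SNV")
    return ref_reads / total


def ase_binomial_p(ref_reads, alt_reads):
    """Exact two-sided binomial P-value under an assumed null proportion of 0.5."""
    n = ref_reads + alt_reads
    k_low = min(ref_reads, alt_reads)
    # The null distribution is symmetric, so the two-sided P-value is twice the
    # lower tail, capped at 1. Integer arithmetic avoids underflow at high coverage.
    tail = sum(comb(n, k) for k in range(k_low + 1))
    return min(1.0, 2 * tail / 2 ** n)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity of the adjustment.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In this sketch a heterozygous SNV would be reported as an ASE-SNV when its adjusted P-value falls below the chosen threshold (0.05 in panels j-l).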

Supplementary Figure 5 Correlation between ASE and motif disruption in NKX2-5 ChIP-seq.

Scatterplots showing relationship between the proportion of reads for the reference allele at ASE-SNVs in the NKX2-5 ChIP-seq data and the difference in motif strength between the reference and alternate allele. Spearman correlation statistics (r and P-value) and the number of motif-altering ASE-SNVs (n) are indicated. The 12 most enriched families of motifs in NKX2-5 peaks (Supplementary Table 4) were tested. TFBS motifs that were strengthened (red) or weakened (blue) by the preferred allele of ASE-SNVs are indicated.

Supplementary Figure 6 Analysis of enrichment of ChIP-seq and ATAC-seq peaks for GWAS SNPs.

(a) Heatmap of enrichment for GWAS SNPs in ChIP-seq and ATAC-seq peaks from iPSCs and iPSC-CMs combined samples from this study as well as in peaks from cardiac tissues from Roadmap (DHS of fetal heart, H3K27ac of right ventricle and right atrium). Heatmap is ordered by the most enriched GWAS traits on average in the cardiac datasets (iPSC-CMs and Roadmap tissues) and shows fold change values for significant enrichment at FDR corrected P-value < 0.05. A total of 125 GWAS traits were tested for enrichment, and the corresponding number of independent SNPs is given in parentheses. The statistical enrichment test was performed using GREGOR software. (b-f) Volcano plots showing -log10 P-values (y-axis) and fold enrichment (x-axis) for GWAS loci shown in a, indicating the position of the 6 electrocardiographic traits. Red symbols indicate significant enrichment at FDR corrected P-value < 0.05. The iPSC-CMs NKX2-5, H3K27ac and ATAC-seq enrichment plots are shown in Fig. 5a–c.

Supplementary Figure 7 Enrichment of GWAS signals within iPSC-CM functional annotations using fgwas single state models.

Fgwas natural log fold enrichment of iPSC-CM genomic annotations (y-axis) in heart rate (den Hoed, M. et al. Nat. Genet. 45, 621-631, 2013), atrial fibrillation (Christophersen, I. E. et al. Nat. Genet. 49, 946-952, 2017), and PR interval (van Setten, J. et al. Nat. Commun. 9, 2904, 2018) GWAS signals. Solid circles indicate significant enrichment (defined as 95% CI above zero) and the bars indicate the 95% confidence intervals. The genomic annotations include NKX2-5 ASE-SNVs, NKX2-5 peaks, ATAC-seq peaks, H3K27ac peaks and H3K27ac ASE-SNVs. Peaks were called by combining all samples from each data type. The number of SNPs analyzed for each GWAS was: 2,516,407 SNPs for heart rate, 11,779,664 SNPs for atrial fibrillation, and 2,712,310 SNPs for PR interval.
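
The significance criterion stated in this legend (95% CI above zero on the natural-log scale) can be expressed in a few lines. The sketch below is illustrative only: the function name, annotation labels and numeric values are hypothetical and are not taken from the fgwas output of this study.

```python
from math import exp


def summarize_enrichment(ln_estimate, ci_low, ci_high):
    """Summarize a natural-log enrichment estimate and its 95% CI.

    An annotation is called significantly enriched when the whole
    interval lies above zero, i.e. the lower bound is positive.
    """
    return {
        "fold_enrichment": exp(ln_estimate),
        "fold_ci": (exp(ci_low), exp(ci_high)),
        "significant": ci_low > 0,
    }


# Hypothetical (ln estimate, lower bound, upper bound) per annotation.
annotations = {
    "annotation_1": (2.0, 0.5, 3.5),
    "annotation_2": (1.0, -0.2, 2.2),
}

for name, (estimate, low, high) in annotations.items():
    summary = summarize_enrichment(estimate, low, high)
    marker = "solid circle" if summary["significant"] else "open circle"
    print(name, round(summary["fold_enrichment"], 2), marker)
```

Exponentiating the estimate converts the plotted natural-log value back to a fold enrichment, so a log value of zero corresponds to no enrichment.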

Supplementary Figure 8 Functional characterization of candidate causal variants at four loci associated with heart rate.

(a, d, f, h) For each of the four loci, the top panel shows the regional plot of association P-values with heart rate (den Hoed, M. et al. Nat. Genet. 45, 621-631, 2013); SNPs are color coded based on r2 values from the 1000 Genomes Project CEU population (Johnson, A. D. et al. Bioinformatics, 24, 2938-2939, 2008); lead GWAS variants in the locus are indicated by a diamond. The second panel shows the posterior probability of causality (PPA) of the variants in the locus calculated using fgwas, and panels three through five show epigenetic tracks from iPSC-CM combined samples (NKX2-5, ATAC-seq and H3K27ac). The bottom panel shows the Roadmap fetal heart ChromHMM and genes from UCSC genome browser (conventional ChromHMM color code). For d and h, the bottom panel shows the locus at lower scale. For d, f and h, the locations of Hi-C loops from iPSC-CMs are shown in red. For the candidate causal variants (turquoise lines), the allelic imbalance (pie chart) of NKX2-5 ASE and FDR-corrected P-values are shown; for a, the altered TF motif is shown. (c, e, g, i) Significant associations (P < 0.05, linear regression) between putative variant genotypes and normalized gene expression of candidate genes in iPSC-CMs from 128 different individuals from iPSCORE. Boxplot elements: median (thick line), lower and upper quartiles (box), maximum and minimum (whiskers). (b) EMSA with iPSC-CM nuclear extract using probes containing both allelic variants of rs7612445. An independent replicate is shown in Supplementary Figure 9.

Supplementary Figure 9 Electrophoretic mobility shift assay (EMSA) for NKX2-5 ASE-SNVs.

(a-c) Second independent replicate of EMSA with iPSC-CM nuclear extract using probes containing the two allelic variants of rs590041 (a), rs3807989 (b), and rs7612445 (c). (d) Original (uncropped) blots of all presented EMSAs. The figures and supplementary figures that show the corresponding cropped versions are indicated.

Supplementary Figure 10 Experimental validation using luciferase assays and CRISPRi in iPSC-CMs.

(a) Representative fluorescence microscopy image showing that in the luciferase assays we achieved approximately 70% transfection efficiency of a GFP-overexpressing plasmid in iPSC-CMs. Efficiency was measured once. (b) Test of gRNA efficiency for the CRISPR system in HEK293T cells. HEK293T cells were transfected with two vectors: one containing two gRNAs targeting the indicated SNP (2sgRNA-ccdB-EF1a-Puromycin) and the other expressing Cas9 (Lenti-Cas9-Blast, Addgene #52962). The gRNAs for SNPs rs590041 (SSBP3 intron) and rs3807989 (CAV1 intron) showed additional bands corresponding to targeted deletions (arrows) and were used for CRISPRi experiments in iPSC-CMs. The gRNAs for SNPs rs7612445 (GNB4) and rs8044595 (MYH11) did not show additional bands and were not used for CRISPRi. Efficiency was tested once. (c,d) qPCR expression of SSBP3 (c) or CAV1 and CAV2 (d) in iPSC-CMs (id: iPSCORE_1_57) stably expressing dCas9-KRAB (CRISPRi) and either a control guide RNA (gCTL) or two guide RNAs targeting the region encompassing rs590041 (c) or rs3807989 (d). Bars and error bars represent the mean and the standard deviation from three qPCR measurements, respectively; two-tailed t-test P-values are shown. Similar results were obtained in an independent cell line presented in the main manuscript (Fig. 5i and Fig. 7h).

Supplementary Figure 11 UCSC genome browser screenshots showing quality of ChIP-seq and ATAC-seq data.

Bedgraph tracks are shown for one representative sample from each individual: (a) H3K27ac ChIP-seq in iPSCs from the 7 individuals; (b) H3K27ac ChIP-seq in iPSC-CMs from the 7 individuals; (c) ATAC-seq in iPSCs and iPSC-CMs from 3 individuals; and (d) NKX2-5 ChIP-seq in iPSC-CMs from the 7 individuals.

Supplementary information

Supplementary Information

Supplementary Figs. 1–11, Tables 1, 3 and 7, and Note

Reporting Summary

Supplementary Table 2

Metadata and per-sample sequence data metrics

Supplementary Table 4

Motif enrichment analysis of ATAC-Seq and ChIP-Seq peaks

Supplementary Table 5

Annotation of SNVs showing ASEs in ChIP-Seq and ATAC-Seq datasets

Supplementary Table 6

Results from fgwas fine-mapping analysis of heart rate, atrial fibrillation and PR interval GWAS studies using iPSC-CM functional genomics data

About this article

Cite this article

Benaglio, P., D’Antonio-Chronowska, A., Ma, W. et al. Allele-specific NKX2-5 binding underlies multiple genetic associations with human electrocardiographic traits. Nat Genet 51, 1506–1517 (2019).
