Allele-specific NKX2-5 binding underlies multiple genetic associations with human electrocardiographic traits

Benaglio, Paola; D’Antonio-Chronowska, Agnieszka; Ma, Wubin; Yang, Feng; Young Greenwald, William W.; Donovan, Margaret K. R.; DeBoever, Christopher; Li, He; Drees, Frauke; Singhal, Sanghamitra; Matsui, Hiroko; van Setten, Jessica; Sotoodehnia, Nona; Gaulton, Kyle J.; Smith, Erin N.; D’Antonio, Matteo; Rosenfeld, Michael G.; Frazer, Kelly A.

doi:10.1038/s41588-019-0499-3

Article
Published: 30 September 2019

Paola Benaglio¹,
Agnieszka D’Antonio-Chronowska² (contributed equally),
Wubin Ma³ (contributed equally),
Feng Yang³,
William W. Young Greenwald⁴,
Margaret K. R. Donovan⁴,⁵,
Christopher DeBoever⁴,
He Li² (ORCID: 0000-0002-1766-5311),
Frauke Drees²,
Sanghamitra Singhal¹,
Hiroko Matsui²,
Jessica van Setten⁶ (ORCID: 0000-0002-4934-7510),
Nona Sotoodehnia⁷,⁸,
Kyle J. Gaulton¹,
Erin N. Smith¹,
Matteo D’Antonio² (ORCID: 0000-0001-5844-6433),
Michael G. Rosenfeld³ (ORCID: 0000-0002-1572-156X) &
Kelly A. Frazer¹,² (ORCID: 0000-0002-6060-8902)

Nature Genetics volume 51, pages 1506–1517 (2019)

Abstract

The cardiac transcription factor (TF) gene NKX2-5 has been associated with electrocardiographic (EKG) traits through genome-wide association studies (GWASs), but the extent to which differential binding of NKX2-5 at common regulatory variants contributes to these traits has not yet been studied. We analyzed transcriptomic and epigenomic data from induced pluripotent stem cell-derived cardiomyocytes from seven related individuals, and identified ~2,000 single-nucleotide variants associated with allele-specific effects (ASE-SNVs) on NKX2-5 binding. NKX2-5 ASE-SNVs were enriched for altered TF motifs, for heart-specific expression quantitative trait loci and for EKG GWAS signals. Using fine-mapping combined with epigenomic data from induced pluripotent stem cell–derived cardiomyocytes, we prioritized candidate causal variants for EKG traits, many of which were NKX2-5 ASE-SNVs. Experimentally characterizing two NKX2-5 ASE-SNVs (rs3807989 and rs590041) showed that they modulate the expression of target genes via differential protein binding in cardiac cells, indicating that they are functional variants underlying EKG GWAS signals. Our results show that differential NKX2-5 binding at numerous regulatory variants across the genome contributes to EKG phenotypes.

**Fig. 1: Generation and characterization of iPSCs and iPSC-CMs by gene expression and epigenetic profiling.**

**Fig. 2: Identification of coordinated ASEs in gene expression, H3K27 acetylation, chromatin accessibility and NKX2-5 binding in iPSCs and iPSC-CMs.**

**Fig. 3: TF binding motifs are altered by SNVs with ASEs in NKX2-5 ChIP-Seq.**

**Fig. 4: Enrichment of ChIP-Seq ASE variants for known QTLs.**

**Fig. 5: Enrichment of NKX2-5 SNVs at GWAS loci, and validation of rs590041 as a regulatory variant in the *SSBP3* locus for P-wave duration.**

**Fig. 6: Prioritization of candidate causal variants at heart rate loci using fgwas.**

**Fig. 7: Functional characterization of rs3807989 as a candidate causal variant for PR interval and atrial fibrillation.**

Data availability

All iPSC lines are available through the WiCell Research Institute (www.wicell.org; NHLBI Next Gen Collection). All genomic data are available through the database of Genotypes and Phenotypes (accessions phs000924 (RNA-Seq, ChIP-Seq, ATAC-Seq and Hi-C) and phs001325 (whole-genome-sequenced SNV and copy number variation genotypes)) and National Center for Biotechnology Information BioProject PRJNA285375. Processed data files are available through Gene Expression Omnibus accessions GSE125540 and GSE133833.

Code availability

Custom-written code is available via GitHub (https://github.com/frazer-lab/NKX2-5_ASE_iPSC-CM).

Acknowledgements

This work was supported in part by California Institute for Regenerative Medicine (CIRM) grant GC1R-06673-B, NIH grants HG008118 and HL107442, and National Science Foundation grant 1728497. P.B. was supported by the Swiss National Science Foundation Postdoc Mobility fellowships P2LAP3-155105 and P300PA-167612. W.W.Y.G. was supported by the NHLBI under award number HL142151. C.D. was supported in part by the UCSD Genetics Training Program through an institutional training grant from the NIGMS under award number GM008666 and the CIRM Interdisciplinary Stem Cell Training Program at UCSD II (TG2-01154). Library preparation and sequencing services were conducted by K. Jepsen and M. Khosroheidari at the UCSD IGM Genomics Center, supported by NIH grant CA023100. N.S. was supported by NIH grants HL116747 and HL141989. K.J.G. was supported by NIH grant DK114650 and ADA grant 1-17-JDF-027. W.M., F.Y. and M.G.R. were supported by NIH grants DK018477 and DK039949. M.G.R. is an HHMI investigator. We are thankful to C.-A. Yen and N. Spann for assistance with the ChIP-Seq experiments, and to A. Schmitt for the Hi-C data. We thank A. Aguirre for performing immunofluorescence. We thank E. Farley and K. Olson for help with reporter assays. We thank many colleagues for helpful comments.

Author information

These authors contributed equally: Agnieszka D’Antonio-Chronowska, Wubin Ma.

Authors and Affiliations

Department of Pediatrics, Rady Children’s Hospital, Division of Genome Information Sciences, University of California, San Diego, La Jolla, CA, USA
Paola Benaglio, Sanghamitra Singhal, Kyle J. Gaulton, Erin N. Smith & Kelly A. Frazer
Institute for Genomic Medicine, University of California, San Diego, La Jolla, CA, USA
Agnieszka D’Antonio-Chronowska, He Li, Frauke Drees, Hiroko Matsui, Matteo D’Antonio & Kelly A. Frazer
Howard Hughes Medical Institute, Department of Medicine, University of California, San Diego, La Jolla, CA, USA
Wubin Ma, Feng Yang & Michael G. Rosenfeld
Bioinformatics and Systems Biology, University of California, San Diego, La Jolla, CA, USA
William W. Young Greenwald, Margaret K. R. Donovan & Christopher DeBoever
Department of Biomedical Informatics, University of California, San Diego, La Jolla, CA, USA
Margaret K. R. Donovan
Department of Cardiology, University Medical Center Utrecht, University of Utrecht, Utrecht, the Netherlands
Jessica van Setten
Department of Medicine, Cardiovascular Health Research Unit, Division of Cardiology, University of Washington, Seattle, WA, USA
Nona Sotoodehnia
Department of Epidemiology, Cardiovascular Health Research Unit, Division of Cardiology, University of Washington, Seattle, WA, USA
Nona Sotoodehnia

Contributions

P.B. designed the study, generated the ChIP-Seq and RNA-Seq data, and performed the statistical analyses. A.D’A.-C. generated the iPSC-CMs, ChIP-Seq, ATAC-Seq and RNA-Seq data, and performed the EMSA. W.M. generated the constructs for luciferase assay and CRISPRi, and performed the luciferase assays. F.Y. performed the CRISPRi experiments. W.W.Y.G. implemented the fgwas analysis pipeline. C.D. implemented the RNA-Seq, ATAC-Seq and ASE analysis pipelines. H.L. processed the WGS and ChIP-Seq data. F.D. and S.S. generated iPSC-CMs and contributed to data generation. M.K.R.D. and H.M. performed data processing and computational analyses. N.S. and J.v.S. provided summary statistics for the PR interval GWAS. K.J.G. supervised the EMSA experiments. M.D’A. and E.N.S. performed statistical analyses. M.G.R. supervised the experimental validation of the variants. K.A.F. conceived and oversaw the study. P.B., E.N.S. and K.A.F. prepared the manuscript.

Corresponding authors

Correspondence to Michael G. Rosenfeld or Kelly A. Frazer.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Characterization of iPSC-CMs by flow cytometry, immunofluorescence and gene expression.

(a) Percentage of cells positive for the cardiomyocyte-specific marker TNNT2 analyzed by flow cytometry on iPSC-CMs generated in this study and collected at either day 15 (n = 15 independent samples) or day 25 and purified with L-lactate selection (n = 12 independent samples). The day 15 sample that was lactate purified is indicated. Plot lines indicate median, lower and upper quartiles. (b) Confocal microscopy of immunofluorescence for TNNT2 and MYL7 in one replicate of iPSCORE_2_1 day 15 iPSC-CMs at two different magnifications. Similar staining was observed for two other day 15 iPSC-CM samples (subjects iPSCORE_2_3 and iPSCORE_2_9, not shown). (c) Hierarchical clustering and heatmap of RNA-seq expression data from 61 selected cell-type-specific genes. In addition to the 56 samples from this study, Roadmap RNA-seq data from stem cell lines (H1, HUES64, iPS-20b and iPS-18) and human tissues (right ventricle, left ventricle, right atrium and fetal heart) are included as reference samples. Gene expression values are reported as vst-normalized read counts.

Supplementary Figure 2 Epigenetic profile differences between iPSCs and iPSC-CMs.

(a) Tag heatmap of sequence coverage at the TSSs of differentially expressed genes (n = 5,307 DEGs, minimum log₂ fold change = 2, FDR < 0.05, DESeq2), listed in decreasing order of log₂ ratio of expression in iPSCs versus iPSC-CMs. The density plot at the top shows the average normalized coverage (range 0-1) across iPSC-upregulated genes (2,444 genes, green) and iPSC-CM-upregulated genes (2,863 genes, purple). For a 2-kb window centered at the TSS, coverage values for 25-bp bins were obtained by combining all samples of each data type and normalized to a total of 10⁷ reads. (b-f) Heatmap and hierarchical clustering of similarity based on overlap between enhancer annotations (from 25-state ChromHMM) for 127 samples from Roadmap Epigenomics (Ernst, J. and M. Kellis, Nat. Methods 9, 215-216, 2012; Kundaje, A. et al. Nature 518, 317-330, 2015) and ATAC-seq peaks or ChIP-Seq peaks from combined samples of iPSCs (b, c) or iPSC-CMs (d-f). Similarity was calculated using the Jaccard statistic (intersection/union of base pairs in each comparison), mean-centered across the 127 tissues and ordered by highest average enhancer similarity. A zoomed-in view of the 10 Roadmap tissues with the highest average enhancer similarity is shown.
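
The base-pair Jaccard statistic described in this legend can be illustrated with a minimal sketch. This is not the authors' code: the function names (`merge_intervals`, `covered_bp`, `jaccard_bp`, `mean_center`) and the toy coordinates are hypothetical, and the sketch assumes half-open intervals on a single chromosome.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome (assumption)


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_bp(intervals: List[Interval]) -> int:
    """Number of distinct base pairs covered by a set of intervals."""
    return sum(end - start for start, end in merge_intervals(intervals))


def jaccard_bp(a: List[Interval], b: List[Interval]) -> float:
    """Jaccard statistic: intersecting base pairs divided by union base pairs."""
    union = covered_bp(a + b)
    if union == 0:
        return 0.0
    # Inclusion-exclusion: |A ∩ B| = |A| + |B| - |A ∪ B|
    intersection = covered_bp(a) + covered_bp(b) - union
    return intersection / union


def mean_center(values: List[float]) -> List[float]:
    """Subtract the mean, as when centering similarities across reference samples."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Toy example: hypothetical peaks versus hypothetical enhancer annotations.
peaks = [(100, 200), (300, 400)]
enhancers = [(150, 250), (300, 350)]
similarity = jaccard_bp(peaks, enhancers)  # 100 shared bp / 250 union bp
```

In practice a genome-wide comparison would apply the same calculation per chromosome and sum the intersections and unions before dividing.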

Supplementary Figure 3 Analysis of variation of molecular phenotypes across samples.

(a, c, e, g, i) Plots of the first two principal components (PCs) calculated on all genes/peaks identified in each RNA-seq (a, iPSCs: 29 independent samples; e, iPSC-CMs: 26 independent samples and 1 technical replicate) or ChIP-seq dataset (c, H3K27ac in iPSCs: 17 independent samples and 4 technical replicates; g, H3K27ac in iPSC-CMs: 25 independent samples and 2 technical replicates; i, NKX2-5: 12 independent samples and 3 technical replicates). Samples are color-coded by subject and distinguished by a different symbol representing a different batch for iPSCs (cultured, collected and sequenced at different times) or a different protocol (day 15 vs. day 25) for iPSC-CM differentiation. Samples from the same individual are considered independent if iPSC lines were cultured (or iPSC-CMs were differentiated) at different times, and are considered technical replicates (indicated by A and B) if the assays were performed on cell material collected from the same sample. (b, d, f, h, j) Tables of adjusted r-squared values from ANOVA tests, reported as a measure of association between PC1 to PC10 and different covariates in each dataset. Tables are color-coded according to –log₁₀ of ANOVA P-values. (k-o) Plots of the average Spearman correlation coefficients between pairs of samples across the 1,000 most variable genes or peaks for the indicated molecular phenotypes. Each dot corresponds to the per-sample pairwise correlation coefficient averaged across samples from either the same subject or different subjects. Technical replicates were excluded for the comparisons between samples of the same subject. Plot lines indicate median, lower and upper quartiles. P-values from a one-tailed Mann-Whitney test are shown.
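
As an illustration of how an adjusted r-squared can summarize the association between a principal component and a categorical covariate such as subject or batch, the following sketch fits a one-way ANOVA by hand. It is a sketch under stated assumptions, not the authors' pipeline: the function name `adjusted_r_squared`, the toy PC scores and the covariate labels are all hypothetical, and the legend does not specify how covariates were encoded.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def adjusted_r_squared(scores: Sequence[float], groups: Sequence[str]) -> float:
    """Adjusted R^2 from a one-way ANOVA of PC scores on a categorical covariate."""
    n = len(scores)
    by_group: Dict[str, List[float]] = defaultdict(list)
    for score, group in zip(scores, groups):
        by_group[group].append(score)
    k = len(by_group)
    if n <= k:
        raise ValueError("need more samples than covariate levels")

    grand_mean = sum(scores) / n
    ss_total = sum((s - grand_mean) ** 2 for s in scores)
    if ss_total == 0:
        raise ValueError("scores have zero variance")

    # Residual (within-group) sum of squares around each group mean.
    ss_within = 0.0
    for values in by_group.values():
        group_mean = sum(values) / len(values)
        ss_within += sum((v - group_mean) ** 2 for v in values)

    r_squared = 1.0 - ss_within / ss_total
    # Penalize for the number of covariate levels (k - 1 model degrees of freedom).
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k)


# Toy example: PC1 scores that separate cleanly by a hypothetical subject label.
pc1 = [1.0, 1.2, 3.0, 3.2, 5.0, 5.2]
subject = ["S1", "S1", "S2", "S2", "S3", "S3"]
strong = adjusted_r_squared(pc1, subject)

# Toy example: scores unrelated to the covariate give a value at or below zero.
weak = adjusted_r_squared([1.0, 2.0, 1.0, 2.0], ["A", "A", "B", "B"])
```

A value near 1 indicates that the covariate explains most of the variance along that component, whereas values near or below zero indicate no association.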

Supplementary Figure 4 Comparison of number and effects of ASE-SNVs identified in ChIP-seq and ATAC-seq.

(a) Number of uniquely mapped reads for each data type and subject (n = 7). Each open circle corresponds to merged reads from different samples of the same subject. (b) Average FrIP across different samples of the same subject in each data type. (c) Median number of reads at all heterozygous SNVs tested for ASE in each individual. (d,e) Scatterplots showing increase in the median SNV coverage as a function of the number of mapped reads (d) and of FrIP (e) in each subject (n = 7). Continuous lines indicate that the relation is significant (linear regression, P < 0.05). (f) Scatterplot showing increase in the fraction of identified ASE-SNVs in each subject (FDR < 0.05), as a function of the median SNV coverage. Continuous lines indicate that the relation is significant (linear regression, P < 0.05). (g) Distribution of the mean SNV coverage between subjects, across all SNVs analyzed in each data type (number of SNVs in each distribution from left to right: 30,463, 26,201, 116,898, 123,151 and 19,371). Median values are shown as white dots within the violin plot and are indicated. (h) Distribution of the number of ASE-SNVs (FDR < 10%) in 100 samplings of 100 SNVs with the same coverage in the different data types. (i) Distribution of the mean ASE effect sizes (allele frequencies) of the 100 SNV samplings shown in h. Median values are shown. Boxplot elements: median (thick line), lower and upper quartiles (box), maximum and minimum (whiskers). (j-l) Scatterplots showing correlation of ASE effects of the same SNV between peaks from different data types in iPSC-CMs: (j) NKX2-5 vs. ATAC-seq peaks; (k) NKX2-5 vs. H3K27ac peaks; (l) ATAC-seq vs. H3K27ac peaks. The union of significant ASE-SNVs in each pair of datasets is shown, with ASE effects expressed as the proportion of the reference allele at heterozygous SNVs. Gray dots denote ASE-SNVs significant (binomial test, FDR < 0.05) only in the peaks indicated on the x-axis, blue dots only in those on the y-axis, and green dots in both. Pearson correlation coefficient (r), number of SNVs (n) and P-values are indicated.
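
The "binomial test, FDR < 0.05" criterion used for ASE-SNVs can be sketched as follows. This is a simplified illustration that assumes a null reference-allele fraction of 0.5 and Benjamini-Hochberg correction; the read counts are hypothetical, and the study's actual pipeline (for example, how reads are combined across individuals) may differ.

```python
from math import comb


def binomial_two_sided_p(ref_reads, total_reads):
    """Exact two-sided binomial P-value against a null reference fraction of 0.5."""
    k = min(ref_reads, total_reads - ref_reads)
    # Under p = 0.5 the distribution is symmetric, so double the smaller tail.
    tail = sum(comb(total_reads, i) for i in range(k + 1)) / 2 ** total_reads
    return min(1.0, 2.0 * tail)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical (reference reads, total reads) at four heterozygous SNVs.
counts = [(30, 40), (21, 40), (5, 50), (26, 50)]
p_values = [binomial_two_sided_p(ref, total) for ref, total in counts]
q_values = benjamini_hochberg(p_values)
for (ref, total), q in zip(counts, q_values):
    print(f"ref fraction = {ref / total:.2f}, q = {q:.3g}, ASE = {q < 0.05}")
```

The reference fraction printed here corresponds to the ASE effect plotted in (j-l), the proportion of reads carrying the reference allele at a heterozygous SNV.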

Supplementary Figure 5 Correlation between ASE and motif disruption in NKX2-5 ChIP-seq.

Scatterplots showing the relationship between the proportion of reads for the reference allele at ASE-SNVs in the NKX2-5 ChIP-seq data and the difference in motif strength between the reference and alternate alleles. Spearman correlation statistics (r and P-value) and the number of motif-altering ASE-SNVs (n) are indicated. The 12 most enriched families of motifs in NKX2-5 peaks (Supplementary Table 4) were tested. TFBS motifs that were strengthened (red) or weakened (blue) by the preferred allele of ASE-SNVs are indicated.
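
A minimal sketch of the Spearman correlation used in these scatterplots is given below. It computes the coefficient as a Pearson correlation on tie-averaged ranks; the reference-allele fractions and motif score differences are hypothetical values chosen only to show the calculation, and the P-value is omitted.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            ranks[order[j]] = mean_rank
        pos = end + 1
    return ranks


def spearman_r(x, y):
    """Spearman coefficient: Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ASE-SNVs: reference-allele read fraction and the motif score
# difference (reference minus alternate allele).
ref_fraction = [0.15, 0.30, 0.45, 0.70, 0.85]
motif_delta = [-4.2, -1.0, -2.5, 3.1, 5.0]
print(round(spearman_r(ref_fraction, motif_delta), 3))
```

A positive coefficient indicates that the allele with the stronger motif match tends to be the allele with more ChIP-seq reads.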

Supplementary Figure 6 Analysis of enrichment of ChIP-seq and ATAC-seq peaks for GWAS SNPs.

(a) Heatmap of enrichment for GWAS SNPs in ChIP-seq and ATAC-seq peaks from iPSCs and iPSC-CMs combined samples from this study as well as in peaks from cardiac tissues from Roadmap (DHS of fetal heart, H3K27ac of right ventricle and right atrium). Heatmap is ordered by the most enriched GWAS traits on average in the cardiac datasets (iPSC-CMs and Roadmap tissues) and shows fold change values for significant enrichment at FDR-corrected P-value < 0.05. A total of 125 GWAS traits were tested for enrichment, and the corresponding number of independent SNPs is given in parentheses. The statistical enrichment test was performed using GREGOR software. (b-f) Volcano plots showing -log₁₀ P-values (y-axis) and fold enrichment (x-axis) for GWAS loci shown in a, indicating the position of the 6 electrocardiographic traits. Red symbols indicate significant enrichment at FDR-corrected P-value < 0.05. The iPSC-CMs NKX2-5, H3K27ac and ATAC-seq enrichment plots are shown in Fig. 5a–c.

Supplementary Figure 7 Enrichment of GWAS signals within iPSC-CM functional annotations using fgwas single state models.

Fgwas natural log fold enrichment of iPSC-CM genomic annotations (y-axis) in heart rate (den Hoed, M. et al. Nat. Genet. 45, 621-631, 2013), atrial fibrillation (Christophersen, I. E. et al. Nat. Genet. 49, 946-952, 2017), and PR interval (van Setten, J. et al. Nat. Commun. 9, 2904, 2018) GWAS signals. Solid circles indicate significant enrichment (defined as 95% CI above zero) and the bars indicate the 95% confidence intervals. The genomic annotations include NKX2-5 ASE-SNVs, NKX2-5 peaks, ATAC-seq peaks, H3K27ac peaks and H3K27ac ASE-SNVs. Peaks were called by combining all samples from each data type. The number of SNPs analyzed for each GWAS was: 2,516,407 SNPs for heart rate, 11,779,664 SNPs for atrial fibrillation, and 2,712,310 SNPs for PR interval.

Supplementary Figure 8 Functional characterization of candidate causal variants at four loci associated with heart rate.

(a, d, f, h) For each of the four loci, the top panel shows the regional plot of association P-values with heart rate (den Hoed, M. et al. Nat. Genet. 45, 621-631, 2013); SNPs are color coded based on r² values from the 1000 Genomes Project CEU population (Johnson, A. D. et al. Bioinformatics, 24, 2938-2939, 2008); lead GWAS variants in the locus are indicated by a diamond. The second panel shows the posterior probability of causality (PPA) of the variants in the locus calculated using fgwas, and panels three through five show epigenetic tracks from iPSC-CM combined samples (NKX2-5, ATAC-seq and H3K27ac). The bottom panel shows the Roadmap fetal heart ChromHMM and genes from UCSC genome browser (conventional ChromHMM color code). For d and h, the bottom panel shows the locus at lower scale. For d, f and h, the locations of Hi-C loops from iPSC-CM are shown in red. For the candidate causal variants (turquoise lines), the allelic imbalance (pie chart) of NKX2-5 ASE and FDR-corrected P-values are shown; for a, the altered TF motif is shown. Significant associations (P < 0.05, linear regression) between putative variant genotypes and normalized gene expression of candidate genes in iPSC-CMs from 128 different individuals from iPSCORE are shown (c, e, g, i). Boxplot elements: median (thick line), lower and upper quartiles (box), maximum and minimum (whiskers). (b) EMSA with iPSC-CM nuclear extract using probes containing both allelic variants of rs7612445. An independent replicate is shown in Supplementary Figure 9.

Supplementary Figure 9 Electrophoretic mobility shift assay (EMSA) for NKX2-5 ASE-SNVs.

(a-c) Second independent replicate of EMSA with iPSC-CMs nuclear extract using probes containing two allelic variants of rs590041 (a), rs3807989 (b), and rs7612445 (c). (d) Original (not cropped) blots of all presented EMSAs. The figures and supplementary figures where we showed the corresponding cropped versions are indicated.

Supplementary Figure 10 Experimental validation using luciferase assays and CRISPRi in iPSC-CMs.

(a) Representative fluorescence microscopy image showing that in the luciferase assays we achieved approximately 70% transfection efficiency of a GFP-over-expressing plasmid in iPSC-CMs. Efficiency was measured once. (b) Test of gRNA efficiency for CRISPR system in HEK293T cells. HEK293T were transfected with two vectors: one containing two gRNAs targeting the indicated SNP (2sgRNA-ccdB-EF1a-Puromycin) and the other expressing Cas9 (Lenti-Cas9-Blast, Addgene #52962). The gRNAs for SNPs rs590041 (SSBP3 intron) and rs3807989 (CAV1 intron) showed additional bands corresponding to targeted deletions (arrows) and were used for CRISPRi experiments in iPSC-CMs. The gRNAs for SNPs rs7612445 (GNB4) and rs8044595 (MYH11) did not show additional bands and were not used for CRISPRi. Efficiency was tested once. (c,d) qPCR expression of SSBP3 (c) or CAV1 and CAV2 (d) in iPSC-CMs (id: iPSCORE_1_57) stably expressing dCas9-KRAB (CRISPRi) and either a control guide RNA (gCTL) or two guide RNAs targeting the region encompassing rs590041 (c) or rs3807989 (d). Bars and error bars represent the mean and the standard deviation from three qPCR measurements, respectively; two-tailed t-test P-values are shown. Similar results were obtained in an independent cell line presented in the main manuscript (Fig. 5i and Fig. 7h).

Supplementary Figure 11 UCSC genome browser screenshots showing quality of ChIP-seq and ATAC-seq data.

Bedgraph tracks are shown for one representative sample from each of the 7 individuals for H3K27ac ChIP-seq: (a) iPSCs and (b) iPSC-CMs; (c) iPSCs and iPSC-CMs samples from 3 individuals are shown for ATAC-seq; and (d) iPSC-CMs for 7 individuals for NKX2-5.

Supplementary information

Supplementary Information

Supplementary Figs. 1–11, Tables 1, 3 and 7, and Note

Reporting Summary

Supplementary Table 2

Metadata and per-sample sequence data metrics

Supplementary Table 4

Motif enrichment analysis of ATAC-Seq and ChIP-Seq peaks

Supplementary Table 5

Annotation of SNVs showing ASEs in ChIP-Seq and ATAC-Seq datasets

Supplementary Table 6

Results from fgwas fine-mapping analysis of heart rate, atrial fibrillation and PR interval GWAS studies using iPSC-CM functional genomics data

About this article

Cite this article

Benaglio, P., D’Antonio-Chronowska, A., Ma, W. et al. Allele-specific NKX2-5 binding underlies multiple genetic associations with human electrocardiographic traits. Nat Genet 51, 1506–1517 (2019). https://doi.org/10.1038/s41588-019-0499-3

Received: 21 February 2018
Accepted: 15 August 2019
Published: 30 September 2019
Issue Date: October 2019
DOI: https://doi.org/10.1038/s41588-019-0499-3
