Electrocardiographic PR interval measures atrio-ventricular depolarization and conduction, and abnormal PR interval is a risk factor for atrial fibrillation and heart block. Our genome-wide association study of over 92,000 European-descent individuals identifies 44 PR interval loci (34 novel). Examination of these loci reveals known and previously not-yet-reported biological processes involved in cardiac atrial electrical activity. Genes in these loci are over-represented in cardiac disease processes including heart block and atrial fibrillation. Variants in over half of the 44 loci were associated with atrial or blood transcript expression levels, or were in high linkage disequilibrium with missense variants. Six additional loci were identified either by meta-analysis of ~105,000 African and European-descent individuals and/or by pleiotropic analyses combining PR interval with heart rate, QRS interval, and atrial fibrillation. These findings implicate developmental pathways, and identify transcription factors, ion-channel genes, and cell-junction/cell-signaling proteins in atrio-ventricular conduction, identifying potential targets for drug development.
The PR interval on the surface electrocardiogram reflects atrial and atrioventricular node myocyte depolarization and conduction. Abnormalities in PR interval duration are associated with increased risk of atrial fibrillation (AF), which carries a substantial risk of morbidity and mortality, and with cardiac conduction defects and heart block, conditions that often necessitate pacemaker implantation1. Understanding the molecular mechanisms affecting the PR interval may provide insights into cardiac electrical disease processes, and identify potential drug targets for prevention and treatment of AF and conduction disease.
Twin and family studies suggest that the heritability of PR interval is between 40 and 60%2. Prior genome-wide association studies (GWAS) in up to 30,000 individuals have identified ten loci associated with PR interval among European-descent individuals3, 4. The key motivation for the present study is to extend the biological and clinical insights derived from GWAS data by utilizing the largest sample size to detect novel PR loci genome-wide. We further increase power by performing trans-ethnic meta-analyses. To gain additional biological and clinical insights, we test for pleiotropy with other clinically relevant phenotypes. We examine the biological and functional relevance of identified associations to gain insights into molecular processes underlying clinically important phenotypes.
Our GWAS of over 92,000 European-descent individuals identifies 44 loci (34 novel) associated with PR interval. Examination of the 44 loci revealed known and novel biological processes involved in cardiac atrial electrical activity, including cardiac sodium channels, transcription factors involved in cardiac development, and sarcomeric-related proteins. Ten of the 61 non-redundant variants in these 44 loci are in high linkage disequilibrium (LD) with one or more missense variants. More than half of the index variants influence transcript expression levels as measured in the atria or in blood, with the regulation of certain genes found only in atrial tissue. Indeed, cardiac regulatory regions of the genome as measured by cardiac DNA hypersensitivity sites are enriched for variants associated with PR interval, compared to non-cardiac regulatory regions. Genes in the 44 loci are highly over-represented in a number of disease processes, including sick sinus syndrome, heart block, and AF. This motivated us to perform pleiotropic analyses, where we jointly analyze the phenotypes of PR–heart rate; PR–QRS interval (a measure of ventricular conduction); and PR–AF, and identify an additional three new pleiotropic loci. More than half of the single nucleotide polymorphisms (SNPs) identified show evidence of pleiotropy with other electrophysiologic phenotypes; SNPs that influence atrial conduction also influence ventricular conduction, atrial arrhythmias, and/or heart rate. Trans-ethnic analyses suggest that the majority of the associations derived from European-descent population are also present in African-American population. Meta-analysis examining over 100,000 individuals of African and European descent identifies five novel PR loci (two of which are also identified by pleiotropic analyses). These findings underscore the power of GWAS to extend knowledge of the molecular underpinnings of clinical processes.
PR interval meta-analysis of genome-wide association studies
We meta-analyzed ~2.7 million SNPs from GWAS data on 92,340 individuals of European ancestry from 31 studies (Supplementary Data 1 and 2) for association with PR interval using an additive genetic model. A total of 1601 SNPs mapping to 44 loci (of which 34 were novel in Europeans) reached genome-wide significance (P ≤ 5 × 10−8) (Fig. 1, Tables 1 and 2, Supplementary Figures 1 and 2). The genomic inflation factor lambda was 1.11 and LD score regression5 showed that that the inflation of the test statistic was mainly caused by true polygenicity (Supplementary Fig. 3). Using a Bayesian locus-based test of association (GWiS)6, we identified 61 non-redundant signals in the 44 loci (listed in Supplementary Data 4). For example, the top locus on chromosome 3, a known cardiac conduction locus mapping to the two cardiac sodium channel genes SCN5A and SCN10A3, 4, 7, 8, had seven non-redundant signals associated with PR interval (Fig. 2a).
Putative functional variants
To assess the functional relevance of the identified SNPs, we used the 1000G reference panel to identify variants in high LD with the index SNPs. We then examined whether those variants were either nonsynonymous variants or fell within putative regulatory regions. Ten of the 44 loci had missense variants in high LD (r2 > 0.8) with the index SNP (Tables 1 and 2, Supplementary Data 4). TTN, in particular, was enriched for missense SNPs, with the top signal and approximately one-third of the 47 genome-wide significant TTN SNPs being missense (Fig. 2b). To examine the possible impact of these amino acid substitutions on protein structure or function, we used two prediction algorithms, Sift9 and PolyPhen-210. The vast majority of the genome-wide significant missense variants at the 44 loci were categorized as tolerated by Sift and benign by PolyPhen-2, consistent with modest effects on PR interval not subjected to purifying selection (Supplementary Data 4).
Expression quantitative trait locus (eQTL) analysis using RNA-seq data suggests that index SNPs in nearly half of the identified loci (20/44) are associated with cis gene expression in cardiac left atrial appendage (LAA) tissue (n = 230) (Supplementary Data 5). Of these, we identified co-localizing variants that jointly affect both molecular expression and the PR phenotype to provide intuition regarding the candidate gene that may play a role in atrial conduction (Supplementary Data 6). Several points are worth highlighting. First, for most of the loci, the eQTL associations are for the gene nearest the index SNP, but for nearly one-quarter, they are not. Second, certain SNPs can be promiscuous in that they are associated with the transcript expression of multiple different genes (e.g., rs6599250 is associated with both SCN5A and SCN10A expression). Third, most of the eQTL associations found in cardiac atrial tissue—e.g., associations with SCN10A, MEIS1, CAV1, FAT1, and TMEM182 transcripts—were not found in whole blood samples, even at nominal significance (Supplementary Data 6), despite larger sample sizes (n = 369 GTEx blood samples using RNA-seq and n = 5311 blood samples assayed by Illumina gene expression array). Similarly, while most cardiac eQTLs were identified in both atrial and ventricular tissue, TMEM182 and CAV1 eQTL associations are identified only in atrial tissue (Supplementary Data 6). Fourth, certain SNPs are associated with transcript expression levels of different genes, depending on the tissue being examined. For instance, rs2732860 was associated with TMEM182 expression in atrial tissue but with MFSD9 expression in blood, again suggesting tissue-specificity for SNP–eQTL associations. Taken together, these data underscore the importance of examining eQTL data in tissue types relevant to the trait of interest: even with a modest study size of 230 cardiac atrial samples, a notable number of eQTL associations were uncovered.
The majority of loci (30/44) contain index SNPs that lie in, or are in high LD with, regulatory regions of the genome that are marked by deoxyribonuclease I (DNAse I) hypersensitivity sites (DHSs), lending further support to the hypothesis that regulation of gene-expression plays an important role in determining PR interval (Tables 1 and 2). To provide insight into the cellular and tissue structure of the phenotype, we examined P-values of all SNPs in the PR meta-analysis and assessed cell- and tissue-selective enrichment patterns of progressively more strongly associated variants to explore localization of signal within specific lineages or cell types. As would be expected for the cardiac phenotype of PR interval, we found enrichment of signal in cardiac DHSs compared with DHSs from other tissue types (Supplementary Fig. 5). Interestingly, the second most enriched tissue DHSs were in those that regulate microvascular endothelial cells, complementing our findings (noted below) that there is an enrichment in genes involved in blood vessel morphogenesis. These findings possibly reflect the involvement of an overlapping set of transcription factors (e.g., CAV1, NKX2-5, EFNA1, FGFR1, MEIS1, TBX5, WNT11, TBX20, ARHGAP24, and MYOCD) influencing both cardiac and vascular development during mesodermal differentiation and development.
Molecular function and biological processes of PR genes
Although extensive LD among common variants and the incompleteness of the HapMap reference panel preclude an unambiguous identification of the functional variant or the culprit gene, we used the following criteria to implicate genes in 37 of the 44 loci: (1) the gene selected was the only nearby gene (within a ±500 kb window); (2) the gene selected has a missense variant in high LD (r2 > 0.8) with the index SNP; or (3) the index SNP was associated with cardiac transcript expression levels of the selected gene (Tables 1 and 2). The set of implicated genes, detailed in Supplementary Note 3, showed strong enrichment for genes (permutation false discovery rate (FDR) < 1.0 × 10−4) involved in cardiac development, specifically the cardiac chambers and His-Purkinje system development (Supplementary Data 3). Other notable biological processes include the development of the vasculature and cardiac myocyte cell differentiation. The molecular function and cellular component of the identified genes were largely enriched for transcription factors, ion-channel related genes, and cell junction/cell signaling proteins (Supplementary Data 3).
Clinical relevance of PR-associated loci
To examine the clinical relevance of the identified loci, we intersected the PR genes with gene membership from multiple knowledge bases encompassing over 4000 human diseases. The most highly over-represented conditions (permutation FDR < 1.0 × 10−3) are heart diseases including congenital abnormalities and heart failure, sick sinus syndrome and sinus arrhythmia (phenotypes related to the sinus node which houses the pacemaker cells that generate heartbeats), heart block (related to cardiac conduction between atria and ventricles), and AF (Supplementary Data 7).
We examined the significant PR-associated SNPs for their effect on heart rate11, QRS interval (measure of ventricular conduction)12, and AF13. All 61 non-redundant SNPs from 44 independent loci were examined. Over half of the non-redundant SNPs (31/61) representing 20 loci were also associated with at least one of the other electrical phenotypes (Supplementary Data 8, Fig. 3). The cardiac sodium channel genes, SCN5A and SCN10A, clearly play a critical role in cardiac electrophysiology. PR prolonging variants in these genes are also associated with prolonged QRS duration, and some though not all variants at this locus are associated with lower heart rate, and lower risk of AF (Fig. 3). The role of transcription factors in cardiac electrophysiology is equally complex. Several T-box containing transcription factors, important for cardiac conduction system formation in the developing heart, are associated with PR interval. Although TBX3 and TBX5 sit close together on chromosome 12, the top PR prolonging allele in TBX5 prolongs QRS and decreases AF risk while the top PR prolonging allele in TBX3 shortens QRS duration while also decreasing AF risk. The PR prolonging variant near TBX20 prolongs QRS duration but is not associated with AF risk (Fig. 3). Overall, eight of the 13 transcription factor genes associated with PR interval were also associated with at least one other atrial or atrioventricular electrical phenotype.
PR interval prolongation may reflect conduction disease, and prolonged PR interval is a risk factor for pacemaker implantation (~25% increase in the risk of pacemaker implantation for each 10 ms increase in PR duration above the median in the Copenhagen Study)14. We examined whether PR prolonging variants were associated with higher risk of pacemaker implantation among ~370,000 individuals from the UKBiobank, of whom 1074 require pacemaker implantation. Using Mendelian randomization, we show that while prolongation of PR interval is causally related to pacemaker implantation, the MR estimate of the causal effect is smaller (OR = 1.14/10 ms increase in PR interval) than the effect size seen observationally for PR on pacemaker implantation, suggesting that acquired factors such as heart disease may also play an important role (Supplementary Fig. 6).
PR and QRS intervals
Many loci regulate both atrial/atrioventricular (PR interval) and ventricular (QRS) depolarization and conduction: 12 of the 44 PR loci were nominally associated with QRS duration (Supplementary Data 7) and, similarly, 17 of 22 previously identified QRS loci were at least nominally associated with PR interval (Supplementary Data 9). Several intriguing findings are worth highlighting. First, while SNPs in most loci that are associated with prolonged PR are also associated with prolonged QRS, two loci have genome-wide significant discordant PR–QRS relationships, in which prolonged PR variants are associated with shorter QRS duration (TBX3 and SNORD56B); Supplementary Data 8, Fig. 3, Supplementary Fig. 7b. Second, although TBX20 plays a crucial role in the development of the cardiac conduction system, the SNPs that are associated with atrial and atrioventricular conduction (PR) differ from those related to ventricular conduction (QRS) (index SNP PR rs11763856, index SNP QRS rs1419856, r2 = 0.001). A better understanding of the influence of these specific regions on cardiac conduction will require further investigation.
PR interval and atrial fibrillation
One-third (18/61) of PR index SNPs were nominally associated with AF. For six of these SNPs, the alleles associated with prolonged PR are associated with increased AF risk, whereas for 12, the alleles associated with prolonged PR are paradoxically associated with lower AF risk. This observation is consistent with the relationship between PR interval and AF described in population studies, where we showed that while both short (<120 ms) and long (>200 ms) PR intervals are associated with increased AF risk, short PR interval is associated with higher risk than long PR interval13. For both concordant (meaning relationships where the PR prolonging variant is associated with increased AF risk) and discordant PR–AF relationships, the larger the SNP effect size for PR interval, the larger the odds ratio for association with AF (Supplementary Fig. 7a). The CAV1 index SNP associated with increased PR interval and decreased AF risk reached genome-wide significance for both phenotypes. Furthermore, of 23 previously described AF GWAS loci, 11 were at least nominally associated with PR interval15. Interestingly, despite adequate power to identify modest associations, several loci, including PITX2, the most significant AF GWAS locus, showed no association with PR interval, though a prior report found modest nominal association with P-wave duration (Supplementary Data 9)16. Therefore, these loci may have a mode of action independent of atrial and atrioventricular depolarization or conduction.
PR interval and heart rate
Ten PR loci were nominally associated with heart rate, including two sarcomeric proteins, MYH6 and TTN. At the MYH6 locus, a known heart rate locus4, variant rs365990 is associated with prolonged PR interval and with slower heart rates, whereas a non-redundant second MYH6 signal (<20 kb away; rs11465506), the allele associated with prolonged PR is associated with faster heart rates. We then examined heart rate SNPs for association with PR and found half of the heart rate SNPs were associated with PR interval, with both concordant and discordant effects. Adjusting for heart rate in the regression model did not impact the effect size or significance of the PR–genotype associations, even though resting heart rate is modestly associated with PR interval (Supplementary Fig. 8).
Cross-trait genome-wide meta-analyses
Finally, we performed joint phenotype analyses, with PR–heart rate, PR–QRS, and PR–AF as outcomes, to increase the power of finding loci involved in cardiac electrical activity. As described above, prolonged PR variants can have either a concordant or discordant association with another electrical phenotype. Therefore, we modeled the outcome for each joint analysis in two ways: with a variant having a concordant effect on PR–QRS, PR–HR, and PR–AF, and a discordant effect (Supplementary Figures 6a–c). These analyses yielded three novel loci associated with atrial electrical activity: one related to atrial and ventricular conduction (from PR–QRS analyses) and two related to atrial electrical activity and arrhythmias (from PR–AF analyses); (Supplementary Tables 1 and 2, Supplementary Note 3, Supplementary Fig. 9). Additional support for association of these loci were obtained by an analysis limited to cardiac DHS sites, and by trans-ethnic meta-analysis with African Americans, described below, lending further support to the validity of these associations (Supplementary Table 1, Supplementary Fig. 9).
Our study had less power to find associations among African Americans (n = 13,415) than among European-descent individuals (n = 92,340). Nonetheless, 16 of the 44 European-identified loci nominally replicated among African Americans, suggesting that a large proportion of genetic associations with PR interval are shared between the two ethnic groups (Supplementary Data 9). For European-descent GWAS PR SNPs at least nominally associated with PR among African Americans, the estimated effect was in the same direction for the two population (Supplementary Fig. 7d).
Examining only the index signal may underestimate the true number of locus associations that replicate. Differences in LD structure between the genomes of individuals of European descent and those of African American descent cause dissimilar patterns of SNPs associated with PR interval. For instance, the TBX5 locus index SNP rs6489953 is part of a large LD block associated with PR interval among individuals of European descent. This SNP is not significantly associated with PR interval among African Americans (beta = 0.04, P = 0.90, Supplementary Data 8, Fig. 2c). There is, however, a strong SNP–PR association signal in the TBX5 among African Americans (index SNP rs7955405, beta = 1.16, P = 9.2 × 10−16 in African Americans), Fig. 2c. This SNP is in high LD with rs6489953 among European descent individuals (HapMap CEU r2 = 0.62), but not among the population from African descent (HapMap YRI r2 = 0.03). Hence, examination of only the top European descent index signal would miss the association among African Americans. Furthermore, interrogation of the TBX5 locus among African Americans narrows the association block, allowing for fine mapping of the association signal (Fig. 2c). A second noteworthy interethnic difference is that there are SNP associations among those of European descent, for instance, rs1896312 in TBX3, where despite adequate power, no association could be established among African Americans (Fig. 2c).
Our trans-ethnic GWAS meta-analysis of PR interval among 13,415 African Americans and 92,340 European-ancestry individuals identified five additional novel loci associated with atrial and atrioventricular conduction (PR interval) (Supplementary Table 1, Supplementary Fig. 9).
Our GWAS meta-analytic study of over 92,000 individuals of European ancestry identified 44 loci associated with cardiac atrial and atrioventricular conduction (PR interval). The implicated genes at these loci show strong enrichment for genes involved in processes related to cardiac conduction, namely, cardiovascular system development and, specifically, in development of the cardiac chamber and bundle of His. Similarly, diseases overrepresented by these genes are processes related to arrhythmias and heart block, consistent with the current knowledge of the physiology and epidemiology of cardiac atrial conduction.
Using HapMap17 imputation, we tested over 2.7 million SNPs, and while we did not directly test all common variants with this approach, nor did we test low-frequency variants (with minor allele frequencies below 1%), we identified many index SNPs in LD with functional variants, either through amino acid changes or involvement in gene regulation. For most newly identified loci, we are able to pinpoint a gene that potentially may be causative, either because the index SNP (or a SNP in high LD with it) is a missense variant in the gene, or because it regulates the expression of the gene. Regulation of gene expression can be tissue-specific, and our findings underscore the importance of examining eQTL data in tissue types relevant to the trait of interest.
A total of 34 novel loci were identified for PR interval in Europeans. Several have been identified previously in a related phenotype or in a different ancestral population, complementing our findings. Two loci, EFHA1 and LRCH1, were previously identified for association with the PR segment18. In addition, the novel locus CAMK2D was found to be associated with P-wave duration, and MYH6 with P-wave duration and P-wave terminal force19. The ID2 locus on chromosome 2 was found in a GWAS on PR interval in Hispanic/Latino population20. A locus that was identified in two studies in Asian population21, 22, SLC8A1, did not reach genome-wide significance in our meta-analysis, but was suggestive with the strongest SNP being rs13026826 (beta for A-allele: 0.278, P = 1.036 × 10−6).
Contrasting meta-analyzed association results from European descent individuals with results from a smaller sample of African Americans, we find that, with few exceptions, a large proportion of genetic associations with PR interval are shared between the two ethnic groups. We then combined data from Europeans and African Americans in a trans-ethnic meta-analysis, allowing us to find additional loci. With over 105,000 samples in total, our power was ~80% to find a significant association at common variants that explain ~0.04% of the variance in PR interval. Future studies should examine sequence or other data that provide better assessment of rare and common functional variants, as was done previously for SCN5A7.
We also combined our results on PR interval with previously published results on heart rate, QRS duration, and AF, and identified loci contributing to atrial arrhythmias and cardiac conduction. We observed significant pleiotropy of effect of these SNPs, with over half of the SNPs associated with PR interval (atrial conduction) in the study also associated with ventricular conduction (QRS interval), atrial arrhythmias (AF), and heart rate (RR interval).
Our series of GWAS, including transethnic and cross-trait meta-analytic studies, has identified 50 loci, 40 of which are novel, associated with cardiac atrial and atrioventricular electrical activity among individuals of European and African ancestry. Understanding the biology of a trait in this way provides insight into related disease processes and may help identify potential approaches to drug therapy.
Meta-analysis of PR interval
We included 32 cohorts comprising 92,340 individuals. Ethical review boards of the respective cohorts approved the studies and informed consent was obtained from all subjects. Detailed information on the participating cohorts is in Supplementary Note 1. Each cohort conducted a GWAS on PR-interval measured on baseline EKG recordings of healthy individuals. Subjects were excluded from analysis based on a set of criteria employed by all participating cohorts, including presence of AF on the baseline EKG, a history of myocardial infarction or heart failure, extreme PR values (≤80 ms or ≥ 320 ms), Wolff–Parkinson–White syndrome, pacemaker implantation, the use of class I and III blocking medications (ATC code prefix C01B) or digoxin, and pregnancy. Age, sex, height, body mass index, and principal components were included as covariates, as well as study site if applicable. We did not exclude or correct for beta blocker and/or calcium channel blocker use. As a sensitivity analysis, we further adjusted for these variables in the largest cohort, ARIC. No appreciable change was observed in the effect estimates (r > 0.99). Analyses were stratified by ethnicity to maintain a homogeneous population with similar LD patterns across cohorts. Low-quality SNPs were removed based on stringent quality control criteria and untyped SNPs were imputed using HapMap 2 as reference panel prior to the association analysis.
Summary level data from all cohorts were collected and stringent quality control was applied to the data, removing SNPs with extreme beta and/or standard error values, or with poor imputation quality. Per cohort, SNPs with an imputation quality below 0.1 or above 1.1 were removed, as well as SNPs with a beta or standard error greater than 1000 or less than −1000. Next, SNPs were removed based on manual examination of quantile–quantile plots stratified for minor allele frequency and for imputation quality; SNPs in strata with early departure from the null were excluded for that specific cohort. Remaining SNPs were meta-analyzed using an inverse-variance fixed effects model23, correcting for per cohort inflation factors (lambda). Two cohorts, AGES and deCODE, contained a small percentage of overlapping samples: approximately 5% based on projections as well as based on Z-statistics from each study using the program METAL (https://genome.sph.umich.edu/w/images/7/7b/METAL_sample_overlap_method_2017-11-15.pdf). To account for this overlap and to adjust for a corresponding inflation of the test statistic, we separately meta-analyzed AGES and deCODE, corrected for the corresponding genomic inflation factor (1.089), and included all corrected association results into the overall meta-analysis.
We conducted the meta-analysis by using three independent analysts and two different software packages: MANTEL (http://debakker.med.harvard.edu/resources.html) and METAL (http://www.sph.umich.edu/csg/abecasis/metal/). All results were extremely concordant, reflecting a robust analysis. In total, 2,712,613 SNPs were tested for association with PR interval. Results were considered statistically significant at a P = 5 × 10−8, a figure that reflects the estimated testing burden of one million independent SNPs in samples of European ancestry24. Regions harboring association signals were visualized using SNAP25.
To identify non-redundant association signals within each locus and calculate the variance explained, we implemented the GWiS method, which aggregates the statistical support for multiple independent effects at a locus using a reference LD matrix6. A locus is defined as the genomic region flanked by the 5′ and 3′ most genome-wide significant signal, plus 100 KB of flanking sequence on each end. For each locus, GWiS uses Bayesian model selection to find the number of independent effects and the SNPs that best tag them, choosing the SNPs that maximize the posterior probability in a greedy search. In each step, the SNP that gives the greatest increase in the posterior probability is added into the model, and this step is repeated until no more SNPs increase the posterior probability.
The SNPs selected by the Bayesian model selection are then used in a multivariate linear regression to calculate the variance explained. We modified the original implementation of GWiS to use the meta-analysis results as input26.
GWiS was applied to the GWAS meta-analysis, making use of pairwise SNP r2 estimates from the ARIC study. GWiS estimated 61 non-redundant signals of association at the 40 genome-wide significant loci (Supplementary Data 4).
Gene selection, gene enrichment, clinical relevance
Although extensive LD among common variants and the incompleteness of the HapMap reference panel preclude an unambiguous identification of the functional variant or the culprit gene, we used the following criteria to implicate genes in 37 of the 44 loci: (1) the gene selected was the only nearby gene (within a ±500 kb window); (2) the gene selected has a missense variant in high LD (r2 > 0.8) with the index SNP; or (3) the index SNP was associated with cardiac transcript expression levels of the selected gene.
We performed over-representation enrichment analysis on PR genes relative to the entire human genome by leveraging several human disease knowledge bases including PharmGKB (~3500 disorders, www.pharmgkb.org), Human Phenotype Ontology (~4000 common diseases, http://human-phenotype-ontology.github.io/), and DisGeNET (http://www.disgenet.org). The analysis used the program WebGestalt (www.webgestalt.org), which computed over-representation P-values based on hypergeometric distributions27. To further increase our confidence in gene set analysis, we also applied gene set enrichment analysis (GSEA) to the entire GWAS gene list rank ordered based on their association P-values. We used the latest GO dataset available at the Molecular Signatures Database (v6.1). We performed 1000 random permutations and used an FDR < 0.01 threshold to identify enriched GO categories. The highly enriched GO annotations identified using the parametric approach were also significant based on the permutation method, and we report only GO categories that were significantly enriched at FDR < 0.01 common to both the parametric and nonparametric procedures.
Multiple hypothesis testing was addressed using Benjamini–Hochberg’s FDR adjustment of the enrichment P-values, and an FDR threshold < 0.01 was used to designate significantly over-represented disease states. We applied the same approach using WebGestalt to identify enriched functional categories (FDR < 0.01) based on Gene Ontology annotations of molecular function, cellular component, and biologic process28.
Functional variants in significant loci
We annotated the 61 index SNPs and nearby SNPs in LD with the index SNPs (within 1 Mb and with r2 > 0.8 in 1000 Genomes Phase 1 CEU) and tested all non-synonymous SNPs with PolyPhen-210 and SIFT9 to predict the functional implication.
We performed an eQTL analysis using the 61 PR index SNPs. We examined eQTL associations from LAA and validated findings in tissue from right atrial appendage (RAA). For comparison, we also evaluated left ventricular tissue as well as peripheral whole blood.
Human LAA tissue was obtained with consent from 223 European-American patients undergoing cardiac surgery for treatment of AF, valvular, and/or coronary artery disease. Use of discarded surgical tissue was approved by the Institutional Review Board of the Cleveland Clinic. Before 2008, verbal consent was obtained and documented in the medical records in a process approved by the Cleveland Clinic Institutional Review Board. From 2008 onward, patients provided separate Institutional Review Board-approved written informed consent. AF history, type of AF, and other clinical data were collected in a research database. LAA tissue was also obtained from 12 non-failing donor hearts not used for transplant with written consent for research use provided by the family. Donor information included age, race, and sex. The Cleveland Clinic Institutional Review Board approved the tissue studies included in this report. Demographic characteristics of the study population have been summarized Table 1, “subjects of European descent” column29.
LAA specimens were snap frozen in liquid nitrogen and stored at −80 °C. Total RNA was extracted 50–100 mg tissue using the Trizol protocol. Tissue was homogenized with sterile Omni Tip Disposable Generator Probes (Omni International). RNA processing and sequencing and DNA genotyping have been described29. Library generation for RNAseq was done at the University of Chicago Genomics Facility using standard Illumina protocols. Samples were filtered based on RNA quality. Unstranded 100-bp paired-end sequencing was performed on the Illumina HiSeq 2000 platform and multiplexed to six samples across two lanes. Samples were demultiplexed and aligned to hg19 using TopHat (v2.0.4)11 with the default options. Reads from exactly matched PCR duplicates were marked using Picard tools (https://broadinstitute.github.io/picard/) and excluded from further analysis. Sequence reads were mapped to the human genome to derive a digital count of the expression of genes, which were defined using the Ensembl (version 71) gene catalog.
DNA was extracted from 25–50 mg homogenized LAA tissue (as above) using the DNAzol protocol. DNA was genotyped using Illumina Hap550v3 and Hap610-quad SNP microarrays. SNP data were imputed to 1000 Genomes Project phase 2 yielding ≈19 million SNPs, using IMPUTE10 after filtering out variants falling below 0.5 on IMPUTE’s information statistic. For the eQTL analysis, we excluded SNPs with <5% minor allele frequency, resulting in roughly 6.8 million SNPs.
Methods for LAA eQTL analysis have been described29. Expression counts were obtained from aligned files using htseq counts against the human Ensembl gene annotation. On average, 26 million paired-end read fragments aligned to this annotation of the transcriptome across all of our samples. Reads were quantile-normalized, and gene counts for eQTL analysis were variance-stabilized transformed using the R package Deseq2. Expression of each gene was adjusted by the following covariates: sex, genetic substructure based on four multidimensional scaling factors, and 25 expression surrogate variable analysis (SVA) covariates. The SVA method is similar to principal component analysis, which uses unsupervised mathematical models to separate out the high variance components in high dimensional data. Thus, without manual normalization, the SVA method corrects for potential large effectors of gene expression such as read-depth, batch effects, and other technical variables, as well as environmental and disease effects such as AF status, history of structural heart disease, coronary artery disease, etc. Surrogate variables were calculated from the variance-stabilizing transformation data using the sva package. eQTL analyses were performed using MatrixeQTL (2.1.0) to test associations between genotype and variance-stabilizing transformation counts. β-coefficients were calculated as the additive effect of 1 allelic difference on log2 gene expression. The QVALUE package was used to estimate FDR from the complete list of cis-eQTL SNP/genome-wide expressed gene pairs P values. Linear regression and Q–Q plot comparison of the LAA eQTLs with selected tissues were performed using the version 6p analysis of GTEx project.
We also performed eQTL associations in 5311 samples from peripheral blood. Those methods have also been described previously30. In brief, Illumina Gene expression data for each dataset was obtained and sequences mapped against the human genome build 36 (Ensembl build 54, Hg18). Highly stringent alignment criteria were used to ensure that probes map unequivocally to one single genomic position. Genotype data was acquired using different genotyping platforms, and harmonized by imputation (HapMap2 CEU).
Gene expression data was quantile-normalized to the median distribution, and subsequently log2 transformed. The probe and sample means were centered to zero. Gene expression data was then corrected for possible population structure by removal of four multi-dimensional scaling components using linear regression.
After normalization of the data, we performed cis-eQTL mapping. eQTLs were deemed cis-eQTLs when the distance between the SNP chromosomal position and the probe midpoint was less than 250 kilobases (kb). eQTLs were mapped using a Spearman’s rank correlation on the imputed genotype dosage values. We used a weighted Z-method for subsequent meta-analysis. We permuted the sample identifiers labels of the expression data and repeated this analysis ten times. In each permutation, the sample labels were permuted. We then corrected for multiple testing by controlling the FDR at 0.05, by testing each P-value in the real data against a null-distribution created from the permuted datasets.
Significant associations from LAA eQTL analyses were replicated using pre-calculated eQTLs from GTEx. We accessed full single tissue cis-eQTL analyses for left ventricle, RAA, and whole blood from GTEx v7, accessed on 22nd March, 2018. Samples with genotype and expression data for eQTL analyses were n = 272, n = 264, and n = 369, for left ventricle, RAA, and whole blood, respectively.
For each of the 61 nonredundant SNPs in the 44 independent loci, we identified the probes/genes for which there was a cis-eQTL association. We also identified the most significant SNP (eSNP) associated with that gene. Frequently, the eSNP and our SNP of interest were in high or near perfect LD and represented the same signal (see Supplementary Figures 2 and 4 for example of MEIS1 demonstrating strong co-localization).
Our aim for identifying co-localizing genetic variants that jointly affect both molecular expression and the PR phenotype is to provide intuition regarding the candidate gene that may play a role in atrioventricular conduction. In any given locus, we identify a candidate gene from eQTL data if it meets the following three criteria: (1) the SNP–transcript association in LAA is significant at a threshold of genome-wide q < 0.05; (2) there is evidence of co-localization in that the PR-GWAS index SNP and the top LAA SNP are in high LD (>0.90) OR there is evidence that the association remains significant in conditional analysis examining the PR-GWAS index SNP adjusted for the top LAA SNP (P < 0.01); and (3) the findings from the LAA replicate (P < 0.05) in GTEx data from the RAA. It is important to note that our replication tissue from GTEx is RAA whereas our discovery tissue is LAA. While we only claim as candidate genes those that replicate, differences in eQTL associations between LAA and RAA are nonetheless interesting and noted in Supplementary Data 6.
At successively more stringent P-value thresholds, SNPs were evaluated for enrichment in tissue-specific DNAse I hypersensitive sites. SNPs from each PR association P-value bin were intersected with the complete set of DHS false discovery 5% hotspot regions identified in any of the 349 tissue or cell line samples available from Maurano et al.31. Intersections between GWAS SNPs and DHS regions were computed using the BEDOPS32 software. Fold enrichment was calculated by comparing the proportion of SNPs within each P-value bin to the background rate of all GWAS variants falling within the DHS sites for each tissue separately.
To search for additional loci involved in PR interval, results of a published GWAS on PR interval in African Americans33 were combined with our GWAS meta-analysis results in Europeans using inverse variance weighted fixed-effect models, correcting for the inflation factor of both cohorts. New loci were called if they reached statistical significance at a P ≤ 5 × 10−8, and if this locus was not significantly associated with PR interval in Europeans or African Americans separately (i.e., if none of the SNPs within one Megabase of the tested SNP reached P ≤ 5 × 10−8 in any of the population). SNP look-ups of index SNPs in Europeans were performed in African American GWAS results, to test for overlapping signals in both ethnicities that were not observed in African Americans because of the relatively low sample size (n = 13,415).
For the joint analysis of PR and AF, beta estimates and standard errors were used to generate z-scores (beta/se), which were then combined as (zPR + zAF)/sqrt(2) to identify genetic variants that both increase PR interval and risk for AF, and as (zPR−zAF)/sqrt(2) to identify genetic variants that increase PR interval, but decrease the risk for AF. Genome-wide significance was set at 8.3 × 10−9, to account for the six tests performed across the three traits that were meta-analyzed with PR interval. Only loci that did not contain variants genome-wide significant separately for PR or AF were concerned novel.
To search for additional loci involved in atrial and ventricular cardiac conduction, we meta-analyzed our PR interval GWAS results with previously published QRS duration8 and RR interval11 results, respectively. We used sample size (z-score) weighted models to identify variants that increase both PR interval and the second trait tested (either QRS duration or RR interval) and variants that increase PR interval but decrease risk for the second trait.
Genomic inflation factor lambda was 1.02 for concordant PR–QRS, 0.98 for discordant PR–QRS, 1.01 for concordant PR–RR, and 0.98 for discordant PR–RR. Therefore, we did not correct for these lambdas, even though the meta-analyses contain overlapping samples.
New loci were called if they reached statistical significance at a P ≤ 8.3 × 10−9, and if this locus was not significantly associated with PR interval nor with the second trait tested. SNP look-ups of index SNPs in PR interval were performed in QRS duration and RR interval, and also the other way around (QRS duration and RR interval index SNPs in PR interval) to test for overlapping signals.
Association with pacemaker implantation
Association with pacemaker implantation was determined in the UKBiobank data for the 61 SNPs independently associated with PR interval. Samples were limited to unrelated whites of British ancestry (~370,000 samples), of whom 1074 had a pacemaker implanted. A logistic regression model was run, including covariates for age, sex, and 40 PCs to account for potential population substructure or other potential confounding. Inverse variance weights (IVW) Mendelian randomization was performed using the “MendelianRandomization” package in R. Results were consistent with those produced by MR-EGGER and MR-Median regression.
The full meta-analysis results are available for download through the CHARGE repository in dbGaP: http://www.chargeconsortium.com/main/results
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Acknowledgements per cohort are listed in Supplementary Note 2. We thank the following studies for sharing their summary level results: QRS voltage (van der Harst et al., 2016), heart rate (den Hoed et al., 2013), RR interval (Eijgelsheim et al., 2010), atrial fibrillation (Christophersen et al., 2017), and CARe-COGENT AA PR Consortium (Butler et al., 2012). We acknowledge Dr. Vinicius Tragante for his help generating the author list.
About this article
Genome-wide association meta-analysis of 30,000 samples identifies seven novel loci for quantitative ECG traits
European Journal of Human Genetics (2019)