Introduction

The electrocardiogram is among the most common clinical tests ordered to assess cardiac abnormalities. Reproducible waveforms indicating discrete electrophysiologic processes were described over 100 years ago, yet the biological underpinnings of conduction and repolarization remain incompletely defined. The electrocardiographic PR interval reflects conduction from the atria to ventricles, across specialized conduction tissues such as the atrioventricular node and the His-Purkinje system. Pathological variation in the PR interval may indicate heart block or pre-excitation, both of which can lead to sudden death1. The PR interval also serves as a risk factor for atrial fibrillation and cardiovascular mortality1,2,3. Prior genetic association studies have identified 64 PR interval loci4,5,6,7,8,9,10,11,12,13. Yet the underlying biological mechanisms of atrioventricular conduction and relationships between genetic predisposition to PR interval duration and disease are incompletely characterized.

To enhance our understanding of the genetic and biological mechanisms of atrioventricular conduction, we perform genome-wide association study (GWAS) meta-analyses of PR interval duration using autosomal and X chromosome variants, mainly imputed with the 1000 Genomes Project reference panel (http://www.internationalgenome.org)14. We then conduct downstream in silico analyses to elucidate candidate genes and key pathways, and examine relationships between genetic variants linked to PR interval duration and cardiovascular disease in the UK Biobank (UKB; https://www.ukbiobank.ac.uk). Over 200 loci are genome-wide significant, and our results imply key regulation processes for atrioventricular conduction, and candidate genes in cardiac muscle development/contraction and the cytoskeleton. We observe associations between polygenic predisposition to PR interval duration and distal conduction disease, atrial fibrillation (AF), and atrioventricular pre-excitation. Our findings highlight the polygenic basis of atrioventricular conduction, and the genetic relationship between PR interval duration and other cardiovascular diseases.

Results

Meta-analysis of GWASs

We performed a primary meta-analysis including 293,051 individuals of European (92.6%), African (2.7%), Hispanic (4%), and Brazilian (<1%) ancestries from 40 studies (Supplementary Data 1 and 2, Supplementary Table 1). We also performed ancestry-specific meta-analyses (Fig. 1). A total of 202 genome-wide significant loci (P < 5 × 10−8) were identified in the multi-ancestry analysis, of which 141 were not previously reported (Supplementary Data 3, Fig. 2, Supplementary Figs. 1 and 2). We considered for discovery only variants present in >60% of the maximum sample size in the GWAS summary results, a filtering criterion used to ensure robustness of associated loci (median proportion of sample size included in analyses for lead variants 1.0, interquartile range 0.99–1.00; Methods). There was strong support in our data for all 64 previously reported loci (61 at P < 5 × 10−8 and 3 at P < 1.1 × 10−4; Supplementary Data 4 and 5). In a secondary analysis among the European ancestry subset, a total of 127 loci not previously reported reached genome-wide significance (Supplementary Data 6, Supplementary Figs. 1–4), of which lead variants at 8 loci were borderline genome-wide significant (P < 9.1 × 10−7) in our multi-ancestry meta-analysis. None of the previously unreported loci were genome-wide significant in African or Hispanic/Latino ancestry meta-analyses (Supplementary Data 7, Supplementary Figs. 1 and 3). We observed no genome-wide significant loci in the X chromosome meta-analyses (Supplementary Fig. 5). In sensitivity analyses, we examined the rank-based inverse normal transformed residuals of PR interval. Results of absolute and transformed trait meta-analyses were highly correlated (ρ > 0.94, Supplementary Data 8–10, Supplementary Figs. 6 and 7).

Fig. 1: Overview of the study design.

An overview of contributing studies, single-stage discovery approach, and downstream bioinformatics and in silico annotations performed to link variants to genes, and polygenic risk score analysis to link variants to cardiovascular disease risk is illustrated. Asterisk (*) The multi-ancestry meta-analysis is our primary analysis. Previously not reported loci were identified from the multi-ancestry meta-analysis. Ancestry specific and chromosome X meta-analysis are secondary. Hash (#) For bioinformatics and in silico annotations we also included loci that reached genome-wide significance in European only meta-analysis (N = 8) and were borderline genome-wide significant in the multi-ancestry meta-analysis.

Fig. 2: Manhattan plot of the multi-ancestry meta-analysis for PR interval.

P values are plotted on the -log10 scale for all variants present in at least 60% of the maximum sample size from the fixed-effects meta-analysis of 293,051 individuals from multiple ancestries (multi-ancestry meta-analysis). Associations of genome-wide significant (P < 5 × 10−8) variants at previously not reported (N = 141) and previously reported loci (N = 61) are plotted in dark and light blue colors respectively.

By applying joint and conditional analyses in the European meta-analysis data, we identified multiple independently associated variants (Pjoint < 5 × 10−8 and r2 < 0.1) at 12 previously not reported and 25 previously reported loci (Supplementary Data 11). The overall variant-based heritability (h2g) for the PR interval estimated in 59,097 unrelated European participants from the UKB with electrocardiograms was 18.2% (Methods). In the UKB, the proportion of h2g explained by variation at all loci discovered in our analysis was 62.6%, compared with 33.5% when considering previously reported loci only.

We annotated variants at 149 loci (141 previously not reported loci from the multi-ancestry meta-analysis and 8 loci from the meta-analysis of European ancestry subset). The majority of the lead variants at the 149 loci were common (minor allele frequency, MAF > 5%). We observed 6 low-frequency (MAF 1–5%) variants, and one rare (MAF < 1%) predicted damaging missense variant. The rare variant (rs35816944, p.Ser171Leu) is in SPSB3 encoding SplA/Ryanodine Receptor Domain and SOCS Box-containing 3. SPSB3 is involved in degradation of the transcription factor SNAIL, which regulates the epithelial-mesenchymal transition15, and has not been previously associated with cardiovascular traits. At MYH6, a previously described locus for PR interval6,10, sick sinus syndrome16, AF and other cardiovascular traits17, we observed a previously not reported predicted damaging missense variant in MYH6 (rs28711516, p.Gly56Arg). MYH6 encodes the α-heavy chain subunit of cardiac myosin. In total, we identified missense variants in genes at 11 previously not reported loci, one from the European subset meta-analysis, and 6 previously reported loci (Supplementary Data 12). These variants are a representation of multiple variants at each locus, which are in high LD, and thus may not be the causative variant.

Expression quantitative trait loci (eQTLs)

PR interval lead variants (or best proxy [r2 > 0.8]) at 43 previously not reported and 23 previously reported loci were significant cis-eQTLs (at a 5% false discovery rate [FDR]) in left ventricle (LV) and right atrial appendage (RAA) tissue samples from the Genotype-Tissue Expression (GTEx; https://gtexportal.org/home/) project18. Variants at 13 previously not reported and 6 previously reported loci were eQTLs in spleen, which was used as negative control tissue (Supplementary Data 13). The PR interval associations and eQTLs colocalized at 31 previously not reported loci and 14 previously reported loci (posterior probability [PP] > 75%). Variants at 9 previously not reported loci were significant eQTLs only in LV and RAA tissues with consistent directionality of gene expression.

Predicted gene expression

In an exploratory analysis, we also performed a transcriptome-wide analysis to evaluate associations between predicted gene expression in LV and RAA with the PR interval. We identified 113 genes meeting our significance threshold (P < 3.1 × 10−6, after Bonferroni correction), of which 91 were localized at PR interval loci (within 500 kb from a lead variant; Supplementary Data 14, Supplementary Fig. 8). Longer PR interval duration was associated with decreased levels of predicted gene expression for 57 genes, and increased levels for 56 genes (Fig. 3). In spleen tissues, only 31 gene expression-PR interval associations were detected, and 19 of them did not overlap with the findings in heart tissues.

Fig. 3: Plausible candidate genes of PR interval from S-PrediXcan.

Left: Diagram of standard electrocardiographic intervals and the heart. The electrocardiographic features are illustratively aligned with the corresponding cardiac conduction system structures (orange) reflected on the tracing. The PR interval (labeled) indicates conduction through the atria, atrioventricular node, His bundle, and Purkinje fibers. Right: Supplementary Data 14 shows 113 genes whose expression in the left ventricle (N = 233) or right atrial appendage (N = 231) was associated with PR interval duration in a transcriptome-wide analysis using S-PrediXcan and GTEx v7. Displayed genes include those with significant associations after Bonferroni correction for all tested genes (P < 3.1 × 10−6). Longer PR intervals were associated with increased predicted expression of 56 genes (blue) and reduced expression of 57 genes (orange).

Regulatory annotation of loci

Most PR interval variants were annotated as non-coding. Therefore, we explored whether associated variants or proxies were located in transcriptionally active genomic regions. We observed enrichment for DNase I-hypersensitive sites in fetal heart tissue (P < 9.36 × 10−5, Supplementary Fig. 9). Analysis of chromatin states indicated variants at 97 previously not reported, 6 European, and 52 previously reported loci were located within regulatory elements that are present in heart tissues (Supplementary Data 15), providing support for gene regulatory mechanisms in specifying the PR interval. To identify distal candidate genes at PR interval loci, we assessed the same set of variants for chromatin interactions in a LV tissue Hi-C dataset19. Forty-eight target genes were identified (Supplementary Data 16). Variants at 35 previously not reported and 3 European loci were associated with other traits, including AF and coronary heart disease (Supplementary Data 17, Supplementary Fig. 10).

In silico functional annotation and pathway analysis

Bioinformatics and in silico functional annotations for potential candidate genes at the 149 loci are summarized in Supplementary Data 18 and 19. Using a prior GWAS of AF20,21, we identified variants with shared associations between PR interval duration and AF risk (Supplementary Fig. 11). Enrichment analysis of genes at PR interval loci using Data driven Expression-Prioritized Integration for Complex Traits (DEPICT: https://data.broadinstitute.org/mpg/depict/)22 indicated heart development (P = 1.87 × 10−15) and actin cytoskeleton organization (P = 2.20 × 10−15) as the most significantly enriched processes (Supplementary Data 20 and 21). Ingenuity Pathway Analysis (IPA; https://www.qiagenbioinformatics.com/products/ingenuity-pathway-analysis/) supported heart development, ion channel signaling and cell-junction/cell-signaling amongst the most significant canonical pathways (Supplementary Data 22).

Polygenic risk scores (PRSs) with cardiovascular traits

Finally, we evaluated associations between genetic predisposition to PR interval duration and 16 cardiac phenotypes chosen a priori using ~309,000 unrelated UKB European participants not included in our meta-analyses23. We created a PRS for PR interval using the European ancestry meta-analysis results (Fig. 4, Supplementary Table 2). Genetically determined PR interval prolongation was associated with higher risk of distal conduction disease (atrioventricular block; odds ratio [OR] per standard deviation 1.11, P = 7.02 × 10−8) and pacemaker implantation (OR 1.06, P = 1.5 × 10−4). In contrast, genetically determined PR interval prolongation was associated with reduced risk of AF (OR 0.95, P = 4.30 × 10−8) and was marginally associated with a reduced risk of atrioventricular pre-excitation (Wolff–Parkinson–White syndrome; OR 0.85, P = 0.003). Results were similar when using a PRS derived using the multi-ancestry meta-analysis results (Supplementary Fig. 12, Supplementary Table 2, and Supplementary Data 3).

Fig. 4: Bubble plot of phenome-wide association analysis of European ancestry PR interval polygenic risk score.

The polygenic risk score was derived from the European ancestry meta-analysis. Orange circles indicate that polygenic predisposition to longer PR interval is associated with an increased risk of the condition, whereas blue circles indicate that polygenic predisposition to longer PR interval is associated with lower risk of the condition. The darkness of the color reflects the effect size (odds ratio, OR) per 1 standard deviation (s.d.) increment of the polygenic risk score from logistic regression. Sample size (N) in each regression model is provided under X-axis. Given correlation between traits, we set significance threshold at P < 3.13 × 10−3 after Bonferroni correction (P < 0.05/16; dotted line) for the analysis and also report nominal associations (P < 0.05; dashed line).

Discussion

In a meta-analysis of nearly 300,000 individuals, we identified 202 loci underlying cardiac conduction as manifested by the electrocardiographic PR interval, of which 141 were previously not reported. Apart from confirming well-established associations in loci harboring ion-channel genes, our findings further underscore the central importance of heart development and cytoskeletal components in atrioventricular conduction10,12,13. We also highlight the role of common variation at loci harboring genes underlying monogenic forms of arrhythmias and cardiomyopathies in cardiac conduction.

We report signals in/near 12 candidate genes at previously not reported loci with functional roles in cytoskeletal assembly (DSP, DES, OBSL1, PDLIM5, LDB3, FHL2, CEFIP, SSPN, TLN2, PTK2, GJA5, and CDH2; Fig. 5). DSP and DES encode components of the cardiac desmosome, a complex involved in ionic communication between cardiomyocytes and maintenance of cellular integrity. Mutations in the desmosome are implicated in arrhythmogenic cardiomyopathy (ACM) and dilated cardiomyopathy (DCM)24,25,26,27,28. Conduction slowing is a major component of the pathophysiology of arrhythmia in ACM and other cardiomyopathies29,30. OBSL1 encodes obscurin-like 1, which together with obscurin (OBSCN) is involved in sarcomerogenesis by bridging titin (TTN) and myomesin at the M-band31. PDLIM5 encodes a scaffold protein that tethers protein kinases to the Z-disk, and has been associated with DCM in homozygous murine cardiac knockouts32. FHL2 encodes calcineurin-binding protein four and a half LIM domains 2, which is involved in cardiac development by negatively regulating calcineurin/NFAT signaling in cardiomyocytes33. Missense mutations in FHL2 have been associated with hypertrophic cardiomyopathy34. CEFIP encodes the cardiac-enriched FHL2-interacting protein located at the Z-disc, which interacts with FHL2. It is also involved in calcineurin–NFAT signaling, but its overexpression leads to cardiomyocyte hypertrophy35.

Fig. 5: Candidate genes in PR interval loci encoding proteins involved in cardiac muscle cytoskeleton.

Candidate genes or encoded proteins are indicated by a star symbol in the figure and are listed in Supplementary Data 3. More information about the genes is provided in Supplementary Data 18 and 19. This figure was created with BioRender. *Previously not reported locus, # genome-wide significant locus in transformed trait meta-analysis. 1Missense variant; 2Nearest gene to the lead variant; 3Gene within the region (r2 > 0.5); 4Variant(s) in the locus are associated with gene expression in left ventricle and/or right atrial appendage; 5Left ventricle best HiC locus interactor (RegulomeDB score ≤ 2); 6Animal model; 7Monogenic disease with a cardiovascular phenotype.

Common variants in/near genes associated with monogenic arrhythmia syndromes were also observed, suggesting these genes may also affect atrioventricular conduction and cardiovascular pathology in the general population. Apart from DSP, DES, and GJA5 discussed above, our analyses indicate 2 additional candidate genes (HCN4 and RYR2). HCN4 encodes a component of the hyperpolarization-activated cyclic nucleotide-gated potassium channel which specifies the sinoatrial pacemaker “funny” current, and is implicated in sinus node dysfunction, AF, and left ventricular noncompaction36,37,38. RYR2 encodes a calcium channel component in the cardiac sarcoplasmic reticulum and is implicated in catecholaminergic polymorphic ventricular tachycardia39.

Genes with roles in autonomic signaling in the heart (CHRM2, ADCY5) were indicated from expression analyses (Supplementary Data 13 and 18). CHRM2 encodes the M2 muscarinic cholinergic receptors that bind acetylcholine and are expressed in the heart40. Their stimulation results in inhibition of adenylate cyclase encoded by ADCY5, which in turn inhibits ion channel function. Ultimately, the signaling cascade can result in reduced levels of the pacemaker “funny” current in the sinoatrial and atrioventricular nodes, reduced L-type calcium current in all myocyte populations, and increased inwardly rectifying IK.Ach potassium current in the conduction tissues and atria causing cardiomyocyte hyperpolarization41. Stimulation has also been reported to shorten atrial action potential duration and thereby facilitate re-entry, which may lead to AF42,43,44.

By constructing PRSs, we also observed that genetically determined PR interval duration is an endophenotype for several adult-onset complex cardiovascular diseases, the most significant of which are arrhythmias and conduction disorders. For example, our findings are consistent with previous epidemiologic data supporting a U-shaped relationship between PR interval duration and AF risk2. Although aggregate genetic predisposition to PR interval prolongation is associated with reduced AF risk, top PR interval prolonging alleles are associated with decreased AF risk (e.g., localized to the SCN5A/SCN10A locus; Supplementary Fig. 11) whereas others are associated with increased AF risk (e.g., localized to the TTN locus; Supplementary Fig. 11), consistent with prior reports8. These findings suggest that genetic determinants of the PR interval may identify distinct pathophysiologic mechanisms leading to AF, perhaps via specifying differences in tissue excitability, conduction velocity, or refractoriness. Future efforts are warranted to better understand the relations between genetically determined PR interval and specific arrhythmia mechanisms.

In conclusion, our study more than triples the reported number of PR interval loci, which collectively explain ~62% of trait-related heritability. Our findings highlight important biological processes underlying atrioventricular conduction, which include both ion channel function, and specification of cytoskeletal components. Our study also indicates that common variation in Mendelian cardiovascular disease genes contributes to population-based variation in the PR interval. Lastly, we observe that genetic determinants of the PR interval provide novel insights into the etiology of several complex cardiac diseases, including AF. Collectively, our results represent a major advance in understanding the polygenic nature of cardiac conduction, and the genetic relationship between PR interval duration and arrhythmias.

Methods

Contributing studies

A total of 40 studies (Supplementary Methods) comprising 293,051 individuals of European (N = 271,570), African (N = 8,173), Hispanic (N = 12,823), and Brazilian (N = 485) ancestries contributed GWAS summary statistics for PR interval. Study-specific design, sample quality control and descriptive statistics are provided in Supplementary Tables 1–3. For the majority of the studies imputation was performed for autosomal chromosomes and X chromosome using the 1000 Genomes (1000 G: http://www.internationalgenome.org) project14 reference panel. A few studies used whole genome sequence data, and the Haplotype Reference Consortium (HRC: http://www.haplotype-reference-consortium.org)/UK10K and 1000 G phase 3 panel was used for UK Biobank (full details are provided in Supplementary Table 2).

Ethical approval

All contributing studies had study-specific ethical approvals and written informed consent. The details are provided in Supplementary Note 1.

PR interval phenotype and exclusions

The PR interval was measured in milliseconds (ms) from standard 12-lead electrocardiograms (ECGs), except in the UK Biobank where it was obtained from 4-lead ECGs (CAM-USB 6.5, Cardiosoft v6.51) recorded during a 15 second rest period prior to an exercise test (Supplementary Methods). We requested exclusion of individuals with extreme PR interval values (<80 ms or >320 ms), second/third degree heart block, AF on the ECG, or a history of myocardial infarction or heart failure, Wolff–Parkinson–White syndrome, those who had a pacemaker, individuals receiving class I and class III antiarrhythmic medications, digoxin, and pregnancy. Where data were available these exclusions were applied.

Study-level association analyses

We regressed the absolute PR interval on each genotype dosage using multiple linear regression with an additive genetic effect and adjusted for age, sex, height, body mass index, heart rate and any other study-specific covariates. To account for relatedness, linear mixed effects models were used for family studies. To account for population structure, analyses were also adjusted for principal components of ancestry derived from genotyped variants after excluding related individuals. Analyses of autosomal variants were conducted separately for each ancestry group. X chromosome analyses were performed separately for males and females. Analyses using rank-based inverse normal transformed residuals of PR interval corrected for the aforementioned covariates were also conducted. Residuals were calculated separately by ancestral group for autosomal variants, and separately for males and females for X chromosome variants.
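As an illustration of the transformation step only, the following minimal Python sketch maps a list of residuals to standard normal quantiles by rank. The offset of 0.5 in the quantile formula and the averaging of tied ranks are assumptions made for this sketch; the text does not state which variant of the rank-based inverse normal transformation the contributing studies used, and the function name is hypothetical.

```python
from statistics import NormalDist


def rank_inverse_normal(values, offset=0.5):
    """Map values to standard normal quantiles by rank.

    Tied values receive their average rank. The quantile for rank r among
    n values is inv_cdf((r - offset) / n); offset=0.5 is an assumption.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - offset) / n) for r in ranks]


# Hypothetical residuals (ms) after covariate adjustment.
transformed = rank_inverse_normal([12.0, -30.5, 4.2, 4.2, 51.0])
```

In practice the input would be the residuals from the covariate-adjusted regression, computed separately within each ancestry group (or sex, for the X chromosome) as described above.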

Centralized quality control

We performed quality control centrally for each result file using EasyQC version 11.4 (https://www.uni-regensburg.de/medizin/epidemiologie-praeventivmedizin/genetische-epidemiologie/software/#)45. We removed variants that were monomorphic, had a minor allele count (MAC) < 6, had an imputation quality metric <0.3 (imputed by MACH; http://csg.sph.umich.edu/abecasis/mach/tour/imputation.html) or <0.4 (imputed by IMPUTE2; http://mathgen.stats.ox.ac.uk/impute/impute_v2.html), had invalid or mismatched alleles, were duplicated, or were allele frequency outliers (difference > 0.2 from the allele frequency in 1000 G project). We inspected PZ plots, effect allele frequency plots, effect size distributions, QQ plots, and compared effect sizes in each study to effect sizes from prior reports for established PR interval loci to identify genotype and study-level anomalies. Variants with effective MAC (= 2 × N × MAF × imputation quality metric) < 10 were omitted from each study prior to meta-analysis.
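The numeric filters above can be sketched in a few lines of Python. This is an illustrative sketch, not the EasyQC configuration used in the study: the function names and argument layout are hypothetical, and the checks for invalid, mismatched, or duplicated alleles are omitted.

```python
# Minimum imputation quality per imputation software, as stated in the text.
INFO_MIN = {"MACH": 0.3, "IMPUTE2": 0.4}


def effective_mac(n, maf, info):
    """Effective minor allele count = 2 x N x MAF x imputation quality."""
    return 2 * n * maf * info


def passes_qc(n, eaf, info, ref_eaf, imputer="MACH"):
    """Return True if a variant survives the numeric QC filters.

    n: study sample size; eaf: effect allele frequency in the study;
    info: imputation quality metric; ref_eaf: frequency of the same allele
    in the reference panel (hypothetical input for this sketch).
    """
    maf = min(eaf, 1 - eaf)
    if maf == 0:                        # monomorphic
        return False
    if 2 * n * maf < 6:                 # minor allele count < 6
        return False
    if info < INFO_MIN[imputer]:        # poorly imputed
        return False
    if abs(eaf - ref_eaf) > 0.2:        # allele frequency outlier
        return False
    if effective_mac(n, maf, info) < 10:
        return False
    return True


# A common, well-imputed variant passes; a rare one with effective MAC 7.2 fails.
print(passes_qc(1000, 0.20, 0.9, 0.25))
print(passes_qc(1000, 0.004, 0.9, 0.004))
```

The effective MAC filter is the strictest of the count-based filters for imperfectly imputed variants, because it discounts the allele count by the imputation quality.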

Meta-analyses

We aggregated summary-level associations between genotypes and absolute PR interval from all individuals (N = 293,051), and only from Europeans (N = 271,570), African Americans (N = 8,173), and Hispanic/Latinos (N = 12,823) using a fixed-effects meta-analysis approach implemented in METAL (http://csg.sph.umich.edu/abecasis/metal/, release on 2011/03/25)46. We considered as primary our multi-ancestry meta-analysis, and ancestry-specific meta-analyses as secondary. For the X chromosome, meta-analyses were conducted in a sex-stratified fashion. Genomic control was applied (if inflation factor λGC > 1) at the study level. Quantile–quantile (QQ) plots of observed versus expected –log10(P) did not show substantive inflation (Supplementary Figs. 1 and 2).
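For readers unfamiliar with fixed-effects meta-analysis, the following Python sketch shows the inverse-variance weighted form of the calculation for a single variant, together with a study-level genomic control correction. It is a sketch under assumptions rather than a description of METAL's internals: inverse-variance weighting is assumed (METAL also offers a sample-size weighted scheme, and the text does not say which was used), genomic control is applied here by inflating standard errors by the square root of λGC, and all function names and input values are hypothetical.

```python
import math


def genomic_control_se(se, lambda_gc):
    """Inflate a standard error by sqrt(lambda_gc), only when lambda_gc > 1."""
    return se * math.sqrt(lambda_gc) if lambda_gc > 1 else se


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted fixed-effects estimate for one variant.

    Returns (pooled beta, pooled standard error, two-sided P value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided normal P value; erfc stays accurate for very small P.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p


# Hypothetical per-study effects (ms per allele), standard errors, and lambdas.
study_betas = [1.10, 0.95, 1.30]
study_ses = [0.20, 0.25, 0.40]
study_lambdas = [1.03, 0.99, 1.10]

corrected = [genomic_control_se(s, l) for s, l in zip(study_ses, study_lambdas)]
pooled_beta, pooled_se, pooled_p = fixed_effects_meta(study_betas, corrected)
```

Studies with smaller standard errors receive proportionally more weight, so the pooled estimate is dominated by the largest contributing cohorts.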

Given the large sample size we undertook a one-stage discovery study design. To ensure the robustness of this approach we considered for discovery only variants reaching genome-wide significance (P < 5 × 10−8) present in at least 60% of the maximum sample size (Nmax) in our GWAS summary results. We denote loci as previously not reported if the variants map outside 64 previously reported loci (Supplementary Methods, Supplementary Data 4) for both the multi-ancestry and ancestry-specific meta-analysis (secondary meta-analyses). Genome-wide significant variants were grouped into independent loci based on both distance (±500 kb) and linkage disequilibrium (LD, r2 < 0.1) (Supplementary Methods). We assessed heterogeneity in allelic effect sizes among studies contributing to the meta-analysis and among ancestral groups by the I2 inconsistency index47 for the lead variant in each previously not reported locus. LocusZoom (http://locuszoom.org/)48 was used to create region plots of identified loci. For reporting, we only declare as previously not reported genome-wide significant loci from our primary meta-analysis. However, we considered ancestry-specific loci for annotation and downstream analyses. The results from secondary analyses are specifically indicated in Supplementary Data 6 and 7.
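The I2 inconsistency index mentioned above is derived from Cochran's Q statistic. The sketch below uses the standard definition, I2 = max(0, (Q − df)/Q) × 100%, with inverse-variance weights; the function name and example inputs are hypothetical and the code is illustrative rather than the exact routine used in the study.

```python
def i_squared(betas, ses):
    """I2 inconsistency index (percent) from per-study effects and SEs."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    if df < 1 or q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Two hypothetical studies with clearly discordant effects.
print(i_squared([0.0, 1.0], [0.1, 0.1]))
```

Values near 0% indicate that the variation in effect sizes across studies or ancestral groups is compatible with sampling error alone, whereas high values point to genuine heterogeneity.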

Meta-analyses (multi-ancestry [N = 282,128], European only [N = 271,570], and African [N = 8,173]) of rank-based inverse normal transformed residuals of PR interval were also performed (sensitivity meta-analyses). Because not all studies contributed summary-level association statistics of the transformed PR interval, we considered as primary the multi-ancestry meta-analysis of absolute PR interval for which we achieved the maximum sample size. Loci that met our significance criteria in the meta-analyses of transformed PR interval were not taken forward for downstream analyses.

Conditional and heritability analysis

Conditional and joint GWAS analyses were implemented in GCTA v1.91.3 (https://cnsgenomics.com/software/gcta/#Overview)49 using summary-level variant statistics from the European ancestry meta-analysis to identify independent association signals within PR interval loci. We used 59,097 unrelated (kinship coefficient <0.0884) UK Biobank participants of European ancestry as the reference sample to model patterns of LD between variants. We declared as conditionally independent any genome-wide significant variants in conditional analysis (Pjoint < 5 × 10−8) not in LD (r2 < 0.1) with the lead variant in the locus.

Using the same set of individuals from UK Biobank, we estimated the aggregate genetic contributions to PR interval with restricted maximum likelihood as implemented in BOLT-REML v2.3.4 (https://data.broadinstitute.org/alkesgroup/BOLT-LMM/)50. We calculated the additive overall variant-heritability (h2g) based on 333,167 LD-pruned genotyped variants, as well as the h2g of variants at PR interval associated loci only. Loci windows were based on both distance (±500 kb) and LD (r2 > 0.1) around previously not reported and reported variants (Supplementary Methods). We then calculated the proportion of total h2g explained at PR interval loci by dividing the h2g estimate of PR interval loci by the total h2g.

Bioinformatics and in silico functional analyses

We used Variant Effect Predictor (VEP; https://www.ensembl.org/info/docs/tools/vep/index.html)51 to obtain functional characterization of variants including consequence, information on nearest genes and, where applicable, amino acid substitution and functional impact, based on SIFT52 and PolyPhen-253 prediction tools. For non-coding variants, we assessed overlap with DNase I–hypersensitive sites (DHS) and chromatin states as determined by Roadmap Epigenomics Project54 across all tissues and in cardiac tissues (E083, fetal heart; E095, LV; E104, right atrium; E105, right ventricle) using HaploReg v4.1 (https://pubs.broadinstitute.org/mammals/haploreg/haploreg.php)55 and using FORGE (https://github.com/iandunham/Forge).

We assessed whether any PR interval variants were related to cardiac gene expression using GTEx (https://gtexportal.org/home/)18 version 7 cis-eQTL LV (N = 233) and RAA (N = 231) European data. If the variant at a locus was not available in GTEx, we used proxy variants (r2 > 0.8). We then evaluated the effects of predicted gene expression levels on PR interval duration using S-PrediXcan (https://github.com/hakyimlab/MetaXcan)56. GTEx18 genotypes (variants with MAF > 0.01) and normalized expression data in LV and RAA provided by the software developers were used as the training datasets for the prediction models. The prediction model for each gene-tissue pair was fitted by Elastic-Net, and only significant prediction models were included in the analysis, where a model was deemed significant if the nested cross-validated correlation between predicted and actual expression levels was greater than 0.10 (equivalent to R2 > 0.01) and the P value of the correlation test was less than 0.05. We used the European meta-analysis summary-level results (variants with at least 60% of maximum sample size) as the study dataset and then ran S-PrediXcan to estimate the expression-PR interval associations. For both eQTL and S-PrediXcan assessments, we additionally included spleen tissue in Europeans (N = 119) as a negative control. In total, we tested 5366, 5977, and 4598 associations in LV, RAA, and spleen, respectively. The significance threshold for S-PrediXcan was set at P = 3.1 × 10−6 (=0.05/(5977 + 5366 + 4598)) to account for multiple testing. To determine whether the GWAS-identified loci colocalized with the eQTLs, we performed genetic colocalization analysis for eQTL and S-PrediXcan identified gene regions, using the Bayesian approach in the COLOC package (R version 3.5; https://cran.r-project.org/web/packages/coloc/index.html). Variants located within the same identified gene regions were included. We set the significance threshold for the PP (two significant associations sharing a common causal variant) at >75%.
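The multiple-testing threshold and the prediction-model inclusion rule quoted above can be reproduced with a short Python sketch. The test counts are taken from the text; the function and variable names are hypothetical and introduced only for illustration.

```python
# Number of gene-level associations tested per tissue, as reported in the text.
TESTS_PER_TISSUE = {"LV": 5366, "RAA": 5977, "spleen": 4598}


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def prediction_model_passes(correlation, p_value):
    """Inclusion rule for a gene-tissue prediction model.

    correlation: nested cross-validated correlation between predicted and
    observed expression; a value > 0.10 corresponds to R2 > 0.01.
    """
    return correlation > 0.10 and p_value < 0.05


total_tests = sum(TESTS_PER_TISSUE.values())
threshold = bonferroni_threshold(0.05, total_tests)
print(total_tests, f"{threshold:.1e}")
```

Pooling the tests across the two cardiac tissues and the spleen control gives a single, slightly conservative threshold that is applied to every gene-tissue pair.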

We applied GARFIELD (GWAS analysis of regulatory or functional information enrichment with LD correction; https://www.ebi.ac.uk/birney-srv/GARFIELD/)57 to analyze the enrichment patterns for functional annotations of the European meta-analysis summary statistics, using regulatory maps from the Encyclopedia of DNA Elements (ENCODE)58 and Roadmap Epigenomics54 projects. This method calculates odds ratios and enrichment P values at different GWAS P value thresholds (denoted T) for each annotation using a logistic regression model that accounts for LD, matched genotyping variants, and local gene density. The threshold for significant enrichment was set to P = 9.36 × 10−5 (after multiple-testing correction for the number of effective annotations).

We identified potential target genes of regulatory variants using long-range chromatin interaction (Hi-C) data from the LV19. Hi-C data were corrected for genomic biases and distance using the Hi-C Pro and Fit-Hi-C pipelines according to Schmitt et al. (40 kb resolution – correction applied to interactions with 50 kb–5 Mb span). We identified the promoter interactions for all potential regulatory variants in LD (r2 > 0.8) with our lead and conditionally independent PR interval variants and, to annotate the loci, report the interactors of the variants with the highest regulatory potential (RegulomeDB score ≤ 2; http://www.regulomedb.org).

We performed a literature review, and queried the Online Mendelian Inheritance in Man (OMIM; https://www.omim.org/) and the International Mouse Phenotyping Consortium (https://www.mousephenotype.org/) databases for all genes in regions defined by r2 > 0.5 from the lead variant at each previously unreported locus. We further expanded the gene listing with any genes that were indicated by gene expression or chromatin interaction analyses. We looked up each lead variant or its proxies (r2 > 0.8) for associations (P < 5 × 10−8) with common traits using both the GWAS Catalog59 and PhenoScanner v260 databases. For AF, we summarized the PR interval results for the lead PR interval variants alongside their associations with AF risk from two recently published GWASs20,21. We included variants in high linkage disequilibrium with lead PR variants (r2 > 0.7).

Geneset enrichment and pathway analyses

We used DEPICT (https://data.broadinstitute.org/mpg/depict/)22 to identify enriched pathways and tissues/cell types where genes from associated loci are highly expressed, using all genome-wide significant (P < 5 × 10−8) variants in our multi-ancestry meta-analysis present in at least 60% of Nmax (Nvariants = 20,076). To identify uncorrelated variants for PR interval, DEPICT performed LD-clumping (r2 = 0.1, window size = 250 kb) using LD estimates between variants from the 1000 G reference data on individuals from all ancestries after excluding the major histocompatibility complex region on chromosome 6. Geneset enrichment analysis was conducted based on 14,461 predefined reconstituted gene sets from various databases and data types, including Gene Ontology, Kyoto Encyclopedia of Genes and Genomes (KEGG), REACTOME, phenotypic gene sets derived from the Mouse Genetics Initiative, and protein molecular pathways derived from protein–protein interaction. Finally, tissue and cell type enrichment analyses were performed based on expression information in any of the 209 Medical Subject Heading (MeSH) annotations for the 37,427 human Affymetrix HGU133a2.0 platform microarray probes.
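The LD-clumping step can be pictured as a greedy procedure: rank the significant variants by P value, keep the most significant one as an index variant, and discard nearby variants correlated with it. The Python sketch below illustrates this idea under stated assumptions; it is not the DEPICT implementation. The function `ld_clump`, the `r2_lookup` callback, and the toy variants are hypothetical, and whether the r2 comparison is inclusive is an assumption.

```python
def ld_clump(variants, r2_lookup, r2_threshold=0.1, window_bp=250_000,
             p_threshold=5e-8):
    """Greedy LD clumping (illustrative sketch).

    variants: list of dicts with keys 'id', 'chrom', 'pos', 'p'.
    r2_lookup: hypothetical callback (id_a, id_b) -> r2 from a reference panel.
    Returns the ids of the retained index variants, most significant first.
    """
    candidates = sorted((v for v in variants if v["p"] < p_threshold),
                        key=lambda v: v["p"])
    assigned = set()  # variants already used as an index or clumped away
    index_ids = []
    for v in candidates:
        if v["id"] in assigned:
            continue
        assigned.add(v["id"])
        index_ids.append(v["id"])
        for other in candidates:
            if other["id"] in assigned:
                continue
            in_window = (other["chrom"] == v["chrom"]
                         and abs(other["pos"] - v["pos"]) <= window_bp)
            if in_window and r2_lookup(v["id"], other["id"]) >= r2_threshold:
                assigned.add(other["id"])
    return index_ids


# Toy data (hypothetical variant ids, positions, and LD values).
toy_variants = [
    {"id": "rsA", "chrom": 1, "pos": 1_000_000, "p": 1e-20},
    {"id": "rsB", "chrom": 1, "pos": 1_050_000, "p": 1e-12},  # in LD with rsA
    {"id": "rsC", "chrom": 1, "pos": 1_100_000, "p": 1e-10},  # nearby, independent
    {"id": "rsD", "chrom": 2, "pos": 500_000, "p": 1e-6},     # not significant
]
toy_ld = {frozenset({"rsA", "rsB"}): 0.8}


def toy_r2(a, b):
    return toy_ld.get(frozenset({a, b}), 0.0)


clumped = ld_clump(toy_variants, toy_r2)
print(clumped)
```

In this toy example rsB is clumped under rsA, rsC is retained as a separate index variant, and rsD is excluded because it does not reach genome-wide significance.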

Ingenuity Pathway Analysis (IPA; https://www.qiagenbioinformatics.com/products/ingenuity-pathway-analysis/) was conducted using an extended list comprising 593 genes located in regions defined by r2 > 0.5 with the lead or conditionally independent variants for all PR interval loci, or the nearest gene. We further expanded this list by adding genes indicated by gene expression analyses. Only molecules and/or relationships for human, mouse, or rat and experimentally verified results were considered. The significance P value associated with enrichment of functional processes was calculated using the right-tailed Fisher’s exact test by considering the number of query molecules that participate in that function and the total number of molecules that are known to be associated with that function in IPA.
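For intuition, a right-tailed Fisher’s exact test of this kind is equivalent to the upper tail of a hypergeometric distribution. The Python sketch below computes that tail directly; it is an illustration under stated assumptions rather than the IPA implementation. The function name `fisher_right_tail` and the example counts (background size, function size, and overlap) are hypothetical; only the query size of 593 genes comes from the text.

```python
from math import comb


def fisher_right_tail(k, n_query, n_function, n_background):
    """P(X >= k), where X is the number of query molecules annotated to a
    function, drawn without replacement from a background of n_background
    molecules of which n_function carry the annotation."""
    upper = min(n_query, n_function)
    total = comb(n_background, n_query)
    tail = sum(
        comb(n_function, i) * comb(n_background - n_function, n_query - i)
        for i in range(k, upper + 1)
    )
    return tail / total


# Hypothetical example: 10 of 593 query genes fall in a function that
# contains 100 of 20,000 background molecules.
p_value = fisher_right_tail(10, 593, 100, 20_000)
print(f"{p_value:.3g}")
```

Exact integer binomial coefficients are used so that the tail probability is not affected by floating-point underflow before the final division.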

Associations between genetically determined PR interval and cardiovascular conditions

We examined associations between genetic determinants of atrioventricular conduction and candidate cardiovascular diseases in unrelated individuals of European ancestry from UK Biobank (N~309,000 not included in our GWAS meta-analyses) by creating PRSs for PR interval based on our GWAS results. We derived two PRSs: one from the European ancestry meta-analysis results, and the other from the multi-ancestry meta-analysis results. We used the LD-clumping feature in PLINK v1.9061 (r2 = 0.1, window size = 250 kb, P = 5 × 10−8) to select variants for each PRS. Reference LD structure was based on 1000 G European-only and all-ancestry data. In total, we selected 582 and 743 variants from European-only and multi-ancestry meta-analysis results, respectively. We calculated the PRSs for PR interval by summing the dosage of PR interval prolonging alleles weighted by the corresponding effect size from the meta-analysis results. After restricting to variants with imputation quality >0.6, 581 variants for the PRS derived from European results and 743 variants for the PRS derived from multi-ancestry results were included in our PRS calculations.
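The weighted-sum calculation can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the function name `polygenic_risk_score`, the variant ids, and the effect sizes are hypothetical. The sketch assumes that each effect size is reported for a coded allele, so a negative effect is flipped so that the PR interval prolonging allele is always the one counted.

```python
def polygenic_risk_score(dosages, effects):
    """Sum of PR interval prolonging allele dosages weighted by effect size.

    dosages: {variant_id: dosage of the coded allele, between 0 and 2}
    effects: {variant_id: effect of the coded allele on PR interval}
    """
    score = 0.0
    for variant_id, beta in effects.items():
        dosage = dosages[variant_id]
        if beta < 0:
            # The other allele prolongs the PR interval: count it instead.
            dosage = 2.0 - dosage
            beta = -beta
        score += beta * dosage
    return score


# Hypothetical effect sizes and one individual's imputed dosages.
toy_effects = {"rs1": 1.5, "rs2": -0.5}
toy_dosages = {"rs1": 1.0, "rs2": 2.0}
print(polygenic_risk_score(toy_dosages, toy_effects))
```

Orienting every variant to the prolonging allele keeps all weights positive, so a higher score always corresponds to a longer genetically predicted PR interval.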

We selected candidate cardiovascular conditions a priori, which included various cardiac conduction and structural traits such as bradyarrhythmia, AF, atrioventricular pre-excitation, heart failure, cardiomyopathy, and congenital heart disease. We ascertained disease status based on data from baseline interviews, hospital diagnosis codes (ICD-9 and ICD-10), cause of death codes (ICD-10), and operation codes. Details of individual selections and disease definitions are described in Supplementary Data 23.

We tested the PRSs for association with cardiovascular conditions using logistic regression. We adjusted for age at enrollment, sex, genotyping array, and phenotype-related principal components of ancestry. Given correlation between traits, we set the significance threshold at P < 3.13 × 10−3 after Bonferroni correction (P < 0.05/16) for the number of analyses performed and also report nominal associations (P < 0.05).
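The two reporting tiers described above can be expressed as a small Python sketch. The function name `classify_association` and the example P values are hypothetical; the number of analyses (16) and the nominal level (0.05) are taken from the text.

```python
def classify_association(p_value, n_tests=16, alpha=0.05):
    """Label a PRS-disease association by the reporting tiers in the text."""
    if p_value < alpha / n_tests:   # Bonferroni-corrected threshold
        return "significant"
    if p_value < alpha:             # nominal association
        return "nominal"
    return "not significant"


# Hypothetical P values for illustration only.
for p in (1e-4, 0.01, 0.2):
    print(p, classify_association(p))
```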

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.