Introduction

Abnormalities of ventricular depolarization and repolarization are a cause of malignant arrhythmias, which are associated with cardiac morbidity and mortality1. Mechanisms underlying the relationship of conventional electrocardiographic (ECG) measures (e.g. the QT interval and QRS duration) with arrhythmogenesis have previously been explored and highlight the role of cardiac ion channels. However, the biology reflected by markers derived from the vectorcardiogram is largely unknown2. These markers include the spatial (spQRSTa) and frontal (fQRSTa) QRS-T angles, which are the angles between the directions of ventricular depolarization and repolarization in 3- and 2-dimensional space, respectively (Fig. 1)3. Previous experimental and theoretical studies have shown that a wider QRS-T angle is determined through local variation in action potential duration and morphology4,5.

Fig. 1: Graphical representation of the spQRSTa and fQRSTa alongside a single electrocardiogram lead signal.

a Single lead electrocardiogram (ECG) signal with classical measures QRS duration and the QT interval labelled. The dark orange (estimates ventricular depolarization time) and blue (ventricular repolarization time) shaded sections of the signal represent the regions used to calculate the QRS and T-wave axes respectively with multiple ECG leads. b The spatial QRS-T angle (spQRSTa) mean is the angle between the mean amplitude of QRS and T-wave spatial loops. These spatial loops can be constructed from the resting 12-lead ECG using a standardised transformation, to produce representative X, Y and Z vectors that can be plotted over time. c The frontal QRS-T angle (fQRSTa) is the absolute difference between QRS and T-wave axes in the frontal plane only.

While vectorcardiographic measures are not currently used in routine clinical practice, there has been a resurgence of interest in their potential clinical utility, which has coincided with computational advances for efficient calculation of these markers. Recent studies have reported associations of the spQRSTa and fQRSTa with risk for arrhythmogenesis, sudden cardiac death and cardiac-related mortality6,7,8. In a population-based study, an abnormal spQRSTa was associated with a five-fold increased risk of cardiac and sudden death. No other conventional cardiovascular or ECG measure provided higher hazard ratios9. These measures may also be broad markers of cardiovascular risk, and associations have been reported with cardiomyopathies and cardioembolic stroke10,11. Improved knowledge of these markers will increase our understanding of these clinical relationships and has potential to identify new biology that is not captured by conventional ECG measures. Genome-wide association studies (GWAS) allow investigation of intermediate phenotypes and complex diseases to identify candidate genes and pathways that contribute to the underlying biology without a predefined hypothesis12. A previous GWAS meta-analysis for the spQRSTa (N = 13,826) identified 3 independent loci, with candidate genes involved in cardiac conduction and development13. However, this study was limited by a small sample size, and no GWAS has investigated the fQRSTa.

We performed the largest multi-ancestry studies to date for the spQRSTa (N = 118,780) and fQRSTa (N = 159,715) to identify additional candidate genes and pathways enriched for these markers, to advance our understanding of their genetic relationship with other ECG traits and cardiovascular disease, and to enhance the interpretation of existing and future clinical studies.

Results

Meta-analysis of QRS-T angle GWAS

Our primary multi-ancestry GWAS meta-analysis for spQRSTa had a total sample size of 118,780 individuals, including European (81.3%), Hispanic/Latino (10.7%) and African (7%) ancestries from 14 studies. The multi-ancestry GWAS meta-analysis for fQRSTa included 159,715 individuals from 23 studies and a similar ancestral composition (Supplementary Data 1–3, Supplementary Note 1). Ancestry-stratified analyses were also conducted. Due to the non-normal distribution of the traits, results are for the rank-based inverse normal transformed phenotype, with reference to corresponding effect sizes from the raw-phenotype analyses (degrees [°]) for clinical interpretation. No inflation of test statistics was identified, but early deviation from the reference line was observed in Quantile-Quantile (Q-Q) plots for multi-ancestry and European-ancestry meta-analyses (driven by a locus on chromosome 17; Supplementary Figs. 1 and 2).

Genome-wide significant loci

In multi-ancestry meta-analyses, we identified a total of 61 (58 previously unreported) and 11 lead genome-wide significant (GWS; P < 5 × 10−8) variants at independent loci associated with spQRSTa and fQRSTa, respectively (Figs. 2 and 3, Supplementary Data 4 and 5). All lead variants for fQRSTa mapped within a locus reported for spQRSTa. All previously reported loci for spQRSTa (NFIA, HAND1 and TBX3) were GWS and were the most significant loci. A total of 51 and 9 GWS independent loci were identified in European ancestry meta-analyses for spQRSTa and fQRSTa, respectively. All loci were also GWS in the corresponding multi-ancestry analysis, except one fQRSTa locus (TTN; Supplementary Data 4 and 5).

Fig. 2: Manhattan plot for the spQRSTa multi-ancestry meta-analysis.

Manhattan plot for the spatial QRS-T angle (spQRSTa) meta-analysis. Two-sided P-values are plotted on the -log10 scale (Y-axis). The red horizontal line indicates genome-wide significance (P < 5 × 10−8). Variants within the boundaries of loci previously reported for the spatial QRS-T angle are labelled with the candidate gene and colored blue. Variants at previously unreported loci are green.

Fig. 3: Manhattan plot for the fQRSTa multi-ancestry meta-analysis.

Manhattan plot for the frontal QRS-T angle (fQRSTa) meta-analysis. Two-sided P-values are plotted on the -log10 scale (Y-axis). The red horizontal line indicates genome-wide significance (P < 5 × 10−8). Variants within the boundaries of loci previously reported for the spatial QRS-T angle are labelled with the candidate gene and colored blue. Variants at previously unreported loci are green.

Conditional analyses and heritability estimates in European ancestry individuals

To identify additional signals, Genome-wide Complex Trait Analysis (GCTA, v1.26.0)14 was performed using European ancestry UK Biobank (UKB) participant meta-analysis summary statistics from 33,960 individuals. The analyses identified conditionally independent variants at 4 loci for spQRSTa and at 2 loci for fQRSTa (Supplementary Data 6).

Common SNP-based heritability was estimated in the same set of UKB participants with BOLT-Restricted Maximum Likelihood (BOLT-REML, v2.3.2) software15. Heritabilities of spQRSTa and fQRSTa were 22.3% and 6.8%, respectively (standard error [SE] 1.0%). European ancestry lead and conditionally independent variants explained 4.0% and 0.5% of the variance of spQRSTa and fQRSTa, respectively. Therefore, these variants explain approximately 17.8% and 7.4% of the SNP-based heritability of spQRSTa and fQRSTa, respectively.

Follow-up of loci for the spatial QRS-T angle

Over 96% (59/61) of the spQRSTa lead multi-ancestry variants were common (minor allele frequency [MAF] > 0.05). Across all loci, the lead variant with the largest effect size was rs117526881, located upstream of MYH7 (effect size 3.7° per allele). At each locus, Variant Effect Predictor (VEP, Ensembl release 99) was used to identify potential functional consequences of lead variants and their proxies (r2 > 0.8)16. Missense variants were identified at 6 (9.8%) loci (Supplementary Data 7). SIFT or Polyphen-2 prediction tools identified variants that were likely to be deleterious at 2 loci (ADPRHL1 and KANSL1). The KANSL1 locus contained missense variants in strong LD with the lead SNP (r2 > 0.94) in multiple genes (KANSL1, SPPL2C, MAPT and LRRC37A2). The lead variant (or a proxy) of five loci had a Combined Annotation Dependent Depletion (CADD) score ≥20, and were therefore predicted to be among the most deleterious variants in the genome (i.e., in the top 1%; Supplementary Data 8). The low frequency missense variant rs41306688 (effect size −2.5° per allele) at the ADPRHL1 locus had the highest CADD score (26.7).

To identify variants associated with tissue-specific gene expression in cardiovascular tissues, data were extracted from the Genotype-Tissue Expression (GTEx, v8) project17. At 11 loci, the lead variant or a proxy was a significant cis-expression quantitative trait locus (eQTL) variant in cardiac (left ventricular [LV], right atrial appendage [RAA]) or vascular (coronary artery or aorta) tissue (Supplementary Data 9). At 5 loci, we identified support for pairwise colocalization (BACH [RAA], C1QTNF4 [LV, aorta artery], CDH13 [LV, RAA], LINC00964 [LV] and MTSS1 [LV, RAA], and PKDCC [LV]; posterior probability [PP] > 0.75).

To predict the effects of gene expression in LV, RAA and vascular tissue on our phenotypes, a transcriptome-wide association study (TWAS) was performed with S-PrediXcan software. The expression of 33 genes was significantly associated with the spQRSTa (Bonferroni corrected threshold; P < 3.1 × 10−6), 26 of which mapped within GWS loci, and 10 were significant in multiple tissues (Supplementary Data 10). Increased expression was associated with an increase in spQRSTa for 17 genes, whereas an inverse relationship was found for 15 genes (Supplementary Fig. 3). For TMEM198, increased expression in the aorta was associated with an increase in spQRSTa, but an inverse relationship was observed in LV tissue. All other genes with significant findings in multiple tissues had concordant directions of effect.

Non-coding variants may influence cardiac electrophysiology through effects on regulatory elements and chromatin folding. We used 40 kb and ~4 kb-resolution long-range chromatin interaction (Hi-C) datasets to identify potential target genes of regulatory variants18,19. Promoter interactions were identified at 17 (27.9%) multi-ancestry loci in LV or RV tissues (Supplementary Data 11a, b). GWAS Analysis of Regulatory and Functional Information Enrichment with LD correction (GARFIELD) was used to test for enrichment of variants at DNase I hypersensitivity sites in specific tissues using European ancestry summary statistics. The strongest enrichment was in fetal heart tissue (P < 7.5 × 10−36); however, additional tissues were identified, including fetal renal pelvis, adult heart and brain (Supplementary Fig. 4).

With single nucleus Assay for Transposase-Accessible Chromatin using sequencing (snATAC-seq) data, we tested for enrichment of non-coding variants at open chromatin regions, to identify cell-type specific functional effects in adult heart, by utilizing Chromatin Element Enrichment Ranking by Specificity (CHEERS)20,21. Significant enrichment was observed across all variants in atrial and ventricular cardiomyocytes (Supplementary Fig. 5).

Reconstituted gene-sets in Data-driven Expression-Prioritisation Integration for Complex Traits (DEPICT) software were used to prioritize potential candidate genes based on overlapping functional pathways22. Significant gene-set enrichment (false discovery rate [FDR] < 0.01) was observed in cardiac tissues (ventricle, atrial and atrial appendage) (Supplementary Data 12). Significantly enriched Gene-Ontology (GO) biological processes were extracted from DEPICT pathway analyses (Supplementary Data 13). Redundant GO terms were removed and the remaining processes clustered using the reduce and visualise Gene Ontology (REVIGO) web application23. This analysis identified clusters of biological processes involved in: cardiac development (including embryonic heart tube morphogenesis, muscle structure development, trabeculae formation and vasculogenesis); muscle cell differentiation and regulation of organ growth; actin filament-based movement; and cardiac contraction and hypertrophy (Fig. 4). Significant KEGG pathways were dilated, hypertrophic and arrhythmogenic right ventricular cardiomyopathies; cardiac muscle contraction; and arginine and proline metabolism. The top 10 enriched mouse phenotypes included dilated cardiac chambers; ventricular wall thickness (thick and thin); and abnormal cardiac development (Supplementary Data 13).

Fig. 4: Significant GO biological processes from spQRSTa DEPICT multi-ancestry findings.

All significant (false discovery rate <0.01) multi-ancestry spatial QRS-T angle (spQRSTa) gene-ontology (GO) biological processes from Data-driven Expression-Prioritization Integration for Complex Traits (DEPICT) software were analyzed using the Reduce and Visualize Gene Ontology (REVIGO) web application to remove redundant terms and cluster related nodes. Highly similar GO terms are linked by edges where the line width indicates the degree of similarity. Within each cluster, the colour gradient represents differences in the DEPICT gene-set enrichment two-sided P-values, with lighter gradients reflecting smaller enrichment P-values (therefore more significant) compared with other nodes in the same cluster.

A summary of bioinformatic annotations for all spQRSTa multi-ancestry loci is provided in Supplementary Data 14. These findings have been supplemented with additional trait-relevant information from: Online Mendelian Inheritance in Man (OMIM)24; the International Mouse Phenotyping Consortium25 (IMP); the Human Protein Atlas26; and PubMed literature reviews for each candidate gene. We also performed lookups of each lead variant in the Open Targets Genetics ‘Locus to Gene’ machine learning gene-prioritization pipeline for further annotations (Supplementary Data 14)27.

We identified two independent loci in the Hispanic/Latino spQRSTa meta-analysis, including one locus that was not GWS in the multi-ancestry meta-analysis (lead variant rs112628278, multi-ancestry GWAS P = 0.01). rs112628278 (nearest gene VAV2) is a low frequency Hispanic/Latino variant (MAF = 0.011) and rare among European ancestry individuals (MAF = 0.0002, 1000 Genomes [1000 G] reference panel).

One unreported locus (FAM135B) identified in the African ancestry spQRSTa meta-analysis showed no evidence for association in the multi-ancestry meta-analysis (P > 0.05). The lead variant (rs28377209) has a higher MAF in African ancestry populations, compared with Europeans (0.19 vs 0.10).

Follow-up of loci for the frontal QRS-T angle

Three variants at two loci were significant eQTL variants (LV [SSXP10, RP11-632C17_A.1], coronary artery [GNAZ]), but there was no support for colocalization (Supplementary Data 9). Eight genes were significant in the TWAS, and overlapped with spQRSTa genes, except for two (CEP85L and MMP11) (Supplementary Data 10). Tissue-specific promoter interactions were identified for variants at two loci that were not reported for spQRSTa loci (lead variant rs10885011; FAM124A and DLEU7, rs5030613; BCR) (Supplementary Data 11a, b). An unreported locus identified in the African ancestry fQRSTa meta-analysis was not GWS in spQRSTa analyses. The gene nearest to the lead signal is CCDC60 (Coiled-Coil Domain Containing 60).

Genetic correlation and overlap of GWS loci with other ECG measures

LD Score Regression (LDSC) software was used to estimate genetic correlations (rg) of spQRSTa and fQRSTa with ECG markers of cardiac conduction (PR interval), ventricular depolarization (QRS duration) and repolarization (QT and JT intervals)28,29. There was a high positive genetic correlation between spQRSTa and fQRSTa (rg = 0.61). Weak positive correlations were observed with PR interval (rg = 0.12, P = 6 × 10−4 for spQRSTa; rg = 0.19, P = 2.2 × 10−5 for fQRSTa). However, no statistically significant correlation was observed with the other ECG traits (Supplementary Fig. 6).

We used additional approaches to interrogate genetic overlaps. First, lead variants reported for other resting ECG traits were extracted and overlap was reported if they mapped within spQRSTa locus boundaries (within r2 > 0.1 or ±500 kb from the lead spQRSTa variant). Despite the low genetic correlations observed genome-wide, 26 (42.6%), 27 (44.3%) and 26 (42.6%) lead multi-ancestry spQRSTa variants mapped to reported PR, QRS and heart rate (HR) loci, respectively (Supplementary Data 15). Fewer variants mapped to reported QT and JT loci (19 [31.1%] and 14 [23%], respectively) (Fig. 5). Of the 7 loci reported for the global electrical heterogeneity trait SAI QRST, 3 lead variants mapped within the boundaries of spQRSTa loci (SCN5A, MYBPC3 and NDRG4).
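The overlap rule described above can be illustrated with a minimal sketch. It assumes that each variant carries a chromosome and base-pair position on the same genome build, and that the pairwise LD (r2) with the lead variant is supplied when known. The `Variant` class, the `overlaps` function and the example variants are hypothetical and are not taken from the study pipeline.

```python
from dataclasses import dataclass
from typing import Optional

WINDOW_BP = 500_000   # ±500 kb window, as stated in the text
R2_THRESHOLD = 0.1    # LD threshold, as stated in the text


@dataclass(frozen=True)
class Variant:
    rsid: str
    chrom: str
    pos: int  # base-pair position; all variants assumed to share one genome build


def overlaps(lead: Variant, other: Variant, r2: Optional[float] = None) -> bool:
    """Return True if `other` maps within the locus boundaries of `lead`.

    `r2` is the LD between the two variants if available, otherwise None.
    """
    if lead.chrom != other.chrom:
        return False
    within_window = abs(lead.pos - other.pos) <= WINDOW_BP
    in_ld = r2 is not None and r2 > R2_THRESHOLD
    return within_window or in_ld


# Hypothetical variants, for illustration only.
lead = Variant("rs_lead", "3", 38_600_000)
near = Variant("rs_near", "3", 38_950_000)      # 350 kb away
far_ld = Variant("rs_far_ld", "3", 39_400_000)  # 800 kb away but in LD
far = Variant("rs_far", "3", 41_000_000)        # distant and not in LD

print(overlaps(lead, near))
print(overlaps(lead, far_ld, r2=0.35))
print(overlaps(lead, far, r2=0.02))
```

Under these assumptions, the first two variants would be counted as overlapping the locus and the third would not.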

Fig. 5: Overlap of multi-ancestry spQRSTa loci with ECG measures.

Venn diagram showing spatial QRS-T angle (spQRSTa) multi-ancestry loci where a lead variant reported for another electrocardiographic (ECG) measure maps within the locus boundaries. For this figure, ECG measures shown are PR interval (cardiac conduction), QRS duration (ventricular depolarization), QT and JT intervals (ventricular repolarization) and heart rate (HR). Overlap was declared if a lead variant for these ECG measures mapped to within ±500 kb or r2 > 0.1 of a lead variant at a spQRSTa locus. Some loci overlap with other ECG traits (not visualised here but presented in Supplementary Data 15). At seven spQRSTa loci, no overlap was observed with any ECG trait (blue circle bottom right).

Next, we performed a pairwise GWAS with GWAS-PW, which uses Bayesian bivariate methods to estimate the probability for each genomic region that a variant affects both traits tested30. Across all spQRSTa loci, there was evidence for shared genetic influences at 17 (27.9%), 20 (32.8%), 7 (11.5%), 14 (22.9%) and 12 (19.7%) loci involving PR, QRS, HR, QT and JT, respectively (PP > 0.9). Of the loci that shared effects with QT and JT, 8/14 (57.1%) and 6/12 (50%) loci, respectively, also influenced QRS duration (Supplementary Data 15). The smallest P-value for variants at the NOS1AP locus in the spQRSTa multi-ancestry meta-analysis was 7.3 × 10−5. NOS1AP is the locus consistently reported with the strongest QT and JT associations. We performed a sensitivity analysis in ~34,000 UKB individuals to determine whether inclusion of the QT interval as a covariate influenced our findings. Beta estimates and P-values were highly correlated (rho[ρ] = 0.99 and 0.96 respectively) across all variants comparing a GWAS with or without the QT interval as a covariate. Also, there was no substantial change in the minimum P-value of variants at the NOS1AP locus.

At 7 multi-ancestry spQRSTa loci, we observed no overlap with previously reported ECG loci. Candidate genes at these loci include AHNAK2, ALDH1A2, SGCG and TAOK2.

Pleiotropy of genetic variants with other phenotypes

We performed a phenome-wide association study (PheWAS) to identify associations of European ancestry lead and conditionally independent spQRSTa variants with 1301 clinical conditions in 395,758 unrelated European-ancestry individuals. Data on clinical conditions were from hospital episode statistics. Significantly associated conditions included atrial fibrillation, bundle branch block (BBB), atrioventricular block (AVB), arterial embolism and thrombosis, and hypertension (Fig. 6). We also performed lookups of all multi-ancestry lead spQRSTa variants (and proxies) in Phenoscanner (v2), to determine if they appeared in GWAS reports for non-ECG phenotypes and diseases (Supplementary Data 16). Lead variants or proxies at 19 spQRSTa loci (31.1%) had reported associations with blood pressure, anthropometric traits, blood counts, or psychiatric features or disorders (P < 5 × 10−8).

Fig. 6: Significant associations observed in phenome-wide association study of lead and conditionally independent spQRSTa variants.

X-axis: Lead variant (RsID [Chromosome: Position (hg19): Allele1: Allele2]) or conditionally independent variant from the spatial QRS-T angle (spQRSTa) European ancestry meta-analysis that had a significant association with a clinical phenotype in UK Biobank. Y-axis: Phenotype derived from hospital episode statistics, with colour coding for each major group (circulatory system; red, digestive system; green, neoplasms; yellow, respiratory; blue). Odds ratios (OR) are color coded according to decreasing (blue) or increasing (green) odds. 3:38587306:A:G was a conditionally independent variant at the SCN5A locus.

Association of genetically determined spQRSTa and fQRSTa with cardiovascular disease

Polygenic risk scores (PRSs) were used to explore associations between genetically determined spQRSTa and fQRSTa and relevant cardiovascular diseases. PRSs were calculated by summing the dosage of lead variants from the European-ancestry meta-analysis, weighted by the effect size estimates from the corresponding untransformed analysis. To obtain preliminary β estimates for the association of PRSs with the directly measured ECG trait, we performed a linear regression adjusting for age, sex, RR interval, BMI, height and 10 genetic principal components, in 33,960 unrelated individuals of European ancestry from UKB. These individuals were included in the GWAS meta-analysis, and therefore β estimates and CIs are biased. However, approximation is useful to aid interpretation of subsequent analyses. Associations observed for each PRS were (β [95% CI]): 5.4° (5.1–5.7) for spQRSTa; and 2.03° (1.8–2.3) for fQRSTa (per standard deviation [SD] increase in the PRS).
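The PRS construction described above (a sum of allele dosages weighted by per-allele effect sizes, then expressed per SD) can be sketched as follows. The effect sizes and dosages are hypothetical values chosen for illustration, and the function names are not taken from the study code.

```python
from statistics import mean, stdev


def polygenic_risk_score(dosages, weights):
    """Weighted sum of effect-allele dosages for one individual."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(scores):
    """Convert raw scores to z-scores so that effects can be reported per SD."""
    mu = mean(scores)
    sd = stdev(scores)
    return [(s - mu) / sd for s in scores]


# Hypothetical per-allele effect sizes (degrees) for three lead variants.
weights = [1.5, -2.0, 0.5]

# Hypothetical dosages (0 to 2 copies of the effect allele) for four individuals.
cohort = [
    [0.0, 1.0, 2.0],
    [2.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
    [2.0, 2.0, 0.0],
]

raw_scores = [polygenic_risk_score(d, weights) for d in cohort]
z_scores = standardize(raw_scores)
print(raw_scores)
print(z_scores)
```

In a regression of the measured trait on the standardized score, the coefficient would then be read as the change in the trait per SD increase in the PRS.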

Subsequently, each PRS was tested for association with prevalent cases of cardiovascular disease in 395,758 unrelated European ancestry UKB participants who were not in the GWAS meta-analysis (adjusting for sex, age, and 10 genetic principal components). We used a Bonferroni corrected threshold to identify significant findings (0.05/number of conditions tested, P < 6.3 × 10−3). Genetically determined spQRSTa was associated with increased odds for fascicular or bundle branch block (odds ratio [OR] [95% CI] per SD: 1.10 [1.07–1.13]) (Supplementary Fig. 7, Supplementary Data 17). Association of a QRS PRS with fascicular or bundle branch block has been reported31. However, inclusion of a QRS PRS as a covariate did not substantially change the point estimates (1.09 [1.06–1.13]), supporting an interpretation that the spQRSTa PRS contains independent risk information. There was suggestive evidence for an association with AV block but not at the Bonferroni corrected significance threshold (OR: 1.04 [1.01–1.06], P = 7.7 × 10−3). Genetically determined fQRSTa was significantly associated with fascicular block or bundle branch block (OR 1.05 [1.02–1.08]), and AV block (OR 1.04 [1.01–1.07]).
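As a brief worked example of the threshold above, the sketch below assumes that eight conditions were tested. That number is not stated in the text; it is inferred here because 0.05/8 = 0.00625, which rounds to the reported 6.3 × 10−3.

```python
ALPHA = 0.05
N_CONDITIONS = 8  # assumption: inferred from the reported threshold, not stated in the text

threshold = ALPHA / N_CONDITIONS  # Bonferroni corrected significance threshold


def is_significant(p_value, cutoff=threshold):
    """Return True if the P-value passes the Bonferroni corrected threshold."""
    return p_value < cutoff


# P-value reported in the text for the spQRSTa PRS and AV block.
p_av_block = 7.7e-3
print(threshold)
print(is_significant(p_av_block))
```

Under this assumption, the AV block P-value lies just above the threshold, which is consistent with its description as suggestive rather than significant.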

No evidence for a causal relationship between spQRSTa and cardiomyopathies

Because candidate genes and pathway analyses indicated potential involvement with cardiomyopathies, we performed Mendelian randomization (MR) studies to test for a causal relationship of genetically determined spQRSTa (as the exposure) with hypertrophic cardiomyopathy32 (HCM) and idiopathic dilated cardiomyopathy33 (DCM) (as outcomes). Lead variants from multi-ancestry and European spQRSTa meta-analyses were used as instrumental variables (IV). A relationship was suggested with HCM (multi-ancestry: OR 1.01 [1.00–1.02], P = 0.004; European: 1.01 [1.00–1.02], P = 0.009), using a fixed-effect inverse variance-weighted (IVW) model. However, the association was not supported in sensitivity analyses, including MR-Egger, weighted median and MR-PRESSO analyses (Supplementary Data 18). Similarly, no causal relationship was identified with either sarcomere positive or sarcomere negative HCM cases. There were no significant findings in spQRSTa-DCM MR analyses (Supplementary Data 19). Funnel, scatter, and forest plots for HCM and DCM analyses are presented in Supplementary Figs. 8 and 9.
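For readers unfamiliar with the fixed-effect IVW model, a minimal sketch of the standard estimator is given below. It combines per-variant ratio estimates (outcome effect divided by exposure effect), weighting each by the squared exposure effect over the squared outcome standard error. The input values are hypothetical, and this is not the software used in the study.

```python
import math


def ivw_fixed_effect(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse variance-weighted estimate and its standard error.

    Each variant j contributes the ratio beta_outcome[j] / beta_exposure[j],
    weighted by beta_exposure[j]**2 / se_outcome[j]**2.
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    return estimate, standard_error


# Hypothetical summary statistics for three instruments.
beta_x = [1.0, 2.0, 4.0]   # variant effects on the exposure
beta_y = [0.5, 1.0, 2.0]   # variant effects on the outcome (log odds)
se_y = [0.1, 0.2, 0.1]     # standard errors of the outcome effects

estimate, se = ivw_fixed_effect(beta_x, beta_y, se_y)
odds_ratio = math.exp(estimate)  # log odds converted to an odds ratio
print(estimate, se, odds_ratio)
```

Because the fixed-effect model assumes that all instruments are valid, sensitivity analyses such as MR-Egger and the weighted median are used to check whether the estimate is robust to pleiotropy.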

Discussion

Our large-scale analyses of spQRSTa and fQRSTa, two emerging markers for arrhythmogenesis and cardiovascular disease, significantly advance our understanding of their basic biology and relationships with classical ECG markers. We identify candidate genes involved in cardiac development, muscle cell differentiation, cardiac contraction and actin-filament based movement. The genes also have relationships with cardiomyopathies and central arterial vascular development. spQRSTa and fQRSTa shared loci with other ECG measures, but there are also 7 unshared loci, suggesting distinct genetic influences. Among spQRSTa and fQRSTa loci, there are fewer genes for cardiac ion channels, in contrast to findings for other ECG traits. Based on a phenome-wide scan, we report associations with atrial fibrillation, conduction disease and arterial embolism. Moreover, PRSs are associated with fascicular and bundle branch block, and AV block, indicating potential downstream effects of the loci.

A substantial proportion of lead candidate genes at spQRSTa loci are associated with development of inherited cardiomyopathies in humans (including MYH7, TTN, TNNT2, MYBPC3, DSP, RBM20; Fig. 7)34. There was also support for genes with non-Mendelian roles in cardiac myogenesis, including ADPRHL1, NACA and NFIA. The function of ADPRHL1 in humans has yet to be established; however, knockout of ADPRHL1 in Xenopus laevis causes loss of myofibril assembly in ventricular cardiomyocytes and prevents ventricular outgrowth35.

Fig. 7: Illustration of candidate genes at spQRSTa multi-ancestry loci and their potential function.

Candidate genes at spatial QRS-T angle (spQRSTa) loci are grouped according to potential roles in embryonic development, cardiac structure and function. RYR2 and ACTN2 are candidate genes from the same locus. A summary of the bioinformatic evidence for each gene is presented in Supplementary Data 14. Created using BioRender.com.

Small clinical studies have identified an association between a widened spQRSTa and HCM in paediatric and adult populations10,36. A widened spQRSTa also predicts occurrence of ventricular arrhythmia among HCM patients37. Interestingly, we did not identify a causal relationship between genetically determined spQRSTa and HCM or DCM in MR studies. Lack of association could be due to the small sizes of the HCM and DCM cohorts. However, the analyses did identify GWS loci. Therefore, the spQRSTa may reflect functional information in these cardiomyopathies (conditional, non-obligatory), rather than causal mechanisms for the structural phenotype. The spQRSTa may also reflect mechanisms and conditions predisposing to intermittent changes in ventricular conduction (e.g., intermittent or persistent BBB) indicating the development of cardiac memory11,38. This is supported by our PheWAS and PRS analyses, where we observed associations with fascicular or bundle branch block and AV block. Therefore, although we did not find a causal relationship with structural HCM or DCM phenotypes, the spQRSTa may reflect the burden of intermittent ventricular arrhythmia or conduction abnormalities occurring over time in these conditions39.

Multiple findings support a role for angiogenesis and arterial development in modulating the spQRSTa, including candidate genes (ALDH1A2, ANGPT1, and VAV2), significant enrichment of GO-terms (coronary vascular development and vasculogenesis), and associations identified in PheWAS (arterial embolism, thrombosis and hypertension). VAV2, a candidate gene identified in Hispanic/Latino ancestry-specific analyses, is a guanine nucleotide exchange factor for Ras-related GTPases and modulates receptor-mediated angiogenic responses40,41. Knockout mice for this gene show signs of left ventricular hypertrophy, cardiac fibrosis and hypertension42. Abnormal angiogenesis influences cardiac structure and function through physiological and pathological cardiac hypertrophy, effects on tissue recovery following ischaemia, and regenerative capacity43,44. These processes may potentially lead to an arrhythmogenic substrate. A recent study identified an association between a widened spQRSTa and increased risk for cardioembolic and haemorrhagic stroke45. Our findings provide potential biological explanations for stroke associations.

Previous theoretical studies suggested that the spQRSTa reflects abnormalities of ventricular repolarization due to abnormal depolarization3. We identified shared genetic influences and loci overlapping with mainly PR interval and QRS duration. We also report loci that are shared across multiple ECG traits including NFIA, CASQ2, RYR2, TTN, SCN5A, PITX2, CDKN1A, PLN, NACA and NDRG4 (Fig. 7, Supplementary Data 15). In comparison to results reported for QT and JT, there is less support for the involvement of cardiac potassium channels, which are important determinants of ventricular repolarization and common targets of existing anti-arrhythmics46. Combined with other studies, our results support an interpretation that the spQRSTa is primarily a marker of abnormal ventricular depolarization and suggests new therapies targeting depolarization should be investigated for arrhythmia prevention and management.

Despite evidence for shared effects at some loci, genetic and phenotypic correlations of spQRSTa and fQRSTa with other ECG traits are weak. Therefore, spQRSTa and fQRSTa may represent unique biology that may contribute to arrhythmic risk. There was no overlap with other ECG traits at 7 multi-ancestry spQRSTa loci. Candidate genes at these loci include: AHNAK2, which encodes a large nucleoprotein that localises to the Z-band region of mouse cardiomyocytes and may have a role in excitation-contraction coupling through effects on L-type voltage-gated calcium channels; SGCG, a component of the subsarcolemmal cytoskeleton; and ALDH1A2, which encodes an enzyme responsible for early embryonic retinoic acid synthesis, a process that is critical for normal cardiac and arterial development47,48,49,50. Another candidate gene, TAOK2, encodes a protein kinase most studied for its role in dendritic spine maturation51. More recently, TAOK2 has been identified in tethering the endoplasmic reticulum to microtubules. We report another locus, MACF1, that is also involved in microtubule organization52,53. Validation of these loci is required.

Although the sample size was larger for fQRSTa than for spQRSTa (34% larger), we found fewer loci and lower heritability estimates for fQRSTa. All multi-ancestry fQRSTa loci overlapped with spQRSTa loci. There were candidate genes involved in cardiac development and cardiomyopathies including SCN5A, RBM20, PLN, TBX3 and MYO18B. The fQRSTa represents the QRS-T angle in the frontal plane only, whereas the spQRSTa is 3-dimensional. Therefore, the fQRSTa trait likely loses information that resides in other planes. However, we identified an unreported locus in African ancestry-specific analyses (candidate gene FAM135B). Knockdown of FAM135B in iPSC lines reduces spinal motor neuron survival and contributes to neurite defects as seen in spinal and bulbar muscular atrophy. These disorders are associated with cardiac arrhythmia and structural abnormalities54,55,56.

Although our study includes individuals from multiple ancestries, ancestry-specific analyses were limited by sample sizes. Larger studies are needed to yield additional signals. The precise algorithms used to calculate the spQRSTa will marginally differ despite efforts to harmonise approaches; however, such differences are unlikely to affect our positive findings (measurement error or noise will dampen signals), and summary statistics for spQRSTa across all studies are broadly similar (Supplementary Data 3)57,58,59.

In summary, our analyses significantly advance our knowledge of the underlying biology reflected by the spQRSTa and fQRSTa, which are independent risk markers for arrhythmogenesis. We also identified loci that have not been reported for ECG traits. Our findings highlight biological processes and candidate genes that may explain associations observed in previous clinical studies and could inform future research on the utility of these markers in risk prediction.

Methods

Study cohorts

Fourteen studies (32 ancestry-specific sub-studies) and 23 studies (40 ancestry-specific sub-studies) contributed GWAS summary statistics for spQRSTa and fQRSTa meta-analyses, respectively. These included members of the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) consortium60 (Supplementary Data 1). This study was approved by all participating cohorts. Ethical approval and consent were obtained at a study level. The majority of participating cohorts were population based, with a small number of case-control studies. Information on study-level genotyping method (typically Illumina or Affymetrix) and quality control (Hardy-Weinberg equilibrium [HWE] P and MAF) is provided in Supplementary Data 2. The 1000 G reference panel was most commonly used for imputation (26/40 sub-studies), followed by the Haplotype Reference Consortium panel (13/40). The Atherosclerosis Risk in Communities (ARIC) study was imputed with the TOPMed Freeze 5 reference panel61,62. All GWAS summary data included in the meta-analyses utilized NCBI build 37 (summary statistics for ARIC sub-studies were converted from build 38 to 37 using a liftover tool [https://genome.sph.umich.edu/wiki/LiftOver]).

Cohort-level single variant association analyses

A GWAS was performed by each participating cohort for the spQRSTa (mean) and fQRSTa. If the spQRSTa was not already calculated and digitized ECGs were available, it was derived by transformation of the 12-lead ECG using previously published algorithms57. In brief, after applying a bandpass Butterworth filter and signal averaging to reduce noise, orthogonal X, Y and Z vector beats were estimated using Kors’ regression matrix63. The spQRSTa was subsequently calculated as the angle between mean QRS and T-wave spatial vector loops57,58. The fQRSTa was defined as the absolute difference between QRS and T-wave frontal plane axes (fQRSTa = abs[QRS-axis − T-axis])3. Values for both phenotypes are between 0 and 180°.
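The two angle definitions above can be illustrated with a minimal Python sketch. It is not the cohorts' published algorithm: the function names are hypothetical, the inputs are assumed to be already-computed mean QRS and T-wave vectors (or frontal axes in degrees), and folding frontal differences above 180° back into range is an assumption made here so that the result respects the stated 0 to 180° range.

```python
import math

def spatial_qrst_angle(qrs_vec, t_vec):
    """Angle in degrees (0-180) between mean QRS and T-wave vectors (X, Y, Z)."""
    dot = sum(q * t for q, t in zip(qrs_vec, t_vec))
    norm_q = math.sqrt(sum(q * q for q in qrs_vec))
    norm_t = math.sqrt(sum(t * t for t in t_vec))
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    cos_angle = max(-1.0, min(1.0, dot / (norm_q * norm_t)))
    return math.degrees(math.acos(cos_angle))

def frontal_qrst_angle(qrs_axis_deg, t_axis_deg):
    """Absolute difference between frontal QRS and T axes, in degrees."""
    diff = abs(qrs_axis_deg - t_axis_deg) % 360.0
    # Assumption: differences above 180 degrees are folded back (360 - diff).
    return 360.0 - diff if diff > 180.0 else diff
```

For example, `frontal_qrst_angle(60, 30)` gives 30°, and two orthogonal spatial vectors give 90°.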

The primary analysis used to declare GWS and previously unreported associations was that of the rank-based inverse normal transformed phenotype (as neither the spQRSTa nor the fQRSTa is normally distributed). The raw phenotype was also analysed to calculate clinically meaningful effect sizes (on the degree [°] scale). Study-level GWASs were performed using an additive genetic model, accounting for relatedness with appropriate software (e.g. BOLT linear mixed model [BOLT-LMM])15 or by including a kinship matrix or pedigree64,65,66. Poorly imputed genotypes were excluded (variants were retained if Rsq > 0.3, or a similar threshold for IMPUTE) and only variants with MAF > 0.01 were retained, so that only high-quality variants were included in the study.
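A rank-based inverse normal transformation can be sketched as follows. This is an illustrative implementation only: the function name is hypothetical, and the Blom offset (0.375) and the averaging of tied ranks are assumptions, since the text does not state which variant each cohort used.

```python
from statistics import NormalDist

def rank_inverse_normal(values, offset=0.375):
    """Map values to normal quantiles via their ranks (ties get average rank)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    # Assumed Blom-type formula: quantile of (rank - c) / (n - 2c + 1).
    return [nd.inv_cdf((r - offset) / (n - 2 * offset + 1)) for r in ranks]
```

The transformed values preserve the ordering of the raw angles but follow an approximately standard normal distribution, which is why effect sizes from this analysis are not on the degree scale.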

Summary statistics for cohort-level distributions of each ECG trait and the covariates included in the GWAS model are provided in Supplementary Data 3. Age, sex, RR interval, height, and body-mass index (BMI) were mandatory covariates in the GWAS model. In addition, as the QT interval is associated with the QRS-T angle and we wished to identify associations that were not primarily driven by this marker of ventricular depolarization and repolarization, the QT interval was also included as a covariate. If pedigree data were not available, or if the chosen GWAS software did not correct for underlying population stratification, genetic principal components (PCs) were also included as covariates. Cohorts could also select additional covariates if relevant to their study, such as genotyping method or recruitment site. Cohorts comprising multiple ancestries performed separate analyses for each ancestry.

Individuals were not included in the study-level GWAS if they had a prior diagnosis of heart failure, myocardial infarction, pacemaker or implantable cardiac defibrillator; were prescribed class I or III anti-arrhythmics, QT-prolonging or digitalis medication; or were pregnant at the time of ECG acquisition. In addition, individuals were excluded if atrial fibrillation, BBB or a QRS duration greater than 120 ms was present on their ECG.

Additional quality control of study-level data

After submissions of results in a standardized format, quality control was performed using EasyQC (R package v9.2)67. Allele frequencies of all variants were compared to those reported in the reference panel used by the study for imputation. To identify analytical errors, QQ and P–Z-score plots were inspected, and summary statistics for β estimates and SE were compared across all studies. To identify potential uncorrected population stratification, the genomic-control inflation factor was calculated to identify test statistic inflation.
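The genomic-control inflation factor mentioned above can be illustrated with a short sketch. This is not the EasyQC implementation; the function name is hypothetical, and it assumes two-sided P-values from 1-degree-of-freedom tests, using the standard definition of λ as the median observed chi-square statistic divided by the expected median under the null.

```python
from statistics import NormalDist, median

def genomic_inflation_factor(p_values):
    """Lambda = median observed chi-square (1 df) / expected median under the null."""
    nd = NormalDist()
    # A two-sided P-value p corresponds to z = inv_cdf(p / 2); chi-square = z**2.
    chi2 = [nd.inv_cdf(p / 2) ** 2 for p in p_values]
    expected_median = nd.inv_cdf(0.75) ** 2  # approximately 0.455
    return median(chi2) / expected_median
```

Values close to 1 suggest little test statistic inflation; values clearly above 1 can indicate uncorrected stratification, although polygenic signal also raises λ.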

GWAS meta-analysis

The primary GWAS meta-analysis for spQRSTa and fQRSTa was the multi-ancestry rank-based inverse normal transformed meta-analysis; however, to estimate clinically relevant effect sizes, a GWAS meta-analysis was also performed using the untransformed phenotype (on the degree [°] scale). European, African, and Hispanic/Latino ancestry-specific meta-analyses were also performed as secondary analyses. Meta-analyses were performed with METAL (v2011-03-25) using an IVW model68. If a study’s λ was >1.0, genomic control was applied during the meta-analysis. Summary statistics and plots were produced for the entire meta-analysis. In downstream analyses, variants were only included if present in >60% of the total meta-analysis sample size. The GWS threshold was set as P < 5 × 10−08. To calculate the correlation between variants, relevant individuals from the 1000 G reference panel were used: all individuals for the multi-ancestry summary statistics, and ancestry-matched individuals for the European, African and Hispanic/Latino analyses. Some in-silico analyses relied upon correlations calculated by the software developers and did not permit modification. In these instances (explicitly stated in the manuscript text), only European-ancestry summary statistics were used, in recognition that the multi-ancestry meta-analysis contained a substantial proportion of individuals of European descent.
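For a single variant, a fixed-effect inverse-variance weighted (IVW) combination of study-level estimates can be sketched as below. This is a generic illustration of the IVW formula rather than METAL's code; the function name is hypothetical and no genomic-control correction is applied.

```python
import math
from statistics import NormalDist

def ivw_meta(betas, ses):
    """Fixed-effect IVW pooled estimate, standard error and two-sided P-value."""
    weights = [1.0 / se ** 2 for se in ses]  # each study weighted by 1 / SE^2
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = 2 * NormalDist().cdf(-abs(z))
    return beta, se, p
```

Studies with smaller standard errors (typically the larger cohorts) therefore contribute more to the pooled effect estimate.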

Definition of known and previously unreported loci

One previous GWAS has been reported for spQRSTa, with 3 loci reaching GWS13. Using PLINK (v1.9)69, lead variants from the study were extracted to calculate locus boundaries, defined as ±500 kb or r2 < 0.1 within a 4 Mb region (whichever was larger), centered on the lead variant. The 1000 G reference panel was used to calculate correlations between variants61. The variants furthest upstream and downstream were declared the locus start and end, respectively. We used the same list to define known loci for fQRSTa, as no previous GWAS has been reported for this trait and the phenotypic correlation with spQRSTa is high. The same method was used to identify GWS loci in our study. Loci that did not overlap with the list of known loci were declared as previously unreported.

Heterogeneity I2 statistics and forest plots were produced for each lead variant (smallest P) at each locus, to identify evidence for heterogeneity. Locus-Zoom plots were also produced to visually inspect patterns of association and variant correlations70.
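The I2 statistic for a lead variant can be derived from Cochran's Q, as in the following sketch. The function name is hypothetical, and the fixed-effect pooled estimate used to compute Q is an assumption about the workflow rather than a description of the software used in the study.

```python
def heterogeneity_i2(betas, ses):
    """Return (Cochran's Q, I^2 as a percentage) for study-level estimates."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    if q <= 0:
        return 0.0, 0.0  # no observed dispersion between studies
    # I^2 is the share of total variation beyond that expected by chance.
    i2 = max(0.0, (q - df) / q) * 100.0
    return q, i2
```

An I2 near 0% indicates that differences between cohorts are consistent with sampling error alone, whereas high values flag variants whose effects differ across studies or ancestries.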

Conditional and heritability analyses

To identify independent secondary signals within GWS loci, conditional analyses using European-ancestry statistics were performed using Genome-wide Complex Trait Analysis (GCTA, v1.26.0)14. As recommended by GCTA, the largest cohort in the meta-analysis was used as the reference panel (UKB, N = 33,960). For this analysis, related individuals in the UKB sample were excluded (up to the 2nd degree [kinship coefficient >0.0884]). A strict threshold (r2 < 0.1 with the lead variant and PJoint < 5 × 10−08) was used to declare “conditionally independent” signals within loci.

Heritability estimates were calculated using the same UKB individuals of European ancestry included in the GWAS meta-analysis, using BOLT-REML (v2.3.2)15. BOLT-REML models directly genotyped SNPs to estimate relatedness within a sample and obtains SNP-based heritability estimates. The percentage of variance explained (PVE) by lead and conditionally independent variants was subsequently calculated (Eq. 1)71:

$$\mathrm{PVE}=\frac{2\,\beta^{2}\,\mathrm{MAF}\,(1-\mathrm{MAF})}{2\,\beta^{2}\,\mathrm{MAF}\,(1-\mathrm{MAF})+\mathrm{se}(\beta)^{2}\cdot 2\,N\,\mathrm{MAF}\,(1-\mathrm{MAF})}$$
(1)
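Equation 1 translates directly into code. The sketch below uses a hypothetical function name and returns PVE as a proportion; multiplying by 100 gives the percentage.

```python
def proportion_variance_explained(beta, se, maf, n):
    """PVE for one variant from its effect size, standard error, MAF and sample size N."""
    genetic = 2 * beta ** 2 * maf * (1 - maf)          # numerator of Eq. 1
    residual = se ** 2 * 2 * n * maf * (1 - maf)       # second term of the denominator
    return genetic / (genetic + residual)
```

For instance, with beta = 0.1, se = 0.01, MAF = 0.5 and N = 10,000, the numerator is 0.005 and the denominator 0.505, giving a PVE of roughly 0.0099 (about 1%).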

Variant annotation

Lead and conditionally independent variants (and their proxies [r2 > 0.8]) were annotated using Variant Effect Predictor (VEP, Ensembl release 99) to identify potential functional consequences16. VEP also contains data from the prediction tools Sorting Intolerant From Tolerant (SIFT, version 5.2.2)72 and PolyPhen-2 (v2.2.2)73, which supplied deleteriousness scores. In addition, CADD74 and RegulomeDB75 scores for each of these variants were extracted. CADD scores annotate coding and non-coding variants, and enable ranking of their potential deleteriousness compared with other variants in the genome74.

Association with tissue-specific gene expression

To identify relationships between lead and conditionally independent variants (and their proxies) and tissue-specific gene expression, cis-eQTL data were extracted from the GTEx portal (v8)17,76,77. Tissues included in these analyses were cardiac (LV and RAA) and vascular (coronary artery and aorta), for their known influence on cardiovascular disease. If a variant was also a lead cis-eQTL variant, colocalization analyses were performed at the locus using the R package COLOC, to determine whether the variant was causal in both the GWAS meta-analysis and the eQTL study78. These colocalization analyses use Bayesian statistical methods to calculate a posterior probability (PP) for the variant being causal in both analyses (PP > 75%).

To predict the effects of gene expression levels on spQRSTa and fQRSTa, we performed a TWAS using S-PrediXcan. S-PrediXcan is an extension of the original software PrediXcan and infers results using GWAS summary statistics, thus removing the need for individual-level genotype and phenotype data79. S-PrediXcan provides a precalculated transcriptome model database from GTEx-based tissues and covariance matrices of SNPs within each gene model (https://github.com/hakyimlab/MetaXcan). We used European meta-analysis summary statistics for these analyses and tested for association in a total of 16,097 genes across LV, RAA and vascular tissues. A Bonferroni-corrected threshold (0.05/number of genes tested [16,097] = 3.1 × 10−6) was used to declare significance, and results are only reported when more than one SNP was included in the model.

Tissue- and cell-type specific regulatory elements

GARFIELD (v2) was used to identify tissue-specific enrichment of variants at DNase I hypersensitivity sites80. GARFIELD annotates variants with data from the ENCODE, GENCODE and Roadmap Epigenomics projects and calculates odds ratios using a generalised linear model framework80.

Chromatin interaction data were used to identify target genes of regulatory variants (RegulomeDB score ≤3b) in LV and RV tissues. First, using FUMA GWAS (Functional Mapping and Annotation of Genome-Wide Association Studies) software (v1.3.6), overlap was identified between lead and conditionally independent variants and pre-processed loops determined by Fit-Hi-C pipelines18,81. An FDR threshold <0.05 was used to report results. In addition, we performed the same analysis using loops called from recently published Knight-Ruiz normalised 5 kb, 10 kb and 25 kb resolution promoter capture Hi-C data19.

To identify cardiac cell-type specific enrichment of non-coding variants, we utilized accessible chromatin information from snATAC-seq data, for atrial and ventricular cardiomyocyte, smooth muscle, endothelial, adipocyte, macrophage, fibroblast, lymphocyte and nervous cells21. Using PLINK, our GWAS meta-analysis summary statistics were partitioned into haplotype blocks centered on each lead variant (r2 > 0.1 within a 2 Mb radius). Peaks within the lowest decile of total read counts from the snATAC-seq data were removed using a SNP enrichment method CHEERS (version accessed 2020)20, followed by quantile normalization of the remaining peak counts20. Enrichment of variants (one-sided P) within the ATAC-seq peaks was estimated and a Bonferroni-corrected threshold (0.05/number of cell-types) used to report significant findings.

Candidate gene prioritisation and pathway enrichment

To identify additional candidate genes at each locus, DEPICT (v3) software was used, which prioritizes genes according to common functional pathways. DEPICT calculates a membership probability for each gene within 14,461 reconstituted gene-sets22. Additional analyses were performed using DEPICT to identify pathway enrichment of these genes using Gene-Ontology (GO), Kyoto Encyclopaedia of Genes and Genomes (KEGG), REACTOME and Mouse Genome Informatics (MGI) data. DEPICT also performs gene-set tissue enrichment analyses using annotations from human Affymetrix microarray probes. For all analyses, an FDR < 0.01 was used to identify significant results. For visualisation, GO biological processes from the DEPICT spQRSTa multi-ancestry meta-analysis output were analysed using the REVIGO web application to remove redundant terms and cluster related nodes23. They were subsequently visualised using Cytoscape (v3.8.2)82.

The outputs of all bioinformatic analyses were pooled and supplemented with trait-relevant information from the Online Mendelian Inheritance in Man (OMIM)24 and International Mouse Phenotyping Consortium25 (IMPC, www.mousephenotype.org) databases, the Human Protein Atlas26 (www.proteinatlas.org) and a PubMed literature review (Supplementary Data 14). We also performed a look-up of each lead variant in the Open Targets Genetics “Locus to Gene” machine learning pipeline, which uses supervised learning to weight evidence from different sources and prioritize genes at a locus27. This database does not include trait-specific information in the pipeline and therefore it was used to supplement the analyses performed for this work. For each locus, the candidate gene with the most support across all lines of evidence is indicated. We also included a second gene if there was support from multiple analyses.

LD score regression

LD score regression with LDSC (v1.0.1) was performed to calculate the genetic correlation of the spQRSTa and fQRSTa with other ECG traits, including PR, QRS, JT and QT intervals28. LDSC uses pre-computed LD scores and therefore these analyses were performed with European ancestry summary statistics only. These LD scores are used as weights in the regression model29.

Overlap of spQRSTa loci with other resting ECG traits and association with clinical phenotypes

Lead variants previously reported for other ECG markers, including P-wave duration, atrioventricular conduction (PR interval12), ventricular depolarization (QRS duration83 and QRS voltage84), ventricular repolarization (JT83, QT83 and Tp-Tend intervals85) and HR86, were tested for overlap with spQRSTa loci (overlap was declared if a previously reported lead variant was within ±500 kb of, or had r2 > 0.1 with, the lead spQRSTa variant). Summary statistics for each ECG trait were also extracted and pairwise GWASs performed using Bayesian bivariate analyses as implemented in GWAS-PW30. GWAS-PW combines GWAS summary statistics using the variance of effect sizes at each SNP to estimate the probability that a given genomic region contains a variant that influences both traits or distinct associations, and learns reasonable thresholds from the data to declare significance. A pairwise GWAS was performed with the summary statistics of the multi-ancestry spQRSTa meta-analysis and each ECG trait of interest. To account for sample overlap between summary statistics, an expected correlation (-cor in GWAS-PW) between two traits was specified for each analysis87. The values used after adjusting for estimated sample overlap were −0.0045, −0.0258, 0.0539, 0.0107, 0.0045 and 0.0135 for QT, JT, QRS, PR, HR and TpTe, respectively. A posterior probability >0.9 was used as evidence supporting the presence of a causal SNP within the genomic region that influences both traits.

To identify evidence of pleiotropy with clinical conditions, a PheWAS was performed using the R package PheWAS (v0.99.5-5)88. ICD-10 and 9 codes were extracted from UKB hospital episode statistics and mapped to phecodes. Lead and conditionally independent variants from the European ancestry spQRSTa meta-analysis were subsequently tested for association with each phecode in 395,758 European individuals. Related pairs were excluded (kinship coefficient >0.0884). A Bonferroni corrected threshold for the number of phecodes tested (0.05/1,301 = 3.8 × 10−5) was used to declare significance. To identify evidence for pleiotropy with non-cardiac phenotypes and diseases from previously reported GWAS, a look-up was performed of lead and conditionally-independent spQRSTa variants (and proxies, r2 > 0.8) using Phenoscanner v289,90. Associations reaching GWS with other traits and diseases were extracted.

Sensitivity analyses

To determine whether the QT interval significantly influences the findings from our spQRSTa meta-analyses, sensitivity analyses were performed in UKB (N = 34,361). Analyses were repeated without the QT interval as a covariate. Spearman rank correlations (rho [ρ]) for beta estimates and -log10 P-values, were calculated across all variants between the original model and the sensitivity analysis.
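The comparison between the original and sensitivity models relies on Spearman rank correlation, which is the Pearson correlation of the ranks. A minimal sketch follows; the function names are hypothetical and the two input lists stand in for per-variant beta estimates (or -log10 P-values) from the two models.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```

A rho close to 1 between the two models would indicate that adjusting for the QT interval changes the ordering of variant effects very little.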

Association between genetically determined spQRSTa and fQRSTa with cardiovascular diseases

A PRS was calculated for each trait using lead variants from the European meta-analysis, to test for association with atrial fibrillation, stroke, coronary artery disease, conduction disease, heart failure, non-ischaemic cardiomyopathy and ventricular arrhythmia. Analyses were performed in individuals of European ancestry in UKB (N = 395,758). Participants included in the GWAS meta-analysis were excluded, along with related pairs up to the 2nd degree (kinship coefficient >0.0884). To take advantage of genotype probability data in BGEN format, PRSice-291 was used. The PRSs were calculated by summing the dosage of lead variants weighted by the effect size from the corresponding raw-phenotype meta-analysis. Disease status for each cardiovascular outcome of interest was extracted using ICD-9/ICD-10 codes from hospital admission episodes, self-reported data, operation codes and death certification (Supplementary Note 2). Associations were identified for prevalent and incident cases using a logistic regression model, including covariates age, sex, genotyping array and ten genetic principal components. A Bonferroni-corrected threshold of 0.05/number of outcomes tested (0.05/8 = 6.3 × 10−3) was used to declare significant associations.
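The score construction described above (dosages weighted by effect sizes and summed) can be sketched for one individual as follows. This is an illustration, not PRSice-2 itself: the function name and variant identifiers are hypothetical, and dosages are assumed to be expressed for the same effect allele as the weights.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0-2) over the lead variants.

    dosages: dict mapping variant ID -> dosage for one individual
    weights: dict mapping variant ID -> effect size (degrees per allele)
    """
    return sum(weights[variant] * dosages[variant] for variant in weights)

# Hypothetical example with two made-up variants.
example_score = polygenic_risk_score(
    {"rsA": 2.0, "rsB": 0.5},
    {"rsA": 0.5, "rsB": -1.0},
)
```

Because the weights come from the raw-phenotype meta-analysis, the resulting score is on the degree scale and can be read as a genetically predicted shift in the QRS-T angle.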

Relationship between spQRSTa and HCM and DCM

The TwoSampleMR R package (v0.5.6) was used to test for association of spQRSTa with cardiomyopathies, using data from cohorts with HCM and DCM92. First, summary statistics from a previously reported multi-ancestry (2780 cases, 47,486 controls) and European (2244 cases, 42,668 controls) HCM GWAS were retrieved32. Summary statistics for multi-ancestry sarcomere-positive (871 cases, 20,142 controls) and sarcomere-negative (1874 cases, 27,344 controls) HCM GWAS were also extracted. The HCM GWAS included UKB participants as controls; however, there was no overlap with individuals included in the spQRSTa meta-analyses. Multi-ancestry (61 variants) and European (51 variants) IVs were constructed from GWS variants in the rank-based inverse normal transformed spQRSTa meta-analysis, with the corresponding β, SE and P retrieved from the untransformed meta-analysis to facilitate clinical interpretation. Effect alleles were harmonised between IVs and HCM summary statistics. Two variants from the multi-ancestry IV, rs398110577 and rs35185344, were unavailable within the HCM summary statistics and proxies were selected: rs4946230 (r2 = 0.70) and rs12928779 (r2 = 0.98), respectively. Four different methods were applied, specifically IVW, MR-Egger, weighted median and MR-PRESSO (Mendelian randomisation pleiotropy residual sum and outlier), using MR-Base92,93. Results are reported as OR (95% CI) for risk of HCM per 1° increase in genetically determined spQRSTa. The same process was followed to test for association with DCM, with the following differences. Summary statistics from a European ancestry “sporadic” DCM GWAS (2651 cases, 4329 controls) were used33. Sporadic DCM was defined as a reduced LV ejection fraction and enlarged LV end-diastolic volume/diameter in the absence of any obvious pathology33. For these analyses, one variant from the European IV was not available (rs2668692), therefore a suitable proxy was selected (rs10514897, r2 = 0.78).
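The IVW estimator, one of the four methods listed above, can be sketched as follows. This is a generic fixed-effect formulation with first-order weights, written for illustration and not taken from the TwoSampleMR code; the function name is hypothetical, and the inputs are harmonised per-variant effects on the exposure (spQRSTa, in degrees) and on the outcome (log odds of disease).

```python
import math

def mr_ivw(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate; returns (log OR, SE, OR) per unit of exposure."""
    # Weighted regression of outcome effects on exposure effects through the origin.
    numerator = sum(bx * by / sy ** 2
                    for bx, by, sy in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / sy ** 2
                      for bx, sy in zip(beta_exposure, se_outcome))
    log_or = numerator / denominator
    se = math.sqrt(1.0 / denominator)
    return log_or, se, math.exp(log_or)
```

An approximate 95% CI for the OR follows as exp(log OR ± 1.96 × SE). MR-Egger, weighted median and MR-PRESSO relax the assumption, implicit here, that no variant affects the outcome other than through the exposure.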

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.