Sequence variants with large effects on cardiac electrophysiology and disease

Features of the QRS complex of the electrocardiogram, reflecting ventricular depolarisation, associate with various physiologic functions and several pathologic conditions. We test 32.5 million variants for association with ten measures of the QRS complex in 12 leads, using 405,732 electrocardiograms from 81,192 Icelanders. We identify 190 associations at 130 loci, the majority of which have not been reported before, including associations with 21 rare or low-frequency coding variants. Assessment of genes expressed in the heart yields an additional 13 rare QRS coding variants at 12 loci. We find 51 unreported associations between the QRS variants and echocardiographic traits and cardiovascular diseases, including atrial fibrillation, complete AV block, heart failure and supraventricular tachycardia. We demonstrate the advantage of in-depth analysis of the QRS complex in conjunction with other cardiovascular phenotypes to enhance our understanding of the genetic basis of myocardial mass, cardiac conduction and disease.

T he QRS complex (Fig. 1) of the electrocardiogram (ECG) is a summation of electrical activity in the heart during ventricular depolarisation. It represents electrical activation of the left and right ventricles of the heart, propagated through the specialised conduction system 1 , which includes the His bundle, right and left bundle branches and the Purkinje network. Normal ventricular depolarisation is a rapid process activating different ventricular myocardial segments in a precise temporal sequence, resulting in a narrow QRS complex with a characteristic pattern on the 12-lead ECG 1 . Deviation from the norm has been associated with sudden cardiac arrest (prolonged ventricular activation time) 2,3 and mortality in the general population (prolonged QRS duration) 4 and among individuals free of cardiovascular disease (low QRS voltage) 5 .
Many traits and diseases affect the morphology of the QRS complex, including body habitus, primary conduction abnormalities, hypertrophy and dilatation of the ventricles, myocardial infarction, pericardial effusion and lung disease 6 . Measures of the QRS complex have been used as prognostic indicators and markers of heart disease severity, such as heart failure 7 , ventricular hypertrophy 8 and amyloidosis 9 .
To gain a better understanding of the molecular mechanism of ventricular conduction, we perform a large GWAS of ten measures of the QRS complex in 12 leads from ECGs of 81,192 individuals. These ten measures are the Q, R and S wave amplitudes and durations, QRS complex area and duration, ventricular activation time, and the distance from the peak of the R wave to the nadir of the S wave (QRS p-2-p). We analyse each measure for 12 ECG leads: limb leads (I-III), augmented limb leads (aVR, aVL, aVF) and precordial leads (V1-V6). We also analyse three other parameters: mean QRS duration over 12 leads, the Cornell voltage criterion and the Sokolow-Lyon voltage criterion. In total, we analyse 123 QRS complex parameters and this results in the identification of 190 associations between the QRS complex and variants at 130 loci, for which we further assess effects on seven echocardiographic traits and 22 cardiovascular diseases. We demonstrate the advantage of analysing 12 leads of the ECG and identify rare protein-coding variants in genes that can improve the understanding of risk and progression of heart disease and help direct future studies.

QRS associations.
We performed a GWAS on 81,192 individuals, testing 32.5 million common and rare (MAF > 0.01%) SNPs and indels for association with 123 QRS complex parameters ( Supplementary Fig. 1, Supplementary Tables 1 and 2). The variants were identified by whole-genome sequencing 15,220 Icelanders and imputed into 151,677 long-range phased individuals and their relatives 16 . We adjusted the threshold for genome-wide significance with a weighted Bonferroni procedure, using as weights the predicted functional impact of association signals 17 (Supplementary Table 3). Sequence variants at 130 loci (MAF > 0.1%) associate with at least one QRS complex parameter. Using conditional analysis, we identified 60 secondary signals at 32 of the loci, resulting in 190 distinct associations (Supplementary Data 1 and 2). We replicated associations at all 54 reported QRS loci with at least one of the 123 QRS parameters (Supplementary Data 3, Supplementary Table 4; P < 0.05/123 = 4.1 × 10 −4 ).
We found genome-wide significant associations with 115 of 123 QRS parameters tested. There were no genome-wide significant associations with eight Q wave amplitude and duration parameters, of which four had relatively few available measurements (N < 15,000). Of the 190 variants, 160 associate genomewide significantly with more than one QRS parameter. The R amplitude is the QRS measure that captures the most associations (N = 96), followed by QRS area (N = 85) and R duration (N = 76), while the Cornell voltage criterion captures the fewest (N = 18) ( Supplementary Fig. 2 (N = 9), QRS area (N = 5), QRS p-2-p (N = 4) or ventricular activation time (N = 4).
For the R amplitude, we identified most associations with lead V1 (N = 47, Fig. 2). Fifteen variants associate only with the R amplitude in lead V1. We identified more than twice as many associations for the QRS p-2-p amplitude in lead aVR than in any other lead (N = 47). For the QRS area, we identified most associations with lead II (N = 43); with lead V3 for QRS duration (N = 39); with lead V5 for R duration (N = 33), S duration (N = 33) and S amplitude (N = 31); and with lead V6 for ventricular activation time (N = 28), Q amplitude (N = 24) and Q duration (N = 9). Across all QRS measures, we observed the fewest associations with leads III and aVL.
To assess the genetic relationship between ventricular depolarisation and other electrical functions of the cardiac conduction system, we tested the genetic correlation between different ECG measures. The genetic correlation (Methods) between mean QRS duration and mean PR interval duration is 0.14 (95% CI: 0.06-0.21) and between mean QRS duration and mean QT interval duration is 0.31 (95% CI: 0.23-0.40). We also tested the 190 QRS variants for association with ECG measures reflecting atrial and atrioventricular conduction (P-Q component of the ECG), and ventricular repolarisation (ST-T component) ( Supplementary Fig. 3, Supplementary Data 5). Using a 5% FDR (P ≤ 0.016), 177 of the variants associate with a non-QRS measure of the ECG. Of those, 16 have more significant effects on non-QRS components: the PR interval (at CAV1, OBFC1, SCN5A, SCN10A and TBX3), T amplitude (at KLF12, LMF1, MYBPC3 and RNF207), P amplitude (at SYNPO2L and MYH6) and ST duration (at MYH7B and RNF207). The pattern of magnitude and direction of effects associated with different ECG parameters is highly variable between variants.
Replication. We assessed the associations of unreported QRS variants in (1) QRS parameters similar to ours from 19,885 ECGs of participants in the UK Biobank and (2) a published GWAS 14 on four QRS measures from up to 73,518 participants.
Using data from the published GWAS on QRS duration and three voltage criteria 14 , we tested variants that associated genomewide significantly in the Icelandic material with any of these four parameters. We tested seven variants and all replicated (P < 0.05 and the same direction of effects, Supplementary Data 6). In the smaller UK Biobank data, we tested variants that we expected to replicate (over 99% power). Of 13 variants tested, all had the same direction of effects in both datasets and 11 replicated (P < 0.05 and the same direction of effects, Supplementary Data 7).
Associations with cardiovascular diseases. The 190 QRS variants are a priori more likely to associate with cardiovascular diseases than random sequence variants, and thus we tested them for association with 22 cardiovascular diseases in Icelandic datasets, including cardiac arrhythmias, cardiomyopathy and coronary artery disease (CAD). For 18 of the diseases, we also had data from the UK Biobank 22 and performed a meta-analysis (Supplementary Table 6, Supplementary Data 9).
Pathway analysis. We applied the data-driven expression-prioritised integration for complex traits (DEPICT, Methods) 40 to search for tissues and cell types in which our variants are more likely to be expressed. Enrichment testing of expression in 209 tissues and cell types pointed out cardiovascular tissue (atria, ventricles and arteries) as well as the myometrium and muscles (1% FDR, Supplementary Data 10). Furthermore, DEPICT identified 495 enriched gene sets out of 14,461 reconstituted gene sets   Of the 190 QRS variants, 20 are intronic or coding in genes with higher expression in heart muscle than other tissue types in the Human Protein Atlas (total 201 genes) 41 . Variants in some of these genes have been described by OMIM as causing autosomal dominant heart conditions (MYBPC3, MYH6, MYH7, CTNNA3, RBM20, RYR2, SCN5A) or associated in GWAS with heart diseases (CASQ2, MYH7B, MYO18B, SYNPO2L, TBX5, TRIM63) 27,42,43 , resting heart rate (CCDC141, FHOD3) 44 or non-QRS ECG parameters (ALPK3, RNF207) 45,46 . Other genes have not been implicated in heart diseases (ADPRHL1, KLHL38, SH3BGR). Given both the QRS association and the heart muscle expression, variants in these genes could substantially affect cardiac function.
Using the prior information on expression of genes in the heart, we searched for additional rare (MAF ≤ 1%) coding variants that associate with the QRS complex in these 201 genes. Thirteen additional variants associate with one or more QRS parameters (Supplementary Data 13, P < 0.05/(number of coding variants tested) = 2.4 × 10 −5 ). We found four missense variants at unreported QRS loci: C10orf71, FBXO40, GRM1 and HCN4. We note that HCN4 is a known bradycardia gene 47 . We identified one frameshift and three missense QRS variants where we had identified non-coding QRS variants previously in our study: in CTNNA3, NKX2-5, RNF207 and TBX5. We recently reported the missense variant p.Phe145Leu in NKX2-5 as causing cardiomyopathy and arrhythmias 36 .

Discussion
We used whole-genome sequence data to perform a GWAS on the QRS complex of the 12-lead ECG in 81,192 Icelanders and identified 130 QRS loci. We found 190 distinct QRS variants, of which 106 are at 86 loci that have not been reported for QRS before, and assessed their effects on seven echocardiographic traits and 22 cardiovascular diseases.
We demonstrate an advantage of performing an in-depth examination of the QRS complex when studying the genetic influence on ventricular depolarisation, myocardial mass and associated cardiac disease. Our approach yielded 130 QRS loci, compared to 52 loci identified in a recent GWAS of the QRS complex 14 of a similar size (up to 73,518 individuals) that assessed only the QRS duration and three voltage criteria, measures commonly used clinically. If we had exclusively tested these same four measures, we would have identified associations at only 54 loci. Thus, additional measures of the QRS complex (the QRS area, ventricular activation time and distinct measures of the Q, R and S waves) yielded 76 additional loci.
Many of the 123 parameters tested are correlated, and most variants associate with more than one. However, some variants associate with only one QRS measure, some with only one lead. The R amplitude is the QRS measure that captures most QRS associations, about half of them. The R wave in lead V1 captures half of the R amplitude associations, and 15 variants associate only with the R amplitude in lead V1. Under normal conditions, the R wave is small in lead V1 but gets progressively larger in the precordial leads, usually reaching maximum amplitude in lead V5. The small R wave in lead V1 represents the initial part of ventricular depolarisation. Some conditions may cause an abnormally large R wave in lead V1, including right bundle branch block, type A Wolf-Parkinson-White Syndrome, posterior myocardial infarction, right ventricular hypertrophy and acute right ventricular dilatation 48 . The QRS area captures the second highest number of associations. Although the QRS area has not been used in a GWAS before, it improves identification of left ventricular hypertrophy over standard voltage criteria 49 . The genetic influence on various aspects of the cardiac conduction system is both tightly linked and complex, as the majority of QRS variants associates also with ECG measures reflecting atrial and atrioventricular conduction, and ventricular repolarisation, but with a variable magnitude and direction of effects.
Before our study, only two low-frequency (1% ≤ MAF ≤ 5%) and no rare variants had been shown to associate with the QRS complex in a GWAS 14 . We found 33 unreported large-effect associations with rare or low-frequency coding variants, including 13 associations identified by using a priori expression information. This includes associations with a missense variant in ADPRHL1 that also associates genome-wide significantly with left anterior fascicular block, and a stop-gain variant in HMCN2 that also associates with a high risk of left posterior fascicular block. The rare missense variant p.Ala2397Val in FLNC associates with a larger R amplitude in lead V1 and reduced risk of myocardial infarction, with opposite direction of effects to those of p. Phe1626Serfs*40, a reported frameshift variant that causes cardiomyopathy, thus suggesting an opposite functional effect of p. Ala2397Val 36 . FLNC encodes filamin C, a large actin-crosslinking protein expressed in striated muscles. Filamin C anchors major protein complexes at the sarcolemma, Z-discs and intercalated discs in cardiomyocytes to the actin cytoskeleton and provides a scaffold for a variety of cytoplasmic signalling proteins 50,51 .
We identified 91 associations for 57 of the QRS variants with a spectrum of cardiovascular diseases, including arrhythmias and conduction disorders, hypertension, ischemic disease, cardiomyopathies, and valve diseases. We found 41 unreported associations, including four with supraventricular tachycardia (SVT), one of which is with a low-frequency variant located near KCND3, a potassium channel gene that has been implicated in Brugada syndrome 52 and AF 27 , and is the first genome-wide significant SVT association.
We found an association with an intronic variant in PXN, encoding paxillin, which has not been associated with AF before. Paxillin is expressed at focal adhesions of non-striated cells and costameres of striated muscle cells 35 , and regulates cardiac  53 . The PXN variant, as well as a variant intronic to DGKB, associates primarily with the QRS complex when the allele is inherited paternally. These loci are not known to be imprinted, although paxillin has been shown to regulate expression, not imprinting, of the imprinted genes IGF2 and H19 through long-range chromosomal interactions between the IGF2 and H19 promoters and a shared distal enhancer 54 . A common variant in HLF associates with a decreased risk of heart failure, a lower risk of CAVB and RBBB, a smaller ventricle and a shorter QRS duration. HLF encodes a member of the proline acidic-rich (PAR) protein family, a subset of the bZIP transcription factors that accumulate with robust circadian rhythms in tissues with high amplitudes of clock gene expression. The knockout of HLF along with two other PAR bZip transcription factors has been shown to lead to cardiac hypertrophy and left ventricular dysfunction in mice 55 .
In summary, our results demonstrate the advantage of analysing the QRS complex in a detailed manner. The use of wholegenome sequencing facilitates the discovery of associations with protein-coding variants that implicate particular genes in the biology of ventricular depolarisation and cardiac pathology. The findings provide new insights into the complex genetics of cardiac electrophysiology and will help direct future functional studies of cardiovascular diseases.

Methods
Icelandic data. The ECGs were performed between 1998 and 2015 at Landspitali-The National University Hospital, the sole tertiary care hospital in Iceland. They were done in both inpatient and outpatient setting, using the Philips PageWriter Trim III, Philips PageWriter 200, Philips PageWriter 50 and Philips PageWriter 70 cardiographs, and stored in the Philips TraceMasterVue ECG Management System. We analysed ten measures of the QRS complex from 405,732 ECGs of 81,192 individuals. These ten measures were the Q, R and S wave amplitudes and durations; QRS complex area and duration; ventricular activation time (VAT, time between the onset of the Q wave and the peak of the R wave); and the distance from the peak of the R wave to the nadir of the S wave (QRS peak-to-peak amplitude). We analysed each measure for 12 ECG leads: limb leads (I-III), augmented limb leads (aVR, aVL, aVF) and precordial leads (V1-V6). In addition, we analysed mean QRS duration over 12 leads, the Cornell voltage criterion (for men: (R amplitude in lead aVL + S amplitude in lead V3) × QRS duration; for women: (R amplitude in lead aVL + S amplitude in lead V3 + 0.8 mV) × QRS duration) and the Sokolow-Lyon voltage criterion ((S amplitude in lead V1 + R amplitude in lead V6) × QRS duration). We also used four phenotypes derived from automated ECG diagnoses: left anterior fascicular block (LAFB), left bundle branch block (LBBB), left posterior fascicular block (LPFB) and right bundle branch block (RBBB).
We Genotyping. In Iceland, 32.5 million sequence variants (MAF > 0.01%) were identified by whole-genome sequencing 15,220 Icelanders using Illumina standard TruSeq methodology to a mean depth of 35× (s.d. 8X) and imputed into 151,677 chip-typed individuals and their firstand second-degree relatives 16,56 . In the UK Biobank, genotyping was performed using a custom-made Affimetrix chip (UK BiLEVE Axiom) 57 in the first 50,000 participants and Affimetrix UK Biobank Axiom array in the remaining participants 58 (95% of the signals were on both chips). Wellcome Trust Centre for Human Genetics performed the imputation using a combination of 1000 Genomes phase 3 (ref. 59 ), UK10K 60 and HRC 61 reference panels, for up to 92.5 million SNPs.
Statistical analysis. We adjusted quantitative traits for sex and age at measurement, transformed them into a standard normal distribution using a rank-based inverse normal transformation, and used a linear mixed model implemented by BOLT-LMM 62 to test them for association with genotypes. We used a logistic regression model to test binary traits for association with genotypes. For the deCODE data, we adjusted for sex, county of birth, current age or age at death (first-and second-order terms included), blood sample availability for the individual, and an indicator function for the overlap of the lifetime of the individual with the timespan of phenotype collection. For the UK Biobank data, we used 40 principal components to adjust for population stratification and adjusted for age and sex in the logistic regression model. The UK Biobank study included only white British individuals. In each study, we used LD score regression 63 to account for inflation in test statistics due to cryptic relatedness and stratification. To combine the deCODE and UK Biobank results, we used a fixed-effect inverse variance method based on effect estimates and standard errors 64 . We used a likelihood-ratio test to compute all P values. In conditional analyses, we tested variants within ±1 Mb from the top variant and with correlation r 2 < 0.05 with the top variant, and required the variants to be genome-wide significant after correcting for the top variant in a regression model.
Depict. We used DEPICT 40 to (1) prioritise candidate causal genes at associated loci, (2) highlight enriched pathways, and (3) identify tissues/cell types where genes at associated loci are highly expressed. DEPICT uses gene expression data derived from a panel of 77,840 mRNA expression arrays together with 14,461 existing gene sets based on molecular pathways derived from experimentally verified protein-protein interactions 66 , genotype-phenotype relationships from the Mouse Genetics Initiative 67 , Reactome pathways 68 , KEGG pathways 69 , and Gene Ontology (GO) terms 70 . DEPICT reconstitutes these 14,461 gene sets by calculating for each gene the probability of membership in each gene set, based on similarities across the expression data. Using these membership probabilities and a set of traitassociated loci, DEPICT tests whether any of the 14,461 reconstituted gene sets are enriched for genes at the trait-associated loci, and prioritises genes that share predicted functions with genes at other trait-associated loci. Additionally, DEPICT uses 37,427 human mRNA microarrays to search for tissues/cell types in which genes from associated loci are highly expressed (all genes harbouring variants in LD of r 2 > 0.5 from the most significant variant). We ran DEPICT using all QRSassociated variants. We also used DEPICT to compute pairwise Pearson correlations between all reconstituted gene sets and clustered them by similarity using the Affinity Propagation method 71 .
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The Icelandic population WGS data has been deposited at the European Variant Archive under accession code PRJEB8636. The GWAS summary statistics are available at https:// www.decode.com/summarydata. The data supporting the findings of this study are available within the article, its Supplementary Data files and upon request. The UK Biobank data can be obtained upon application (ukbiobank.ac.uk).