Genome-wide association meta-analysis of 30,000 samples identifies seven novel loci for quantitative ECG traits

Genome-wide association studies (GWAS) of quantitative electrocardiographic (ECG) traits in large consortia have identified more than 130 loci associated with QT interval, QRS duration, PR interval, and heart rate (RR interval). In the current study, we meta-analyzed genome-wide association results from 30,000 mostly Dutch samples on four ECG traits: PR interval, QRS duration, QT interval, and RR interval. SNP genotype data was imputed using the Genome of the Netherlands reference panel encompassing 19 million SNPs, including millions of rare SNPs (minor allele frequency < 5%). In addition to many known loci, we identified seven novel locus-trait associations: KCND3, NR3C1, and PLN for PR interval, KCNE1, SGIP1, and NFKB1 for QT interval, and ATP2A2 for QRS duration, of which six were successfully replicated. At these seven loci, we performed conditional analyses and annotated significant SNPs (in exons and regulatory regions), demonstrating involvement of cardiac-related pathways and regulation of nearby genes.

While HapMap included only 270 samples (30 trios and 90 unrelated samples) from three continental populations [11], the 1000 Genomes Project Phase 3 contains 2504 samples from 26 populations [14]. Larger reference panels cover a broader variety of haplotypes and, therefore, increase the quality of imputation in a GWAS sample. Moreover, the number of observed SNPs also increases, expanding the number available for imputation. This has led to novel findings in non-ECG related studies [15].
In the current study, we meta-analyzed genome-wide data on four ECG traits in 30,000 predominantly Dutch samples. We tested over 19 million SNPs for association, which were imputed using the Genome of the Netherlands (GoNL) reference panel [16]. This dataset contains whole-genome sequencing data at 12x coverage collected in 250 families (trios and parents with two offspring). Nearly all polymorphic sites with a population frequency of more than 0.5% are captured. This makes it one of the largest single population sequencing efforts worldwide and the trio design ensures very accurate haplotype phasing. These features and the good match with the predominantly Dutch cohorts, make this dataset well suited as a reference panel for imputation. Using this approach, we had two aims: (1) the discovery of novel loci associated with ECG traits, and (2) the finemapping and functional annotation of known regions associated with ECG traits. We increased our SNP density almost seven-fold compared to previous studies based on HapMap, enabling us to study key signals in much finer detail.

Individual cohort data
Eight cohorts were included in the discovery phase of this study, totaling approximately 30,000 samples (Supplementary Tables 1 and 2, Supplementary Notes). Most study participants were Dutch with the exception of most participants of PROSPER; this study included approximately 19% samples of Dutch origin, while the remaining samples were of other European descent. All cohorts performed stringent quality control to exclude low-quality samples and SNPs prior to imputation and also post-imputation. Imputation was performed using 998 phased haplotypes from the Genome of the Netherlands Project release 4 as the reference panel, encompassing 19,763,454 SNPs [16]. All genomic data in this manuscript is listed with respect to the hg19 (build37) reference genome.
We evaluated four phenotypes on the electrocardiogram: RR interval, PR interval, QRS duration, and QT interval. Seven out of eight cohorts contributed data to all four phenotypes; NTR only had data on RR interval available.
Samples of non-European descent and samples with missing data were excluded, as well as individuals that fulfilled any of the exclusion criteria listed in Supplementary  Table 3. SNPs were individually tested for association with each trait using linear models. For all four phenotypes, we included age, sex, height, BMI, and study specific covariates (for instance to correct for study site, relatedness, or population stratification) as covariates. In addition, RR interval and hypertension (in those cohorts that had data available on this measure) were included as covariates for QT interval to reduce noise introduced by these factors. We chose these covariates to correspond with previously published GWAS on these four ECG traits.

Quality control and meta-analysis
Association results from all cohorts were collected at a single site and underwent quality control. SNPs with extreme values of beta (>1000 or <−1000), standard error (SE) (>1000), or imputation quality (<0.1 or >1.1) were removed and distributions of beta, SE, and P-values were manually checked. We made QQ-plots to test P-value distributions, which were stratified by minor allele frequency and by imputation quality. Aberrant subsets of SNPs (usually with very low frequency) were removed from downstream analyses.
Inverse-variance fixed-effect model meta-analyses were conducted for all four traits using MANTEL [17]. For each individual GWAS, genomic inflation factors (lambda) were calculated and, during meta-analysis, standard errors were adjusted accordingly to correct for population structure and technical errors. We did not correct for genomic inflation after meta-analysis. SNP associations were considered significant if P ≤ 5 × 10 -8 .

Follow-up on known and novel loci
For each locus, we tested the number of independent signals using the LD structure from GoNL in GCTA-COJO, which was designed to allow conditional analyses based on summary-level data [18]. Secondary hits had to fulfill two criteria: (1) genome-wide significant in the GWAS, and (2) P < 1 × 10 -5 after conditioning to correct for multiple testing of 4757 significant SNPs across all four traits. A novel locus for a trait was defined if the significant SNPs, or SNPs within a distance of 1 Mb upstream and downstream of the significant SNPs, had not been observed before in GWAS of the same trait. We performed a look-up of all novel loci in previous HapMap-based GWAS.

Replication of novel loci in CHARGE
We sought to replicate our findings in 13 independent cohorts taking part in the CHARGE consortium [19] (Supplementary Tables 1 and 2, Supplementary notes). Twelve studies (TwinsUK, CHS, ARIC, KORA F3, KORA S4, JHS, AGES, BRIGHT, YFS, INGI-FVG, and INGI-CARL) used 1000 Genomes Phase 1 as their imputation reference panel and a single study (Inter99) provided only genotyped data. All studies contained samples of European ancestry, except for JHS, which consisted only of African-American samples. The summary-level results for all novel SNPs determined in the discovery analysis were combined in inverse-variance fixed-effects meta-analyses. A two-sided P-value ≤ 0.05, in conjunction with a concordant effect direction, was considered significant.

In silico tests of possibly functional SNPs
We looked up the functional annotations for all SNPs that reached genome-wide significance in any of the four traits. First, we checked whether SNPs were potentially damaging to protein function, testing all non-synonymous SNPs in SIFT [20] and PolyPhen-2 [21]. Second, we used GREAT [22] to identify biological pathways in which regulatory SNPs are involved, testing the index SNPs for all locus-trait associations. Lastly, we tested all significant SNPs one by one for their possible effect on regulatory regions using RegulomeDB [23].

Meta-analysis detects novel loci
We conducted a GWAS meta-analysis comprising eight cohorts that together encompassed approximately 30,000 samples. Over 19 million SNPs, imputed using the GoNL reference panel, were assessed for association with four quantitative ECG traits: RR, PR, QRS, and QT. Considering all traits, we observed 52 locus-phenotype associations (17 for PR, 13 for QRS, 15 for QT, and 7 for RR; Supplementary Figures 1 Table 4). A locus was defined as an associated region (containing one or more SNPs with P ≤ 5 × 10 -8 ) that is located at least 1 Mb away from the next (i.e., if two associated SNPs are within 1 Mb, they belong to the same locus). Of these 52 loci, 45 have been observed before in large GWAS meta-analyses [2][3][4][7][8][9] and seven are novel findings (Table 1). Box 1 shows regional association plots and provides additional information on the seven novel loci. Imputation qualities of the index SNPs were 0.60 and 0.84 for the relatively rare KCNE1 and KCDN1 variants, respectively, and >0.96 for the remaining common index SNPs. The variance explained by each of these variants ranges between 0.09 and 0.23%.

Fine-mapping of known loci
For each locus, we tested if more than one independent signal was present (Supplementary Table 4). Thirteen loci had suggestive evidence of having more than one independent signal; four locus-phenotype associations had five or more independent signals. The SCN5A/SCN10A locus was the most outstanding locus with eleven independent signals for PR, and six for QRS. NOS1AP for QT contained seven independent signals.

Replication in CHARGE
For six out of seven novel loci, we were able to conduct look-ups of the index SNP or a proxy SNP in strong LD (r 2 ≥ 0.89) in previous large-scale HapMap-based GWAS. These GWAS contained over 70,000 samples each, and included many of the Dutch cohorts from our current study. All six loci were associated with their respective traits (P ≤ 0.004). Next, we tested the seven novel loci for replication in 13 studies from the CHARGE consortium. In contrast to the HapMap look-ups, this replication was independent from the Dutch discovery sample. Results are shown in Table 1. Allele frequencies were very similar to the discovery dataset, except for JHS, which consists of individuals of African-American descent. Effect directions for all seven SNPs were concordant between our primary findings and replication, with effect sizes between 0.2 and 1.5 times those of the betas in the discovery study. Six of seven loci were replicated with P < 0.05, three of which pass Bonferroni correction, accounting for seven tests.

Functional SNPs in genes and regulatory regions
All genome-wide significant SNPs were tested in silico for their potential effect on gene expression and protein structure. Ten loci contained, in total, 15 non-synonymous SNPs, which were tested using the prediction programs PolyPhen-2 and SIFT. According to PolyPhen-2, three SNPs were possibly damaging (rs1805128 in KCNE1 for QT, rs12666989 in UFSP1 for RR, and rs2070492 in SLC22A14 for PR). SIFT predicted only one SNP to be damaging to a protein (rs3746471 in KIAA1755 for RR).
We used GREAT to test all 100 index SNPs from the four ECG traits combined for their biological function in cis-regulatory regions. Significant GO-terms (molecular function, biological process, and cellular component), human phenotypes, and disease ontologies are shown in Supplementary Table 5a-d. In total, these index SNPs mapped to 103 genes.
Of 52 locus-phenotype associations, 34 contained significant SNPs that have a RegulomeDB score of 3 or better, Using GoNL as reference panel in~30,000 samples mostly of Dutch descent, we found seven loci not previously identified or (in the case of KCNE1 for QT interval) not consistently replicated in previous genome-wide association studies. We conducted look-ups of these SNPs (or proxy SNPs in strong LD if the SNPs were not present in HapMap) in their respective HapMap-based metaanalyses and replicated six out of seven in a combined analysis of 13 CHARGE cohorts imputed with 1000 Genomes Phase 1. All effect estimates and allele frequencies are with respect to the coded allele meaning that they may affect protein binding (Supplementary Table 6). We observed 15 loci containing SNPs with scores of 1 (likely to affect binding and linked to the expression of a gene target), 15 loci containing SNPs with a maximum score of 2 (likely to affect binding), and four loci that have SNPs with a maximum score of 3 (less likely to affect binding). Eighteen loci contained only SNPs with scores from 4 to 6 (minimal binding evidence) and 7 (no data available).

Discussion
We imputed over 19 million SNPs using GoNL as the reference panel, and tested these SNPs for association with four traits in eight predominantly Dutch cohorts comprising roughly 30,000 samples. We observed 52 locus-phenotype associations, seven of which were novel (

Discovery of loci associated with quantitative ECG traits
We detected seven novel loci, three for PR interval, three for QT interval, and one for QRS duration (Box 1). No novel loci were found for RR interval, accounting for loci previously associated with either RR interval [4] or heart rate [5]. We replicated six out of seven novel loci utilizing 13 independent studies from the CHARGE consortium. Interestingly, the only variant that does not replicate is rs74640693 for PR interval, located close to PLN (phospholamban). Variants in this gene have been consistently associated with various QRS measures [6] but not with PR interval. The gene transcribes the phospholamban protein, which is important in calcium signaling in cardiac muscle cells [24]. Although a Dutch-specific pathogenic mutation, p.Arg14del, in the PLN gene has been described [25], it is unlikely that this mutation drives the association signal in our study because the allele frequency of SNP rs74640693

Box 1
Seven novel loci were identified; three for PR, three for QT, and one for QRS. Information and regional association plots are shown for every locus. Each SNP is plotted with respect to its chromosomal location (hg19, x-axis) and its P-value (y-axis on the left). The tall blue spikes indicate the recombination rate (y-axis on the right) at that region of the chromosome. We observed two independent signals at the KCND3 gene. The first signal consists of low-frequency SNPs (MAF < 3.8%, index SNP MAF = 2.4%) upstream of KCND3 (top), while the second signal contains intronic SNPs with much higher allele frequencies (index SNP MAF = 19.6%, bottom). KCND3 encodes voltage-gated potassium channel subunit K v 4.3. SNPs near KCND3 have been associated with P-wave duration and ST-T wave amplitude [29], and with Atrial Fibrillation in the Japanese population [30]. It is thought that KCND3 overexpression may be involved in Brugada syndrome because of its direct interaction with KCNE3. This gene inhibits KCND3, and specific mutations in the latter gene lead to Brugada syndrome [31,32]. Moreover, it has been shown that mutations in KCND3 cause spinocerebellar ataxia [33] ( Fig. 1a, b). The association signal in this locus spans the NR3C1 gene, with the two genome-wide significant SNPs located between NR3C1 and ARHGAP26. Both SNPs are common, with MAFs of approximately 45%. NR3C1 encodes the glucocorticoid receptor, which interacts with a wide variety of proteins, transcription factors, and other cellular compounds [34]. In mice, this gene is involved in cardiac development [35], and overexpression causes ECG abnormalities [36], which makes it likely that this is the gene underlying the association signal. ARHGAP26 encodes GRAF protein (GTPase Regulator Associated with Focal Adhesion Kinase), which is required in specific exo-and endocytosis pathways [37], but also for muscle development [38]. Mutations in this gene have been implicated in leukemia [39] (Fig. 1c). Fig 1d: This locus has been associated previously with RR interval [4], QT interval [8,9], and QRS duration [7]. The index SNP has a MAF of 5.4% and the association signals spans SLC35F1 and PLN. The latter gene encodes phospholamban, which is an important regulator of cardiac contractility [40]. SLC35F1 encodes a transporter protein that is highly expressed in the human brain [41] (Fig. 1d). Although only one (common, MAF = 32.2%) SNP reached genome-wide significance, SNPs in strong LD with the index SNP span an area of almost 500 kb, covering many genes. This locus has been associated with QT interval previously [10]. Our most significant SNP is located just downstream of ATP2A2, a strong candidate gene in this region that encodes a SERCA Ca 2+ ATPase, which is involved in calcium transport in the human heart and under regulation of phospholamban [42] (Fig. 1e). This locus spans~300 kb in between two recombination hotspots. Significant SNPs are in almost complete LD with each other, with minor allele frequencies of approximately 15%. The locus spans two genes, SGIP1 and TCTEX1D1. SGIP1 encodes a proline-rich endocytic protein that interacts with endophilin and is involved in energy homeostasis [43,44]. This gene is mainly expressed in the human brain [43] and has been associated with fat mass [45]. The TCTEX1D1 gene belongs to the dynein light chain Tctex-type family and has an unknown function (Fig. 1f). The most significant SNPs in this locus are located upstream of the NFKB1 gene, encoding the NF-kappa-B p105 subunit. SNPs in this locus are common (MAF = 43.5%). An indel in the promotor of this gene has been associated with coronary heart disease [46] and dilated cardiomyopathy [47]. This particular indel is in moderate LD with the index SNP in this locus (r 2 in GoNL = 0.4). NFKB1 is a transcription factor is involved in many immune-and tumor-related processes, and has been associated with ulcerative colitis [48] and bladder cancer [49] ( Fig. 1g). This locus contains a low frequency SNP (MAF = 1.7%) with a large effect on QT interval. This SNP has been observed in GWAS before, but could not be replicated (in this [8] and later studies [10]) because it was poorly imputed so only cohorts that genotyped the SNP directly could be included [8]. KCNE1 encodes a voltage-gated potassium channel, and the index SNP encodes a pathogenic Asp to Asn amino acid substitution at position 85 of KCNE1, causing long QT syndrome 5 [50] (Fig. 1h).
is similar in our samples (4.9%) compared to other samples of European ancestry (4.6% in the 12 European CHARGE replication cohorts). Furthermore, the allele frequency of this SNP is~5 times higher than that of the mutation and the SNP is located~200 kb upstream of the PLN gene, so, therefore, not in LD with these mutations. In addition, a  (a, b). ARHGAP26 and NR3C1, associated with PR interval (c). SLC35F1 and PLN, associated with PR interval (d). ATP2A2, associated with QRS duration (e). SGIP1 and TCTEX1D1, associated with QT interval (f). NFKB1, associated with QT interval (g). KCNE1, associated with QT interval (h) recent large study of PR interval used the Illumina exome chip to identify a common variant (rs74640693, allele frequency 47%) in this region [26], however, this variant is not in LD with the variant that we identified (r 2 = 0.04). To confirm that the lack of association was not caused by strand issues (because rs74640693 is an A/T variant), we tested the nearby proxy SNP rs12210733 (which is an A/G variant, r 2 = 0.89) in the CHARGE replication cohorts, and found it was also non-significant. We looked up our top SNPs in previous, much larger, HapMap-based GWAS meta-analyses to determine why our SNPs were not identified in those studies (Table 1). Two loci contained rare SNPs with MAF < 5%. Low-frequency SNPs at KCND3 were not present in HapMap and could therefore not be tested. The functional SNP at KCNE1 was observed in a single cohort in a meta-analysis in 2009, but this result could neither be replicated in other cohorts [9], nor in later studies, because the imputation quality was too low.
For common SNPs (MAF > 5%), it is much more difficult to define why they were not previously observed at genome-wide significance. For many loci we may have better tags of the causal variants because our coverage is almost sevenfold greater. Indeed, the index SNPs at PLN (PR), NFKB1 (QT), and ATP2A2 (QRS) were not tested in previous studies. Nevertheless, for all SNPs, proxies with r 2 > 0.9 were available in the respective studies (Table 1). Common SNPs at KCND3 (PR), NR3C1 (PR), and SGIP1 (QT) were present in HapMap. Both proxies and directly imputed SNPs were at least nominally significant in previous studies (P-values ranging from 10 -3 to 10 -6 ) with typically high imputation quality.
In addition to the "winner's curse" effect, we expect that higher quality imputation due to the considerably larger haplotype panel (compared to HapMap) and the ancestry matching between GoNL and our Dutch cohorts will improve the power to detect a true association signal, if present. Although combining multiple reference panels for imputation is becoming the new standard [27], limitations to our study remain: (1) the GoNL reference panel may not contain sufficient information on rare SNPs; (2) the small sample size of individual cohorts may cause abnormal behavior of rare SNPs as a group, requiring us to remove that subset of SNPs; or (3) the sample size or power of the overall study is still limited to detect rare variant associations.

Fine-mapping of known loci
Although we did not sequence the loci containing the known and novel signals, we have a much denser interrogation of these regions compared to previous (HapMapbased) studies. In an attempt to fine map the significant loci, we annotated all significant SNPs with their predicted functional consequences.
First, we used SIFT and PolyPhen-2 to predict the effect of 15 non-synonymous SNPs that were associated with one of the ECG traits at genome-wide significance. PolyPhen-2 classified three SNPs as possibly damaging and SIFT predicted only one SNP to be damaging. These were nonoverlapping, raising questions as to the actual effect of these SNPs on their respective genes. Functional studies should be pursued to test the direct effect of these SNPs on protein structure.
Combining all index SNPs, we tested the function of those SNPs located in cis-regulatory regions using GREAT [22]. We identified 100 independent SNP-trait associations, which mapped to 103 genes. Interestingly, we find hundreds of significant nodes, of which the vast majority is related to cardiac functioning and heart disease (Supplementary Table 5a-e). This shows that, indeed, many SNPs are located in cis-regulatory regions of genes that are critical in the functioning of the human heart, which implicates a regulatory function of these loci rather than a structural function changing the protein directly. One example is shown in Supplementary Figure 3; this figure contains all significant GO molecular function nodes. Most of these nodes are in the group of transporter activity, which includes all transmembrane channels that are known to be important for cardiac function.
Because the GREAT pathways show that many SNPs probably have their effect on the trait due to gene regulation, we extracted all significant SNPs from RegulomeDB to check which variants would likely affect binding in regulatory regions. A majority of loci contained at least one SNP that was expected to affect transcription factor binding (Supplementary Table 6). The score that is provided by RegulomeDB indicates that a SNP is likely (or less likely) located in a binding site. Interestingly, there are strong differences between loci in terms of the number of SNPs that may have a regulatory effect, and percentage of loci per trait that have a high score. For instance, seven out of 15 QT interval loci contains SNPs with a score of 1, while only a single PR interval locus contains a SNP with this score. The SCN5A/SCN10A locus is strongly associated with PR interval (best SNP P = 4.9 × 10 -107 ) and contains over 450 significant SNPs. Nevertheless, only six SNPs have a score of 2 or 3, and none of the significant SNPs have a score of 1. However, many binding sites are tissue specific, and, therefore, testing SNPs with high scores systematically for their role in cardiac tissue could lead to more knowledge about their biological function.

Conclusions
Using the Genome of the Netherlands as imputation reference panel, we identified seven novel loci for quantitative ECG traits. Higher SNP density and higher imputation quality enabled us to annotate known loci, facilitating future studies to understand the biological effects of causal variants at many loci. Ultimately, combining imputation reference panels and increasing sample size for GWAS meta-analyses will continue to increase power for genetic discovery and novel target identification. With many sequencing efforts ongoing and large population-based cohorts being genotyped (such as UK Biobank, of which the first release data showed 46 novel loci for RR interval [28]), we can expect hundreds of novel loci for ECG phenotypes in the near future.
Funding Folkert W. Asselbergs is supported by UCL Hospitals NIHR Biomedical Research Centre.

Compliance with ethical standards
Conflict of interest de Bakker is currently an employee of and owns equity in Vertex Pharmaceuticals. M.J. Caulfield is Chief Scientist for Genomics England a UK Government company. The remaining authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons. org/licenses/by/4.0/.