Atrial fibrillation (AF) is a common cardiac arrhythmia resulting in increased risk of stroke. Despite highly heritable etiology, our understanding of the genetic architecture of AF remains incomplete. Here we performed a genome-wide association study in the Japanese population comprising 9,826 cases among 150,272 individuals and identified East Asian-specific rare variants associated with AF. A cross-ancestry meta-analysis of >1 million individuals, including 77,690 cases, identified 35 new susceptibility loci. Transcriptome-wide association analysis identified IL6R as a putative causal gene, suggesting the involvement of immune responses. Integrative analysis with ChIP-seq data and functional assessment using human induced pluripotent stem cell-derived cardiomyocytes demonstrated ERRg as having a key role in the transcriptional regulation of AF-associated genes. A polygenic risk score derived from the cross-ancestry meta-analysis predicted increased risks of cardiovascular and stroke mortalities and segregated individuals with cardioembolic stroke in undiagnosed AF patients. Our results provide new biological and clinical insights into AF genetics and suggest their potential for clinical applications.
Atrial fibrillation (AF) is the most common cardiac arrhythmia, affecting approximately 46.3 million individuals worldwide1. The global prevalence of AF is increasing due to the rapid aging of the general population and an intensified search for subclinical AF2. Despite progress in diagnostic and therapeutic technologies, a substantial number of patients with AF are admitted with life-threatening complications such as stroke and heart failure3, causing a considerable burden on patients and public healthcare systems4. Besides conventional clinical risk factors such as aging, obesity, hypertension and heart failure, the genetic contribution to the development of AF is also widely recognized. Recent genome-wide association studies (GWASs) have identified more than 100 AF-associated loci, some of which are involved in cardiac developmental, electrophysiological, contractile and structural pathways5,6,7,8. However, because the vast majority of AF-GWASs have been performed in European populations, the genetic pathophysiology of AF in non-European populations is not comprehensively understood, and it is difficult to apply polygenic risk scores (PRSs) derived from such GWASs to non-European populations.
Here we sought to explore the genetic architecture of AF in a non-European population and improve the statistical power of AF-GWASs by performing a large-scale Japanese GWAS, followed by a cross-ancestry meta-analysis. Further, we investigated the biological role of the identified AF-associated loci by leveraging gene expression and epigenomic datasets. Additionally, we developed a PRS derived from the cross-ancestry meta-analysis and assessed the impact of the PRS on relevant phenotypes and long-term mortality, which may provide evidence for the clinical utility of AF-PRS and lay the foundation for the realization of precision medicine in AF.
Five new risk loci for AF identified in the Japanese GWAS
An overview of the study design is shown in Fig. 1. We performed a GWAS on the case–control dataset from BioBank Japan (BBJ) that comprised 9,826 AF cases and 140,446 controls, using 16,394,105 variants in the autosomes and 423,039 variants in the X chromosome with a minor allele frequency (MAF) > 0.1%. The GWAS identified 31 AF-associated loci with genome-wide significance, of which five were previously unreported (Table 1, Supplementary Table 1, Extended Data Fig. 1 and Supplementary Datasets 1 and 2). The proportion of the variation in AF (the single nucleotide polymorphism (SNP) heritability; h2) explained by the total genome-wide genetic variation detected in the current Japanese GWAS was estimated to be 6.1% (s.e.m. 1.4%), and the liability-scale h2 was estimated at 11.7% (s.e.m. 2.6%) using linkage disequilibrium (LD)-score regression.
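The relationship between the observed-scale and liability-scale heritability estimates can be sketched with the standard conversion for ascertained case–control data. This is a minimal illustration, not the LD score regression software itself: the function name is hypothetical, and the population prevalence `pop_prev` is not stated in this section, so the value in the usage lines is purely an assumed placeholder.

```python
from statistics import NormalDist


def observed_to_liability(h2_obs: float, sample_prev: float, pop_prev: float) -> float:
    """Convert observed-scale SNP heritability to the liability scale.

    sample_prev (P): proportion of cases in the GWAS sample.
    pop_prev (K): assumed population prevalence (an assumption, not from the text).
    """
    std_normal = NormalDist()
    # Liability threshold above which an individual is affected, given K.
    threshold = std_normal.inv_cdf(1.0 - pop_prev)
    # Height of the standard normal density at that threshold.
    z = std_normal.pdf(threshold)
    factor = (pop_prev ** 2 * (1.0 - pop_prev) ** 2) / (
        sample_prev * (1.0 - sample_prev) * z ** 2
    )
    return h2_obs * factor


# Sample prevalence from the case and control counts reported above.
P = 9826 / (9826 + 140446)
# K = 0.01 is an illustrative placeholder only, not the value used in the study.
h2_liab = observed_to_liability(0.061, P, pop_prev=0.01)
```

Because the conversion factor depends strongly on the assumed prevalence, the liability-scale estimate should always be read together with the prevalence used to obtain it.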
To replicate the five newly identified loci, we performed genotyping and association analysis in an independent Japanese cohort including 4,602 cases and 44,075 controls. All of the lead variants were successfully replicated with nominal associations (P < 0.05) in the same effect direction (Supplementary Table 2). Among the lead variants in the five new loci, rs202030113 (MAF = 1.2%) and rs778479352 (MAF = 0.25%) were observed only in the East Asian population according to the Genome Aggregation Database v2.1.1 (gnomAD)9. rs202030113 is located in the intronic region 3 bp away from an exon-intron boundary of SYNE1 and is predicted as a splice donor loss with a spliceAI δ score of 0.33 (ref. 10). A strong signal (odds ratio (OR) for AF development = 2.00, 95% confidence interval (CI) = 1.73–2.31, P = 1.6 × 10−20) was displayed by rs778479352, the lead variant located in an intron of FGF13. This variant lies within the region of ENCODE11 accession no. EH38E2771113 (https://screen-v2.wenglab.org/search/?q=EH38E2771113&assembly=GRCh38), where high H3K4me3 and H3K27ac signals were observed, suggesting that rs778479352 might reside in a candidate cis-regulatory element.
To identify AF-associated variants independent of the lead variant at each locus, we performed a stepwise conditional analysis, in which 18 independent variants (locus-wide P < 5.0 × 10−6) were additionally detected, increasing the total number of AF-associated signals to 49 (Extended Data Fig. 2 and Supplementary Table 3). We identified ten loci that had multiple independent association signals, especially in the PITX2-C4orf32 locus with six association signals (nsignal = 2: GORAB-PRRX1, CAND2, HAND2-AS1, FANCC, NEBL, AKAP6, ZFHX3 loci, nsignal = 3: LINC02459-TBX5 locus, nsignal = 4: NEURL1 locus, and nsignal = 6: PITX2-C4orf32 locus). Of these additional signals, three variants were observed only in East Asian populations in gnomAD (rs577463446 at the FANCC locus, MAF = 0.5%, OR = 1.58; rs965277670 at NEBL locus, MAF = 0.6%, OR = 1.65; rs201901902 at NEURL1 locus MAF = 0.5%, OR = 1.68).
Cross-ancestry meta-analysis identified 33 new risk loci for AF
To improve the statistical power to detect further genetic associations with AF, we conducted a cross-ancestry meta-analysis by combining the current Japanese GWAS (BBJ) and two European GWASs: a large-scale meta-analysis of European populations (EUR)7 and biobank data of FinnGen data release 2 (FIN). Together, the three datasets yielded 77,690 cases (BBJ: 9,826, EUR: 60,620 and FIN: 7,244) and 1,167,040 controls (BBJ: 140,446, EUR: 970,216 and FIN: 56,378). We tested a total of 5,158,449 variants with MAF ≥ 1% and identified 150 AF-associated loci with genome-wide significance (log10 Bayes factor (BF) > 6; Fig. 2, Supplementary Table 4 and Supplementary Datasets 3 and 4). Of these loci, 33 have not been reported previously, including three new loci detected in the current Japanese GWAS. In total, we identified 35 new loci through the current Japanese GWAS and cross-ancestry meta-analysis (Table 2).
Of the 3,637 variants in LD (r2 > 0.8) with 150 lead variants, 19 missense variants were observed (Supplementary Table 5). Among new loci, we found a missense variant, rs848208 (p.Ala970Val), in the SPEN gene, encoding a hormone-inducible transcriptional coregulator that activates and represses downstream targets. It was reported that SPEN-deficient zebrafish embryos developed bradycardia, atrioventricular block and heart chamber fibrillation with downregulation of connexin 43 expression12, which is a well-known component of gap junctions and is associated with the cardiac conduction system13. Another missense variant in a new locus, rs3746471 (p.Arg1045Trp), is located in the KIAA1755 gene, which is reported to be associated with heart rate14 and heart rate variability15. Given the robust relationship between autonomic nervous dysfunction and AF16, KIAA1755 is a potential target gene for neural modulation contributing to AF management; however, the biological association between KIAA1755 and AF has not been fully examined.
Prioritization of associated genes and transcription factors
We performed a transcriptome-wide association study (TWAS) using the identified loci in the cross-ancestry meta-analysis and GTEx data17 to identify candidate genes associated with AF. Given the enrichment of AF-associated loci in heart tissue (Supplementary Note and Supplementary Fig. 1), we used gene expression data from GTEx in the atrial appendage and left ventricle as a reference. TWAS prioritized 132 and 127 candidate causal genes substantially associated with AF in the atrial appendage and left ventricle, respectively (Fig. 3a and Supplementary Table 6). Intriguingly, we found that IL6R is one of the candidate genes associated with AF in the atrial appendage (βIL6R = 0.221, P = 2.147 × 10−9). The prediction model of IL6R expression included rs10908837 (MAF = 42%, log10 BF = 7.237 in the cross-ancestry meta-analysis), which is located in an intron of IL6R. Furthermore, to assess the cis and trans effects of AF-associated variants on the candidate genes, we calculated the physical distances from the variants to the transcription start site (TSS) of the candidate genes. Only 34 and 35 genes overlapped between the nearest genes to the lead variants and the candidate genes identified by TWAS (Fig. 3b), and the median physical distances were 2.25 kb and 1.14 kb (Fig. 3c) in the atrial appendage and left ventricle, respectively. This relationship between AF-associated variants and candidate genes is comparable to a previous study in which the distances from the noncoding GWAS signals to the target genes were assessed based on the chromatin state and three-dimensional contacts18. Exceptionally, only one gene, FBN2, was more than 500 kb away from the variant in the left ventricle (512 kb). This result indicates that, although disease-associated genes are not necessarily closest to the lead variants, most candidate genes are influenced by cis effects of AF-associated variants. 
Finally, we performed Gene Ontology enrichment analysis using the candidate genes identified by TWAS and found several substantially enriched pathways, such as cardiac developmental, conduction and cardiomyocyte contractile or structure (Extended Data Fig. 3).
Next, we sought to identify transcription factors that bind to AF-associated loci and orchestrate the expression of causative genes involved in AF development. We performed enrichment analysis using the ChIP-Atlas dataset19, which comprises several high-throughput ChIP-seq experiments (15,109 experiments, 1,028 transcription factors). We found that estrogen-related receptor gamma (ERRg) binding was substantially enriched in AF-associated loci at a Bonferroni-corrected significance level of P = 3.3 × 10−6 (0.05/15,109) (Fig. 4a and Supplementary Table 7). Indeed, ERRg ChIP-seq peaks overlapped with AF-associated loci around genes encoding cardiac ion channels (CAMK2D, KCNJ5, KCNH2 and HCN4), where active histone marks such as H3K27ac and H3K4me3 in induced pluripotent stem cell (iPSC)-derived cardiac cells were also observed (Extended Data Fig. 4). To demonstrate that ERRg is functionally involved in the pathogenesis of AF, we performed a functional analysis of ERRg using human iPSC-derived cardiomyocytes (iPSCMs). We first evaluated changes in gene expression after administration of an inverse agonist of ERRg, GSK5182 (ref. 20); ion channels and sarcomere genes were selected from the downstream genes of ERRg based on the binding profiles of ChIP-seq data using the Target Genes function in ChIP-Atlas. We found that gene expression was substantially decreased after GSK5182 administration (Fig. 4b). Furthermore, GSK5182-treated iPSCMs showed a trend toward decreased spontaneous beating rate, together with notable irregularity and prolonged contraction duration (Fig. 4c,d). Similarly, the calcium transient duration was also found to be prolonged (Fig. 4e,f), and the increase in beating rate by isoproterenol was attenuated by GSK5182 administration (Fig. 4g). Such changes in beating rate and action potential duration have been reported in iPSCMs derived from patients with AF21,22.
These results collectively suggest that ERRg is critically involved in the pathogenesis of AF through the regulation of expression of target genes, including ion channels, in cardiomyocytes.
Performance of PRS derived from cross-ancestry meta-GWAS
PRS offers potential for risk stratification of complex traits and diseases based on genetic data. However, the transferability of PRS from diverse populations to a population of another ancestry remains challenging. Therefore, we examined the performance of a PRS derived from various combinations of summary statistics in the Japanese population. We split our case–control samples into derivation, validation and test datasets, and constructed 376 combinations of the summary statistics of three GWASs (BBJ, EUR and FIN) with parameters for PRS derivation. Based on the PRS performance in the validation cohort, we determined the parameters that showed the best performance for each combination of summary statistics (BBJ, FIN, EUR, BBJ + FIN, BBJ + EUR, EUR + FIN and BBJ + EUR + FIN) (Supplementary Table 8) and assessed the performance of the best model in the test cohort (Fig. 5, Extended Data Figs. 5 and 6 and Supplementary Table 9). For the PRS derived from a single population GWAS, as concordant with the population specificity, the PRS derived from BBJ showed higher performance trend than those from EUR (pseudo R2 = 0.122 in EUR versus 0.124 in BBJ, P = 0.681) and FIN (pseudo R2 = 0.102 in FIN, P < 4 × 10−4) despite the smaller sample size, although there was no statistically significant difference in the PRS performance between BBJ and EUR. Among the PRS derived from the meta-GWAS, we found significant superiority of the PRS derived from BBJ + EUR compared to those from FIN + EUR (pseudo R2 = 0.144 in BBJ + EUR versus 0.131 in FIN + EUR, P < 4 × 10−4) even though the number of cases was similar. Among all models, the PRS derived from three studies with multi-ancestry and the largest sample size (BBJ + EUR + FIN) showed the highest performance (pseudo R2 = 0.146, 95% CI = 0.115–0.170, area under the curve of receiver operating characteristic = 0.738, 95% CI = 0.726–0.745).
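As a rough illustration of what a PRS is computationally, the sketch below scores each individual as the weighted sum of effect-allele dosages and then standardizes the scores so that effects can be expressed per 1 s.d. It is a toy example under stated assumptions: the weights, dosages and function names are hypothetical, and the parameter tuning across derivation, validation and test datasets described above is not reproduced.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (each in 0..2) for one individual."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(scores):
    """Scale raw scores to mean 0 and sample s.d. 1."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    sd = var ** 0.5
    return [(s - mean) / sd for s in scores]


# Toy per-variant log-odds weights (hypothetical values).
weights = [0.12, -0.05, 0.30]
# Toy imputed dosages for three individuals (hypothetical values).
cohort = [
    [0.0, 1.0, 2.0],
    [1.0, 1.0, 0.0],
    [2.0, 0.0, 1.0],
]
raw = [polygenic_score(d, weights) for d in cohort]
z_scores = standardize(raw)
```

In practice the weights would come from GWAS or meta-GWAS summary statistics after the chosen PRS derivation method has been applied; only the final scoring step is shown here.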
Impact of AF-PRS on relevant phenotypes and outcomes
To assess the potential of the PRS for clinical applications, we investigated the association between PRS and the onset age of AF in individuals from our BBJ case samples (n = 7,458). We observed that the onset age decreased as the PRS increased, and individuals with the top 1% PRS were estimated to be approximately 4 years younger at AF onset compared to the remaining individuals (Fig. 6a and Extended Data Fig. 7a,b). Moreover, we examined whether AF-PRS could explain the phenotypic variability of stroke in individuals without a diagnosis of AF. We performed logistic regression analysis in 121,351 control samples in our dataset, and found significant associations of the PRS with increased risks of cerebral infarction (OR (95% CI) = 1.042 (1.018–1.065), P = 4.0 × 10−4) and cardioembolic stroke (OR (95% CI) = 1.355 (1.126–1.630), P = 1.3 × 10−3) after Bonferroni correction (Fig. 6b). Notably, we observed the largest impact of the PRS on cardioembolic stroke among those with other stroke phenotypes, indicating that AF-PRS may reveal clinically undetectable AF (that is, subclinical AF) or AF-related conditions such as prothrombotic or hypercoagulable state, in individuals without AF.
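Readers who want to check how a reported ratio estimate, its 95% CI and its P value fit together can use a Wald approximation on the log scale. The sketch below is a reader-side consistency check, not the regression that produced the results; the function name is hypothetical and the calculation assumes a symmetric normal CI on the log scale.

```python
import math


def wald_from_ci(estimate: float, lower: float, upper: float):
    """Return (log-scale SE, z, two-sided P) from a ratio estimate and its 95% CI."""
    # A 95% CI spans 2 * 1.959964 standard errors on the log scale.
    se = (math.log(upper) - math.log(lower)) / (2 * 1.959964)
    z = math.log(estimate) / se
    # Two-sided tail probability of the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Cardioembolic stroke estimate quoted above: OR 1.355 (1.126–1.630).
se, z, p = wald_from_ci(1.355, 1.126, 1.630)
```

For the cardioembolic stroke estimate this gives a P value on the order of 10−3, in line with the reported value; small differences are expected because the published numbers are rounded.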
To further explore the clinical utility of AF-PRS, we assessed the impact of PRS on mortality using long-term follow-up data in BBJ. The Kaplan–Meier estimates of cumulative mortality rate were increased in individuals with a high PRS, especially in cardiovascular- and stroke-related mortality (Fig. 6c,d and Extended Data Fig. 8). Moreover, we performed Cox regression analysis, and as shown in Fig. 6d, no significant association between AF-PRS and all-cause death was found, but a trend was observed (hazard ratio (HR) per 1 s.d. of PRS = 1.02, 95% CI = 0.99–1.04, P = 0.13). The secondary outcome indicates that this trend was specific to cardiovascular death, which was substantially associated with AF-PRS (HR (95% CI) = 1.06 (1.02–1.11), P = 4.4 × 10−3 for cardiovascular disease; HR (95% CI) = 1.00 (0.98–1.01), P = 0.50 for noncardiovascular disease). Furthermore, the tertiary outcome suggests stroke death as a leading factor that impacts the association between AF-PRS and cardiovascular deaths (HR (95% CI) = 1.13 (1.04–1.22), P = 2.7 × 10−3). In contrast to evidence from clinical studies, the association between AF-PRS and heart failure death did not reach statistical significance in the present study (HR (95% CI) = 1.05 (0.95–1.16), P = 0.37). Among 132,737 individuals for whom mortality data were available, the number of events for heart failure death was 1,093 (0.82%), which was approximately half of stroke events (n = 2,012) and less than 20% of cardiovascular events (n = 6,877). Thus, we presume that the standard error of the estimate for heart failure death was larger due to the smaller number of events in our cohort, resulting in a relatively wide CI that might make it difficult to reach statistical significance.
Cross-trait genetic liability of AF
AF is frequently concomitant with various cardiovascular diseases, such as valvular heart disease, heart failure and stroke. These cardiovascular diseases, including AF, partially share the underlying pathophysiology and are mutually associated with the development of each other, whereas the causality between AF and cardiovascular diseases is not comprehensively elucidated. Therefore, we estimated the causal effect of AF on a wide range of cardiovascular diseases using two-sample Mendelian randomization (MR), where the exposure was AF and all the distinct AF-associated variants from the cross-ancestry meta-analysis were used as instrumental variables. Consistent with the clinical evidence, we observed significant genetic liability of AF to the development of several cardiovascular diseases such as heart failure, cardiomyopathy, stroke and transient ischemic attack (Extended Data Fig. 9a and Supplementary Table 10). Additionally, we found the causal effect of AF on valvular disease (OR (95% CI) = 1.139 (1.133–1.630), P = 9.4 × 10−4 for rheumatic valvular disease; OR (95% CI) = 1.183 (1.112–1.258), P = 1.1 × 10−7 for valvular heart disease), indicating that hemodynamic instability and structural remodeling underlying AF may contribute to the development of valvular diseases.
AF is also known as a consequent phenotype accumulated by multiple atherosclerotic- and metabolic-related traits. Large observational studies have identified these traits as significant risk factors associated with AF23,24, but the causal relationship between them has not been fully assessed due to potential mediators or confounders of these associations. Therefore, we performed an MR analysis to thoroughly investigate the causality of quantitative traits. We represented the exposures as quantitative traits and selected the distinct variants associated with each trait as instrumental variables. As expected25, height and BMI were significant predictors for AF (OR (95% CI) = 1.398 (1.164–1.679), P = 3.3 × 10−4; OR (95% CI) = 1.133 (1.061–1.209), P = 1.8 × 10−4, respectively). Furthermore, among atherosclerotic- and metabolic-related traits, we found blood pressure as the only trait with a causal effect on AF development (OR (95% CI) = 1.400 (1.285–1.525), P = 1.2 × 10−14 for systolic blood pressure; OR (95% CI) = 1.455 (1.330–1.591), P = 2.1 × 10−16 for diastolic blood pressure; OR (95% CI) = 1.267 (1.161–1.381), P = 9.2 × 10−8 for pulse pressure; Extended Data Fig. 9b and Supplementary Table 10).
We conducted a large-scale GWAS with approximately 10,000 AF cases in the Japanese population and identified 31 genome-wide significant loci associated with AF. This includes five new loci, where disease-relevant rare and highly East Asian-specific variants were found in the SYNE1 and FGF13 loci, suggesting the involvement of functional alteration in the nuclear envelope and ion channels as a mechanism underlying AF. SYNE1 encodes nesprin-1, a spectrin repeat protein that, together with the Sad1p/UNC84-domain-containing proteins (SUN1/2), forms the nuclear envelope protein complex, which is linked via its nucleoplasmic domains to lamin A/C. Mutations in LMNA and SYNE1 have been identified in patients with severe muscle dystrophy and dilated cardiomyopathy26,27. Mutations in SYNE1 cause defects in nuclear morphology, myoblast differentiation and heart development28, altering the nuclear envelope protein complex that contributes to the structural substrate in atrial arrhythmogenesis. FGF13 encodes a member of the fibroblast growth factor family, which possesses broad mitogenic and cell survival activities. FGF13 directly binds to the C-terminus of the main cardiac sodium channel (NaV1.5) in the sarcolemma, and FGF13 knockdown in rat cardiomyocytes produced a loss of function of NaV1.5: reduced Na+ current density, decreased Na+ channel availability and slowed recovery of the Na+ current from inactivation29. This evidence of conduction disturbance in cardiomyocytes indicates that FGF13 is an important target gene associated with AF.
Furthermore, we performed the largest cross-ancestry meta-analysis for AF to date, where 150 genome-wide significant loci were identified; together with the Japanese GWAS, this resulted in the discovery of 35 new loci. By integrating these loci with transcriptomic and epigenomic data, we prioritized candidate genes and transcription factors associated with AF. Transcriptome-wide analysis linked AF-associated loci to target genes and particularly revealed IL6R as a candidate gene associated with AF. Despite increasing evidence for the role of inflammation in AF pathophysiology30, only a suggestive association between IL6R and AF (P = 5.0 × 10−4) has so far been reported31, and the genetic contribution of the inflammatory process to AF development has not been fully elucidated. Our transcriptome-wide analysis revealed a significant association between IL6R and AF development, shedding light on inflammatory signaling as a key pathway in the pathogenesis of AF and a potential therapeutic target. Additionally, our approach based on the ChIP-seq dataset implicated ERRg as a candidate transcription factor associated with AF. In previous work, ERRg knockdown mice exhibited cardiomyopathy with an arrest of cardiac maturation through transcriptional regulation of genes involved in mitochondrial energy transduction, contractile function and ion transport32, but the association between AF and ERRg had not been fully examined. Our results from functional studies using iPSCMs indicated a new transcriptional network orchestrated by ERRg in the pathophysiology of AF.
During the last decade, there has been a growing interest in predicting complex diseases or traits using genetic data. PRS is expected to provide clinical utility by enhancing disease risk prediction, whereas previous studies demonstrated comparable or lower performance and only a weak additive effect of PRS relative to the established risk prediction models33,34. Additionally, a lack of cross-ancestry portability of PRS has been reported, owing to the predominant proportion of individuals of European descent in the current GWASs35. In this study, we found shared allelic effects of AF-associated variants and genetic correlations between Japanese and European populations (Supplementary Note and Supplementary Fig. 2). Therefore, we exhaustively examined AF-PRS using various combinations of GWASs and multiple parameters to maximize the predictive performance; AF-PRS achieved (1) a higher performance when applied to the same population as the derivation-GWAS population regardless of the sample size in the single derivation-GWAS category, (2) a higher performance when it was derived from a cross-ancestry meta-GWAS including the Japanese population compared to that derived from a meta-GWAS in a non-Japanese population even with a similar or smaller sample size of derivation-GWAS and (3) the best performance when it was derived from the cross-ancestry meta-GWAS including the Japanese population and with the largest sample size. Furthermore, recent studies have shown the potential utility of PRS in a variety of clinical settings, such as diagnostic refinement36 and prediction of disease progression37. Our study also demonstrated that, in addition to the predictive ability for AF itself, AF-PRS segregated individuals with AF-related phenotypes, such as early onset of AF and cardioembolic stroke, and those with increased risks of long-term cardiovascular and stroke mortalities.
This indicated that the cumulative genetic risk for AF could be an indicator for early therapeutic intervention, including anticoagulation in at-risk individuals as a primary prevention of stroke. Taken together, our results have several implications for the clinical utility of AF-PRS, which will be clues for the realization of future precision medicine.
Finally, MR analysis revealed evidence of a causal relationship between AF and relevant diseases or traits, which supports the results from clinical observational studies. In particular, blood pressure was the only trait that showed significant causality among atherosclerotic- and metabolic-related traits, which indicates that blood pressure is an important modifiable risk factor, and the intensive management of blood pressure may reduce the risk of AF development.
In conclusion, our large-scale Japanese and cross-ancestry genetic analyses identified 35 new risk loci and provided insights into the distinct and shared genetic architecture of AF between Japanese and Europeans. Integrative analysis of transcriptome and epigenome data highlighted candidate genes and implicated a transcription factor involved in the mechanism of disease development. Furthermore, analyses of disease prediction and long-term survival demonstrated the clinical utility of the AF-PRS. These data highlight the importance of AF genetics in clinical settings and provide useful evidence for the implementation of genomic medicine.
This study was approved by ethics committees of the RIKEN Center for Integrative Medical Sciences, the Institute of Medical Sciences and the University of Tokyo. Informed consent was obtained from all participants. All study participants were Japanese who were registered in the BBJ project (https://biobankjp.org/). The BBJ is a hospital-based national biobank project that collects DNA and serum samples and clinical information from cooperative medical institutes. Approximately 200,000 patients with any of the 47 target diseases were enrolled between 2003 and 2007. All study participants were at least 18 years old.
For GWAS quality control (QC), we excluded samples with a call rate <0.98 and related individuals with PI_HAT > 0.2 by PLINK 2.0 (20 Aug 2018 version). We then excluded samples with a heterozygosity rate > +4 s.d. To identify population stratification, we performed principal component analysis (PCA) using PLINK 2.0 and excluded outliers from the Japanese cluster. For the case samples in GWAS, we selected individuals with AF or atrial flutter diagnosed by a physician based on the general medical practices or documented on a 12-lead electrocardiogram. The demographic features of the case–control cohort are shown in Supplementary Table 11.
The samples in the replication study were registered in the BBJ second cohort, which comprised DNA samples and clinical information of approximately 80,000 new patients with the 38 target diseases collected between 2013 and 2018 to expand research outcomes from the first cohort. We applied the same inclusion criteria to the clinical information of the participants and excluded related individuals estimated by PI_HAT and PCA outliers from the East Asian population. Finally, 48,677 individuals (4,602 cases and 44,075 controls) were included in the replication study.
Genotyping, imputation and quality control
GWAS participants were genotyped using the Illumina HumanOmniExpress Genotyping BeadChip or a combination of Illumina HumanOmniExpress and HumanExome BeadChips. For genotype QC, we excluded variants with (1) SNP call rate <0.99, (2) MAF < 0.01 and (3) Hardy–Weinberg equilibrium P < 1.0 × 10−6. We prephased the genotypes using EAGLE and imputed dosages using minimac3 with the 1000 Genomes Project Phase 3 (1 KG Phase 3; May 2012)38 reference panel combined with an in-house reference panel of 1,037 Japanese individuals from BBJ. For the X chromosome, prephasing was performed in both males and females, and imputation was performed separately for males and females. Dosages of variants in X chromosomes for males were assigned between zero and two.
In the replication study, all participants were genotyped using Illumina Asian Screening Array. We excluded variants meeting any of the following criteria: (1) SNP call rate <98%, (2) a minor allele count of <5 and (3) Hardy–Weinberg equilibrium P < 1.0 × 10−6. Post-QC genotype data were prephased using SHAPEIT2 and imputed using minimac4 with the 1 KG Phase 3 reference panel and 3,256 Japanese in-house reference panel from BBJ. Prephasing and imputation of the X chromosome were performed using the same pipeline applied for autosomes.
Genome-wide association study
In the Japanese GWAS, association testing was performed by logistic regression analysis assuming an additive model with adjustment for age, age2, sex and top 20 principal components (PCs) using PLINK 2.0. We selected variants with minimac3 imputation quality score of >0.3 and MAF ≥ 0.001. For the X chromosome, we conducted association analyses in males and females separately and integrated the results using an inverse-variance weighted fixed-effects model implemented in METASOFT (v2.0.1). Heterogeneity between studies was calculated using Cochran’s Q test. We filtered variants with strong heterogeneity (Phet < 1.0 × 10−4). The genome-wide significance threshold was defined at P < 5.0 × 10−8 for variants with MAF ≥ 1% and P < 5.71 × 10−9 for those with MAF < 1% (0.05/8,753,038 variants). Although the genomic inflation factor (𝜆GC) was 1.12, LD score regression indicated that the inflation was primarily due to polygenic effects (LD score regression intercept = 1.02; Supplementary Fig. 3a). Adjacent genome-wide significant SNPs were grouped into one locus if they were within 1 Mb of each other. We defined a locus as follows: (1) extracted genome-wide significant variants (P < 5 × 10−8) from the association result, (2) added 500 kb to both sides of these variants and (3) merged overlapping regions. If the locus did not contain coordinates with previously reported genome-wide significant variants (that is, all variants with P < 5 × 10−8 in the previously reported locus), the region was annotated as being new. We mapped variants to nearby genes and functionally annotated genes using Open Targets (https://www.opentargets.org/), in which the pair of variant and gene with the highest variant-to-gene score was selected.
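The three-step locus definition (extract significant variants, extend a window on both sides, merge overlapping windows) can be sketched as a simple interval merge. This is an illustrative sketch rather than the study's pipeline: the function name and the toy variants are hypothetical, and the flank is set to 500 kb so that variants within 1 Mb of each other end up in the same locus, as stated in the text.

```python
from collections import defaultdict


def define_loci(variants, p_threshold=5e-8, flank=500_000):
    """Group significant variants into loci.

    variants: iterable of (chrom, position, p_value) tuples.
    Returns a list of (chrom, start, end) after merging overlapping windows.
    """
    windows = defaultdict(list)
    for chrom, pos, p in variants:
        if p < p_threshold:
            # Step 1 and 2: keep significant variants and extend both sides.
            windows[chrom].append((max(0, pos - flank), pos + flank))

    loci = []
    for chrom in sorted(windows, key=str):
        merged = []
        # Step 3: sweep the sorted windows and merge any that overlap.
        for start, end in sorted(windows[chrom]):
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        loci.extend((chrom, s, e) for s, e in merged)
    return loci


# Toy input (hypothetical positions and P values).
toy_variants = [
    ("4", 111_000_000, 1e-30),
    ("4", 111_600_000, 1e-9),   # within 1 Mb of the first variant: same locus
    ("4", 115_000_000, 2e-8),   # far away: separate locus
    ("10", 75_000_000, 1e-6),   # not genome-wide significant: ignored
]
loci = define_loci(toy_variants)
```

A locus produced this way would then be compared against the coordinates of previously reported significant variants to decide whether it is new.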
To identify independent association signals in the loci, we conducted a stepwise conditional analysis for genome-wide significant loci defined as described in the GWAS. First, we performed logistic regression conditioning on the lead variant of each locus. We set locus-wide significance at P < 5.0 × 10−6, then added the most significant remaining variant to the conditioning set and repeated this procedure until none of the variants reached locus-wide significance for each locus.
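The control flow of a stepwise conditional analysis can be sketched independently of the regression software. In the sketch below, `test_fn` is a hypothetical caller-supplied callback that stands in for the conditional logistic regression and returns P values for the variants not yet conditioned on; the locus-wide cutoff is passed explicitly, and the toy P values are invented for illustration.

```python
def stepwise_conditional(lead_variant, test_fn, threshold, max_signals=20):
    """Iteratively collect independent signals at one locus.

    test_fn(conditioned) must return {variant_id: p_value} from a regression
    that includes the `conditioned` variants as covariates (hypothetical callback).
    """
    conditioned = [lead_variant]
    while len(conditioned) < max_signals:
        p_values = test_fn(conditioned)
        if not p_values:
            break
        best = min(p_values, key=p_values.get)
        if p_values[best] >= threshold:
            break  # nothing left reaches locus-wide significance
        conditioned.append(best)
    return conditioned


def toy_test(conditioned):
    """Invented P values that weaken as more signals are conditioned on."""
    table = {
        1: {"rsB": 1e-9, "rsC": 1e-7},
        2: {"rsC": 2e-6},
        3: {},
    }
    return table[len(conditioned)]


signals = stepwise_conditional("rsA", toy_test, threshold=5e-6)
```

Each pass adds at most one variant, so the number of regression rounds at a locus equals the number of independent signals found plus one final round in which nothing passes the cutoff.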
LD score regression
We performed LD score regression (version 1.0.0) using selected SNPs with MAF ≥ 0.01 and without the major histocompatibility complex region. For the regression, we used the East Asian LD scores provided by the authors (https://github.com/bulik/ldsc/).
Summary results from two European AF GWASs (EUR and FIN) were obtained from the website of a previously published study (http://csg.sph.umich.edu/willer/public/afib2018) (ref. 7) and from the FinnGen research project website (https://www.finngen.fi/en), respectively. We calculated the LD score regression intercept for each study and confirmed that these two studies were well calibrated (LD score regression intercept for EUR = 1.052 (s.e.m. = 0.012) and FIN = 1.033 (s.e.m. = 0.010); Supplementary Fig. 3b,c). We also calculated the genetic correlation and found a significant genetic correlation between EUR and FIN (rg = 0.918, s.e.m. = 0.035, P = 3.9 × 10⁻¹⁵⁵).
To account for ancestral heterogeneity among the three studies, we applied the MANTRA algorithm in the cross-ancestry meta-analysis (ref. 39), which allows for heterogeneity between diverse ancestry groups and improves performance compared to fixed-effects and random-effects meta-analysis. Variants with MAF ≥ 1% in both the Japanese and European populations were selected for association. We considered SNPs with a log₁₀ Bayes factor (BF) > 6 to be genome-wide significant.
Transcriptome-wide association study
We performed a TWAS using MetaXcan v0.3.512 (ref. 40), which estimates the association between predicted gene expression levels and a phenotype of interest using summary statistics and gene expression prediction models. We used precomputed prediction models of gene expression in atrial appendage and left ventricular tissues with LD reference data in GTEx v8 and the summary statistics of the cross-ancestry AF-GWAS as input. The Bonferroni-corrected significance level was set at P = 4.8 × 10⁻⁶ (= 0.05/10,414) for the atrial appendage and P = 5.2 × 10⁻⁶ (= 0.05/9,702) for the left ventricle to account for the number of genes tested in each tissue. To assess the relationship between AF-associated variants and the candidate genes, we first extracted, for each candidate gene, the AF-associated variant with the lowest association P value among those included in the prediction model obtained from GTEx PredictDB (ref. 41) and then calculated the physical distance from the variant to the transcription start site (TSS) of the canonical transcript of that gene. Furthermore, we performed Gene Ontology enrichment analysis using FUMA web application v1.3.7 (ref. 42) with false discovery rate correction considering the number of gene sets tested per category.
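The per-tissue Bonferroni thresholds quoted above follow directly from the number of genes tested. The following Python lines reproduce the arithmetic; the function and variable names are illustrative only.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Family-wise significance level divided by the number of tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Numbers of genes tested per tissue, as reported in the text.
genes_tested = {"atrial appendage": 10_414, "left ventricle": 9_702}
thresholds = {tissue: bonferroni_threshold(0.05, n) for tissue, n in genes_tested.items()}

for tissue, p in thresholds.items():
    print(f"{tissue}: P < {p:.1e}")
# atrial appendage: P < 4.8e-06
# left ventricle: P < 5.2e-06
```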
Enrichment analysis of transcription factors
To assess the enrichment of transcription factors in AF-associated loci, we defined AF-associated loci as regions within 500 kb upstream and 500 kb downstream of the AF-associated lead variants or proxies with r² > 0.8 in European samples of 1KG. We then searched for overlaps between peak-call data archived in the ChIP-Atlas dataset and the AF-associated loci, and between the peak-call data and control regions selected from all genomic regions by permutation. P values were calculated with the two-tailed Fisher’s exact probability test (the null hypothesis is that the two sets of regions overlap with the ChIP-Atlas peak-call data in the same proportion). The epigenetic landscapes around cardiac ion channel-related genes were visualized using ChIP-Atlas peak browser and integrative genomics viewer (ref. 43).
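For illustration, a two-tailed Fisher's exact test on a 2 × 2 overlap table can be written with the Python standard library alone. This is a generic sketch rather than the ChIP-Atlas implementation; the table layout (rows: AF-associated loci versus control regions; columns: overlapping versus not overlapping a peak) and the counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (small relative tolerance guards against floating-point ties).
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical counts: 8 of 10 AF loci and 1 of 6 control regions overlap a peak.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(f"P = {p_value:.4f}")   # P = 0.0350
```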
Functional analysis of ERRg using iPSCMs
For functional assessment of ERRg, we first prepared iPSCs, which were established at the University of Tokyo (IRB 11044), and cultured and maintained them in Essential 8-flex medium (ThermoFisher, A2858501). iPSCs were used between approximately passages 20 and 50. The iPSCs were differentiated into cardiomyocytes as described previously (ref. 44), with minor modifications. Briefly, all iPSC lines were differentiated with Asclestem Cardiac Differentiation Media (Nacalai Tesque, 13166-05) until day 12 and maintained with glucose-deficient DMEM (ThermoFisher, A1443001) with sodium DL-lactate (Wako Fujifilm, 128-00056) supplementation for 4 d. The purified cardiomyocytes were replated on gelatin-coated plates in DMEM media supplemented with 10% FBS (Nacalai Tesque, 08458-45). Before downstream assays, the iPSCMs were passaged onto gelatin-coated plates around day 28. An inverse agonist of ERRg, GSK5182 (Selleck, S3449), was dissolved in DMSO and administered to the iPSCMs at 10 µM for 4 d before gene expression and functional analysis. To measure gene expression, total RNA was extracted using TRIzol reagents (ThermoFisher, 15596026) according to the manufacturer’s instructions. RNA samples were reverse-transcribed using QuantiTect Reverse Transcription Kit (QIAGEN, 205313). Quantitative real-time PCR was performed using THUNDERBIRD Probe quantitative real-time PCR Mix (Toyobo, QPS-101). Relative expression levels of target genes were normalized to the expression of an internal control gene (RPS28) using the comparative Ct method. The primers used for quantitative real-time PCR are listed in Supplementary Table 12. For motion analysis, the contractile characteristics of iPSCMs were analyzed using SI8000 Cell Motion Imaging System (SONY) (refs. 44,45). The video of synchronously beating iPSCMs was captured, and the motion of each detection point was converted into a vector for quantitative analysis. Cellular motion was analyzed based on the sum of the vector magnitudes. 
Video images were taken 4 d after drug administration (GSK5182 versus control), and the spontaneous beating rate and the duration of contraction were calculated. To examine calcium handling, we performed a calcium transient assay, in which iPSCMs were plated on a gelatin-coated 96-well plate in DMEM containing 10% FBS. After drug administration, the cells formed a homogeneously beating monolayer sheet and were incubated with Cal520AM (AAT, 21130) diluted in FluoroBrite medium (ThermoFisher, A1896701) containing 10% FBS for 1 h at 37 °C and 5% CO₂. After staining, the medium was replaced with 90 μl of FluoroBrite medium containing 2% FBS. Calcium transient signals were recorded by FDSS/μCell (Hamamatsu Photonics K.K.) (ref. 46). The light source (L11601-01) was used with an excitation wavelength of 480 nm and an emission wavelength of 540 nm, at a sampling rate of 16 Hz for 30 s. Then, 100 nM isoproterenol (Wako Fujifilm, 553-69841) was added to the medium and the calcium transient was recorded again 30 min later. We measured averaged peak counts (beating rates (per min)) and peak width durations at 80% repolarization (PWD80 (ms)). All data analysis was performed using GraphPad Prism 7.04 (GraphPad Software).
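The comparative Ct normalization mentioned above for quantitative real-time PCR is commonly computed as 2^(−ΔΔCt). The sketch below shows that standard form under the assumption of about 100% amplification efficiency; the Ct values are invented for illustration and the function name is hypothetical.

```python
def relative_expression(ct_target_treated: float, ct_ref_treated: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of a target gene (treated vs. control) by the 2^(-ddCt) method."""
    # dCt: target Ct normalized to the internal control gene (e.g. RPS28).
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    # ddCt: treated condition relative to the control condition.
    delta_delta = delta_treated - delta_control
    return 2.0 ** (-delta_delta)


# Invented Ct values: the target crosses threshold one cycle later after treatment.
fold_change = relative_expression(26.0, 18.0, 25.0, 18.0)
print(fold_change)   # 0.5
```

A value below 1 indicates lower expression in the treated cells than in the control cells.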
PRS derivation and performance
First, we divided our dataset into the following three groups: (1) a discovery group to derive and validate PRS (6,890 cases and 49,451 controls), (2) a test group to assess PRS performance (2,953 cases and 21,194 controls) and (3) a group for the survival analysis (70,645 controls) (Extended Data Fig. 10). To secure independence between the PRS derivation and validation, we used a tenfold cross-validation approach. Next, we randomly split the discovery group into ten subgroups and used nine of these subgroups for PRS derivation and the remaining one for PRS validation. For each derivation cohort, we performed GWASs in combination with one Japanese and two European GWASs: (1) a population-specific GWAS (BBJ, EUR and FIN), (2) European meta-GWAS (EUR + FIN) and (3) the cross-ancestry meta-GWAS (BBJ + EUR, BBJ + FIN and BBJ + EUR + FIN). Meta-analyses were performed using the fixed-effects and random-effects models in METASOFT. We derived PRS using the pruning and thresholding method and the LDpred2 algorithm. For the pruning and thresholding method, in addition to the meta-analysis models, we applied P value thresholds of 0.5, 5.0 × 10⁻², 5.0 × 10⁻⁴, 5.0 × 10⁻⁶ and 5.0 × 10⁻⁸, and r² thresholds of 0.8, 0.5 and 0.2. For LDpred2, the variants were restricted to HapMap3 SNPs as recommended (ref. 47), and we ran the LDpred2-grid model with the parameters: p (proportion of causal variants) in a sequence of five values from 10⁻⁴ to 1 on a log scale and the sparse option (true or false). We did not tune the parameter for the SNP heritability h² because the different samples in each derivation cohort did not enable us to determine the optimal h². 
For the LD reference, we used 1KG East Asian (EAS) and 1KG European (EUR) populations according to each cohort population as follows: (1) 1KG EAS for a cohort with only East Asian ancestry (BBJ), (2) 1KG EUR for cohorts with only European ancestry (EUR, FIN and EUR + FIN) and (3) both 1KG EAS and 1KG EUR for cohorts with multiple ancestries (BBJ + EUR, BBJ + FIN and BBJ + EUR + FIN). Subsequently, we calculated PRSs in the withheld validation cohorts and repeated this procedure ten times by changing the withheld validation cohorts. Finally, we constructed 376 PRSs in total; the PRS with the best performance for each cohort is shown in Fig. 5. The performance of the PRS was measured as (1) Nagelkerke’s pseudo R² obtained by modeling age, sex, the top 20 PCs and normalized PRS and (2) the area under the receiver operating characteristic curve in the same model as Nagelkerke’s pseudo R². The best model/parameter set for each combination model (BBJ, EUR, FIN, BBJ + EUR, BBJ + FIN, EUR + FIN and BBJ + EUR + FIN) was determined by averaging Nagelkerke’s pseudo R² (Supplementary Table 9). Using the best models and parameters determined in the derivation and validation cohorts, we calculated the PRSs and assessed their performance for the independent test cohort. To evaluate the PRS performance in the test cohort, we performed bootstrapping over the samples in the test cohort with 5.0 × 10⁴ replicates and assessed Nagelkerke’s pseudo R² and the area under the receiver operating characteristic curve in each bootstrap group. Before the comparison of the combination models, we evaluated the performance of the base model, which included age, sex and the top 20 PCs (Supplementary Table 9). 
Next, to compare the performance of the PRS derived from each combination model, we calculated the pairwise difference of Nagelkerke’s pseudo R² (ΔR²) between each pair of models (two of seven models; 21 combinations) and obtained the two-sided bootstrap P value by counting the number of replicates with ΔR² ≤ 0 and with ΔR² > 0 and then multiplying the lower count by the minimum estimated P value (2 × 1/(5.0 × 10⁴) = 4 × 10⁻⁵: two-sided). The significance was set at P = 2.3 × 10⁻³ (0.05/21).
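The bootstrap P value described above can be sketched as follows. The sketch assumes that a ΔR² value has already been obtained for every bootstrap replicate. In the resampling helper, `delta_fn` is a hypothetical callback standing in for refitting the two models on the resampled individuals, and the toy numbers are invented.

```python
import random
from typing import Callable, List, Sequence


def bootstrap_deltas(n_samples: int,
                     delta_fn: Callable[[List[int]], float],
                     n_boot: int,
                     seed: int = 0) -> List[float]:
    """Resample individuals with replacement and evaluate delta_fn on each replicate."""
    rng = random.Random(seed)
    deltas = []
    for _ in range(n_boot):
        idx = [rng.randrange(n_samples) for _ in range(n_samples)]
        deltas.append(delta_fn(idx))
    return deltas


def bootstrap_two_sided_p(deltas: Sequence[float]) -> float:
    """Lower of the two tail counts multiplied by the minimum estimable P value (2/B)."""
    n_boot = len(deltas)
    if n_boot == 0:
        raise ValueError("at least one bootstrap replicate is required")
    n_le = sum(1 for d in deltas if d <= 0)   # replicates with delta <= 0
    n_gt = n_boot - n_le                      # replicates with delta > 0
    return min(1.0, min(n_le, n_gt) * 2.0 / n_boot)


# Toy check: 3 of 100 replicate differences fall at or below zero.
toy_deltas = [0.01] * 97 + [-0.01] * 3
p_toy = bootstrap_two_sided_p(toy_deltas)     # 3 * 2 / 100

# Toy resampling: per-individual contributions stand in for refitting both models.
contrib = [0.02, -0.01, 0.03, 0.01, 0.00, 0.02]
resampled = bootstrap_deltas(len(contrib),
                             lambda idx: sum(contrib[i] for i in idx) / len(idx),
                             n_boot=200)
p_resampled = bootstrap_two_sided_p(resampled)
```

When no replicate falls in the smaller tail, the function returns 0, which corresponds to a P value below the resolution of the bootstrap (2/B).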
Association of AF-PRS with relevant phenotypes
We extracted AF case samples with available data on age at AF onset (n = 7,458; median age at AF onset, 63 years (IQR = 56–71)) and constructed a linear regression model of age at AF onset including AF-PRS as a dichotomous variable (individuals with high PRS (the top 1%, 5%, 10% and 20%) versus those with the remaining PRS) to estimate the difference in the age at AF onset between them, adjusted for sex and the top 20 PCs. For the association analysis with stroke phenotypes, we performed a logistic regression analysis adjusted for the use of anticoagulants or antiplatelets in addition to age, sex and the top 20 PCs, because antithrombotic therapy is associated with a decreased risk of ischemic stroke and an increased risk of hemorrhagic stroke. We selected the control samples with available data on antithrombotic therapy (n = 121,351), among whom we found 14,120 stroke phenotypes: 8,547 cerebral infarction, 111 cardioembolic strokes, 1,429 atherothrombotic infarction, 1,230 lacunar infarction, 1,061 cerebral hemorrhages and 879 subarachnoid hemorrhages.
The Cox proportional hazards model was used to assess the association between AF-PRS and long-term mortality. We obtained survival follow-up data with ICD-10 codes for 132,737 individuals from the BBJ dataset (refs. 48,49). The causes of death were classified into three categories according to ICD-10 codes as follows: (1) primary outcome for all-cause death, (2) secondary outcome for cardiovascular death (I00–I99) and noncardiovascular death (not I00–I99) and (3) tertiary outcome for heart failure death (I50), ischemic heart disease death (I20–I25) and stroke death (I60, I61, I63 and I64). The median follow-up period was 8.4 years (IQR 6.8–9.9). The Cox proportional hazards model was adjusted for sex, age, the top 20 PCs and disease status. Analyses were performed with the R package survival v.2.44, and survival curves were estimated using the R package survminer v.0.4.6, with modifications.
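The cause-of-death grouping above can be expressed as a small classifier. The sketch assumes that the codes are ICD-10 circulatory-system codes (I00–I99, I50, I20–I25, I60, I61, I63 and I64); the function name and label strings are hypothetical.

```python
from typing import List

STROKE_CODES = {"I60", "I61", "I63", "I64"}


def classify_death(icd10_code: str) -> List[str]:
    """Return every outcome category that a cause-of-death ICD-10 code falls into."""
    code = icd10_code.strip().upper()[:3]      # keep the three-character category
    labels = ["all-cause"]                     # primary outcome
    is_cardiovascular = len(code) == 3 and code[0] == "I" and code[1:].isdigit()
    # Secondary outcome: cardiovascular (I00-I99) versus noncardiovascular.
    labels.append("cardiovascular" if is_cardiovascular else "noncardiovascular")
    # Tertiary outcomes.
    if code == "I50":
        labels.append("heart failure")
    if is_cardiovascular and 20 <= int(code[1:]) <= 25:
        labels.append("ischemic heart disease")
    if code in STROKE_CODES:
        labels.append("stroke")
    return labels


print(classify_death("I50.9"))   # ['all-cause', 'cardiovascular', 'heart failure']
print(classify_death("C34.1"))   # ['all-cause', 'noncardiovascular']
```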
Mendelian randomization
We extracted summary statistics from the UK Biobank (http://www.nealelab.is/uk-biobank/). To avoid sample overlap, we selected AF-associated variants from the cross-ancestry meta-analysis in combination with BBJ and FIN, although the statistical power to detect the associations with AF decreased. To select independent variants for exposure, genome-wide significant variants (P < 5 × 10⁻⁸) were pruned (r² < 0.01; LD window of 10,000 kb; using European samples of 1KG for LD reference) (ref. 50). For the assessment of the causal effect of AF on cardiovascular diseases, we used PhenoScanner V2 (http://www.phenoscanner.medschl.cam.ac.uk/) to exclude from the list of instrument variables those variants associated with cardiovascular risk factors such as hypertension, cholesterol, diabetes mellitus and smoking and those associated with cardiovascular diseases, to avoid their pleiotropic effects. Then, we performed Mendelian randomization (MR) analysis using the TwoSampleMR package in R v4.0.3, with AF-associated variants as instrument variables and variants associated with cardiovascular diseases as outcome variables. Next, we assessed the causal effects of quantitative traits related to anthropometry, metabolites, serum protein, kidney function, liver function, hematocyte count and blood pressure on AF. To exclude variants with pleiotropic effects, we also used PhenoScanner to identify and remove from the list of instrument variables those variants associated with risk factors for AF such as hypertension, diabetes mellitus and obesity, unless the exposure was a risk factor itself. Then, we performed MR analysis, where variants associated with quantitative traits were used as instrument variables and AF-associated variants as outcome variables. Causal estimates were based on the inverse-variance-weighting (IVW) method. To exclude horizontal pleiotropic outliers, we performed MR-PRESSO for instrument variables (ref. 51). 
We also calculated Cochran’s Q statistics for heterogeneity between the causal effects using IVW and the MR-Egger intercept for directional pleiotropy.
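To make the IVW estimate and Cochran's Q concrete, the following sketch combines per-variant Wald ratios under a fixed-effects model with first-order weights. It is a textbook simplification and not the TwoSampleMR implementation; the function name and the effect sizes are hypothetical.

```python
from math import sqrt
from typing import Sequence, Tuple


def ivw_estimate(beta_exposure: Sequence[float],
                 beta_outcome: Sequence[float],
                 se_outcome: Sequence[float]) -> Tuple[float, float, float]:
    """Fixed-effects IVW causal estimate, its standard error and Cochran's Q."""
    # Wald ratio per instrument: outcome effect divided by exposure effect.
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    # First-order inverse-variance weights: (bx / se_y)^2.
    weights = [(bx / se) ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    total = sum(weights)
    beta_ivw = sum(w * r for w, r in zip(weights, ratios)) / total
    se_ivw = sqrt(1.0 / total)
    # Cochran's Q: weighted squared deviations of the ratios from the pooled estimate.
    q = sum(w * (r - beta_ivw) ** 2 for w, r in zip(weights, ratios))
    return beta_ivw, se_ivw, q


# Invented effects: every instrument implies the same causal effect of 0.5.
beta_x = [0.10, 0.20, 0.15]
beta_y = [0.05, 0.10, 0.075]
se_y = [0.01, 0.02, 0.015]
beta_ivw, se_ivw, q_stat = ivw_estimate(beta_x, beta_y, se_y)
print(f"beta = {beta_ivw:.3f}, s.e. = {se_ivw:.4f}, Q = {q_stat:.3f}")
```

Under the null hypothesis of homogeneous causal effects, Q is compared against a chi-squared distribution with (number of instruments − 1) degrees of freedom.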
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Summary statistics of Japanese GWAS and the cross-ancestry meta-analysis and the data for the calculation of PRS derived from the current study are publicly available in the National Bioscience Database Center (research ID: hum0014, https://humandbs.biosciencedbc.jp/en/). The cross-ancestry GWAS summary statistics and polygenic score are also available through the NHGRI-EBI GWAS catalog (study accession: GCST90204201, https://www.ebi.ac.uk/gwas/downloads/summary-statistics) and Polygenic Score catalog (https://www.pgscatalog.org/, score ID: PGS002814), respectively. The phenotype information can be provided by the BBJ project upon request (https://biobankjp.org/english/index.html).
Benjamin, E. J. et al. Heart disease and stroke statistics—2019 update: a report from the American Heart Association. Circulation 139, e56–e528 (2019).
Staerk, L., Sherer, J. A., Ko, D., Benjamin, E. J. & Helm, R. H. Atrial fibrillation: epidemiology, pathophysiology, and clinical outcomes. Circ. Res. 120, 1501–1517 (2017).
Healey, J. S. et al. Occurrence of death and stroke in patients in 47 countries 1 year after presenting with atrial fibrillation: a cohort study. Lancet 388, 1161–1169 (2016).
Kim, M. H., Johnston, S. S., Chu, B. C., Dalal, M. R. & Schulman, K. L. Estimation of total incremental health care costs in patients with atrial fibrillation in the United States. Circ. Cardiovasc Qual. Outcomes 4, 313–320 (2011).
Christophersen, I. E. et al. Large-scale analyses of common and rare variants identify 12 new loci associated with atrial fibrillation. Nat. Genet. 49, 946–952 (2017).
Low, S. K. et al. Identification of six new genetic loci associated with atrial fibrillation in the Japanese population. Nat. Genet. 49, 953–958 (2017).
Nielsen, J. B. et al. Biobank-driven genomic discovery yields new insight into atrial fibrillation biology. Nat. Genet. 50, 1234–1239 (2018).
Roselli, C. et al. Multi-ethnic genome-wide association study for atrial fibrillation. Nat. Genet. 50, 1225–1233 (2018).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Jaganathan, K. et al. Predicting splicing from primary sequence with deep learning. Cell 176, 535–548 (2019).
ENCODE Project Consortium et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710 (2020).
Rattka, M., Westphal, S., Gahr, B. M., Just, S. & Rottbauer, W. Spen deficiency interferes with Connexin 43 expression and leads to heart failure in zebrafish. J. Mol. Cell. Cardiol. 155, 25–35 (2021).
Michela, P., Velia, V., Aldo, P. & Ada, P. Role of connexin 43 in cardiovascular diseases. Eur. J. Pharmacol. 768, 71–76 (2015).
den Hoed, M. et al. Identification of heart rate-associated loci and their effects on cardiac conduction and rhythm disorders. Nat. Genet. 45, 621–631 (2013).
Nolte, I. M. et al. Genetic loci associated with heart rate variability and their effects on cardiac disease risk. Nat. Commun. 8, 15805 (2017).
Chen, P. S., Chen, L. S., Fishbein, M. C., Lin, S. F. & Nattel, S. Role of the autonomic nervous system in atrial fibrillation: pathophysiology and therapy. Circ. Res. 114, 1500–1515 (2014).
Battle, A. et al. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).
Nasser, J. et al. Genome-wide enhancer maps link risk variants to disease genes. Nature 593, 238–243 (2021).
Oki, S. et al. ChIP-Atlas: a data-mining suite powered by full integration of public ChIP-seq data. EMBO Rep. 19, e46255 (2018).
Chao, E. Y. et al. Structure-guided synthesis of tamoxifen analogs with improved selectivity for the orphan ERRgamma. Bioorg. Med. Chem. Lett. 16, 821–824 (2006).
Benzoni, P. et al. Human iPSC modelling of a familial form of atrial fibrillation reveals a gain of function of If and ICaL in patient-derived cardiomyocytes. Cardiovasc Res. 116, 1147–1160 (2020).
Hong, L. et al. Human induced pluripotent stem cell-derived atrial cardiomyocytes carrying an SCN5A mutation identify nitric oxide signaling as a mediator of atrial fibrillation. Stem Cell Rep. 16, 1542–1554 (2021).
Benjamin, E. J. et al. Independent risk factors for atrial fibrillation in a population-based cohort. The Framingham Heart Study. J. Am. Med. Assoc. 271, 840–844 (1994).
Huxley, R. R. et al. Absolute and attributable risks of atrial fibrillation in relation to optimal and borderline risk factors: the Atherosclerosis Risk in Communities (ARIC) study. Circulation 123, 1501–1508 (2011).
Levin, M. G. et al. Genetics of height and risk of atrial fibrillation: a Mendelian randomization study. PLoS Med. 17, e1003288 (2020).
Bonne, G. et al. Mutations in the gene encoding lamin A/C cause autosomal dominant Emery-Dreifuss muscular dystrophy. Nat. Genet. 21, 285–288 (1999).
Capell, B. C. & Collins, F. S. Human laminopathies: nuclei gone genetically awry. Nat. Rev. Genet. 7, 940–952 (2006).
Zhou, C. et al. Novel nesprin-1 mutations associated with dilated cardiomyopathy cause nuclear envelope disruption and defects in myogenesis. Hum. Mol. Genet. 26, 2258–2276 (2017).
Wang, C. et al. Fibroblast growth factor homologous factor 13 regulates Na+ channels and conduction velocity in murine hearts. Circ. Res. 109, 775–782 (2011).
Hu, Y. F., Chen, Y. J., Lin, Y. J. & Chen, S. A. Inflammation and the pathogenesis of atrial fibrillation. Nat. Rev. Cardiol. 12, 230–243 (2015).
Schnabel, R. B. et al. Large-scale candidate gene analysis in whites and African Americans identifies IL6R polymorphism in relation to atrial fibrillation: the National Heart, Lung and Blood Institute’s Candidate Gene Association Resource (CARe) project. Circ. Cardiovasc. Genet. 4, 557–564 (2011).
Sakamoto, T. et al. A critical role for estrogen-related receptor signaling in cardiac maturation. Circ. Res. 126, 1685–1702 (2020).
Mosley, J. D. et al. Predictive accuracy of a polygenic risk score compared with a clinical risk score for incident coronary heart disease. J. Am. Med. Assoc. 323, 627–635 (2020).
Elliott, J. et al. Predictive accuracy of a polygenic risk score-enhanced prediction model vs a clinical risk score for coronary artery disease. J. Am. Med. Assoc. 323, 636–645 (2020).
Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591 (2019).
Oram, R. A. et al. A Type 1 diabetes genetic risk score can aid discrimination between type 1 and type 2 diabetes in young adults. Diabetes Care 39, 337–344 (2016).
Liu, H., Lutz, M. & Luo, S., Alzheimer’s Disease Neuroimaging Initiative. Association between polygenic risk score and the progression from mild cognitive impairment to Alzheimer’s disease. J. Alzheimers Dis. 84, 1323–1335 (2021).
Abecasis, G. R. et al. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010).
Morris, A. P. Transethnic meta-analysis of genomewide association studies. Genet. Epidemiol. 35, 809–822 (2011).
Barbeira, A. N. et al. Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. Nat. Commun. 9, 1825 (2018).
Das, S. et al. Next-generation genotype imputation service and methods. Nat. Genet. 48, 1284–1287 (2016).
Watanabe, K., Taskesen, E., van Bochoven, A. & Posthuma, D. Functional mapping and annotation of genetic associations with FUMA. Nat. Commun. 8, 1826 (2017).
Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).
Ito, M. et al. Characterization of a small molecule that promotes cell cycle activation of human induced pluripotent stem cell-derived cardiomyocytes. J. Mol. Cell. Cardiol. 128, 90–95 (2019).
Hayakawa, T. et al. Image-based evaluation of contraction-relaxation kinetics of human-induced pluripotent stem cell-derived cardiomyocytes: correlation and complementarity with extracellular electrophysiology. J. Mol. Cell. Cardiol. 77, 178–191 (2014).
Bedut, S., Kettenhofen, R. & D’Angelo, J. M. Voltage-sensing optical recording: a method of choice for high-throughput assessment of cardiotropic effects. J. Pharmacol. Toxicol. Methods 105, 106888 (2020).
Prive, F., Arbel, J. & Vilhjalmsson, B. J. LDpred2: better, faster, stronger. Bioinformatics 36, 5424–5431 (2020).
Nagai, A. et al. Overview of the BioBank Japan Project: study design and profile. J. Epidemiol. 27, S2–S8 (2017).
Hirata, M. et al. Cross-sectional analysis of BioBank Japan clinical data: a large cohort of 200,000 patients with 47 common diseases. J. Epidemiol. 27, S9–S21 (2017).
Shah, S. et al. Genome-wide association and Mendelian randomisation analysis provide insights into the pathogenesis of heart failure. Nat. Commun. 11, 163 (2020).
Verbanck, M., Chen, C. Y., Neale, B. & Do, R. Detection of widespread horizontal pleiotropy in causal relationships inferred from Mendelian randomization between complex traits and diseases. Nat. Genet. 50, 693–698 (2018).
We thank the staff of BBJ for their assistance in collecting samples and clinical information. We thank Y. Kaneko (the University of Tokyo) for technical assistance with functional experiments using iPSCMs. We also thank A.P. Morris (University of Liverpool) for providing us with MANTRA software and valuable advice.
This research was funded by the Japan Agency for Medical Research and Development (AMED) (JP22ek0210164 to K.I., S.N. and I.K., JP21tm0724601 to K.I., S.N. and I.K., JP20km0405209 and JP20ek0109487 to K.M., K.I., S.N. and I.K., JP20ek0109440 and JP22ek0210172 to S.N. and I.K., JP18km0405209 to I.K., JP21ek0109543 to S.N. and JP22bm1123011 to S.N.), MSD Life Science Foundation (to K. Miyazawa), the Japan Society for the Promotion of Science (a Grant-in-Aid for Scientific Research (S) to I.K., a Grant-in-Aid for Scientific Research (A) to K.I. and S.N., a Grant-in-Aid for Scientific Research (B) to K.I., a Grant-in-Aid for Early-Career Scientists to K. Miyazawa, R.K. and H.I., and JSPS Fellows to Z.Z.), Research Funding for Longevity Sciences from the NCGG (21–23 to K.O. and K.I.), the Japan Science and Technology Agency (NBDC and PRESTO to S.O.) and Sakakibara Memorial Research Grant from the Sakakibara Heart Foundation (to H.M.). BBJ was supported by the Tailor-Made Medical Treatment Program of the Ministry of Education, Culture, Sports, Science and Technology (MEXT) and AMED under grant numbers JP22km0605001, JP17km0305002 and JP17km0305001.
The authors declare no competing interests associated with this manuscript.
Peer review information
Nature Genetics thanks Martin Dichgans, Jessica van Setten, and Rafik Tadros for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The results of the Japanese GWAS (9,826 AF cases and 140,446 controls) are shown. The negative log₁₀ P values on the y-axis are shown against the genomic positions (hg19) on the x-axis. Association signals that reached a genome-wide significance level (P < 5.0 × 10⁻⁸) are shown in blue for previously reported loci and in red for novel loci. Two-sided P values were calculated using a logistic regression model. GWAS, genome-wide association study; AF, atrial fibrillation.
The ORs for AF development of 49 independent signals in the Japanese GWAS (31 lead variants and 18 independent variants) are plotted against the risk allele frequencies. Novel variants are highlighted in orange with annotated genes. The dotted line indicates 80% detection power at a significance threshold of 5.0 × 10⁻⁸. OR, odds ratio.
Gene ontology analysis enriched in the candidate genes identified by TWAS using hypergeometric tests in FUMA web application. The significance level accounts for multiple testing of gene sets using a Benjamini-Hochberg correction, and gene sets with adjusted P value ≤ 0.05 are displayed.
a, CAMK2D locus. b, KCNJ5 locus. c, KCNH2 locus. d, HCN4 locus. Plots show AF-associated variants (r² > 0.8 in European samples in 1KG) and ChIP-seq track of the ERRg experiment (SRX4003759) and histone modification markers in iPSC-derived cardiac cells. iPSC, induced pluripotent stem cells.
PRS distribution and AF prevalence in the test cohort are shown (ncase = 2,949 and ncontrol = 21,081). a, Distribution of PRS in the case and control samples for each combination of GWAS. b, Prevalence of AF based on the AF-PRS deciles in each combination of GWAS. The number of individuals in each decile is 2,403. Data are presented as medians and 95% CI. PRS, polygenic risk score; CI, confidence interval.
Distribution of the pairwise difference of Nagelkerke’s pseudo R² (Δ pseudo R² = pseudo R² of score Y − pseudo R² of score X; X and Y are found at the top of the panel) between each pair of GWAS models. The distributions were obtained by bootstrapping 5.0 × 10⁴ times. Two-sided bootstrap P values were calculated by counting the number of replicates with Δ pseudo R² ≤ 0 and with Δ pseudo R² > 0 and then multiplying the lower count by the minimum estimated P value (2 × 1/(5.0 × 10⁴) = 4 × 10⁻⁵: two-sided). The significance was set at P = 2.3 × 10⁻³ (0.05/21).
a, Difference in the age of AF onset between individuals with high PRS (the top 1%, 5%, 10%, and 20%) and those with the remaining PRS. We constructed linear regression models by adjusting for sex and the top 20 PCs. P values were calculated by linear regression analysis comparing individuals with high PRS versus those with the remaining PRS and the significance was set at P = 8.3 × 10⁻³ (0.05/6). Each dot represents an estimated effect size (β) with an error bar indicating the 95% CI of the estimate obtained from the linear regression models. b, Comparison of the age of AF onset between samples with high PRS and those with remaining PRS. Box plot represents the median, the bounds represent the first and third quartile, and the whiskers reach to 1.5 times the interquartile range. In a and b, data are shown for individuals based on PRS (top 1%, n = 75 vs. remaining, n = 7,383; top 5%, n = 373 vs. remaining, n = 7,085; top 10%, n = 746 vs. remaining, n = 6,712; top 20%, n = 1,492 vs. remaining, n = 5,966).
Kaplan-Meier estimates of cumulative events from all-cause death (top), heart failure death (middle), and ischemic heart disease death (bottom) are shown with a band of 95% CI. Individuals are classified into high PRS (top 10 percentile, red), low PRS (bottom 10 percentile, blue), and intermediate (others, green).
a, Causal effect of AF on cardiovascular diseases. b, Causal effect of quantitative traits on AF development. Each dot represents a causal estimate on the OR scale with an error bar indicating the 95% CI of the estimate. We analyzed the MR results to estimate causal effects using associated variants without pleiotropic effects. The P values were determined using the IVW two-sample MR method and the significance was set at P = 2.5 × 10⁻³ (0.05/20) for a, and P = 1.5 × 10⁻³ (0.05/33) for b, respectively. The sample sizes for individual traits and the P values of MR analyses are shown in Supplementary Table 10. MR, Mendelian randomization; IVW, inverse-variance-weighting; DBP, diastolic blood pressure; SBP, systolic blood pressure; PP, pulse pressure; BMI, body mass index; Neu, neutrophil count; MCH, mean corpuscular hemoglobin; WBC, white blood cell count; PLT, platelet count; TP, total protein; MCHC, mean corpuscular hemoglobin concentration; eGFR, estimated glomerular filtration rate; MCV, mean corpuscular volume; UA, uric acid; Eos, eosinophil count; ALP, alkaline phosphatase; Baso, basophil count; GGT, gamma-glutamyl transferase; CRP, C-reactive protein; HDL, high-density lipoprotein cholesterol; Mono, monocyte count; BUN, blood urea nitrogen; TG, triglyceride; Cre, serum creatinine; RBC, red blood cell; Hb, hemoglobin; AST, aspartate aminotransferase; LDL, low-density lipoprotein cholesterol; Lym, lymphocyte count; Alb, albumin; HCT, hematocrit; ALT, alanine aminotransferase; Glu, blood sugar.
Extended Data Fig. 10 Analytical scheme for PRS derivation, validation, and test cohorts, and survival analysis.
Schematic representation for cross-validation, performance testing of AF-PRS in the independent test cohort, and survival analysis for AF-PRS.
Cite this article
Miyazawa, K., Ito, K., Ito, M. et al. Cross-ancestry genome-wide analysis of atrial fibrillation unveils disease biology and enables cardioembolic risk prediction. Nat Genet (2023). https://doi.org/10.1038/s41588-022-01284-9