Main

Atrial fibrillation (AF) is the most common cardiac arrhythmia, affecting approximately 46.3 million individuals worldwide1. The global prevalence of AF is increasing due to the rapid aging of the general population and intensified search for subclinical AF2. Despite progress in diagnostic and therapeutic technologies, a substantial number of patients with AF are admitted with life-threatening complications such as stroke and heart failure3, causing a considerable burden on patients and public healthcare systems4. Besides conventional clinical risk factors such as aging, obesity, hypertension and heart failure, the genetic contribution to the development of AF is also widely recognized. Recent genome-wide association studies (GWASs) have identified more than 100 AF-associated loci, some of which are involved in cardiac developmental, electrophysiological, contractile and structural pathways5,6,7,8. However, because the vast majority of AF-GWASs have been predominantly performed in European populations, the genetic pathophysiology of AF in non-European populations is not comprehensively understood, and it is difficult to apply polygenic risk scores (PRSs) derived from such GWASs to non-European populations.

Here we sought to explore the genetic architecture of AF in a non-European population and improve the statistical power of AF-GWASs by performing a large-scale Japanese GWAS, followed by a cross-ancestry meta-analysis. Further, we investigated the biological role of the identified AF-associated loci by leveraging gene expression and epigenomic datasets. Additionally, we developed a PRS derived from the cross-ancestry meta-analysis and assessed the impact of the PRS on relevant phenotypes and long-term mortality, which may provide evidence for the clinical utility of AF-PRS and lay the foundation for the realization of precision medicine in AF.

Results

Five new risk loci for AF identified in the Japanese GWAS

An overview of the study design is shown in Fig. 1. We performed a GWAS on the case–control dataset from BioBank Japan (BBJ) that comprised 9,826 AF cases and 140,446 controls, using 16,394,105 variants in the autosomes and 423,039 variants in the X chromosome with a minor allele frequency (MAF) > 0.1%. The GWAS identified 31 AF-associated loci with genome-wide significance, of which five were previously unreported (Table 1, Supplementary Table 1, Extended Data Fig. 1 and Supplementary Datasets 1 and 2). The proportion of the variation in AF (the single nucleotide polymorphism (SNP) heritability; h2) explained by the total genome-wide genetic variation detected in the current Japanese GWAS was estimated to be 6.1% (s.e.m. 1.4%), and the liability-scale h2 was estimated at 11.7% (s.e.m. 2.6%) using linkage disequilibrium (LD)-score regression.

Fig. 1: Overview of the study design.
figure 1

Flowchart of the study, which encompasses the Japanese GWAS with the BBJ first cohort (9,826 AF cases and 140,446 controls), a replication study using the BBJ second cohort (4,602 cases and 44,075 controls), a cross-ancestry meta-analysis with large-scale European GWASs (77,690 cases and 1,167,040 controls in total) and downstream analyses.

Table 1 New AF risk loci identified in the Japanese GWAS

To replicate the five newly identified loci, we performed genotyping and association analysis in an independent Japanese cohort including 4,602 cases and 44,075 controls. All of the lead variants were successfully replicated with nominal associations (P < 0.05) in the same effect direction (Supplementary Table 2). Among the lead variants in the five new loci, rs202030113 (MAF = 1.2%) and rs778479352 (MAF = 0.25%) were observed only in the East Asian population according to the Genome Aggregation Database v2.1.1 (gnomAD)9. rs202030113 is located in the intronic region 3 bp away from an exon-intron boundary of SYNE1 and is predicted as a splice donor loss with a spliceAI δ score of 0.33 (ref. 10). A strong signal (odds ratio (OR) for AF development = 2.00, 95% confidence interval (CI) = 1.73–2.31, P = 1.6 × 10−20) was displayed by rs778479352, the lead variant located in an intron of FGF13, which is involved in the region of ENCODE11 accession no. EH38E2771113 (https://screen-v2.wenglab.org/search/?q=EH38E2771113&assembly=GRCh38), where high H3K4me3 and H3K27ac signals were observed, suggesting that rs778479352 might function as a candidate cis-regulatory element.

To identify AF-associated variants independent of the lead variant at each locus, we performed a stepwise conditional analysis, in which 18 independent variants (locus-wide P < 5.0 × 10−6) were additionally detected, increasing the total number of AF-associated signals to 49 (Extended Data Fig. 2 and Supplementary Table 3). We identified ten loci that had multiple independent association signals, especially in the PITX2-C4orf32 locus with six association signals (nsignal = 2: GORAB-PRRX1, CAND2, HAND2-AS1, FANCC, NEBL, AKAP6, ZFHX3 loci, nsignal = 3: LINC02459-TBX5 locus, nsignal = 4: NEURL1 locus, and nsignal = 6: PITX2-C4orf32 locus). Of these additional signals, three variants were observed only in East Asian populations in gnomAD (rs577463446 at the FANCC locus, MAF = 0.5%, OR = 1.58; rs965277670 at NEBL locus, MAF = 0.6%, OR = 1.65; rs201901902 at NEURL1 locus MAF = 0.5%, OR = 1.68).

Cross-ancestry meta-analysis identified 33 new risk loci for AF

To improve the statistical power to detect further genetic associations with AF, we conducted a cross-ancestry meta-analysis by combining the current Japanese GWAS (BBJ) and two European GWASs: a large-scale meta-analysis of European populations (EUR)7 and biobank data of FinnGen data release 2 (FIN). Together, the three datasets yielded 77,690 cases (BBJ: 9,826, EUR: 60,620 and FIN: 7,244) and 1,167,040 controls (BBJ: 140,446, EUR: 970,216 and FIN: 56,378). We tested a total of 5,158,449 variants with MAF ≥ 1% and identified 150 AF-associated loci with genome-wide significance (log10 Bayes factor (BF) > 6; Fig. 2, Supplementary Table 4 and Supplementary Datasets 3 and 4). Of these loci, 33 have not been reported previously, including three new loci detected in the current Japanese GWAS. In total, we identified 35 new loci through the current Japanese GWAS and cross-ancestry meta-analysis (Table 2).

Fig. 2: Manhattan plot for the cross-ancestry meta-analysis.
figure 2

The results of the cross-ancestry meta-analysis (77,690 AF cases and 1,167,040 controls) are shown. The log10 BFs on the y axis are shown against the genomic positions (hg19) on the x axis. Association signals that reached a genome-wide significance level (log10 BF > 6) are shown in blue if previously reported loci and in red if new loci.

Table 2 New AF risk loci identified in the cross-ancestry meta-analysis

Of the 3,637 variants in LD (r2 > 0.8) with 150 lead variants, 19 missense variants were observed (Supplementary Table 5). Among new loci, we found a missense variant, rs848208 (p.Ala970Val), in the SPEN gene, encoding a hormone-inducible transcriptional coregulator that activates and represses downstream targets. It was reported that SPEN-deficient zebrafish embryos developed bradycardia, atrioventricular block and heart chamber fibrillation with downregulation of connexin 43 expression12, which is a well-known component of gap junctions and is associated with the cardiac conduction system13. Another missense variant in a new locus, rs3746471 (p.Arg1045Trp), is located in the KIAA1755 gene, which is reported to be associated with heart rate14 and heart rate variability15. Given the robust relationship between autonomic nervous dysfunction and AF16, KIAA1755 is a potential target gene for neural modulation contributing to AF management; however, the biological association between KIAA1755 and AF has not been fully examined.

Prioritization of associated genes and transcription factors

We performed a transcriptome-wide association study (TWAS) using the identified loci in the cross-ancestry meta-analysis and GTEx data17 to identify candidate genes associated with AF. Given the enrichment of AF-associated loci in heart tissue (Supplementary Note and Supplementary Fig. 1), we used gene expression data from GTEx in the atrial appendage and left ventricle as a reference. TWAS prioritized 132 and 127 candidate causal genes substantially associated with AF in the atrial appendage and left ventricle, respectively (Fig. 3a and Supplementary Table 6). Intriguingly, we found that IL6R is one of the candidate genes associated with AF in the atrial appendage (βIL6R = 0.221, P = 2.147 × 10−9). The prediction model of IL6R expression included rs10908837 (MAF = 42%, log10 BF = 7.237 in the cross-ancestry meta-analysis), which is located in an intron of IL6R. Furthermore, to assess the cis and trans effects of AF-associated variants on the candidate genes, we calculated the physical distances from the variants to the transcription start site (TSS) of the candidate genes. Only 34 and 35 genes overlapped between the nearest genes to the lead variants and the candidate genes identified by TWAS (Fig. 3b), and the median physical distances were 2.25 kb and 1.14 kb (Fig. 3c) in the atrial appendage and left ventricle, respectively. This relationship between AF-associated variants and candidate genes is comparable to a previous study in which the distances from the noncoding GWAS signals to the target genes were assessed based on the chromatin state and three-dimensional contacts18. Exceptionally, only one gene, FBN2, was more than 500 kb away from the variant in the left ventricle (512 kb). This result indicates that, although disease-associated genes are not necessarily closest to the lead variants, most candidate genes are influenced by cis effects of AF-associated variants. Finally, we performed Gene Ontology enrichment analysis using the candidate genes identified by TWAS and found several substantially enriched pathways, such as cardiac developmental, conduction and cardiomyocyte contractile or structure (Extended Data Fig. 3).

Fig. 3: Transcriptome-wide association analysis.
figure 3

a, Volcano plot showing individual genes with the effect size and the −log10 P value for TWAS based on the atrial appendage (left) and left ventricular tissues (right) from GTEx. Effect sizes and P values were computed using the MetaXcan software to assess the association between predicted gene expression based on the GTEx data and AF. Bonferroni correction was applied to account for the number of genes tested in each tissue (P < 0.05/10,414 for atrial appendage and <0.05/9,702 for left ventricle). Significant genes are highlighted with red (positive effect of predicted gene expression on AF) and blue (negative effect of predicted gene expression on AF). b, Number of genes located closest to the lead variants (red), identified by TWAS (blue), and overlapped between them (green) in atrial appendage and left ventricular tissues. c, Distribution of physical distances from the AF-associated variant included in the prediction model to the TSS of the canonical transcript for each candidate gene identified by TWAS in atrial appendage (left) and left ventricular tissues (right).

Next, we sought to identify transcription factors that bind to AF-associated loci and orchestrate the expression of causative genes involved in AF development. We performed enrichment analysis using the ChIP-Atlas dataset19, which comprises several high-throughput ChIP-seq experiments (15,109 experiments, 1,028 transcription factors). We found that estrogen-related receptor gamma (ERRg) binding was substantially enriched in AF-associated loci with Bonferroni-corrected significance level of P = 3.3 × 10−6 (0.05/15,109) (Fig. 4a and Supplementary Table 7). Indeed, ERRg ChIP-seq peaks overlapped with AF-associated loci around genes encoding cardiac ion channels (CAMK2D, KCNJ5, KCNH2 and HCN4), where active histone marks such as H3K27ac and H3K4me3 in induced pluripotent stem cell (iPSC)-derived cardiac cells were also observed (Extended Data Fig. 4). To demonstrate that ERRg is functionally involved in the pathogenesis of AF, we performed a functional analysis of ERRg using human induced pluripotent stem cell-derived cardiomyocytes (iPSCMs). We first evaluated changes in gene expression after administration of an inverse agonist of ERRg, GSK5182 (ref. 20); ion channels and sarcomere genes were selected from the downstream genes of ERRg based on the binding profiles of ChIP-seq data using Target Genes function in ChIP-Atlas. We found that gene expression was substantially decreased after ERRg administration (Fig. 4b). Furthermore, GSK5182-treated iPSCMs revealed a trend toward decreased spontaneous beating rate and notable irregularity and prolonged contraction duration (Fig. 4c,d). Similarly, the calcium transient duration was also found to be prolonged (Fig. 4e,f), and the increase in beating rate by isoproterenol was attenuated by GSK5182 administration (Fig. 4g). Such changes in beating rate and action potential duration have been reported in iPSCMs derived from patients with AF21,22. These results collectively suggest that ERRg is critically involved in the pathogenesis of AF through the regulation of expression of target genes, including ion channels, in cardiomyocytes.

Fig. 4: Functional analysis of ERRg using iPSCMs.
figure 4

a, Volcano plot analysis. Each point represents a ChIP-seq experiment and is highlighted as red for cardiovascular cells, green for induced pluripotent stem cell-derived cardiac cells, blue for muscle cells and gray for other cell types. P values were calculated with two-tailed Fisher’s exact probability test. The x axis shows log2-transformed fold enrichment of transcription factor in 150 AF-associated loci, compared to 150 randomly selected genomic regions. The y-axis shows log10-transformed P value for enrichment. The dashed line indicates the significance threshold level of P = 0.05, and the dotted line indicates P = 0.05/15,109. b, Comparison of gene expression changes in iPSCMs with and without GSK5192 (an inverse agonist of ERRg) administration (n = 25 and n = 23, respectively). Ion channels and sarcomere genes were selected from the downstream genes of ERRg based on the binding profiles of ChIP-seq data using target genes function in ChIP-Atlas. c,d, Motion analysis of iPSCMs using the SI8000 Cell Motion Imaging System. The y axis represents the magnitude of cellular motion over time. GSK5182-treated iPSCMs show a decrease in spontaneous beating rate and irregularity (c). Beating rate and contraction duration analyzed using the SI8000 Cell Motion Imaging System (n = 10) (d). eg, Calcium handling analysis of iPSCMs. Calcium transient signal was recorded before and after isoproterenol administration (e). We measured averaged peak counts (beating rates) and peak with duration at 80% repolarization (PWD80) (n = 6) (f). The increases in beating rate were compared before and after isoproterenol administration (n = 6) (g). In b, d, f and g, data are presented as mean ± s.e.m., and P values from a two-sided Student’s t test are shown.

Performance of PRS derived from cross-ancestry meta-GWAS

PRS offers potential for risk stratification of complex traits and diseases based on genetic data. However, the transferability of PRS from diverse populations to a population of another ancestry remains challenging. Therefore, we examined the performance of a PRS derived from various combinations of summary statistics in the Japanese population. We split our case–control samples into derivation, validation and test datasets, and constructed 376 combinations of the summary statistics of three GWASs (BBJ, EUR and FIN) with parameters for PRS derivation. Based on the PRS performance in the validation cohort, we determined the parameters that showed the best performance for each combination of summary statistics (BBJ, FIN, EUR, BBJ + FIN, BBJ + EUR, EUR + FIN and BBJ + EUR + FIN) (Supplementary Table 8) and assessed the performance of the best model in the test cohort (Fig. 5, Extended Data Figs. 5 and 6 and Supplementary Table 9). For the PRS derived from a single population GWAS, as concordant with the population specificity, the PRS derived from BBJ showed higher performance trend than those from EUR (pseudo R2 = 0.122 in EUR versus 0.124 in BBJ, P = 0.681) and FIN (pseudo R2 = 0.102 in FIN, P < 4 × 10−4) despite the smaller sample size, although there was no statistically significant difference in the PRS performance between BBJ and EUR. Among the PRS derived from the meta-GWAS, we found significant superiority of the PRS derived from BBJ + EUR compared to those from FIN + EUR (pseudo R2 = 0.144 in BBJ + EUR versus 0.131 in FIN + EUR, P < 4 × 10−4) even though the number of cases was similar. Among all models, the PRS derived from three studies with multi-ancestry and the largest sample size (BBJ + EUR + FIN) showed the highest performance (pseudo R2 = 0.146, 95% CI = 0.115–0.170, area under the curve of receiver operating characteristic = 0.738, 95% CI = 0.726–0.745).

Fig. 5: PRS performance (Nagelkerke’s pseudo R2) in the Japanese test cohort (2,953 cases and 21,194 controls).
figure 5

The results of three PRS models derived from a single GWAS are shown in the left panel (ntotal = 63,622 for FIN; ntotal = 1,030,836 for EUR; ntotal = 56,341 for BBJ). The results of four PRS models derived from a meta-GWAS are shown in the right panel (ntotal = 1,094,458 for EUR + FIN; ntotal = 119,963 for BBJ + FIN; ntotal = 1,087,177 for BBJ + EUR; ntotal = 1,150,799 for BBJ + EUR + FIN). The distribution of pseudo R2 was estimated from 5 × 104 times bootstrapping. The box plot center line represents the median, the bounds represent the first and third quartile, and the whiskers reach to 1.5 times the interquartile range.

Impact of AF-PRS on relevant phenotypes and outcomes

To assess the potential of the PRS for clinical applications, we investigated the association between PRS and the onset age of AF in individuals from our BBJ case samples (n = 7,458). We observed that the onset age decreased as the PRS increased, and individuals with the top 1% PRS were estimated to be approximately 4 years younger at AF onset compared to the remaining individuals (Fig. 6a and Extended Data Fig. 7a,b). Moreover, we examined whether AF-PRS could explain the phenotypic variability of stroke in individuals without a diagnosis of AF. We performed logistic regression analysis in 121,351 control samples in our dataset, and found significant associations of the PRS with increased risks of cerebral infarction (OR (95% CI) = 1.042 (1.018–1.065), P = 4.0 × 10−4) and cardioembolic stroke (OR (95% CI) = 1.355 (1.126–1.630), P = 1.3 × 10−3) after Bonferroni correction (Fig. 6b). Notably, we observed the largest impact of the PRS on cardioembolic stroke among those with other stroke phenotypes, indicating that AF-PRS may reveal clinically undetectable AF (that is, subclinical AF) or AF-related conditions such as prothrombotic or hypercoagulable state, in individuals without AF.

Fig. 6: Impact of AF-PRS on relevant phenotypes and long-term mortality.
figure 6

a, Association between AF-PRS and onset age of AF. The onset age of AF in individuals with data available (n = 7,458) is shown based on the AF-PRS quintiles. The number of individuals in each quintile is 1,491 to 1,492. The center line of the box plot represents the median, the bounds represent the first and third quartile, and the whiskers reach to 1.5 times the interquartile range. b, Association between AF-PRS and stroke phenotypes in individuals without AF. P values were calculated for PRS by logistic regression analysis, and the significance was set at P = 8.3 × 10−3 (0.05/6). Each dot represents an estimate on the OR scale with an error bar indicating the 95% CI for 1 s.d. increase in AF-PRS. Significant and nonsignificant associations are shown in orange and gray, respectively. c, Kaplan–Meier estimates of cumulative events from cardiovascular mortality (upper) and stroke death (lower) are shown with a band of 95% CI. Individuals are classified into high PRS (top 10th percentile, red), low PRS (bottom 10th percentile, blue) and intermediate (others, green). d, Effects of AF-PRS on long-term mortality. P values were calculated for PRS by Cox proportional hazard analysis and the significance was set at P = 8.3 × 10−3 (0.05/6). Each dot represents an estimated HR with an error bar indicating the 95% CI for a 1 s.d. increase in AF-PRS. Significant and nonsignificant associations are shown in orange and gray, respectively. CV, cardiovascular; HF, congestive heart failure; IHD, ischemic heart disease.

To further explore the clinical utility of AF-PRS, we assessed the impact of PRS on mortality using long-term follow-up data in BBJ. The Kaplan–Meier estimates of cumulative mortality rate were increased in individuals with a high PRS, especially in cardiovascular- and stroke-related mortality (Fig. 6c,d and Extended Data Fig. 8). Moreover, we performed Cox regression analysis, and as shown in Fig. 6d, no significant association between AF-PRS and all-cause death was found, but a trend was observed (hazard ratio (HR) per 1 s.d. of PRS = 1.02, 95% CI = 0.99–1.04, P = 0.13). The secondary outcome indicates that this trend was highly specific to cardiovascular death, which was substantially associated with AF-PRS (HR (95% CI) = 1.06 (1.02–1.11), P = 4.4 × 10−3 for cardiovascular disease; HR (95% CI) = 1.00 (0.98–1.01), P = 0.50 for noncardiovascular disease). Furthermore, the tertiary outcome suggests stroke death as a leading factor that impacts the association between AF-PRS and cardiovascular deaths (HR (95% CI) = 1.13 (1.04–1.22), P = 2.7 × 10−3). In contrast to evidence from clinical studies, the association between AF-PRS and heart failure death did not reach statistical significance in the present study (HR (95% CI) = 1.05 (0.95–1.16)) P = 0.37). Among 132,737 individuals for whom mortality data were available, the number of events for heart failure death was 1,093 (0.82%), which was approximately half of stroke events (n = 2,012) and even less than 20% of cardiovascular events (n = 6,877). Thus, it was assumed that the standard deviation for heart failure death was larger due to the smaller number of events in our cohort, which resulted in a relatively wide CI that might make it difficult to reach statistical significance.

Cross-trait genetic liability of AF

AF is frequently concomitant with various cardiovascular diseases, such as valvular heart disease, heart failure and stroke. These cardiovascular diseases, including AF, partially share the underlying pathophysiology and are mutually associated with the development of each other, whereas the causality between AF and cardiovascular diseases is not comprehensively elucidated. Therefore, we estimated the causal effect of AF on a wide range of cardiovascular diseases using two-sample Mendelian randomization (MR), where the exposure was AF and all the distinct AF-associated variants from the cross-ancestry meta-analysis were used as instrumental variables. Consistent with the clinical evidence, we observed significant genetic liability of AF to the development of several cardiovascular diseases such as heart failure, cardiomyopathy, stroke and transient ischemic attack (Extended Data Fig. 9a and Supplementary Table 10). Additionally, we found the causal effect of AF on valvular disease (OR (95% CI) = 1.139 (1.133–1.630), P = 9.4 × 10−4 for rheumatic valvular disease; OR (95% CI) = 1.183 (1.112–1.258), P = 1.1 × 10−7 for valvular heart disease), indicating that hemodynamic instability and structural remodeling underlying AF may contribute to the development of valvular diseases.

AF is also known as a consequent phenotype accumulated by multiple atherosclerotic- and metabolic-related traits. Large observational studies have identified these traits as significant risk factors associated with AF23,24, but the causal relationship between them has not been fully assessed due to potential mediators or confounders of these associations. Therefore, we performed an MR analysis to thoroughly investigate the causality of quantitative traits. We represented the exposures as quantitative traits and selected the distinct variants associated with each trait as instrumental variables. As expected25, height and BMI were significant predictors for AF (OR (95% CI) = 1.398 (1.164–1.679), P = 3.3 × 10−4; OR (95% CI) = 1.133 (1.061–1.209), P = 1.8 × 10−4, respectively). Furthermore, among atherosclerotic- and metabolic-related traits, we found blood pressure as the only trait with a causal effect on AF development (OR (95% CI) = 1.400 (1.285–1.525), P = 1.2 × 10−14 for systolic blood pressure; OR (95% CI) = 1.455 (1.330–1.591), P = 2.1 × 10−16 for diastolic blood pressure; OR (95% CI) = 1.267 (1.161–1.381), P = 9.2 × 10−8 for pulse pressure; Extended Data Fig. 9b and Supplementary Table 10).

Discussion

We conducted a large-scale GWAS with approximately 10,000 AF cases in the Japanese population and identified 31 genome-wide significant loci associated with AF. This includes five new loci, where disease-relevant rare and highly East Asian-specific variants were found in the SYNE1 and FGF13 loci, suggesting the involvement of functional alteration in the nuclear envelope and ion channels as a mechanism underlying AF. SYNE1 encodes nesprin-1 (spectrin repeat) protein and, together with the Sad1p/UNC84-domain-containing proteins (SUN1/2), compose the nuclear envelope protein complex via its nucleoplasmic domains to lamin A/C. Mutations in LMNA and SYNE1 have been identified in patients with severe muscle dystrophy and dilated cardiomyopathy26,27. Mutations in SYNE1 cause defects in nuclear morphology, myoblast differentiation and heart development28, altering the nuclear envelope protein complex that contributes to the structural substrate in atrial arrhythmogenesis. FGF13 encodes a member of the fibroblast growth factor family, which possesses broad mitogenic and cell survival activities. FGF13 directly binds to the C-terminus of the main cardiac sodium channel (NaV1.5) in the sarcolemma, and FGF13 knockdown in rat cardiomyocytes exhibited a loss of function of NaV1.5-reduced Na+ current density, decreased Na+ channel availability and slowed NaV1.5-reduced Na+ current recovery from inactivation29. This evidence of conduction disturbance in cardiomyocytes indicates that FGF13 is an important target gene associated with AF.

Furthermore, we performed the largest cross-ancestry meta-analysis for AF to date, where 150 genome-wide significant loci were identified, resulting in the discovery of 35 new loci. By integrating these loci with transcriptomic and epigenomic data, we prioritize candidate genes and transcription factors associated with AF. Transcriptome-wide analysis linked AF-associated loci to target genes and particularly revealed IL6R as a candidate gene associated with AF. Despite increasing evidence for the role of inflammation in AF pathophysiology30, only suggestive association between IL6R and AF (P = 5.0 × 10−4) has so far been reported31, and the genetic contribution of inflammatory process to AF development has not been fully elucidated. Our transcriptome-wide analysis revealed a significant association between IL6R and AF development, shedding light on the inflammatory signaling as a key pathway in the pathogenesis of AF and a therapeutic target. Additionally, our approach based on the ChIP-seq dataset clearly implicated ERRg as a candidate transcription factor associated with AF. In previous work, ERRg knockdown mice exhibited cardiomyopathy with an arrest of cardiac maturation through transcriptional regulation of genes involved in mitochondrial energy transduction, contractile function and ion transport32, but the association between AF and ERRg had not been fully examined. Our results from functional studies using iPSCMs indicated a new transcriptional network orchestrated by ERRg in the pathophysiology of AF.

During the last decade, there has been a growing interest in predicting complex diseases or traits using genetic data. PRS is expected to provide a clinical utility to enhance disease risk prediction, whereas previous studies demonstrated comparable or less performance and a weak additive effect of PRS to the established risk prediction models33,34. Additionally, the lack of cross-ancestry portability of PRS has also been reported due to the predominant proportion of individuals of European descent in the current GWASs35. In this study, we found shared allelic effects of AF-associated variants and genetic correlations between Japanese and European populations (Supplementary Note and Supplementary Fig. 2). Therefore, we exhaustively examined AF-PRS using various combinations of GWASs and multiple parameters to maximize the predictive performance; AF-PRS achieved (1) a higher performance when applied to the same population as the derivation-GWAS population regardless of the sample size in the single derivation-GWAS category, (2) a higher performance when it was derived from a cross-ancestry meta-GWAS including the Japanese population compared to that derived from a meta-GWAS in a non-Japanese population even with a similar or smaller sample size of derivation-GWAS and (3) the best performance when it was derived from the cross-ancestry meta-GWAS including the Japanese population and with the largest sample size. Furthermore, recent studies have shown the potential utility of PRS in a variety of clinical settings, such as diagnostic refinement36 and prediction of disease progression37. Our study also demonstrated that, in addition to the predictive ability for AF itself, AF-PRS segregated individuals with AF-related phenotypes, such as early onset of AF and cardioembolic stroke, and those with increased risks of long-term cardiovascular and stroke mortalities. This indicated that the cumulative genetic risk for AF could be an indicator for early therapeutic intervention, including anticoagulation in at-risk individuals as a primary prevention of stroke. Taken together, our results have several implications for the clinical utility of AF-PRS, which will be clues for the realization of future precision medicine.

Finally, MR analysis revealed evidence of a causal relationship between AF and relevant diseases or traits, which supports the results from clinical observational studies. In particular, blood pressure was the only trait that showed significant causality among atherosclerotic- and metabolic-related traits, which indicates that blood pressure is an important modifiable risk factor, and the intensive management of blood pressure may reduce the risk of AF development.

In conclusion, our large-scale Japanese and cross-ancestry genetic analyses identified 35 new risk loci and provided insights into the distinct and shared genetic architecture of AF between Japanese and Europeans. Integrative analysis of transcriptome and epigenome data highlighted candidate genes and implicated a transcription factor involved in the mechanism of disease development. Furthermore, analyses of disease prediction and long-term survival demonstrated the clinical utility of the AF-PRS. These data highlight the importance of AF genetics in clinical settings and provide useful evidence for the implementation of genomic medicine.

Methods

Study populations

This study was approved by ethics committees of the RIKEN Center for Integrative Medical Sciences, the Institute of Medical Sciences and the University of Tokyo. Informed consent was obtained from all participants. All study participants were Japanese who were registered in the BBJ project (https://biobankjp.org/). The BBJ is a hospital-based national biobank project that collects DNA and serum samples and clinical information from cooperative medical institutes. Approximately 200,000 patients with any of the 47 target diseases were enrolled between 2003 and 2007. All study participants were at least 18 years old.

For GWAS quality control (QC), we excluded samples with a call rate <0.98 and related individuals with PI_HAT > 0.2 by PLINK 2.0 (20 Aug 2018 version). We then excluded samples with a heterozygosity rate > +4 s.d. To identify population stratification, we performed principal component analysis (PCA) using PLINK 2.0 and excluded outliers from the Japanese cluster. For the case samples in GWAS, we selected individuals with AF or atrial flutter diagnosed by a physician based on the general medical practices or documented on a 12-lead electrocardiogram. The demographic features of the case–control cohort are shown in Supplementary Table 11.

The samples in the replication study were registered in the BBJ second cohort, which comprised DNA samples and clinical information of approximately 80,000 new patients with the 38 target diseases collected between 2013 and 2018 to expand research outcomes from the first cohort. We applied the same inclusion criteria to the clinical information of the participants and excluded related individuals estimated by PI_HAT and PCA outliers from the East Asian population. Finally, 48,677 individuals (4,602 cases and 44,075 controls) were included in the replication study.

Genotyping, imputation and quality control

GWAS participants were genotyped using the Illumina HumanOmniExpress Genotyping BeadChip or a combination of Illumina HumanOmniExpress and HumanExome BeadChips. For genotype QC, we excluded variants with (1) SNP call rate <0.99, (2) MAF < 0.01 and (3) Hardy–Weinberg equilibrium P < 1.0 × 10−6. We prephased the genotypes using EAGLE and imputed dosages with the 1,000 Genome Project Phase 3 (1 KG Phase 3; May 2012)38 reference panel with 1,037 Japanese in-house reference panel from BBJ using minimac3. For the X chromosome, prephasing was performed in both males and females, and imputation was performed separately for males and females. Dosages of variants in X chromosomes for males were assigned between zero and two.

In the replication study, all participants were genotyped using Illumina Asian Screening Array. We excluded variants meeting any of the following criteria: (1) SNP call rate <98%, (2) a minor allele count of <5 and (3) Hardy–Weinberg equilibrium P < 1.0 × 10−6. Post-QC genotype data were prephased using SHAPEIT2 and imputed using minimac4 with the 1 KG Phase 3 reference panel and 3,256 Japanese in-house reference panel from BBJ. Prephasing and imputation of the X chromosome were performed using the same pipeline applied for autosomes.

Genome-wide association study

In the Japanese GWAS, association was performed by logistic regression analysis assuming an additive model with adjustment for age, age2, sex and top 20 principal components (PCs) using PLINK 2.0. We selected variants with minimac3 imputation quality score of >0.3 and MAF ≥ 0.001. For the X chromosome, we conducted association analyses in males and females separately and integrated the results using an inverse-variance weighted fixed-effects model implemented in METASOFT (v2.0.1). Heterogeneity between studies was calculated using Cochran’s Q test. We filtered variants with strong heterogeneity (Phet < 1.0 × 10−4). The genome-wide significance threshold was defined at P < 5.0 × 10−8 for variants with MAF ≥ 1% and P < 5.71 × 10−9 for those with MAF < 1% (0.05/8,753,038 variants). Although the genomic inflation factor (𝜆GC) was 1.12, LD score regression indicated that the inflation was primarily due to polygenic effects (LD score regression intercept = 1.02; Supplementary Fig. 3a). Adjacent genome-wide significant SNPs were grouped into one locus if they were within 1 Mb of each other. We defined a locus as follows: (1) extracted genome-wide significant variants (P < 5 × 10−8) from the association result, (2) added 500 Mb to both sides of these variants and (3) merged overlapping regions. If the locus did not contain coordinates with previously reported genome-wide significant variants (that is, all variants with P < 5 × 10−8 in the previously reported locus), the region was annotated as being new. We mapped variants to nearby genes and functionally annotated genes using Open Targets (https://www.opentargets.org/), in which the pair of variant and gene with the highest variant-to-gene score was selected.

To identify independent association signals in the loci, we conducted a stepwise conditional analysis for genome-wide significant loci defined as described in the GWAS. First, we performed logistic regression conditioning on the lead variants of each locus. We set a locus-wide significance at P < 1.0 × 10−5 and repeated this procedure until none of the variants reached locus-wide significance for each locus.

LD score regression

We performed LD score regression (version 1.0.0) using selected SNPs with MAF ≥ 0.01 and without the major histocompatibility complex region. For the regression, we used the East Asian LD scores provided by the authors (https://github.com/bulik/ldsc/).

Cross-ancestry meta-analysis

Summary results from two European AF GWASs (EUR and FIN) were obtained from a previously published website (http://csg.sph.umich.edu/willer/public/afib2018)7 and from the FinnGen research project website (https://www.finngen.fi/en), respectively. We calculated the LD score regression intercept for each study and confirmed that these two studies were well calibrated (LD score regression intercept for EUR = 1.052 (s.e.m. = 0.012) and FIN = 1.033 (s.e.m. = 0.010); Supplementary Fig. 3b,c). We also calculated the genetic correlation and found a significant genetic correlation between EUR and FIN (rg = 0.918, s.e.m. = 0.035, P = 3.9 × 10−155).

To account for ancestral heterogeneity among the three studies, we applied the MANTRA algorithm in the cross-ancestry meta-analysis39, which allows for heterogeneity between diverse ancestry groups and improves performance compared to fixed-effects meta-analysis and random-effects meta-analysis. Variants with MAF ≥ 1% in both the Japanese and European populations were selected for association. We considered SNPs with log10 BF > 6 to be genome-wide significant.

Transcriptome-wide association study

We performed a TWAS using MetaXcan v0.3.512 (ref. 40), which estimates the association between predicted gene expression levels and a phenotype of interest using summary statistics and gene expression prediction models. We used precomputed prediction models of gene expression in atrial appendage and left ventricular tissues with LD reference data in GTEx v8 and the summary statistic of the cross-ancestry AF-GWAS as input. Bonferroni significance level was set at P = 4.8 × 10−6 (= 0.05/10,414) for the atrial appendage and P = 5.2 × 10−6 (= 0.05/9,702) for the left ventricle to account for the number of genes tested in each tissue. To assess the relationship between AF-associated variants and the candidate genes, we first extracted an AF-associated variant with the lowest association P value among those included in the prediction model obtained from GTEx PredictDB41 for each candidate gene and then calculated the physical distances from the variant to the TSS of the canonical transcript for each candidate gene. Furthermore, we performed Gene Ontology enrichment analysis using FUMA web application v1.3.7 (ref. 42) with false discovery rate correction considering the number of gene sets tested per category.

Enrichment analysis of transcription factors

To assess the enrichment of transcription factors in AF-associated loci, we defined AF-associated loci as regions within 500 Mb upstream and 500 Mb downstream of the AF-associated lead variants or proxies with r2 > 0.8 in European samples of 1 KG. We then searched for overlaps of peak-call data archived in the ChIP-Atlas dataset with AF-associated loci and control regions selected from all genomic regions by permutation test. P values were calculated with the two-tailed Fisher’s exact probability test (the null hypothesis is that the two regions overlap with the ChIP-Atlas peak-call data in the same proportion). The epigenetic landscapes around cardiac ion channel-related genes were visualized using ChIP-Atlas peak browser and integrative genomics viewer43.

Functional analysis of ERRg using iPSCMs

For functional assessment of ERRg, we first prepared iPSCMs, which were established at the University of Tokyo (IRB 11044), and cultured and maintained them in Essential 8-flex medium (ThermoFisher, A2858501). iPSCMs were used between around 20 and 50 passages. The iPSCMs were differentiated into cardiomyocytes44 with minor modifications. Briefly, all iPSCM lines were differentiated with Asclestem Cardiac Differentiation Media (Nacalai Tesque, 13166-05) until day 12 and maintained with glucose-deficient DMEM (ThermoFisher, A1443001) with sodium DL-lactate (Wako Fujifilm, 128-00056) supplementation for 4 d. The purified cardiomyocytes were replated on gelatin-coated plates in DMEM media supplemented with 10% FBS (Nacalai Tesque, 08458-45). Before downstream assays, the iPSCMs were passaged onto gelatin-coated plates around day 28. An inverse agonist of ERRg, GSK5182 (Selleck, S3449), was dissolved in DMSO and administered to the iPSCMs at 10 µM for 4 d before gene expression and functional analysis. To measure gene expression, total RNA was extracted using TRIzol reagents (ThermoFisher, 15596026) according to the manufacturer’s instructions. RNA samples were reverse-transcribed using QuantiTect Reverse Transcription Kit (QIAGEN, 205313). Quantitative real-time PCR was performed using THUNDERBIRD Probe quantitative real-time PCR Mix (Toyobo, QPS-101). Relative expression levels of target genes were normalized to the expression of an internal control gene (RPS28) using the comparative Ct method. The primers used for quantitative real-time PCR are listed in Supplementary Table 12. For motion analysis of iPSCMs, the contractile characteristics of iPSCMs were analyzed using SI8000 Cell Motion Imaging System (SONY)44,45. The video of synchronously beating iPSCMs was captured, and the motion of each detection point was converted into a vector for quantitative analysis. Cellular motion was analyzed based on the sum of the vector magnitudes. Video images were taken 4 d after drug administration (GSK5182 versus control), and the spontaneous beating rate and the duration of contraction were calculated. To examine calcium handling, we performed a calcium transient assay, in which iPSCMs were plated on a gelatin-coated 96-well plate in DMEM containing 10% FBS. After drug administration, the cells formed a homogenously beating monolayer sheet and were incubated with Cal520AM (AAT, 21130) diluted in FluoroBrite medium (ThermoFisher, A1896701) containing 10% FBS for 1 h at 37 °C and 5% CO2. After staining, the medium was replaced with 90 μl of FluoroBrite medium containing 2% FBS. Calcium transient signals were recorded by FDSS/μCell (Hamamatsu Photonics K.K.)46. Light source (L11601-01) was used with an output excitation wavelength of 480 nm and an emission of 540 nm, at a sampling rate of 16 Hz for 30 s. Then, 100 nM isoproterenol (Wako Fujifilm, 553-69841) was added to the medium and the calcium transient was recorded again 30 min later. We measured averaged peak counts (beating rates (per min)) and peak width durations at 80% repolarization (PWD80 (ms)). All data analysis was performed using GraphPad Prism 7.04 (GraphPad Software).

PRS derivation and performance

First, we divided our dataset into the following three groups: (1) a discovery group to derive and validate PRS (6,890 cases and 49,451 controls), (2) a test group to assess PRS performance (2,953 cases and 21,194 controls) and (3) a group for the survival analysis (70,645 controls) (Extended Data Fig. 10). To secure independence between the PRS derivation and validation, we used a tenfold cross-validation approach. Next, we randomly split a discovery group into ten subgroups and used nine of these subgroups for PRS derivation and the remaining one for PRS validation. For each derivation cohort, we performed GWASs in combination with one Japanese and two European GWASs—(1) a population-specific GWAS (BBJ, EUR and FIN), (2) European meta-GWAS (EUR + FIN) and (3) the cross-ancestry meta-GWAS (BBJ + EUR, BBJ + FIN and BBJ + EUR + FIN). Meta-analyses were performed using the fixed effect and the random effect models by METASOFT software. We derived PRS using the pruning and thresholding method and the LDpred2 algorithm. For the pruning and thresholding method, in addition to the meta-analysis models, we applied the P value thresholds as 0.5, 5.0 × 10−2, 5.0 × 10−4, 5.0 × 10−6 and 5.0 × 10−8, and the r2 thresholds as 0.8, 0.5 and 0.2. For LDpred2, the variants were restricted to HapMap3 SNPs as recommended47, and we ran the LDpred2-grid model with the parameters: p (proportion of causal variants) in a sequence of five values from 10−4 to 1 on a log-scale and sparse option (true or false). We did not tune the parameter for the SNP heritability h2 because the different samples in each derivation cohort did not enable us to determine the optimal h2. For the LD reference, we used 1KG East Asian (EAS) and 1KG European (EUR) populations according to each cohort population as follows: (1) 1KG EAS for a cohort with only East Asian (BBJ), (2) 1 KG EUR for cohorts with only European (EUR, FIN and EUR + FIN) and (3) both 1KG EAS and 1KG EUR for cohorts with multiple ancestries (BBJ + EUR, BBJ + FIN and BBJ + EUR + FIN). Subsequently, we calculated PRSs in the withheld validation cohorts and repeated this procedure ten times by changing the withheld validation cohorts. Finally, we constructed 376 PRSs in total; the PRS with the best performance for each cohort is shown in Fig. 5. The performance of the PRS was measured as (1) Nagelkerke’s pseudo R2 obtained by modeling age, sex, the top 20 PCs and normalized PRS and (2) the area under the curve of the receiver operator curve in the same model as Nagelkerke’s pseudo R2. The best model/parameter set for each combination model (BBJ, EUR, FIN, BBJ + EUR, BBJ + FIN, EUR + FIN and BBJ + EUR + FIN) was determined by averaging Nagelkerke’s pseudo R2 (Supplementary Table 9). Using the best models and parameters determined in the derivation and validation cohorts, we calculated the PRSs and assessed their performance for the independent test cohort. To evaluate the PRS performance in the test cohort, we performed bootstrap over the samples in the test cohort with 5.0 × 104 replicates and assessed Nagelkerke’s pseudo R2 and the area under the curve of the receiver operator curve in each bootstrap group. Before the comparison of the combination models, we evaluated the performance of the base model, which included age, sex and the top 20 PCs (Supplementary Table 9). Next, to compare the performance of the PRS derived from each combination model, we calculated the pairwise difference of Nagelkerke’s pseudo R2R2) between each pair of models (two of seven models; 21 combinations) and obtained the two-sided bootstrap P value by counting the number of ΔR2 ≤ 0 or ΔR2 > 0 and then multiplied the lower value by the minimum estimated P value (2 × 1/(5.0 × 104) = 4 × 10−5: two-sided). The significance was set at P = 2.3 × 10−3 (0.05/21).

Association of AF-PRS with relevant phenotypes

We extracted AF case samples with available data on age at AF onset (n = 7,458, the median age of AF onset was 63 years of age (IQR = 56–71)) and constructed a linear regression model of age at AF onset including AF-PRS as a dichotomous variable (individual with high PRS (the top 1%, 5%, 10% and 20%) versus those with the remaining PRS) to estimate the difference in the age of AF onset between them adjusted by sex and the top 20 PCs. For the association analysis with stroke phenotypes, we performed a logistic regression analysis adjusted by the use of anticoagulants or antiplatelets in addition to age, sex and the top 20 PCs, because antithrombotic therapy is associated with a decreased risk of ischemic stroke and an increased risk of hemorrhagic stroke. We selected the control samples with available data on antithrombotic therapy (n = 121,351), among whom we found 14,120 stroke phenotypes: 8,547 cerebral infarction, 111 cardioembolic strokes, 1,429 atherothrombotic infarction, 1,230 lacunar infarction, 1,061 cerebral hemorrhages and 879 subarachnoid hemorrhages.

Survival analysis

The Cox proportional hazards model was used to assess the association between AF-PRS and long-term mortality. We obtained survival follow-up data with the ICD-10 code for 132,737 individuals from the BBJ dataset48,49. The causes of death were classified into three categories according to ICD-10 codes as follows: (1) primary outcome for all-cause death, (2) secondary outcome for cardiovascular death (100–199) and noncardiovascular death (not 100−199) and (3) tertiary outcome for heart failure death (150), ischemic heart disease death (120–125) and stroke death (160, 161, 163 and 164). The median follow-up period was 8.4 years (IQR 6.8–9.9). The Cox proportional hazards model was adjusted for sex, age, the top 20 PCs and disease status. Analyses were performed with the R package survival v.2.44, and survival curves were estimated using the R package survminer v.0.4.6, with modifications.

Mendelian randomization

We extracted summary statistics from the UK Biobank (http://www.nealelab.is/uk-biobank/). To avoid sample overlap, we selected AF-associated variants from the cross-ancestry meta-analysis in combination with BBJ and FIN, although the statistical power to detect the associations with AF decreased. To select independent variants for exposure, genome-wide significant variants (P < 5 × 10−8) were pruned (r2 < 0.01; LD window of 10,000 kb; using European samples of 1KG for LD reference)50. For the assessment of the causal effect of AF on cardiovascular diseases, we excluded variants associated with cardiovascular risk factors such as hypertension, cholesterol, diabetes mellitus and smoking and those with cardiovascular diseases from the list of instrument variables to avoid the pleiotropic effects of them using PhenoScanner V2 (http://www.phenoscanner.medschl.cam.ac.uk/). Then, we performed MR analysis using TwoSampleMR package in R v4.0.3, with AF-associated variants as instrument variables and variants associated with cardiovascular diseases as outcome variables. Next, we assessed the causal effects of quantitative traits related to anthropometry, metabolites, serum protein, kidney function, liver function, hematocyte count and blood pressure on AF. To exclude variants with pleiotropic effects, we also used PhenoScanner to identify variants associated with risk factors for AF such as hypertension, diabetes mellitus and obesity from the list of instrument variables, unless exposure was a risk factor itself. Then, we performed MR analysis, where variants associated with quantitative traits were used as instrument variables and AF-associated variants as outcome variables. Causal estimates were based on the inverse-variance-weighting (IVW) method. To exclude horizontal pleiotropic outliers, we performed MR-PRESSO for instrument variables51. We also calculated Cochran’s Q statistics for heterogeneity between the causal effects using IVW and the MR-Egger intercept for directional pleiotropy.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.