Deep learning of left atrial structure and function provides link to atrial fibrillation risk

Increased left atrial volume and decreased left atrial function have long been associated with atrial fibrillation. The availability of large-scale cardiac magnetic resonance imaging data paired with genetic data provides a unique opportunity to assess the genetic contributions to left atrial structure and function, and understand their relationship with risk for atrial fibrillation. Here, we use deep learning and surface reconstruction models to measure left atrial minimum volume, maximum volume, stroke volume, and emptying fraction in 40,558 UK Biobank participants. In a genome-wide association study of 35,049 participants without pre-existing cardiovascular disease, we identify 20 common genetic loci associated with left atrial structure and function. We find that polygenic contributions to increased left atrial volume are associated with atrial fibrillation and its downstream consequences, including stroke. Through Mendelian randomization, we find evidence supporting a causal role for left atrial enlargement and dysfunction on atrial fibrillation risk.


Results
First, we were able to replicate previous observations demonstrating associations between greater LA volume and cardiovascular diseases [7][8][9][10]19,20. Participants with a history of AF had larger LA volumes, and participants with larger LA volumes were more likely to be subsequently diagnosed with AF, stroke, or heart failure.

Second, these measurements enabled the largest genetic analysis of LA measurements to date. To our knowledge, one locus (near NPR3) has previously been associated at genome-wide significance with LA measurements 25. In this work, 20 distinct genetic loci were associated with LAmax, LAmin, LAEF, LASV, or the BSA-indexed versions of these phenotypes. Forty percent of these loci (8 of 20) were previously associated with AF 32, significantly more than expected by chance. At all 8 loci, the allele associated with increased AF risk was directionally associated with a lower LAEF and generally with greater LA volumes. The uniformly opposed effect directions of these SNPs for AF risk and LAEF may be consistent with the concept of atrial cardiomyopathy 22.

As an example of the pattern of opposed SNP effects on LAEF and AF risk, we identified a missense variant within CASQ2 (rs4074536; p.Thr66Ala) as a lead SNP for LAEF on chromosome 1. The T allele of this SNP (encoding Thr66) corresponds with a reduced LAEF in our GWAS and with reduced expression of CASQ2 in the right atrial appendage and left ventricle in GTEx 37. This variant is also in LD with the AF lead SNP rs4484922 (r² = 1.0 in non-African 1KG populations) 32,38. In the study by Roselli and colleagues, the rs4484922-G allele is associated with an increased risk for AF; notably, that risk-increasing allele corresponds to the LAEF-reducing T allele of rs4074536. The rs4074536-T allele has also previously been associated with a longer QRS complex duration 39,40. CASQ2 encodes calsequestrin 2, which is abundant in the sarcoplasmic reticulum and binds calcium ions during the cardiac cycle. Missense variants in this gene have also been associated with catecholaminergic polymorphic ventricular tachycardia, typically following a recessive inheritance pattern 41,42.

medRxiv preprint doi: https://doi.org/10.1101/2021.08.02.21261481; this version posted August 5, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

Even among LA-associated loci that were not previously associated with AF, several showed the same inverse relationship between AF risk and LAEF (e.g., near NPR3, SSSCA1, and HMGA2). However, this pattern did not hold uniformly. For example, at the gene-dense locus near FBXO46/DMWD/RSPH6A, the LA volume-increasing (and LAEF-decreasing) variants were weakly associated with decreased AF risk.

Also notable was the PITX2 locus, which was the first locus associated with AF. In the present GWAS, SNPs at that locus were associated with BSA-indexed LAmax and LAmin. The lead SNP for AF (rs2129977 from Roselli, et al, 2018) was in close LD with the lead SNP for LAmax and LAmin (rs2634073; r² = 0.85) 32,38. Consistent with clinical expectations, the AF risk allele was associated with greater left atrial maximum and minimum volumes. These analyses excluded participants with a history of AF or abnormal atrial contraction on MRI; therefore, these results support the hypothesis that the PITX2 locus may be associated with an increase in LA volume that occurs prior to AF onset.

Fourth, we developed polygenic scores to gain additional insight into the relationship between LA volumes and cardiovascular diseases. A genome-wide 1.1-million variant AF PRS derived from Christophersen, et al, 2017 was associated with all of the LA phenotypes (most strongly with LAmin), even after excluding participants known to have AF 35. This genetic evidence is consistent with and extends prior observational evidence, and suggests that some of the genetic drivers of AF risk may manifest in ways that are detectable in LA size and function.

A 1.1-million variant polygenic predictor of BSA-indexed LAmin was modestly associated with incident AF (Figure 6), and weakly with stroke, in the UK Biobank.
The score was also associated with heart failure, an association that was almost completely attenuated after excluding participants who were diagnosed with AF prior to heart failure. This attenuation suggests that much of the heart failure association may be mediated through AF.

Finally, we found strong evidence of genetic correlation between LA phenotypes and AF. We pursued Mendelian randomization analyses to assess more formally the hypothesis of bidirectional causation between LA phenotypes and AF. These revealed strong evidence of a causal effect of AF on LAmin, as has been previously observed 11. There was also evidence that LA volumes, particularly LAmin, may be causal for AF. The causal effect persisted even after excluding three variants associated with at least one risk factor from CHARGE-AF 4. However, because AF can be paroxysmal and remain undiagnosed, we cannot exclude the possibility of cryptic reverse causation: namely, that some participants may have had larger atria because of undiagnosed paroxysmal AF, such that AF itself induced the genetic association with LA volumes.

This study has several limitations. All LA measurements were derived from deep learning models of cardiovascular MRI. Because a complete trans-axial stack of atrial images was not part of the UK Biobank imaging protocol, the LA measurements are estimates interpolated from cross sections of the LA. Because contrast was not used during image acquisition, we were not able to ascertain atrial fibrosis. The deep learning models have not been tested outside of the specific devices and imaging protocols used by the UK Biobank and are unlikely to generalize to other data sets without fine-tuning. Disease labels were determined by diagnostic and procedural codes. Because AF can be paroxysmal and may go undetected, it is likely that a subset of the participants had undiagnosed AF prior to MRI, which would bias causal estimates of the impact of LA volume on disease risk away from the null. The study population was largely composed of people of European ancestries, which limits the generalizability of the findings to global populations. The participants who underwent MRI in the UK Biobank tended to be healthier than the remainder of the UK Biobank population, which itself is likely to be healthier than the general population. At present, there is little follow-up time after the first MRI visit for most UK Biobank participants.

Conclusions
Measures of LA structure and function are heritable traits that are associated with AF, stroke, and heart failure. Genetic predictors of LA volume are linked to an elevated risk of AF and, to a lesser extent, stroke and heart failure. In future work, it will be interesting to determine whether targeting the genes and pathways associated with abnormalities in LA function helps reduce the risk of AF, heart failure, and stroke.
Methods

Study design
Except where otherwise stated, all analyses were conducted in the UK Biobank, which is a richly phenotyped, prospective, population-based cohort that recruited 500,000 participants aged 40-69 years in the UK via mailer from 2006-2010 43. We analyzed 487,283 participants with genetic data who had not withdrawn consent as of February 2020. Access was provided under application #7089 and approved by the Partners HealthCare institutional review board (protocol 2019P003144).

Statistical analyses were conducted with R version 3.6 (R Foundation for Statistical Computing, Vienna, Austria).

At the time of this study, the UK Biobank had released images from over 45,000 participants in an ongoing imaging substudy 26,27. Cardiovascular magnetic resonance imaging was performed with 1.5 Tesla scanners (Syngo MR D13 with MAGNETOM Aera scanners; Siemens Healthcare, Erlangen, Germany) with electrocardiographic gating for synchronization 27. Several cardiac views were obtained. For this study, four views (the long axis two-, three-, and four-chamber views, as well as the short axis view) were used. In these views, balanced steady-state free precession cines, each consisting of a series of 50 images throughout the cardiac cycle, were acquired for each participant 27. For the three long axis views, only one imaging plane was available for each participant, with an imaging plane thickness of 6mm and an average pixel width and height of 1.83mm. For the short axis view, several imaging planes were acquired. Starting at the base of the heart, 8mm-thick imaging planes were acquired with approximately 2mm gaps between each plane, forming a stack perpendicular to the longitudinal axis of the left ventricle to capture the ventricular volume. For the short axis images, the average pixel width and height was 1.86mm.

Semantic segmentation and quality control
We labeled pixels using a process similar to the one described in our prior work evaluating the thoracic aorta 44. A cardiologist (JPP) manually annotated cardiac structures in UK Biobank images from the short axis view and from the two-, three-, and four-chamber long axis views. To produce the models used in this manuscript, 714 short axis images were chosen, manually segmented, and used to train a deep learning model with PyTorch and fastai v1.0.61 28,45. The same was done separately with 98 two-chamber images, 66 three-chamber images, and 445 four-chamber images. The models were based on a U-Net-derived architecture constructed with a ResNet34 encoder that was pre-trained on ImageNet 46-49. The Adam optimizer was used 50. The models were trained with a cyclic learning rate training policy 51. Of the samples, 80% were used to train the model and 20% were used for validation. Held-out test sets with images that were not used for training or validation were used to assess the final quality of all models.

Four separate models were trained: one for each of the three long axis views, and one for the short axis view. During training, random perturbations of the input images (augmentations) were applied, including affine rotation, zooming, and modification of the brightness and contrast.

For the short axis images, all images were initially resized to 104x104 pixels during the first half of training, and then to 224x224 pixels during the second half of training. The model was trained with a mini-batch size of 16 (with small images) or 8 (with large images). Maximum weight decay was 1E-03. The maximum learning rate was 1E-03, chosen based on the learning rate finder 28,52. A focal loss function was used (with alpha 0.7 and gamma 0.7), which can improve performance in the case of imbalanced labels 53. When training with small images, 60% of iterations were permitted to have an increasing learning rate during each epoch, and training was performed over 30 epochs while keeping the weights for all but the final layer frozen. Then, all layers were unfrozen, the learning rate was decreased to 1E-07, and the model was trained for an additional 10 epochs. When training with large images, 30% of iterations were permitted to have an increasing learning rate, and training was done for 30 epochs while keeping all but the final layer frozen. Finally, all layers were unfrozen, the learning rate was decreased to 1E-07, and the model was trained for an additional 10 epochs. The semantic segmentation model training hyperparameters for the two-, three-, and four-chamber long axis images were similar and are detailed in the Supplementary Note.
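Focal loss down-weights pixels that the model already classifies confidently, so that the many easy background pixels do not dominate training. Below is a minimal pure-Python sketch of the binary form (one structure versus background), using the alpha and gamma values reported above. It assumes the common convention in which alpha weights the positive class. It is an illustration only, not the fastai/PyTorch loss used to train the models, and the multi-class details of that loss are not given here. The function names are hypothetical.

```python
import math


def binary_focal_loss(probs, labels, alpha=0.7, gamma=0.7, eps=1e-7):
    """Mean focal loss over pixels for one structure versus background.

    probs:  predicted foreground probabilities, one per pixel
    labels: ground-truth labels, 1 = structure, 0 = background
    alpha:  weight on the foreground class (background gets 1 - alpha)
    gamma:  focusing parameter; 0 recovers alpha-weighted cross-entropy
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)      # keep log() finite
        p_t = p if y == 1 else 1.0 - p       # probability assigned to the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


def binary_cross_entropy(probs, labels, eps=1e-7):
    """Unweighted cross-entropy over pixels, for comparison."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        p_t = p if y == 1 else 1.0 - p
        total += -math.log(p_t)
    return total / len(probs)
```

Take a pixel predicted at 0.95 for its true class. Under focal loss it contributes much less than under plain cross-entropy, because the (1 - p_t)^gamma factor shrinks toward zero as p_t approaches 1.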

Each model was applied to all images from its respective view that were available in the UK Biobank as of November 2020.

Poisson surface reconstruction
To integrate the output from each of the four models into one LA volume estimate, Poisson surface reconstruction was performed. None of the views in the UK Biobank cardiac MRI dataset fully captures the 3D anatomical structure of the LA. The short axis stack only occasionally included the lower portion of the chamber, while the three long axis (i.e., two-, three-, and four-chamber) views provided only single-slice cross-sections of the LA at different orientations. To integrate information from the four incomplete MRI views into a consistent 3D representation of the LA anatomy, we followed a procedure similar to Pirruccello et al. (2021) 54. Briefly, we first co-rotated the MRI views into the same reference system using standard DICOM metadata (i.e., from the Image Position (Patient) [0020,0032] and Image Orientation (Patient) [0020,0037] tags). Then, we applied the Poisson surface reconstruction algorithm 55 to interpolate 3D surfaces through the points marking the boundaries of the LA chamber segmentations. In addition to the interpolation point coordinates, the Poisson algorithm requires the local normal directions as input, which constrain the curvature of the reconstructed surface. In our approach, we assumed that the normals lie in the MRI view planes and are oriented radially outward from the center of gravity of the LA segmentation. 3D surfaces of the LA were reconstructed for each of the 50 MRI frames captured during the cardiac cycle. At each timepoint, the volume of the LA was computed using routines for triangulated meshes included in the VTK library (Kitware Inc.). From the reconstructed volume traces, we estimated the maximum and minimum LA volumes, as well as LA stroke volume and emptying fraction.

Identification of abnormal atrial contraction patterns
We sought to identify participants with abnormal atrial contraction patterns at the time of acquisition of the magnetic resonance images. Although the imaging protocol was ECG-gated, the instantaneous ECG signal was not available. Therefore, we used the filling patterns of the atrium and ventricle as markers of normal filling.

To create a training set, we first pulled cine videos from the 2-, 3-, and 4-chamber long axis views of all participants with a history of AF. A cardiologist (JPP) evaluated whether the videos appeared to represent a typical cardiac cycle including an atrial contraction. A deep learning model was then trained to classify filling patterns as representing a normal atrial contraction or not. Each input channel represented the pixel counts of one cardiac chamber in one long axis view, divided by the maximum pixel count for that channel in that participant over the entire cardiac cycle. This approach prevented the model from accessing information about the absolute size of the chambers, forcing it instead to identify patterns based on relative size differences throughout the cardiac cycle. In total, 8 channels were used as input: four from the 4-chamber long axis images (left atrium, right atrium, left ventricle, right ventricle), two from the 3-chamber long axis images (left atrium, left ventricle), and two from the 2-chamber long axis images (left atrium, left ventricle). Cases were excluded if any of the 8 channels was unavailable. Therefore, the shape of the input was 50x8.
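Two of the per-participant computations above operate on 50-frame time series. The first summarizes the reconstructed LA volume trace into LAmax, LAmin, LASV, and LAEF. The second scales the per-chamber pixel counts into the 50x8 classifier input. The plain-Python sketch below illustrates both. The LASV and LAEF formulas are assumed conventional definitions (stroke volume as maximum minus minimum volume, emptying fraction as stroke volume divided by maximum volume); the text above does not spell them out. The function names are hypothetical, and the VTK mesh-volume step is not reproduced.

```python
def la_volume_metrics(volumes):
    """Summarize one participant's LA volume trace (one value per cine frame).

    Assumes the conventional definitions LASV = LAmax - LAmin and
    LAEF = LASV / LAmax, expressed here as a percentage.
    """
    la_max = max(volumes)
    la_min = min(volumes)
    la_sv = la_max - la_min
    return {
        "LAmax": la_max,
        "LAmin": la_min,
        "LASV": la_sv,
        "LAEF": 100.0 * la_sv / la_max,
    }


def normalize_channels(pixel_counts):
    """Scale each channel of a frames x channels array by its own maximum.

    pixel_counts: list of frames (e.g., 50), each a list of per-chamber pixel
    counts (e.g., 8 channels). After scaling, every channel peaks at 1.0, so
    absolute chamber size is removed and only relative change over the
    cardiac cycle remains.
    """
    n_channels = len(pixel_counts[0])
    channel_max = [max(frame[c] for frame in pixel_counts) for c in range(n_channels)]
    if any(m == 0 for m in channel_max):
        raise ValueError("every channel needs a nonzero segmentation in some frame")
    return [[frame[c] / channel_max[c] for c in range(n_channels)] for frame in pixel_counts]
```

Many time-series libraries expect channels-first input (8 x 50 per sample), so the normalized array may need to be transposed before training. That layout detail is an assumption; the text above does not state it.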
Training was performed with fastai version 2.2.5 28, using the TimeseriesAI library version 0.2.15 (github.com/timeseriesAI/tsai) to train an InceptionTime model 56. The Ranger optimizer was used with cross-entropy loss, and the number of filters in the InceptionTime model was 32; all of these are software defaults in the TimeseriesAI library. Ranger combines RAdam and Lookahead to improve training stability early and late in training, respectively 57,58. A randomly chosen 20% of samples served as the validation set. The model was trained with a batch size of 32. Variable learning rates from 5E-06 to 5E-03 were permitted during training. Training was conducted using the One-Cycle policy for 20 epochs 51,52.

Evaluation of the relationship between the left atrium and cardiovascular diseases
We focused on three disease definitions related to LA structure and function: AF or flutter, ischemic stroke, and heart failure (defined below in Online Methods). For prevalent disease that was diagnosed prior to the time of imaging, linear models were used to test for an association between each disease (as a binary independent variable) and LA phenotypes (as the dependent variables), adjusting for the MRI serial number, sex, age, and the interaction between sex and age.

For incident disease, participants with pre-existing diagnoses prior to the MRI were excluded from the analysis. A Cox proportional hazards model was used, with survival defined as the time between MRI and either censoring or disease diagnosis. The model was adjusted for the MRI serial number, sex, age, the interaction between sex and age, the cubic natural spline of height, the cubic natural spline of weight, and the cubic natural spline of BMI. As a sensitivity analysis, adjustment was additionally made for heart rate, P duration, QRS duration, P-Q interval, QTc interval, left ventricular end systolic volume,
Participants were excluded from genetic analysis if they lacked imputed genetic data or had a genotyping call rate < 0.98, a mismatch between self-reported sex and sex chromosome count, sex chromosome aneuploidy, excessive third-degree relatives, or outlying heterozygosity 59. Participants were also excluded from genetic analysis if they had a history of AF or flutter, hypertrophic cardiomyopathy, dilated cardiomyopathy, heart failure, myocardial infarction, or coronary artery disease documented prior to the time they underwent cardiovascular magnetic resonance imaging at a UK Biobank assessment center. Our definitions of these diseases in the UK Biobank are provided in Supplementary Table 12.

Genome-wide association study of the left atrium
We analyzed four primary LA phenotypes, as well as LAmax, LAmin, and LASV estimates that were adjusted for BSA or LVEDV. In total, we conducted 10 genome-wide association studies with these 10 traits. Before conducting genetic analyses, a rank-based inverse normal transformation was applied 60. All traits were adjusted for sex, age at enrollment, age and age² at the time of MRI, the first 10 principal components of ancestry, the genotyping array, and the MRI scanner's unique identifier. with covariate adjustment as noted above. Associations on the X chromosome were also analyzed, using all autosomal SNPs and X chromosomal SNPs to construct the GRM (N=732,214 SNPs), with the same covariate adjustments and significance threshold as in the autosomal analysis. In this analysis mode, BOLT treats individuals with one X chromosome as having an allelic dosage of 0/2 and those with two X chromosomes as having an allelic dosage of 0/1/2. Variants with association P < 5·10⁻⁸, a commonly used threshold, were considered to be genome-wide significant.
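The rank-based inverse normal transformation replaces each trait value with the standard-normal quantile of its rank. This removes skew and damps outliers before association testing. The minimal sketch below assumes Blom's offset (c = 3/8) and average ranks for ties. The offset and tie handling actually used in the study are not specified here, and the helper names are hypothetical. `statistics.NormalDist` requires Python 3.8 or later.

```python
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_inverse_normal(values, c=3.0 / 8.0):
    """Map values to standard-normal quantiles of their offset ranks.

    c = 3/8 is Blom's offset (an assumption; the offset is not stated above).
    The offset keeps every probability strictly between 0 and 1.
    """
    n = len(values)
    normal = NormalDist()  # mean 0, standard deviation 1
    return [normal.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in average_ranks(values)]
```

Only ranks survive the transformation, so the result is identical for any monotone rescaling of the trait.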
We identified lead SNPs for each trait. Linkage disequilibrium (LD) clumping was performed with PLINK-1.9 63 using the same participants used for the GWAS. We used a 5-megabase window (--clump-kb 5000) and a stringent LD threshold (--clump-r2 0.001) to account for long LD blocks. Distinct genomic loci were then defined from the independently significant clumped SNPs. Starting with the SNP with the strongest P value, we excluded other SNPs within 500kb and iterated until no SNPs remained. The independently significant SNPs that defined each genomic locus are termed the lead SNPs.

Linkage disequilibrium (LD) score regression analysis was performed using ldsc version 1.0.0 29. With ldsc, the genomic control factor (lambda GC) was partitioned into components reflecting polygenicity and inflation, using the software's defaults.

Genetic correlation with atrial fibrillation
We used ldsc version 1.0.1 to perform cross-trait LD score regression to estimate genetic correlation between the LA measurements, atrial fibrillation (from Roselli, et al, 2018), and all-cause or cardioembolic stroke (from Malik, et al, 2018) [31][32][33]. Summary statistics were pre-processed with the munge_sumstats.py script from ldsc 1.0.1 using the default settings. These settings filter out variants with imputation INFO scores below 0.9, variants with minor allele frequencies below 0.01, and strand-ambiguous variants.
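The three pre-processing filters listed above can be written as a simple per-variant predicate. The sketch below is not ldsc's code. It mirrors the stated rules only: keep variants with INFO of at least 0.9 and minor allele frequency of at least 0.01, and drop A/T and C/G variants, whose strand cannot be resolved from the alleles alone. The row keys (`a1`, `a2`, `eaf`, `info`) are hypothetical column names.

```python
# Allele pairs whose two alleles are complements of each other: such a SNP
# reads identically on the forward and reverse strands.
AMBIGUOUS_PAIRS = {frozenset("AT"), frozenset("CG")}


def is_strand_ambiguous(a1, a2):
    return frozenset((a1.upper(), a2.upper())) in AMBIGUOUS_PAIRS


def keep_variant(row, min_info=0.9, min_maf=0.01):
    """Apply INFO, minor allele frequency, and strand-ambiguity filters.

    row: dict with hypothetical keys 'a1', 'a2', 'eaf' (effect allele
    frequency), and 'info' (imputation quality score).
    """
    maf = min(row["eaf"], 1.0 - row["eaf"])  # fold the frequency onto the minor allele
    return (
        row["info"] >= min_info
        and maf >= min_maf
        and not is_strand_ambiguous(row["a1"], row["a2"])
    )
```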

Overlap of left atrial loci with atrial fibrillation loci
We identified the gene nearest to each AF-associated SNP in Supplementary Table 16 of Roselli, et al 32. For this exercise, we used each of the 134 SNPs that achieved association P < 5E-8 in the primary GWAS (column 'I') or in the meta-analysis (column 'AD'). We counted the number of AF nearest genes that fell within 500kb of the LA lead SNPs from our study. We used SNPsnap to generate 10,000 sets of SNPs that matched the LA lead SNPs based on parameters including minor allele frequency, SNPs in linkage disequilibrium, distance from the nearest gene, and gene density 34. We then repeated the same counting procedure for each of the 10,000 synthetic SNPsnap lead SNP lists to set a neutral expectation for the number of overlapping AF nearest genes based on chance. This allowed us to compute a one-tailed permutation P value. With 10,000 randomly chosen sets of SNPs, the most extreme possible P value is 1E-04.

Mendelian randomization
We sought to assess a potential causal relationship between LAmin and AF using Mendelian randomization (MR). We considered LAmin as the exposure and AF as the outcome. The genetic instruments for LAmin were generated using the genome-wide association results from this analysis. The variants from the exposure summary statistics were clumped with P < 1E-06, r² < 0.001, and a radius of 5 megabases using the TwoSampleMR package in R 64. Variants with ambiguous alleles were removed. Nineteen variants were harmonized with a large AF GWAS that did not include UK Biobank participants 35. The inverse variance weighted (IVW) method was used as the primary MR analysis. We also performed simple median, weighted median, MR-Egger, and MR-PRESSO analyses to account for violations of the instrumental variable assumptions. Because MR-Egger provides robust estimates under the InSIDE (Instrument Strength Independent of Direct Effect) assumption, we additionally used the MR-Egger bootstrap method to confirm the MR-Egger results.
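The IVW estimate combines the per-variant Wald ratios (outcome effect divided by exposure effect) into a single weighted average. The sketch below computes the fixed-effect version from harmonized summary statistics. It is an illustration under that assumption, not the TwoSampleMR implementation used in the study. That implementation may scale the standard error to account for heterogeneity between variants. The function name is hypothetical.

```python
import math


def ivw_estimate(beta_exp, beta_out, se_out):
    """Fixed-effect inverse-variance weighted (IVW) MR estimate.

    beta_exp: per-SNP effects on the exposure (e.g., LAmin)
    beta_out: per-SNP effects on the outcome (e.g., AF log odds)
    se_out:   standard errors of beta_out
    Equivalent to a weighted regression of beta_out on beta_exp through the
    origin with weights 1 / se_out**2.
    """
    num = 0.0
    den = 0.0
    for bx, by, sy in zip(beta_exp, beta_out, se_out):
        w = 1.0 / sy ** 2
        num += w * bx * by
        den += w * bx ** 2
    beta = num / den
    se = math.sqrt(1.0 / den)  # fixed-effect standard error
    return beta, se
```

The same point estimate is obtained by averaging the ratios by/bx with weights bx²/sy², which is why IVW is often described as a weighted mean of Wald ratios.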

To assess the risk of pleiotropy of the LA genetic instruments, each SNP was tested for association with risk factors from CHARGE-AF 4 in the same participants in whom the GWAS was conducted. Association between each of the 19 variants and seven risk factors (height, weight, systolic blood pressure, diastolic blood pressure, use of antihypertensive medications [ascertainment described below in Online Methods], diagnosis of diabetes, and current smoking) was tested in a linear regression model that accounted for age and age² at the time of MRI, sex, the MRI serial number, the genotyping array, and genetic principal

3.8E-04).
To assess bidirectional causal effects, we also performed an MR analysis using AF variants from the 2017 GWAS as the exposure and LAmin as the outcome. After the same clumping threshold and filtering methods were applied to the AF summary statistics, the 36 remaining variants were harmonized with the LAmin association results and used to construct the instrumental variable. The primary and sensitivity analyses were then conducted in the same manner as described above.
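MR-Egger is one of the sensitivity analyses repeated in this reverse-direction analysis. It regresses variant-outcome effects on variant-exposure effects with a free intercept. The sketch below computes only the point estimates under that standard formulation. Standard errors, the bootstrap variant, and the other estimators are omitted. It is not the TwoSampleMR code used in the study, and the function name is hypothetical.

```python
def mr_egger(beta_exp, beta_out, se_out):
    """MR-Egger point estimates by weighted least squares with an intercept.

    Variants are first oriented so every exposure effect is positive (the
    outcome effect flips sign with it). Weights are 1 / se_out**2.
    Returns (slope, intercept): the slope is the pleiotropy-robust causal
    estimate; a nonzero intercept suggests directional pleiotropy.
    """
    xs, ys, ws = [], [], []
    for bx, by, sy in zip(beta_exp, beta_out, se_out):
        sign = 1.0 if bx >= 0 else -1.0
        xs.append(sign * bx)
        ys.append(sign * by)
        ws.append(1.0 / sy ** 2)
    w_sum = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(ws, ys)) / w_sum
    s_xy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    s_xx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept
```

The orientation step matters. The intercept is only interpretable as average directional pleiotropy once every instrument is coded so that it increases the exposure.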

Polygenic risk analysis
A polygenic score for the LAmin GWAS was computed using PRScs with a UK Biobank European ancestry linkage disequilibrium panel 36. This method applies a continuous shrinkage prior to the SNP weights. PRScs was run in 'auto' mode on a per-chromosome basis. This mode places a standard half-Cauchy prior on the global shrinkage parameter and learns the global scaling parameter from the data; as a consequence, PRScs-auto does not require a validation data set for tuning. Based on the software default settings, only the 1.1 million SNPs found at HapMap3 sites that were also present in the UK Biobank were permitted to contribute to the score.
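PRScs outputs a posterior effect size for each retained SNP. An individual's score is then the dosage-weighted sum of those effects. The sketch below shows that arithmetic with hypothetical SNP identifiers and weights. In practice, this step is usually run with a dedicated scoring tool over genome-wide genotype files. Standardizing the score before survival modeling is an assumption here; the text above does not state it as a step.

```python
from statistics import mean, pstdev


def polygenic_score(weights, dosages):
    """Weighted sum of effect-allele dosages for one person.

    weights: {snp_id: posterior effect size}, e.g., as produced by PRScs
    dosages: {snp_id: effect-allele dosage between 0 and 2}
    Variants missing from `dosages` are skipped (an assumption; some tools
    instead impute the cohort mean dosage).
    """
    return sum(w * dosages[snp] for snp, w in weights.items() if snp in dosages)


def standardize(scores):
    """Center and scale scores across a cohort (mean 0, SD 1)."""
    mu, sd = mean(scores), pstdev(scores)
    return [(s - mu) / sd for s in scores]
```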

This score was applied to the entire UK Biobank. Participants related within 3 degrees of kinship to those who had undergone MRI, based on the precomputed relatedness matrix from the UK Biobank, were excluded from analysis 59. We analyzed the relationship between this polygenic prediction of LAmin and incident diagnoses of AF in the UK Biobank using a Cox proportional hazards model as implemented by the R survival package 65. We excluded participants with disease that was diagnosed prior to enrollment in the UK Biobank. We counted survival as the number of years between enrollment and disease diagnosis (for those with disease) or until death, loss to follow-up, or end of follow-up time (for those without disease). We adjusted for covariates including sex, the cubic basis spline of age at enrollment, the

Definitions of diseases and medications
We defined AF or flutter, dilated cardiomyopathy, hypertrophic cardiomyopathy, heart failure, diabetes, and ischemic stroke based on self-report, ICD codes, and procedural codes (Supplementary Table 12). The data were obtained from the UK Biobank in June 2020, at which time the recommended phenotype censoring date was March 31, 2020. The UK Biobank defines that date as the last day of the month for which the number of records is greater than 90% of the mean of the number of records for the previous three months (https://biobank.ndph.ox.ac.uk/ukb/exinfo.cgi?src=Data_providers_and_dates).
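The UK Biobank censoring rule quoted above can be read as a scan over monthly record counts. The sketch below assumes one reading of that rule: take the latest month whose count passes the 90% test. It is not UK Biobank code. The function name is hypothetical, and any counts passed to it in examples are made up.

```python
import calendar
from datetime import date


def censoring_date(monthly_counts):
    """Last day of the latest month whose record count exceeds 90% of the
    mean count of the three preceding months.

    monthly_counts: chronologically ordered list of ((year, month), n_records).
    Returns None if no month qualifies.
    """
    chosen = None
    for i in range(3, len(monthly_counts)):
        prev_mean = sum(n for _, n in monthly_counts[i - 3:i]) / 3.0
        (year, month), n = monthly_counts[i]
        if n > 0.9 * prev_mean:
            chosen = (year, month)
    if chosen is None:
        return None
    year, month = chosen
    last_day = calendar.monthrange(year, month)[1]  # handles leap years
    return date(year, month, last_day)
```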

We identified participants taking antihypertensive medications based on the Anatomical Therapeutic Chemical (ATC) classification 66. Medications taken by UK Biobank participants were previously mapped to ATC codes 67. We considered medications with ATC codes beginning with C02, C09, C08CA, C03AA, C08CA01, or C03BA04 to be antihypertensives (medication names enumerated in Supplementary Table 13).

Acknowledgments
We would like to thank Mary O'Reilly from the Broad Institute PATTERN Team for contributing to the graphical overview in Figure 1.


Code availability
"Indexed" indicates that the trait has been divided by body surface area.
and then further into groups that do and do not appear to have normal atrial contraction patterns. In the right panel, the LAmin volume is depicted for these groups with violin plots; the median for each group is demarcated with a vertical line.
Manhattan plots show the chromosomal position (X-axis) and the strength of association (-log10 of the P value, Y-axis) for all raw and BSA-indexed phenotypes. Loci that contain SNPs with P < 5E-08 are colored red and labeled with the name of the nearest gene to the most strongly associated SNP.
Black dots represent an association P < 5E-8; gray dots represent P < 5E-6. Effect sizes are oriented with respect to the minor allele. Effect size for AF loci represents the logarithm of the odds ratio.