Genome-wide association study of thoracic aortic aneurysm and dissection in the Million Veteran Program

The current understanding of the genetic determinants of thoracic aortic aneurysms and dissections (TAAD) has largely been informed through studies of rare, Mendelian forms of disease. Here, we conducted a genome-wide association study (GWAS) of TAAD, testing ~25 million DNA sequence variants in 8,626 participants with and 453,043 participants without TAAD in the Million Veteran Program, with replication in an independent sample of 4,459 individuals with and 512,463 without TAAD from six cohorts. We identified 21 TAAD risk loci, 17 of which have not been previously reported. We leverage multiple downstream analytic methods to identify causal TAAD risk genes and cell types and provide human genetic evidence that TAAD is a non-atherosclerotic aortic disorder distinct from other forms of vascular disease. Our results demonstrate that the genetic architecture of TAAD mirrors that of other complex traits and that it is not solely inherited through protein-altering variants of large effect size.


Supplementary Figure 7 - Area under the curve for TAAD prediction using the continuous PRS (per standard deviation) and deleterious variants in the CHIP+MGI cohort (N = 1,842 cases and 1,887 controls with both genotyping and targeted or whole exome sequencing)
Supplementary Figure 7 - Area under the curve (AUC) performance in the CHIP-MGI cohort of 1) a "baseline model" of age, sex, and principal components, 2) the baseline model plus a set of rare TAAD risk variants that were manually curated as "pathogenic or likely pathogenic" for Heritable TAAD according to ACMG best practices 1 , and 3) the baseline model plus PRS per standard deviation increase, constructed from the MVP discovery TAAD summary statistics (7,

Quality Control Analysis
DNA extracted from whole blood was genotyped in MVP using a customized Affymetrix Axiom biobank array, the MVP 1.0 Genotyping Array. We excluded duplicate samples, samples with more heterozygosity than expected, samples with an excess (>2.5%) of missing genotype calls, and samples with discordance between genetically inferred sex and phenotypic gender. In addition, one individual from each pair of related individuals (kinship > 0.0884 as measured by the KING 6 software) was removed. Veterans were divided into three mutually exclusive populations using the HARE algorithm 7 : 1) non-Hispanics of European ancestry, 2) non-Hispanics of African ancestry, and 3) Hispanics.
Prior to imputation, variants that were poorly called or that deviated from Hardy-Weinberg equilibrium or from their expected allele frequency based on reference data from the 1000 Genomes Project 5 were excluded. After pre-phasing using SHAPEIT4 8 , genotypes from the African Genome Resources reference panel were imputed into Million Veteran Program (MVP) participants via the Minimac4 software 9 . Ethnicity-specific principal component analysis was performed using the EIGENSOFT v6 software 10 .
Following imputation, variant-level quality control was performed using the EasyQC R package 11 (www.R-project.org). Variants were excluded for any of the following: imputation quality < 0.3, minor allele frequency (MAF) < 0.005, call rate < 97.5% for common variants (MAF > 1%), or call rate < 99% for rare variants (MAF < 1%). Variants were also excluded if they deviated > 10% from their expected allele frequency based on reference data from the 1000 Genomes Project 5 . Quality control metrics for replication cohorts are provided in Supplementary Table 22.
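The post-imputation variant filters can be sketched as a single predicate. This is an illustrative sketch only, not the EasyQC configuration used in the study: the `Variant` class and its field names are hypothetical, a MAF of exactly 1% is assumed to count as common, and the "> 10%" allele-frequency deviation is assumed to be an absolute difference.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Post-imputation summary for one variant (hypothetical field names)."""
    imputation_quality: float  # imputation quality score
    maf: float                 # minor allele frequency in the study sample
    call_rate: float           # fraction of samples with a genotype call
    study_af: float            # allele frequency in the study sample
    reference_af: float        # allele frequency in the reference data


def passes_variant_qc(v: Variant) -> bool:
    """Return True if the variant survives every exclusion criterion."""
    if v.imputation_quality < 0.3:
        return False
    if v.maf < 0.005:
        return False
    # The call-rate threshold depends on the frequency class.
    # Assumption: a MAF of exactly 1% is treated as common.
    min_call_rate = 0.975 if v.maf >= 0.01 else 0.99
    if v.call_rate < min_call_rate:
        return False
    # Assumption: "> 10%" means an absolute allele-frequency difference.
    if abs(v.study_af - v.reference_af) > 0.10:
        return False
    return True
```

Under these assumptions, a rare variant with a 98% call rate is excluded, whereas a common variant with the same call rate is retained.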

Analysis adjusting the TAAD association for DBP and standing height
To evaluate the conditional effects of the diastolic blood pressure (DBP)- and height-associated variants on TAAD after accounting for these risk factors, we re-tested their association with TAAD with DBP or standing height included as a covariate in the association model. Analyses were performed separately in MVP participants of European, African, and Hispanic ancestry. Association analysis was performed using the REGENIE v2.0 statistical software program 12 , adjusting for age, sex, and 5 principal components of ancestry, as in the primary model.

Fine-mapped Transcriptome-wide Association Study (TWAS)
We performed a fine-mapped TWAS 13 using the FOCUS v0.6 software 14 . This technique combines expression weights derived from bulk RNA-seq data from post-mortem aorta tissue in the Genotype-Tissue Expression project 15 (GTEx V6) with the TAAD meta-analysis summary statistics to nominate candidate causal genes from the GWAS results, under the assumption that the causal mechanism of the tested genes involves changes in cis-expression. Briefly, this approach integrates information from expression reference panels (variant-expression correlation), GWAS summary statistics (variant-trait correlation), and linkage disequilibrium (LD) reference panels (variant-variant correlation) to assess the association between the cis-genetic component of expression and phenotype 13 . The results are then fine-mapped into a 90% credible set using genetic variant fine-mapping approaches 16 . We considered a marginal posterior inclusion probability (PIP) > 0.2 as evidence for a candidate causal gene. In a sensitivity analysis, we restricted the input TAAD summary statistics to individuals of European ancestry, and the results were unchanged.
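As an illustration of how the three inputs combine, the standard summary-statistic TWAS test for a single gene can be written as z = (w · z_GWAS) / sqrt(w' Σ w), where w are the expression weights, z_GWAS the variant-trait z-scores, and Σ the LD matrix. The sketch below implements only this marginal statistic; it is not the FOCUS fine-mapping procedure, and the function name and arguments are hypothetical.

```python
import math
from typing import Sequence


def twas_z(weights: Sequence[float],
           gwas_z: Sequence[float],
           ld: Sequence[Sequence[float]]) -> float:
    """Marginal TWAS z-score for one gene from summary statistics.

    weights: cis-expression weights, one per variant (variant-expression)
    gwas_z:  GWAS z-scores for the same variants (variant-trait)
    ld:      LD correlation matrix for the same variants (variant-variant)
    """
    n = len(weights)
    if len(gwas_z) != n or len(ld) != n or any(len(row) != n for row in ld):
        raise ValueError("weights, gwas_z, and ld must have matching dimensions")
    # Numerator: weighted sum of GWAS z-scores.
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    # Denominator: standard deviation of that sum under the null, w' LD w.
    variance = sum(weights[i] * ld[i][j] * weights[j]
                   for i in range(n) for j in range(n))
    if variance <= 0:
        raise ValueError("w' LD w must be positive")
    return numerator / math.sqrt(variance)
```

The LD term matters because correlated variants carry overlapping evidence; without it, the weighted sum would overstate significance in regions of strong LD.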

Colocalization Analysis
We performed colocalization analysis, using the coloc 17 tool, on the genome-wide significant signals from our TAAD GWAS meta-analysis that successfully replicated. To identify putative causal genes and variants, we formally tested for shared association signals between expression quantitative trait loci (eQTLs) in GTEx bulk RNA-seq data from post-mortem aortic tissue (387 individuals from V8) and our TAAD meta-analysis summary statistics. We performed colocalization within a 1 Mb window (±500 kb) around the lead TAAD risk variant, and defined a posterior probability of colocalization (PP4) greater than or equal to 0.9 as significant. We additionally report loci/variants for which the resultant 99% credible set of causal variants identified by the coloc software contained 5 or fewer variants.
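The two reporting criteria above can be sketched as follows. This is a minimal illustration, not the coloc implementation: it assumes per-variant posterior probabilities of being the shared causal variant are already available, that they are renormalized to sum to 1 within the window, and that the credible set is built by adding variants in decreasing order of probability. All function names and variant identifiers are hypothetical.

```python
from typing import Dict, List


def credible_set(posteriors: Dict[str, float], coverage: float = 0.99) -> List[str]:
    """Smallest set of variants whose cumulative posterior reaches `coverage`."""
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    total = sum(posteriors.values())
    if total <= 0:
        raise ValueError("posterior probabilities must sum to a positive value")
    # Rank variants from most to least probable.
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    selected: List[str] = []
    cumulative = 0.0
    for variant_id, prob in ranked:
        selected.append(variant_id)
        cumulative += prob / total  # assumption: renormalize within the window
        if cumulative >= coverage:
            break
    return selected


def colocalizes(pp4: float, threshold: float = 0.9) -> bool:
    """Significance rule from the text: PP4 greater than or equal to 0.9."""
    return pp4 >= threshold


def has_small_credible_set(posteriors: Dict[str, float], max_variants: int = 5) -> bool:
    """Reporting rule from the text: 99% credible set with 5 or fewer variants."""
    return len(credible_set(posteriors, coverage=0.99)) <= max_variants
```

For example, `credible_set({"v1": 0.95, "v2": 0.045, "v3": 0.005})` returns `["v1", "v2"]`, since the top two variants together carry 99.5% of the posterior mass.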

MR-BMA Analysis
Genetic associations between blood pressure (BP) traits (exposure) and the TAAD outcome were tested initially using inverse-variance weighted MR for a single BP exposure, and then using the MR-BMA methodology for multivariable models 18 . MR-BMA is an extension of multivariable MR that uses Bayesian variable selection to identify likely causal risk factors among correlated exposures. In the primary analysis, the instrumental variables consisted of independent genetic variants (r² < 0.001 based on the 1000 Genomes 5 European ancestry reference panel) associated with any BP trait at genome-wide significance in the Pan UKBB analysis 19 of up to 436,845 European-ancestry participants. Genetic associations with BP traits (systolic BP [SBP], diastolic BP [DBP], pulse pressure [PP], and mean arterial pressure [MAP]) were used as exposures. The subsequent MR-BMA analysis was completed using TAAD GWAS summary statistics from the current study, after removing UK Biobank data from the TAAD GWAS summary statistics to minimize sample overlap. Variable selection was based on marginal inclusion probabilities, for which an empirical permutation procedure was used to derive P values. The Nyholt procedure of effective tests was used to account for the strong correlation among the BP traits, with a multiple testing-adjusted P value of P = 0.05 set as the significance threshold 20 .
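For the single-exposure step, the inverse-variance weighted estimate is the weighted regression (through the origin) of variant-outcome effects on variant-exposure effects, with weights 1/se². The sketch below is a minimal fixed-effect version; the source does not state whether fixed or random effects were used, and the function and argument names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def ivw_estimate(beta_exposure: Sequence[float],
                 beta_outcome: Sequence[float],
                 se_outcome: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect inverse-variance weighted MR estimate and standard error."""
    n = len(beta_exposure)
    if n == 0 or len(beta_outcome) != n or len(se_outcome) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    # Weight each variant by the inverse variance of its outcome association.
    weights = [1.0 / (se ** 2) for se in se_outcome]
    numerator = sum(bx * by * w
                    for bx, by, w in zip(beta_exposure, beta_outcome, weights))
    denominator = sum(bx * bx * w for bx, w in zip(beta_exposure, weights))
    if denominator == 0:
        raise ValueError("all exposure effects are zero")
    estimate = numerator / denominator
    # Assumption: fixed-effect standard error (no overdispersion term).
    std_error = math.sqrt(1.0 / denominator)
    return estimate, std_error
```

If every variant has the same ratio of outcome effect to exposure effect, the estimate equals that ratio regardless of the weights.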
MR-BMA performs variable selection by evaluating models with all possible combinations of BP-related traits as exposures and computing the posterior probability that each model contains the true causal risk factors. Unlike other univariable or multivariable MR methods, MR-BMA aims to identify true causal risk factors among correlated traits, rather than estimate the magnitude of effect. The marginal inclusion probability (the level of evidential support for each exposure) is the sum of the posterior probabilities of all models in which that exposure was included. We removed influential variants based on Cook's distance and outliers based on the Q-statistic, as previously recommended 21 . An empirical permutation procedure was performed to calculate P values. Briefly, the expected marginal inclusion probability distribution for each risk factor under the null hypothesis was generated by performing 1,000 permutations of the MR-BMA analysis, holding the SNP-risk factor associations constant and randomly permuting the SNP-outcome associations. The observed marginal inclusion probabilities for each risk factor were then compared to the expected distribution under the null, with P values computed as p_j = (r_j + 1)/(n_perm + 1), where r_j represents the rank of the observed marginal inclusion probability of a given risk factor (j) across all permutations (n_perm = 1,000). Adjustment for multiple testing was done using the Nyholt correction for correlated traits 20 .
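The marginal inclusion probability and the permutation P value described above can be sketched in a few lines. This is an illustration under stated assumptions, not the MR-BMA software: model posterior probabilities are taken as given, and the rank r_j is assumed to be the number of permuted marginal inclusion probabilities that are greater than or equal to the observed one. Function names are hypothetical.

```python
from typing import Dict, FrozenSet, Sequence


def marginal_inclusion_probability(
        model_posteriors: Dict[FrozenSet[str], float],
        exposure: str) -> float:
    """Sum the posterior probabilities of all models containing `exposure`."""
    return sum(prob for model, prob in model_posteriors.items()
               if exposure in model)


def permutation_p_value(observed_mip: float,
                        permuted_mips: Sequence[float]) -> float:
    """Empirical P value p_j = (r_j + 1) / (n_perm + 1).

    Assumption: r_j counts permuted values at least as large as the
    observed marginal inclusion probability.
    """
    n_perm = len(permuted_mips)
    r = sum(1 for m in permuted_mips if m >= observed_mip)
    return (r + 1) / (n_perm + 1)
```

Adding 1 to both numerator and denominator keeps the P value strictly positive, so the smallest attainable value with 1,000 permutations is 1/1001.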

Replication Cohort Descriptions
For each cohort, genotyping platform, quality control metrics, phenotype definitions, and participant counts/ancestry are provided in Supplementary Table 22.

CHIP-MGI
The Cardiovascular Health Improvement Project (CHIP) is a cohort of individuals treated at Michigan Medicine with linked genotype, EHR, and family history data. The Michigan Genomics Initiative (MGI) is a hospital-based cohort with linked genotype and EHR data from participants recruited during pre-surgical encounters at Michigan Medicine.

Penn Medicine Biobank
The Penn Medicine Biobank (PMBB) recruits patients from throughout the University of Pennsylvania Health System for genomic and precision medicine research. Participants actively consent to allow the linkage of biospecimens to their longitudinal EHR. Currently, >60,000 participants are enrolled in the PMBB. A subset of ~23,000 subjects with imputed genotype data was used in this analysis.

UK Biobank
The UK Biobank (UKB) is a population-based cohort of approximately 500,000 participants recruited from 2006-2010, with existing genomic and longitudinal phenotypic data and a median 10-year follow-up 22 . Baseline assessments were conducted at 22 assessment centres across the UK, with sample collections including blood-derived DNA. Use of the data was facilitated through UK Biobank Application 7089.

Mass General Brigham Biobank
The Mass General Brigham Biobank (MGBB) contains genotypic and clinical data from >105,000 patients who consented to broad-based research across 7 regional hospitals, with a median 3-year follow-up 23 . Baseline phenotypes were ascertained from the electronic medical record and surveys.

HUNT
The Nord-Trøndelag Health Study (HUNT) is a population-based health survey conducted in the county of Nord-Trøndelag, Norway, since 1984. Individuals were included at three different time points during approximately 20 years of follow-up.

University of Texas Health Science Center at Houston
Data contributed from this study comprised a GWAS of 765 individuals with sporadic ascending aortic aneurysms or classic aortic dissection of the ascending or descending thoracic aorta (Stanford types A and B, respectively) who presented for treatment at the Texas Medical Center. The diagnosis of TAAD was confirmed by cross-sectional imaging in all subjects and by direct inspection during surgical repair in most subjects. Controls were individuals free of disease, as described in LeMaire et al. 2 .