Shared genomic architecture between COVID-19 severity and numerous clinical and physiologic parameters revealed by LD score regression analysis

The COVID-19 pandemic has produced broad clinical manifestations, from asymptomatic infection to hospitalization and death. Despite progress from genomic and clinical epidemiology research, risk factors for developing severe COVID-19 are incompletely understood, and identification of modifiable risk factors is urgently needed. We conducted linkage disequilibrium score regression (LDSR) analysis to estimate cross-trait genetic correlation between COVID-19 severity and various polygenic phenotypes. To attenuate the genetic contribution of smoking and BMI, we further conducted sensitivity analyses by excluding genomic regions associated with smoking/BMI and repeating the LDSR analyses. We identified robust positive associations between the genetic architecture of severe COVID-19 and both BMI and smoking. We observed strong positive genetic correlation (rg) with diabetes (rg = 0.25) and shortness of breath walking on level ground (rg = 0.28), and novel protective associations with vitamin E (rg = −0.53), calcium (rg = −0.33), retinol (rg = −0.59), Apolipoprotein A (rg = −0.13), and HDL (rg = −0.17), but no association with vitamin D (rg = −0.02). Removing genomic regions associated with smoking and BMI generally attenuated the associations, but the associations with nutrient biomarkers persisted. This study provides a comprehensive assessment of the shared genetic architecture of COVID-19 severity and numerous clinical/physiologic parameters. Associations with blood- and plasma-derived traits identify candidate biomarkers for Mendelian randomization studies of causality and nominate therapeutic targets for clinical evaluation.


GWAS summary statistics for additional traits.
To estimate cross-trait genetic correlation patterns between COVID-19 disease severity and multiple polygenic traits, we harmonized publicly available GWAS summary-level data from the UK Biobank (UKBB), a prospective population-based cohort of ~500,000 individuals aged 40–69 years who were recruited in the United Kingdom between 2006 and 2010 15,16. All methods were carried out in accordance with relevant guidelines and regulations.
GWAS summary-level data used for the LDSR analyses of UKBB traits are publicly posted results generated by the Neale lab (http://www.nealelab.is/uk-biobank/). These association analyses were adjusted for the first 20 principal components to account for population-level variation in allele frequencies. The UKBB GWAS summary-level data used in our study are restricted to individuals of "British ancestry", defined using the first 6 principal components and further filtered by self-reported ethnicity ("white-British", "Irish", or "White"). Sample sizes and further details for the tested traits are shown in Supplementary Table 2.
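Harmonizing two sets of summary statistics chiefly means expressing every SNP's Z score with respect to the same effect allele. The sketch below illustrates that idea under stated assumptions: records are matched by rsID, the `SumStat` fields and function names are hypothetical, strand-ambiguous (A/T, C/G) SNPs are dropped, and any other allele mismatch is discarded. It is not the authors' pipeline; the ldsc software performs its own allele checks when computing genetic correlations.

```python
from __future__ import annotations

from dataclasses import dataclass

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


@dataclass
class SumStat:
    snp: str  # rsID used to match records across studies (assumption)
    a1: str   # effect allele
    a2: str   # other allele
    z: float  # signed Z score with respect to a1


def is_strand_ambiguous(a1: str, a2: str) -> bool:
    # A/T and C/G SNPs cannot be aligned reliably from alleles alone.
    return COMPLEMENT.get(a1) == a2


def harmonize(ref: list[SumStat], other: list[SumStat]) -> list[tuple[str, float, float]]:
    """Return (snp, z_ref, z_other_aligned) for SNPs usable in both tables."""
    other_by_snp = {s.snp: s for s in other}
    merged = []
    for r in ref:
        o = other_by_snp.get(r.snp)
        if o is None or is_strand_ambiguous(r.a1, r.a2):
            continue
        if (o.a1, o.a2) == (r.a1, r.a2):
            merged.append((r.snp, r.z, o.z))
        elif (o.a1, o.a2) == (r.a2, r.a1):
            # Effect allele is swapped: flip the sign so both Z scores
            # refer to the same allele.
            merged.append((r.snp, r.z, -o.z))
        # Any other allele combination is dropped in this simplified sketch.
    return merged
```

For example, if one study reports rs1 as A/G with Z = 2.0 and another as G/A with Z = 1.0, the aligned pair becomes (2.0, −1.0).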
Estimating SNP-heritability and cross-trait genetic correlation of COVID-19. LD score regression analysis, using 1000 Genomes Project European (EUR) samples as the reference for genome-wide LD patterns, quantifies the co-heritability of diverse traits 4,9,10,17 from GWAS summary statistics for common genetic variants (i.e., SNPs). In brief, the LDSR method regresses GWAS χ² statistics on LD scores, allowing estimation of genetic correlation without bias due to population stratification or cryptic relatedness 4,9,10,18,19. By regressing the product of the two traits' SNP-level Z scores (e.g., Z_COVID19_A2 × Z_UKBB_BMI) on each SNP's LD score (an estimate of the total genetic variation tagged by each variant), one can estimate the magnitude and direction of shared genomic architecture between the traits.
Table 1. Study description.
Strata | Trait | Sample size | SNPs
A2: very severe, respiratory-confirmed COVID-19 patients versus population-based controls | COVID-19 A2 | 707,407 | 1,140,193
B2: hospitalized COVID-19 patients versus population-based controls | COVID-19 B2 | 1,206,629 | 1,141,302
Exclusion of smoking-associated genomic regions
A2: very severe, respiratory-confirmed COVID-19 patients versus population-based controls | COVID-19 A2⟂Smoke | 707,407 | 1,001,866
B2: hospitalized COVID-19 patients versus population-based controls | COVID-19 B2⟂Smoke | 1,206,… | …
www.nature.com/scientificreports/
To control the multiple-testing burden, we restricted analyses to UKBB traits with heritability ≥ 1% that prior studies have linked to risk of severe COVID-19 outcomes, or that are correlated with traits associated with severe outcomes. We conservatively set the test-wise significance level after Bonferroni correction to 0.05/(6 × 64) ≈ 1.30 × 10⁻⁴, adjusting for analyses of COVID-19 severity (A2 and B2) with 64 UKBB traits, with and without removal of BMI- and smoking-associated SNPs. We first selected the ~1.14 M HapMap3 SNPs with MAF > 1%, as recommended, and used the "munge_sumstats.py" script of LD Score (https://github.com/bulik/ldsc; ldsc v1.0.1) to convert the GWAS summary statistics to ".sumstats" format.
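As a worked check of the arithmetic: six COVID-19 analyses (A2 and B2, each with all SNPs, ⟂Smoke, and ⟂BMI) times 64 UKBB traits gives 384 tests, so the Bonferroni threshold is 0.05/384 ≈ 1.30 × 10⁻⁴, the cutoff used for bold P-values in Table 2. The helper names below are hypothetical and only restate the trait filter and threshold described in the text.

```python
ALPHA = 0.05
N_COVID_ANALYSES = 6  # A2 and B2, each with all SNPs, ⟂Smoke, and ⟂BMI
N_UKBB_TRAITS = 64
MIN_H2 = 0.01  # traits with SNP-heritability below 1% were not analysed


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test level that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


def passes_h2_filter(h2: float, min_h2: float = MIN_H2) -> bool:
    """True if a trait's SNP-heritability meets the inclusion threshold."""
    return h2 >= min_h2


threshold = bonferroni_threshold(ALPHA, N_COVID_ANALYSES * N_UKBB_TRAITS)
print(f"{threshold:.3e}")  # 1.302e-04
```

With the diabetes-subtype heritabilities quoted in the Discussion, `passes_h2_filter(0.003)` and `passes_h2_filter(0.004)` both return `False`, consistent with their exclusion.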
Multi-allelic SNPs and the major histocompatibility complex (MHC) region (Chr6:25–34 Mb) were excluded from the summary statistics, the MHC region because of its complex LD pattern and unusual genetic architecture 4. We then ran "ldsc.py --rg covid19.A2.sumstats.gz,trait1.sumstats.gz --ref-ld-chr eur_w_ld_chr/ --w-ld-chr eur_w_ld_chr/ --out covid19.A2_trait1".
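Conceptually, the genetic-correlation run fits the cross-trait LD score regression model, in which the expected product of two traits' Z scores at SNP j is linear in its LD score ℓj: E[z1j·z2j] = √(N1·N2)·ρg·ℓj/M + ρ·Ns/√(N1·N2), where ρg is the genetic covariance, M is the number of SNPs, and the intercept reflects Ns overlapping samples with phenotypic correlation ρ; then rg = ρg/√(h²1·h²2). The following is a deliberately simplified sketch of that idea, not ldsc's implementation: it uses unweighted least squares, omits ldsc's regression weights and block-jackknife standard errors, and simulates independent SNPs whose Z scores follow the model exactly. All function names are hypothetical.

```python
import math
import random


def ols_slope(x: list[float], y: list[float]) -> float:
    """Slope of an ordinary least-squares fit of y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def heritability(z, ld, n, m):
    # E[z_j^2] = N h2 l_j / M + N a + 1, so h2 = slope * M / N.
    return ols_slope(ld, [zj * zj for zj in z]) * m / n


def genetic_covariance(z1, z2, ld, n1, n2, m):
    # E[z1_j z2_j] = sqrt(N1 N2) rho_g l_j / M + intercept (sample overlap),
    # so rho_g = slope * M / sqrt(N1 N2).
    products = [a * b for a, b in zip(z1, z2)]
    return ols_slope(ld, products) * m / math.sqrt(n1 * n2)


def genetic_correlation(z1, z2, ld, n1, n2, m):
    h1 = heritability(z1, ld, n1, m)
    h2 = heritability(z2, ld, n2, m)
    return genetic_covariance(z1, z2, ld, n1, n2, m) / math.sqrt(h1 * h2)


def simulate(m, n, h2, rg, seed=0):
    """Draw Z scores for two traits with equal N and h2 and no sample overlap,
    treating SNPs as independent given their LD scores (a simplification)."""
    rng = random.Random(seed)
    ld, z1, z2 = [], [], []
    for _ in range(m):
        ell = rng.uniform(1.0, 200.0)
        var = 1.0 + n * h2 * ell / m  # Var(z) for either trait
        cov = n * rg * h2 * ell / m   # sqrt(N1 N2) rho_g l / M with rho_g = rg * h2
        r = cov / var                 # per-SNP correlation between z1 and z2
        u1, u2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        ld.append(ell)
        z1.append(math.sqrt(var) * u1)
        z2.append(math.sqrt(var) * (r * u1 + math.sqrt(1.0 - r * r) * u2))
    return ld, z1, z2
```

For example, `ld, z1, z2 = simulate(200_000, 100_000, 0.3, 0.5)` followed by `genetic_correlation(z1, z2, ld, 100_000, 100_000, 200_000)` should return a value close to the simulated rg of 0.5. ldsc additionally weights SNPs to account for heteroskedasticity and LD between SNPs, which this sketch ignores.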
Exclusion of genomic regions related to smoking behavior and BMI. Although a clearer picture is emerging, the contribution of cigarette smoking to COVID-19 disease severity remains incompletely understood: most studies suggest increased disease severity among former smokers versus never-smokers, while some observe a protective effect of current smoking 20 and others an increased risk of more severe symptoms in smokers 21. Because smoking behaviors are heritable traits that correlate with many other complex diseases, we performed sensitivity analyses excluding chromosomal regions (±500 kb) around 473 SNPs previously associated with various smoking behaviors (⟂Smoke) to attenuate the genetic contribution of smoking-related variants 4. The removed genomic regions were related to cigarettes per day, smoking initiation, smoking cessation, age of initiation of regular smoking, and nicotine dependence (Supplementary Tables 3 and 4).
Although obesity increases the risk of systemic inflammation, pulmonary clots, stroke, and myocardial infarction, it remains unclear whether reported associations between BMI and COVID-19 disease severity are confounded by socioeconomic status or concurrent health issues. We therefore performed analogous sensitivity analyses excluding genomic regions (±500 kb) around 941 SNPs previously associated with BMI (⟂BMI) to attenuate the genetic contribution of BMI-related variants (Supplementary Tables 4 and 5).
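Excluding ±500 kb around each associated SNP amounts to building one window per SNP, merging overlapping windows on the same chromosome, and dropping summary-statistic rows that fall inside any window. The sketch below assumes base-pair positions on a common genome build and uses hypothetical lead SNPs and function names; the actual SNP lists are in the supplementary tables. The same masking could in principle be applied to a fixed interval such as the MHC region.

```python
from __future__ import annotations

FLANK_BP = 500_000  # +/-500 kb around each lead SNP


def build_windows(lead_snps: list[tuple[str, int]], flank: int = FLANK_BP) -> dict[str, list[tuple[int, int]]]:
    """Return merged [start, end] exclusion windows keyed by chromosome."""
    by_chrom = {}
    for chrom, pos in lead_snps:
        by_chrom.setdefault(chrom, []).append((max(0, pos - flank), pos + flank))
    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = out[-1]
            if start <= last_end:  # overlapping windows are merged
                out[-1] = (last_start, max(last_end, end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def in_windows(chrom: str, pos: int, windows: dict[str, list[tuple[int, int]]]) -> bool:
    # Linear scan for clarity; a genome-wide run would use a sorted
    # interval lookup (e.g., bisect) instead.
    return any(start <= pos <= end for start, end in windows.get(chrom, []))


def exclude_regions(snps: list[tuple[str, str, int]], windows: dict[str, list[tuple[int, int]]]) -> list[tuple[str, str, int]]:
    """Keep (snp_id, chrom, pos) records that lie outside every window."""
    return [s for s in snps if not in_windows(s[1], s[2], windows)]
```

Merging first keeps the window list small when several associated SNPs cluster in one locus, as is common for BMI and smoking loci.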

Results
We implemented cross-trait LDSR analysis to examine shared genetic contributions to COVID-19 disease severity and multiple clinical and epidemiologic traits, using pairwise genetic correlations (rg) and observed-scale heritability (h², the proportion of phenotypic variance explained by all common SNPs). The flow chart in Fig. 1 summarizes the steps from data preparation to LDSR analysis of COVID-19 severity versus the 64 polygenic traits we studied. A prior GWAS 13 of very severe, respiratory-confirmed COVID-19 (phenotype A2: critical illness; 4606 cases and 702,801 controls, all of European descent) identified loci on chromosomes 3, 12, 17, 19, and 21 that reached genome-wide significance (P < 5.0 × 10⁻⁸, shown as the red horizontal line), with a genomic inflation factor of 1.047 and an estimated h² of 0.35%. Sensitivity analysis excluding chromosomal regions known to be associated with smoking reduced the genomic inflation factor to 1.041 and h² to 0.34%. Sensitivity analysis excluding chromosomal regions known to be associated with BMI increased the genomic inflation factor to 1.050, while h² remained 0.35% (Fig. 2, Supplementary Table 6).
A prior GWAS of hospitalized COVID-19 (phenotype B2: hospitalization; 9373 cases and 1,197,256 controls, all of European descent) identified loci on chromosomes 3, 12, 19, and 21 at genome-wide significance, with a genomic inflation factor of 1.041 and an estimated h² of 0.19% (Fig. 2). Sensitivity analysis excluding chromosomal regions known to be associated with smoking reduced the genomic inflation factor to 1.038, while h² remained 0.19%. Sensitivity analysis excluding chromosomal regions known to be associated with BMI reduced the genomic inflation factor to 1.035 and h² to 0.17% (Fig. 2, Supplementary Table 6).
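The genomic inflation factor is conventionally defined as the median observed 1-df χ² statistic divided by the median of the χ² distribution with 1 df (≈ 0.455). The text does not state how it was computed, so the following is a sketch of that conventional definition using only the Python standard library; the function names are hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 df equals (Phi^-1(0.75))^2 ≈ 0.4549.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def lambda_gc_from_z(z_scores):
    """Genomic inflation factor from signed Z scores (chi2 = z^2)."""
    return median(z * z for z in z_scores) / CHI2_1DF_MEDIAN


def lambda_gc_from_p(p_values):
    """Genomic inflation factor from two-sided P-values in (0, 1]."""
    nd = NormalDist()
    # Convert each two-sided P-value to |Z|, then square to get chi2 (1 df).
    chi2 = (nd.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values)
    return median(chi2) / CHI2_1DF_MEDIAN
```

Under the null, λGC is close to 1. Very small P-values (below about 1e-16) round to 1.0 inside `inv_cdf`, so Z scores are the safer input. Note that λGC rises with polygenicity as well as with confounding; the LDSR intercept is what separates the two.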

Discussion
We investigated the genetic correlations between COVID-19 disease severity (A2: critical illness; B2: hospitalization) and a variety of clinical and physiologic traits using summary-level GWAS data from extremely large patient cohorts, observing shared genomic architecture with a number of illnesses and biomarkers of somatic well-being. We identify a suite of medical conditions and physiological traits that appear to share genetic architecture with COVID-19 severity. Many of these traits overlap those previously identified in large databases of COVID-19 patient outcomes, including traits related to adiposity, kidney function, and pulmonary insufficiency. We also identified additional traits that have received comparatively little attention, such as blood and serum levels of several vitamins and nutrients. Although our datasets are quite large (COVID-19 severity GWAS n = 707,407 and 1,206,629 for critical illness (A2) and hospitalization (B2), respectively; UKBB GWAS n = 361,194), larger datasets would likely replicate many of these associations and could bring some of the nominal associations to statistical significance after multiple-testing correction.
Using an orthogonal genomics-driven approach that complements previous COVID-19 clinical epidemiology research, we confirm a link between the development of severe COVID-19 illness and both elevated BMI and diabetes. We also clarify associations with current smoking status, observing that it was positively correlated with COVID-19 disease severity, and note new associations with diverticulosis and reticulocyte traits. Additionally, we observe a suggestive association between increased disease severity and reduced levels of IGF-1 (a marker of nutritional status), as well as suggestive protective associations with magnesium, retinol, and vitamin E levels. COVID-19 is primarily a respiratory illness. We observed that higher forced vital capacity (FVC) was negatively (protectively) associated with COVID-19 disease severity, and we observed a strongly positive correlation between the genetic architecture of 'shortness of breath while walking on level ground' and development of severe COVID-19 illness. Chest pain and discomfort have previously been associated with COVID-19 hospitalization, and the U.S. Centers for Disease Control and Prevention (CDC) has stated that individuals with chronic lung diseases, including emphysema, chronic bronchitis, COPD, and interstitial lung disease, are at high risk of becoming critically ill from SARS-CoV-2 1. Our study demonstrates a positive correlation between the genetic architecture of these risk factors and COVID-19 disease severity through LDSR analyses. In this study, a differential diagnosis of COPD was strongly positively correlated with COVID-19 hospitalization, regardless of the exclusion of genomic regions related to BMI and smoking behaviors. Since chronic inflammation is an important feature in the development of both emphysema and bronchitis, these findings suggest a potential shared genetic contribution between COPD and COVID-19 hospitalization separate from the contributions of known BMI- and smoking-related variants. Variants located in immune-related genes and contributing to increased pulmonary inflammation could be evaluated in future work.
Table 2. Cross-trait genetic correlations of COVID-19 with inclusion/exclusion of genomic regions associated with BMI and smoking. P-values in bold indicate P ≤ 1.30 × 10⁻⁴. COVID19_A2, very severe respiratory-confirmed COVID-19 versus population, including all genomic regions; A2⟂BMI, very severe respiratory-confirmed COVID-19 versus population, with exclusion of genomic regions related to BMI; A2⟂Smoke, very severe respiratory-confirmed COVID-19 versus population, with exclusion of genomic regions related to smoking behaviors; COVID19_B2, hospitalized COVID-19 versus population, including all genomic regions; B2⟂BMI, hospitalized COVID-19 versus population, with exclusion of genomic regions related to BMI; B2⟂Smoke, hospitalized COVID-19 versus population, with exclusion of genomic regions related to smoking behaviors.
Traits related to smoking behaviors were generally associated with increased COVID-19 disease severity in our analyses, including current smoking, exposure to tobacco smoke either at home or outside the home, in utero tobacco smoke exposure, and cumulative pack-years. Conversely, never-smoker status showed negative genetic correlation with COVID-19 disease severity. Although UKBB does not delineate former smokers in ascribing smoking status, our analyses indicate that the genetic determinants of current smoking are associated with increased COVID-19 disease severity and do not support the clinical observations that current smoking may protect against severe COVID-19 illness.
Given the lack of a COVID-19 vaccine during the first year of the pandemic and continued supply scarcity in numerous regions, many studies have sought to identify alternative strategies to minimize the risk of developing severe COVID-19 following SARS-CoV-2 infection and to treat severe COVID-19. In addition to evaluations of existing pharmacologic agents (e.g., ivermectin, hydroxychloroquine, azithromycin, and dexamethasone), vitamin and nutrient supplementation has been widely studied. Global differences in mortality rate associated with latitude and clinical observations of low serum 25-hydroxyvitamin D levels among hospitalized COVID-19 patients have perhaps garnered the greatest attention 22, but we did not observe a significant association between genetic determinants of vitamin D levels and COVID-19 severity. However, we observed nominally significant protective associations for less-studied nutrient-related traits, including magnesium, calcium, retinol, and vitamin E. A vitamin D/magnesium/vitamin B12 combination was associated with a reduction in the proportion of elderly COVID-19 patients requiring oxygen support and intensive care support in a small prospective cohort 23, and lower plasma retinol levels have also been observed in hospitalized COVID-19 patients 24. Vitamin E levels have not been widely examined in the context of COVID-19, but vitamin E deficiency is more frequently associated with intestinal malabsorption than with dietary insufficiency, which may reinforce the observed genetic correlation between COVID-19 disease severity and diverticular disease in our analyses. In our study, we do observe an association between higher levels. Further, we observe protective associations for both HDL and serum concentration of apolipoprotein A, a major component of the HDL complex involved in clearing fat.
HDL is involved in vitamin E absorption and contains approximately 40% of circulating α-tocopherol, the predominant form of vitamin E in human plasma 25.
Integration and harmonization of extant large-scale GWAS datasets has become a popular approach for revealing novel epidemiologic associations. Still, access to individual-level GWAS data remains limited because of data use restrictions. The LDSR method does not require individual-level genotype data or LD pruning and can quantify the shared genetic architecture of any traits for which GWAS summary statistics are available. However, LDSR analysis assumes the absence of population stratification in the underlying summary statistics and requires GWAS data from populations expected to have similar genomic architecture. This assumption restricted our analysis to GWAS data from British-ancestry individuals, limiting our ability to draw conclusions about shared genetic architecture in other racial/ethnic groups. Given that COVID-19 disease severity has been associated with racial/ethnic background, as well as socioeconomic status and somatic well-being, it is imperative that future genetic epidemiology studies be enriched for participants of non-European descent to expand the generalizability of results. Interpretation of our results is also limited by the strong correlation of many of the studied traits with BMI and smoking behaviors. Although we sought to limit the impact of BMI- and smoking-associated genetic variation by excluding known loci from the LDSR analysis, such sensitivity analyses cannot account for polygenic contributions from variants that have not yet reached genome-wide significance in prior research.
To estimate cross-trait genetic correlation, we restricted UKBB traits to those with h² ≥ 1%, an arbitrary threshold chosen to improve reliability. For instance, UKBB contains additional diabetes subtypes derived from various medical records, but their SNP-heritability estimates (h² = 0.3% for type 1 diabetes and 0.4% for type 2 diabetes) fell below this threshold, so we did not include them. The diabetes trait reported in Table 1, "Diabetes diagnosed by doctor" (UKBB Field Identifier: 2443), does not specify the type of diabetes; however, given the age of participants and the general prevalence of T2D relative to T1D, the association between diabetes and COVID-19 severity is presumably driven by shared genetic architecture between COVID-19 severity and T2D. Furthermore, LDSR analysis relies on common genetic variants (MAF > 1%) and therefore cannot capture the observed-scale SNP-heritability attributable to low-frequency or rare variants 4. A significant genetic correlation between a UKBB polygenic trait and COVID-19 severity does not imply a causal association: both the tested trait and COVID-19 severity may be jointly influenced by an unmodeled trait that is independently associated with each 19. Including rare genetic variants might be valuable because it could increase the overall trait heritability being modeled, but inferences from LDSR rely on normality assumptions that may be violated when rare variants are studied, so we restricted analysis to more common variants. Additionally, LDSR does not explicitly model confounding effects, which can arise when studying multiple correlated traits.
Therefore, associations identified by the method should be further studied using Mendelian randomization or direct analyses of the nominated trait phenotypes to confirm causal relationships. LDSR is thus a useful approach for nominating potential novel associations that warrant further epidemiological analysis to distinguish causal associations from those influenced by confounding.
Our findings support previously identified risk factors for severe COVID-19 illness, including elevated BMI, diabetes, and numerous pulmonary conditions (e.g., COPD, reduced FEV, shortness of breath during mild activity). We also observe protective associations between the genetic underpinnings of COVID-19 severity and those of non-smoking, serum albumin, apolipoprotein A, HDL cholesterol level, and several nutrients. Further studies using Mendelian randomization approaches may help to dissect causal associations between COVID-19 disease severity and these traits, potentially nominating targets for therapeutic intervention.