Introduction

Since the outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in December 2019, the global pandemic of coronavirus disease 2019 (COVID-19) has resulted in more than 205 million confirmed cases and 4.3 million deaths worldwide as of August 13, 20211. Genome-wide association studies (GWAS) have revealed the genetic underpinnings of complex human traits, helping to elucidate the heritability of susceptibility to both chronic diseases and infectious diseases—including COVID-192. Such complex diseases typically have multifactorial etiologies, with contributions from germline genetic variation and environmental exposures3, and they frequently present with related, comorbid medical conditions4.

SARS-CoV-2 is a highly contagious respiratory virus that also impacts additional organ systems. Several comorbid risk factors are known to be associated with developing severe COVID-19 illness5, including hypertension, cardiovascular disease, chronic obstructive pulmonary disease (COPD), high body mass index (BMI), and type 2 diabetes6. Diabetes, associated with immunosuppression and vascular and renal complications, has emerged as a critical comorbidity among severely ill COVID-19 patients7.

Leveraging GWAS data, we conducted cross-trait genetic correlation analyses, which examine the sum effect of pleiotropy across all causal loci, to determine whether we could reveal shared genetic correlations between multiple polygenic traits. In addition, we determined the directionality of these associations, that is, whether the genetic architectures of two traits are correlated or anti-correlated8. Examining the shared genetic architecture underlying co-occurrence of disease can help elucidate their common genetic etiology4, but there has been no comprehensive evaluation of the shared genetic architecture between COVID-19 disease severity and additional diseases or traits. Previous analyses of lung cancer risk reveal shared genetic architecture with emphysema, a common co-morbid condition, as well as with cigarette consumption, the leading cause of lung cancer4. In the setting of COVID-19 disease severity, such an evaluation has similar potential to identify co-morbid conditions and to uncover traits that are causally involved in worsening clinical course.

To identify polygenic traits sharing underlying genetic etiology with COVID-19 disease severity, we utilized summary statistics from prior GWAS and conducted cross-trait linkage disequilibrium (LD) score regression (LDSR) analyses9,10. Results provide fundamental knowledge on traits and conditions that share genetic underpinnings with COVID-19 disease severity, reveal potential risk factors for developing severe COVID-19 disease subsequent to SARS-CoV-2 infection, and implicate several modifiable factors that merit further study and may ultimately help improve patient outcomes.

Methods

GWAS summary statistics for COVID-19 critical illness and hospitalization

We downloaded the GWAS summary statistics (COVID19-hg GWAS meta-analyses round 5, released on January 18, 2021; https://www.covid19hg.org/results/r5/) from the COVID-19 Host Genetics Initiative (COVID-19 HGI)11,12,13, comprising (1) A2 (critical illness)13: 4,606 very severe, respiratory-confirmed COVID-19 patients versus 702,801 population-based controls (A2_ALL_eur_leave_23andme) and (2) B2 (hospitalization)13: 9,373 hospitalized COVID-19 patients versus 1,197,256 population-based controls (B2_ALL_eur_leave_23andme) (Table 1 and Supplementary Tables 1 and 2). While the summary statistics of COVID-19-hg GWAS meta-analyses across multiple populations have been deposited at the COVID-19 HGI11,13, we restricted analyses to European-ancestry subjects (to align with the ancestral background of participants in GWAS of traits used in our downstream LDSR analyses) and did not include the 23andMe cohort (due to the data-use constraint, which makes only the top 10,000 SNPs publicly available)14. All methods were performed in accordance with the relevant guidelines and regulations.

Table 1 Study description.

GWAS summary statistics for additional traits

To estimate cross-trait genetic correlation patterns between COVID-19 disease severity and multiple polygenic traits, we harmonized publicly available GWAS summary-level data from the UK Biobank (UKBB), a prospective population-based cohort study consisting of ~ 500,000 individuals, aged 40–69 years, who were recruited in the United Kingdom between 2006 and 201015,16. All methods were carried out in accordance with relevant guidelines and regulations.

GWAS summary-level data used for the LDSR analyses of UKBB traits are from publicly-posted results generated by the Neale lab (http://www.nealelab.is/uk-biobank/). These association analyses are adjusted with the first 20 principal components, which adjust for sources of population level variability in genetic allele frequencies. The GWAS summary-level data of UKBB used in our study are restricted to “British ancestry” using the first 6 principal components to determine “British ancestry” and further filtered by self-reported ethnicity with “white-British”, “Irish”, or “White”. The sample sizes and more details for the tested traits are shown in Supplementary Table 2.

Estimating SNP-heritability and cross-trait genetic correlation of COVID-19

LD score regression analysis with 1000 Genomes Project European (EUR) samples as a reference for the pattern of genome-wide LD quantifies the co-heritability of diverse traits4,9,10,17 using GWAS summary statistics for common genetic variants (i.e., SNPs). In brief, the LDSR method regresses χ2 statistics from GWAS on LD scores, allowing the estimation of genetic correlation without bias due to population stratification or cryptic relatedness4,9,10,18,19. By regressing SNP-level associations for two traits (i.e., the product of Z scores, ZCOVID19_A2 × ZUKBB_BMI) and weighting each SNP by its LD Score (an estimate of the amount of total genetic variation tagged by each variant), one can estimate the magnitude and direction of shared genomic architecture between these traits. To control the multiple testing burden, we restricted analyses to the tested UKBB traits showing heritability ≥ 1% and for which prior studies have suggested correlations between COVID-19 and risk for severe outcomes, or traits that were correlated with traits that have been associated with severe outcomes. We conservatively set the test-wise level of significance after Bonferroni correction to be 0.05/(6 × 64), adjusting for analysis of COVID-19 severity (A2 and B2) with 64 UKBB traits, with and without removal of BMI and Smoking SNPs. We first ran “munge_sumstats.py” from LD Score (https://github.com/bulik/ldsc; ldsc v1.0.1) to convert the GWAS summary statistics to the “.sumstats” format, after selecting ~ 1.14 M HapMap3 SNPs with MAF > 1% for the analysis as recommended. Multi-allelic SNPs and the major histocompatibility complex (MHC) region (Chr6:25–34 Mb) were excluded from summary statistics because of the complex and unusual LD pattern and genetic architecture of the MHC region4. We then applied “ldsc.py --rg covid19.A2.sumstats.gz,trait1.sumstats.gz --ref-ld-chr eur_w_ld_chr/ --w-ld-chr eur_w_ld_chr/ --out covid19.A2_trait1”.
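The regression idea described above can be illustrated with a minimal sketch. It assumes the standard cross-trait LD score regression relationship, in which the expected product of Z scores for a SNP is linear in its LD score with slope sqrt(N1 × N2) × ρg / M (ρg: genetic covariance; M: number of SNPs). The function names, the toy inputs, and the use of unweighted ordinary least squares are illustrative assumptions; ldsc itself uses weighted regression with a block jackknife for standard errors, so this is not the ldsc implementation.

```python
from math import sqrt


def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def genetic_covariance(ld_scores, z1, z2, n1, n2, m):
    """Estimate genetic covariance from the slope of (z1 * z2) on LD score.

    Simplification (assumption): unweighted OLS, no jackknife.
    """
    products = [a * b for a, b in zip(z1, z2)]
    slope, intercept = ols_slope_intercept(ld_scores, products)
    return slope * m / sqrt(n1 * n2), intercept


def genetic_correlation(rho_g, h2_trait1, h2_trait2):
    """Scale genetic covariance by the two SNP-heritabilities to obtain rg."""
    return rho_g / sqrt(h2_trait1 * h2_trait2)


# Noiseless toy data (hypothetical values, chosen only to exercise the code).
N1, N2, M = 10_000, 10_000, 1_000
TRUE_RHO = 0.05
ld_scores = [float(l) for l in range(1, 21)]
true_slope = sqrt(N1 * N2) * TRUE_RHO / M
z_trait1 = [1.0] * len(ld_scores)
z_trait2 = [0.1 + true_slope * l for l in ld_scores]

rho_hat, intercept_hat = genetic_covariance(ld_scores, z_trait1, z_trait2, N1, N2, M)
rg_hat = genetic_correlation(rho_hat, 0.2, 0.3125)
print(f"genetic covariance ~ {rho_hat:.4f}, intercept ~ {intercept_hat:.4f}, rg ~ {rg_hat:.4f}")
```

The sign of the slope gives the direction of the shared architecture, and the intercept absorbs effects such as sample overlap, which is why the genetic correlation is read from the slope rather than from the mean product of Z scores.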

Exclusion of genomic regions related to smoking behavior and BMI

Although a clearer picture is emerging, the contribution of cigarette smoking to COVID-19 disease severity remains incompletely understood, with most studies suggesting increased disease severity among former smokers versus never-smokers, but some studies observing a protective effect for current smoking20 and others showing an increased risk for more severe symptoms in smokers21. Since smoking behaviors are heritable traits that correlate with many other complex diseases, we performed sensitivity analyses by excluding chromosomal regions (± 500 kb) around 473 SNPs previously associated with various smoking behaviors (Smoke) to attenuate the genetic contribution of smoking-related variants4. The removed genomic regions were related to cigarettes per day, smoking initiation, smoking cessation, initiation age of regular smoking, and nicotine dependence (Supplementary Tables 3 and 4).

Although obesity increases risk of systemic inflammation, pulmonary clots, stroke, and myocardial infarction, it remains unclear whether reported associations between BMI and COVID-19 disease severity are confounded by socioeconomic status or concurrent health issues. We performed sensitivity analyses by excluding genomic regions (± 500 kb) around 941 SNPs previously associated with BMI (BMI) to attenuate the genetic contribution of BMI-related variants (Supplementary Tables 4 and 5).
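The region-exclusion step described in this section can be sketched as a simple positional filter. The data layout (dictionaries with "chrom" and "pos" keys), the function names, the inclusive window boundaries, and the example coordinates are assumptions made for illustration; the source does not describe how the exclusion was implemented.

```python
WINDOW_BP = 500_000  # +/- 500 kb around each previously associated SNP


def build_exclusion_windows(lead_snps, window_bp=WINDOW_BP, extra_regions=()):
    """Map chromosome -> list of (start, end) intervals to exclude.

    lead_snps: iterable of (chrom, position) for trait-associated SNPs.
    extra_regions: optional (chrom, start, end) tuples, e.g. the MHC region.
    """
    windows = {}
    for chrom, pos in lead_snps:
        interval = (max(0, pos - window_bp), pos + window_bp)
        windows.setdefault(str(chrom), []).append(interval)
    for chrom, start, end in extra_regions:
        windows.setdefault(str(chrom), []).append((start, end))
    return windows


def is_excluded(chrom, pos, windows):
    """True if the position falls inside any window (boundaries inclusive)."""
    return any(start <= pos <= end for start, end in windows.get(str(chrom), []))


def filter_sumstats(rows, windows):
    """Keep only summary-statistic rows outside every exclusion window."""
    return [r for r in rows if not is_excluded(r["chrom"], r["pos"], windows)]


# Hypothetical lead SNP and summary-statistic rows, for illustration only.
MHC = ("6", 25_000_000, 34_000_000)
windows = build_exclusion_windows([("15", 78_000_000)], extra_regions=[MHC])
rows = [
    {"snp": "snp_a", "chrom": "15", "pos": 77_600_000},  # inside the window
    {"snp": "snp_b", "chrom": "15", "pos": 78_500_001},  # just outside
    {"snp": "snp_c", "chrom": "6", "pos": 30_000_000},   # inside the MHC
    {"snp": "snp_d", "chrom": "1", "pos": 78_000_000},   # other chromosome
]
kept = [r["snp"] for r in filter_sumstats(rows, windows)]
print(kept)
```

A filter of this kind removes only SNPs near known genome-wide significant loci, so sub-threshold polygenic signal for smoking or BMI remains in the filtered summary statistics.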

Results

We implemented cross-trait LDSR analysis to examine shared genetic contributions to COVID-19 disease severity and multiple clinical and epidemiologic traits using pairwise genetic correlations (rg) and the observed-scale heritability (h2, representing the proportion of phenotypic variance explained by all common SNPs). The flow chart presented in Fig. 1 summarizes the steps from data preparation to LDSR analysis for COVID-19 severity versus the 64 polygenic traits we studied. A prior GWAS analysis13 of very severe, respiratory-confirmed COVID-19 (phenotype A2: critical illness; 4606 cases, 702,801 controls, European descent only) identified loci on chromosomes 3, 12, 17, 19, and 21 that reached genome-wide statistical significance (P < 5.0 × 10−8, shown as the red horizontal line), with a genomic inflation factor of 1.047 and an estimated h2 of 0.35%. Sensitivity analysis excluding chromosomal regions known to be associated with smoking reduced the genomic inflation factor to 1.041 and h2 to 0.34%. Sensitivity analysis excluding chromosomal regions known to be associated with BMI increased the genomic inflation factor to 1.050 and left h2 at 0.35% (Fig. 2, Supplementary Table 6).
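For readers unfamiliar with the genomic inflation factor reported above, the following minimal sketch uses its common definition: the median observed χ2 statistic (Z squared) divided by the median of a χ2 distribution with one degree of freedom (about 0.4549). The source does not state how its values were computed, so the definition, the function name, and the toy Z scores are assumptions.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(z_scores):
    """Lambda GC: median observed chi-square over its expected null median."""
    chi2 = [z * z for z in z_scores]
    return median(chi2) / CHI2_1DF_MEDIAN


# Hypothetical Z scores, for illustration only.
toy_z = [-1.9, -0.7, -0.2, 0.3, 0.68, 0.9, 1.4, 2.5, 0.05]
print(f"lambda_GC = {genomic_inflation(toy_z):.3f}")
```

Values close to 1, such as the 1.035 to 1.050 range reported here, indicate little inflation of test statistics across the genome.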

Figure 1 Flow chart of the analytical workflow in the study.

Figure 2 Manhattan plots of the COVID-19 GWAS meta-analysis before and after removal of genomic regions associated with smoking behaviors and BMI in the European-descent population. A2: very severe respiratory-confirmed COVID-19 cases versus population: 4606 cases and 702,801 controls; B2: hospitalized COVID-19 cases versus population: 9373 cases and 1,197,256 controls.

A prior GWAS analysis of hospitalized COVID-19 (phenotype B2: hospitalization; 9373 cases, 1,197,256 controls, European descent only) identified loci on chromosomes 3, 12, 19, and 21 at genome-wide statistical significance, with a genomic inflation factor of 1.041 and an estimated h2 of 0.19% (Fig. 2). Sensitivity analysis excluding chromosomal regions known to be associated with smoking reduced the genomic inflation factor to 1.038 and left h2 at 0.19%. Sensitivity analysis excluding chromosomal regions known to be associated with BMI reduced the genomic inflation factor to 1.035 and h2 to 0.17% (Fig. 2, Supplementary Table 6).

Using these GWAS results, we next performed LDSR analyses with two phenotypes for COVID-19 severity (COVID-19 A2, and COVID-19 B2) considering four phenotypes for exclusions of genomic regions related to BMI and Smoking (COVID19_A2BMI, COVID19_A2Smoke, COVID19_B2BMI, and COVID19_B2Smoke) and 64 UKBB polygenic traits that had SNP array-based heritability (h2) ≥ 1% (to maximize study power and to provide reliable inferences). Twenty-three diverse traits showed moderate to strong co-heritability with COVID-19 disease severity (Table 2, Supplementary Table 7), including several at Bonferroni-corrected significance level (P < 1.30 × 10−4). Very severe, respiratory-confirmed COVID-19 illness (A2) and COVID-19 hospitalization (B2) showed strong genomic correlation with traits related to adiposity, diabetes, digestive diseases, smoking behaviors, hematologic traits, and selected nutrient levels (Fig. 3, Supplementary Table 7).

Table 2 Cross-trait genetic correlations of COVID-19 on inclusion/exclusion of genomic regions associated with BMI and smoking.
Figure 3 The pairwise genetic correlation of COVID-19 disease severity and selected traits.

Among physical traits, the genetic architecture of COVID-19 disease severity was positively correlated with BMI (rgCOVID19_A2 = 0.20, PCOVID19_A2 = 1.51 × 10−5; rgCOVID19_B2 = 0.34, PCOVID19_B2 = 1.99 × 10−8), weight (rgCOVID19_A2 = 0.17, PCOVID19_A2 = 1.24 × 10−4; rgCOVID19_B2 = 0.27, PCOVID19_B2 = 7.23 × 10−7), and whole body fat mass (rgCOVID19_A2 = 0.20, PCOVID19_A2 = 7.81 × 10−6; rgCOVID19_B2 = 0.33, PCOVID19_B2 = 2.24 × 10−8). After excluding genomic regions previously associated with BMI, both BMI and whole body fat mass continued to show strongly significant positive correlation with COVID-19 disease severity (rgCOVID19_A2BMI = 0.17 and PCOVID19_A2BMI = 2.37 × 10−3; rgCOVID19_B2BMI = 0.28 and PCOVID19_B2BMI = 1.82 × 10−5).
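As a worked check of the multiple-testing threshold, the sketch below recomputes 0.05/(6 × 64) and compares it with the unfiltered P values reported in this paragraph. Reading the factor 6 as two COVID-19 phenotypes (A2, B2), each analyzed with no exclusion, with BMI regions excluded, and with smoking regions excluded, is an interpretation of the Methods; the variable names are illustrative.

```python
ALPHA = 0.05
N_COVID_ANALYSES = 6  # assumed: (A2, B2) x (no exclusion, BMI-excluded, smoking-excluded)
N_TRAITS = 64         # UKBB traits with h2 >= 1%

threshold = ALPHA / (N_COVID_ANALYSES * N_TRAITS)


def is_significant(p_value, cutoff=threshold):
    """True if the P value passes the Bonferroni-corrected cutoff."""
    return p_value < cutoff


# P values as reported in the text for the analyses without region exclusion.
reported = {
    ("BMI", "A2"): 1.51e-5,
    ("BMI", "B2"): 1.99e-8,
    ("Weight", "A2"): 1.24e-4,
    ("Weight", "B2"): 7.23e-7,
    ("Whole body fat mass", "A2"): 7.81e-6,
    ("Whole body fat mass", "B2"): 2.24e-8,
}

print(f"Bonferroni threshold = {threshold:.3e}")
for (trait, phenotype), p in reported.items():
    print(f"{trait:<20} {phenotype}  P = {p:.2e}  significant: {is_significant(p)}")
```

All six P values fall below the threshold of about 1.30 × 10−4, with weight versus A2 (P = 1.24 × 10−4) the closest to the cutoff.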

Among medical conditions, the genetic architecture of COVID-19 disease severity was positively correlated with shortness of breath walking on level ground (rgCOVID19_A2 = 0.28, PCOVID19_A2 = 2.87 × 10−3; rgCOVID19_B2 = 0.43, PCOVID19_B2 = 4.56 × 10−5), diabetes (rgCOVID19_A2 = 0.54, PCOVID19_A2 = 7.10 × 10−4; rgCOVID19_B2 = 0.31, PCOVID19_B2 = 3.39 × 10−5), diverticulosis (rgCOVID19_A2 = 0.30, PCOVID19_A2 = 4.31 × 10−4; rgCOVID19_B2 = 0.38, PCOVID19_B2 = 1.66 × 10−5), diseases of the digestive system (rgCOVID19_A2 = 0.20, PCOVID19_A2 = 3.48 × 10−3; rgCOVID19_B2 = 0.45, PCOVID19_B2 = 1.86 × 10−7), and diseases of the musculoskeletal system and connective tissue (rgCOVID19_A2 = 0.24, PCOVID19_A2 = 4.84 × 10−4; rgCOVID19_B2 = 0.34, PCOVID19_B2 = 3.54 × 10−6). Excluding genomic regions associated with smoking behaviors or BMI generally attenuated these correlations, although most remained nominally associated at P < 0.05 and the association between COVID-19 hospitalization and diseases of the digestive system remained significant at Bonferroni-corrected levels after exclusion of smoking-associated loci (rgCOVID19_B2Smoke = 0.38, PCOVID19_B2Smoke = 2.30 × 10−5).

Among smoking behaviors, current tobacco smoking (rgCOVID19_B2 = 0.34, PCOVID19_B2 = 2.01 × 10−6) and exposure to tobacco smoke at home (rgCOVID19_B2 = 0.47, PCOVID19_B2 = 1.73 × 10−5) presented strongly significant positive genetic correlations with COVID-19 hospitalization, which were only modestly attenuated when known smoking-associated loci were removed from analysis. Current tobacco smoking was more modestly associated with severe, respiratory-confirmed COVID-19 illness (rgCOVID19_A2 = 0.13, PCOVID19_A2 = 0.021), and this association became non-significant after removing smoking-associated loci from analysis (Table 2), supporting a link between known smoking risk loci and risk for severe COVID-19 outcomes.

Examining hematologic traits, both high light scatter reticulocyte percentage and count were significantly positively correlated with COVID-19 hospitalization, as was immature reticulocyte fraction. These traits were also positively correlated with severe COVID-19 illness, but not at Bonferroni-corrected levels of statistical significance. C-reactive protein levels were also positively correlated with COVID-19 disease severity. Interestingly, serum (not urinary) albumin was negatively correlated with COVID-19 disease severity at nominal statistical significance (rgCOVID19_A2 = − 0.12, PCOVID19_A2 = 0.026; rgCOVID19_B2 = − 0.16, PCOVID19_B2 = 0.011), as were HDL, apolipoprotein A levels, and levels of serum IGF-1 (Table 2).

We also examined the pairwise genetic relationship between COVID-19 disease severity and nutrient-related traits in UKBB. Although we did not observe any associations between COVID-19 critical illness or hospitalization and nutrient-related traits at Bonferroni-corrected levels of significance, we identified suggestive negative correlations with magnesium (rgCOVID19_A2 = − 0.39, PCOVID19_A2 = 2.28 × 10−3; rgCOVID19_B2 = − 0.36, PCOVID19_B2 = 5.17 × 10−3), retinol (rgCOVID19_A2 = − 0.59, PCOVID19_A2 = 0.041; rgCOVID19_B2 = − 0.59, PCOVID19_B2 = 0.029), and vitamin E (rgCOVID19_A2 = − 0.53, PCOVID19_A2 = 2.16 × 10−3; rgCOVID19_B2 = − 0.53, PCOVID19_B2 = 3.10 × 10−3) (Table 2 and Supplementary Table 7). Vitamin D levels were not associated with risk for severe COVID-19 (rgCOVID19_A2 = − 0.023, PCOVID19_A2 = 0.67; rgCOVID19_B2 = − 0.043, PCOVID19_B2 = 0.44).

Discussion

We investigated the genetic correlations between COVID-19 disease severity (A2: critical illness and B2: hospitalization) and a variety of clinical and physiologic traits using summary-level GWAS data from extremely large patient cohorts, observing shared genomic architecture with a number of illnesses and biomarkers of somatic well-being. We identify a suite of medical conditions and physiological traits that appear to share genetic architecture with COVID-19 severity. Many of these traits overlap those previously identified in the large databases of COVID-19 patient outcomes, including traits related to adiposity, kidney function, and pulmonary insufficiency. We also identified additional traits that have received comparatively little attention, such as blood and serum levels of several vitamins and nutrients. Although our datasets are quite large (COVID-19 severity GWAS n = 707,407 and 1,206,629 for critical illness (A2) and hospitalization (B2), respectively; UKBB GWAS n = 361,194), larger datasets would likely identify many of these same associations and could potentially bring some of the nominally significant associations to a corrected level of statistical significance.

Using an orthogonal genomics-driven approach that complements previous COVID-19 clinical epidemiology research, we confirm a link between the development of severe COVID-19 illness and both elevated BMI and diabetes. We also clarify associations with current smoking status, observing that it was positively correlated with COVID-19 disease severity, and note new associations with diverticulosis and reticulocyte traits. Additionally, we observe a suggestive association between increased disease severity and reduced levels of IGF-1—a marker of nutritional status—and additional suggestive protective associations with magnesium, retinol, and vitamin E levels.

COVID-19 is primarily a respiratory illness. We observed that higher forced vital capacity (FVC) was negatively (protectively) associated with COVID-19 disease severity and observed a strongly positive correlation between the genetic architecture of ‘shortness of breath while walking on level ground’ and development of severe COVID-19 illness. Chest pain and discomfort have previously been associated with COVID-19 hospitalization and the U.S. Centers for Disease Control and Prevention (CDC) announced that individuals with chronic lung diseases including emphysema, chronic bronchitis, COPD, and interstitial lung disease are at high risk for becoming critically ill from SARS-CoV-21. Our study demonstrates a positive correlation between the genetic architecture of these risk factors and COVID-19 disease severity through LDSR analyses. In this study, a differential diagnosis of COPD was strongly positively correlated with COVID-19 hospitalization, regardless of the exclusion of genomic regions related to BMI and smoking behaviors. Since chronic inflammation is an important feature in developing both emphysema and bronchitis, these findings suggest a potential shared genetic contribution between COPD and COVID-19 hospitalization separate from the contributions of known BMI and smoking-related variants. Variants located in immune-related genes and contributing to increased pulmonary inflammation could be evaluated in future work.

Traits related to smoking behaviors were generally associated with increased COVID-19 disease severity in our analyses, including current smoking, exposure to tobacco smoke either at home or outside home, in utero tobacco smoke exposure, and cumulative pack-years. Conversely, never-smoker status showed negative genomic correlation with COVID-19 disease severity. Although UKBB does not delineate former smokers in ascribing smoking status, our analyses indicate that the genetic determinants of current smoking are associated with increased COVID-19 disease severity and do not support the clinical observations that current smoking may protect against severe COVID-19 illness.

Given the lack of a COVID-19 vaccine during the first year of the pandemic and continued supply scarcity in numerous regions, many studies have sought to identify alternative strategies to minimize risk of developing severe COVID-19 following SARS-CoV-2 infection and also to treat severe COVID-19. In addition to evaluations of existing pharmacologic agents (e.g., ivermectin, hydroxychloroquine, azithromycin, and dexamethasone), vitamin and nutrient supplementation has been widely studied. Global mortality rate differences associated with latitude and clinical observations of low serum 25-hydroxyvitamin D levels among hospitalized COVID-19 patients have perhaps garnered the greatest attention22, but we did not observe a significant association between genetic determinants of vitamin D levels and COVID-19 severity. However, we observed nominally significant protective effects for less-studied nutrient-related traits, including magnesium, calcium, retinol, and vitamin E. A combination of vitamin D, magnesium, and vitamin B12 was associated with a reduction in the proportion of elderly COVID-19 patients requiring oxygen support and intensive care support in a small prospective cohort23, and lower plasma retinol levels have also been observed in hospitalized COVID-19 patients24. We did not observe a significant association between serum Vitamin D levels and risk for COVID-19 or severe outcomes. Vitamin E levels have not been widely examined in the context of COVID-19, but deficiency is frequently associated with intestinal malabsorption rather than dietary insufficiency and thus may reinforce the observed genetic correlation between COVID-19 disease severity and diverticular disease in our analyses. In our study, we do observe an association between higher levels. Further, we observe protective associations for both HDL and serum concentration of apolipoprotein A, a major component of the HDL complex involved in clearing fat. HDL is involved in vitamin E absorption and contains approximately 40% of circulating α-tocopherol, the main dietary source of vitamin E25.

Integration and harmonization of extant large-scale GWAS datasets has become a popular approach to reveal novel epidemiologic associations. Still, access to individual-level GWAS datasets remains limited, because of data use restrictions. The LDSR method does not require individual-level genotype data or LD pruning and can quantify the shared genetic architecture of traits having undergone GWAS analysis. However, LDSR analysis assumes absence of population stratification in the underlying summary statistics used and necessitates incorporation of GWAS data from populations expected to have similar genomic architecture. This assumption restricted our analysis to use of GWAS data from British-ancestry individuals, limiting our ability to make conclusions about the shared genetic architectures among other racial/ethnic groups. Given that COVID-19 disease severity has been associated with racial/ethnic background, as well as socioeconomic status and somatic well-being, it is imperative that efforts be made to enrich future genetic epidemiology studies for participants of non-European descent to expand generalizability of results. Interpretation of our results is also limited by the strong correlation between many of the traits studied with BMI and smoking behaviors. Although we made efforts to limit the impact of BMI and smoking-associated genetic variation by excluding known loci from LDSR analysis, such sensitivity analyses cannot account for polygenic contributions not yet having reached genome-wide statistical significance in prior research.

To estimate cross-trait genetic correlation, we restricted the range of UKBB traits with an arbitrary threshold of h2 ≥ 1% to improve reliability. For instance, there were additional subtypes of diabetes derived from the various medical records in UKBB and their estimates of SNP-heritability showed h2Type 1 Diabetes = 0.3% and h2Type 2 Diabetes = 0.4%. Therefore, we did not include results from them. The diabetes trait reported in Table 1, described as “Diabetes diagnosed by doctor” (UKBB Field Identifier: 2443), does not specify the type of diabetes, but given the age of participants and the general prevalence of T2D versus T1D, the association between diabetes and COVID-19 severity is ostensibly driven by the shared genetic architecture between COVID-19 severity and T2D. Furthermore, LDSR analysis relies on common genetic variants with MAF > 1% and therefore can fail to capture the SNP-heritability on the observed scale due to underlying low-frequency or rare variants4. If a polygenic trait in UKBB shows a significant genetic correlation with COVID-19 severity, this does not imply a causal association. Both the tested trait and COVID-19 severity risk may be jointly influenced by an unmodeled trait that is independently associated with each19. Although our study relies on associations with common genetic variants (generally, MAF > 1%), the inclusion of additional rare genetic variants might be valuable as it could increase the overall trait heritability being modeled. Inferences from LDSR rely on normality assumptions that may be violated when rare variants are studied, and we therefore restricted analysis to more common variants. Additionally, LDSR does not explicitly model confounding effects which can arise when studying multiple correlated traits. Therefore, the method identifies novel associations that can be further studied using Mendelian Randomization or direct analyses of the nominated trait phenotypes for further confirmation of causal relationships. LDSR is a useful approach for identifying potential novel associations that will warrant further epidemiological analysis to tease apart causal associations from associations that are influenced by confounding.

Our findings support previously identified risk factors for severe COVID-19 illness, including elevated BMI, diabetes, and numerous pulmonary conditions (e.g., COPD, reduced FEV, shortness of breath during mild activity). We also observe protective associations between the genetic underpinnings of COVID-19 severity and that of non-smoking, serum albumin, apolipoprotein A, HDL cholesterol level, and several nutrients. Further studies using Mendelian randomization approaches may help to dissect causal associations between COVID-19 disease severity and these traits, potentially nominating targets for therapeutic intervention.