Prioritizing Causal Risk Factors for Severe COVID-19: An Exhaustive Mendelian Randomization Study

Identifying causal risk factors for severe coronavirus disease 2019 (COVID-19) is critical for its prevention and treatment. Many associated pre-existing conditions and biomarkers have been reported, but these observational associations suffer from confounding and reverse causation. Here, we perform a large-scale two-sample Mendelian randomization (MR) analysis to evaluate the causal roles of many traits in severe COVID-19. Our results highlight multiple body mass index (BMI)-related traits as risk-increasing: BMI (OR:1.89, 95% CI:1.51–2.37), hip circumference (OR:1.46, 1.15–1.85), and waist circumference (OR:1.82, 1.36–2.43). Our multivariable MR analysis further shows that the BMI-related effect is driven by fat mass (OR:1.63, 1.03–2.58), but not fat-free mass (OR:1.00, 0.61–1.66). Several white blood cell counts are negatively associated with severe COVID-19, including those of neutrophils (OR:0.76, 0.61– 0.94), granulocytes (OR:0.75, 0.601–0.93), and myeloid white blood cells (OR:0.77, 0.62–0.96). Furthermore, some circulating proteins are associated with an increased risk of (e.g., zinc-alpha-2-glycoprotein) or protection from severe COVID-19 (e.g., interleukin-3/6 receptor subunit alpha). Our study shows that fat mass and white blood cells underlie the etiology of severe COVID-19. It also identies risk and protective factors that could serve as drug targets and guide the effective protection of high-risk individuals.


Introduction
The coronavirus disease 2019 (COVID-19) is a global pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) 1 . As of mid-January 2021, 92 million con rmed cases and two million deaths from COVID-19 have been reported worldwide 2 . Despite substantial public health and medical efforts, COVID-19 continues to cause irreversible damage and death [3][4][5] . It is essential to identify risk factors and potential drugs for COVID-19 in order to improve primary prevention and to develop treatment strategies.
Many observational studies have reported that age, gender, ethnicity, and pre-existing conditions, such as cardiovascular disease, diabetes, chronic respiratory disease, hypertension, and cancers, are associated with increased COVID-19 susceptibility and severity [5][6][7][8] . Moreover, these retrospective observational studies have noted that hospitalized COVID-19 patients, especially those with severe respiratory or systemic conditions, are at increased risks of atrial brillation, non-sustained ventricular tachycardia, acute kidney injury, neurologic disorders, and thrombotic complications [9][10][11][12] . Vitamin D de ciency, higher body mass index (BMI), and obesity have been associated with an increased risk of COVID-19 13,14 . Some lifestyle factors were also identi ed as risk-increasing, such as smoking, alcohol consumption, and lack of physical activity 15 . However, it is di cult to infer causal effects from observational studies because they are susceptible to confounding and reverse causation, while data from randomized controlled trials are scarce and inconclusive.
Mendelian randomization (MR) study provides a promising opportunity to validate and prioritize putative risk factors and drug targets. MR studies use randomly allocated genetic variants related to the exposure as instrumental variables for investigating the effect of the exposure on an outcome 16 . It is expected to be independent of confounding factors and has been demonstrated as an e cient and cost-effective strategy to identify causal effects 17 . Recent MR studies have provided evidence of causality for a range of risk factors on COVID-19 (Supplementary Table 1). For instance, BMI and smoking are associated with an increased COVID-19 risk, while no evidence of causal effects was found for circulating 25-hydroxyvitamin D (25OHD) levels [18][19][20] . However, inconsistent results were also reported for some factors, such as Alzheimer's disease, blood lipids, and physical activity 18,19,[21][22][23][24][25][26] . Some of these inconsistencies are likely due to the usage of early genome-wide association studies (GWAS) of COVID-19, which have small sample sizes. Moreover, most studies are limited to a small number of candidate factors, leaving many more to be tested and identi ed.
In this study, we conducted an unbiased and exhaustive MR analysis to examine the causal effects of an extensive list of exposures on severe COVID-19. All traits with existing GWAS, as compiled by the Integrative Epidemiology Unit (IEU) Open GWAS project, were included 27 . Based on these GWAS, independent genetic variants associated with a trait at the genome-wide signi cance were selected as instrumental variables for the trait. The associations between genetic instruments and the risk of severe COVID-19 were evaluated based on three GWAS of COVID-19. The COVID-19 Host Genetics Initiative (HGI) study A2 was used in our discovery analysis. HGI A2 compared COVID-19 patients with con rmed severe respiratory symptoms to population controls 28 . The HGI B2 study, comparing hospitalized COVID-19 patients to population controls, was used as one of our two replication datasets. The other replication dataset, labeled as the NEJM study, was drawn from the rst published GWAS study of COVID-19 comparing patients with respiratory failure to healthy controls from Italy and Spain 29 . Multiple sensitivity analyses were performed to detect and correct for the presence of pleiotropy. Here, we only report associations that do not have evidence of pleiotropy in genetic instruments and are replicated in at least one of the two replication analyses. As an in-depth investigation into the BMI-related traits, we further conducted a multivariable MR analysis to disentangle the effects of fat mass and fat-free mass. Our ndings provide profound insights into the etiology of severe COVID-19 and prioritize candidate causal risk factors for public health intervention and for drug discovery.

Study overview
The work ow of our study is summarized in Figure 1. Starting with the 34,519 GWAS compiled by the IEU Open GWAS project, we focused on the 14,422 that are based on European-descent samples, in order to match the major ancestry in the GWAS of COVID-19 and to avoid false positives as results of population discrepancy in genetic effects (Supplementary Tables 2 and 3). Genetic instruments were selected from each GWAS as independent genetic variants at the genome-wide signi cance, and their effect alleles were harmonized with the outcome GWAS. Three or more genetic instruments are required for statistical tests of pleiotropic effects, and thus exposures with fewer instruments were excluded. For the univariable MR analysis of each exposure-outcome pair, we rst applied the inverse variance-weighted (IVW) method with a multiplicative random-effects model 30 . We then evaluated the possible presence of pleiotropic effects with Cochran's Q test of heterogeneity and the MR-Egger intercept test for directional pleiotropy [31][32][33] . We excluded all exposures with indications of pleiotropy in their genetic instruments to full ll the key assumptions underlying MR analysis. We retained 6,442 GWAS for the discovery analysis with the HGI A2 study (Supplementary Table 4), 6,407 GWAS for the replication analysis with the HGI B2 study (Supplementary Table 5), and 6,248 GWAS for the replication analysis with the NEJM study (Supplementary Table 6). The false discovery rate (FDR) approach was utilized to correct for multiple testing of many exposures (Supplementary Tables 7-9). Based on these three sets of analysis, we de ned two sets of results: 1) the signi cant and replicated results, which have a q-value < 0.05 in the discovery analysis and a nominal p-value < 0.05 in either one of the replication studies (Supplementary Table 10); and 2) the suggestive and replicated results, which have a nominal p-value < 0.05 in the discovery analysis and a nominal p-value < 0.05 in either one of the replication studies (Supplementary

BMI-related traits
In the univariable MR study, eight BMI-related traits are positively associated with severe COVID-19 in our discovery analysis and also in both of our replication analyses (Table 1). Genetically predicted one standard deviation (SD) increase of BMI is associated with a higher risk of severe COVID-19 (OR: 1.89, 95% CI: 1.51-2.37, p = 1.78 × 10 −6 ). Consistent with BMI, genetically instrumented higher hip circumference (OR: 1.46, 95% CI: 1.15-1.85, p = 0.0017) and waist circumference (OR: 1.82, 95% CI: 1.36-2.43, p = 6.20 × 10 −5 ) are associated with a higher risk of severe COVID-19. The univariable MR study also provided strong evidence that weight and fat mass in the left arm, right arm, left leg, right leg, trunk, and whole body are positively associated with severe COVID-19 (Supplementary Tables 10 and 11).
To pinpoint the different aspects of BMI-related traits, we investigated the roles of fat mass and fat-free mass indices in severe COVID-19 (Supplementary Table 13). In the multivariable MR analysis controlling for fat-free mass, there is strong evidence for direct causal effects of fat mass measured at different body parts, including the whole body, left and right arms, left and right legs, and the trunk. The evidence is consistent across the three GWAS of COVID-19 ( Fig. 2, Supplementary Table 13). On the other hand, there is no evidence for direct causal effects of fat-free mass (Fig. 3, Supplementary Table 13). The multivariable MR analysis results indicate that the causal effects of BMI-related traits on severe COVID-19 are mainly driven by fat mass.

Circulating proteins
Our univariable MR analyses revealed evidence of causal effects for some circulating proteins. There are six proteins whose effects on severe COVID-19 are signi cant in the discovery MR analysis (q-value < 0.05) and also replicated in both replication analyses (p-value < 0.05) ( Table 1, Supplementary Table 12 (Supplementary Table 11). Overall, our MR analyses prioritized scores of circulating proteins that are likely causal in the development of severe COVID-19.

Discussion
This exhaustive MR study examined an extensive list of risk factors and prioritized those that are likely to play causal roles in the development of severe COVID-19. It leveraged GWAS of COVID-19 of the largest sample size, and the ndings were replicated with one, and for some associations, two additional COVID-19 GWAS. Using univariable MR, we rst con rmed that BMI-related traits are causal risk factors for severe COVID-19. Our multivariable MR results further suggested that the effects of BMI-related traits are driven by fat mass but not fat-free mass. Moreover, our ndings indicate that white blood cell traits, particularly neutrophils, are inversely associated with the severe COVID-19 risk. We also highlighted scores of circulating proteins that could potentially serve as drug targets.
Our main nding that higher BMI-related traits increase the risk of severe COVID-19 is consistent with several recent MR studies 18,19,23,24 . Furthermore, our multivariable MR analysis further showed that fat mass is a causal risk factor for severe COVID-19, while fat-free mass is not. These results indicate that the causal effect of BMI on severe COVID-19 is likely driven by fat mass. The causal effects of BMI and fat mass have plausible biological mechanisms. Fat mass has been known to have deleterious effects on lung function, in ammation, and immunity [34][35][36] . In adipose tissue, high production of circulating proin ammatory cytokines and adipokines may intensify virally induced in ammation and immune dysregulation, and contribute to acute respiratory distress syndrome, which is the leading cause of mortality from COVID-19 [37][38][39][40] . Notably, other causal risk factors of severe COVID-19 identi ed in this study are also related to adiposity, including glucosamine, resistin, IL6Ra, prostate-associated microseminoprotein, and zinc-alpha-2-glycoprotein [41][42][43][44][45] . These connections suggest a shared mechanism for their contribution to severe COVID-19. Therefore, further mechanistic understanding of fat mass and other related risk factors will shed light on the etiology of severe COVID-19 and provide multiple targets of intervention for prevention and treatment.
Our study indicates that white blood cell traits, especially neutrophils, reduce the risk of severe COVID-19. In addition to direct evidence from neutrophils, sum basophil neutrophil counts and sum neutrophil eosinophil counts are directly related to neutrophils, and concordant causal effects were obtained using multiple MR methods. We also identi ed myeloid white blood cell counts and granulocyte counts as being inversely associated with the risk of severe COVID-19, which is consistent with our previous MR ndings 46 . In contrast to the negative associations in this MR study, previous observational studies have provided strong evidence that elevated white blood cells and neutrophils but depleted lymphocytes are common in COVID-19 patients [47][48][49][50] . This discrepancy highlights the possibility that observed associations are due to confounding and reverse causation. The causal role of neutrophils in preventing the development of severe COVID-19 has biological support. Neutrophils, the integral components of the innate immune system, are the rst line of defense against invading pathogens 51 . Moreover, neutrophils participate in elaborate cell signaling networks involving cytokines, chemokines, survival, and growth factors that cause downstream pro-in ammatory effects 52 . On the other hand, neutrophils are involved in the hyperin ammatory responses (e.g., overproduction of neutrophil extracellular traps and cytokine storm) in severe COVID-19 patients. This re ects the reverse causal effect of COVID-19 on neutrophils 53,54 . Overall, our present results support the causal effects of white blood cells, especially neutrophils, on severe COVID-19, likely through an enhanced immune response that suppresses virus infection in the early stage.
To identify potential drug targets, we found that six immune-related proteins are inversely associated with the risk of severe COVID-19. Interleukin-6 receptor subunit alpha is associated with a decreased risk of severe COVID-19, which is consistent with a recent MR nding 55 . Signaling of the interleukin-6 receptor subunit alpha in uences many in ammatory molecules and tissue regeneration 56 . Interleukin-3 receptor subunit alpha plays important functions in hemopoietic, vascular, and immune systems 57 . Prostateassociated microseminoprotein may in uence in ammation and cancer development 58 . C-C motif chemokine 23 is a chemotactic agent, which probably plays an important role in in ammation and atherosclerosis 59 . Collectin-10 can act as a cellular chemoattractant in vitro, probably involved in the regulation of cell migration 60 . Reticulon-4 receptor in uences the central nervous system and protects motoneurons against apoptosis 61 . The identi cation of these immune-related circulating proteins highlights the critical role of immune responses in the development of severe COVID-19.
Our study identi ed another six circulating proteins to be positively associated with an increased risk of severe COVID-19. Most of them are glycoproteins. The effect of zinc-alpha-2-glycoprotein might be mediated by the depletion of fatty acids from adipose tissues 62 . C1GALT1-speci c chaperone 1 might abolish a glycosyltransferase function and disrupt the O-glycan Core 1 synthesis 63,64 . Corneodesmosin is a glycoprotein expressed in the epidermis and the inner root sheath of hair follicles 65 . Inter-alpha-trypsin inhibitor heavy chain H1 is involved in cell adhesion and leukocyte migration in in ammation sites 66 . The alpha-2-macroglobulin receptor-associated protein is responsible for the role of exotoxin A in pseudomonas disease and immunity 67 . Resistin is known as a hormone that potentially links obesity to diabetes through resisting insulin action 45 . More in-depth mechanistic work is needed to better understand the physiological and biological processes through which these druggable proteins contribute to COVID-19 severity. While our study identi ed scores of circulating proteins, we cannot rule out the possibility that there are more COVID-19-related proteins. The number of genetic instruments is often limited for circulating proteins, precluding many of them from being analyzed. Our ndings of circulating proteins not only suggest possible etiological processes but also provide potential druggable targets.
Our study has many strengths. One strength as an MR study is the ability to assess causal effects, avoiding bias from reverse causation and residual confounding. A major feature and strength of our study is an unbiased and exhaustive approach to screen an extensive list of risk factors. To address the issue of multiple testing, we used FDR corrections in the discovery analysis. To ensure robustness and reduce false positives, we only reported results that were replicated in at least one replication analysis. Another strength is that we applied multivariable MR analyses to evaluate the independent causal effects of fat mass and fat-free mass.
Our study also has several weaknesses. Although we applied multiple sensitivity analyses, including the heterogeneity test, MR-Egger, and WM method, we could not fully rule out the possibility that some genetic variants might be pleiotropic. Another limitation is that some GWAS of exposures and the HGI GWAS of COVID-19 have overlapped samples, especially those from the UK Biobank. To mitigate this issue, we also utilized another GWAS of COVID-19, the NEJM study, which does not have overlapping samples. A further weakness is that the statistical power for some analyses was limited, and some null results might be false negatives. Further positive ndings may be revealed if more GWAS with larger sample sizes become available. As an effort to reduce complications from population strati cation, our study focuses on European ancestry, and thus the ndings may not be generalizable to other ethnicities. Lastly, further research is required to decipher the biological pathways underpinning the observed associations with severe COVID-19.
In conclusion, the present study provides evidence that the causal association between BMI-related traits and severe COVID-19 is driven by fat mass, but not by fat-free mass. Our ndings suggest that neutrophils, granulocytes, and myeloid white blood cells are inversely associated with the severe COVID-19 risk. Our study also identi es putatively causal associations between 12 circulating proteins and severe COVID-19. These ndings provide valuable insights into the etiology of severe COVID-19. These prioritized risk and protective factors could serve as drug targets and guide the effective protection of high-risk populations.

Exposure data sources
To obtain a comprehensive list of traits with existing GWAS, the summary statistics of 34,519 published GWAS were extracted from the latest MRC Integrative Epidemiology Unit (University of Bristol) GWAS database (https://gwas.mrcieu.ac.uk/). Details of each GWAS study can be found at https://gwas.mrcieu.ac.uk/datasets/. The R package TwoSampleMR (version 0.5.5) was applied to retrieve the IEU GWAS datasets 68 . The univariable MR study was conducted using the same package.
These GWAS were further ltered based on the following criteria: 1) European ancestry; 2) not eQTL studies, those labeled as "eqtl" from eQTLGen 2019 69 . A total of 14,422 GWAS summary datasets were retained and used in this study. Detailed information on the used data sources is available in Supplementary Table 2, while details for each exposure and its GWAS are available in Supplementary  Table 3.

Outcome data sources
For evaluation of the association with COVID-19 severity, the instrument-outcome effects were retrieved from the recent version of GWAS meta-analysis by the COVID-19 Host Genetics Initiative (HGI, release 4 alpha, accessed on October 9, 2020) 28 . Detailed information has been provided on the COVID-19 HGI website (https://www.covid19hg.org/results/). In our primary discovery analysis, we used the summary statistics based on the comparison of 2,972 patients con rmed as "very severe respiratory" COVID-19 with the 284,472 general population samples. This is called "the HGI A2 study" in this study.
To reduce false positives and to ensure the robustness of our discoveries, replication analyses were performed with two additional GWAS of COVID-19. One of them was also from the COVID-19 HGI, comparing 6,492 hospitalized COVID-19 patients with 1,012,809 control participants. We called it "the HGI B2 study". Only single nucleotide polymorphisms (SNPs) with imputation quality scores > 0.6 were retained. The other GWAS was on 1,610 COVID-19 patients with respiratory failure and 2,180 controls from Italy and Spain, and it was called "the NEJM study" 29 .

Selection of instrumental variables
For the implementation of MR, SNPs were selected based on the genome-wide signi cance threshold (p < 5 × 10 −8 ). To ensure SNPs are independent, we pruned the variants by linkage disequilibrium (LD) (R 2 threshold of 0.001 or clumping window within 10,000 kb). When target SNPs were not present in the outcome dataset, proxy SNPs were used instead through LD tagging (minimum LD R 2 threshold of 0.8). The effect alleles of selected genetic variants were harmonized across the exposure and outcome associations.
Univariable Mendelian randomization Two-sample Mendelian randomization analysis was undertaken using GWAS summary statistics for each exposure-outcome pair. In order to estimate the causal effect of each trait on severe COVID-19, the IVW method with a multiplicative random-effects model was used as the primary analysis 30,33,70 . Horizontal pleiotropy occurs when SNPs exert a direct effect on severe COVD-19 through other independent biological pathways. To assess the presence of heterogeneity among genetic instruments, Cochran's Q statistic was calculated for heterogeneity for the IVW analyses 31 . An extended version of Cochran's Q statistic (Rücker's Q′) can be estimated for the MR-Egger 32 . We used the MR-Egger intercept test to evaluate the extent to which unbalanced horizontal pleiotropy may affect the effect estimate 33 . To account for pleiotropy, additional sensitivity analyses were performed with the MR-Egger 33,70 , weighted median (WM) 71 , and weighted mode methods 72 . The MR-Egger method allows unbalanced horizontal pleiotropic effects even when all SNPs are invalid instruments 33 . The WM method can provide robust causal estimates when at least 50% of SNPs are valid genetic instruments, while the weighted mode method requires that the largest number of instruments that identify the consistent causal effect to be valid instruments 71,72 . The false discovery rate (FDR) approach was utilized to correct for multiple testing, and it was applied to the p values from the IVW random-effect model 73 . FDR controls the expected proportion of false positives among the hypotheses declared signi cant if the q-value is < 0.05, while the association was deemed to be suggestive if the unadjusted p-value is < 0.05.
Two additional exclusion criteria were applied to lter out exposures: 1) the number of genetic instruments was less than three. Three or more are required for statistical tests of pleiotropic effects and for statistical sensitivity analyses to correct for pleiotropy. 2) Exposures with indications of pleiotropy in their genetic instruments. The presence of pleiotropy violates the assumption of MR analysis. For the remaining exposures, FDR correction for multiple testing was applied separately for each analysis with the HGI A2, HGI B2, or NEJM study. To identify potential causal risk factors for severe COVID-19, we used two approaches to consider the evidence strength. First, the signi cant and replicated results were de ned as those with a q-value < 0.05 in the discovery analysis and a nominal p-value < 0.05 in either one of the replication studies (Supplementary Table S10). Second, the suggestive and replicated results were de ned as those with a nominal p-value < 0.05 in the discovery analysis and a nominal p-value < 0.05 in either one of the replication studies (Supplementary Table S11). All MR analyses were conducted in R with the TwoSampleMR package 68 . An analysis owchart is shown in Fig. 1.

Multivariable Mendelian randomization
As many BMI-related traits are typically correlated with each other, we conducted a two-sample multivariable MR (MVMR) analysis to explore independent causal risk factors for severe COVID-19 74 . SNPs associated with fat mass and fat-free mass were obtained from previous GWAS by MRC IEU and the Neale Lab through the TwoSampleMR package. The effects of genetically predicted fat mass and fatfree mass for each pair of the whole body, left arm, right arm, left leg, right leg, and trunk were estimated using the MVMR package (version 0.2.0) in R.