An observational and Mendelian randomisation study on vitamin D and COVID-19 risk in UK Biobank

A growing body of evidence suggests that vitamin D deficiency is associated with increased susceptibility to viral and bacterial respiratory infections. In this study, we aimed to examine the association between vitamin D and COVID-19 risk and outcomes. We used logistic regression to identify associations between vitamin D variables and COVID-19 (risk of infection, hospitalisation and death) in 417,342 participants from UK Biobank. We subsequently performed a Mendelian Randomisation (MR) study to look for evidence of a causal effect. In total, 1746 COVID-19 cases (399 deaths) were registered between March and June 2020. We found no significant associations between COVID-19 infection risk and measured 25-OHD levels after adjustment for covariates, but this finding is limited by the fact that the vitamin D levels were measured on average 11 years before the pandemic. Ambient UVB was strongly and inversely associated with COVID-19 hospitalisation and death overall and consistently after stratification by BMI and ethnicity. We also observed an interaction that suggested a greater protective effect of genetically-predicted vitamin D levels when ambient UVB radiation is stronger. The main MR analysis did not show that genetically-predicted vitamin D levels are causally associated with COVID-19 risk (OR = 0.77, 95% CI 0.55–1.11, P = 0.160), but MR sensitivity analyses indicated a potential causal effect (weighted mode MR: OR = 0.72, 95% CI 0.55–0.95, P = 0.021; weighted median MR: OR = 0.61, 95% CI 0.42–0.92, P = 0.016). The MR-PRESSO analysis did not find outliers among the instrumental variables and suggested a potential causal effect (OR = 0.80, 95% CI 0.66–0.98, P = 0.030). In conclusion, the effect of vitamin D levels on the risk or severity of COVID-19 remains controversial, and further studies are needed to validate vitamin D supplementation as a means of protecting against worsened COVID-19.

A growing body of evidence shows that vitamin D deficiency might be associated with an increased susceptibility to viral and bacterial respiratory infections [1][2][3]. Similar findings have recently been reported for COVID-19: by analysing publicly available patient data, researchers have found a strong correlation between vitamin D deficiency and COVID-19 risk [4]. Furthermore, evidence suggests that COVID-19 disproportionately affects black and minority ethnic individuals, with one potential explanation being the higher prevalence of vitamin D deficiency, in addition to other risk factors [5]. It is thus hypothesised that having adequate vitamin D levels may help reduce the risk of contracting the SARS-CoV-2 virus or reduce the risk of severe or lethal COVID-19 disease.
To explore the causal role of vitamin D in COVID-19 risk, at least three Mendelian Randomisation studies have used genetic variants associated with serum 25-OHD as instrumental variables [6][7][8]. The available evidence indicates that genetic predisposition to lower vitamin D levels is not causally associated with SARS-CoV-2 infection or severe COVID-19 disease [6][7][8][9]. It is important to note that genetic heritability of vitamin D status is high in winter, but in summer vitamin D status might be predominantly determined by environmental factors regulating exposure to ultraviolet B (UVB) radiation, including season and geographical latitude [10]. Therefore, an integrative measure of vitamin D levels determined by both genetics and ambient UVB radiation during the pandemic would provide more comprehensive insight into the causal relationship between vitamin D and COVID-19 risk.
The main aim of the current study is to perform Mendelian Randomisation (MR) analyses investigating the effect of genetically-predicted vitamin D levels on COVID-19 risk while taking into account ambient UVB radiation at the time of infection, and to compare these findings with results from the observational analysis. We first conducted an observational study to examine the associations between measured vitamin D levels and COVID-19 risk. We then performed an MR analysis using genetically-predicted vitamin D levels, and applied a novel approach that enabled us to estimate the UVB exposure preceding COVID-19 onset to account for seasonal differences.

Methods
Data sources. Basic demographic information and genotype data on 495,780 participants from UK Biobank [11], a large prospective study, were linked to COVID-19 test results (for the period 16/03/2020 to 29/06/2020, provided by Public Health England), including the specimen date, origin (whether the person was an inpatient or not) and result (positive or negative), and to deaths caused by clinically and epidemiologically diagnosed COVID-19 recorded in the death registry. Confirmed COVID-19 cases were defined as UK Biobank participants who had at least one positive test result or died of COVID-19. Participants who had not been tested for SARS-CoV-2 were taken as controls. We additionally excluded the following participants from the cohort: (1) those who tested negative, since test results could have been false negatives; (2) participants from Scotland and Wales, since all COVID-19 test results were provided by NHS England only; (3) participants who died before 01/01/2020, since they had no chance of being infected by SARS-CoV-2. Total plasma 25-hydroxy-vitamin D (25-OHD) was measured at the baseline assessment visits between 2006 and 2010 (a median of 11 years before the COVID-19 pandemic) using an immunoassay (Diasorin). To remove the effect of sampling season on 25-OHD levels, we generated May-standardised 25-OHD levels for all participants (approximating the 25-OHD concentration had blood been drawn in May) by applying coefficients generated in a model restricted to controls and adjusted for age and sex [12]. Vitamin D status was further categorised as deficient (25-OHD < 25 nmol/L), insufficient (25-50 nmol/L), or sufficient (> 50 nmol/L). A total of 138 genetic variants have recently been reported to be associated with vitamin D in the largest Genome Wide Association Study (GWAS; n = 443,734) [13].
We excluded ambiguous AT and CG variants (n = 4: rs184958517, rs200641845, rs529640451, rs536006581) to avoid bias due to strand differences between studies, leaving 134 SNPs as genetic instruments for the MR analysis. The effects of vitamin D SNPs on COVID-19 outcomes were examined in UK Biobank participants of White ancestry only, to minimise the influence of population structure. A weighted genetic risk score (wGRS) was calculated as a proxy for genetically-predicted 25-OHD levels reflecting life-long exposure in the UK Biobank White population, using effect estimates reported by Manousaki et al. [13]. Dermal synthesis following exposure to UVB radiation is a major source of vitamin D for humans. We used ambient UVB radiation to approximate vitamin D status attributable to dermal synthesis (vitD-UVB) at the time of COVID-19 diagnosis. To do this, we calculated the cumulative and weighted vitD-UVB dose from the TEMIS database, version 2.0 (http://www.temis.nl/uvradiation/UVdose.html). Briefly, we extracted the daily UVB dose at wavelengths that induce vitamin D synthesis at each participant's residential location over the 135 days preceding the date of diagnosis for cases. Dates were randomly allocated to controls from a distribution identical to that observed in cases. We weighted the daily UVB contributions before summing them because recent UVB exposure contributes more than exposure from the more distant past, as vitamin D is continuously synthesised and used up. More details on the calculation are presented elsewhere [14][15][16] and in "Supplementary Methods".
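The three data-preparation steps described above (dropping strand-ambiguous variants, building a weighted genetic risk score, and accumulating a recency-weighted UVB dose) can be sketched in a few lines. This is a minimal illustration and not the study's implementation: the function names are hypothetical, and the exponential-decay weighting with its `half_life_days` parameter is an assumption, since the actual weights are only given in the Supplementary Methods and the cited references.

```python
import math

# Complementary bases: an A/T or C/G variant reads the same on both strands,
# so its effect allele cannot be aligned reliably between studies.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_strand_ambiguous(allele1, allele2):
    """Return True for palindromic (A/T or C/G) variants."""
    return COMPLEMENT.get(allele1.upper()) == allele2.upper()


def weighted_grs(dosages, betas):
    """Weighted genetic risk score for one participant.

    dosages: count (0-2) of the 25-OHD-increasing allele at each SNP.
    betas:   per-allele effect estimates from the discovery GWAS.
    """
    if len(dosages) != len(betas):
        raise ValueError("dosages and betas must have the same length")
    return sum(beta * dosage for beta, dosage in zip(betas, dosages))


def weighted_uvb_dose(daily_doses, half_life_days=30.0):
    """Cumulative UVB dose with more weight on recent days.

    daily_doses[0] is the most recent day before diagnosis and
    daily_doses[-1] the most distant (e.g. 135 values in total).
    ASSUMPTION: exponential decay with a hypothetical half-life; the
    study's actual weighting scheme is not specified in the main text.
    """
    decay = math.log(2.0) / half_life_days
    return sum(dose * math.exp(-decay * age)
               for age, dose in enumerate(daily_doses))


# Illustrative values only (not study data).
print(is_strand_ambiguous("A", "T"), is_strand_ambiguous("A", "G"))
print(weighted_grs([0, 1, 2], [0.10, 0.20, 0.30]))
print(weighted_uvb_dose([2.0] * 135))
```

Applying `is_strand_ambiguous` as a filter before computing `weighted_grs` mirrors the order used in the text, where the four ambiguous variants are removed first and the score is built from the remaining 134 SNPs.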
Because the vitamin D receptor (VDR) may modify the biological effects of vitamin D, five variants associated with VDR function (rs7975232, rs1544410, rs2228570, rs731236 and rs11568820) were tested for effect modification. Multiplicative interaction terms were added to the logistic regression model to examine whether carrying these VDR polymorphisms modified the effect of vitamin D on COVID-19 risk.
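One way to write the effect-modification model described above is given below as a sketch. The symbols are introduced here for illustration: D denotes a vitamin D variable, G a VDR genotype (its coding, for example carrier status or allele count, is an assumption, as the text does not specify it), and Z the vector of adjustment covariates.

```latex
\operatorname{logit}\,\Pr(\text{COVID-19} = 1)
  = \beta_0 + \beta_1 D + \beta_2 G + \beta_3 \,(D \times G)
  + \boldsymbol{\gamma}^{\top}\mathbf{Z}
```

Under this parameterisation, effect modification corresponds to \beta_3 \neq 0: the odds ratio per unit of D is \exp(\beta_1) when G = 0 and \exp(\beta_1 + \beta_3) when G = 1.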

Statistical analysis.
In the descriptive analysis, the mean and standard deviation (SD) are given for continuous variables, and the number (N) and proportion for categorical variables, unless indicated otherwise. Logistic regression modelling was used to estimate the effect of vitamin D variables on COVID-19 risk (the risk of infection, hospitalisation and death) after adjustment for a range of covariates, including age, sex, deprivation index, body mass index (BMI), month of blood draw, ethnicity, physical activity, smoking and alcohol status, sunshine exposure variables, vitamin D supplement intake, and comorbidities of cardiovascular diseases (CVDs), diabetes, asthma, and malignancy. Specifically, we investigated the associations between COVID-19 and: (1) vitamin D levels (circulating 25-OHD concentration, May-standardised 25-OHD concentration, and categorical vitamin D status); (2) vitD-UVB, an integrated measure of ambient UVB radiation during the pandemic; (3) genetically-predicted 25-OHD concentration using the wGRS (vitD-wGRS 134), in fully adjusted models as described above. When analysing the association between vitD-wGRS 134 and COVID-19 risk, we additionally adjusted for the first 20 genetic principal components and genotyping panel to account for potential confounding by population structure. Bonferroni correction was applied to account for multiple testing. As we tested the associations between five vitamin D variables (vitD levels, vitD-May-adjusted, vitD-categorical, vitD-UVB and vitD-wGRS 134) and three COVID-19 outcomes (risk of infection, hospitalisation, and death), we set the significance threshold at p < 0.003 (0.05/15). We also tested their interactions with VDR SNPs. For MR analyses, vitamin D SNPs were aligned to the vitamin D-increasing alleles, and the genetic associations between vitamin D SNPs and COVID-19 infection risk were estimated with adjustment for age, sex, the first 20 genetic principal components (PCs) and genotyping panel. www.nature.com/scientificreports/
The inverse-variance weighted (IVW) MR approach was used as the main analysis, with the simple mode, MR-Egger, weighted median, weighted mode and MR-Pleiotropy RESidual Sum and Outlier (MR-PRESSO) methods as sensitivity analyses to explore the robustness of the findings in the presence of potential pleiotropy of the genetic variants [17]. The statistical power of the MR analysis was calculated using the non-centrality parameter-based approach [18], and the overall proportion of variance (R²) of vitamin D levels explained by the genetic instruments was estimated using the measured vitamin D levels in the study population. Details of these MR approaches, including their different assumptions, are provided in "Supplementary Methods" and elsewhere [19][20]. All analyses were conducted using R version 3.6.1.
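As an illustration of the main analysis, the fixed-effect IVW estimator can be sketched as a weighted average of per-SNP Wald ratios. This is a minimal sketch rather than the analysis code used in the study (which was run in R): the function names and the example numbers are hypothetical, and the weights use the common first-order approximation that ignores uncertainty in the SNP-exposure estimates.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate.

    beta_exposure: SNP effects on 25-OHD (aligned to the increasing allele).
    beta_outcome:  SNP effects on the outcome (log odds ratios).
    se_outcome:    standard errors of beta_outcome.
    Returns (causal log odds ratio per unit exposure, its standard error).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    # Per-SNP Wald ratio and its first-order inverse-variance weight.
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    weights = [(bx / se) ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    total_weight = sum(weights)
    beta = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    return beta, se


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log odds ratio and its SE to (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical summary statistics for three SNPs (not study data).
bx = [0.10, 0.20, 0.30]
by = [-0.05, -0.10, -0.15]
se_by = [0.10, 0.10, 0.10]
beta_ivw, se_ivw = ivw_estimate(bx, by, se_by)
print(odds_ratio_with_ci(beta_ivw, se_ivw))
```

The sensitivity methods named above (MR-Egger, weighted median, weighted mode, MR-PRESSO) relax the assumption, implicit in this estimator, that every variant is a valid instrument.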
Ethics approval and consent to participate. UK Biobank has approval from the North West Multi-Centre Research Ethics Committee (11/NW/0382) and obtained written informed consent from all participants prior to the study. No consent to participate was required.
Ethical statement. The co-authors confirm that all methods in this study were carried out in accordance with relevant guidelines and regulations.

Results
A total of 14,439 COVID-19 tests were conducted in UK Biobank participants. Among those tested, 1596 individuals had at least one positive COVID-19 test and 1020 of them were hospitalised. An additional 399 COVID-19 deaths were identified from the death registry. Table 1 presents the basic demographic characteristics of the cohort. In multivariable regression analysis, vitD-UVB at recruitment was strongly associated with 25-OHD concentrations at recruitment (beta = 0.11, p < 2 × 10^-16, R² = 0.19). The variance of 25-OHD concentration at recruitment explained by vitD-UVB at recruitment alone was 12.4%, by vitD-GRS 134 alone 4.2%, and by vitD-GRS 134 and vitD-UVB together with covariates in a multivariable model 23.1%. Given the number of COVID-19 patients and the percentage of variance (4.2%) explained by vitamin D-related genetic variants, the main MR analysis was adequately powered (> 80%) to detect a moderate to large causal effect, with an odds ratio (OR) less than 0.68 (or greater than 1.32) per SD change in standardised natural-log transformed 25-OHD levels.
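The power statement above can be illustrated with a rough sketch. It assumes a simple non-centrality-parameter approximation for a binary outcome, NCP ≈ N · R² · K(1 − K) · (ln OR)², where K is the proportion of cases; this may differ from the exact formula in the cited approach [18]. The function names are hypothetical, and the example inputs use the overall cohort size and case count from the abstract, whereas the MR analysis was restricted to participants of White ancestry, so the sketch is not expected to reproduce the reported threshold exactly.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def mr_power_binary(n_total, case_fraction, r2, odds_ratio, z_crit=1.959964):
    """Approximate power of a two-sided MR test for a binary outcome.

    ASSUMPTION: NCP = N * R^2 * K * (1 - K) * ln(OR)^2, a simplified
    approximation chosen for illustration.
    """
    log_or = math.log(odds_ratio)
    ncp = n_total * r2 * case_fraction * (1.0 - case_fraction) * log_or ** 2
    root = math.sqrt(ncp)
    # A non-central chi-square with 1 df is the square of N(root, 1), so
    # P(chi2 > z_crit^2) = P(|N(root, 1)| > z_crit).
    return normal_cdf(root - z_crit) + normal_cdf(-root - z_crit)


# Illustrative inputs (cohort totals from the abstract; see caveat above).
n = 417_342
k = 1746 / n
for candidate_or in (0.60, 0.68, 0.80, 0.90):
    print(candidate_or, round(mr_power_binary(n, k, 0.042, candidate_or), 3))
```

Power falls quickly as the odds ratio approaches 1, which is consistent with the authors' caution that small causal effects could not be detected.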

Discussion
In this study, we assessed whether there is an association between vitamin D and COVID-19 risk and severity by examining a comprehensive set of key vitamin D variables jointly for the first time, and by applying a number of analyses to probe the consistency of our findings. We consistently found a strong inverse association between an integrated measure of ambient UVB preceding disease onset (vitD-UVB) and disease severity. Unsurprisingly, we found no strong association between vitamin D levels (plasma 25-OHD concentration measured at recruitment, around 11 years earlier) and COVID-19 risk or severity after adjustment for confounders, in accordance with the recent study by Hastie et al. [21]. In this cohort, vitD-UVB explained the largest portion of the variance in 25-OHD at recruitment: vitD-UVB alone explained 12.4%, while vitD-GRS 134 alone explained 4.2%. Previous studies have shown that heritability of 25-OHD is high in winter and low in summer, which suggests that the role of genetic factors varies with UVB intensity [10]. It is therefore not surprising that we found evidence of an interaction between vitD-UVB and the vitamin D genetic risk score, and these findings highlight the added value of examining genetically-predicted levels and ambient UVB jointly. MR sensitivity analyses using the weighted median and weighted mode methods indicated a potential causal effect, although the main MR analysis showed that genetically-predicted vitamin D levels were not causally associated with COVID-19 risk.
Table 1. Baseline characteristics of the COVID-19 cases and controls in UK Biobank. Columns: Controls (n = 415,596); Outpatient (n = 576); Inpatient (n = 1020); Death (n = 399). First row: Gender, N (%). a Dates were randomly allocated to controls for the calculation of vitD-UVB based on the distribution that was identical to that observed in cases.
UK Biobank is a large prospective study, with rich information on a range of demographic, lifestyle and health-related risk factors. Vitamin D plasma measurements were conducted in a single central processing laboratory using the Diasorin immunoassay, although the blood samples were taken over a decade ago and are unlikely to be representative of participants' vitamin D status at the time of the pandemic. We have partially addressed this by using genetic instruments (which are determined by DNA sequence and hence do not vary over time) to derive genetically-predicted vitD levels. It is important to note that heritability of vitamin D status is high in winter (70-90%), but levels might be entirely determined by environmental factors in the summer [10]. Therefore, we also included an integrative measure of ambient UVB radiation during the pandemic. Vitamin D status is highly correlated with numerous factors, many of which are also linked with poorer health. By using genetically-predicted vitamin D levels, the MR approach offers a unique opportunity to bypass confounding originating from these associations. However, vitamin D status varies seasonally, due to the overpowering effect of solar radiation and the dermal production it induces. To account for these seasonal differences, we used a novel approach that enabled us to estimate the UVB exposure preceding disease onset. One of the key strengths of this study is that we included this covariate in the analysis, with and without modelling the interaction, which enabled us to account for the time-varying nature of the relationship that is commonly a major issue for vitamin D MR studies. The discriminatory power of the UVB variable is somewhat limited in this study, because UVB radiation is low at this time of the year, particularly at the high northern latitude of the UK; larger effects might be observed if variation in UVB were greater. We only used ambient UVB, and did not capture individual behavioural differences that would determine the actual level of vitamin D synthesis in the skin, such as duration and time of day spent outside, clothing, etc. It is important to note that time of year is the strongest predictor of vitD-UVB.
To avoid bias, control dates were assigned to follow the same distribution as case dates, which might have artificially diminished differences in vitD-UVB between cases and controls; however, analyses relating to hospitalisation and death are not affected by this. We also analysed genetically-predicted vitamin D using a number of state-of-the-art MR methods. However, the main limitation is the lack of power. Given the small number of COVID-19 patients and the relatively small percentage of variance (4.2%) explained by vitamin D-related genetic variants, this MR study was not adequately powered to detect small causal effects, and negative results should be interpreted with caution. Additionally, MR studies only consider linear effects between 25-OHD levels and COVID-19 risk, and so do not capture what happens at the extremes of vitamin D deficiency. Therefore, we cannot rule out the possibility that seriously ill patients (due to an underlying pathology) with extremely low vitamin D levels could be predisposed to COVID-19 infection and increased COVID-19 severity and mortality. Furthermore, 25-OHD levels were used as the biomarker of vitamin D status in the study population; however, they correlate poorly with the active form of vitamin D (1,25-OH2D), which exerts the effects of vitamin D at the cellular level. Thus, this study cannot exclude effects of 1,25-OH2D on COVID-19 risk.
Table 2. Association between Vitamin D and COVID-19 risk in multivariable regression models. a Adjusted for age, gender, body mass index (BMI), month of blood draw (adjusted for vitD and vitD-categorical only), ethnicity, physical activity, smoking and alcohol status, sunshine exposure variables (i.e., time spent outdoors in summer, time spent outdoors in winter and the use of sun/UV protection), vitamin D supplement intake, deprivation index, and comorbidities of CVDs, diabetes, asthma, and malignancy. b Multivariable model was additionally adjusted for the first 20 genetic principal components and genotype panel. c Multivariable regression was fitted by including both vitD-wGRS 134 and vitD-UVB in the same model to examine the effects of genetically predicted vitamin D levels and ambient UVB jointly.
Another limitation of this cohort is that not all participants have been tested for present (or past) COVID-19 infection; consequently, taking participants who were not tested as controls could be a source of bias, given that misclassification of controls might be substantial due to the presence of asymptomatic infected individuals, further driving our findings towards the null. This is evident from the 1:2 ratio of outpatient to inpatient cases. It should be acknowledged that the COVID-19 cases in UK Biobank have a high rate of hospitalisation due to the very limited and targeted testing at this stage of the pandemic in the UK, so this study reflects mainly those with more severe COVID-19 and gives less information about true infection risk, or risk of milder disease. In addition, we excluded individuals with a negative COVID-19 test result from the controls due to the risk of these being false negatives. Although this risks introducing selection bias, we believe that the risk of introducing misclassification bias by including them in the analysis could be higher [22][23].
Additionally, given the presence of asymptomatic infected individuals, taking participants who were not tested as controls could be another potential source of bias. Our study assessed the effect of genetically predicted vitamin D levels on COVID-19 risk while taking into consideration ambient UVB radiation during the pandemic. We show an indication of an inverse association between genetically predicted vitamin D levels and severe COVID-19. Findings from our study are consistent with a recent randomised controlled trial (RCT) that found a protective effect of vitamin D supplementation among those hospitalised with COVID-19 [24]. However, other clinical trials did not show an effect. For instance, a randomised trial of 240 patients showed that supplementation with a single very large dose of 200,000 IU of vitamin D3, which increased serum vitamin D levels (21-44 ng/ml), was nonetheless ineffective in reducing the length of hospital stay or improving any other clinical outcome among hospitalised patients with severe COVID-19 [25]. It has been estimated that one SD change in standardised natural-log transformed 25-OHD levels corresponds to a change in 25-OHD levels of 29.2 nmol/l in vitamin D insufficient individuals (serum 25-OHD levels < 50 nmol/l), which is comparable to the 21.2 nmol/l mean increase in 25-OHD levels conferred by taking 400 IU of cholecalciferol daily, the amount of vitamin D most often found in vitamin D supplements [26]. This estimate has clinical implications for the dose of vitamin D supplementation used for disease prevention. Given the lack of highly effective therapies against COVID-19, it is important to remain open-minded to emerging results from rigorously conducted studies of vitamin D.
In conclusion, we found no significant associations between COVID-19 risk and measured 25-OHD levels after adjustment for covariates, but this finding is limited by the fact that the vitamin D levels were measured on average 11 years before the pandemic. Ambient UVB was strongly and inversely associated with COVID-19 hospitalisation and death. The main MR analysis did not show that genetically-predicted vitamin D levels were causally associated with COVID-19 risk, although MR sensitivity analyses indicated a potential causal effect. Overall, the effect of vitamin D levels on the risk or severity of COVID-19 remains controversial, and further studies are needed to validate vitamin D supplementation as a means of protecting against worsened COVID-19.

Data availability
Data used in this study were obtained from UK Biobank under an approved data request application (application ID: 10775).