From the end of 2019 until now, a new coronavirus named SARS-CoV-2 has been causing a worldwide pandemic, leading to many deaths due to an acute respiratory disease named COVID-19. The lack of knowledge about the pathogenesis of this disease and the reasons underlying its different clinical outcomes have been pushing society as a whole to take preventive measures and to accelerate research. During the last year, several research groups and healthcare institutions have reported evidence that an uncontrolled inflammatory response is a major cause of acute respiratory distress syndrome1.

A statistical analysis of COVID-19 pandemic data from healthcare centers across China, France, Germany, Italy, Iran, South Korea, Spain, Switzerland, the United Kingdom and the United States, performed by Northwestern University, shows a strong correlation between severe vitamin D deficiency and mortality rates2.

Vitamin D has many relevant and documented roles in general health maintenance, and its deficiency is particularly associated with severe impacts on the functional integrity of the immune system, such as effects on cytokine production3. It has been hypothesised that vitamin D could terminate the “cytokine storm” and oxidative stress, with possible antiviral activity through its classical precursors and novel CYP11A1-derived hydroxyderivatives4. Vitamin D deficiency has also been linked to hypertension and to autoimmune, infectious and cardiovascular diseases, which are also known risk factors for severe COVID-19 outcomes5,6.

Previous research shows that genetics contributes up to 28% of the inter-individual variability in serum 25(OH)D concentrations, while season and vitamin D intake explain another 24%7. Large-scale genome-wide association studies (GWAS), including 79,366 individuals of European ancestry8 or imputed genotypes from 401,460 white British UK Biobank participants9, have evaluated the impact of genetic variation on circulating 25(OH)D concentrations and identified relatively common single nucleotide polymorphisms with an important biological role in vitamin D metabolism, transport, degradation and downstream pathways. Research on vitamin D polymorphisms also shows that not all individuals respond to vitamin D supplementation, owing to loss of function in different genes10.

Under normal circumstances, vitamin D deficiency is known to be common in Europe and the Middle East: it affects < 20% of the population in Northern Europe, 30–60% in Western, Southern and Eastern Europe, and up to 80% in Middle Eastern countries11. Data from Portugal show that 66% of adults have vitamin D insufficiency or deficiency12. This high percentage, in a population living in a sunny country, motivated the characterization of the prevalence of high-impact alleles in genes associated with the vitamin D pathways in the Portuguese population, compared with the European population.

The main objective of this study was to determine whether polymorphisms in vitamin D-related genes are associated with vitamin D levels, and whether these variables are associated with COVID-19 severity. A secondary objective was to compare the frequency of the genetic variants under analysis with that observed in the European population. To achieve these objectives, we assessed genetic variants in vitamin D-related genes and serum vitamin D levels in hospitalized patients, and evaluated the association between these data and disease severity.

Results and discussion

A total of 491 patients with a laboratory-confirmed positive COVID-19 test were considered: 371 (75.6%) from Santa Maria hospital and 120 (24.4%) from São João hospital. There were 217 female and 266 male patients, with a mean ± SD age of 69.7 ± 15.8 years (see supplemental data for demographic, clinical and phenotypic data).

Among COVID-19-positive patients, 311 (63.3%), 120 (24.4%) and 59 (12.0%) had deficient, insufficient and sufficient vitamin D levels, respectively, according to the Endocrine Society cut-offs (see supplemental data). The prevalence of vitamin D deficiency was 61.7% at Santa Maria hospital and 68.3% at São João hospital.

Disease severity for each patient was defined using the World Health Organization (WHO) clinical progression scale13, a discrete scale ranging from 0 to 10; in this study, patients’ severity levels ranged from 4 to 10. Death, severe disease and moderate disease were observed in 18.5%, 21.8% and 59.7% of patients, respectively. Of the 311 patients with vitamin D deficiency, 68 died (13.8% of all patients), 69 (14.0% of all patients) had severe disease and 174 (35.4% of all patients) had moderate disease.
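The text does not state how the 4–10 range of the WHO scale maps onto the moderate, severe and dead groups. The sketch below assumes the standard WHO bands (4–5: hospitalized, moderate disease; 6–9: hospitalized, severe disease; 10: dead); the function name `severity_group` is illustrative only and not part of the study.

```python
def severity_group(who_score: int) -> str:
    """Map a WHO clinical progression scale score (0-10) to an outcome group.

    Assumption: standard WHO bands, i.e. 4-5 = hospitalized, moderate disease;
    6-9 = hospitalized, severe disease; 10 = dead. The study's exact mapping
    is not stated in the text.
    """
    if not 0 <= who_score <= 10:
        raise ValueError("the WHO clinical progression scale runs from 0 to 10")
    if who_score == 10:
        return "dead"
    if who_score >= 6:
        return "severe"
    if who_score >= 4:
        return "moderate"
    # Scores 0-3 (uninfected or ambulatory) do not occur in a hospitalized cohort
    return "not hospitalized"


if __name__ == "__main__":
    # Hypothetical scores, not study data
    print([severity_group(s) for s in (4, 5, 6, 9, 10)])
    # ['moderate', 'moderate', 'severe', 'severe', 'dead']
```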

Identification of the vitamin D polymorphisms as risk biomarkers

Previous large-scale GWAS have identified several single nucleotide polymorphisms with an important biological role in vitamin D metabolism, transport, degradation and downstream pathways that affect circulating 25(OH)D concentrations8,9. To assess whether polymorphisms in vitamin D-related genes are associated with disease severity, four polygenic risk scores (PRSs) were defined, each combining contributions from a different set of genes: Synthesis (DHCR7; CYP2R1); Metabolism (GC; CYP24A1); Pathway (DHCR7; CYP2R1; GC; CYP24A1) and Vitamin D total (DHCR7; CYP2R1; GC; CYP24A1; AMDHD1; SEC23A).

Table 1 shows the correlations of the PRSs with COVID-19 disease severity and with vitamin D levels. The results show a significant positive correlation between the Metabolism score and COVID-19 disease severity, and a significant negative correlation between the Vitamin D total score and vitamin D levels.

Table 1. Correlation of the PRSs with COVID-19 disease severity and vitamin D level (Spearman correlation).

The relation between the Vitamin D total score and vitamin D levels is clearly visible in Fig. 1. Higher PRS values are associated with deficient vitamin D levels, with a greater concentration of points at median-to-high PRS values among vitamin D-deficient patients.

Figure 1

Vitamin D total score by vitamin D status. The PRS is represented as a continuous value. Patients were divided into three groups according to their vitamin D level, a continuous variable (Deficient: red; Insufficient: yellow; Sufficient: blue). Within each group, patients are sorted in ascending order of PRS.

A deeper analysis of the positive correlation between the Metabolism score and COVID-19 disease severity showed that the GC RS2282679 polymorphism, located in GC, the gene encoding the vitamin D binding protein, could explain most of the observed correlation. The association of this polymorphism with severity has ρ = 0.13 (p-value = 0.005).
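The per-variant association above is a Spearman rank correlation. A minimal, dependency-free sketch of how ρ can be computed between a genotype (coded here, as an assumption, as the number of risk alleles, 0–2) and the WHO severity score is shown below; ties, which are frequent with genotype data, receive average ranks. In practice `scipy.stats.spearmanr` returns both ρ and its p-value; the function names and toy data here are illustrative.

```python
from math import sqrt


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical toy data: risk-allele counts and WHO severity scores
    genotype = [0, 0, 1, 1, 2, 2, 0, 1]
    severity = [4, 5, 4, 6, 7, 10, 4, 5]
    print(round(spearman_rho(genotype, severity), 3))
```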

This finding is particularly relevant because only one previous GWAS meta-analysis, from the COVID-19 Host Genetics Initiative, comparing hospitalized vs. non-hospitalized COVID-19 patients had briefly highlighted it. In that meta-analysis, GC RS2282679 correlated significantly with COVID-19 disease severity (p-value = 0.002).

The association of the GC RS2282679 polymorphism with disease severity might be explained by a role of the vitamin D binding protein beyond the transport of vitamin D in the bloodstream. Available data indicate that this protein may also act as a neutrophil chemotactic factor and a macrophage activator, thereby actively participating in the inflammatory process14,15,16. The vitamin D binding protein is also an extracellular scavenger of actin released from damaged or dead cells; when in excess, actin can cause intravascular coagulation, resulting in multi-organ dysfunction and cardiac arrest17.

The functional consequence of this polymorphism in this pathway needs to be explored in future studies.

Relation between vitamin D level and COVID-19 disease severity

We observed a trend in which the proportion of patients with deficient vitamin D levels (< 20 ng/mL) increased with disease severity, from moderate to severe disease and death. The proportion of patients with deficient vitamin D levels was higher in the group that died (76%) than in the moderate (59%) and severe (64%) disease groups. Conversely, the combined proportion of patients with insufficient or sufficient vitamin D levels was 40% in the moderate group, compared with 24% among deceased patients (see supplemental material).

This trend was confirmed when a binary outcome (survived vs. died) was used (Fig. 2). Vitamin D levels were significantly lower in patients who died than in those who survived (median 11.70 ng/mL [Q1 = 8.67, Q3 = 19.67] vs. 17.40 ng/mL [Q1 = 11.00, Q3 = 24.60]; p-value = 1.5e−4, Mann–Whitney U test). These results corroborate findings from other studies worldwide.
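The survived-versus-died comparison uses a Mann–Whitney U test. As a hedged sketch, not the authors’ code, the statistic and a two-sided p-value from the normal approximation with tie correction can be computed as follows. `scipy.stats.mannwhitneyu` applies a continuity correction by default and may use an exact method for small samples without ties, so its p-values can differ slightly; the sample values below are hypothetical.

```python
from collections import Counter
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Uses the tie-corrected variance and no continuity correction.
    Returns (U for sample x, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    # U counts the (x, y) pairs in which x is larger; ties count one half
    u1 = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    n = n1 + n2
    mean_u = n1 * n2 / 2
    # Tie correction: sum of (t^3 - t) over each group of t tied values
    tie_term = sum(t ** 3 - t for t in Counter(list(x) + list(y)).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every observation identical: no evidence of a shift
        return u1, 1.0
    z = (u1 - mean_u) / sqrt(var_u)
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p_value


if __name__ == "__main__":
    # Hypothetical vitamin D levels (ng/mL), not study data
    died = [8.1, 9.5, 11.7, 12.0, 19.7]
    survived = [11.0, 15.2, 17.4, 22.3, 24.6, 30.1]
    print(mann_whitney_u(died, survived))
```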

Figure 2

Vitamin D level (ng/mL, continuous) by COVID-19 outcome (binary: survived, dead).

Differences in impact genotype frequencies between the Portuguese and European populations

The genotype frequencies of the genetic variants under analysis were compared with those observed for the European population in the 1000 Genomes Project (Fig. 3). The DHCR7 RS12785878 variant shows a significant difference in the frequency of the impact genotype (G/G): 18.9% in the Portuguese population compared with 9.7% in the European population. The prevalence of this impact genotype is similar in HeartGenetics’ research database of more than 8,000 Portuguese individuals (data not shown), where the GG genotype has a prevalence of 18.8%.
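The text does not name the test used to compare genotype frequencies between populations. Assuming a Pearson chi-square test on a 2×2 table (impact-genotype carriers vs. non-carriers in each population), a stdlib-only sketch follows. The counts in the usage lines are hypothetical placeholders, since the sample sizes behind the 18.9% and 9.7% frequencies are not given here, and `chi2_2x2` is an illustrative name.

```python
from math import erfc, sqrt


def chi2_2x2(carriers_a, total_a, carriers_b, total_b):
    """Pearson chi-square test (1 df, no Yates correction) comparing the
    proportion of impact-genotype carriers in two populations."""
    if total_a <= 0 or total_b <= 0:
        raise ValueError("group totals must be positive")
    if not (0 <= carriers_a <= total_a and 0 <= carriers_b <= total_b):
        raise ValueError("carrier counts must lie between 0 and the group total")
    table = [
        [carriers_a, total_a - carriers_a],
        [carriers_b, total_b - carriers_b],
    ]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                raise ValueError("an expected count is zero; the test is undefined")
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, chi2 = Z**2, so P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical carrier counts out of 100 individuals per population
    print(chi2_2x2(19, 100, 10, 100))
```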

Figure 3

Comparison of impact genotype frequencies in the Portuguese and European populations. Circles highlight the risk genotype; red circles mark the risk genotypes of variants with a statistically significant difference between the two populations (p-value = 2.5e−7 for DHCR7 RS12785878, and p-value = 2.8e−4 for AMDHD1 RS10745742).

In this study, it is shown for the first time that the Portuguese population has a genetic makeup conferring a higher predisposition to vitamin D deficiency than the European population, potentially increasing the severity of the COVID-19 response and, consequently, affecting patient outcomes. These results also partly explain the high prevalence of vitamin D deficiency in the Portuguese population reported in previous publications12. This metabolic vulnerability should be taken into account in clinical decisions on vitamin D supplementation. These data show that it is wrong to assume that sunny countries are free of vitamin D deficiency, and that genetic characterization and population-level vitamin D monitoring should be implemented to define guidelines for vitamin D intake.

One year into the pandemic, vitamin D deficiency has been described as a possible risk factor for mortality in COVID-19 patients10. In addition, evidence indicates that high-dose vitamin D3 booster supplementation reduced the risk of mortality18. This highlights the importance of vitamin D in this disease19 and reinforces the hypothesis of vitamin D as a terminator of the “cytokine storm”4. Moreover, an increased risk of SARS-CoV-2 infection has been reported in individuals with lower 25(OH)D levels20,21, which could be in line with the suggested anti-SARS-CoV-2 activity of a range of vitamin D3-related compounds, including 7-dehydrocholesterol and lumisterol hydroxyderivatives22. Although vitamin D intake, sun exposure, demographics and, especially, genes have been identified as crucial determinants of vitamin D status, the impact of these factors is expected to differ across populations. Improving current prevention and treatment strategies therefore requires novel diagnostic tools that enable personalization.

These results reinforce the role of vitamin D polymorphisms, in particular GC RS2282679, and of vitamin D levels as biomarkers of COVID-19 disease severity, and emphasize the relevance of personalized strategies in the context of viral diseases. Whether these relationships are causal or consequential remains unknown. Nevertheless, they represent targets for diagnostic surveillance or for intervention studies.

Methods

Between August 2020 and January 2021, 517 patients admitted to Hospital de Santa Maria in Lisbon and Hospital de São João in Oporto were enrolled in this project. Eligibility criteria were age 18 years or older, admission for COVID-19 requiring hospitalization, and a positive SARS-CoV-2 test on nasopharyngeal swabs by quantitative RT-PCR, performed in national reference laboratories in accordance with the recommendations of the National Directorate of Health.

The RT-PCR results for SARS-CoV-2 were obtained with one of the following tests: LightCycler® Multiplex RNA Virus Master (Roche Life Science, Penzberg, Germany) on a LightCycler® 480 Instrument II (Roche Life Science, Penzberg, Germany); Cobas® SARS-CoV-2 Test (Roche® Diagnostics) on a Cobas® 6800 (Roche® Diagnostics); Allplex™ SARS-CoV-2 Master Assay (Seegene Inc.) on a CFX96® Real-Time System (Bio-Rad®); GeneFinder™ COVID-19 PLUS RealAmp Kit (OSANG Healthcare Co., Ltd.) on an ELITe InGenius® (ELITechGroup®) or a QuantGene 9600 (Hangzhou Bioer Technology Co., Ltd.); or Xpert® Xpress SARS-CoV-2 (Cepheid®) on a GeneXpert® (Cepheid®).

All eligible patients were followed until case closure, i.e., discharge or death.

Serum 25-hydroxyvitamin D

Vitamin D status was determined by measuring 25(OH)D concentrations in serum samples collected at hospital admission. Serum 25(OH)D was measured in the local pathology laboratories by a competitive electrochemiluminescence protein-binding immunoassay on a Cobas® e411 or e801 automated analyzer (Roche Diagnostics GmbH). This assay uses vitamin D-binding protein as the capture protein, which binds both forms of vitamin D: 25(OH)D2 and 25(OH)D3. Thus, in this report, the terms ‘vitamin D’ and ‘25(OH)D’ refer to both forms, 25(OH)D2 and 25(OH)D3, while the specific terms vitamin D2 and D3 refer to the corresponding individual forms. Given the absence of a global consensus on the 25(OH)D concentration that defines vitamin D deficiency and on “adequate” 25(OH)D levels for extra-skeletal functions23, the cut-offs established by the Clinical Guidelines Subcommittee of the Endocrine Society were adopted. Vitamin D adequacy was therefore classified according to the following 25(OH)D cut-off levels: deficiency, < 20 ng/mL; insufficiency, 20–29 ng/mL; and sufficiency, ≥ 30 ng/mL24. The season of blood sample collection was also considered, as it may affect vitamin D concentrations25.
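The cut-offs above translate directly into a classification rule. A minimal sketch, assuming concentrations in ng/mL and reading the 20–29 ng/mL band as the half-open interval [20, 30) so that non-integer results such as 29.5 ng/mL are covered; `vitamin_d_status` is an illustrative name.

```python
def vitamin_d_status(level_ng_ml: float) -> str:
    """Classify serum 25(OH)D (ng/mL) with the Endocrine Society cut-offs.

    Assumption: the 20-29 ng/mL insufficiency band is read as [20, 30)
    so that non-integer results such as 29.5 ng/mL are classified.
    """
    if level_ng_ml < 0:
        raise ValueError("a concentration cannot be negative")
    if level_ng_ml < 20:
        return "deficiency"
    if level_ng_ml < 30:
        return "insufficiency"
    return "sufficiency"


if __name__ == "__main__":
    # Group medians reported in the text for deceased and surviving patients
    print(vitamin_d_status(11.70), vitamin_d_status(17.40))  # deficiency deficiency
```

Applied to the medians reported in the Results, both the deceased and the surviving groups fall in the deficiency band, even though they differ significantly from each other.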


All patients signed informed consent for a genetic test analyzing 18 genetic variants that play a role in vitamin D metabolism, transport, degradation and downstream pathways (see Table ii in the supplemental data for details of the genetic panel). The genetic test was performed using the iPLEX® MassARRAY® system at HeartGenetics’ certified laboratory in Portugal. DNA samples were prepared from blood samples by the HeartGenetics, CCUL and São João hospital laboratories.

Data management

Clinical history, genotypic and phenotypic data, stored in the e-CRF, were collected and managed using REDCap electronic data capture tools26 hosted by the Portuguese distributed infrastructure for biological data at the INESC-ID research institute. All datasets are pseudonymized, and only one of them contains the key that links records to the patient. In accordance with the GDPR, only the principal investigator (PI) of the project (Prof. Fausto Pinto) and the clinicians responsible for data acquisition have access to this key. All project participants have controlled data access. The clinical history and patient informed consent were supervised by the Cardiology Service at Santa Maria hospital and by the Pathology Service at São João hospital (see supplemental data for e-CRF content details).

Data analysis

Clinical history, genotypic and phenotypic data were evaluated using statistical, machine learning and polygenic risk score methodologies. The prevalence of vitamin D polymorphisms in different populations was obtained from the 1000 Genomes database and, for the Portuguese population, from HeartGenetics’ research database of more than 8,000 Portuguese individuals.

Regarding the methodological approach, the following steps were undertaken:

  1. Data cleaning and validation: All variables were analysed for outliers and missing values. Discrepancies such as different units of measure and data entry errors were identified and fixed. No imputation was performed. For data transformation, both disease severity and vitamin D levels were categorized into levels, and the risk genotypes were aggregated into PRSs.

  2. Descriptive analysis: A complete graphical descriptive analysis was produced for all variables of interest as univariate analysis. Categorical variables are presented as numbers or percentages, while continuous variables are shown as mean and standard deviation, and as median and interquartile range (25th–75th percentile).

  3. Analysis of data distribution: Data normality was assessed using the Shapiro–Wilk test and the D’Agostino–Pearson test. Normality testing determines which category of statistical methods (parametric or non-parametric) is appropriate for further analysis. Since most parameters did not follow a normal distribution, only non-parametric tests, which do not assume any particular data distribution, were used in further analyses.

  4. Identification of the vitamin D polymorphisms as risk biomarkers: Several statistical tests were used, namely the Mann–Whitney and Kruskal–Wallis tests, and the Spearman rank correlation coefficient was calculated. Four PRSs focused on the vitamin D metabolism, transport and degradation pathways were computed using an additive weighted model, with values in the interval [0, 1], where 0 corresponds to a lower and 1 to a higher genetic risk of low vitamin D levels (see supplemental material for details about the PRSs). The four scores considered the following genetic variants:

     (1) Synthesis score = DHCR7 RS12785878 + CYP2R1 RS10741657
     (2) Metabolism score = GC RS2282679 + CYP24A1 RS17216707
     (3) Pathway score = DHCR7 RS12785878 + CYP2R1 RS10741657 + GC RS2282679 + CYP24A1 RS17216707
     (4) Vitamin D total score = DHCR7 RS12785878 + CYP2R1 RS10741657 + GC RS2282679 + CYP24A1 RS17216707 + AMDHD1 RS10745742 + SEC23A RS8018720

The PRSs did not include the other genetic variants tested, since their effect sizes were not estimated in the same GWAS, and combining them could bias their relative weights.
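The additive weighted model behind the four scores can be sketched as follows. This is a hedged illustration, not the study’s implementation: it assumes each genotype is coded as the number of risk alleles (0, 1 or 2) and that the weighted sum is divided by its maximum attainable value so the score falls in [0, 1]. The actual weights are in the supplemental material, so the equal weights in `HYPOTHETICAL_WEIGHTS` are placeholders, and `additive_prs` is an illustrative name.

```python
# Variant panels for the four scores, as listed in the text
SYNTHESIS = ["DHCR7 RS12785878", "CYP2R1 RS10741657"]
METABOLISM = ["GC RS2282679", "CYP24A1 RS17216707"]
PATHWAY = SYNTHESIS + METABOLISM
VITAMIN_D_TOTAL = PATHWAY + ["AMDHD1 RS10745742", "SEC23A RS8018720"]

# Placeholder weights (hypothetical); the study's weights are in the supplement
HYPOTHETICAL_WEIGHTS = {variant: 1.0 for variant in VITAMIN_D_TOTAL}


def additive_prs(risk_allele_counts, weights, panel):
    """Weighted additive PRS scaled to [0, 1].

    Each variant contributes weight * (risk alleles carried, 0-2); the sum is
    divided by the score of a person homozygous for every risk allele.
    """
    if any(weights[v] < 0 for v in panel):
        raise ValueError("weights must be non-negative")
    max_score = sum(2 * weights[v] for v in panel)
    if max_score == 0:
        raise ValueError("at least one weight in the panel must be positive")
    score = 0.0
    for variant in panel:
        count = risk_allele_counts[variant]
        if count not in (0, 1, 2):
            raise ValueError(f"{variant}: risk allele count must be 0, 1 or 2")
        score += weights[variant] * count
    return score / max_score


if __name__ == "__main__":
    # Hypothetical patient genotype (risk-allele counts), not study data
    patient = {
        "DHCR7 RS12785878": 2,
        "CYP2R1 RS10741657": 1,
        "GC RS2282679": 1,
        "CYP24A1 RS17216707": 0,
        "AMDHD1 RS10745742": 1,
        "SEC23A RS8018720": 0,
    }
    print(additive_prs(patient, HYPOTHETICAL_WEIGHTS, METABOLISM))  # 0.25
    print(additive_prs(patient, HYPOTHETICAL_WEIGHTS, VITAMIN_D_TOTAL))
```

Scaling by the maximum attainable score keeps the four scores comparable on the same [0, 1] range even though they combine different numbers of variants.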

  5. Analysis of the correlation between hypovitaminosis D and disease severity: The Mann–Whitney and Kruskal–Wallis tests were employed, depending on the categorization under analysis. The Spearman rank correlation coefficient was also calculated to assess not only whether an association exists but also its strength and direction.

  6. Genotype frequency comparison: The 1000 Genomes database and HeartGenetics’ research database of more than 8,000 Portuguese individuals were used for this comparison.

For all statistical tests, a p-value < 0.05 was considered statistically significant.
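For comparisons across more than two groups, such as the moderate, severe and dead categories, the Kruskal–Wallis test was used. A hedged, stdlib-only sketch of the tie-corrected H statistic follows; to stay short, the p-value helper only covers even degrees of freedom (three groups give df = 2), where the chi-square survival function has a closed form. `scipy.stats.kruskal` is the usual general-purpose alternative; the function names and data below are illustrative.

```python
from collections import Counter
from math import exp, factorial


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based ranks i+1..j+1
        i = j + 1
    return ranks


def kruskal_wallis(*groups):
    """Tie-corrected Kruskal-Wallis H statistic and its degrees of freedom."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    # Sum of (rank sum)^2 / group size, walking the pooled list group by group
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # Divide by the tie correction factor 1 - sum(t^3 - t) / (n^3 - n)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction, len(groups) - 1


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with an even, positive df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form only holds for even, positive df")
    half = x / 2
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))


if __name__ == "__main__":
    # Hypothetical vitamin D levels (ng/mL) per severity group, not study data
    moderate = [18.2, 22.5, 15.1, 27.0, 19.4]
    severe = [14.3, 16.8, 21.0, 12.9]
    dead = [9.8, 11.7, 8.7, 13.5]
    h, df = kruskal_wallis(moderate, severe, dead)
    print(h, chi2_sf_even_df(h, df))
```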

Study approval

This project was approved by the Ethics Committee of Hospital de Santa Maria in Lisbon and Hospital de São João in Oporto. The study was conducted in accordance with the ethical principles of the Declaration of Helsinki and followed the Good Clinical Practice guidelines. Written informed consent was obtained from all study participants prior to their inclusion in the study.