Introduction

The clinical manifestations of COVID-19, the disease caused by the SARS-CoV-2 virus, vary widely from mild respiratory symptoms to severe organ failure and death1,2,3. The mortality rate of COVID-19 also shows remarkable temporal and spatial heterogeneity across the world4,5. Several risk factors have been associated with increased mortality, such as older age, male sex, and presence of comorbidities6,7.

The role of genetics in modulating the severity and outcome of COVID-19 has been a subject of intense research and growing evidence supports the existence of individual genetic factors predisposing to a severe outcome8. For example, the COVID-19 Host Genetics Initiative (HGI) consortium performed large-scale meta-analyses of genome-wide data from over nine thousand critically ill cases (defined as patients who required respiratory support or died from COVID-19) and over 25 thousand hospitalized cases with moderate or severe disease, compared with up to five million controls9. These studies identified several genetic loci associated with either critical illness or hospitalization due to COVID-19. However, the consortium did not address the survival probability of SARS-CoV-2 infected patients, which is a relevant time-to-event phenotype that has received limited attention so far. Indeed, most studies have focused on COVID-19 severity (reviewed in10), some on mortality11,12,13 and very few on survival14, mainly investigating candidate gene polymorphisms rather than performing genome-wide analyses.

In this study, we conducted a genome-wide survival analysis to identify variants affecting the risk of death from acute SARS-CoV-2 infection. We used genotyping and clinical follow-up data (at 60 days post-infection) from about four thousand COVID-19 patients from five European cohorts. We adjusted the analyses for known non-genetic independent prognostic factors, like age, sex, and pandemic wave, which were available for all patients.

Materials and methods

Case series

The case series investigated in this study comprised 3904 COVID-19 patients molecularly tested for SARS-CoV-2 infection and enrolled for host genetics studies at several recruiting centres, in the context of the international COVID-19 HGI. We analysed patients from the GEN-COVID Multicenter Study and from the series included in the European Genome-Phenome Archive (EGA) study number EGAS00001005304 (except for the BRACOVID and INMUNGEN-CoV series) who had 60-day follow-up information and full data on sex, age, and infection date. BRACOVID patients were not included because their data were not shared with us. The INMUNGEN-CoV series was not included because 90% of its patients lacked survival data and, in addition, were genotyped with a different SNP array. Patients provided written informed consent to the use of their biological samples and data for research purposes. Personal data treatment was GDPR compliant. The research was approved by the Committees for Ethics of the recruiting centres.

Survival analysis

A multivariable Cox proportional hazard model with demographic-clinical features (i.e., age, sex, series, and time of infection) was used for survival analysis15 (age was considered as both a linear and a non-linear term, the latter defined as age squared and hereafter named age2). The R survival package16 was used to draw Kaplan–Meier (KM) curves and run the log-rank test in the R environment (v. 3.6.0). The hazard proportionality assumption was checked with the function “cox.zph()” of the survival package. The variables found to affect survival in the log-rank test (P-value < 0.05) were analysed both in a multivariable Cox regression and in a weighted multivariable analysis that accounts for non-proportional hazards17, using the survival R package and the coxphw R package18 (applying the Average Hazards Ratio method, by setting the parameter template = “AHR”), respectively. Cox and log-rank test P-values < 0.05 (two-sided) were considered statistically significant.
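As an illustration of the Kaplan–Meier estimate and the two-group log-rank test mentioned above, a minimal Python sketch is given here. It is not the code used in the study (which relied on the R survival package); the function names kaplan_meier and logrank_test and the toy follow-up data are hypothetical and serve only to show the calculations.

```python
from math import erfc, sqrt


def kaplan_meier(times, events):
    """Kaplan-Meier estimate: list of (time, survival) at each death time.

    times  : follow-up time of each patient (e.g. days)
    events : 1 if the patient died at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Handle all patients sharing the same follow-up time together.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


def logrank_test(times, events, groups):
    """Two-group log-rank test (groups coded 0/1). Returns (chi2, p_value)."""
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in death_times:
        n = sum(1 for x in times if x >= t)
        n1 = sum(1 for x, g in zip(times, groups) if x >= t and g == 1)
        d = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        d1 = sum(1 for x, e, g in zip(times, events, groups)
                 if x == t and e == 1 and g == 1)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank test undefined: no variance between groups")
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Toy data (hypothetical): six patients followed for up to 60 days.
toy_times = [5, 10, 10, 20, 60, 60]
toy_events = [1, 1, 0, 1, 0, 0]
toy_groups = [1, 1, 0, 1, 0, 0]
print(kaplan_meier(toy_times, toy_events))
print(logrank_test(toy_times, toy_events, toy_groups))
```

Patients alive at the end of the 60-day follow-up enter the calculation as censored observations (event = 0), which is what distinguishes this analysis from a simple comparison of death proportions.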

Genotyping data quality check, principal component analysis, and imputation

Genome-wide genotyping data were available at EGA (study number EGAS00001005304) and University of Siena. The LiftOver tool (https://liftover.broadinstitute.org/) was used to convert genomic coordinates and bring all the datasets to the same genomic build (GRCh38). PLINK v.2 software19 was used to carry out genotype quality control (QC) steps (Supplementary Figure S1). In detail, for each patient series, per-sample and per-variant QC steps were performed, excluding samples with call rate < 99% and samples with excess heterozygosity (|F| > 0.2), removing insertions/deletions, duplicated and non-informative variants, and filtering out single nucleotide polymorphisms (SNPs) with genotyping call rate < 99% and Hardy–Weinberg equilibrium test P-value < 1.0 × 10−10. Then, all the datasets were merged, and an additional round of QC was carried out to remove duplicated and related individuals and patients for whom no survival data were available. Additionally, SNPs with minor allele frequency (MAF) < 1% were filtered out, together with SNPs mapping in regions of extended linkage disequilibrium (LD)20. PLINK v.2 software was also used to carry out principal components analysis (PCA): we plotted PC1 versus PC2, visualizing samples according to our five patient series (i.e., BelCovid, GENCOVID, Hostage, SPGRX, and SweCovid; Supplementary Figure S2A). Additionally, to better visualize the ancestry of our patients, we projected the first four principal components of our patients together with those of 2504 individuals from five different populations, selected from the 1000 Genomes Project21: Africans, Americans, South Asians, East Asians, and Europeans (Supplementary Figure S2B and S2C). We defined as Europeans those patients who clustered together with 1000 Genomes Project European individuals. Genotype imputation to whole-genome sequence was carried out on the TopMed imputation server22 using Eagle v.2.4 for phasing23, the minimac4 algorithm24, and TopMed r2 as reference panel25. Finally, SNPs with low-quality imputation (R2 ≤ 0.3)26 and with a MAF < 0.02 were filtered out. This second MAF filter was applied after imputation to remove variants with very low-frequency alleles, thus reducing the risk that the Cox model would fail to converge and that spurious associations would arise when few or no events (i.e., deaths) were observed among patients carrying the minor allele.
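The per-variant thresholds described above (call rate, MAF, and Hardy–Weinberg equilibrium) can be sketched in a few lines of Python. This is only an illustration and not the PLINK implementation used in the study: the function name variant_qc and the toy genotypes are hypothetical, and the Hardy–Weinberg P-value is approximated here with a chi-square test with 1 degree of freedom, which is an assumption made for simplicity and may differ from the test PLINK applies.

```python
from math import erfc, sqrt


def variant_qc(genotypes, min_call_rate=0.99, min_maf=0.01, min_hwe_p=1e-10):
    """Apply call-rate, MAF and HWE filters to one biallelic variant.

    genotypes : list with one entry per sample, coded as the number of
                copies of one allele (0, 1 or 2) or None when missing.
    Returns a dict with the computed statistics and a 'passes' flag.
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    call_rate = n / len(genotypes)
    if n == 0:
        return {"call_rate": 0.0, "maf": 0.0, "hwe_p": 1.0, "passes": False}

    n_hom_ref = called.count(0)
    n_het = called.count(1)
    n_hom_alt = called.count(2)
    q = (n_het + 2 * n_hom_alt) / (2 * n)  # frequency of the counted allele
    p = 1.0 - q
    maf = min(p, q)

    if maf == 0:
        hwe_p = 1.0  # monomorphic: HWE cannot be tested
    else:
        observed = [n_hom_ref, n_het, n_hom_alt]
        expected = [n * p * p, 2 * n * p * q, n * q * q]
        chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
        hwe_p = erfc(sqrt(chi2 / 2))  # chi-square upper tail, 1 d.f.

    passes = call_rate >= min_call_rate and maf >= min_maf and hwe_p >= min_hwe_p
    return {"call_rate": call_rate, "maf": maf, "hwe_p": hwe_p, "passes": passes}


# Toy variant (hypothetical): 100 samples in perfect Hardy-Weinberg proportions.
print(variant_qc([0] * 81 + [1] * 18 + [2] * 1))
```

The post-imputation filter described in the text corresponds to calling the same kind of check with a stricter frequency threshold (min_maf = 0.02 in this sketch).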

Genome-wide survival analysis

The associations between SNPs (additive model) and patient overall survival were assessed using a multivariable Cox proportional hazard model, with the first 10 PCs, sex, age, age2, patient series, and time of infection (before or after the first pandemic wave, which we considered to have ended at the end of June 2020) as covariates, using the GenABEL package in the R environment27. Correction for multiple testing was performed with the Benjamini–Hochberg false discovery rate (FDR) method28. Top significant polymorphisms were also tested under a dominant model using the same covariates as above. This was done for two reasons: first, we hypothesized that the minor allele increased the risk of death by acting as a dominant, or at least codominant, allele; second, because few patients were homozygous for the minor allele, comparing the survival probability of patients homozygous for the major allele with that of patients carrying at least one minor allele reduced the risk of artifacts in the Kaplan–Meier curves. The variants most significantly associated with COVID-19 severity (https://app.covid19hg.org/variants; analysis B2: Hospitalized COVID19 + vs. population controls)9 were also investigated in our study, by comparing their P-values of association with survival with those reported by the COVID-19 HGI.
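The Benjamini–Hochberg adjustment mentioned above can be illustrated with the following minimal Python sketch. The function name benjamini_hochberg and the example P-values are hypothetical; the study itself performed this correction in R.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values, in the input order.

    Each P-value is multiplied by m / rank (m = number of tests, rank = its
    position in ascending order); a running minimum taken from the largest
    P-value downwards keeps the adjusted values monotone.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical P-values from four tests.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A test is then declared significant at a given FDR level when its adjusted P-value falls below that level.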

A logistic regression between a binary status phenotype (alive vs. dead during the 60-day follow-up) and genome-wide imputed SNPs was performed using PLINK v.2, with the first 10 PCs, sex, age, age2, time of infection, and patient series as covariates.

Functional analyses

To investigate the functional role of the variants associated with COVID-19 survival, we used multiple databases and pooled the partial results from each platform, to obtain more reliable and robust results.

First, we looked for the variants associated with COVID-19 survival at P-value < 1.0 × 10−5 in the GTEx (Analysis V8 release, GTEx_Analysis_v8_eQTL_EUR.tar) and eQTLGen29 (https://www.eqtlgen.org/cis-eqtls.html) databases (accessed on 08/05/2023), to test whether they had already been reported as cis expression quantitative trait loci (eQTLs). For some variants, we also looked for eQTL SNPs in LD with them, using the LDexpress tool of LDlink30. We searched all tissue eQTLs reported in the European population and in LD with the query variants at D’ > 0.8. We then carried out an over-representation analysis of the list of all target genes of the eQTLs found, using the WEB-based Gene SeT AnaLysis Toolkit31, to search for enriched Reactome and KEGG pathways.
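To make the idea of an over-representation analysis more concrete, the following Python sketch computes an enrichment ratio (observed overlap divided by the overlap expected by chance) and a one-sided hypergeometric P-value. It assumes that the analysis is based on a hypergeometric test, as is common for this kind of tool; it does not reproduce the internals of any specific platform, and the function name over_representation and the gene counts in the example are hypothetical.

```python
from math import comb


def over_representation(n_background, n_pathway, n_list, n_overlap):
    """Enrichment ratio and hypergeometric P-value for one gene set.

    n_background : number of genes in the reference (background) set
    n_pathway    : number of background genes annotated to the pathway
    n_list       : number of genes in the list of interest
    n_overlap    : number of genes of the list annotated to the pathway
    """
    expected = n_list * n_pathway / n_background
    enrichment_ratio = n_overlap / expected
    # P(X >= n_overlap) when drawing n_list genes without replacement.
    upper = min(n_pathway, n_list)
    favourable = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_list - k)
        for k in range(n_overlap, upper + 1)
    )
    p_value = favourable / comb(n_background, n_list)
    return enrichment_ratio, p_value


# Hypothetical counts: 20 background genes, 5 in the pathway,
# a list of 4 genes of which 2 belong to the pathway.
print(over_representation(20, 5, 4, 2))
```

In a real analysis, one such test is run per pathway and the resulting P-values are corrected for multiple testing.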

Additionally, genes that include or are close to the variants associated with COVID-19 survival (P-value < 1.0 × 10−5) were retrieved. The ENSEMBL gene database was used with the R Bioconductor package “biomaRt”32 to find genes lying within ± 50 kb of the detected SNP variants. The resulting list of genes was then used as input for functional analysis tools. We used the Database for Annotation, Visualization and Integrated Discovery (DAVID)33 with default functional categories and the high classification stringency parameters; a Benjamini–Hochberg adjusted P-value (FDR) < 0.05 was used as the significance threshold for enrichment.
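The ± 50 kb window used to assign genes to variants amounts to a simple interval check, sketched below in Python. This is an illustration only and does not query ENSEMBL or use biomaRt; the function name genes_near_variant, the gene names, and all coordinates are hypothetical.

```python
def genes_near_variant(chrom, position, genes, window=50_000):
    """Return the names of genes lying within `window` bp of a variant.

    genes : iterable of (name, chromosome, start, end) tuples, with start
            and end expressed in the same genome build as the variant.
    A gene is retained when the variant falls inside the gene body or
    within `window` base pairs of either end.
    """
    return [
        name
        for name, gene_chrom, start, end in genes
        if gene_chrom == chrom and start - window <= position <= end + window
    ]


# Hypothetical annotation table (names and coordinates are invented).
toy_genes = [
    ("GENE_A", "6", 100_000, 150_000),
    ("GENE_B", "6", 400_000, 450_000),
    ("GENE_C", "8", 100_000, 150_000),
]
print(genes_near_variant("6", 190_000, toy_genes))
```

The comparison is only meaningful when variant and gene coordinates refer to the same genome build.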

The GENE2FUNC functionality of the Functional Mapping and Annotation (FUMA) for GWAS platform34 was selected to retrieve gene ontologies (biological processes, molecular functions, cell compartments), differentially expressed genes associated with COVID-19, target tissues and metabolic pathways, with the following parameters: all background genes; Ensembl version v102; GTEx v8 (54 tissue types) and GTEx v8 (30 general tissue types) gene expression datasets; Benjamini-Hochberg (FDR) correction when testing for gene-set enrichment (threshold: FDR < 0.05); minimum overlapping genes with gene set: ≥ 2.

Ethical approval and informed consent

The research was performed in accordance with the Declaration of Helsinki and was approved by the committees for Ethics of the recruiting centres, namely, the University Hospital (Azienda ospedaliero-universitaria Senese) ethical review board, Siena, Italy (Prot n. 16917, dated March 16th, 2020); Euskadi Ethics Committee, Donostia-San Sebastian, Spain, on April 6, 2020 (approval number PI2020064); Vall d’Hebron Ethical Committee, Barcelona, Spain; ethics committee of the Junta de Andalucia, Spain (ethics id: 0886-N-20 and 1954-N-20); the ethics committee of Humanitas Clinical and Research Center, Rozzano (MI), Italy (reference number, 316/20); Valladolid Ethics Committee (PI-201716) and the Granada Ethics Committee, Spain, on March 24, 2020, and April 13, 2020, respectively; Erasme Ethics committee, Bruxelles, Belgium (protocol P2020_209); the National Ethical Review Agency, Sweden (EPM; 2020-01623). Patients provided written informed consent to the use of their biological samples and data for research purposes. Personal data treatment was GDPR compliant.

Results

Age, sex, and period of infection are associated with COVID-19 patient survival

In this study, we included a total of 3904 European COVID-19 patients recruited in Italy (GEN-COVID and, in part, Hostage series), Spain (Hostage and SPGRX), Sweden (SweCovid) and Belgium (BelCovid) (Table 1). The median age at infection was 63 years and male patients were slightly prevalent (58%). More than two thirds of the patients included in this study were infected during the first COVID-19 wave (before June 30th, 2020). Most enrolled patients were hospitalised (86%), but 78% of the whole series did not need to be admitted to the intensive care unit (ICU). Among the 3175 patients for whom information about respiratory support was available, a small fraction of the patients included in the study (15%) did not need any oxygen support, whereas the largest group (40%) received oxygen by mask or cannula, 11% received non-invasive ventilation, and 14% needed intubation. Information about comorbidities was available for 2887 patients: about a quarter of patients had a comorbidity (hypertension, diabetes, cancer, asthma, and heart failure, in order from the most to the least frequent) and 12% had at least two comorbidities. Considering a follow-up time of 60 days after infection, approximately 11% of the COVID-19 patients in this study died.

Table 1 Clinical characteristics of patients included in the survival genome-wide association study.

To investigate the factors affecting mortality after SARS-CoV-2 infection, we carried out a survival analysis over a period of 60 days post-infection. Since the hazard proportionality assumption was not met in our series (global Schoenfeld residuals test P-value = 6.5 × 10−6), we carried out both a weighted multivariable Cox analysis, to deal with non-proportional hazards, and a multivariable proportional hazard Cox regression. In both models we used the following variables, for which full data were available: age, age2, sex, and date of infection (Table 2A). The two models gave similar results. The major mortality risk factor for COVID-19 patients was age at infection (as expected, in both models increasing age was a risk factor for poor prognosis, with HR > 1). We also explored a non-linear effect of age (i.e., age2) on survival and, although statistically significant, its effect was negligible (HR = 1 in both models). Female sex was associated with a slightly higher probability of survival than male sex (HR = 0.73 in both models). Additionally, patients infected in the first pandemic wave (i.e., in the first half of 2020) had a lower probability of survival than patients infected later (HR = 1.6 and 1.5 in Cox and weighted Cox models, respectively).

Table 2 Clinical and personal prognostic factors for COVID-19 patients, as resulting from multivariable Cox and weighted Cox analyses.

We also tested Cox and weighted Cox models with an additional covariate, i.e., the series in which patients were recruited, because differences in recruitment might have confounding effects on patient survival. For instance, all SweCovid patients were admitted to the intensive care unit; they were therefore probably affected by a severe form of COVID-19, and their probability of survival was expected to be lower than that of the other patient series. As expected, SweCovid patients had the highest risk of death among all patients included in the present study (Table 2B). These results prompted us to include the explored variables as covariates in the genome-wide survival analysis, in order to identify genetic variants that were independent prognostic factors.

We also drew KM curves to test the effects of age, sex, pandemic wave, and patient series on the risk of death in the 60 days after infection. Age was discretized into five groups: patients < 55 years old, three groups by decade up to the age of 84, and patients ≥ 85 years old. Highly significant associations were observed for age, patient series, and pandemic wave (Supplementary Figure S3), whereas the association with sex was weaker, but still significant (log-rank test, P-value = 0.03). As already observed with the Cox models, the probability of survival decreased with increasing age and was lower for males than for females, for patients infected at the beginning of 2020 and, as expected, for SweCovid patients.
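The age discretization used for the KM curves can be written as a small Python helper. This sketch is illustrative only: the function name age_group and the group labels are hypothetical, while the cut-offs follow the description above.

```python
def age_group(age):
    """Assign a patient to one of five age groups used for KM curves."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age < 55:
        return "<55"
    if age < 65:
        return "55-64"
    if age < 75:
        return "65-74"
    if age < 85:
        return "75-84"
    return ">=85"


# The median age at infection in the series (63 years) falls in "55-64".
print(age_group(63))
```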

Germline variants are associated with COVID-19 overall survival 60 days after infection

Genotype data of patients from the GEN-COVID Multicenter Study and from the series included in the European Genome-Phenome Archive (EGA) study number EGAS00001005304 (except for BRACOVID and INMUNGEN-CoV, which were unavailable and had missing data, respectively, as explained above) were used for the genome-wide survival analysis. After quality controls and imputation, the dataset comprised 7,151,809 variants and 3904 patients, 91% of whom were of European ancestry (Supplementary Figure S2B and C).

We ran a GWAS Cox model, with the first 10 PCs, age, age2, sex, period of infection and patient series as covariates, and found one variant (rs117011822) associated with survival at P-value < 5.0 × 10−8 (genome-wide significance level) and another one (rs7208524) that was nearly significant (P-value = 5.19 × 10−8). The results for all the variants tested in the multivariable Cox model are reported in a Manhattan plot (Fig. 1). The minor alleles of both these variants were risk factors for poor prognosis, showing hazard ratios > 1. Both are low-frequency variants in our dataset (MAF < 5%). The top one is a 2 kb upstream variant of the FGF19 gene on chromosome 11, and the other one maps in an intron of the GPRC5C gene on chromosome 17.

Figure 1

Manhattan plot of the results of the GWAS for survival of COVID-19 patients. SNPs are plotted on the x-axis according to their genomic position (GRCh38, hg38 release), and P-values (− log10(P)) for their association with survival probability are plotted on the y-axis. The horizontal solid line represents the threshold of significance (P-value < 5.0 × 10−8), whereas the dashed one represents a suggestive threshold (P-value < 1.0 × 10−5). Names, hazard ratios and P-values of the two most significant SNPs are shown on top.

Looking at all the variants with a P-value < 1.0 × 10−5 (n = 113; Supplementary Table S1), we found that 7 variants mapped in the GPRC5C gene locus, 16 polymorphisms were on chromosome 8, in the PSD3 gene locus, 9 SNPs were on chromosome 9, in an intergenic region of 359 kb between the PBX3 and MVB12B genes, and 9 variants were in the CDH13 locus, on chromosome 16. Additionally, 27 other variants mapped on chromosome 6, 12 of which were in an intergenic region near the EPHA7 gene, in proximity of an enhancer region (ENSR00000798782), and 13 mapped in the locus of the PERP gene, mostly in its 5’ regulatory region. A zoomed plot for each of these loci is reported in Fig. 2.

Figure 2

Zoomed plots of six loci associated with patient survival. SNPs are plotted on the x-axis according to their chromosome position (GRCh38, hg38 release), and P-values (− log10(P)) for their association with survival probability are plotted on the y-axis. The horizontal line represents the threshold of significance (P-value < 5.0 × 10−8). Below the x-axis, the mapped genes are plotted (according to University of California Santa Cruz Genome Browser notation).

Besides the additive model, we tested the two top-significant variants in a dominant model (using the same covariates), in which the survival of patients heterozygous or homozygous for the alternative allele was compared with that of patients homozygous for the common allele. This analysis also indicated that having at least one alternative allele of these variants conferred a higher risk of poor COVID-19 prognosis than having a wild-type genotype (rs117011822: HR = 2.47, P-value = 3.43 × 10−8; rs7208524: HR = 2.70, P-value = 1.10 × 10−7). We plotted the KM curves to visualize the probability of survival according to the genotypes of both variants (Fig. 3).
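The difference between the additive and the dominant coding of a genotype can be illustrated with a short Python sketch. The function name dominant_group and the genotype counts below are hypothetical and do not correspond to the data of this study; they only show how grouping carriers avoids a very small stratum of minor-allele homozygotes.

```python
from collections import Counter


def dominant_group(minor_allele_count):
    """Recode an additive genotype (0, 1 or 2 minor alleles) as dominant.

    Returns 0 for patients homozygous for the major allele and 1 for
    patients carrying at least one copy of the minor allele.
    """
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1 or 2")
    return 0 if minor_allele_count == 0 else 1


# Hypothetical low-frequency variant genotyped in 100 patients.
toy_genotypes = [0] * 90 + [1] * 9 + [2] * 1
additive_counts = Counter(toy_genotypes)
dominant_counts = Counter(dominant_group(g) for g in toy_genotypes)
print(additive_counts)   # three groups, one of them with a single patient
print(dominant_counts)   # two groups: non-carriers and carriers
```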

Figure 3

Kaplan–Meier survival curves for COVID-19 patients according to the genotype of the top significant variants (A) rs117011822 and (B) rs7208524. Black line represents patients homozygous for the major allele and grey line represents patients with at least one copy of the minor allele (according to the dominant model, patients with heterozygous genotype and homozygous for the minor allele were grouped together). Crosses denote censored samples. Numbers of patients at risk are shown below the plot. Log–rank test P‐value is shown.

We also looked in our results for the top-significant variants previously identified by the COVID-19 HGI meta-analysis as associated with COVID-19 severity9. Of note, the phenotypes under investigation in the two studies were quite different, since the COVID-19 HGI defined the severity phenotype as a binary variable representing COVID-19 patients’ risk of hospitalisation. As shown in Supplementary Table S2, none of the COVID-19 HGI top variants were significant in our survival model, not even at a nominal P-value < 0.01.

Finally, we tested the risk of mortality in the 60 days after infection, using a logistic regression model, with the same covariates used in the Cox analysis. In this GWAS, no SNPs reached the genome-wide significance threshold. However, 30% of the 113 top significant SNPs associated with survival probability were confirmed also in this GWAS, as associated with mortality risk, although at a higher P-value (< 1.0 × 10−5). These results are reported in Supplementary Table S1, for comparison with Cox analysis results.

Variants associated with COVID-19 patient survival are involved in immune or lung functions

We queried two different eQTL databases, namely GTEx and eQTLGen, to test whether our most significant variants (P-value < 1.0 × 10−5) were previously reported as eQTLs. In GTEx, 43 out of 113 SNPs were eQTLs of 33 target genes, in several different tissues (e.g., brain, lung, muscle, heart, spleen, whole blood), and in eQTLGen 54 out of 113 SNPs acted as cis eQTLs for 30 genes in whole blood (Supplementary Table S3 and S4). Eight target genes (i.e., TSPYL1, EPHB4, MOSPD3, UFSP1, GIGYF1, SLC12A9, MVB12B, and KLRC1) were found in both databases. The list of the 55 unique eQTL target genes was enriched for the Reactome pathway R-HSA-198933: Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell (enrichment ratio = 16.7; FDR = 0.017), including the CD300 genes (A, C, and LB) and two other genes (KLRC1 and KLRF1).

In addition, we explored the possibility that the quite numerous SNPs identified on chromosomes 6 and 8, next to the EPHA7 and PSD3 genes, respectively, might be in LD with nearby variants reported as eQTLs of these two genes. In fact, none of these SNPs were reported as eQTLs in the two searched databases. Looking at data available in the European population (using LDlink), we found several variants in strong LD (D’ > 0.8) with our SNPs and acting as EPHA7 and PSD3 eQTLs, in many tissues (Supplementary Table S5).

Considering the list of 38 genes (Supplementary Table S6) mapping within 50 kb of the 113 top significant variants, functional annotation analysis with DAVID identified 19 significant terms (FDR < 0.05, Supplementary Table S7). Among these, the top significant ones are two Gene Ontology (GO) biological processes, namely, “positive regulation of natural killer (NK) cell mediated cytotoxicity” and “stimulatory C-type lectin receptor signalling pathway”, both involving the same five genes (KLRK1, KLRC3, KLRC4, KLRC4-KLRK1, and KLRD1). Three of these genes (KLRC3, KLRC4, and KLRD1) were also annotated in the Biocarta “h_nkcellsPathway: Ras-Independent pathway in NK cell-mediated cytotoxicity”, the only pathway that reached the statistical significance threshold.

Partially overlapping results were obtained in the over-representation analysis carried out with the FUMA platform (Supplementary Table S8), which identified 36 significantly enriched functional gene sets. Among them, we observed GO biological processes and KEGG and Biocarta pathways related to NK cell regulation and other immune functions, in which approximately the same genes as above are involved. In addition, in the tissue specificity analysis by FUMA, we observed that the 38-gene list was enriched for genes over-expressed in the lung and in the brain’s putamen basal ganglia (Supplementary Figure S4).

Discussion

In this study we investigated the effects of host germline variants on the overall survival of COVID-19 patients, 60 days after the infection. With a case-only approach and using a multivariable Cox model to look for variants associated with the probability of survival after infection, we aimed to dissect the genetic bases of severe COVID-19 from a different point of view, as compared to several other genetic studies on COVID-19 severity (reviewed in10). The analysis considered, as covariates of the multivariable model, the most widely known prognostic factors for COVID-19 survival, i.e., patient age at infection, sex, and period of infection (at the very beginning of the pandemic or after).

We identified a genome-wide significant association between survival and the SNP rs117011822, on chromosome 11. We observed that individuals with an increasing number of minor alleles of this variant in their genotype (under both the additive and the dominant model) had a worse prognosis than patients homozygous for the major allele, 60 days after SARS-CoV-2 infection. This variant maps in a regulatory region (ENSR00000958007), specifically a CTCF binding site, upstream of the FGF19 gene. We did not find evidence for a regulation of FGF19 expression by this variant in GTEx or eQTLGen. However, it was reported that serum levels of the FGF19 protein were lower in asymptomatic than symptomatic COVID-19 patients35. In that study, the authors discussed a possible role of FGF19 (together with other proteins) in lung tissue repair, with differences between asymptomatic and symptomatic patients. Therefore, it might be interesting to further investigate the role of the minor allele of rs117011822 in the regulation of FGF19 levels, with the aim of understanding the functional mechanism underlying the statistical association observed in our study.

Additional loci on chromosomes 17, 8, 6, 9, and 16 were suggestively associated with COVID-19 patient overall survival. On chromosome 17 we identified a locus of seven variants that were reported to act as regulators of the expression of CD300 genes (CD300A, CD300C, and CD300LB). Of note, CD300 is a family of leukocyte surface proteins involved in immune response signalling pathways and it has been observed that shifts in the expression pattern of CD300 molecules in T-cells of COVID-19 patients correlated with COVID-19 severity36.

The 16 variants on chromosome 8 mapped in intronic regions of the PSD3 gene, also known as EFA6R, a member of the family of guanine nucleotide exchange factors that activate ADP-ribosylation factor 6 (ARF6)37. This protein is involved in endocytosis, and it has been reported to play a role in SARS-CoV-2 cell entry38,39. So far, no role of PSD3 variants in the regulation of ARF6 activation has been reported. Our variants showed a strong LD with eQTL SNPs of PSD3.

The six variants we found associated with COVID-19 patient survival, mapping in an intergenic region of chromosome 9, were reported to be eQTLs of the nearby MVB12B gene. No evidence for a role of this gene in COVID-19 is available, but it is interesting to underline that it codes for a subunit of the ESCRT-I complex that mediates HIV budding40.

The 12 intergenic variants on chromosome 6 (at position 93 Mb) were near the EPHA7 gene, which was suggested to be a downstream mediator of cytokine production, induced by the N-terminal domain of the SARS-CoV-2 spike protein41. These variants mapped in an enhancer region and might affect the expression of the EPHA7 gene, although we did not find them among the eQTLs of this gene. However, these variants are in strong LD with eQTL SNPs of EPHA7. The other 11 variants on chromosome 6, instead, mapped in the PERP gene locus and some of them were already annotated as eQTLs of this gene. Although no functions related to COVID-19 have been reported for the PERP gene, so far, it encodes a protein that is a p53 apoptosis effector, and, recently, Wang and colleagues reviewed the possible roles of p53 in mediating host-virus interactions in infections caused by Coronaviruses42.

Finally, the nine SNPs on chromosome 16, intronic to the CDH13 gene, were eQTLs of this gene, which encodes the T-cadherin protein, a regulator of vascular permeability and also a receptor of adiponectin and LDL. It is involved in lung function and is reportedly associated with several metabolic disorders, such as atherosclerosis, dyslipidemia, obesity, and diabetes (as reviewed in43).

Interestingly, the list of genes whose expression was reported to be regulated by our top-significant variants was enriched for genes with immunoregulatory functions. These included the already mentioned CD300 genes, but also the KLRC1 and KLRF1 genes. KLRC1 (alias NKG2A) expression in SARS-CoV-2 infected patients was suggested to correlate with functional exhaustion of cytotoxic lymphocytes and with a severe COVID-19 outcome44. In addition, functional annotation of the genes in or near which the top-significant variants mapped resulted in an enrichment of immune-related terms and pathways, in particular those involved in the regulation of NK cell-mediated cytotoxicity. This finding is interesting in the light of previous results by Maucorant et al.45 showing that distinct NK-immunotypes were related to COVID-19 severity. Among the genes in this pathway is KLRK1, also known as NKG2D, coding for an NK cell activating receptor. It has been previously reported that SARS-CoV-2 non-structural protein 1 can downregulate ligands of the NKG2D receptor, thus escaping NK cell cytotoxicity46.

Regarding the findings from FUMA tissue specificity analysis, the observed enrichment of our gene list in genes whose expression is altered in lung is not unexpected: as the major clinical manifestation of SARS-CoV-2 infection and severity is at respiratory level, we expected that variants associated with COVID-19 survival were in genes expressed in the lungs. On the other hand, it is more difficult to speculate on the finding of an enrichment of genes upregulated in brain putamen basal ganglia. In a recent paper47, Balsak et al. reported that basal ganglia can be damaged after COVID-19, due to microstructural alterations caused by hypoxia. However, further studies are needed to understand if the variants/genes identified in our study might be involved in the hypoxia induced brain alterations after SARS-CoV-2 infection and, also, if this kind of damage might affect COVID-19 patient survival.

Despite the above-cited evidence found in the literature, without appropriate functional studies we cannot assert that the variants we identified as associated with survival play a role in predisposing patients to a worse COVID-19 outcome. However, we believe that our findings are of interest and worthy of further investigation, since we explored the genetics of COVID-19 severity with an unconventional approach that might have led to new results. Not surprisingly, none of the previously reported variants associated with severity in9 were significant in our study, since both the phenotype (severity vs. overall survival 60 days after the infection) and the analysis model (regression vs. Cox) were completely different.

We are aware of some limitations in our study. First, validation in an independent, larger, but possibly genetically homogeneous patient series is needed. Indeed, we were able to detect only one variant associated with survival at a genome-wide significance level. In addition, our series is mainly composed of patients of European ancestry and our results, although controlled for population stratification, would not be directly generalizable to patients of different ethnicities. Additionally, since we did not adjust for patients’ comorbidities (as full data were not available), we cannot exclude that some of the identified associations might result from such potential confounding factors. For instance, this might be the case for the variants in the CDH13 gene, which is involved in metabolic disorders. Although we were aware of this limitation, we preferred not to further reduce the already relatively small sample size of our analysis by excluding patients with unavailable comorbidity data, to avoid losing statistical power.

Overall, our results shed new light on the genetics of COVID-19 severity, having identified some loci associated with patient survival at 60 days after infection. Although our findings suggest that genetics plays a limited role in affecting mortality probability after SARS-CoV-2 infection, the identified variants are worth investigating further as possible prognostic factors for COVID-19.