replying to M. Hultström et al. Nature Immunology (2022)

Prompted by our report on the role of mannose-binding lectin (MBL) in resistance to COVID-19 (ref. 1), Hultström and colleagues2 conducted a genetic and biochemical analysis of this fluid-phase pattern recognition molecule in 426 patients of the SweCovid Swedish initiative and in data extracted from summary statistics of the COVID-19 Host Genetics Initiative (HGI)3. Our study had reported that MBL binds to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike protein from variants of concern and inhibits the virus, and that genetic variants mapping in the MBL gene (MBL2) region predispose to severe COVID-19 (ref. 1). In apparent contrast to our genetic study, Hultström and colleagues did not find a significant association between MBL2 single-nucleotide polymorphisms (SNPs) and hospitalization or intensive care admission due to COVID-19 (data extracted from summary statistics of the COVID-19 HGI3). They found that MBL2 haplotypes, composed of functional variants mapping within the gene (alleles named C, B, D, X/Y and L/H, according to the legacy nomenclature4) had a dual, U-type, impact on the risk for thrombotic complications in critically ill COVID-19 patients.

Hultström and colleagues2 did not find any association between severe COVID-19 and D, B and C alleles (rs5030737-A, rs1800450-T and rs1800451-T, all inducing a lowering effect on MBL levels) or other SNPs in the MBL2 region. Indeed, we had also found borderline associations in the single-variant analysis (none surviving the correction for multiple testing, as declared). However, a significant predisposing effect was observed in those individuals carrying two disruptive alleles among rs5030737, rs1800450 and rs1800451. In addition, a significant protective effect was observed for the haplotype lacking the predisposing rs5030737 allele.

The best association signals stemmed from haplotype analysis performed on the entire MBL2 region (500 kb upstream and downstream of the gene)1. Those haplotypes are in proximity to MBL2 regulatory regions, do not include the functional variants mapping within the gene and show a highly significant P value of association (accounting for multiple testing correction)1. In this frame, it would have been interesting to see a comparison with similar analyses on the Swedish cohort described by Hultström et al.2. The latter study did not include a case–control single-SNP association analysis on their patients and only results based on the HGI data were reported. This choice was probably based on the advantage of using the huge statistical power given by the HGI summary statistics (obtained from the analysis of at least 8,800 cases and 1 million controls). However, this approach suffers from limitations: (1) haplotype-based analyses cannot be performed using the summary statistics; and (2) it is not possible to finely dissect signals coming from different populations. This is not trivial especially for those SNPs showing minor allele frequencies (MAFs) that differ enormously across ethnic groups. Table 1 summarizes MAFs in different populations for the three functional SNPs of the MBL2 gene: allele frequencies vary considerably across populations, with SNPs rs1800451 and rs5030737 even being monomorphic in east Asians. Due to standard quality check steps adopted in preparing datasets for association analysis (aimed at discarding monomorphic SNPs or SNPs with extremely low MAFs)3, it is possible that the SNPs rs1800451 and rs5030737 were not even analyzed in some HGI populations.

Table 1 Population frequencies for functional coding variants in the MBL2 gene

Prompted by the study of Hultström and colleagues2, we repeated association/haplotype analyses in a larger case–control Italian cohort, including the original 332 cases and 1,668 controls (described in ref. 1) plus 195 cases and 1,522 controls. As in our original study1, newly enrolled cases were recruited at the Humanitas Clinical and Research Center IRCCS (in Rozzano, Milan, Italy) and San Gerardo Hospital (in Monza, Italy), and were defined as severe COVID-19 cases, all with respiratory failure requiring hospitalization and a confirmed SARS-CoV-2 viral RNA PCR test. Overall, our 527 severe cases belong to the first SARS-CoV-2 wave (March–May 2020), and are therefore extremely homogeneous with regard to their phenotype, treatments, absence of vaccination and viral strain infection. New controls were, again, from the general Italian population with unknown COVID-19 status. We essentially confirmed all our previous results: single-SNP association analysis only revealed borderline associations not surviving the multiple testing correction (data not shown), whereas multiallelic analysis, investigating the previously identified haplotypes mapping in the MBL2 region1, evidenced significant associations with severe COVID-19 (Table 2). Considering that the identified haplotypes could be, at least in part, a reflection of single-marker association signals in their proximity, we extracted from the HGI repository, v.6 (not yet available at the time of ref. 1) all SNPs showing a P value of association <5 × 10−3. Extended Data Fig. 1 reports on these signals, considering three of the analyses performed in the frame of the HGI project: A2 (defined as ‘Very severe respiratory confirmed COVID-19 versus population’), B2 (‘Hospitalized COVID-19 versus population’) and C2 (‘COVID-19 versus population’). We are well aware that all these hits are above the genome-wide threshold for significance. Nevertheless, we observed clusters of SNPs corresponding exactly to the haplotypes we identified (Table 2). More interestingly, we noticed that the stronger signals of associations (10−4 < P < 10−5; top signal rs1877134, P = 3.8 × 10−5) correspond to the A2 analysis and map in a genomic region of 18 kb, exactly encompassing the MBL2 gene. Although suggestive, and given the lack of association signals at the genome-wide level in the COVID-19 HGI results, we should also consider the possibility that the MBL2 locus does not have significantly consistent COVID-19 risk in every population analyzed. A possible explanation for discrepancies in effect sizes could be that host genetics risk may not be identical across the analyzed cohorts, and between the present study and that of Hultström et al.2 or more in general among populations. Virus strains, different pandemic waves, vaccinations and medical care strategies and drugs could also have played a role.

Table 2 Locus-wide haplotype analysis

Hultström and colleagues2 report the association between haplotypes determining activity of MBL and the risk for thrombotic complications in severe COVID-19 patients. In particular, genetically determined MBL activity levels were shown to confer risk for pulmonary embolism in a U-shaped fashion (that is, haplotypes LXA/LXA, HYA/0 and LYA/0—associated with intermediate MBL activity—are indeed protective). These results are based on a restricted number of cases (4 of 123 individuals bearing ‘intermediate’ MBL activity haplotype versus 33 of 231 and 8 of 72 carrying ‘high’ and ‘deficient’ haplotypes, respectively). In an effort to replicate these findings, we stratified our 207 severe COVID-19 patients, for whom data on presence/absence of thromboembolic events during hospitalization were available, on the basis of the same MBL activity haplotypes (‘high’: HYA/HYA, LYA/LYA, HYA/LYA, HYA/LXA and LYA/LXA; ‘intermediate’: LXA/LXA, HYA/0 and LYA/0; and ‘deficient’: LXA/0 and 0/0). This analysis did not show any association with thromboembolic events (χ2 test, P = 0.36). Indeed, we observed a higher frequency of thromboembolic events in individuals carrying the ‘intermediate’ haplotype (9 of 59 individuals bearing ‘intermediate’ MBL activity haplotype versus 11 of 118 and 2 of 30 carrying ‘high’ and ‘deficient’ haplotypes, respectively). In the regression model, correcting for covariates, we confirmed these negative results.

Finally, we stratified our entire cohort of 3,717 individuals for these high/intermediate/deficient haplotypes, and used this marker in an association analysis for severity: again, no association signal emerged (P = 0.33). However, we observed a moderate predisposing effect toward severe COVID-19 for those individuals carrying two disruptive 0 alleles (P = 0.067 in the regression model accounting for covariates).

MBL recognition of SARS-CoV-2 spike protein was found to trigger complement activation via the lectin pathway1. However, the antiviral activity of MBL observed in vitro with three different cellular models occurred in serum-free conditions, and thus were irrespective of complement activity1. Molecular modeling indicates the MBL-binding site spans across the S1 and S2 regions of SARS-CoV-2 spike protein, and suggests a fusion neutralization mechanism. In addition, inhibition of virus entry into host cells might depend on competition with C-type lectins, which act as entry receptors or co-receptors5. The mechanism responsible for the antiviral activity of MBL remains to be fully defined, but it is clearly complement independent. Functional assays aimed at investigating the relevance of the three structural MBL2 polymorphisms in the study by Hultström and colleagues2 are based on complement activation. The relevance of rs5030737-associated protein in MBL anti-SARS-CoV-2 activity cannot be investigated using this readout, because antiviral activity is complement independent.

In conclusion, the apparent discrepancy between the two reports may well be accounted for by the breadth of the genetic analysis, substantial differences in the frequency of MBL genotypes in different human populations (Table 1) and other factors that usually make genetic association studies during a global pandemic extremely challenging (new emerging virus strains, different pandemic waves, vaccinations and different medical care approaches). The possibility that MBL-triggered complement activation may amplify advanced tissue-damaged disease, including thromboembolic complications, is biologically plausible and deserves further studies.


Patient cohorts and ethical approvals

Approvals for the project were obtained from the relevant ethics committees (for Humanitas hospitals, reference no. 316/20, and for the University of Milano–Bicocca School of Medicine, San Gerardo Hospital, reference no. 84/2020). The requirement for informed consent was waived.

For genetic association analyses, we investigated a total of 3,717 individuals. These included: (1) 527 patients with severe COVID-19, which was defined as hospitalization with respiratory failure and a confirmed SARS-CoV-2 viral RNA PCR test from nasopharyngeal swabs; patients were recruited from intensive care units and general wards at three hospitals in Lombardy, that is, the Humanitas Clinical and Research Center, IRCCS, in Rozzano, Milan, Italy (180 patients), the San Gerardo Hospital, in Monza, Italy (320 patients) and the Humanitas Gavazzeni hospital, in Bergamo, Italy (27 patients); and (2) 3,190 controls from the general Italian population with unknown COVID-19 status.

Data on thromboembolic events were available for 207 cases (29% females and 71% males; mean age: 68.9 ± 12.3 years).

Genotyping and imputation

Details on DNA extraction, array genotyping and quality checks are reported elsewhere6,7.

Genetic coverage was increased by performing SNP imputation, as described1. In the post-imputation steps, we retained only those SNPs with R² ≥ 0.6 and MAF ≥ 1%. Next, we accurately checked cases and controls for solving within-Italian relationships and for testing the possible existence of population stratification within and across batches: to this aim, we performed principal component analysis (PCA), using a linkage disequilibrium-pruned subset of SNPs across chromosome 10 and the Plink v.1.9 package8. The final set of analyzed variants comprised 3,452 SNPs, distributed in the MBL2 region (the gene ± 500 kb).

Statistical analysis

Case–control allele–dose association tests were performed using the PLINK v.1.9 logistic-regression framework for dosage data. Age, sex, age × age, sex × age and the first ten PCs from PCA were introduced in the model as covariates. Analyses were conducted always referring to the minor allele. All P values are accompanied by odds ratio (OR) and 95% confidence interval (CI).

Haplotype analysis and reconstruction of functional haplotypes was performed using PLINK v.1.07 (ref. 9).

The statistical analysis for association of functional haplotypes (high, intermediate and deficient) with severe COVID-19 or thromboembolic complications was performed using a binomial glm model in R ( with the following covariates: age, sex, age × age, sex × age and ten PCs as already calculated for previous analyses.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.