The COVID-19 pandemic continues to pose a major public health threat, especially in countries with low vaccination rates. To better understand the biological underpinnings of SARS-CoV-2 infection and COVID-19 severity, we formed the COVID-19 Host Genetics Initiative1. Here we present a genome-wide association study meta-analysis of up to 125,584 cases and over 2.5 million control individuals across 60 studies from 25 countries, adding 11 genome-wide significant loci compared with those previously identified2. Genes at new loci, including SFTPD, MUC5B and ACE2, reveal compelling insights regarding disease susceptibility and severity.
Here we present meta-analyses bringing together 60 studies from 25 countries (Fig. 1 and Supplementary Table 1) for three COVID-19-related phenotypes: (1) individuals critically ill with COVID-19 on the basis of requiring respiratory support in hospital or who died as a consequence of the disease (9,376 cases, of which 3,197 are new in this data release, and 1,776,645 control individuals); (2) individuals with moderate or severe COVID-19 defined as those hospitalized due to symptoms associated with the infection (25,027 cases, 11,386 new and 2,836,272 control individuals); and (3) all cases with reported SARS-CoV-2 infection regardless of symptoms (125,584 cases, 76,022 new and 2,575,347 control individuals). Most studies have reported results before the roll out of the COVID-19 vaccination campaign. An overview of the study design is provided in Supplementary Fig. 1. We found a total of 23 genome-wide significant loci (P < 5 × 10−8) of which 20 loci remain significant after correction for multiple testing (P < 1.67 × 10−8) to account for the number of phenotypes examined (Fig. 2, Supplementary Fig. 2 and Supplementary Table 2). We compared the effects of these loci between the previous2 and current analysis and found that only one locus did not replicate (rs72711165). All of the other loci showed the expected increase in statistical significance (Supplementary Fig. 3).
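The corrected threshold quoted above is the conventional genome-wide threshold divided by the three phenotypes examined. The following minimal Python sketch reproduces that arithmetic; the function name `classify_locus` and the example P values are illustrative assumptions, not part of the study's pipeline.

```python
GENOME_WIDE_ALPHA = 5e-8   # conventional genome-wide significance threshold
N_PHENOTYPES = 3           # critical illness, hospitalization, reported infection

# Bonferroni correction across the phenotypes examined (about 1.67e-8)
CORRECTED_ALPHA = GENOME_WIDE_ALPHA / N_PHENOTYPES


def classify_locus(p_value: float) -> str:
    """Label a lead-variant P value against both thresholds (hypothetical helper)."""
    if p_value < CORRECTED_ALPHA:
        return "significant after correction"
    if p_value < GENOME_WIDE_ALPHA:
        return "genome-wide significant only"
    return "not significant"


if __name__ == "__main__":
    print(f"corrected threshold = {CORRECTED_ALPHA:.3g}")
    for p in (1e-9, 3e-8, 1e-6):  # arbitrary illustrative P values
        print(p, classify_locus(p))
```

A locus falling between the two thresholds corresponds to the loci that are genome-wide significant but do not survive the correction.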
Across the genome-wide significant loci, we observed clear patterns of association with the different phenotypes under study. We therefore developed a two-class Bayesian model for classifying loci based on the patterns of association across the two better-powered phenotypes (COVID-19 hospitalization and SARS-CoV-2 reported infection). Intuitively, loci that are associated with susceptibility will also be associated with severity as, to develop COVID-19, SARS-CoV-2 infection needs to first occur. By contrast, those genetic effects that solely modify the course of illness should be associated with severity of illness and not show any association with reported infection except through preferential ascertainment of hospitalized cases in a cohort (Supplementary Methods). We identified 16 loci that are substantially more likely (>99% posterior probability) to affect the risk of COVID-19 hospitalization and 7 loci that clearly influence susceptibility to SARS-CoV-2 infection (Supplementary Table 3 and Supplementary Fig. 4).
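The intuition behind the two-class model can be sketched in a few lines of Python. This is a toy under stated assumptions, not the model described in the Supplementary Methods: it assumes a susceptibility locus has the same true effect on infection and hospitalization, a severity locus has a true effect on hospitalization only, the true effect has a normal prior with standard deviation `tau`, and sample overlap between the two phenotypes is ignored. All names and default values are hypothetical.

```python
import math


def log_bvn_zero_mean(x, y, var_x, var_y, cov_xy):
    """Log density of a zero-mean bivariate normal at (x, y)."""
    det = var_x * var_y - cov_xy ** 2
    quad = (var_y * x * x - 2.0 * cov_xy * x * y + var_x * y * y) / det
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def posterior_severity(b_inf, se_inf, b_hosp, se_hosp, tau=0.2, prior_sev=0.5):
    """Posterior probability of the assumed 'severity' class for one locus.

    b_inf, b_hosp: log-odds-ratio estimates for reported infection and
    hospitalization; se_inf, se_hosp: their standard errors.
    """
    t2 = tau ** 2
    # Susceptibility: one shared true effect, so the two estimates covary by tau^2.
    log_sus = log_bvn_zero_mean(
        b_inf, b_hosp, t2 + se_inf ** 2, t2 + se_hosp ** 2, t2
    )
    # Severity: the infection estimate is pure noise; hospitalization carries the effect.
    log_sev = log_bvn_zero_mean(
        b_inf, b_hosp, se_inf ** 2, t2 + se_hosp ** 2, 0.0
    )
    # Combine prior odds and likelihood ratio in log space.
    log_odds = math.log(prior_sev / (1.0 - prior_sev)) + log_sev - log_sus
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    # Hypothetical loci: effect on hospitalization only, then equal effects on both.
    print(posterior_severity(0.00, 0.01, 0.10, 0.015))
    print(posterior_severity(0.10, 0.01, 0.10, 0.015))
```

Under these assumptions, a locus with an effect on hospitalization but none on reported infection receives a posterior probability close to 1 for the severity class, whereas a locus with equal effects on both phenotypes receives a probability close to 0.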
We observed that several loci had a significantly heterogeneous effect across studies (6 out of 23 loci with a P value for heterogeneity of <2.2 × 10−3; Supplementary Table 2). Owing to an increased diversity in our study population (Supplementary Fig. 5), we were able to examine whether such heterogeneity was due to effect differences across continental ancestry groups. Only one locus (FOXP4) showed a significantly different effect across ancestries (P value for heterogeneity of <7 × 10−5; Supplementary Table 4 and Supplementary Fig. 6), although even at this locus all of the ancestry groups showed a positive effect estimate. This suggests that factors related to between-study heterogeneity (such as variable definition of COVID-19 severity owing to different thresholds for testing, hospitalization and patient recruitment) rather than differences across ancestries are a more likely explanation for the observed heterogeneity in the effect sizes across studies.
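Between-study heterogeneity of this kind is commonly assessed with Cochran's Q statistic. The sketch below is a minimal standard-library illustration, assuming a fixed-effect inverse-variance model; the text does not state which test was used. The threshold `HET_ALPHA` is an assumption that happens to match 0.05 divided by 23 loci, and the per-study effects are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from Q(1/2, y) for odd df or Q(1, y) for even df, then apply
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / gamma(a + 1).
    a = 0.5 if df % 2 else 1.0
    q = math.erfc(math.sqrt(y)) if df % 2 else math.exp(-y)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return min(q, 1.0)


def cochran_q(betas, ses):
    """Return the pooled estimate, Cochran's Q, its P value and I^2."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, chi2_sf(q, df), i2


# Assumed derivation of the threshold: Bonferroni over 23 lead variants.
HET_ALPHA = 0.05 / 23

if __name__ == "__main__":
    # Hypothetical per-study log odds ratios and standard errors for one variant.
    pooled, q, p_het, i2 = cochran_q([0.30, 0.05, 0.04], [0.03, 0.03, 0.03])
    print(pooled, q, p_het, i2, p_het < HET_ALPHA)
```

The same function can be applied to per-ancestry estimates instead of per-study estimates to ask whether effects differ across ancestry groups.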
For the 23 genome-wide significant loci, we examined candidate causal genes and performed a phenome-wide association study to better understand their potential biological mechanisms (Supplementary Tables 2, 5 and 6 and Supplementary Fig. 7). Several of these loci with previous and direct connections to lung disease and SARS-CoV-2 infection mechanisms are highlighted here.
Several loci involved in COVID-19 severity implicate lung surfactant biology. A missense variant rs721917:A>G (p.Met31Thr) in SFTPD (10q22.3) confers risk for hospitalization (odds ratio (OR) = 1.06, 95% confidence interval (CI) = 1.04–1.08, P = 1.7 × 10–8) and has been previously associated with increased risk of chronic obstructive pulmonary disease3 (OR = 1.08, P = 2.0 × 10–8) and decreased lung function4 (FEV1/FVC; β = –0.019; P = 2.0 × 10–15). SFTPD encodes surfactant protein D (SP-D), which participates in innate immune response, protecting the lungs against inhaled microorganisms. The recombinant fragment of SP-D binds to the S1 spike protein of SARS-CoV-2 and potentially inhibits binding to ACE2 receptor and SARS-CoV-2 infection5. Another missense variant rs117169628:G>A (p.Pro256Leu) in SLC22A31 (16q24.3) also confers risk of hospitalization (OR = 1.09, 95% CI = 1.06–1.13, P = 2.6 × 10–8). SLC22A31 belongs to the family of solute carrier proteins that facilitate transport across membranes6 and is co-regulated with other surfactant proteins7.
We found that the variant rs35705950:G>T located in the promoter of MUC5B (11p15.5) is protective against hospitalization (OR = 0.89, 95% CI = 0.86–0.93, P = 6.5 × 10–9). This well-studied promoter variant increases the expression of MUC5B in lung tissue in GTEx (P = 6.7 × 10–16) and is the strongest known variant associated with an increased risk of developing idiopathic pulmonary fibrosis (IPF)8,9, but also improves survival in patients with IPF carrying this mutation10.
Finally, we found that rs190509934:T>C, which is located 69 bp upstream of ACE2 (Xp22.2), is associated with decreased susceptibility risk (OR = 0.69, 95% CI = 0.63–0.75, P = 3.6 × 10–18). ACE2 is the SARS-CoV-2 receptor and functionally interacts with SLC6A19 and SLC6A20 (ref. 11), one of which also showed a significant association with susceptibility (rs73062389:G>A at SLC6A20; OR = 1.18, 95% CI = 1.16–1.20, P = 2.5 × 10–74). Notably, rs190509934 is ten times more common in south Asian populations (minor allele frequency (MAF) = 0.027) than in European populations (MAF = 0.0024), demonstrating the importance of diversity for variant discovery. Recent results have shown that the rs190509934:T>C variant lowers ACE2 expression, which in turn confers protection against SARS-CoV-2 infection12.
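Reported odds ratios, confidence intervals and P values can be cross-checked against one another. The sketch below assumes a Wald-type 95% CI that is symmetric on the log-odds scale, which the text does not state explicitly; because published values are rounded, the recovered z score and P value are only approximate. Function names are hypothetical.

```python
import math

Z95 = 1.959963984540054  # 97.5th percentile of the standard normal


def ci_midpoint(ci_low, ci_high):
    """Point estimate implied by a CI symmetric on the log scale (geometric mean)."""
    return math.sqrt(ci_low * ci_high)


def z_and_p_from_or_ci(odds_ratio, ci_low, ci_high):
    """Approximate SE, z score and two-sided P value from an OR and its 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z95)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return se, z, p


if __name__ == "__main__":
    # Values reported above for rs190509934 near ACE2.
    print(ci_midpoint(0.63, 0.75))
    print(z_and_p_from_or_ci(0.69, 0.63, 0.75))
    # Allele-frequency ratio, south Asian versus European populations.
    print(0.027 / 0.0024)
```

For the ACE2 variant, the geometric midpoint of the interval rounds to the reported odds ratio, and the implied P value is far below the genome-wide threshold, in line with the reported association.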
We applied Mendelian randomization to infer potential causal relationships between COVID-19-related phenotypes and their genetically correlated traits (Supplementary Methods; Supplementary Tables 7–9 and Supplementary Fig. 8). A causal association was observed between genetic liability to type 2 diabetes and SARS-CoV-2 reported infection (OR = 1.02, 95% CI = 1.01–1.03, P = 1.6 × 10−3), and COVID-19 hospitalization (OR = 1.06, 95% CI = 1.03–1.10, P = 1.4 × 10−4). Multivariable Mendelian randomization was used to estimate the direct effect of liability to type 2 diabetes on COVID-19-related phenotypes that was not mediated through body mass index. This analysis indicated that the observed causal association between liability to type 2 diabetes and COVID-19 phenotypes is mediated by body mass index (Supplementary Table 10).
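As an illustration of how a Mendelian randomization estimate is formed from summary statistics, the sketch below implements the inverse-variance-weighted (IVW) estimator with first-order weights. This is an assumption for illustration only: the text does not specify the estimator, and the actual pipeline is in the repositories listed under code availability. The instrument effects are hypothetical numbers.

```python
import math


def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """IVW causal estimate (log odds ratio) and its standard error.

    First-order weights: uncertainty in the exposure effects is ignored.
    """
    num = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)


def as_odds_ratio(estimate, se, z95=1.959963984540054):
    """Convert a log-odds estimate to an OR with a 95% confidence interval."""
    return (
        math.exp(estimate),
        math.exp(estimate - z95 * se),
        math.exp(estimate + z95 * se),
    )


if __name__ == "__main__":
    # Hypothetical instruments: variant effects on the exposure (liability scale)
    # and on the outcome (log odds of hospitalization), with outcome SEs.
    bx = [0.10, 0.08, 0.12, 0.05]
    by = [0.007, 0.004, 0.008, 0.002]
    se = [0.003, 0.003, 0.004, 0.002]
    est, est_se = ivw_mr(bx, by, se)
    print(as_odds_ratio(est, est_se))
```

The multivariable extension regresses the outcome effects on the effects for several exposures at once (here, type 2 diabetes and body mass index), so that each coefficient reflects a direct effect not mediated by the other exposure.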
We have substantially expanded the genetic analysis of SARS-CoV-2 infection and COVID-19 severity by doubling the case size, identifying 11 new loci. We developed an approach to systematically assign the 23 discovered loci to either disease susceptibility (7 loci) or disease severity (16 loci). Although distinguishing between the two phenotypes is challenging because progression to a severe form of the disease requires susceptibility to infection in the first place, it is now evident that the genetic mechanisms involved in these two aspects of the disease can be differentiated. Among the new loci associated with disease susceptibility, ACE2 represents an expected, albeit interesting, finding. MUC5B, SFTPD and SLC22A31 are the three most interesting new loci associated with COVID-19 severity. Their relationship with lung function and lung diseases is consistent with loci previously associated with disease severity. The surfactant proteins secreted by alveolar cells, representing an emerging biological mechanism, maintain healthy lung function and facilitate the clearance of pathogens13. The protective effect of the MUC5B variant is unexpected given the otherwise risk-increasing, concordant effect between IPF and COVID-19 observed for other variants9. Nonetheless, this result aligns with the MUC5B promoter variant association that shows a twofold higher survival rate among patients with IPF10. In mice, Muc5b seems to be essential for effective mucociliary clearance and for controlling infection14, which suggests that therapies to control mucin secretion may be beneficial in patients with COVID-19.
Expanding genomic research to include participants from around the world enabled us to test whether the effect of COVID-19-related genetic variants was markedly different across ancestry groups. We did not detect obvious heterogeneity between ancestry groups, and we attribute the observed heterogeneity in the effect of COVID-19-related genetic variants to the diverse inclusion criteria across studies in terms of COVID-19 severity. However, we also note that ascertainment differences across studies might mask true underlying differences in effect sizes between ancestry groups.
The biological insights gained by this expansion of the COVID-19 Host Genetics Initiative show that increasing sample size and diversity remains a fruitful way to better understand the human genetic architecture of COVID-19.
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.
Summary statistics generated by the COVID-19 Host Genetics Initiative are available online (https://www.covid19hg.org/results/r6/). The analyses described here use the freeze 6 data. The COVID-19 Host Genetics Initiative continues to regularly release new data freezes. Summary statistics for samples from individuals of non-European ancestry are not currently available owing to the small individual sample sizes of these groups, but the results for the lead variants at the 23 loci are reported in Supplementary Table 3. Individual-level data can be requested directly from the authors of the contributing studies, listed in Supplementary Table 1. We used publicly available data from GTEx (https://gtexportal.org/home/), the Neale laboratory (http://www.nealelab.is/uk-biobank/), the Finucane laboratory (https://www.finucanelab.org), the FinnGen Freeze 4 cohort (https://www.finngen.fi/en/access_results) and eQTL catalogue release 3 (http://www.ebi.ac.uk/eqtl/).
The code for summary statistics lift-over, the projection PCA pipeline (including precomputed loadings) and the meta-analyses is available on GitHub (https://github.com/covid19-hg/), and the code for the Mendelian randomization and genetic correlation pipeline is available on GitHub (https://github.com/marcoralab/MRcovid). Code for implementing the multivariable Mendelian randomization analysis and subtype analyses is available on GitHub (https://github.com/marcoralab/multivariate_MR and https://github.com/mjpirinen/covid19-hgi_subtypes).
The COVID-19 Host Genetics Initiative. The COVID-19 Host Genetics Initiative, a global initiative to elucidate the role of host genetic factors in susceptibility and severity of the SARS-CoV-2 virus pandemic. Eur. J. Hum. Genet. 28, 715–718 (2020).
COVID-19 Host Genetics Initiative. Mapping the human genetic architecture of COVID-19. Nature 600, 472–477 (2021).
Hobbs, B. D. et al. Genetic loci associated with chronic obstructive pulmonary disease overlap with loci for lung function and pulmonary fibrosis. Nat. Genet. 49, 426–432 (2017).
Shrine, N. et al. New genetic signals for lung function highlight pathways and chronic obstructive pulmonary disease associations across multiple ancestries. Nat. Genet. 51, 481–493 (2019).
Hsieh, M.-H. et al. Human surfactant protein D binds spike protein and acts as an entry inhibitor of SARS-CoV-2 pseudotyped viral particles. Front. Immunol. 12, 641360 (2021).
Hediger, M. A. et al. The ABCs of solute carriers: physiological, pathological and therapeutic implications of human membrane transport proteins. Pflugers Arch. 447, 465–468 (2004).
Deelen, P. et al. Improving the diagnostic yield of exome-sequencing by predicting gene-phenotype associations using large-scale gene expression analysis. Nat. Commun. 10, 2837 (2019).
Seibold, M. A. et al. A common MUC5B promoter polymorphism and pulmonary fibrosis. N. Engl. J. Med. 364, 1503–1512 (2011).
Fadista, J. et al. Shared genetic etiology between idiopathic pulmonary fibrosis and COVID-19 severity. EBioMedicine 65, 103277 (2021).
Peljto, A. L. et al. Association between the MUC5B promoter polymorphism and survival in patients with idiopathic pulmonary fibrosis. JAMA 309, 2232–2239 (2013).
Vuille-Dit-Bille, R. N. et al. Human intestine luminal ACE2 and amino acid transporter expression increased by ACE-inhibitors. Amino Acids 47, 693–705 (2015).
Horowitz, J. E. et al. Common genetic variants identify targets for COVID-19 and individuals at high risk of severe disease. Preprint at medRxiv https://doi.org/10.1101/2020.12.14.20248176 (2021).
Wright, J. R. Immunoregulatory functions of surfactant proteins. Nat. Rev. Immunol. 5, 58–68 (2005).
Roy, M. G. et al. Muc5b is required for airway defence. Nature 505, 412–416 (2014).
A full list of competing interests is supplied as Supplementary Table 11.
COVID-19 Host Genetics Initiative. A first update on mapping the human genetic architecture of COVID-19. Nature 608, E1–E10 (2022). https://doi.org/10.1038/s41586-022-04826-7