A first update on mapping the human genetic architecture of COVID-19

The Original Article was published on 08 July 2021

The COVID-19 pandemic continues to pose a major public health threat, especially in countries with low vaccination rates. To better understand the biological underpinnings of SARS-CoV-2 infection and COVID-19 severity, we formed the COVID-19 Host Genetics Initiative1. Here we present a genome-wide association study meta-analysis of up to 125,584 cases and over 2.5 million control individuals across 60 studies from 25 countries, adding 11 genome-wide significant loci compared with those previously identified2. Genes at new loci, including SFTPD, MUC5B and ACE2, reveal compelling insights regarding disease susceptibility and severity.

Here we present meta-analyses bringing together 60 studies from 25 countries (Fig. 1 and Supplementary Table 1) for three COVID-19-related phenotypes: (1) individuals critically ill with COVID-19 on the basis of requiring respiratory support in hospital or who died as a consequence of the disease (9,376 cases, of which 3,197 are new in this data release, and 1,776,645 control individuals); (2) individuals with moderate or severe COVID-19 defined as those hospitalized due to symptoms associated with the infection (25,027 cases, 11,386 new and 2,836,272 control individuals); and (3) all cases with reported SARS-CoV-2 infection regardless of symptoms (125,584 cases, 76,022 new and 2,575,347 control individuals). Most studies have reported results before the roll out of the COVID-19 vaccination campaign. An overview of the study design is provided in Supplementary Fig. 1. We found a total of 23 genome-wide significant loci (P < 5 × 10−8) of which 20 loci remain significant after correction for multiple testing (P < 1.67 × 10−8) to account for the number of phenotypes examined (Fig. 2, Supplementary Fig. 2 and Supplementary Table 2). We compared the effects of these loci between the previous2 and current analysis and found that only one locus did not replicate (rs72711165). All of the other loci showed the expected increase in statistical significance (Supplementary Fig. 3).
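The corrected threshold quoted above follows from dividing the conventional genome-wide threshold by the number of phenotypes examined. A minimal sketch of that arithmetic, assuming a plain Bonferroni-style division (the function name is illustrative and not taken from the study's code):

```python
def corrected_threshold(base_alpha: float, n_tests: int) -> float:
    """Divide a significance threshold evenly across n_tests comparisons."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return base_alpha / n_tests


# Three phenotypes: critical illness, hospitalization, reported infection.
threshold = corrected_threshold(5e-8, 3)
print(f"{threshold:.3g}")  # 1.67e-08
```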

Fig. 1: Overview of contributing studies in Host Genetics Initiative data freeze 6.

a, Geographical overview of the contributing studies to the COVID-19 Host Genetics Initiative and composition by major continental ancestry groups. Ancestry groups are defined as Middle Eastern (MID), south Asian (SAS), east Asian (EAS), African (AFR), admixed American (AMR) and European (EUR). b, Principal components analysis highlighting the population structure and the sample ancestry of the individuals participating in the COVID-19 Host Genetics Initiative. This figure is reproduced from the original publication by the COVID-19 Host Genetics Initiative2 with modifications reflecting the updated analysis from data freeze 6.

Fig. 2: Genome-wide association results for COVID-19.

a, The results of the genome-wide association study of hospitalized COVID-19 (n = 25,027 cases and n = 2,836,272 control individuals) (top), and the results of reported SARS-CoV-2 infection (n = 125,584 cases and n = 2,575,347 control individuals) (bottom). Loci highlighted in yellow (top) represent regions associated with the severity of COVID-19 manifestation. Loci highlighted in green (bottom) are regions associated with SARS-CoV-2-reported infection. Lead variants for the loci identified in this data release are annotated with their respective rs ID. Horizontal lines denote genome-wide significant thresholds. b, The results of gene prioritization using different evidence measures of gene annotation. Genes in regions of linkage disequilibrium (LD), genes with coding variants and eGenes (fine-mapped cis-eQTL variant PIP > 0.1 in GTEx Lung) are annotated if in linkage disequilibrium with a COVID-19 lead variant (r2 > 0.6). V2G denotes the highest gene prioritized by OpenTargetGenetics’ V2G score. The asterisk (*) indicates SARS-CoV-2 reported infection and the plus symbol (+) indicates COVID-19 severity. The transparent loci were reported in the previous freeze (data release 5), and loci in bright blue were identified in the current freeze (data release 6). This figure is reproduced from the original publication by the COVID-19 Host Genetics Initiative2 with modifications reflecting the updated analysis from data freeze 6.

Across the genome-wide significant loci, we observed clear patterns of association with the different phenotypes under study. We therefore developed a two-class Bayesian model for classifying loci based on the patterns of association across the two better-powered phenotypes (COVID-19 hospitalization and SARS-CoV-2 reported infection). Intuitively, loci that are associated with susceptibility will also be associated with severity as, to develop COVID-19, SARS-CoV-2 infection needs to first occur. By contrast, those genetic effects that solely modify the course of illness should be associated with severity of illness and not show any association with reported infection except through preferential ascertainment of hospitalized cases in a cohort (Supplementary Methods). We identified 16 loci that are substantially more likely (>99% posterior probability) to affect the risk of COVID-19 hospitalization and 7 loci that clearly influence susceptibility to SARS-CoV-2 infection (Supplementary Table 3 and Supplementary Fig. 4).
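The intuition behind the classification can be illustrated with a toy two-class comparison. The sketch below is not the authors' model (which is specified in the Supplementary Methods); it assumes that a susceptibility locus has one shared effect on both phenotypes, that a severity locus has no effect on reported infection, that the two estimates have independent errors, and that the true effect has a normal prior. The function names, the prior width `prior_sd` and the example effect sizes are all hypothetical.

```python
import math


def log_bvn_zero_mean(x, y, var_x, var_y, cov_xy):
    """Log density of a zero-mean bivariate normal evaluated at (x, y)."""
    det = var_x * var_y - cov_xy ** 2
    quad = (var_y * x * x - 2 * cov_xy * x * y + var_x * y * y) / det
    return -0.5 * (2 * math.log(2 * math.pi) + math.log(det) + quad)


def posterior_severity(b_inf, se_inf, b_hosp, se_hosp,
                       prior_sd=0.2, prior_severity=0.5):
    """Posterior probability of the 'severity' class under the toy model."""
    tau2 = prior_sd ** 2
    # Susceptibility class: one shared true effect drives both estimates,
    # so the marginal covariance between them equals the prior variance.
    log_sus = log_bvn_zero_mean(b_inf, b_hosp,
                                se_inf ** 2 + tau2, se_hosp ** 2 + tau2, tau2)
    # Severity class: no true effect on infection, so the infection
    # estimate is pure noise and is uncorrelated with hospitalization.
    log_sev = log_bvn_zero_mean(b_inf, b_hosp,
                                se_inf ** 2, se_hosp ** 2 + tau2, 0.0)
    log_sus += math.log(1.0 - prior_severity)
    log_sev += math.log(prior_severity)
    top = max(log_sus, log_sev)  # subtract the maximum for numerical stability
    w_sus = math.exp(log_sus - top)
    w_sev = math.exp(log_sev - top)
    return w_sev / (w_sus + w_sev)


# Hypothetical log odds ratios and standard errors, not taken from the study.
severity_like = posterior_severity(b_inf=0.00, se_inf=0.01,
                                   b_hosp=0.15, se_hosp=0.02)
susceptibility_like = posterior_severity(b_inf=0.15, se_inf=0.01,
                                         b_hosp=0.15, se_hosp=0.02)
print(severity_like > 0.99, susceptibility_like < 0.01)
```

In practice the two estimates share samples and hospitalized cases are preferentially ascertained, so a realistic model needs the additional terms described in the Supplementary Methods.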

We observed that several loci had a significant heterogeneous effect across studies (6 out of 23 loci with a P value for heterogeneity of <2.2 × 10−3; Supplementary Table 2). Owing to an increased diversity in our study population (Supplementary Fig. 5), we were able to examine whether such heterogeneity was due to effect differences across continental ancestry groups. Only one locus (FOXP4) showed a significantly different effect across ancestries (P value heterogeneity of <7 × 10−5; Supplementary Table 4 and Supplementary Fig. 6), although even at this locus all of the ancestry groups showed a positive effect estimate. This confirms that factors related to between-study heterogeneity (such as variable definition of COVID-19 severity owing to different thresholds for testing, hospitalization and patient recruitment) rather than differences across ancestries are a more likely explanation for the observed heterogeneity in the effect sizes across studies.
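As an illustration of how between-study heterogeneity is commonly quantified, the sketch below computes a fixed-effect inverse-variance pooled estimate together with Cochran's Q and I². The choice of Cochran's Q is an assumption, since this section does not name the test that was used, and the per-study values are invented for the example. The quoted threshold of 2.2 × 10−3 is numerically consistent with 0.05 divided across 23 loci, although that derivation is also an assumption.

```python
def cochran_q(betas, ses):
    """Fixed-effect pooled estimate, Cochran's Q, degrees of freedom and I^2."""
    if len(betas) != len(ses) or len(betas) < 2:
        raise ValueError("need at least two studies with matching lengths")
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I^2 is the share of variation beyond what chance alone would produce.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, q, df, i2


# Hypothetical per-study log odds ratios and standard errors.
betas = [0.10, 0.12, 0.30]
ses = [0.05, 0.04, 0.06]
pooled, q, df, i2 = cochran_q(betas, ses)
print(f"pooled={pooled:.3f} Q={q:.2f} df={df} I2={i2:.2f}")

# Per-locus threshold if 0.05 is split across 23 loci (assumed derivation).
per_locus_alpha = 0.05 / 23
print(f"{per_locus_alpha:.2g}")
```

Under the null hypothesis of no heterogeneity, Q is compared against a chi-squared distribution with `df` degrees of freedom to obtain the heterogeneity P value.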

For the 23 genome-wide significant loci, we examined candidate causal genes and performed a phenome-wide association study to better understand their potential biological mechanisms (Supplementary Tables 2, 5 and 6 and Supplementary Fig. 7). Several of these loci with previous and direct connections to lung disease and SARS-CoV-2 infection mechanisms are highlighted here.

Several loci involved in COVID-19 severity implicate lung surfactant biology. A missense variant rs721917:A>G (p.Met31Thr) in SFTPD (10q22.3) confers risk for hospitalization (odds ratio (OR) = 1.06, 95% confidence interval (CI) = 1.04–1.08, P = 1.7 × 10–8) and has been previously associated with increased risk of chronic obstructive pulmonary disease3 (OR = 1.08, P = 2.0 × 10–8) and decreased lung function4 (FEV1/FVC; β = –0.019; P = 2.0 × 10–15). SFTPD encodes surfactant protein D (SP-D), which participates in innate immune response, protecting the lungs against inhaled microorganisms. The recombinant fragment of SP-D binds to the S1 spike protein of SARS-CoV-2 and potentially inhibits binding to ACE2 receptor and SARS-CoV-2 infection5. Another missense variant rs117169628:G>A (p.Pro256Leu) in SLC22A31 (16q24.3) also confers risk of hospitalization (OR = 1.09, 95% CI = 1.06–1.13, P = 2.6 × 10–8). SLC22A31 belongs to the family of solute carrier proteins that facilitate transport across membranes6 and is co-regulated with other surfactant proteins7.
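The odds ratios, confidence intervals and P values reported for these variants are linked through the standard error of the log odds ratio. The following sketch recovers an approximate standard error, z score and two-sided P value from a published OR and 95% CI, assuming a normal approximation on the log scale; the function name is illustrative. Because the published values are rounded to two decimals, the recovered P value is only a rough consistency check and will not match the reported one exactly.

```python
import math


def z_and_p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate SE of log(OR), z score and two-sided P from a 95% CI."""
    log_or = math.log(odds_ratio)
    # The CI spans 2 * z_crit standard errors on the log scale.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p


# SFTPD rs721917 as reported above: OR = 1.06, 95% CI = 1.04-1.08.
se, z, p = z_and_p_from_or_ci(1.06, 1.04, 1.08)
print(f"se={se:.4f} z={z:.2f} p={p:.1e}")
```

The same check applied to any estimate should place the OR inside its confidence interval, which makes it a quick way to spot transcription errors.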

We found that the variant rs35705950:G>T located in the promoter of MUC5B (11p15.5) is protective against hospitalization (OR = 0.89, 95% CI = 0.86–0.93, P = 6.5 × 10–9). This well-studied promoter variant increases the expression of MUC5B in lung in GTEx (P = 6.7 × 10–16) and is the strongest known variant associated with an increased risk of developing idiopathic pulmonary fibrosis (IPF)8,9, but it also improves survival in patients with IPF who carry it10.

Finally, we found that rs190509934:T>C, which is located 69 bp upstream of ACE2 (Xp22.2), is associated with decreased susceptibility risk (OR = 0.69, 95% CI = 0.63–0.75, P = 3.6 × 10–18). ACE2 is the SARS-CoV-2 receptor and functionally interacts with SLC6A19 and SLC6A20 (ref. 11), one of which also showed a significant association with susceptibility (rs73062389:G>A at SLC6A20; OR = 1.18, 95% CI = 1.16–1.20, P = 2.5 × 10–74). Notably, rs190509934 is ten times more common in south Asian populations (minor allele frequency (MAF) = 0.027) than in European populations (MAF = 0.0024), demonstrating the importance of diversity for variant discovery. Recent results have shown that the rs190509934:T>C variant lowers ACE2 expression, which in turn confers protection against SARS-CoV-2 infection12.

We applied Mendelian randomization to infer potential causal relationships between COVID-19-related phenotypes and their genetically correlated traits (Supplementary Methods, Supplementary Tables 7–9 and Supplementary Fig. 8). A causal association was observed between genetic liability to type 2 diabetes and SARS-CoV-2 reported infection (OR = 1.02, 95% CI = 1.01–1.03, P = 1.6 × 10−3), and COVID-19 hospitalization (OR = 1.06, 95% CI = 1.03–1.1, P = 1.4 × 10−4). Multivariable Mendelian randomization was used to estimate the direct effect of liability to type 2 diabetes on COVID-19-related phenotypes that was not mediated through body mass index. This analysis indicated that the observed causal association of liability to type 2 diabetes on COVID-19 phenotypes is mediated by body mass index (Supplementary Table 10).
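For readers unfamiliar with summary-statistic Mendelian randomization, the sketch below shows the fixed-effect inverse-variance weighted (IVW) estimator, one widely used approach. It is an assumption that IVW is representative here; the estimators actually applied are described in the Supplementary Methods. The per-variant effect sizes are invented, and the function name is illustrative.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate and its standard error."""
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    # Weighted regression of outcome effects on exposure effects through
    # the origin, with weights 1 / se_outcome^2.
    num = sum(bx * by / s ** 2
              for bx, by, s in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)


# Hypothetical instruments: effects on the exposure (e.g. diabetes liability)
# and on the outcome (e.g. log odds of hospitalization).
bx = [0.10, 0.20, 0.30]
by = [0.006, 0.011, 0.019]
sy = [0.010, 0.020, 0.015]
beta, se = ivw_estimate(bx, by, sy)
odds_ratio = math.exp(beta)
ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
print(f"OR={odds_ratio:.3f} CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

Multivariable Mendelian randomization extends the same idea by regressing the outcome effects on the effects for several exposures at once, which is how a direct effect can be separated from one mediated through body mass index.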

We have substantially expanded the genetic analysis of SARS-CoV-2 infection and COVID-19 severity by doubling the case size, identifying 11 new loci. We developed an approach to systematically assign the 23 discovered loci to either disease susceptibility (7 loci) or disease severity (16 loci). Although distinguishing between the two phenotypes is challenging because progression to a severe form of the disease requires susceptibility to infection in the first place, it is now evident that the genetic mechanisms involved in these two aspects of the disease can be differentiated. Among the new loci associated with disease susceptibility, ACE2 represents an expected, albeit interesting, finding. MUC5B, SFTPD and SLC22A31 are the three most interesting new loci associated with COVID-19 severity. Their relationship with lung function and lung diseases is consistent with loci previously associated with disease severity. The surfactant proteins secreted by alveolar cells, representing an emerging biological mechanism, maintain healthy lung function and facilitate the clearance of pathogens13. The protective effect of the MUC5B variant is unexpected given the otherwise risk-increasing, concordant effect between IPF and COVID-19 observed for other variants9. Nonetheless, this result aligns with the MUC5B promoter variant association that shows a twofold higher survival rate among patients with IPF10. In mice, Muc5b seems to be essential for effective mucociliary clearance and for controlling infection14, which suggests that therapies to control mucin secretion may be beneficial in patients with COVID-19.

Expanding genomic research to include participants from around the world enabled us to test whether the effect of COVID-19-related genetic variants was markedly different across ancestry groups. We did not detect obvious heterogeneity between ancestry groups, and we attribute the observed heterogeneity in the effect of COVID-19-related genetic variants to the diverse inclusion criteria across studies in terms of COVID-19 severity. However, we also note that ascertainment differences across studies might mask true underlying differences in effect sizes between ancestry groups.

The biological insights gained by this expansion of the COVID-19 Host Genetics Initiative show that increasing sample size and diversity remains a fruitful way to better understand the human genetic architecture of COVID-19.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability

Summary statistics generated by the COVID-19 Host Genetics Initiative are available online. The analyses described here use the freeze 6 data. The COVID-19 Host Genetics Initiative continues to regularly release new data freezes. Summary statistics for samples from individuals of non-European ancestry are not currently available owing to the small individual sample sizes of these groups, but the results for 23 loci lead variants are reported in Supplementary Table 3. Individual-level data can be requested directly from the authors of the contributing studies, listed in Supplementary Table 1. We used publicly available data from GTEx, the Neale laboratory, the Finucane laboratory, the FinnGen Freeze 4 cohort and eQTL catalogue release 3.

Code availability

The code for summary statistics lift-over, the projection PCA pipeline including precomputed loadings and the meta-analyses is available on GitHub, as is the code for the Mendelian randomization and genetic correlation pipeline. Code for implementing the multivariable Mendelian randomization analysis and subtype analyses is also available on GitHub.


  1. The COVID-19 Host Genetics Initiative. The COVID-19 Host Genetics Initiative, a global initiative to elucidate the role of host genetic factors in susceptibility and severity of the SARS-CoV-2 virus pandemic. Eur. J. Hum. Genet. 28, 715–718 (2020).

  2. COVID-19 Host Genetics Initiative. Mapping the human genetic architecture of COVID-19. Nature 600, 472–477 (2021).

  3. Hobbs, B. D. et al. Genetic loci associated with chronic obstructive pulmonary disease overlap with loci for lung function and pulmonary fibrosis. Nat. Genet. 49, 426–432 (2017).

  4. Shrine, N. et al. New genetic signals for lung function highlight pathways and chronic obstructive pulmonary disease associations across multiple ancestries. Nat. Genet. 51, 481–493 (2019).

  5. Hsieh, M.-H. et al. Human surfactant protein D binds spike protein and acts as an entry inhibitor of SARS-CoV-2 pseudotyped viral particles. Front. Immunol. 12, 641360 (2021).

  6. Hediger, M. A. et al. The ABCs of solute carriers: physiological, pathological and therapeutic implications of human membrane transport proteins. Pflugers Arch. 447, 465–468 (2004).

  7. Deelen, P. et al. Improving the diagnostic yield of exome-sequencing by predicting gene-phenotype associations using large-scale gene expression analysis. Nat. Commun. 10, 2837 (2019).

  8. Seibold, M. A. et al. A common MUC5B promoter polymorphism and pulmonary fibrosis. N. Engl. J. Med. 364, 1503–1512 (2011).

  9. Fadista, J. et al. Shared genetic etiology between idiopathic pulmonary fibrosis and COVID-19 severity. EBioMedicine 65, 103277 (2021).

  10. Peljto, A. L. et al. Association between the MUC5B promoter polymorphism and survival in patients with idiopathic pulmonary fibrosis. JAMA 309, 2232–2239 (2013).

  11. Vuille-Dit-Bille, R. N. et al. Human intestine luminal ACE2 and amino acid transporter expression increased by ACE-inhibitors. Amino Acids 47, 693–705 (2014).

  12. Horowitz, J. E. et al. Common genetic variants identify targets for COVID-19 and individuals at high risk of severe disease. Preprint at medRxiv (2021).

  13. Wright, J. R. Immunoregulatory functions of surfactant proteins. Nat. Rev. Immunol. 5, 58–68 (2005).

  14. Roy, M. G. et al. Muc5b is required for airway defence. Nature 505, 412–416 (2014).

Author information

Detailed author contributions are integrated in the authorship list.

Corresponding author

Correspondence to Andrea Ganna.

Ethics declarations

Competing interests

A full list of competing interests is supplied as Supplementary Table 11.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Methods with additional references, and Supplementary Figs. 1–8.

Reporting Summary

Supplementary Tables 1–11

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit

About this article

Cite this article

COVID-19 Host Genetics Initiative. A first update on mapping the human genetic architecture of COVID-19. Nature 608, E1–E10 (2022).
