A first update on mapping the human genetic architecture of COVID-19

COVID-19 Host Genetics Initiative* ✉
The COVID-19 pandemic continues to pose a major public health threat, especially in countries with low vaccination rates. To better understand the biological underpinnings of SARS-CoV-2 infection and COVID-19 severity, we formed the COVID-19 Host Genetics Initiative¹. Here we present a genome-wide association study meta-analysis of up to 125,584 cases and over 2.5 million control individuals across 60 studies from 25 countries, adding 11 genome-wide significant loci compared with those previously identified². Genes at new loci, including SFTPD, MUC5B and ACE2, reveal compelling insights regarding disease susceptibility and severity.
Here we present meta-analyses bringing together 60 studies from 25 countries (Fig. 1 and Supplementary Table 1) for three COVID-19-related phenotypes: (1) individuals critically ill with COVID-19 on the basis of requiring respiratory support in hospital or who died as a consequence of the disease (9,376 cases, of which 3,197 are new in this data release, and 1,776,645 control individuals); (2) individuals with moderate or severe COVID-19 defined as those hospitalized due to symptoms associated with the infection (25,027 cases, 11,386 new and 2,836,272 control individuals); and (3) all cases with reported SARS-CoV-2 infection regardless of symptoms (125,584 cases, 76,022 new and 2,575,347 control individuals). Most studies have reported results before the roll out of the COVID-19 vaccination campaign. An overview of the study design is provided in Supplementary Fig. 1. We found a total of 23 genome-wide significant loci (P < 5 × 10⁻⁸) of which 20 loci remain significant after correction for multiple testing (P < 1.67 × 10⁻⁸) to account for the number of phenotypes examined (Fig. 2, Supplementary Fig. 2 and Supplementary Table 2). We compared the effects of these loci between the previous² and current analysis and found that only one locus did not replicate (rs72711165). All of the other loci showed the expected increase in statistical significance (Supplementary Fig. 3).
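The corrected threshold quoted above is consistent with dividing the conventional genome-wide threshold by the three phenotypes examined (5 × 10⁻⁸ / 3 ≈ 1.67 × 10⁻⁸). The following minimal sketch illustrates that arithmetic; the function names and the three-way labelling are hypothetical and are not taken from the analysis pipeline.

```python
GENOME_WIDE_ALPHA = 5e-8   # conventional genome-wide significance threshold
N_PHENOTYPES = 3           # critical illness, hospitalization, reported infection


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Return the per-test threshold after a Bonferroni-style correction."""
    return alpha / n_tests


def classify_locus(p_value: float,
                   alpha: float = GENOME_WIDE_ALPHA,
                   n_tests: int = N_PHENOTYPES) -> str:
    """Label a lead-variant P value against both thresholds (illustrative labels)."""
    if p_value < bonferroni_threshold(alpha, n_tests):
        return "significant after correction"
    if p_value < alpha:
        return "genome-wide significant only"
    return "not significant"


if __name__ == "__main__":
    print(f"corrected threshold: {bonferroni_threshold(GENOME_WIDE_ALPHA, N_PHENOTYPES):.3g}")
    print(classify_locus(3e-8))
    print(classify_locus(1e-9))
```

Under this scheme, a locus with a P value between 1.67 × 10⁻⁸ and 5 × 10⁻⁸ would count among the 23 genome-wide significant loci but not among the 20 that survive correction.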
Across the genome-wide significant loci, we observed clear patterns of association with the different phenotypes under study. We therefore developed a two-class Bayesian model for classifying loci based on the patterns of association across the two better-powered phenotypes (COVID-19 hospitalization and SARS-CoV-2 reported infection). Intuitively, loci that are associated with susceptibility will also be associated with severity as, to develop COVID-19, SARS-CoV-2 infection needs to first occur. By contrast, those genetic effects that solely modify the course of illness should be associated with severity of illness and not show any association with reported infection except through preferential ascertainment of hospitalized cases in a cohort (Supplementary Methods). We identified 16 loci that are substantially more likely (>99% posterior probability) to affect the risk of COVID-19 hospitalization and 7 loci that clearly influence susceptibility to SARS-CoV-2 infection (Supplementary Table 3 and Supplementary Fig. 4).
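The intuition behind a two-class comparison of this kind can be illustrated with a deliberately simplified sketch. It is not the model described in the Supplementary Methods: it assumes that the two effect estimates are independent (ignoring sample overlap), that a susceptibility locus has the same true effect on both phenotypes, and that a severity locus has an infection effect equal to a fraction `w` of its hospitalization effect (with `w = 0` meaning no ascertainment leakage). The prior standard deviation `tau`, the parameter `w` and all function names are hypothetical.

```python
import math


def log_bvn_zero_mean(x1, x2, s11, s22, s12):
    """Log density of a zero-mean bivariate normal with covariance [[s11, s12], [s12, s22]]."""
    det = s11 * s22 - s12 ** 2
    quad = (s22 * x1 ** 2 - 2 * s12 * x1 * x2 + s11 * x2 ** 2) / det
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def log_marginal(beta_hosp, se_hosp, beta_inf, se_inf, ratio, tau):
    """Marginal log likelihood when the true effects are (b, ratio * b), b ~ N(0, tau^2)."""
    s11 = se_hosp ** 2 + tau ** 2
    s22 = se_inf ** 2 + (ratio * tau) ** 2
    s12 = ratio * tau ** 2
    return log_bvn_zero_mean(beta_hosp, beta_inf, s11, s22, s12)


def posterior_severity(beta_hosp, se_hosp, beta_inf, se_inf,
                       tau=0.2, w=0.0, prior_severity=0.5):
    """Posterior probability of the 'severity' class versus the 'susceptibility' class."""
    log_sev = (log_marginal(beta_hosp, se_hosp, beta_inf, se_inf, w, tau)
               + math.log(prior_severity))
    log_sus = (log_marginal(beta_hosp, se_hosp, beta_inf, se_inf, 1.0, tau)
               + math.log(1.0 - prior_severity))
    top = max(log_sev, log_sus)  # subtract the maximum for numerical stability
    e_sev, e_sus = math.exp(log_sev - top), math.exp(log_sus - top)
    return e_sev / (e_sev + e_sus)


if __name__ == "__main__":
    # Made-up effect sizes (log odds ratios) and standard errors, for illustration only.
    print(posterior_severity(0.30, 0.03, 0.00, 0.02))  # hospitalization effect only
    print(posterior_severity(0.10, 0.02, 0.10, 0.02))  # similar effect on both phenotypes
```

In this toy setting, a variant with a clear hospitalization effect and no infection effect receives a posterior probability close to 1 for the severity class, whereas a variant with matching effects on both phenotypes is assigned to the susceptibility class.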
We observed that several loci had a significant heterogeneous effect across studies (6 out of 23 loci with a P value for heterogeneity of <2.2 × 10⁻³; Supplementary Table 2). Owing to an increased diversity in our study population (Supplementary Fig. 5), we were able to examine whether such heterogeneity was due to effect differences across continental ancestry groups. Only one locus (FOXP4) showed a significantly different effect across ancestries (P value heterogeneity of <7 × 10⁻⁵; Supplementary Table 4 and Supplementary Fig. 6), although even at this locus all of the ancestry groups showed a positive effect estimate. This confirms that factors related to between-study heterogeneity (such as variable definition of COVID-19 severity owing to different thresholds for testing, hospitalization and patient recruitment) rather than differences across ancestries are a more likely explanation for the observed heterogeneity in the effect sizes across studies.
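As an illustration of how between-study heterogeneity is commonly quantified, the sketch below combines per-study effect estimates with a fixed-effect inverse-variance-weighted meta-analysis and computes Cochran's Q with its chi-squared P value. This is a generic textbook procedure shown under the assumption that a test of this kind underlies the heterogeneity P values above; the text does not specify the exact implementation, and the function names and example numbers are hypothetical. The threshold of 2.2 × 10⁻³ is consistent with 0.05 divided by 23 loci.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    if df % 2 == 0:
        sf, k = math.exp(-x / 2), 2           # closed form for df = 2
    else:
        sf, k = math.erfc(math.sqrt(x / 2)), 1  # closed form for df = 1
    # Recurrence: sf(df + 2) = sf(df) + (x/2)^(df/2) * exp(-x/2) / Gamma(df/2 + 1)
    while k < df:
        sf += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return sf


def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted pooled effect, its SE, Cochran's Q and the Q P value."""
    if len(betas) != len(ses) or len(betas) < 2:
        raise ValueError("need matching effect sizes and standard errors for >= 2 studies")
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    p_het = chi2_sf(q, len(betas) - 1)
    return pooled, pooled_se, q, p_het


if __name__ == "__main__":
    # Made-up per-study log odds ratios and standard errors, for illustration only.
    pooled, pooled_se, q, p_het = fixed_effect_meta([0.25, 0.10, 0.32], [0.05, 0.04, 0.08])
    threshold = 0.05 / 23  # approximately 2.2e-3
    print(f"pooled={pooled:.3f} se={pooled_se:.3f} Q={q:.2f} "
          f"p_het={p_het:.3g} heterogeneous={p_het < threshold}")
```

The same function can be applied to ancestry-level rather than study-level estimates to ask whether effects differ across ancestry groups.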
For the 23 genome-wide significant loci, we examined candidate causal genes and performed a phenome-wide association study to better understand their potential biological mechanisms (Supplementary Tables 2, 5 and 6 and Supplementary Fig. 7). Several of these loci with previous and direct connections to lung disease and SARS-CoV-2 infection mechanisms are highlighted here.
We have substantially expanded the genetic analysis of SARS-CoV-2 infection and COVID-19 severity by doubling the case size, identifying 11 new loci. We developed an approach to systematically assign the 23 discovered loci to either disease susceptibility (7 loci) or disease severity (16 loci). Although distinguishing between the two phenotypes is challenging because progression to a severe form of the disease requires susceptibility to infection in the first place, it is now evident that the genetic mechanisms involved in these two aspects of the disease can be differentiated. Among the new loci associated with disease susceptibility, ACE2 represents an expected, albeit interesting, finding. MUC5B, SFTPD and SLC22A31 are the three most interesting new loci associated with COVID-19 severity. Their relationship with lung function and lung diseases is consistent with loci previously associated with disease severity. The surfactant proteins secreted by alveolar cells, representing an emerging biological mechanism, maintain healthy lung function and facilitate the clearance of pathogens¹³. The protective effect of the MUC5B variant is unexpected given the otherwise risk-increasing, concordant effect between IPF and COVID-19 observed for other variants⁹. Nonetheless, this result aligns with the MUC5B promoter variant association that shows a twofold higher survival rate among patients with IPF¹⁰. In mice, Muc5b seems to be essential for effective mucociliary clearance and for controlling infection¹⁴, which suggests that therapies to control mucin secretion may be beneficial in patients with COVID-19.
Expanding genomic research to include participants from around the world enabled us to test whether the effect of COVID-19-related genetic variants was markedly different across ancestry groups. We did not detect obvious heterogeneity between ancestry groups, and we attribute the observed heterogeneity in the effect of COVID-19-related genetic variants to the diverse inclusion criteria across studies in terms of COVID-19 severity. However, we also note that ascertainment differences across studies might mask true underlying differences in effect sizes between ancestry groups.
The biological insights gained by this expansion of the COVID-19 Host Genetics Initiative show that increasing sample size and diversity remains a fruitful way to better understand the human genetic architecture of COVID-19.

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-022-04826-7.

Data availability
Summary statistics generated by the COVID-19 Host Genetics Initiative are available online (https://www.covid19hg.org/results/r6/). The analyses described here use the freeze 6 data. The COVID-19 Host Genetics Initiative continues to regularly release new data freezes. Summary statistics for samples from individuals of non-European ancestry are not currently available owing to the small individual sample sizes of these groups, but the results for 23 loci lead variants are reported in Supplementary Table 3. Individual-level data can be requested directly from the authors of the contributing studies, listed in Supplementary Table 1. We used publicly available data from GTEx (https://gtexportal.org/home/), the Neale laboratory (http://www.nealelab.is/uk-biobank/), the Finucane laboratory (https://www.finucanelab.org), the FinnGen Freeze 4 cohort (https://www.finngen.fi/en/access_results) and eQTL catalogue release 3 (http://www.ebi.ac.uk/eqtl/).

Fig. 1 | Overview of contributing studies in Host Genetics Initiative data freeze 6. a, Geographical overview of the contributing studies to the COVID-19 Host Genetics Initiative and composition by major continental ancestry groups. Ancestry groups are defined as Middle Eastern (MID), south Asian (SAS), east Asian (EAS), African (AFR), admixed American (AMR) and European (EUR). b, Principal components analysis highlighting the population structure and the sample ancestry of the individuals participating in the COVID-19 Host Genetics Initiative. This figure is reproduced from the original publication by the COVID-19 Host Genetics Initiative² with modifications reflecting the updated analysis from data freeze 6.

Fig. 2 | Genome-wide association results for COVID-19. a, The results of the genome-wide association study of hospitalized COVID-19 (n = 25,027 cases and n = 2,836,272 control individuals) (top), and the results of reported SARS-CoV-2 infection (n = 125,584 cases and n = 2,575,347 control individuals) (bottom). Loci highlighted in yellow (top) represent regions associated with the severity of COVID-19 manifestation. Loci highlighted in green (bottom) are regions associated with SARS-CoV-2-reported infection. Lead variants for the loci identified in this data release are annotated with their respective rs ID. Horizontal lines denote genome-wide significant thresholds. b, The results of gene prioritization using different evidence measures of gene annotation. Genes in regions of linkage disequilibrium (LD), genes with coding variants and eGenes (fine-mapped cis-eQTL variant PIP > 0.1 in GTEx Lung) are annotated if in linkage disequilibrium with a COVID-19 lead variant (r² > 0.6). V2G denotes the highest gene prioritized by OpenTargetGenetics' V2G score. The asterisk (*) indicates SARS-CoV-2 reported infection and the plus symbol (+) indicates COVID-19 severity. The transparent loci were reported in the previous freeze (data release 5), and loci in bright blue were identified in the current freeze (data release 6). This figure is reproduced from the original publication by the COVID-19 Host Genetics Initiative² with modifications reflecting the updated analysis from data freeze 6.