Abstract
Critical illness in COVID-19 is an extreme and clinically homogeneous disease phenotype that we have previously shown1 to be highly efficient for discovery of genetic associations2. Despite the advanced stage of illness at presentation, we have shown that host genetics in patients who are critically ill with COVID-19 can identify immunomodulatory therapies with strong beneficial effects in this group3. Here we analyse 24,202 cases of COVID-19 with critical illness comprising a combination of microarray genotype and whole-genome sequencing data from cases of critical illness in the international GenOMICC (11,440 cases) study, combined with other studies recruiting hospitalized patients with a strong focus on severe and critical disease: ISARIC4C (676 cases) and the SCOURGE consortium (5,934 cases). To put these results in the context of existing work, we conduct a meta-analysis of the new GenOMICC genome-wide association study (GWAS) results with previously published data. We find 49 genome-wide significant associations, of which 16 have not been reported previously. To investigate the therapeutic implications of these findings, we infer the structural consequences of protein-coding variants, and combine our GWAS results with gene expression data using a monocyte transcriptome-wide association study (TWAS) model, as well as gene and protein expression using Mendelian randomization. We identify potentially druggable targets in multiple systems, including inflammatory signalling (JAK1), monocyte–macrophage activation and endothelial permeability (PDE4A), immunometabolism (SLC2A5 and AK5), and host factors required for viral entry and replication (TMPRSS2 and RAB2A).
Main
The design of the GenOMICC study and the rationale for focusing on critical illness has been previously described1,2. In brief, patients with confirmed COVID-19 requiring continuous cardiorespiratory monitoring or organ support (a generalizable definition for critical illness) were recruited in 2020–2022. We first performed ancestry-specific GWAS analyses according to the methods that we described previously1,2. Using the results of these GWAS analyses, previously reported results obtained using GenOMICC participants with whole-genome sequencing data2 and data from GenOMICC Brazil, we performed trans-ancestry and -platform meta-analyses within the GenOMICC study for a critically ill COVID-19 phenotype and a hospitalized COVID-19 phenotype (Extended Data Fig. 1). The results of these GenOMICC-only meta-analyses are presented for both critically ill and hospitalized phenotypes (Table 1 and Extended Data Fig. 2). To put these results into the context of existing knowledge, we performed comprehensive meta-analyses, drawing on further GWAS results, including data shared by the SCOURGE consortium and published data from the COVID-19 Human Genetics Initiative (HGIv6, 2021)4. The characteristics of the contributing studies are summarized in Supplementary Tables 13 and 14 for the critically ill and hospitalized phenotypes, with further details on each study provided in the Supplementary Information. We used a mathematical subtraction approach, as done in our previous work2, to remove signals of previous GenOMICC releases from HGIv6, yielding an independent dataset.
As no replication cohorts exist for these meta-analyses, we used the heterogeneity across studies to assess the reliability of individual findings (Supplementary Table 15). Owing to the unusually extreme phenotype in the GenOMICC study, some heterogeneity is expected for the strongest associations when compared with studies with more permissive inclusion criteria. Importantly, significant heterogeneity was not detected for any of the findings that we report here (Supplementary Table 15). Comparing effect estimates between studies using a regression approach that takes into account estimation errors (Methods), we detected systematic differences in effect sizes between studies (Extended Data Fig. 3). For example, effects for the HGI critical illness phenotype (which was designed to parallel the GenOMICC inclusion criteria) are smaller than those obtained using prospective recruitment in GenOMICC by a factor of 0.68. As the effect sizes in GenOMICC are consistently larger than other studies, and GenOMICC contributes a disproportionately large signal to meta-analyses of both critical and hospitalized phenotypes (Extended Data Fig. 4), between-study heterogeneity is likely to reflect the careful case ascertainment and extreme phenotype in GenOMICC compared with other studies.
We found 49 common genetic associations with critical COVID-19 meeting our criteria for genome-wide significance in the absence of heterogeneity (Extended Data Fig. 2 and Table 1). Findings from previous reports were consistently replicated (Extended Data Table 2). Conditional analysis revealed two additional lead variants (Table 1) and statistical fine-mapping provided credible sets of putative causal variants for a majority of lead variants (Supplementary Figs. 27–44 and Supplementary Table 5). Gene-level analyses found 196 significantly associated genes at a Bonferroni-corrected threshold (Supplementary Table 10). There were no genome-wide significant differences in the effects between sexes in a sex-stratified meta-analysis using a subset of cohorts (Supplementary Fig. 1).
Therapeutic implications
Our analysis is limited to common variants that are detectable on genotyping arrays and imputation panels. Although most lead variants are not directly causal, in some cases, they highlight molecular mechanisms that alter clinical outcomes in COVID-19, and may have direct therapeutic relevance. To investigate the disease mechanisms, we first quantified the effect of inferred gene expression on critical illness in three relevant tissue/cell types. Many of the genes that we have found to be implicated in critical COVID-19 (refs. 1,2) are highly expressed in the monocyte–macrophage system, which has poor coverage in existing expression quantitative trait loci (eQTL) datasets. For this reason, we constructed a new TWAS model in primary monocytes obtained from 174 individuals (Methods). We found significant associations after Bonferroni correction between critical COVID-19 and predicted gene expression in lung (33), blood (21), monocyte (37) and all-tissue (107) meta-analysis (Supplementary Table 2 and Supplementary Table 11). We extended these findings using generalized summary-level data Mendelian randomization (GSMR) for RNA expression (Fig. 2, Extended Data Table 1, Supplementary Figs. 11–18 and Supplementary Table 4).
In parallel, we assessed the effect of genetically determined variation in circulating protein levels on the critical illness phenotype using GSMR5. We identified 15 unique proteins linked to critical illness, as summarized in Extended Data Table 1 (Supplementary Table 3). Of the significant results, we found causal evidence implicating five new proteins in comparison to our previous GSMR analysis2: QSOX2, CREB3L4, myeloperoxidase (MPO), ADAMTS13 and mannose-binding lectin-2 (MBL2) (Supplementary Fig. 10). These include well-studied biomarkers and potential drug targets in sepsis—the innate immune pattern recognition receptor MBL2 and the neutrophil effector enzyme MPO. ADAMTS13 modulates von Willebrand-factor-mediated platelet thrombus formation and may have a role in the hypercoagulable state in critical COVID-19 (Extended Data Fig. 5).
Three genes containing non-synonymous protein-coding changes associated with severe disease were also found to have significant effects from differential gene expression: SLC22A31 (ref. 2) (Fig. 1), SFTPD (ref. 4) (Fig. 1) and TYK2 (ref. 1) (Extended Data Fig. 6). Further biological and clinical research will be required to dissect the genetic evidence at these loci. In the example of TYK2, there is now a therapeutic test of the genetic predictions. Our previous report of association between higher expression and critical illness1 led directly to the inclusion of a new drug, baricitinib, in a large clinical trial; the result demonstrated a clear therapeutic benefit3. This therapeutic signal is consistent across multiple trials, providing the first proof-of-concept for drug target identification using genetics in critical illness and infectious disease.
To assess the immediate therapeutic use of our results for repurposing of existing compounds, we considered the drug therapies under consideration by the UK COVID-19 Therapeutic Advisory Panel (UK-CTAP), a national independent review group supported by an expert due-diligence panel6. Consistent evidence from gene-level GWAS (Supplementary Table 6 and Supplementary Table 10) and post-GWAS analyses was identified for several licensed compounds (Supplementary Table 12). For example, we found an association in another gene encoding a protein that is inhibited by baricitinib and other JAK inhibitors—the intracellular signalling kinase, JAK1, which is stimulated by numerous cytokines including type I interferons and IL-6. Mendelian randomization analysis of RNA expression revealed a significant positive association between the expression of the gene encoding a canonical inflammatory cytokine, tumour necrosis factor (TNF), and severe disease (Fig. 2). This suggests that inhibition of TNF signalling may be an effective therapy in severe COVID-19.
Our additional expression data in monocytes reveal a marked tissue-specific effect on expression of PDE4A. This phosphodiesterase regulates the production of multiple inflammatory cytokines by myeloid cells. In contrast to the negative correlations seen in the lungs and blood, we show that a genetic tendency for higher expression of PDE4A in monocytes is associated with critical COVID-19 (Supplementary Table 11). Inhibition of PDE4A by several existing drugs is under investigation in multiple inflammatory diseases7, reduces pulmonary endothelial permeability8 and appears to be safe in small clinical trials in patients with COVID-19.
The postulated biological role of genes associated with critical COVID-19 in GWAS, TWAS and GSMR results is shown in Extended Data Fig. 5, which highlights the preponderance of genes with expression or functions in the mononuclear phagocyte system. This includes SLC2A5, encoding the GLUT5 fructose transporter, which is strongly inducible in primary macrophages in response to inflammatory stimulation9, and XCR1, a dendritic cell receptor with a critical role in cytotoxic T cell-mediated antiviral immunity10. NPNT, a significant meta-TWAS association in the genome-wide significant region on chromosome 4 (chr4:105673359; Supplementary Table 11), encodes a pulmonary basement membrane protein that may have a protective role in acute lung injury11.
Host–pathogen interaction
Our results also demonstrate the capacity of host genetics to reveal core mechanisms of disease. Multiple genes implicated in viral entry are associated with severe disease. In addition to ACE2, we detect a genome-wide significant association in TMPRSS2, a key host protease that facilitates viral entry that we have previously studied as a candidate gene12. This effect may be viral-lineage specific13. A strong GWAS association is seen in RAB2A (Table 1), with TWAS evidence suggesting that more expression of this gene is associated with worse disease (Supplementary Table 11). RAB2A is highly ranked in our previous meta-analysis by information content14 study of host genes implicated in SARS-CoV-2 interaction using in vitro and clinical data15, and is consistent with CRISPR screen data showing that RAB2A is required for viral replication16.
Although our focus on critical illness enhances discovery power (Extended Data Fig. 4), it has the disadvantage of combining genetic signals for multiple stages in disease progression, including viral exposure, infection and replication, and development of inflammatory lung disease. From these data alone we cannot identify when in disease progression the causal effect is mediated, although clinical evidence helps to make some predictions17 (Extended Data Fig. 5). As most cases included were recruited before vaccinations and treatments became available (Extended Data Fig. 7), at present, our study does not have sufficient statistical power to dissect the genetic effects of treatments or vaccination. These effects may include the masking of true associations, or the detection of genetic effects mediated by vaccine or drug response, rather than COVID-19 susceptibility. However, the absence of divergent genetic effects between studies (Supplementary Figs. 2–5) or consistent changes in effect allele frequency among cases over time (Supplementary Figs. 45–48) suggests that treatment and vaccination have not substantially affected the association between the specific variants that we report and the risk of critical illness.
As we performed a meta-analysis of multiple studies that may have slightly different definitions of the phenotype, effect sizes differ between studies (Supplementary Figs. 2–5). This, together with ancestry-specific effects1, may explain the heterogeneity in strong GWAS signals, such as the LZTFL1 signal in Table 1. Different studies also have sets of variants that are not completely overlapping, so P values between variants in high linkage disequilibrium are more different than expected. Although most of the studies contain individuals from multiple ancestries, a large majority of the individuals are of European ancestry. In future research, there is a scientific and moral imperative to include the full diversity of human populations.
Together, these results deepen our understanding of the pathogenesis of critical COVID-19 and highlight new biological mechanisms of disease, several of which have immediate potential for therapeutic targeting.
Methods
Hospitalization meta-analysis
The hospitalized phenotype includes patients who were hospitalized with a laboratory-confirmed SARS-CoV-2 infection. In this analysis we included GenOMICC, GenOMICC Brazil, GenOMICC Saudi Arabia, ISARIC4C, HGIv6 B2 phenotype with subtraction of GenOMICC data, SCOURGE hospitalized versus population and mild cases, and 23andMe broad respiratory phenotype. A summary description of each analysis is given above, a table with the included studies can be found in Supplementary Table 14 and an extended description can be found in Supplementary Table 1.
Critical illness meta-analysis
The critically ill COVID-19 group included patients who were hospitalized owing to symptoms associated with laboratory-confirmed SARS-CoV-2 infection and who required respiratory support or whose cause of death was associated with COVID-19. In the critical illness analysis, we included GenOMICC, patients with critical illness from ISARIC4C, HGIv6 phenotype A2 with subtraction of GenOMICC data, SCOURGE severity grades 3 and 4 versus population controls, and 23andMe respiratory support phenotype. A summary description of each analysis can be found above, a table with the included studies can be found in Supplementary Table 13 and an extended description can be found in Supplementary Table 1.
Meta-analyses
All meta-analyses across studies were performed using a fixed-effect inverse-variance weighting method and control for population stratification in the METAL software23. Allele frequency was calculated as the average frequency across studies with the METAL option AVERAGEFREQ. P values for heterogeneity in effect sizes between studies were calculated using a Cochran’s Q-test implemented in METAL. For variants in the same position with different REF and ALT alleles across studies, the GenOMICC variant in the European population was selected and the rest were removed. Finally, variants with switched ALT and REF alleles between HGIv6 and GenOMICC were also removed on the basis of differences in allele frequency of the alternative allele. Variants were annotated to the closest genes using dbSNP v.b151 GRCh38p7 and the biomaRt R package (v.2.46.3)24. As each single-nucleotide polymorphism (SNP) of the meta-analysis can be present in different subsets of cohorts, there may be large differences in P values in SNPs with a high level of linkage disequilibrium, which may have an effect on downstream analyses. For this reason, variants that were not present in one of the three biggest studies (GenOMICC European ancestry, HGIv6 or SCOURGE) were filtered out from post-GWAS analysis.
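As an illustration of the fixed-effect inverse-variance weighting and Cochran's Q statistic described above, the following is a minimal sketch for a single variant. It is not the METAL implementation; the function name `fixed_effect_meta` and the example values are hypothetical, and the heterogeneity P value (obtained from a chi-squared distribution with k − 1 degrees of freedom) is deliberately left out.

```python
import math


def fixed_effect_meta(betas, ses):
    """Combine per-study effects for one variant (illustrative only).

    betas: per-study effect estimates (log odds ratios)
    ses:   per-study standard errors
    Returns (pooled beta, pooled s.e., Cochran's Q, degrees of freedom).
    """
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)

    # Pooled effect is the weighted mean; its variance is 1 / sum of weights.
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)

    # Cochran's Q: weighted squared deviations from the pooled effect.
    q = sum(w * (b - pooled_beta) ** 2 for w, b in zip(weights, betas))
    return pooled_beta, pooled_se, q, len(betas) - 1


# Hypothetical effect estimates from three studies.
beta, se, q, df = fixed_effect_meta([0.30, 0.20, 0.25], [0.05, 0.10, 0.08])
print(f"beta={beta:.3f} se={se:.3f} Q={q:.2f} df={df}")
```

A large Q relative to its degrees of freedom indicates that the per-study effects differ by more than their standard errors would predict.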
Conditional analysis
We performed a step-wise conditional analysis to find independent signals. As European-specific data are not available in some cohorts but European ancestry is largely predominant (87.2% of cases with critical illness), we performed the conditional analysis using a European reference panel and the meta-analysis results of the whole cohort. To perform the conditional analysis, we used the GCTA (v.1.9.3) --cojo-slct function25. The parameters for the function were P = 5 × 10−8, a distance of 10,000 kb and a co-linear threshold of 0.9 (ref. 26), and the reference population for the conditional analysis was individuals of European ancestry with whole-genome sequence available in the GenOMICC study and whole genomes from the 100,000 Genomics England project2.
Credible set fine-mapping
We performed fine-mapping using the SuSiE model27 to construct credible sets for the independent signals identified using conditional analysis. As for conditional analysis, we used a European reference panel and the meta-analysis results of the whole critical illness cohort. We performed analyses in 1 Mb windows centred on the lead variants identified through conditional analysis. In cases in which windows for multiple variants overlapped, they were joined into a single window. For each window, we fitted the SuSiE summary statistics model, setting the expected number of independent signals to the number of signals identified through conditional analysis. Models for three windows did not converge in 500 iterations and have been excluded. As a reference, we used the publicly available linkage disequilibrium information for non-Finnish Europeans from the gnomAD 2.1.1 release. Full data for all variants included in credible sets are included in Supplementary Table 5.
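The window construction described above (1 Mb windows centred on lead variants, with overlapping windows joined) can be sketched as a simple interval merge. This is an illustrative sketch only: the function name `merge_windows` is hypothetical, positions are assumed to lie on a single chromosome, and windows that merely touch are treated as overlapping, which is an assumption not stated in the text.

```python
def merge_windows(lead_positions, half_width=500_000):
    """Build fixed-width windows around lead variants and join overlaps.

    lead_positions: base-pair positions of lead variants on ONE chromosome
    half_width:     half of the window size (500 kb gives 1 Mb windows)
    Returns a sorted list of (start, end) tuples.
    """
    # One window per lead variant, clipped at the chromosome start.
    windows = sorted(
        (max(0, pos - half_width), pos + half_width) for pos in lead_positions
    )

    merged = []
    for start, end in windows:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous window: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Two nearby lead variants share a window; the third stays separate.
print(merge_windows([1_000_000, 1_600_000, 5_000_000]))
```

The number of lead variants falling inside each merged window would then give the expected number of independent signals for that window.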
Gene-level analysis
We performed an analysis summarizing the genetic associations at the gene level using the mBAT-combo method28. We used the COVID ‘all critical cohorts’ meta-analysis (GenOMICC, HGIv6 phenotype A2, SCOURGE and 23andMe) summary statistics. As this is a trans-ethnic meta-analysis, we used a mixed ancestry linkage disequilibrium reference panel, consisting of 3,202 1000 Genomes phase 3 samples. We considered a list of protein-coding genes with unique Ensembl gene ID based on the release from GENCODE (v.40) for hg38, which can be found on the mBAT-combo website (https://yanglab.westlake.edu.cn/software/gcta/#mBAT-combo). A gene region was taken to span 50 kb upstream to 50 kb downstream of the gene’s untranslated regions.
Sex-stratified meta-analysis
To test for differences in genetic effects, we performed sex-stratified GWAS of the COVID-19 critical illness phenotype in the European ancestry GenOMICC WGS and genotyped cohorts and SCOURGE. We then performed a meta-analysis for each sex following the same methods as for the main analysis. We tested for differences in effects between the meta-analyses of the two sexes following previously described methods29.
Mendelian randomization
GSMR5 was performed. We used the COVID ‘all critical cohorts’ meta-analysis (GenOMICC, HGIv6 phenotype A2, SCOURGE and 23andMe) as the outcome, protein expression quantitative-trait loci (pQTLs) from ref. 30 and RNA expression quantitative-trait loci (eQTLs) from eQTLgen31 (2019-12-23 data release) as exposures, and 10,000 individuals of European ancestry randomly sampled from the UK Biobank as the linkage disequilibrium reference cohort (50,000 for linkage disequilibrium to missense variant plots). GSMR was performed for all exposures for which we were able to identify two or more suitable SNPs. SNPs were chosen to meet the following criteria: (1) SNP to exposure association P < 5 × 10−8; (2) linkage disequilibrium clumping lead SNPs only (±1 Mb, r2 < 0.05); (3) SNP not removed by HEIDI-outlier filtering (for the removal of SNPs with evidence of horizontal pleiotropy) at the default threshold value of 0.01. eQTLGen effect sizes and standard errors were estimated as described in supplementary note 2 of ref. 32. We considered as significant those exposure–outcome pairs with FDR < 0.05.
TWAS analysis
To perform TWAS analysis in GTExv8 tissues33, we used the MetaXcan framework and the GTExv8 eQTL and sQTL MASHR-M models available for download online (http://predictdb.org/) and the ‘all critical cohorts’ meta-analysis. We first calculated individual TWAS for whole blood and lungs using the S-PrediXcan function34,35. We next performed a metaTWAS including data from all tissues to increase the statistical power using s-MultiXcan36. We applied Bonferroni correction to the results to choose significant genes and introns for each analysis.
Monocyte gene expression
To detect eQTLs, untreated primary monocytes were prepared from 174 healthy individuals of Northern European (British) ancestry recruited through the Oxford Biobank. Poly(A) RNA was paired-end 100 bp sequenced in the Oxford Genome Centre using the Illumina HiSeq-4000 machines (median = 47,735,438 reads per sample). Reads were aligned to GRCh38/hg38 using HISAT2 with the default parameters. High mapping quality reads were selected on the basis of MAPQ score using bamtools. Duplicate reads were marked and removed using picard (v.1.105). Samtools was used to pass through the mapped reads and calculate statistics. Read count information was generated using HTSeq and normalized using DESeq2. Sample contamination and swaps were detected by comparing the imputed SNP-array genotypes with genotypes called from RNA-seq using verifyBamID. Genotyping was performed with Illumina HumanOmniExpress with coverage of 733,202 separate markers. Genotypes were pre-phased with SHAPEIT2, and missing genotypes were imputed with PBWT. vcftools (v0.1.12b) was applied on genetic variation data in the form of variant call format (VCF) files to filter out indels and SNPs with a minor allele frequency of less than 0.04.
TWAS analysis for monocyte data was performed using genotyping and monocyte RNA-sequencing data from 174 individuals. Using a region of 500 kb around each gene, we calculated gene expression models using the Fusion R package37. For each gene, three models were calculated, adding as covariates the first two principal components calculated from the genotype: blup, elastic net and lasso. The model with the best r2 between predicted and measured expression in a fivefold cross-validation was chosen. Then SNP genetic heritability was calculated for the 500 kb region for each gene and those genes with a nominally significant SNP heritability estimate (P ≤ 0.01) were chosen for the TWAS analysis. Summary statistics for the ‘all critical cohorts’ meta-analysis and the best model for each gene were then used to perform the TWAS.
Colocalization
Significant genes in the TWAS and metaTWAS were selected for a colocalization analysis using the coloc R package. The lead SNPs and a region of 200 Mb around the gene were used to colocalize with significant genes in the TWAS with eQTL summary statistics data on the region from GTExv8 lung, GTExv8 whole blood, eQTLgen or monocyte eqtl. As in our previous analysis2, we first performed a sensitivity analysis of the posterior probability of colocalization (PPH4) on the prior probability of colocalization (P12), going from P12 = 10−8 to P12 = 10−4, with the default threshold being P12 = 10−5. eQTL signal and GWAS signals were deemed to colocalize if these two criteria were met: (1) at P12 = 5 × 10−5 the probability of colocalization PPH4 > 0.5; and (2) at P12 = 10−5 the probability of independent signal (PPH3) was not the main hypothesis (PPH3 < 0.5). These criteria were chosen to allow eQTLs with weaker P values, owing to lack of power in GTEx v.8, to be colocalized with the signal when the main hypothesis using small priors was that there was not any signal in the eQTL data.
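The two-part decision rule above can be written as a small predicate. This is a sketch under stated assumptions: the posterior probabilities are taken as already computed (for example by the coloc R package) at the two prior values named in the text, and the function and argument names are hypothetical.

```python
def signals_colocalize(pp_h4_at_p12_5e5, pp_h3_at_p12_1e5):
    """Apply the two colocalization criteria described in the text.

    pp_h4_at_p12_5e5: posterior probability of a shared causal variant (PPH4)
                      computed with prior P12 = 5e-5
    pp_h3_at_p12_1e5: posterior probability of two independent signals (PPH3)
                      computed with prior P12 = 1e-5
    """
    # Criterion 1: colocalization is favoured at the larger prior.
    shared_signal = pp_h4_at_p12_5e5 > 0.5
    # Criterion 2: independent signals are not the main hypothesis at the default prior.
    not_independent = pp_h3_at_p12_1e5 < 0.5
    return shared_signal and not_independent


# Hypothetical posterior probabilities for two gene-tissue pairs.
print(signals_colocalize(0.82, 0.10))  # True
print(signals_colocalize(0.82, 0.65))  # False
```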
Effect comparison
We compared the estimates of effect sizes between the individual GWASs used in the meta-analysis, for all variants that were genome-wide significant in at least one of the individual GWASs. To this end, we regressed the effects obtained using critical illness and hospitalization in the SCOURGE and 23andMe cohorts, as well as the HGI meta-analyses on the effect estimates obtained using the GenOMICC cohort. To account for estimation errors present in both the dependent and independent variables of the regression we used orthogonal distance regression38.
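To illustrate why an errors-in-variables fit is used here, the sketch below implements the simplest special case of orthogonal regression: a closed-form fit that minimizes perpendicular distances and assumes equal error variance in both variables. This is a simplification and not the analysis performed in the study, which accounts for the estimation errors of each effect estimate; the function name and the example data are hypothetical.

```python
import math


def orthogonal_regression(x, y):
    """Fit y = intercept + slope * x by minimizing perpendicular distances.

    Assumes equal error variance in x and y (unweighted special case).
    Returns (slope, intercept).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Centred sums of squares and cross-products.
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if s_xy == 0:
        raise ValueError("x and y are uncorrelated; slope is undefined")

    # Closed-form slope of the total least squares line.
    diff = s_yy - s_xx
    slope = (diff + math.sqrt(diff ** 2 + 4 * s_xy ** 2)) / (2 * s_xy)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical effects: a comparison study with effects 0.68 times the reference.
reference = [0.10, 0.20, 0.30, 0.40]
comparison = [0.68 * b for b in reference]
slope, intercept = orthogonal_regression(reference, comparison)
print(f"slope={slope:.2f} intercept={intercept:.2f}")
```

Unlike ordinary least squares, which attributes all noise to the dependent variable and therefore biases the slope towards zero when the independent variable is also estimated with error, this fit treats both axes symmetrically.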
Weight of studies
To calculate the weight of GenOMICC, we downloaded the leave-one-out data of HGIv7. As the meta-analysis is performed using an inverse-variance-weighted method, we can recover the variance for each SNP from its standard error as \(v={{\rm{s.e.}}}^{2}\), for the meta-analysis of all of the cohorts and for each one of the leave-one-out analyses. The total weight is \({w}_{{\rm{tot}}}=\frac{1}{v}\) and the weight leaving out a specific study is \({w}_{{\rm{loo}}}=\frac{1}{{v}_{{\rm{loo}}}}\). The weight of a cohort is then \({w}_{{\rm{tot}}}-{w}_{{\rm{loo}}}\). We calculated the weight for each of the significant SNPs in our analysis for each study and normalized it using the total weight. Finally, we calculated the mean and s.d. from the significant SNPs for each cohort.
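The leave-one-out weight calculation can be sketched as follows, taking the inverse-variance weight of a meta-analysis as 1/s.e.² so that the left-out cohort's weight is the difference between the full and leave-one-out weights. The function name and the standard errors in the example are hypothetical, not values from the study.

```python
import statistics


def cohort_weight_fraction(se_all, se_loo):
    """Normalized weight contributed by the cohort that was left out.

    se_all: standard error of a SNP in the meta-analysis of all cohorts
    se_loo: standard error of the same SNP with one cohort left out
    """
    w_tot = 1.0 / se_all ** 2  # total inverse-variance weight
    w_loo = 1.0 / se_loo ** 2  # weight of the remaining cohorts
    return (w_tot - w_loo) / w_tot


# Hypothetical (se_all, se_loo) pairs for three significant SNPs.
snps = [(0.020, 0.026), (0.031, 0.040), (0.015, 0.021)]
fractions = [cohort_weight_fraction(a, b) for a, b in snps]
print(f"mean={statistics.mean(fractions):.2f} sd={statistics.stdev(fractions):.2f}")
```

Because removing a cohort can only reduce the total information, the leave-one-out standard error should be at least as large as the full standard error, so the fraction lies between 0 and 1.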
Forest plots
To compare effects between cohorts, we first performed a trans-ancestry meta-analysis for GenOMICC and 23andMe using METAL23. Then, we used the metagen and forest functions of the meta R package to produce forest plots for critical illness and hospitalization separately.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
Downloadable summary data are available through the GenOMICC data site (https://genomicc.org/data). Summary statistics are available, but without the 23andMe summary statistics, except for the 10,000 most significant hits, for which full summary statistics are available. The full GWAS summary statistics for the 23andMe discovery dataset will be made available through 23andMe to qualified researchers under an agreement with 23andMe that protects the privacy of the 23andMe participants. For further information and to apply for access to the data, see the 23andMe website (https://research.23andMe.com/dataset-access/). All individual-level genotype and whole-genome sequencing data (for both academic and commercial uses) can be accessed through the UKRI/HDR UK Outbreak Data Analysis Platform (https://odap.ac.uk). A restricted dataset for a subset of GenOMICC participants is also available through the Genomics England data service. Monocyte RNA-seq data are available under the title ‘Monocyte gene expression data’ within the Oxford University Research Archives (https://doi.org/10.5287/ora-ko7q2nq66). Sequencing data will be made freely available to organizations and researchers to conduct research in accordance with the UK Policy Framework for Health and Social Care Research through a data access agreement. Sequencing data have been deposited at the European Genome–Phenome Archive (EGA), which is hosted by the EBI and the CRG, under accession number EGAS00001007111.
Code availability
Code to calculate the imputation of P values on the basis of SNPs in linkage disequilibrium is available at GitHub (https://github.com/baillielab/GenOMICC_GWAS).
Change history
11 July 2023
A Correction to this paper has been published: https://doi.org/10.1038/s41586-023-06383-z
References
Pairo-Castineira, E. et al. Genetic mechanisms of critical illness in COVID-19. Nature 591, 92–98 (2021).
Kousathanas, A. et al. Whole-genome sequencing reveals host factors underlying critical COVID-19. Nature 607, 97–103 (2022).
Abani, O. et al. Baricitinib in patients admitted to hospital with COVID-19 (RECOVERY): a randomised, controlled, open-label, platform trial and updated meta-analysis. Lancet 400, 359–368 (2022).
Pathak, G. A. et al. A first update on mapping the human genetic architecture of COVID-19. Nature 608, E1–E10 (2022).
Zhu, Z. et al. Causal associations between risk factors and common diseases inferred from GWAS summary data. Nat. Commun. 9, 224 (2018).
Chinnery, P. F. et al. Choosing drugs for UK COVID-19 treatment trials. Nat. Rev. Drug Discov. 21, 81–82 (2022).
Peng, T., Qi, B., He, J., Ke, H. & Shi, J. Advances in the development of phosphodiesterase-4 inhibitors. J. Med. Chem. 63, 10594–10617 (2020).
Sanz, M.-J. et al. Roflumilast inhibits leukocyte-endothelial cell interactions, expression of adhesion molecules and microvascular permeability. Br. J. Pharmacol. 152, 481–92 (2007).
Forrest, A. R. R. et al. A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014).
Brewitz, A. et al. CD8. Immunity 46, 205–219 (2017).
Wilson, C. L., Hung, C. F. & Schnapp, L. M. Endotoxin-induced acute lung injury in mice with postnatal deletion of nephronectin. PLoS ONE 17, e0268398 (2022).
David, A. et al. A common TMPRSS2 variant has a protective effect against severe COVID-19. Curr. Res. Transl. Med. 70, 103333 (2022).
Meng, B. et al. Altered TMPRSS2 usage by SARS-CoV-2 Omicron impacts tropism and fusogenicity. Nature https://doi.org/10.1038/s41586-022-04474-x (2022).
Li, B. et al. Genome-wide CRISPR screen identifies host dependency factors for influenza A virus infection. Nat. Commun. 11, 164 (2020).
Parkinson, N. et al. Dynamic data-driven meta-analysis for prioritisation of host genes implicated in COVID-19. Sci. Rep. 10, 22303 (2020).
Hoffmann, H.-H. et al. Functional interrogation of a SARS-CoV-2 host protein interactome identifies unique and shared coronavirus host factors. Preprint at bioRxiv https://doi.org/10.1101/2020.09.11.291716 (2020).
Russell, C. D., Lone, N. I. & Baillie, J. K. Comorbidities, multimorbidity and COVID-19. Nat. Med. 29, 334–343 (2023).
Niemi, M. E. K. et al. Mapping the human genetic architecture of COVID-19. Nature 600, 472–477 (2021).
Ellinghaus, D. et al. Genomewide association study of severe COVID-19 with respiratory failure. New Engl. J. Med. 383, 1522–1534 (2020).
Cruz, R. et al. Novel genes and sex differences in COVID-19 severity. Hum. Mol. Genet. 31, 3789–3806 (2022).
Degenhardt, F. et al. Detailed stratified GWAS analysis for severe COVID-19 in four European populations. Hum. Mol. Genet. 31, 3945–3966 (2022).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
Durinck, S., Spellman, P. T., Birney, E. & Huber, W. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat. Protoc. 4, 1184–1191 (2009).
Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
Yang, J. et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet. 44, 369–375 (2012).
Wang, G., Sarkar, A., Carbonetto, P. & Stephens, M. A simple new approach to variable selection in regression, with application to genetic fine mapping. J. R. Stat. Soc. B 82, 1273–1300 (2020).
Li, A. et al. mBAT-combo: a more powerful test to detect gene-trait associations from GWAS data. Preprint at bioRxiv https://doi.org/10.1101/2022.06.27.497850 (2022).
Bernabeu, E. et al. Sex differences in genetic architecture in the UK Biobank. Nat. Genet. 53, 1283–1289 (2021).
Sun, B. B. et al. Genomic atlas of the human plasma proteome. Nature 558, 73–79 (2018).
Võsa, U. et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat. Genet. 53, 1300–1310 (2021).
Zhu, Z. et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat. Genet. 48, 481–487 (2016).
The GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
Gamazon, E. R. et al. A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 47, 1091–1098 (2015).
Barbeira, A. N. et al. Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. Nat. Commun. 9, 1825 (2018).
Barbeira, A. N. et al. Integrating predicted transcriptome from multiple tissues improves association detection. PLoS Genet. 15, e1007889 (2019).
Gusev, A. et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat. Genet. 48, 245–252 (2016).
Boggs, P. T. & Rogers, J. E. Orthogonal distance regression. Contemp. Math. 112, 183–194 (1990).
Liang, J. et al. Lead identification of novel and selective TYK2 inhibitors. Eur. J. Med. Chem. 67, 175–187 (2013).
Li, Z. et al. Two rare disease-associated Tyk2 variants are catalytically impaired but signaling competent. J. Immunol. 190, 2335–2344 (2013).
Acknowledgements
We thank the patients and their loved ones who volunteered to contribute to this study at one of the most difficult times in their lives, and the research staff in every intensive care unit who recruited patients at personal risk during the most extreme conditions ever witnessed in most hospitals. GenOMICC was funded by Sepsis Research (the Fiona Elizabeth Agnew Trust), the Intensive Care Society, a Wellcome Trust Senior Research Fellowship (to J.K.B., 223164/Z/21/Z), the Department of Health and Social Care (DHSC), Illumina, LifeArc, the Medical Research Council, UKRI, a BBSRC Institute Program Support Grant to the Roslin Institute (BBS/E/D/20002172, BBS/E/D/10002070 and BBS/E/D/30002275) and UKRI grants MC_PC_20004, MC_PC_19025, MC_PC_1905 and MRNO2995X/1. A.D.B. acknowledges funding from the Wellcome PhD training fellowship for clinicians (204979/Z/16/Z), the Edinburgh Clinical Academic Track (ECAT) programme. This research is supported in part by the Data and Connectivity National Core Study, led by Health Data Research UK in partnership with the Office for National Statistics and funded by UK Research and Innovation (grant MC_PC_20029). Laboratory work was funded by a Wellcome Intermediate Clinical Fellowship to B.F. (201488/Z/16/Z). We acknowledge the staff at NHS Digital, Public Health England and the Intensive Care National Audit and Research Centre who provided clinical data on the participants; the National Institute for Healthcare Research Clinical Research Network (NIHR CRN) and the Chief Scientist’s Office (Scotland), who facilitate recruitment into research studies in NHS hospitals; and the global ISARIC and InFACT consortia. GenOMICC genotype controls were obtained using the UK Biobank Resource under project 788 funded by Roslin Institute Strategic Programme Grants from the BBSRC (BBS/E/D/10002070 and BBS/E/D/30002275) and Health Data Research UK (HDR-9004 and HDR-9003).
UK Biobank data were used in the GSMR analyses presented here under project 66982. The UK Biobank was established by the Wellcome Trust medical charity, Medical Research Council, Department of Health, Scottish Government and the Northwest Regional Development Agency. It has also had funding from the Welsh Assembly Government, British Heart Foundation and Diabetes UK. The work of L.K. was supported by an RCUK Innovation Fellowship from the National Productivity Investment Fund (MR/R026408/1). J.Y. is supported by the Westlake Education Foundation. SCOURGE is funded by the Instituto de Salud Carlos III (COV20_00622 to A.C., PI20/00876 to C.F.), European Union (ERDF) ‘A way of making Europe’, Fundación Amancio Ortega, Banco de Santander (to A.C.), Cabildo Insular de Tenerife (CGIEU0000219140 ‘Apuestas científicas del ITER para colaborar en la lucha contra la COVID-19’ to C.F.) and Fundación Canaria Instituto de Investigación Sanitaria de Canarias (PIFIISC20/57 to C.F.). We also acknowledge the contribution of the Centro Nacional de Genotipado (CEGEN) and Centro de Supercomputación de Galicia (CESGA) for funding this project by providing supercomputing infrastructures. A.D.L. is a recipient of fellowships from the National Council for Scientific and Technological Development (CNPq)-Brazil (309173/2019-1 and 201527/2020-0). We thank the members of the Banco Nacional de ADN and the GRA@CE cohort group; and the research participants and employees of 23andMe for making this work possible. A full list of contributors who have provided data that were collated in the HGI project, including previous iterations, is available online (https://www.covid19hg.org/acknowledgements).
Author information
Contributions
E.P.-C., K. Rawlik, K.M., S.K., C.P.P., J.F.W., V.V., M.A., A.D.L., E.J.P., R.C., A.C., A.F., L.M., K. Rowan, A.C.P., A.L., S.C.H. and J.K.B. contributed to design. E.P.-C., K. Rawlik, A.D.B., T.Q., Y.W., I.N., G.A.M., M.Z., L.K., A.K., A.R., T.M., J.Y., A.L., B.F., S.C.H. and J.K.B. contributed to data analysis. E.P.-C., K. Rawlik, I.N., A.K., A.R., J.M., C.D.R., A.L., B.F. and S.C.H. contributed to bioinformatics. E.P.-C., K. Rawlik, I.N., G.A.M., M.Z., A.K., J.M., C.D.R., R.T., D. McAuley, A.N., M.G.S., B.F., S.C.H. and J.K.B. contributed to writing and reviewing the manuscript. I.N., F.G., W.O., K.M., S.K., D. Maslove, A.N., M.G.S., J.K., M.S.-H., C.S., C.H., P.H., L.L., D. McAuley, H.M., P.J.M.O., C.B., T.W., A.T., C.F., J.A.R., A.R.-M., P.L., C.P.P., A.F., L.M., K. Rowan, A.L., B.F. and S.C.H. contributed to oversight. F.G. and W.O. contributed to project management. F.G., W.O. and J.K.B. contributed to ethics and governance. K.M., A.F. and L.M. contributed to sample handling and sequencing. C.P.P., K. Rowan, S.C.H. and J.K.B. contributed to conception. C.P.P., J.F.W., V.V., M.A., A.D.L., E.J.P., R.C., A.C., K. Rowan and A.C.P. contributed to reviewing the manuscript. K. Rowan and A.L. contributed to clinical data management. J.K.B. contributed to scientific leadership.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature thanks Jacques Fellay and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Pipeline of meta-analysis and post-GWAS analyses.
A red border indicates that the data are available only for the hospitalized phenotype, whereas a black border indicates that the analysis was performed for the critical illness phenotype.
Extended Data Fig. 2 Miami plots.
Meta-analysis results are shown for a) critical and b) hospitalized phenotypes. In each plot results obtained using all cohorts are shown at the top and using GenOMICC data only at the bottom. Independent lead variants in the analyses of all cohorts are annotated with associated genes. Genome-wide significant associations that have not been previously reported are indicated in bold.
Extended Data Fig. 3 Comparison of effect size estimates.
GenOMICC is compared with the critical and hospitalized phenotype definitions in the SCOURGE, 23andMe, and HGI analyses. The black line indicates the best linear fit, given by the equation in each plot, obtained using Orthogonal Distance Regression to account for estimation errors in both sets of effects in the comparison.
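To illustrate why an orthogonal fit differs from ordinary least squares, the following minimal sketch fits a line by minimising perpendicular distances. It is a simplification and an assumption on our part: it uses the closed-form unweighted solution (equal error variance in both sets of effects), whereas a full Orthogonal Distance Regression can weight each point by its own estimation error. The function name and the effect sizes are hypothetical.

```python
import math


def orthogonal_fit(x, y):
    """Fit y = intercept + slope * x by minimising perpendicular distances.

    Unweighted closed-form solution; assumes equal error variance in x and y.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxy == 0:
        raise ValueError("slope is undefined when the covariance is zero")
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical effect sizes (log odds ratios) for the same variants in two studies.
genomicc_beta = [0.10, 0.25, -0.15, 0.40]
other_beta = [0.12, 0.22, -0.10, 0.35]
intercept, slope = orthogonal_fit(genomicc_beta, other_beta)
print(f"other = {intercept:.3f} + {slope:.3f} * GenOMICC")
```

Unlike ordinary least squares, this fit treats both axes symmetrically, so swapping the two studies yields the reciprocal slope rather than a different line.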
Extended Data Fig. 4 Study weightings for (a) critical and (b) hospitalized COVID-19.
Mean ± standard deviation of weights assigned to each data source in meta-analyses for all significant SNPs.
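As a rough illustration of how per-source weights of this kind can arise, the sketch below assumes a fixed-effect, inverse-variance weighting scheme (one of the schemes offered by METAL). Whether this matches the exact weighting used in the analysis is an assumption; the function name and all standard errors are hypothetical.

```python
import math
from statistics import mean, stdev


def inverse_variance_meta(betas, ses):
    """Return pooled beta, pooled SE and normalised per-study weights."""
    raw = [1.0 / se ** 2 for se in ses]          # weight = 1 / variance
    total = sum(raw)
    weights = [w / total for w in raw]           # weights sum to 1
    pooled_beta = sum(w * b for w, b in zip(weights, betas))
    pooled_se = math.sqrt(1.0 / total)
    return pooled_beta, pooled_se, weights


# Hypothetical standard errors for three data sources at three SNPs.
ses_per_snp = [
    [0.02, 0.04, 0.05],
    [0.03, 0.03, 0.06],
    [0.02, 0.05, 0.04],
]
weights_per_snp = [inverse_variance_meta([0.0] * 3, ses)[2] for ses in ses_per_snp]

# Summarise each source's weight across SNPs as mean and standard deviation.
for source, column in enumerate(zip(*weights_per_snp)):
    print(f"source {source}: {mean(column):.2f} +/- {stdev(column):.2f}")
```

Under this scheme, a source with smaller standard errors (typically a larger sample) receives a proportionally larger weight.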
Extended Data Fig. 5 Cartoon showing postulated roles for genes and mediators implicated in the pathogenesis of critical COVID-19 by GenOMICC GWAS, TWAS and Mendelian randomization.
Postulated roles for genetic variants are shown in a highly simplified format to illustrate potential roles in pathogenesis, with the shaded background indicating the hypothetical impact of the host immune response over time17. Host immune processes are divided into those that are thought to play a role in controlling viral replication early in disease (orange section, showing “adaptive” response), and those implicated in driving hypoxaemic respiratory failure later in disease (green section, showing “maladaptive” response). Bold type gene names indicate a higher level of confidence in both the gene identification and the biological role.
Extended Data Fig. 6 Functional genomics analyses for TYK2.
(a) Effect size plot for effect of multiple variants on TYK2 expression (eQTLGen, x-axis) against increasing susceptibility to critical COVID-19 (\(\beta_{xy} = 0.53\); \(P_{xy} = 1.2 \times 10^{-23}\)). Colour shows linkage disequilibrium (LD) with the missense variant rs34536443. (b) Crystal structure of TYK2 kinase domain (Protein Data Bank ID 4GVJ (ref. 39)) in two views that differ by a 45° rotation around a horizontal axis. The side chain of P1104 is shown as connected spheres with a nitrogen atom coloured in blue. Carbon, oxygen, nitrogen and phosphorus atoms of ATP are shown as magenta, red, blue and orange connected spheres, respectively. The N-terminal region of the kinase domain is not shown in the second view for clarity. The right-most panel shows a close view of P1104 and neighbouring residues with their side chains shown as sticks. Numbering of residues corresponds to UniProtKB entry P29597. P1104 is in the catalytic kinase domain and proximal to the ATP-binding site; TYK2 P1104A is catalytically impaired40.
Extended Data Fig. 7 Steroid treatment and vaccination status.
Data are shown for a subset of GenOMICC cases who were also recruited to the ISARIC4C study in the UK.
Supplementary information
Supplementary Information
Supplementary Sections 1–13, including Supplementary Figs. 1–54 and Supplementary Tables 7–17.
Supplementary Table 1
Description of the cohorts used in critical and hospitalized meta-analyses. Cohorts are divided by ancestry and genotyping method (whole-genome sequencing or microarray genotyping). In cohorts in which data are available, the median age with s.d. in parentheses, percentage of female cases, number of female and male cases and controls is shown. NA, data are not available for the cohort. Country of origin indicates the country in which individuals in the cohort were recruited; GenotypingPlatform indicates the array or WGS platform used for genotyping; reference indicates the reference of the publication (if the data have already been published). ‘In GenOMICC v2’ is an indication of whether the dataset was included in a previous GenOMICC paper1.
Supplementary Table 2
Full results for colocalization and TWAS analyses in lungs, blood, monocytes and across multiple tissue types (metaTWAS). Colocalization results are reported between significant TWAS genes and eQTLs in GTExv8 in lungs, blood and monocytes, and in eQTLGen. rsid indicates chromosome, position, reference and alternative alleles; gene.tested is the significant TWAS gene, and ensembl.id is the Ensembl ID corresponding to the significant gene. PP.H3.1e-5 is the posterior probability of independent signals with a prior for colocalization of \(1 \times 10^{-5}\), PP.H4.1e-5 is the posterior probability of colocalization with a prior of \(1 \times 10^{-5}\), PP.H3.5e-5 is the posterior probability of independent signals with a prior for colocalization of \(5 \times 10^{-5}\) and PP.H4.5e-5 is the posterior probability of colocalization with a prior of \(5 \times 10^{-5}\). Colocalization was considered to be significant when PP.H3 was lower than 0.5 and PP.H4 was the highest posterior probability. The table also lists the genes that were significant after Bonferroni correction in a TWAS meta-analysis of lungs, blood, monocytes and all tissues in GTExv8 and the ‘all critical cohorts’ GWAS; for each gene, the Ensembl ID, gene name, P value of the meta-analysis, number of SNPs used to model gene expression in all tissues, mean z score for all tissues and its s.d. are shown.
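The significance rule stated for colocalization can be written as a small predicate. This is a minimal sketch: the function name and the dictionary layout (keys H0 to H4 holding the five posterior probabilities for one gene and prior) are hypothetical, and the example values are invented.

```python
def is_colocalized(posteriors):
    """Apply the stated rule: PP.H3 < 0.5 and PP.H4 is the highest posterior.

    `posteriors` maps hypothesis labels 'H0'..'H4' to posterior probabilities.
    """
    pp_h3 = posteriors["H3"]
    pp_h4 = posteriors["H4"]
    return pp_h3 < 0.5 and pp_h4 == max(posteriors.values())


# Hypothetical posterior probabilities for two genes.
shared_signal = {"H0": 0.01, "H1": 0.02, "H2": 0.02, "H3": 0.15, "H4": 0.80}
distinct_signals = {"H0": 0.00, "H1": 0.00, "H2": 0.00, "H3": 0.60, "H4": 0.40}

print(is_colocalized(shared_signal))
print(is_colocalized(distinct_signals))
```

Requiring both conditions excludes cases in which PP.H3 is below 0.5 but another hypothesis, such as no association in either dataset, is the most probable.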
Supplementary Table 3
The full results from GSMR analysis for protein level. Exposure indicates the protein name used as exposure for the analysis, bxy is the effect size, se is the standard error, p is the P value of the analysis, nsnp is the number of SNPs used as instruments and multi_snp_based_heidi_outlier is the P value of the HEIDI test.
Supplementary Table 4
Full results from GSMR analysis for RNA-seq data from eQTLGen. Exposure indicates the gene name used as exposure for the analysis, bxy is the effect size, se is the standard error, p is the P value of the analysis, nsnp is the number of SNPs used as instruments and multi_snp_based_heidi_outlier is the P value of the HEIDI test.
Supplementary Table 5
Table of variants in credible sets with 95% probability of containing the causal SNP. Credible set ID is the index of the credible set to which the variant belongs; posterior is the posterior probability of causality for the variant. Variant indicates chromosome, position, reference and alternative alleles; beta is the effect size, beta.se is its standard error and P is the P value.
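For readers unfamiliar with credible sets, the sketch below shows one common, generic way to form a 95% set from per-variant posterior probabilities: rank variants by posterior and add them until the cumulative probability reaches the coverage threshold. This is an illustration under that assumption, not the procedure of the fine-mapping software used for this table; the function name, variant labels and probabilities are hypothetical.

```python
def credible_set(posteriors, coverage=0.95):
    """Return the smallest top-ranked set of variants whose posteriors sum to >= coverage.

    `posteriors` maps a variant label to its posterior probability of causality.
    """
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    selected = []
    cumulative = 0.0
    for variant, probability in ranked:
        selected.append(variant)
        cumulative += probability
        if cumulative >= coverage:
            break
    return selected


# Hypothetical posteriors for four variants at one locus.
locus = {"chr1:100:A:G": 0.60, "chr1:150:C:T": 0.30, "chr1:210:G:A": 0.06, "chr1:305:T:C": 0.04}
print(credible_set(locus))
```

A smaller credible set indicates that the association signal is concentrated on fewer candidate variants.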
Supplementary Table 6
The full results of the gene-level analysis performed using the mBAT-combo method. Gene indicates the Ensembl ID, gene_name indicates the name of the gene, gene_p is the P value of the gene-level test, nsps is the number of SNPs used for the test, lead_snp is the chromosome, position, reference and alternative alleles for the SNP with the lowest P value in the region, and lead_snp_p indicates the P value of this lead SNP.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Pairo-Castineira, E., Rawlik, K., Bretherick, A.D. et al. GWAS and meta-analysis identifies 49 genetic variants underlying critical COVID-19. Nature 617, 764–768 (2023). https://doi.org/10.1038/s41586-023-06034-3