Whole-genome sequencing reveals host factors underlying critical COVID-19

Critical COVID-19 is caused by immune-mediated inflammatory lung injury. Host genetic variation influences the development of illness requiring critical care1 or hospitalization2–4 after infection with SARS-CoV-2. The GenOMICC (Genetics of Mortality in Critical Care) study enables the comparison of genomes from individuals who are critically ill with those of population controls to find underlying disease mechanisms. Here we use whole-genome sequencing in 7,491 critically ill individuals compared with 48,400 controls to discover and replicate 23 independent variants that significantly predispose to critical COVID-19. We identify 16 new independent associations, including variants within genes that are involved in interferon signalling (IL10RB and PLSCR1), leucocyte differentiation (BCL11A) and blood-type antigen secretor status (FUT2). Using transcriptome-wide association and colocalization to infer the effect of gene expression on disease severity, we find evidence that implicates multiple genes—including reduced expression of a membrane flippase (ATP11A), and increased expression of a mucin (MUC1)—in critical disease. Mendelian randomization provides evidence in support of causal roles for myeloid cell adhesion molecules (SELE, ICAM5 and CD209) and the coagulation factor F8, all of which are potentially druggable targets. Our results are broadly consistent with a multi-component model of COVID-19 pathophysiology, in which at least two distinct mechanisms can predispose to life-threatening disease: failure to control viral replication; or an enhanced tendency towards pulmonary inflammation and intravascular coagulation. We show that comparison between cases of critical illness and population controls is highly efficient for the detection of therapeutically relevant mechanisms of disease.

ensure that the global biomedical research community quickly builds on these promising leads.
C. Data & methodology: validity of approach, quality of data, quality of presentation The bioinformatic tools and strategies used for quality control, alignment and variant calling from the whole genome sequencing data are state-of-the-art in the field and very well described.
A remark concerning the fine mapping methods (lines 568-569): instead of selecting the worst consequence across all available transcripts, where many might not have much biological relevance, I would suggest sticking to the canonical transcript or at least transcripts known to be expressed according to GTEx.
The post-GWAS analyses (TWAS and colocalisation analysis) also follow recognized standards. The choice of focusing on lung and blood expression data makes biological sense and the addition of a pan-GTEx TWAS (metaTWAS sheet in 'TWAS results' suppl. table) is more confusing than useful.

D. Appropriate use of statistics and treatment of uncertainties
The single variant association analyses and aggregate variant testing are carefully performed. Remarkable efforts have been made to address and avoid potential batch effects. Still, the fact that cases and controls were sequenced on different platforms remains a potential issue, which is well explained in the 'Limitations' section. E. Conclusions: robustness, validity, reliability Most of the independent associations identified in the study (22/25) have been replicated in an independent, yet phenotypically distinct cohort (see below, suggested improvements). They are therefore robust and likely to be further validated by other ongoing genomic studies.
As a note of caution, the authors mention in the Discussion the lack of replication of the association with OAS gene variants that was reported in their previous GWAS (lines [264][265][266][267][268][269][270]. They also didn't find convincing associations in additional regions that were identified by others, including TLR7 and several genes involved in the regulation of type I and III interferon, as reported in lines 175 to 179. A more general discussion on winner's curse and other potential reasons for lack of replication would be useful for the community. F. Suggested improvements: experiments, data for possible revision A better description of the study population is needed. Suppl. figures 9 and 10 show an imbalance between cases and controls in terms of age and sex. However, no numbers are given. A table summarizing the differences in basic demographic parameters between cases and controls would be useful. Also, it would be very interesting to know about risk factors / comorbidities among cases, which would allow for better dissection of the genetic signals. For example, is there is any way for the authors to obtain BMI data? Combined with age, such information could help better understand the association results (e.g., through sensitivity analyses in young, non-obese patients, or in older patients with high BMI).
As replication cohort, the authors picked the Host Genetic Initiative (HGI) GWAS meta-analysis round 6 "hospitalised COVID vs. population" (B2 analysis). Another HGI association analysis -"very severe respiratory confirmed covid vs. population" (A2 analysis) -would have been much closer phenotypically to the severe COVID-19 phenotype studied here. The rationale not to use it was that "there are currently insufficient cases from other sources available to attempt replication" (line 141). It is true that GenOMICC contributed 1'825 cases to the analysis, according to the numbers shown here: https://www.covid19hg.org/results/r6/. Still, another 6'954 cases are available (together with about 1mio controls), which is more than enough for replication purpose. I suggest to include that analysis as an additional validation step. G. References: appropriate credit to previous work?
References are complete and up-to-date.
H. Clarity and context: lucidity of abstract/summary, appropriateness of abstract, introduction and conclusions The abstract, intro and conclusion are written in a concise and lucid way. In both the abstract and the conclusion, the authors suggest that the way to translating their results to clinical use is through "testing in clinical trials" (abstract) or through "large-scale randomised trials" (conclusion). While it is certainly true that clinical trials will be needed, it should also be mentioned that more fundamental research into pathogenic mechanisms and immune response is required -in light of the new findings -before anything can be tested in clinical trials.
Referee #3 (Remarks to the Author): Summary Kousathanas and Pairo-Castineira et al reported a whole-genome sequencing study of critical COVID-19. This is an incremental work from their previous publication on a genome-wide association study of critical illness in COVID-19 (Pairo-Castineira et al. Nature 2020). An 3-fold increase in case sample size (2,244 --> 7,491) and deeper genotyping enables the authors to uncover 15 novel risk loci associated with critical COVID-19 (totallying of 22 risk loci). While it is crucial to highlight the importance of deep-phenotyping in the COVID-19, and in infectious diseases study design, I found the presented work has limited improvement to our current knowledge on the host genetics of COVID-19. In particular, the amount of overlapping is large -30% of their currently enrolled severe cases were included in the COVID-19 HGI publication; and the increase in the total case and control numbers were relatively small --20% increase in the case numbers (6,179 → 7,491), and 30fold decrease in control numbers (1,483,780 → 48,400) when comparing it to the recent publica on from the COVID-19 Host Genetic Initiative (Nature 2021). The authors fail to present some critical information when a finding is presented or it is buried in the supplementary information. Overall, I found the current work lack of scientific vigor and could be improved in clarity. My specific comments and concerns are listed below: Major comments: 1. Study design: the advantage of using a whole-genome sequenced cohort versus genotype + imputation is unclear/ not discussed here. Since most of the study participants are European individuals, whether one would gain more information by whole-genome sequencing versus genotyping + imputation is unclear. Since the authors had emphasised this study design in their title and in the abstract, thus it warrants some comments for the value of their current WGS study design.
2. Definition of independent regions. The authors performed a conditional analysis using summary statistics using GCTA cojo. However, given the authors have the individual-level genotype data, it would be more convincing to the readers if the reported risk loci were done using a conditional analysis with individual-level genotype data. In particular, a few of the reported "independent" regions are very close to each other (e.g. rs7528026 and rs41264915 are only ~22KB apart; rs73510898 and rs34536443 are < 50KB apart and are not complete dependent (r2>0)). In Figure 3b, the authors highlighted such an example for two "independent" risk loci in chromosome 21, but it would be more convincing to show the regional plot after the condition on the top association signal (rs17860115).
3. Only 1/25 listed regions share the same top SNP as their previous work in Nature 2020 (rs73064425). Is it a difference due to technology (genotyping+imputation versus WGS) or statistical method (logistic linear model versus linear mixed model). It would be useful to see the association statistics of the previously reported eight SNPs (Table 1) in the current cohort, especially their posterior probabilities for being causal if it is the same locus.
4. Likely causal variants or credible sets from fine-mapping results are not presented in the manuscript. 5. Table 1 and 2 contain duplicate yet inconsistent results? For example, rs1143011457 has different p-values in two tables. The authors should also consider two merge two tables to reduce duplicated information.
6. The authors may consider incorporating the coloc results into Figure 2 by coloring their points by the colocalization posterior probability. Or adding it to Table 1 / 2. 7. The analysis and discussion around the HLA association is incomplete. The listed SNP in the HLA region is a upstream SNP of HLA-DQA1, but the authors identified a HLA-DRB1*04:01 association instead after HLA imputation. Have the authors performed conditional analysis to identify the causal allele or amino acid positions? In addition, mentioned briefly in the method section, the authors are currently performing inference of HLA alleles directly from whole genome sequences instead of imputation. The authors indicated 96% concordance between HIBAG and HLA*LA results for all HLA alleles, but what about HLA-DRB1*04:01 specifically?
8. It is unclear to me whether the authors presented the critically-ill ceases versus mild COVID-19 controls findings. Or how the mild COVID-19 controls were used. 9.In SF4C (PC3 versus PC4), there is a clear difference between cases versus controls. Please can the authors show the PC loading for the reported SNPs to make sure the reported risk variants are not due to population stratification.
Minor comments: 1. Authors should present the number of individuals included in the study broken down by ancestry (Line 122).
2. Authors put their fine-mapping analysis (Line 153-164) under the Replication section, which shouldn't be the case. This manuscript is an extension of work completed by this group previously evaluating COVID-19 severe outcomes in critical respiratory patients with population and mild/asymptomatic controls using sequencing. The team identifies 22 loci in Europeans and an additional 3 loci with a meta-analysis across ancestries. They also use transcriptomics and HLA classical alleles to support their findings. Overall, it is a great contribution to the field. However, there are several reservations about the approach that do deserve some attention.

Information missing in
We thank the reviewer for their generous comments and have endeavoured to respond to each point below.
1) (a) There are very limited details on the 100,000 Genomes project population. Its described briefly as patients broadly representing -rare diseases, cancers and infections. More details are necessary on age, sex, etc. Especially because these are topics included in the supplementary materials but not really addressed. Is there lack of a sex or age difference because of the distribution of these traits in cases/controls?
We agree about the importance of this information and have added additional description of the 100,000 Genomes Project control group in the supplementary information. Firstly, we added a figure showing a detailed phenotypic breakdown of the 100,000 Genomes Project control cohort (Supplementary figure 9. Secondly, we added and updated figures to show the numerical breakdown and distribution for age, sex and body mass index (BMI) across the different cohorts, including the 100,000 Genomes Project controls (Supplementary Figures 10 and 11, Supplementary Table 4).
With regard to differences in genetic predisposition by sex and age, we predict that these effects will be difficult to detect with high confidence due to the smaller effect sizes expected for comparisons within the extreme-susceptibility case group. The Manhattan plots in Supplementary Figure 18 suggest that we may see genome-wide significant differences in these analyses with a larger population in future. We also added a new age, sex, BMI -matched study including a subset of participants (see Supplementary Figure 20 and our reply to reviewer 2, comment F).
1 1) (b) Additionally, we know that guidance for immunocompromised individuals was to take extra cautions to avoid COVID-19 infections. This means limited exposures. So there is a likely selection bias in these controls. Do they actually represent the population? You do have mild/asymptomatic controls and those should be utilized. They can help to support your findings by showing that the direction of the effects (even if p values are not significant) is the same, as is the allele frequencies. These are know exposed individuals and it will also make your inferences much stronger. Inclusion of both controls (separate findings) and their allele frequencies and results in a table should be in the main text. The supplementary table on cohort characteristics is not sufficient, and does not represent any cohort characteristics, but sequencing characteristics.
The additional details we have added in the response above mitigate this risk. For the control population used in this study, unrelated participants were selected from the 100,000 Genomes Project cohort, which includes participants with rare disorders and their family members, and participants with a range of different cancer types. The control group contains 18,915 unaffected family members of rare disease participants, 14,701 affected rare disease participants (not related to the unaffected family members selected) and 12,149 cancer participants. Affected participants with immunological disorders account for <1% of the control group. We have now added a detailed disease breakdown (Supplementary Figure 9) and expanded the cohort description in the results to include this information (under Cohort characteristics).
However the effect of shielding is a potentially important bias that we had not considered. A fraction of our control does include otherwise healthy individuals that were recruited for this study and have had only mild symptoms. In order to address this, we have performed a new GWAS analysis using only the mild COVID-19 cohort as controls and the full severe COVID-19 cohort as cases. As the sample size of the mild cohort controls is small, there is a substantial reduction in power but results for lead loci are consistent with our main results. We have included these results in a new table (Supplementary Table  11), which includes allele frequencies of cases, controls and the mild-only cohort, and refer to them in the main manuscript.
2) The tables included in the manuscript and the supplementary tables do not (unless I missed it) indicate the numbers per category. This is important as we interpret the findings. The forest plots have the direction of the effect and OR and CI, but no allele frequencies or N to capture what might be happening. A fixed effect model is also chosen for the meta-analysis which is interesting given the diversity of your population. Trans-ancestry meta analysis programs may be better suited to handle this diversity (Mantra). Although its hard to determine if this is really a diverse study. The readers need to know who is being studied, what the age, sex, ethnicity breakdowns are and co-morbidities.
We agree with the need to include a better description of the study population and have now provided additional characterisation in the Supplementary Material. We also have added allele frequency data for cases and controls in supplementary table 8.
In response to the suggestion regarding the methodology for the meta-analysis, we repeated the metaanalysis using a random effects model and MR-MEGA, [1] which is designed to account for ancestry differences adding principal components to a fixed-effect model. In order to run MR-MEGA, the number of populations needs to be higher than the number of added pcs+2. Only variants that passed QC in all 4 populations could be tested with MR-MEGA (4M variants remain) adding 1 pc.
We did not see important changes in effects for the lead variants tested (Response to Review Figure 1). Errors and p-values for the three methods are compared for the lead variants in the meta-analysis in Response to Review Table 1: Meta-analysis results for three different approaches (fixed, random and MR-MEGA).
3) The paper is also set up to be the finding of 22 variants in Europeans and an additional 3 in non-Europeans + europeans via the meta-analysis. But, were there any findings outside of Europeans? How do you interpret lack of replication of variants in a sub-population. From the plots we cannot fully interpret (2nd plot not labeled) Were the cases in each sub-population all at the same severity level? The table is set up with RAF of Europeans, but you discuss trans-ancestry. If the approach is trans-ancestry then full information on all populations should be provided.
We had previously left the labels off the second plot to indicate the new findings from trans-ancestry analysis. In order to make comparison easier, we have added the labels for all peaks to this plot ( Figure  1).
Cases were defined the same way across ancestry groups. We include Manhattan plots for each ancestry independently in Supplementary Figure 12, showing genome-wide significant replication of the association at 3p21.31 in the South Asian ancestry group but no new associations in non-European ancestry groups. This is consistent with the limited numbers of cases in these groups, and hence the lower statistical power to detect ancestry-specific associations. We observe no heterogeneity of effect for 23 hits (Supplementary Figure 14); there is significant heterogeneity in two loci, both highlighted in Table 1 and shown in Supplementary Table 8. 4) One novel finding is FUT2. I caution interpretation of FUT2 because its know that ABO, secretor status, and Lewis genotypes are all highly dependent on ethnicity or geographical ancestry. This should be explored more to be sure it is not a European gradient effect (either by cases or controls or both). It could be evaluated by looking at tight clusters of European ancestry individuals from the PCA and seeing this effect remains or is supported.
We appreciate this suggestion and have completed the analysis as proposed. We did not find any evidence of a gradient effect.
Response to Review Figure 1: Manhattan plot showing results obtained with MR-MEGA Specifically, we defined increasingly tight clusters of European ancestry and performed a logistic regression of case/control status on the lead variant at FUT2 (rs368565) and the same covariates as in the main analysis. Clusters were defined as follows; we projected all individuals of european ancestry on the three leading population-specific genomic principal components and computed the mean individual. We then considered subsets comprising 80%, 60%, 40% and 20% of individuals with the lowest distance to the mean individual. These are illustrated in Response to Review Figure 2 The results of analyses on these subsets are summarised in Response to Review Table 1, showing the estimated OR and its 95% confidence interval for the lead SNP for each of the subsets as well as for the whole sample as reported in the main manuscript. There is no evidence of a systematic change in effect for increasingly tight subsets.
5) The use of controls sequenced on a different platform can often lead to batch effects or other technical artifacts. Discussion of this concern should be included in the manuscript, along with measures on mitigation. This is a recognized issue in the field, especially for sequencing and could result in spurious outcomes.
We fully agree with the reviewer and have taken extensive steps to mitigate the possibility of batch effects. Firstly, we masked low quality genotypes (with low GQ, low depth or aberrant ab-ratio) and then filtered stringently for low missingness to "homogenise" the different platforms. Secondly, we performed a control-control relative allele frequency filter based on samples that were aligned in both processing platforms (Illumina NSV4 vs. Genomics England Pipeline 2.0), which removes most Response to Review Figure 2: PCA plot with different colors indicating individuals in different quintiles of the distance distribution of the batch effect noise. Thirdly, we performed an LD-support based filter where we assessed the significance of each variant by the support of its neighbours. This meant that significant variants with no LD-partners or LD-partners with inconsistent GWAS summaries were eliminated. These filters might have been overly stringent, but we deemed them necessary to drastically reduce false-positives.
We have added these points to the discussion as suggested.
6) The lack of rare variant gene based associations is intriguing, especially since you restricted to putative snps. Are the lead variants from the GWAS all common variation, with no rare variants underlying these associations? Or is it just that the test for gene based rare variants is not significant? I think those are two different points, but important, because this sequencing study has the opportunity to fine map these regions. The supplementary section has information on fine mapping that should be included in the main text. In my opinion, this is what moves this paper beyond the GWAS reported by this group and others. Can we identify putative causal alleles, and/or a putative causal gene with the sequencing. At the very least the Chromosome 3 region and narrowing down to a gene should be highlighted. Credible sets are included in the discussion, but there is no table or way to visualize these credible sets, and if in the conditional analysis a single SNP explains these associations.
This suggestion inspired us to extend our fine-mapping analysis to check whether rare variants could underlie the association signals discovered. As we explain in detail in new supplementary section "Genetic fine-mapping check for rare variants", we performed a new fine-mapping analysis with susieR for our largest EUR cohort using all variants with MAF>0.02% (reduced from 0.5%) around the 23 EUR-discovered signals. This analysis did not reveal any new potential causal variants for our main signals (shown in Supplementary Table 10).
We agree with the reviewer that the fine mapping is a major strength of the work and have moved a larger part of this analysis into a prominent position in the main manuscript. We have also added an additional Supplementary Table 9 to show other potential candidates that we identified for each credible set and have also performed an enrichment analysis using genes identified from fine-mapping and colocalisation (Supplementary Figure 15).

Referee #(Remarks to the Author):
A  (2021)) a genotyping-based GWAS performed using data from 2,244 critical COVID-19 cases, which identified 4 association signals. Now, the increase in sample size coupled with the use of whole genome sequencing and statistical fine-mapping has led to the identification of 25 independent association signals (22 of which replicated in another dataset), included 15 new associations.
B. Originality and significance These results are timely and important, as the newly identified genes and variants are potentially informative about SARS-CoV-2 pathogenic mechanisms causing the most severe clinical presentations. The road to concrete clinical translation of the findings is obviously not straightforward, as duly acknowledged by the authors, but a large diffusion of the study results -and of the underlying genetic data -is the surest way to ensure that the global biomedical research community quickly builds on these promising leads.
We thank the reviewer for their support for the work and agree that there is considerable urgency in making these findings widely available.
C. Data & methodology: validity of approach, quality of data, quality of presentation The bioinformatic tools and strategies used for quality control, alignment and variant calling from the whole genome sequencing data are state-of-the-art in the field and very well described.
A remark concerning the fine mapping methods (lines 568-569): instead of selecting the worst consequence across all available transcripts, where many might not have much biological relevance, I would suggest sticking to the canonical transcript or at least transcripts known to be expressed according to GTEx.
As suggested, to increase the biological relevance of our variant annotation, we have modified our approach to use only transcripts inluded in the GENCODE basic set, which is a smaller set of transcripts with full-length protein coding transcripts [2]. We utilised the worst predicted consequence type across those transcripts as we feel it is potentially more clinically relevant. We updated The post-GWAS analyses (TWAS and colocalisation analysis) also follow recognized standards. The choice of focusing on lung and blood expression data makes biological sense and the addition of a pan-GTEx TWAS (metaTWAS sheet in 'TWAS results' suppl. table) is more confusing than useful.
We agree that this table is difficult to interpret and have removed it from the supplementary table.
D. Appropriate use of statistics and treatment of uncertainties The single variant association analyses and aggregate variant testing are carefully performed. Remarkable efforts have been made to address and avoid potential batch effects. Still, the fact that cases and controls were sequenced on different platforms remains a potential issue, which is well explained in the 'Limitations' section.
We thank the reviewer for recognising our efforts to correct for batch effects and have added to this section in response to comments from this and other reviewers.
E. Conclusions: robustness, validity, reliability Most of the independent associations identified in the study (22/25) have been replicated in an independent, yet phenotypically distinct cohort (see below, suggested improvements). They are therefore robust and likely to be further validated by other ongoing genomic studies. As a note of caution, the authors mention in the Discussion the lack of replication of the association with OAS gene variants that was reported in their previous GWAS (lines 264-270). They also didn't find convincing associations in additional regions that were identified by others, including TLR7 and several genes involved in the regulation of type I and III interferon, as reported in lines 175 to 179. A more general discussion on winner's curse and other potential reasons for lack of replication would be useful for the community.
We agree that this is an important point and have expanded the discussion to explain the winner's curse in order to make it clearer to non-specialists.
F. Suggested improvements: experiments, data for possible revision A better description of the study population is needed. Suppl. figures 9 and 10 show an imbalance between cases and controls in terms of age and sex. However, no numbers are given. A table summarizing the differences in basic demographic parameters between cases and controls would be useful. Also, it would be very interesting to know about risk factors / comorbidities among cases, which would allow for better dissection of the genetic signals. For example, is there is any way for the authors to obtain BMI data? Combined with age, such information could help better understand the association results (e.g., through sensitivity analyses in young, non-obese patients, or in older patients with high BMI).
We have added the table of characteristics as suggested by this reviewer and reviewer 1. In response to the specific suggestion to include BMI, we have obtained linked data for a majority of cases from intensive care national audit data (ICNARC Case Mix Programme).
Response to Review Figure 3: Results for lead variants of this study comparing effect size (BETA) and P -values for EUR GWAS analyses using cases with matched and unmatched controls. Left panel shows the results of the matched study which used default covariates of age, sex, age × sex and 20 PCs. Middle panels show results of a study using unmatched controls of the same sample size as the matched study and using covariates as the principal study gwas using only default covariates. Right panels show results of an unmatched study using default + BMI as covariate. Results for EUR-discovered loci are shown in black and with grey the trans-ancestry meta-analysis results are shown. Error bars for BETA represent standard errors of estimates.
We have repeated the GWAS analysis using a subset of the study population, matching case and controls groups for BMI and other characteristics. As expected, there is a reduction in statistical power in this analysis but the direction and magnitude of effect at genome-wide significant loci are consistent. We report these results in a new section in the Supplementary Information, "Matched case-control analysis". For convenience, we reproduce the key figure (Supplementary Figure 21) in Response to Review Figure 3.
As replication cohort, the authors picked the Host Genetic Initiative (HGI) GWAS meta-analysis round 6 "hospitalised COVID vs. population" (B2 analysis). Another HGI association analysis -"very severe respiratory confirmed covid vs. population" (A2 analysis) -would have been much closer phenotypically to the severe COVID-19 phenotype studied here. The rationale not to use it was that "there are currently insufficient cases from other sources available to attempt replication" (line 141). It is true that GenOMICC contributed 1,825 cases to the analysis, according to the numbers shown here: https://www.covid19hg.org/results/r6/. Still, another 6,954 cases are available (together with about 1mio controls), which is more than enough for replication purpose. I suggest to include that analysis as an additional validation step.
We have added this additional replication, using the HGI "A2" data set, in Supplementary Table 8.
We selected the "B2" analysis a priori, using the same approach as we took in our previous paper in 2020. [3] Although we agree that the "A2" analysis has improved during the intervening 9 months, there are important differences between the carefully-selected, prospectively-recruited GenOMICC case group, and the broader cohort meeting the definition of "critical COVID-19" in our meta-analysis with HGI. Both our analyses, and the leave-one-out statistics from HGI, demonstrate that discovery power in the GenOMICC cohort is substantially greater than in the remaining 75% of cases included in the HGI critical COVID-19 analysis. This is likely to be due to a combination of factors, including random variation and the winner's curse, but also in study design, such as: 1. International differences in critical care practice -patients admitted to intensive care units (ICU) in the UK are generally younger, sicker and less comorbid than in other healthcare systems, creating a more extreme phenotype more suitable for population comparisons. [4] 2. Prospective case ascertainment may have improved the quality of phenotyping in GenOMICC compared with some retrospective studies.
We discuss this issue further below in response to comments from reviewer 3.
G. References: appropriate credit to previous work? References are complete and up-to-date.
H. Clarity and context: lucidity of abstract/summary, appropriateness of abstract, introduction and conclusions The abstract, intro and conclusion are written in a concise and lucid way. In both the abstract and the conclusion, the authors suggest that the way to translating their results to clinical use is through "testing in clinical trials" (abstract) or through "large-scale randomised trials" (conclusion). While it is certainly true that clinical trials will be needed, it should also be mentioned that more fundamental research into pathogenic mechanisms and immune response is required -in light of the new findings -before anything can be tested in clinical trials.
We have edited both the abstract and concluding paragraph to make this clearer.

Referee #(Remarks to the Author):
Summary Kousathanas and Pairo-Castineira et al reported a whole-genome sequencing study of critical COVID-19. This is an incremental work from their previous publication on a genomewide association study of critical illness in COVID-19 (Pairo-Castineira et al. Nature 2020). An 3-fold increase in case sample size (2,244 -> 7,491) and deeper genotyping enables the authors to uncover 15 novel risk loci associated with critical COVID-19 (totallying of 22 risk loci). While it is crucial to highlight the importance of deep-phenotyping in the COVID-19, and in infectious diseases study design, I found the presented work has limited improvement to our current knowledge on the host genetics of COVID-19. In particular, the amount of overlapping is large -30% of their currently enrolled severe cases were included in the COVID-19 HGI publication; and the increase in the total case and control numbers were relatively small -20% increase in the case numbers (6,179 → 7,491), and 30-fold decrease in control numbers (1,483,780 → 48,400) when comparing it to the recent publication from the COVID-19 Host Genetic Initiative (Nature 2021). The authors fail to present some critical information when a finding is presented or it is buried in the supplementary information. Overall, I found the current work lack of scientific vigor and could be improved in clarity. My specific comments and concerns are listed below: We thank the reviewer for their in-depth comments. Our objective in this work is to find new genetic 9 associations with severe COVID-19, which we have done with exceptional efficiency. Whilst comparing the total number of cases and controls between studies is a reasonable starting point, we would argue that relying only on the raw numbers risks overlooking the importance of careful study design and conduct. The clinical insight underlying our study design, together with careful prospective recruitment, are the key advantages of GenOMICC, and the benefits are demonstrated by our discovery of 15 new genetic associations.
More specifically, our cases were defined as COVID-19 positive patients who were assessed as criticallyill by the treating clinician. This case definition is qualitatively different from even the most severe definition of the HGI meta-analysis phenotype (i.e., analysis "A" for COVID-19 positive hospitalised patients who died or required respiratory support). As discussed above, the critically ill population included in GenOMICC in the UK have been selected by clinical admission practice to be younger, and less frail, than critically ill populations in other healthcare systems [4].
Furthermore, an important consideration that is not incorporated into HGI case definitions is that some patients who die of COVID-19 do not have hypoxaemic respiratory failure: we have demonstrated this in our previous autopsy studies [5] and in the ISARIC4C study [6]. Importantly, many non-critically ill patients have distinct pathophysiology underlying their disease, indicated by the opposite effect of steroid treatment on mortality, which we discovered in the RECOVERY trial. [7] Death in this group is strongly influenced by frailty and comorbidity; hence using unselected hospital mortality as an outcome, as has been used in A2, is highly likely to add noise to a study of genetic susceptibility. In contrast, critical COVID-19 is a remarkably homogeneous -and extreme -phenotype.
We would also point out that increasing the size of the control cohort in an imbalanced study increases power only asymptotically. Using a calculation of the effective sample size as: N ef f = 2 1/N cases + 1/N controls [8] we obtain N ef f =12,307 for the cited HGI study and N ef f =12,974 for the present study. Therefore, the size of the control cohort that was used in this study did not appreciably reduce power compared with the HGI analysis, even though the number of individuals in the control cohort was much larger in the latter.
Our prediction that the GenOMICC design would be substantially more powerful is confirmed by the data. As observed by reviewer 2 above, although there are 6,179 cases in our recent meta-analysis report with HGI (and similar N ef f ), that report only discovered one new genetic association with critical COVID-19 (rs77534576, near TAC4 ) -taking the total number of associations with critical illness to 6.
In contrast, in our present report of 22 genome-wide significant and replicated associations with critical COVID-19, only 6 have been previously reported to be associated with that phenotype, and 15 are completely new findings, which we believe is a substantial increment on previous knowledge.
Major comments: 1. Study design: the advantage of using a whole-genome sequenced cohort versus genotype + imputation is unclear/ not discussed here. Since most of the study participants are European individuals, whether one would gain more information by whole-genome sequencing versus genotyping + imputation is unclear. Since the authors had emphasised this study design in their title and in the abstract, thus it warrants some comments for the value of their current WGS study design.
We thank the reviewer for their comments. In response, we are pleased to emphasise the benefits of whole genome sequencing (WGS) in a revised manuscript.
As COVID-19 is a new disease with unknown genetic architecture, our design employed the best available sequencing strategy to investigate genetic variation across the full allele frequency spectrum underlying COVID-19 severity. More specifically, we used high-coverage WGS instead of employing a genotyping and imputation approach to accurately interrogate common, low frequency and rare variants, providing particular benefit for rarer variants which are less likely to be part of reference panels employed for imputation, and less likely to be well imputed [9].
Although our rare variant gene-level testing identified no gene-wide significant signals, our GWAS study with more common variants successfully identified many new loci. We agree with the reviewer that most of the common signals could have been found with genotype + imputation design, but WGS data provided our study with several benefits, including improved fine-mapping, and excellent coverage of indels and multi-allelic variants, many of which are represented in the credible sets for new association signals. These variants are often overlooked because they are harder to impute. As we explain in detail in new supplementary section "Genetic fine-mapping check for rare variants", we performed a new fine-mapping analysis using all variants with MAF>0.02% (reduced from 0.5%) around the 23 EUR-discovered signals. This analysis did not reveal any new potential causal variants for our main signals (shown in Supplementary Table 10). It is thus also important to note that the absence of rare variants driving the new and known genetic signals here is in itself a valuable contribution to our understanding of the genetic architecture of COVID-19. We have amended the discussion to make this clearer.
Moreover, we have added a Mendelian randomisation analysis, which makes use of the high-resolution genotype data to maximise the number of overlapping variants with existing protein expression data, and enrichment analysis that leverages the additional missense mutations in genes identified by finemapping together with results from TWAS and coloc to discover relevant gene pathways for COVID-19 (Extended Data Table 1 Supplementary Table 15).
Finally, we have added a second protein-coding structure change to the main manuscript, since additional replication data obtained during the review process indicates that the association in IFNA10 replicates robustly in a second study (Regeneron and Ancestry.com). This, together with the amino acid substitution in PLSCR1, provides an indication of the value of fine mapping with WGS. '' 2. Definition of independent regions. The authors performed a conditional analysis using summary statistics using GCTA cojo. However, given the authors have the individual-level genotype data, it would be more convincing to the readers if the reported risk loci were done using a conditional analysis with individual-level genotype data. In particular, a few of the reported "independent" regions are very close to each other (e.g. rs7528026 and rs41264915 are only~22KB apart; rs73510898 and rs34536443 are < 50KB apart and are not complete dependent (r2>0)). In Figure 3b, the authors highlighted such an example for two "independent" risk loci in chromosome 21, but it would be more convincing to show the regional plot after the condition on the top association signal (rs17860115).
We are grateful to the reviewer for this suggestion, which in our view improves the robustness of our conditional analysis. We have run this confirmatory analysis using individual-level data with SAIGE. All our main loci remain significant after conditioning on the other lead signals on the same chromosome (new Supplementary Table 6). We have added a description of the additional methods and refer to the new results in the main manuscript.
3. Only 1/25 listed regions share the same top SNP as their previous work in Nature 2020 (rs73064425). Is it a difference due to technology (genotyping+imputation versus WGS) or statistical method (logistic linear model versus linear mixed model). It would be useful to see the association statistics of the previously reported eight SNPs (Table 1) in the current cohort, especially their posterior probabilities for being causal if it is the same locus.
We agree this is a useful link to our previous report and have added this as a column in supplementary table 7. Apart from differences with genotyping + imputation versus WGS, for the previous work we applied a strict threshold for QC using only SNPs with high quality imputation (score >0.9) in both UK BioBank and our imputed data, therefore not including variants that have now been tested.
4. Likely causal variants or credible sets from fine-mapping results are not presented in the manuscript.
We thank the reviewer for this suggestion, especially given that the fine-mapping is a strength of our approach, and have now added Supplementary Table 9 showing summary info about the size of the credible sets and added probable causal variants that have the worst predicted consequence type among the variants in the credible set. We provide full fine-mapping credible sets in Supplementary File GWAS.xlsx.
5. Table 1 and 2 contain duplicate yet inconsistent results? For example, rs1143011457 has different p-values in two tables. The authors should also consider two merge two tables to reduce duplicated information.
We agree this is a better way to present our results. The reason for the discrepancy is that a different set of results (from trans-ancestry meta-analysis) was presented in the second table. We think this is now much clearer in the revised version of Table 1 where we have merged important information for  previous table 1 and table 2. We have added supplementary table 8 to present the full results of the replication meta-analysis with HGI.
6. The authors may consider incorporating the coloc results into Figure 2 by coloring their points by the colocalization posterior probability. Or adding it to Table 1 / 2. We greatly appreciate this suggestion by the reviewer which helped us improve Figure 2 to add different colors for loci with strong and moderate evidence of colocalisation (PP[H4], >0.8 and >0.5, respectively), which we show below. We provide the actual numerical values for posterior probabilities for colocalisation of all variants in the supplementary excel file TWAS.xlsx.
7. The analysis and discussion around the HLA association is incomplete. The listed SNP in the HLA region is a upstream SNP of HLA-DQA1, but the authors identified a HLA-DRB1*04:01 association instead after HLA imputation. Have the authors performed conditional analysis to identify the causal allele or amino acid positions? In addition, mentioned briefly in the method section, the authors are currently performing inference of HLA alleles directly from whole genome sequences instead of imputation. The authors indicated 96% concordance between HIBAG and HLA*LA results for all HLA alleles, but what about HLA-DRB1*04:01 specifically?
We have conducted a conditional analysis on HLA-DRB1*04:01 and have observed no other significantly associated HLA alleles or SNPs. We have now edited the main text to clarify this point and moved the figure presenting the conditional analysis to main manuscript as Extended Data Figure 4.
We did not conduct analysis at the amino-acid level. Following the reviewer's comment, we specifically looked at the concordance between HIBAG and HLA-LA results for the HLA-DRB1*04:01 calls. For the HLA-DRB1*04:01 carriers in HIBAG, we report a concordance (i.e. both calls agreeing at the G-group resolution between the HIBAG and HLA-LA callsets) of 87.6% across 7,927 EUR samples for which we also had HLA-LA data. The concordance is comparable across samples processed using different pipelines (86.5% for samples processed with NSV4, 87.9% for samples processed with Pipelines 2.0) and case control status (88.2% in cases, 87.5% in controls). We also re-ran our association analysis on concordant samples only using the same model as for the full analysis, and confirmed the observed association with HLA-DRB1*04:01 (OR = 0.78, 95%CIs : 0.75 − 0.81, P = 1.3 × 10 −11 ) which is stronger than for the lead variant (OR = 0.88, 95%CIs : 0.86 − 0.90, P = 4.4 × 10 −9 ), consistent with our results on the full HIBAG callset. We have also added this to the supplementary material.
8. It is unclear to me whether the authors presented the critically-ill ceases versus mild COVID-19 controls findings. Or how the mild COVID-19 controls were used.
We apologise for the lack of clarity in explaining how the mild individuals were used. Mild individuals were used in the GWAS analysis as controls together with individuals from the 100,000 Genomes Project. This is shown in the displayed diagram of the study design (Supplementary Figure 1). We have elaborated on this point in the caption of this figure to improve clarity. In light of comments received from reviewers, we have also performed an additional GWAS analysis with severe cases compared with only the individuals who have experienced mild or asymptomatic COVID-19 to further support the main results (see response comment 1B to reviewer 1). 9. In SF4C (PC3 versus PC4), there is a clear difference between cases versus controls. Please can the authors show the PC loading for the reported SNPs to make sure the reported risk variants are not due to population stratification.
We thank the reviewer for this comment. We did not observe obvious case/control imbalances from observation of the projection PCs (Supplementary Figure 4C and reproduced in Response to Review Figure 3 for convenience).
Note that in Supplementary Figure 4 (panels B-C), gray symbols represent background samples from the 1KGP3. We have amended the figure to make this more clear. We note that we primarily use this figure to check whether our cases and controls have been assigned correctly to ancestry groupings and cluster together. As we have performed population-specific GWAS, we also present population-specific PCs, which were used as covariates for the association analyses and are shown in Supplementary Figures  5,6,7,8). We did not observe case/control stratification in these population specific PCs either.
In addition to visual assessment and following the reviewers suggestion, we now also calculated SNP loadings for major axes of population-specific variation of PCs 1-5 for AFR, EAS, EUR and SAS ancestry groups. We calculated loadings for all 25 lead variants of this study and plotted them against the background of 58,925 high quality independent variants that were used for computing the PCA and controlling for stratification.
As shown in the loadings figure below, the majority of signals do not display unusual stratification, although there are four outliers. For PC1 these are rs2271616 for EUR, SAS, AFR, EAS populations, rs73064425 for EUR and SAS populations and rs114301457 for SAS. For PC3, it is rs9271609 for the SAS population. All of these loci were discovered in the EUR-only analysis. The two loci (rs2271616 and rs73064425) that have outlier EUR loadings have been previously reported as being associated with COVID-19 phenotypes [11,12] and we replicated these findings in this study (i.e, we do not report them as new findings here, see also main table 1).
Response to Review Figure 4: Thick dashed lines indicate 3 standard deviations away from the mean for each loading.
As a further check we also investigated whether for rs2271616 and rs73064425 population-specific stratification could be driving the signals. We performed logistic regression with plink2 using the same covariates as in the main analysis, but selecting individuals in successively tighter population centric clusters of individuals. These subsamples were computed by considering the quintiles of the distribution of distance between individuals and the mean individual of the population in PCA space, illustrated in Response to Review Figure 5.
Response to Review Figure 5: PCA for FUT2 The results for these EUR subsets are summarised in Response to Review Table 2 showing the estimated OR and 95% confidence interval for the two loci. There is no evidence for a major change in effect size for increasingly tight clusters apart from a small attenuation for rs73064425 that could also be due to 15 sample variance or true covariance between severe COVID-19 risk and the PC1 axis. Response to Review Table 2 16 Minor comments: 1. Authors should present the number of individuals included in the study broken down by ancestry (Line 122).
We thank the reviewer for pointing this omission from the main paper. We have now added the ancestry details in main results.
2. Authors put their fine-mapping analysis (Line 153-164) under the Replication section, which shouldn't be the case.
We have placed the fine-mapping analysis to the appropriate section in the results, i.e, after the GWAS analysis description and before replication.

Reviewer Reports on the First Revision:
Referees' comments: Referee #1 (Remarks to the Author): The authors have considered the previous comments carefully, and have addressed the comments. My biggest concern is that many of these comments are not included in the main manuscript, but instead presented as tables or figures in the extensive supplementary section. This concerns me because these are real issues that you have tried to address and by not acknowledging them in the paper even in a few sentences, you will have others question this data as well, or misinterpret the findings. In addition, you are still holding back some of the key points that global investigators will find interesting--the fine mapping, credible sets, and conditional analyses.
1) The description of 100,000 genomes is now in the manuscript, but please include comments on this population and how this will or will not affect your results. Using rare cancer patients is not representative of the general population. Even if you summarize some of the disease breakdown. Cancer patients can also be considered immunocompromised if on treatment...but at the least they were cautious on exposures during the pandemic. The additional supplementary table 11 data can be included in a few lines of text.
2) The interesting findings seem to be in the supplementary section! The credible set information and the fine mapping really should be incorporated into the main text. For example, the chromosome 3 finding--suggesting two hits under your peak is important! You have the best data to get at what might be happening under these peaks (supplementary page 30) --and you have some diversity in your population to fine map it. Please make this more prominent in the main manuscript. Similarly, the additional conditional analyses should be addressed in the manuscript even if the figures remain in the supplementary.
Tt seems the authors are perhaps focused on rare variants not underlying their associations (a great point) but then not addressing how the rare variants (and LD blocks/recombination) actually help to narrow down the existing published work.
3) Age and ancestry may be confounded. In the age section please address the plots that show in some populations the age of the controls is very different than the age of the cases. The focus in the subsequent plots is on Europeans and age, but the other ancestry populations are not balanced and that may lead to some spurious results. See Supp Fig 10   4) In the individual conditional analysis, what is used in the conditional analysis? The effects seem to get stronger (more significant p-value) --can you include the region and or rsid that you are conditioning on? 5) In the comparison to 2020 paper--two regions are less significant. Have those been replicated in other studies? Or are you indicating that they were initially false positives/spurious associations? Supplementary Table 7 Referee #2 (Remarks to the Author): The authors have carefully considered all the comments and concerns raised during the first round of reviews. They performed extensive additional analyses and provided detailed explanations that result in a muchimproved manuscript. Thank you for the authors' fast and in-depth responses. I still have a few remaining comments: 1. <b>Number of independent risk loci</b>. It is typical in GWAS to define a risk locus as a genomic region within ± 500kb -1Mb from the lead variant. I still think the authors should use this convention in presenting their number of independent risk loci. Even if after conditional analysis there are multiple risk variants, it could be just evidence for multiple causal variants in the SAME risk locus.
2. Figure 1 and Table 1 should present <i>HLA-DRB1</i> as the lead variant, and not just HLA (which is not a single gene).
3. <b>Colonization in the HLA region</b>. Interestingly there is no coloc signal for the HLA region (Figure 2). But a search on the OpenTargets website (https://genetics.opentargets.org/variant/6_32623820_T_C) shows the lead variant (rs9271609) is in an eQTL with HLA-DRB1 in multiple tissues. Please can the authors speculate for the reason why it is the case? Could it be that the default parameters of coloc are not ideal for this gene/variants-dense region? What is a simple Pearson correlation between GWAS versus eQTL p-value in this region?
My biggest concern is that many of these comments are not included in the main manuscript, but instead presented as tables or figures in the extensive supplementary section. This concerns me because these are real issues that you have tried to address and by not acknowledging them in the paper even in a few sentences, you will have others question this data as well, or misinterpret the findings. In addition, you are still holding back some of the key points that global investigators will find interesting-the fine mapping, credible sets, and conditional analyses.
We thank their reviewer for their comment. We have moved some suggested items from the supplementary material to main text and expanded relevant sections as detailed below.
1) The description of 100,000 genomes is now in the manuscript, but please include comments on this population and how this will or will not affect your results. Using rare cancer patients is not representative of the general population. Even if you summarize some of the disease breakdown. Cancer patients can also be considered immunocompromised if on treatment...but at the least they were cautious on exposures during the pandemic. The additional supplementary table 11 data can be included in a few lines of text.
We have added a new section (highlighted in red) to the results in the main text to present more clearly the caveats of using the 100k as controls and present more clearly the two sensitivity analyses additional matched study and the comparison with mild.
1 Author Rebuttals to First Revision: 2) The interesting findings seem to be in the supplementary section! The credible set information and the fine mapping really should be incorporated into the main text. For example, the chromosome 3 finding-suggesting two hits under your peak is important! You have the best data to get at what might be happening under these peaks (supplementary page 30) -and you have some diversity in your population to fine map it. Please make this more prominent in the main manuscript.
We thank the reviewer for their comment and we agree that our fine-mapping approach should be more prominent in the main text, as it represents a strength of this work owing to the availability of WGS data. To this end, we have moved the supplementary table giving fine-mapping details to the main text as Extended Data Table 2. In addition, we have now expanded the results section with the addition of a new section (highlighted in red) on fine-mapping in which we explicitly draw attention to the fine-mapping of the association signal in the 3p21 region.
Similarly, the additional conditional analyses should be addressed in the manuscript even if the figures remain in the supplementary.
We have added an explicit mention in the results, as per the reviewer's suggestion, highlighted in red in the resubmitted manuscript.
Tt seems the authors are perhaps focused on rare variants not underlying their associations (a great point) but then not addressing how the rare variants (and LD blocks/recombination) actually help to narrow down the existing published work.
We thank the reviewer for their comment. We have now expanded the fine-mapping section of the results, following the reviewer's suggestion, to explicitly mention the narrowing down of potential causal loci for many signals and refer to the new Extended Data Table 2 where 8 independent missense or worse variants are shown.
3) Age and ancestry may be confounded. In the age section please address the plots that show in some populations the age of the controls is very different than the age of the cases. The focus in the subsequent plots is on Europeans and age, but the other ancestry populations are not balanced and that may lead to some spurious results. See Supp Fig 10 We thank the reviewer for their comment and we have performed additional analyses to address it. We focused our matched study on the EUR population as the majority of our findings (22 out of 25 associations) were found in this population. However, to additionally check any biases for the three signals (chr1:155175305:G:A,chr2:60480453:A:G,chr6:41515007:A:C) that were discovered through multi-ancestry meta-analysis, we performed matching for the other three ancestries (SAS, AFR, EAS) as well (new breakdown Supplementary Figure 19), and meta-analysed the summaries. The results were almost identical to the unmatched primary study (Supplementary Figure 22) and are also pasted below for convenience.
Response to Review Figure 1: Comparison of effect sizes between unmatched and matched control results for the three loci that were found significant in the multi-ancestry meta-analysis. For the matched study, controls were matched to cases by propensity score matching to cases for which we had BMI information (using age, sex and bmi as matching covariates and performed separately for each ancestry).

4) In the individual conditional analysis
, what is used in the conditional analysis? The effects seem to get stronger (more significant p-value) -can you include the region and or rsid that you are conditioning on?
We have added a column to Supplementary table 6 with the rsids of the variants that we conditioned on.

5)
In the comparison to 2020 paper-two regions are less significant. Have those been replicated in other studies? Or are you indicating that they were initially false positives/spurious associations? Supplementary Table 7 Apart from the OAS1 locus, which could theoretically indicate a change in the virus, the only significant SNPs from the Pairo-Castineira et al.(2020) paper that have declined in significance are in the HLA region. These SNPs were not replicated in Pairo-Castineira et al. using HGI and 23andme data and we did not report them as confirmed findings. In subsequent HGI freezes these SNPs were shown to have high heterogeneity between studies indicating that they may be false positives or specific to some patient populations.

Referee #2 (Remarks to the Author)
The authors have carefully considered all the comments and concerns raised during the first round of reviews. They performed extensive additional analyses and provided detailed explanations that result in a much-improved manuscript.
Here are my (minor) remaining points: -This study identifies significant associations with severe Covid for a total of 5 relatively common variants with clear roles in type I interferon signalling. Even if the gene-based association tests performed here don't replicate the previous findings of Zhang et al. (Science 2020) regarding very rare deleterious variants (with MAF <0.1%) in genes of the same pathways, I think it would be adding a line in the discussion to comment on the convergence of the findings.
We have added to the discussion of this interesting convergence. (Highlighted in red in submitted manuscript.) Note also that, since our last submission, we have successfully replicated the association in IFNA10 in an external cohort, and this is added to the main results table.
-On a related note, I couldn't find the supplementary File AVTsuppinfo.xlsx, which is supposed to show the SKAT-O p-values for the 13 interferon-related genes tested by Zhang et al. Also, a precise description of TLR7 variation is missing. Considering that TLR associations have now been replicated by several groups, it would be worth getting a more precise account of the variants that were observed in this cohort. Is there absolutely no enrichment of deleterious TLR7 variants?
We apologise to have omitted the supplementary info files in our previous revised submission. We have now added those in our latest revised manuscript.
Regarding TLR7, in our largest EUR cohort, we assessed 3 predicted Loss of Function (pLoF) and 36 missense variants. Although all 3 pLoF were found in the cases, this data did not generate a significant P -value (P =0.30). For missense mutations, we also observed a small enrichment in cases (39 missense in cases versus 28 missense in controls) but was not significant (P =0.075). We added these P-values for TLR7 tests to the burden results section explicitly.
-Still about TLR7: the reference [11] for the following statement (lines 133-134) is wrong: "We also did not replicate the reported association for the toll-like receptor 7 (TLR7) gene." The 3 best references for TLR7 associations are the following: -van der Made et al., JAMA. 2020 Aug 18;324 (7) We thank the reviewer for noting these omissions and we have amended the references for reported association for TLR7 as suggested.