Identifying the host genetic factors underlying severe COVID-19 is an emerging challenge1,2,3,4,5. Here we conducted a genome-wide association study (GWAS) involving 2,393 cases of COVID-19 in a cohort of Japanese individuals collected during the initial waves of the pandemic, with 3,289 unaffected controls. We identified a variant on chromosome 5 at 5q35 (rs60200309-A), close to the dedicator of cytokinesis 2 gene (DOCK2), which was associated with severe COVID-19 in patients less than 65 years of age. This risk allele was prevalent in East Asian individuals but rare in Europeans, highlighting the value of genome-wide association studies in non-European populations. RNA-sequencing analysis of 473 bulk peripheral blood samples identified decreased expression of DOCK2 associated with the risk allele in these younger patients. DOCK2 expression was suppressed in patients with severe cases of COVID-19. Single-cell RNA-sequencing analysis (n = 61 individuals) identified cell-type-specific downregulation of DOCK2 and a COVID-19-specific decreasing effect of the risk allele on DOCK2 expression in non-classical monocytes. Immunohistochemistry of lung specimens from patients with severe COVID-19 pneumonia showed suppressed DOCK2 expression. Moreover, inhibition of DOCK2 function with CPYPP increased the severity of pneumonia in a Syrian hamster model of SARS-CoV-2 infection, characterized by weight loss, lung oedema, enhanced viral loads, impaired macrophage recruitment and dysregulated type I interferon responses. We conclude that DOCK2 has an important role in the host immune response to SARS-CoV-2 infection and the development of severe COVID-19, and could be further explored as a potential biomarker and/or therapeutic target.
COVID-19, caused by SARS-CoV-2, remains a serious global public health issue6. Although promising vaccines have recently become available, the emergence of SARS-CoV-2 variants may delay the end of this pandemic7. COVID-19 manifests as a range of clinical presentations, from asymptomatic infection to fatal respiratory or multi-organ failure, with multiple risk factors8,9.
The human genetic background influences the susceptibility to and/or the severity of infectious diseases. The Severe Covid-19 Genome-Wide Association Study (GWAS) Group reported a variant at the LZTFL1 locus on 3p21 that was associated with an increased risk of severe COVID-19 in a European population1. Of note, this variant showed a globally heterogeneous allele frequency spectrum and was rarely present in East Asian people2.
Further GWAS efforts, including the COVID-19 Host Genetics Initiative (HGI), have nominated host susceptibility genes3,4,5. However, the vast majority of existing studies have been carried out on European populations. Considering the global diversity of COVID-19 severity, COVID-19 host genetic analysis in non-European people should provide novel insights.
The Japan COVID-19 Task Force (JCTF) was established in early 2020 as a nationwide multicentre consortium to overcome the COVID-19 pandemic (Extended Data Fig. 1 and Supplementary Table 1). Here we report the result of a large-scale GWAS of COVID-19 in Japanese individuals with systematic comparisons to results from Europeans, which identified a population-specific risk allele at the DOCK2 region that confers a risk of severe COVID-19, particularly in individuals below 65 years of age (hereafter referred to as ‘young’). We further conducted bulk and single-cell transcriptomics, and immunohistochemical assays of the patients as well as in vivo perturbation of DOCK2 function in an animal model. We found that DOCK2 suppression is associated with the development of severe COVID-19 in a Syrian hamster model of SARS-CoV-2 infection, and that DOCK2-mediated signalling has a key role in the host immune response to SARS-CoV-2 infection.
Overview of the study participants
We enrolled 2,393 unrelated patients with COVID-19 who required hospitalization between April 2020 and January 2021 (during the first, second and third waves of the pandemic in Japan) to the GWAS, from more than 100 hospitals participating in the JCTF. The COVID-19 diagnoses were confirmed by physicians at each affiliated hospital on the basis of clinical manifestations and a positive PCR test result. As controls, we enrolled 3,289 unrelated subjects ahead of the COVID-19 pandemic, representative of the general Japanese population. All of the participants were confirmed to be of East Asian origin by principal component analysis (Extended Data Fig. 2a,b).
Of the 2,393 patients with COVID-19, 990 had severe infection as defined by the need for oxygen support, artificial respiration and/or intensive care, whereas 1,391 patients had non-severe disease. Severity information was not available for the remaining 12 individuals. As reported previously8,10, those with severe COVID-19 were older (65.3 ± 13.9 years (mean ± s.d.)) and included a higher proportion of males (73.9%) compared with non-severe cases (49.3 ± 19.2 years of age and 57.2% male).
To replicate these results, we enrolled 1,243 further patients with severe COVID-19 collected between February 2021 and September 2021 (the fourth and fifth waves of the pandemic in Japan) and 3,769 controls. Detailed characteristics of the participants are provided in Supplementary Table 2.
COVID-19 GWAS in the Japanese population
The GWAS including all COVID-19 cases yielded no signals satisfying a genome-wide significance threshold (P < 5.0 × 10−8; Extended Data Fig. 2c). Cross-population comparisons confirmed the risks at multiple COVID-19-associated variants identified in the previous studies1,3,5. Seven out of the eleven reported positive associations were replicated in our Japanese cohort with P < 0.05, including those at LZTFL1, FOXP4, TMEM65, ABO, TAC4, DPP9 and IFNAR2 (Fig. 1a and Supplementary Table 3), where the highest odds ratios were observed in comparisons for severe and young (less than 65 years of age) COVID-19 cases in 6 out of the 7 loci. The most significant replication was observed at FOXP4, as expected from its higher allele frequency in East Asian people than in Europeans3 (odds ratio = 1.29, 95% confidence interval 1.13–1.46, P = 9.1 × 10−5 for severe COVID-19). By contrast, the risk allele at LZTFL1 (rs35081325), which showed the strongest association in Europeans, was rare in Japanese patients. Despite its low frequency (0.0013 in controls), we nominally replicated the association with the highest risk in the young patients with severe COVID-19 (odds ratio = 11.8, 95% confidence interval = 1.64–85.5, P = 0.014).
We evaluated the effects of human leukocyte antigen (HLA) variants on COVID-19 risk11,12 by in silico HLA imputation analysis13,14. We did not observe association signals satisfying the HLA-wide significance threshold (P < 0.05 over 2,482 variants, 2.0 × 10−5; Extended Data Fig. 3 and Supplementary Table 4). Among the four major ABO blood types15, the O blood type was associated with a protective effect (P < 0.05), most evidently in young patients with severe COVID-191 (odds ratio = 0.73, 95% confidence interval 0.56–0.93, P = 0.014; Extended Data Fig. 4a and Supplementary Table 5). We found an increased risk associated with the AB blood type, especially in severe cases of COVID-19 (odds ratio = 1.41, 95% confidence interval 1.10–1.81, P = 0.0065 for all ages). The Japanese population has the highest frequency of the AB blood type16 (9.5% in our study), which may have provided the power to detect its risk.
Cross-population Mendelian randomization
Next, to identify medical conditions that may affect COVID-19 susceptibility, we applied cross-population two-sample Mendelian randomization analysis17 (Supplementary Table 6). We inferred a causal role for obesity in severe COVID-19 in the Japanese cohort (P < 0.0074; Extended Data Fig. 4b and Supplementary Table 7). We also inferred causal roles for asthma, uric acids and gout, whereas systemic lupus erythematosus showed a protective effect (P < 0.05). Hyperuricemia is a risk factor for severe COVID-19 in the Japanese population10, consistent with our findings from Mendelian randomization. In Europeans, we observed significant causal inferences for obesity18 (P < 6.2 × 10−6), with doubled effect sizes in hospitalized patients and those with severe COVID-19 when compared with self-reported COVID-19. Our analysis provided additional evidence of obesity as a risk factor8,9.
A population-specific risk allele on DOCK2
Given the observation that many COVID-19 risk variants confer larger effects in severe disease and young patients1,3,5,19, we stratified the subjects according to age and disease severity, analysing those with severe COVID-19 (n = 990), young patients9 (n = 1,484) and young patients with severe COVID-19 (n = 440).
By comparing young patients with severe COVID-19 and controls, we identified a genetic locus on 5q35 that satisfied genome-wide significance (P = 1.2 × 10−8 at rs60200309; Fig. 1b). The A allele of the lead SNP (rs60200309), located at an intergenic region downstream of DOCK2, was associated with an increased risk of severe COVID-19 (odds ratio = 2.01, 95% confidence interval 1.58–2.55, P = 1.2 × 10−8; Fig. 1c and Table 1). The rs60200309-A allele was also associated with an increased risk of COVID-19 in other comparisons, including all COVID-19 cases and controls (odds ratio = 1.24; Supplementary Table 8), and within-case severity analysis (that is, severe versus non-severe cases; odds ratio = 1.27 for all ages and odds ratio = 1.90 for ages < 65 years).
We then conducted a replication study using an additional 1,243 patients with severe COVID-19, recruited during the fourth and fifth waves of the pandemic, as well as 3,769 controls. We replicated an age-specific nominal risk in the young patients with COVID-19 (n = 833; odds ratio = 1.28, 95% confidence interval 1.02–1.61, P = 0.033; Table 1) compared with all ages (odds ratio = 1.00, 95% confidence interval 0.85–1.19, P = 0.96), whereas the effect size was smaller than that observed in the GWAS during the first three pandemic waves. A decreased severity risk was observed for other risk loci in this later study (for example, odds ratios of 11.8 during the first three waves and 4.4 during the fourth and fifth waves at LZTFL1; regression coefficient = 0.57; Extended Data Fig. 5). This suggests that longitudinal shifts of confounding factors with the pandemic waves—such as the introduction of therapeutic strategies, a high prevalence of vaccination, changes in hospitalization policy and the evolution of virus strains—may have mitigated the host genetic burdens defined during the initial pandemic waves; further evaluations of this effect may be warranted.
We also examined the COVID-19 risk profile of the DOCK2 variant on different ancestral backgrounds20,21 (3,138 hospitalized patients with COVID-19 versus 891,375 controls from the pan-ancestry meta-analysis). We observed the same directional effect, with a marginal association signal (odds ratio = 1.73, 95% confidence interval 0.95–3.15, P = 0.072, control minor allele frequency (MAF) = 0.0008; Supplementary Table 9).
The DOCK2 variant was prevalent in East Asian people (0.097)—with the highest frequency (0.125) in Japanese individuals—and, to a lesser extent, in Native Americans (0.049), but was very rare in other groups (<0.005; Fig. 1d). Natural selection screening in Japanese participants22 suggested marginal positive selection of the variant (P for singleton density score = 0.051). Population-specific features of the DOCK2 variant provide a rationale for COVID-19 host genetic research in non-European populations.
DOCK2 downregulation in severe COVID-19
To functionally annotate the DOCK2 risk variant, we examined the expression quantitative trait loci (eQTL) effect by conducting peripheral blood RNA-sequencing (RNA-seq) analysis of data from patients with COVID-19 collected by the JCTF (n = 473). The risk allele at DOCK2 (rs60200309-A) was not associated with a significant eQTL effect for all patients (β = −1.07, P = 0.083; Fig. 2a), but was associated with decreased expression of DOCK2 in the patients below 65 years of age (n = 270; β = −2.15, P = 0.0030). This allele did not exhibit a significant eQTL effect on other surrounding genes (±500 kb window, P > 0.070). We observed colocalization between the GWAS and the DOCK2 eQTL signals23 (colocalization posterior probability > 0.01; Extended Data Fig. 6 and Supplementary Table 10).
We analysed differential expression of DOCK2 in patients with severe and non-severe COVID-19 (n = 468) using real-time quantitative PCR (qPCR). DOCK2 expression was reduced in the patients with severe COVID-19 (P = 0.011; Fig. 2b). Suppression of DOCK2 was more marked in young patients (P = 0.0068). When the patients were further stratified into asymptomatic, mild, severe and most severe cases, we observed a negative correlation between DOCK2 expression level and disease severity (Fig. 2c). Together, these results indicate that DOCK2 expression is downregulated in peripheral blood cells of patients with severe COVID-19, especially in young patients, and that the risk variant may contribute to severe COVID-19 by suppressing expression of DOCK2.
DOCK2 is a RAC activator that is involved in chemokine signalling, production of type I interferon (IFN) and lymphocyte migration24,25. Elucidation of immune cell-type-specific expression profiles was necessary to disentangle the roles of DOCK2 in the biology of COVID-19. We therefore conducted single-cell RNA-seq (scRNA-seq) of peripheral blood mononuclear cells (PBMC) obtained from 30 patients with severe COVID-19 and 31 healthy controls. We obtained 394,526 high-quality single cells and annotated 12 clusters (Fig. 2d and Extended Data Fig. 7). DOCK2 expression was highest in CD16+ monocytes (Fig. 2e). The proportion of cells expressing DOCK2 was higher in innate immune cell clusters (monocytes and dendritic cells) (43.8%) than in other clusters (25.6%; Fig. 2f). Differential expression analysis also demonstrated suppression of DOCK2 expression in cases of severe COVID-19 in the immune cell clusters (fold change (FC) = 0.82, P = 8.3 × 10−4 for monocytes; FC = 0.87, P = 0.050 for dendritic cells; Fig. 2g).
To determine immune cell-type specificity, we performed clustering and annotation by extracting 63,544 cells belonging to the innate immune cell clusters (Fig. 2h and Extended Data Fig. 7). Among the classified cell types—classical (CD14++CD16–), intermediate (CD14++CD16+) and non-classical (CD14+CD16++) monocytes, conventional dendritic cells and plasmacytoid dendritic cells (pDCs)—DOCK2 expression was highest in the non-classical monocytes, which have been implicated in the pathophysiology of COVID-19 (refs. 26,27) (Fig. 2h–j). Differential expression analysis showed that DOCK2 was most potently downregulated in non-classical monocytes (FC = 0.61, P = 3.2 × 10−7; Fig. 2k). The DOCK2 co-expression gene module28 in the non-classical monocytes of the COVID-19 patients exhibited enrichment in pathways such as immune response signalling pathways and phagocytosis (Extended Data Fig. 7). To further support the functional consequences of the DOCK2 risk variant, we assessed its single-cell eQTL effects. We found a COVID-19 context-specific decreasing dosage effect of the risk variant on DOCK2 expression in non-classical monocytes (β = −0.21, P = 0.035 for COVID-19 and β = 0.02, P = 0.51 for controls; Fig. 2l).
Next, we evaluated the biological effects of DOCK2 downregulation. In assays with primary cells, DOCK2 inhibition by CPYPP, an inhibitor of the DOCK2–RAC1 interaction29, resulted in reduced production of IFNα by pDCs under CpG stimulation (FC = 5.5 × 10−5, P = 0.0038, n = 3 per group; Extended Data Fig. 8a). pDCs are another key innate immune cell type involved in COVID-19 pathogenicity30, and DOCK2 expression was downregulated in pDCs from patients with COVID-19 (FC = 0.79, P = 0.019; Fig. 2k). CPYPP blocked chemotaxis of CD3+ T cells under CXCL12 stimulation (FC = 0.57, P = 1.0 × 10−7, n = 19 per group; Extended Data Fig. 8b). The DOCK2 risk variant had no significant effect on IFNα production in pDCs or chemotaxis of CD3+ T cells in primary cell assays (Supplementary Fig. 1). In THP1 Blue ISG cells, DOCK2 knockdown caused a marked decrease in transcriptional activation of IFN-stimulated genes, an indicator of type I IFN activity (Extended Data Fig. 8c–f and Supplementary Fig. 2). These results highlight the immunological roles of DOCK2 in complications of COVID-19 such as type I IFN immunity and chemotaxis dysregulation, as exemplified by patients with congenital impairment in type I IFN immunity31.
To confirm the involvement of DOCK2 in COVID-19 pneumonia, we performed immunohistochemical analysis on postmortem samples from people who died from COVID-19 (Extended Data Fig. 9). We examined three cases of COVID-19 pneumonia and observed decreased expression of DOCK2 in lymphocytes and macrophages located in the lung and in hilar lymph nodes (Fig. 2m). There was no such decrease in two control samples without COVID-19 or pneumonia (Fig. 2n). DOCK2 has been reported to be suppressed in bronchoalveolar lavage fluid cells of patients with COVID-19 (ref. 32), consistent with our findings. We observed a loss of DOCK2 expression in lymphocytes in a case of non-COVID-19 severe pneumonia, whereas there was a slight decrease of DOCK2 expression in a sample from a case of non-COVID-19 mild pneumonia. Thus, DOCK2 expression is suppressed during severe pneumonia caused by COVID-19. These observations reveal a link between cell-type- and tissue-specific downregulation of DOCK2, indicating a potential value for DOCK2 as a biomarker of severe COVID-19.
DOCK2 inhibition in a Syrian hamster model
To decipher in vivo pathogenesis of DOCK2 in COVID-19, we investigated the effects of DOCK2 suppression following SARS-CoV-2 infection in a Syrian hamster model33,34 (Extended Data Fig. 10a). Administration of the DOCK2 inhibitor CPYPP or vehicle (as a negative control) to mock-infected animals did not induce weight loss (Extended Data Fig. 10b). However, hamsters infected with SARS-CoV-2 and treated with vehicle (n = 12) decreased to 83.3% of the starting body weight by 7 days post-infection (dpi), but recovered to 97.6% of the starting weight at 11 dpi. By contrast, hamsters infected with SARS-CoV-2 and treated with CPYPP (n = 13) decreased to 79.0% of the starting body weight by 7 dpi, and recovered to 85.4% of the initial weight at 11 dpi (Fig. 3a and Extended Data Fig. 10c). Advanced pulmonary oedema was observed in the lung of the hamsters infected with SARS-CoV-2 and treated with CPYPP at 11 dpi (Fig. 3b). The largest lung weight (Fig. 3c) and the highest histopathological scoring changes of lung34 (Fig. 3d and Extended Data Fig. 10d–f) were observed at 6 dpi. Lung immunohistochemistry showed that the migration of CD68 macrophages around alveolar cells was impaired in the hamsters infected with SARS-CoV-2 and treated with CPYPP (Fig. 3d and Extended Data Fig. 10e). Conversely, there was mild or no lung damage in infected hamsters treated with vehicle or uninfected hamsters treated with CPYPP (Fig. 3b–d and Extended Data Fig. 10d–f).
Focusing on the deteriorating stages of SARS-CoV-2-induced pneumonia (3 and 6 dpi), we assayed SARS-CoV-2 viral loads in various organs. We observed increased viral loads in nasal swab at 3 and 6 dpi, in lung at 3 dpi and in intestine at 6 dpi (P < 0.05; Fig. 3e) of the CPYPP-treated hamsters. Lung cytokine expression profile assays revealed that expression of type I IFN (encoded by Ifna and Ifnb) decreased at 6 dpi and expression of type II IFN (encoded by Ifng) increased at 3 dpi (Fig. 3f) following CPYPP administration. We also observed that CPYPP administration induced increased expression of inflammatory cytokine (Il6) and chemokine (Ccl5) genes at 3 dpi. The roles of the IFN response in the pathogenicity of COVID-19 have been controversial31,35,36. Our observational and interventional findings on DOCK2 downregulation show that in COVID-19 pneumonia pathophysiology, impaired macrophage recruitment at the site of infection and dysregulated IFN responses result in impaired virus elimination and prolonged lung inflammation.
Here we reported on a GWAS of COVID-19 in a Japanese cohort, one of the first large-scale COVID-19 genetic studies in a non-European population. We confirmed the presence of multiple genetic variants associated with COVID-19 risk shared across different populations, and identified a population-specific risk variant at DOCK2, particularly in young patients with severe COVID-19 collected during the early waves of the pandemic. Cross-population Mendelian randomization analysis disclosed causal effects of a number of complex human traits, such as obesity, on COVID-19. Our results highlight the role of population-specific risk alleles on different host genetic backgrounds, underscoring the need for studies of COVID-19 host genetics in non-European populations. Of note, autosomal recessive DOCK2 deficiency is a Mendelian disorder associated with combined immunodeficiency and severe invasive pneumonia37 (Online Mendelian Inheritance in Man (OMIM) entry 616433). Our results provide a genetic and clinical link between a Mendelian disorder and pneumonia associated with COVID-19. In the replication study using samples collected during later waves of the COVID-19 pandemic, we observed significant increases in the risk of severe COVID-19 associated with the risk variants identified in the studies based on the initial waves (including variants in DOCK2 and LZTFL1), but with smaller effect sizes. How the host genetics interact longitudinally with confounding factors and affect the spectrum of COVID-19 phenotypes through the pandemic waves remains unknown. Large-scale COVID-19 host genetics studies with diverse genetic backgrounds based on samples from different time points during the pandemic are required, and will contribute towards planning a global health strategy for the pandemic.
Our follow-up analyses of GWAS showed that DOCK2-mediated signalling has a key role in the response to SARS-CoV-2 infection, suggesting that the hypomorphic DOCK2 allele is involved in exacerbation of COVID-19 pathology, and that DOCK2 could serve as a potential clinical biomarker to predict severe COVID-19. Bulk and single-cell transcriptome analysis of peripheral blood cells identified cell-type-specific downregulation of DOCK2 modulated by a COVID-19-specific eQTL effect of the DOCK2 risk variant in patients with severe COVID-19, which was most evident in innate immune cells including non-classical monocytes and pDCs. Nevertheless, our evidence does not necessarily imply a direct causal link between the COVID-19-specific eQTL and COVID-19 severity. The risk variant could potentially induce DOCK2 downregulation in early phase of infection. Immunohistochemical analysis showed reduced DOCK2 expression in the lung of patients with COVID-19 pneumonia. In vivo inhibition of DOCK2 activity following SARS-CoV-2 infection using CPYPP in the Syrian hamster model resulted in severe COVID-19 pneumonia, highlighted by impaired migration of macrophages and dysregulation of the IFN response. We note the possibility that CPYPP is not specific to DOCK2 and also inhibits other DOCK family proteins. Assays with increased DOCK2 expression would provide further evidence of its role in COVID-19 pathophysiology. Given its critical roles in immune regulation25, upregulation of DOCK2 could be a potential therapeutic strategy against COVID-19. Our results motivate further studies linking DOCK2 to molecular and clinical phenotypes of COVID-19 in the effort to overcome the pandemic.
All the cases affected with COVID-19 were recruited through the JCTF. We enrolled hospitalized patients who were diagnosed with COVID-19 by physicians on the basis of clinical manifestations and PCR test results and who were recruited at any of the more than 100 affiliated hospitals between April 2020 and January 2021 (for the GWAS) or between February 2021 and September 2021 (for the replication; Supplementary Tables 1 and 2). Patients requiring oxygen support, artificial respiration and/or intensive care unit hospitalization were defined as having ‘severe COVID-19’, whereas others were defined as having ‘non-severe COVID-19’. Details of the clinical manifestations, including cardiovascular and respiratory comorbidities, are provided in Supplementary Table 2. The threshold of 65 years of age was selected according to the clinical management guide in Japan9. Control subjects were collected from the general Japanese population at Osaka University and affiliated institutes (for the GWAS and replication) or by the Biobank Japan Project38 (for the replication). Individuals determined to be of non-Japanese origin either by self-reporting or by principal component analysis were excluded as described elsewhere39 (Extended Data Fig. 2a). All the participants provided written informed consent as approved by the ethical committees of the affiliated institutes (Keio IRB approval 20200061, Osaka University IRB approval 734-14, University of Tsukuba IRB approval H29-294).
GWAS genotyping and QC
We performed GWAS genotyping of the 2,520 COVID-19 cases and 3,341 controls using the Infinium Asian Screening Array (Illumina). We applied stringent quality control (QC) filters, excluding samples that met any of the following criteria (sample call rate < 0.97, excess heterozygosity of genotypes > mean + 3 × s.d., related samples with PI_HAT > 0.175, or outlier samples from East Asian clusters in principal component analysis with 1000 Genomes Project samples) and variants that met any of the following criteria (variant call rate < 0.99, significant call rate differences between cases and controls with P < 5.0 × 10−8, deviation from Hardy–Weinberg equilibrium with P < 1.0 × 10−6, or minor allele count < 5). Details of the QC for the mitochondrial variants are described elsewhere40. After QC, we obtained genotype data of 489,539, 15,161 and 217 autosomal, X-chromosomal and mitochondrial variants, respectively, for 2,393 COVID-19 cases and 3,289 controls.
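The variant-level exclusion criteria above can be read as a simple predicate over per-variant summary statistics. The following is a minimal sketch of that logic only; the `VariantStats` container and its field names are hypothetical and are not taken from the study's pipeline, which is not described at this level of detail.

```python
from dataclasses import dataclass


@dataclass
class VariantStats:
    """Hypothetical per-variant summary statistics (names are illustrative)."""
    call_rate: float            # fraction of samples with a genotype call
    p_call_rate_diff: float     # P value for case vs control call-rate difference
    p_hwe: float                # P value for deviation from Hardy-Weinberg equilibrium
    minor_allele_count: int     # number of minor alleles observed


def passes_variant_qc(v: VariantStats) -> bool:
    """Return True if the variant survives all four exclusion criteria in the text."""
    if v.call_rate < 0.99:
        return False
    if v.p_call_rate_diff < 5.0e-8:
        return False
    if v.p_hwe < 1.0e-6:
        return False
    if v.minor_allele_count < 5:
        return False
    return True


# Toy examples (made-up numbers, not study data).
good = VariantStats(call_rate=0.995, p_call_rate_diff=0.5, p_hwe=0.2, minor_allele_count=40)
low_call = VariantStats(call_rate=0.98, p_call_rate_diff=0.5, p_hwe=0.2, minor_allele_count=40)
rare = VariantStats(call_rate=0.999, p_call_rate_diff=0.5, p_hwe=0.2, minor_allele_count=3)
```

In practice such filters are usually applied with dedicated genotype QC tools rather than hand-written code; the sketch is meant only to make the thresholds explicit.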
Genome-wide genotype imputation
We used SHAPEIT4 software (version 4.1.2) for haplotype phasing of autosomal genotype data, and SHAPEIT2 software (v2.r904) for X-chromosomal genotype data. After phasing, we used Minimac4 software (version 1.0.1) for genome-wide genotype imputation. We used the population-specific imputation reference panel of Japanese individuals (n = 1,037) combined with 1000 Genomes Project Phase3v5 samples22 (n = 2,504). Imputations of the mitochondrial variants were conducted as described elsewhere40, using the population-specific reference panel (n = 1,037). We applied post-imputation QC filters of MAF ≥ 0.1% and imputation score (Rsq) > 0.5, and obtained 13,116,003, 368,566 and 554 variants for autosomal, X-chromosomal, and mitochondrial variants, respectively. We note that the genotypes of the lead variant in the GWAS (rs60200309) were obtained by imputation (Rsq = 0.88). We assessed the imputation accuracy by comparing the imputed dosages with whole-genome sequencing (WGS) data for a subset of the controls (n = 236), and confirmed a high concordance rate of 97.5%.
Case–control association test
We conducted GWAS of COVID-19 by using logistic regression of the imputed dosages of each of the variants on case–control status, using PLINK2 software (v2.00a3LM AVX2 Intel (6 July 2020)). We included sex, age, and the top five principal components as covariates in the regression model. We set the genome-wide association significance threshold of P < 5.0 × 10−8.
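For readers relating logistic-regression output to the odds ratios reported in the Results, the following sketch shows the usual conversions between a regression coefficient with its standard error and an odds ratio with a 95% confidence interval and two-sided Wald P value. It is an illustration under a normal approximation, not the PLINK2 implementation; the function names are hypothetical. As a rough consistency check, it back-calculates the coefficient from the reported lead-variant estimate (odds ratio = 2.01, 95% confidence interval 1.58–2.55), which yields a P value on the order of 10−8, in line with the reported value given rounding of the interval.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def or_ci_p(beta: float, se: float):
    """Convert a log-odds coefficient and its SE to (OR, CI low, CI high, Wald P)."""
    odds_ratio = math.exp(beta)
    ci_low = math.exp(beta - Z_95 * se)
    ci_high = math.exp(beta + Z_95 * se)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return odds_ratio, ci_low, ci_high, p_value


def beta_se_from_or_ci(odds_ratio: float, ci_low: float, ci_high: float):
    """Approximate the coefficient and SE from a reported OR and 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z_95)
    return beta, se


# Reported estimate for rs60200309-A in young patients with severe COVID-19.
beta, se = beta_se_from_or_ci(2.01, 1.58, 2.55)
odds_ratio, ci_low, ci_high, p_value = or_ci_p(beta, se)
print(f"beta={beta:.3f} se={se:.3f} OR={odds_ratio:.2f} P={p_value:.1e}")
```

Because the reported interval is rounded to two decimals, the back-calculated standard error and P value are approximate.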
HLA genotype imputation and association test
HLA genotype imputation was performed using DEEP*HLA software (version 1.0), a multitask convolutional deep learning method14. We used the population-specific imputation reference panel of Japanese donors (n = 1,118), which included both classical and non-classical HLA gene variants for imputation13. Before imputation, we removed the overlapping samples between the GWAS controls and the reference panel (n = 649), from the GWAS data side. We imputed HLA alleles (two and four digit) and the corresponding HLA amino acid polymorphisms, and applied post-imputation QC filters of MAF ≥ 0.5% and imputation score (r2 in cross-validation) > 0.7.
As for the imputed HLA variants, we conducted (1) an association test of binary HLA markers (two- and four-digit HLA alleles) and (2) an omnibus test of each of the HLA amino acid positions, as described elsewhere13. The binary marker test was conducted using the same logistic regression model and covariates as in the GWAS. The omnibus test was conducted as a log likelihood ratio test between the null model and the fitted model, with the test statistic evaluated against a χ2 distribution with m − 1 degrees of freedom, where m is the number of residues. R statistical software (version 3.6.0) was used for the HLA association test. We set the HLA-wide significance threshold based on Bonferroni’s correction for the number of HLA tests (α = 0.05).
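The Bonferroni threshold follows directly from dividing α by the number of tests. As a small worked example, using the 2,482 HLA variants stated in the Results, the sketch below reproduces the HLA-wide threshold of approximately 2.0 × 10−5; the function names are illustrative only.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni's correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def is_hla_wide_significant(p_value: float, alpha: float = 0.05, n_tests: int = 2482) -> bool:
    """Check a P value against the HLA-wide threshold (defaults from the text)."""
    return p_value < bonferroni_threshold(alpha, n_tests)


hla_threshold = bonferroni_threshold(0.05, 2482)
print(f"HLA-wide threshold: {hla_threshold:.2e}")
```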
Estimation of the ABO blood types and analysis
We estimated the ABO blood types of the GWAS subjects based on the five coding variants at the ABO gene (rs8176747, rs8176746, rs8176743, rs7853989 and rs8176719)41. We phased the haplotypes of these five variants based on the best-guess genotypes obtained by genome-wide imputation, and estimated the ABO blood type as described elsewhere15. We were able to unambiguously determine the ABO blood type of 99.1% of the subjects.
Blood-group-specific odds ratios were estimated based on comparisons of A versus AB/B/O, B versus A/AB/O, AB versus A/B/O and O versus A/AB/B. We conducted a logistic regression analysis including age, sex and the top five principal components as covariates. R statistical software (version 3.6.3) was used for the ABO blood type analysis.
Cross-population Mendelian randomization analysis
We conducted two-sample Mendelian randomization analysis as described elsewhere17,42. As exposures, we selected a series of clinical states for which altered comorbidity with COVID-19 has been discussed. As outcome phenotypes, we used the GWAS summary statistics of Japanese (current study) and European (release 5 from COVID-19 HGI3) participants. Lists of the Japanese and European GWAS studies used as the exposure phenotypes are in Supplementary Table 6. We extracted the independent lead variants with genome-wide significance (or the proxy variants in linkage disequilibrium r2 ≥ 0.8 in the EAS or EUR subjects of the 1000 Genomes Project Phase3v5 databases) from the GWAS results of the exposure phenotypes. We applied the inverse variance weighted method using the TwoSampleMR package (version 0.5.5) in R statistical software (version 4.0.2).
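To make the inverse variance weighted (IVW) estimator concrete, the sketch below implements its standard fixed-effect form from per-variant summary statistics: each variant contributes a ratio of outcome to exposure effect, weighted by the inverse variance of the outcome effect. This is an illustration in Python rather than the TwoSampleMR code used in the study, and the package's own implementation may treat heterogeneity between variants differently when computing standard errors. All input numbers are toy values.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate from per-variant summary statistics.

    Equivalent to a weighted regression of outcome effects on exposure
    effects through the origin, with weights 1 / se_outcome**2.
    Returns (estimate, standard error, two-sided P value).
    """
    numerator = 0.0
    denominator = 0.0
    for bx, by, sy in zip(beta_exposure, beta_outcome, se_outcome):
        weight = 1.0 / (sy * sy)
        numerator += weight * bx * by
        denominator += weight * bx * bx
    estimate = numerator / denominator
    std_error = math.sqrt(1.0 / denominator)
    z = estimate / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return estimate, std_error, p_value


# Toy instruments: outcome effects are exactly half the exposure effects.
bx = [0.10, 0.20, 0.30]
by = [0.05, 0.10, 0.15]
sy = [0.010, 0.020, 0.015]
estimate, std_error, p_value = ivw_estimate(bx, by, sy)
```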
We genotyped an additional 1,243 severe COVID-19 cases and 3,769 controls using the Infinium Asian Screening Array (Illumina). We applied the QC filters and genotype imputation, and conducted case–control analysis of the variant in the same manner as for the GWAS.
RNA-seq of peripheral blood of patients with COVID-19
For the bulk RNA-seq analysis, we included 475 patients with COVID-19 who were recruited at the core medical institutes of the JCTF and were also included in the GWAS (Supplementary Table 2). Isolation of RNA from the peripheral blood of the COVID-19 patients was conducted using the RNeasy Mini Kit (Qiagen). Libraries for RNA-seq were prepared using the NEBNext Poly(A) mRNA Magnetic Isolation Module and the NEBNext Ultra Directional RNA Library Prep Kit for Illumina (New England BioLabs). RNA-seq was performed using the NovaSeq6000 platform (Illumina) with paired-end reads (read length of 100 bp), using the S4 Reagent kit (200 cycles). We obtained on average 71,724,142 ± 17,527,007 reads per sample (mean ± s.d.). Sequencing reads were quality-filtered, and adapter removal was performed using Trimmomatic (v0.39)43. Alignment to the human reference genome GRCh38/hg38 was performed using STAR (v2.7.9a)44, based on the GENCODE v30 annotation. Gene-level quantification and normalization were performed using RSEM (v1.3.3)45. Transcripts per million (TPM) was used as the measure of gene expression. We excluded two outlier samples identified in the principal component analysis plot of the TPM values (n = 473 for the analysis). We quantified 58,825 genes, and adopted the 5,991 genes with median TPM > 10 for the subsequent analysis.
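As an illustration of the expression measure and the gene filter, the sketch below computes TPM from read counts and gene lengths using the textbook definition, then keeps genes whose median TPM across samples exceeds 10. It is a simplification: RSEM estimates expected counts and uses effective transcript lengths, which this sketch does not reproduce. Gene names and numbers are made up.

```python
from statistics import median


def tpm(counts, lengths_bp):
    """Transcripts per million for one sample.

    counts: read counts per gene; lengths_bp: gene lengths in base pairs.
    Each count is length-normalized, then scaled so the values sum to 1e6.
    """
    rates = [c / l for c, l in zip(counts, lengths_bp)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


def genes_passing_median_tpm(tpm_by_gene, threshold=10.0):
    """Return genes whose median TPM across samples is above the threshold."""
    return [gene for gene, values in tpm_by_gene.items() if median(values) > threshold]


# Toy single-sample example: three genes.
sample_tpm = tpm(counts=[100, 200, 300], lengths_bp=[1000, 1000, 2000])

# Toy multi-sample example: per-gene TPM values across three samples.
kept = genes_passing_median_tpm({
    "GENE_A": [12.0, 15.0, 9.0],   # median 12.0, kept
    "GENE_B": [2.0, 30.0, 5.0],    # median 5.0, dropped
})
```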
In the eQTL analysis of the DOCK2 variant, dosage effects of the risk variant (rs60200309-A) on the gene expression levels (TPM) were evaluated using linear regression models with age, sex, severity, the top ten principal components of the TPM matrix, and the top five principal components of the GWAS data as covariates. The dosage effects of the risk variant on the expression of nearby genes located within a 500-kb window were also evaluated. R statistical software (version 3.6.3) was used for the analysis. Colocalization analysis between the GWAS and the DOCK2 eQTL signals was conducted using eCAVIAR23.
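The dosage model can be illustrated with a small ordinary least squares fit. This is only a sketch, not the R code used in the study: the helper names are hypothetical, the toy data carry a single covariate (age) where the actual model used many more, and no standard errors or P values are computed.

```python
def ols(design, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(design[0])
    # Augmented matrix [X'X | X'y]
    aug = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           + [sum(row[i] * yi for row, yi in zip(design, y))]
           for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]

def dosage_effect(dosage, expression, covariates):
    """Coefficient of allele dosage in expression ~ 1 + dosage + covariates."""
    design = [[1.0, float(d)] + list(c) for d, c in zip(dosage, covariates)]
    return ols(design, expression)[1]

# Toy data: expression falls by 0.5 per risk allele and rises with age
dosage = [0, 1, 2, 0, 1, 2]
ages = [30.0, 45.0, 50.0, 62.0, 38.0, 55.0]
expression = [20.0 - 0.5 * d + 0.1 * a for d, a in zip(dosage, ages)]
beta = dosage_effect(dosage, expression, [(a,) for a in ages])
```

Because the toy expression values are generated without noise, the fitted dosage coefficient recovers the simulated effect of −0.5.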
qPCR-based differential expression analysis
Real-time qPCR was conducted for the RNA isolated from the peripheral blood of the COVID-19 patients (n = 468). Total RNA was reverse-transcribed using the High-Capacity RNA-to-cDNA Kit (Life Technologies). Real-time qPCR was performed using TaqMan assays on a 7500 Fast Real-Time PCR system (Applied Biosystems; probe assay ID: Hs00386045_m1 (DOCK2) and Hs99999905_m1 (GAPDH)). Differential expression analysis was conducted between severe and non-severe COVID-19, and across four COVID-19 disease severity grades, ordered from asymptomatic > mild > severe > most severe. Among the patients with severe COVID-19, those in intensive care or requiring intubation and ventilation were classed as having ‘most severe’ disease, and the rest were classed as having ‘severe’ disease. Among the patients with non-severe COVID-19, those without any symptoms related to COVID-19 were classed as having ‘asymptomatic’ disease, and the others were classed as having ‘mild’ disease. The analysis was performed on DOCK2 mRNA expression relative to GAPDH using linear regression models with age and sex as covariates in R statistical software (version 3.6.3).
Subjects and specimen collection of PBMC for scRNA-seq
Peripheral blood samples were obtained from patients with severe COVID-19 (n = 30) and healthy controls (n = 31) recruited at Osaka University Graduate School of Medicine. Of the 30 patients with COVID-19, 5 were classed as moderate and 25 were classed as severe according to disease severity based on the highest score on the World Health Organization (WHO) Ordinal Scale for Clinical Improvement. For patients with COVID-19 and healthy controls, blood was collected into heparin tubes and PBMCs were isolated using Leucosep (Greiner Bio-One) density gradient centrifugation according to the manufacturer’s instructions. Blood was processed within 3 h of collection for all samples, and stored at −80 °C until use.
Droplet-based single-cell sequencing
Single-cell suspensions were processed through the 10x Genomics Chromium Controller (10x Genomics) following the protocol outlined in the Chromium Single Cell V(D)J Reagent Kits (v1.1 Chemistry) User Guide. The Chromium Next GEM Single Cell 5′ Library & Gel Bead Kit v1.1 (PN-1000167), Chromium Next GEM Chip G Single Cell Kit (PN-1000127) and Single Index Kit T Set A (PN-1000213) were used during the process. Approximately 16,500 live cells per sample were separately loaded into each port of the Chromium controller without sample mixing to generate 10,000 single-cell gel-bead emulsions for library preparation and sequencing, according to the manufacturer’s recommendations. Oil droplets of encapsulated single cells and barcoded beads were subsequently reverse-transcribed in a Veriti Thermal Cycler (Thermo Fisher Scientific), resulting in cDNA tagged with a cell barcode and unique molecular index (UMI). Next, cDNA was amplified to generate single-cell libraries according to the manufacturer’s protocol. Quantification was performed with an Agilent Bioanalyzer High Sensitivity DNA assay (Agilent, High-Sensitivity DNA Kit, 5067-4626). Subsequently, amplified cDNA was enzymatically fragmented, end-repaired and polyA tagged. Cleanup and size selection were performed on amplified cDNA using SPRIselect magnetic beads (Beckman-Coulter, SPRIselect, B23317). Next, Illumina sequencing adapters were ligated to the size-selected fragments and cleaned up using SPRIselect magnetic beads. Finally, sample indices were selected and amplified, followed by a double-sided size selection using SPRIselect magnetic beads. Final library quality was assessed using an Agilent Bioanalyzer High Sensitivity DNA assay. Samples were then sequenced on a NovaSeq6000 (Illumina) in paired-end mode to achieve a minimum of 20,000 paired-end reads per cell for gene expression.
Alignment, quantification and QC of scRNA-seq data
Droplet libraries were processed using Cell Ranger 5.0.0 (10x Genomics). Sequencing reads were aligned with STAR (v2.7.2a)44 using the GRCh38 human reference genome. Count matrices were built from the resulting BAM files using dropEst46. Cells that had fewer than 1,000 UMIs or greater than 20,000 UMIs, as well as cells that contained greater than 10% of reads from mitochondrial or haemoglobin genes, were considered low quality and removed from further analysis. Additionally, putative doublets were removed using Scrublet (v0.2.1) for each sample47.
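The cell-level thresholds can be expressed as a simple filter. The sketch below is illustrative only: the function and argument names are hypothetical, and it assumes that the 10% limit applies separately to mitochondrial and haemoglobin reads, which the text does not state explicitly.

```python
def passes_qc(n_umis, pct_mito, pct_hb,
              min_umis=1000, max_umis=20000, max_pct=10.0):
    """Return True for cells kept after the UMI and read-fraction filters."""
    if n_umis < min_umis or n_umis > max_umis:
        return False
    # Assumption: each percentage is checked on its own against the limit
    if pct_mito > max_pct or pct_hb > max_pct:
        return False
    return True

# Hypothetical cells: (barcode, UMIs, % mitochondrial, % haemoglobin)
cells = [
    ("AAAC", 5200, 3.1, 0.2),    # kept
    ("AAAG", 800, 2.0, 0.1),     # too few UMIs
    ("AACT", 25000, 4.0, 0.0),   # too many UMIs
    ("AAGT", 4000, 12.5, 0.3),   # too many mitochondrial reads
]
kept = [barcode for barcode, umis, mito, hb in cells
        if passes_qc(umis, mito, hb)]
```

Doublet removal with Scrublet is a separate step and is not covered by this sketch.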
scRNA-seq computational pipelines and basic analysis
The R package Seurat (v3.2.2) was used for data scaling, transformation, clustering, dimensionality reduction, differential expression analysis and most visualization48. Data were scaled and transformed using the SCTransform() function, and linear regression was performed to remove unwanted variation due to cell quality (percentage of mitochondrial reads). For integration, we identified 3,000 shared highly variable genes (HVGs) using SelectIntegrationFeatures() function. Then, we identified ‘anchors’ between individual datasets based on these genes using the FindIntegrationAnchors() function and inputted these anchors into the IntegrateData() function to create a batch-corrected expression matrix of all cells. Principal component analysis and UMAP dimension reduction with 30 principal components were performed49. A nearest-neighbour graph using the 30 dimensions of the principal component analysis reduction was calculated using FindNeighbors() function, followed by clustering using FindClusters() function.
Cellular identity was determined by finding differentially expressed genes for each cluster using FindMarkers() function with parameter ‘test.use=wilcox’, and comparing those markers to known cell-type-specific genes (Extended Data Fig. 7a). We obtained 12 cell clusters, which were further confirmed using Azimuth (Fig. 2d and Extended Data Fig. 7a, c)50. Six major cell types were defined from the 12 clusters as follows: CD4+ T cells and Treg cells were annotated as CD4T; CD8+ T cells and proliferative T cells were annotated as CD8T; natural killer cells were annotated as NK; B cells and plasmablasts were annotated as B; CD14+ monocytes and CD16+ monocytes were annotated as Mono; conventional dendritic cells and pDCs were annotated as dendritic cells. To clarify immune cell-type-specific expression of DOCK2, we produced the density plot using plot_density() function from Nebulosa R package (v1.0.0)51, and the dot plot using DotPlot() function.
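The grouping of clusters into major cell types amounts to a lookup table. The sketch below is an illustration only: the cluster label strings and the abbreviation "DC" are assumptions, and only the clusters named in the text are listed.

```python
from collections import Counter

# Cluster labels are assumed strings based on the wording of the text;
# "DC" is a hypothetical short label for the dendritic cell group.
MAJOR_CELL_TYPE = {
    "CD4+ T": "CD4T", "Treg": "CD4T",
    "CD8+ T": "CD8T", "Proliferative T": "CD8T",
    "NK": "NK",
    "B": "B", "Plasmablast": "B",
    "CD14+ monocyte": "Mono", "CD16+ monocyte": "Mono",
    "cDC": "DC", "pDC": "DC",
}

def collapse_to_major_types(cluster_labels):
    """Map per-cell cluster labels to major cell types and count them."""
    return Counter(MAJOR_CELL_TYPE[label] for label in cluster_labels)

# Hypothetical per-cell cluster assignments
example = ["CD4+ T", "Treg", "CD16+ monocyte", "pDC", "Plasmablast"]
counts = collapse_to_major_types(example)
```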
Droplets labelled as innate immune cell clusters (CD14+ monocytes, CD16+ monocytes, conventional dendritic cells and pDCs) were extracted and reintegrated for further subclustering using the same procedure as described above, except that 2,000 shared HVGs were used. After integration, clustering and cluster annotation (Extended Data Fig. 7b) were performed as described above.
Differential expression analysis using scRNA-seq data
Differential gene expression analysis was performed between patients with severe COVID-19 and healthy controls in each cell type. Donor pseudo-bulk samples were first created by aggregating gene counts for each cell type within each sample. Genes whose expression rate was more than 10% in either COVID-19 patients or healthy controls in each cell type were included in the analysis. Differential gene expression testing was performed using a negative binomial generalized linear model (NB GLM) implemented in the Bioconductor package edgeR (v3.32.0)52.
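The pseudo-bulk step and the expression-rate filter can be sketched as below. This is a minimal illustration with a hypothetical per-cell record layout (sample, cell type, gene counts); it does not reproduce the edgeR model fitting.

```python
from collections import defaultdict

def pseudo_bulk(cells):
    """Sum gene counts over cells sharing the same (sample, cell type)."""
    totals = defaultdict(lambda: defaultdict(int))
    for cell in cells:
        key = (cell["sample"], cell["cell_type"])
        for gene, count in cell["counts"].items():
            totals[key][gene] += count
    return {key: dict(genes) for key, genes in totals.items()}

def expression_rate(cells, gene):
    """Fraction of cells with at least one count for the gene."""
    expressing = sum(1 for c in cells if c["counts"].get(gene, 0) > 0)
    return expressing / len(cells)

# Hypothetical cells from two samples
cells = [
    {"sample": "S1", "cell_type": "Mono", "counts": {"DOCK2": 3, "GAPDH": 10}},
    {"sample": "S1", "cell_type": "Mono", "counts": {"DOCK2": 0, "GAPDH": 7}},
    {"sample": "S1", "cell_type": "NK", "counts": {"DOCK2": 1}},
    {"sample": "S2", "cell_type": "Mono", "counts": {"DOCK2": 2, "GAPDH": 5}},
]
bulk = pseudo_bulk(cells)
mono_cells = [c for c in cells if c["cell_type"] == "Mono"]
rate = expression_rate(mono_cells, "DOCK2")
```

In practice the expression rate would be computed separately in patients and controls for each cell type, and a gene kept if either group exceeds 10%.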
DOCK2 co-expression analysis and GO enrichment analysis
We applied the weighted gene co-expression network analysis (WGCNA) algorithm28 to evaluate genes co-expressed with DOCK2 in COVID-19. Pseudo-bulk data of non-classical monocytes in the patients with COVID-19, normalized using scran (v1.18.5)53, were used for the WGCNA analysis, and genes were selected if they were expressed in more than 1% of cells in non-classical monocytes of the patients with COVID-19. We calculated the adjacency with an ‘unsigned network’ option and the soft threshold power set to 5, created the Topological Overlap Matrix (TOM) with TOMsimilarity, calculated the gene tree by hclust against 1 - TOM with method = “average”, and conducted a dynamic tree cut with the following parameters: deepSplit = 4, minClusterSize = 30. We performed GO enrichment analysis of the DOCK2 co-expression gene module using the function enrichGO (pvalueCutoff = 0.01, pAdjustMethod = “BH”, OrgDb = “org.Hs.eg.db”, ont = “BP”) of clusterProfiler (v3.14.3)54.
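The adjacency and topological overlap calculations can be sketched in a few lines. This is an illustration of the standard unsigned definitions, not a substitute for the WGCNA package: the function names and the three-gene toy matrix are hypothetical, and the adjacency diagonal is set to zero here for convenience.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def unsigned_adjacency(expr, power=5):
    """a_ij = |cor(gene_i, gene_j)| ** power, with a zero diagonal."""
    g = len(expr)
    return [[0.0 if i == j else abs(pearson(expr[i], expr[j])) ** power
             for j in range(g)] for i in range(g)]

def topological_overlap(adj):
    """Unsigned TOM: (shared + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    g = len(adj)
    k = [sum(row) for row in adj]            # connectivity of each gene
    tom = [[1.0] * g for _ in range(g)]      # diagonal stays at 1
    for i in range(g):
        for j in range(g):
            if i != j:
                shared = sum(adj[i][u] * adj[u][j] for u in range(g))
                tom[i][j] = (shared + adj[i][j]) / (
                    min(k[i], k[j]) + 1.0 - adj[i][j])
    return tom

# Toy matrix (genes x samples): genes 0 and 1 are perfectly correlated,
# gene 2 is uncorrelated with both
expr = [[1, 2, 3, 4], [2, 4, 6, 8], [1, -1, -1, 1]]
adj = unsigned_adjacency(expr, power=5)
tom = topological_overlap(adj)
```

Hierarchical clustering would then be run on the dissimilarity 1 - TOM, as described above.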
Single-cell eQTL analysis of the DOCK2 risk variant
We applied a pseudo-bulk approach for single-cell eQTL analysis. First, we performed single-cell-level normalization using scran (v1.18.5)53. Gene expression per cell type per sample was then calculated as the mean of log2-transformed counts-per-cell-normalized expression across cells. For principal component analysis, genes were adopted if they were expressed in more than 1% of cells in non-classical monocytes.
In the eQTL analysis of the DOCK2 variant, dosage effects of the risk variant (rs60200309-A) on the gene expression were evaluated using linear regression models with age, sex, disease severity (included only in COVID-19 analysis) and the top two PCs of the gene expression as covariates. R statistical software (version 4.0.2) was used for the analysis.
IFNα production assay using primary blood cells
PBMCs were isolated from the blood of three healthy donors by Lymphoprep density gradient. pDCs were purified by negative selection using the Plasmacytoid Dendritic Cell Isolation Kit II (Miltenyi Biotec). To evaluate interferon-α production ability, sorted pDCs were stimulated with 30 μg ml−1 CpG-A ODN (D35; Gene Design, Japan) or control. IFNα was evaluated 12 h after stimulation using the VeriKine-HS Human Interferon Alpha All Subtype TCM ELISA Kit (PBL). Differences in IFNα production between the groups were evaluated using a paired t-test.
Chemotaxis assay using primary blood cells
PBMCs were isolated from the blood of 19 healthy donors by Lymphoprep density gradient. CD3+ T cells were sorted by magnetic activated cell sorting (MACS). CD3+ T cells (1.0 × 10^5) in 100 μl RPMI + 0.5% BSA medium ± CPYPP (100 μM; Tocris, UK) were placed in the upper chambers of Transwell plates (5 µm pore size; Costar). The lower chambers were filled with 400 µl RPMI medium supplemented with CXCL12 (100 ng ml−1; R&D Systems) and incubated at 37 °C for 2 h. The cells that migrated to the lower chambers were collected and analysed using FACS. The following monoclonal antibodies were used for FACS analysis: anti-human CD3 (UCHT1; BD Biosciences) and CD4 (SK3; BD Biosciences) antibodies. Dead cells were excluded using zombie dyes (BioLegend). Events were acquired with an LSR Fortessa (BD Biosciences) and analysed with FlowJo software (BD Biosciences). Differences in chemotaxis between the CXCL12 group and the CXCL12 + CPYPP group were evaluated using a paired t-test.
DOCK2 knockdown and IFNα production assay in THP1 Blue ISG cells
THP1-Blue ISG (InvivoGen) cells were cultured in 10% FBS, 2 mM l-glutamine, 25 mM HEPES. To generate lentivirus vectors, LentiCRISPR v2 expressing guide RNA/Cas9 (ref. 55), Gag-Pol packaging plasmid psPAX2 (Addgene #12260) and pMD2.G (Addgene #12259) were co-transfected into 293T cells using X-treme GENE 9 DNA Transfection Reagent (Roche). The guide RNAs for DOCK2 knockout and the evaluation of potential off-target effects56,57 are listed in Supplementary Table 11. Transfected 293T cells were cultured in Dulbecco’s modified Eagle medium with 10% FBS and 50 units per ml penicillin/streptomycin. The culture medium was replaced 12 h after transfection. The virus-containing supernatants were collected after a further 36 h and filtered through a 0.45-μm pore size cellulose acetate filter (Sigma-Aldrich). Then, 2 × 10^6 THP1-Blue ISG cells were cultured in 2 ml polybrene (8 µg ml−1, Millipore)/virus-containing medium. After a 24 h incubation, infected THP1-Blue ISG cells with virus-containing medium were collected, centrifuged (400g, 4 min) and cultured in fresh medium. To select cells expressing the LentiCRISPR vector, infected cells were cultured for 4 days in medium supplemented with 1 μg ml−1 puromycin, starting 2 days after infection. DOCK2 knockdown efficiency was evaluated through quantitative real-time PCR analysis and western blotting (Abcam ab124848). THP1 monocytes were differentiated by 72 h incubation with 20 ng ml−1 phorbol 12-myristate 13-acetate (PMA, Sigma, P8139). IFNα was evaluated 6 h after stimulation (3 μg ml−1 CpG-A ODN (D35, Gene Design) or control ODN (D35, GC)) using the VeriKine-HS Human Interferon Alpha All Subtype TCM ELISA Kit (PBL).
Immunohistochemical analysis of lung samples of patients with COVID-19 pneumonia
Patient samples of lung and hilar lymph node were obtained from autopsies following death from COVID-19 pneumonia (samples 1–3) and non-COVID-19 pneumonia (samples 4 and 5). As control samples, lung and lymph node tissue sections were obtained from lung specimens surgically resected for lung cancer. Immunohistochemistry for DOCK2 was performed according to standard procedures. In brief, formalin-fixed paraffin-embedded (FFPE) tissue sections of 5 μm were deparaffinized. Antigen retrieval was carried out using pressure cooking (in citrate buffer for 3 min). Endogenous peroxidase activity was blocked by incubating sections in 3% hydrogen peroxide for 5 min. After blocking, tissue sections were incubated with the anti-DOCK2 rabbit polyclonal antibody58 diluted at 1:1,000. The EnVision kit from Dako (Glostrup) was used to detect the staining.
In vivo suppression of DOCK2 in Syrian hamster model with SARS-CoV-2 infection
SARS-CoV-2 (JPN/Kanagawa/KUH003)33 was used in the experimental animal model of COVID-19. An aliquot of virus was stored at −80 °C until use.
CPYPP, an inhibitor of the DOCK2–RAC1 interaction29, was obtained from Tocris Bioscience (Bristol, UK). CPYPP was dissolved in DMSO.
All applicable national and institutional guidelines for the care and use of animals were followed. The animal experimentation protocol was approved by the President of Kitasato University through the judgment of the Institutional Animal Care and Use Committee of Kitasato University (approval no. 21-007). Sample sizes were determined based on our experience with SARS-CoV-2 infection models, and the minimum number of animals was used.
DOCK2 inhibition in a Syrian hamster model of SARS-CoV-2 infection
We planned and executed the experimental schedule shown in Extended Data Fig. 10a. Six-week-old male Syrian hamsters (CLEA Japan) were maintained in the biological safety level 3 experimental animal facility of the Department of Veterinary Medicine, Kitasato University. Sixty-three animals were divided into four groups: SARS-CoV-2 + CPYPP (n = 29); SARS-CoV-2 + vehicle (n = 28); mock + CPYPP (n = 3); and mock + vehicle (n = 3). Hamsters were intranasally inoculated with 10^5.8 median tissue culture infectious dose (TCID50) of SARS-CoV-2 or medium only (mock infection) in a volume of 100 μl. After 5 min (0 dpi) and 24 h (1 dpi), hamsters were injected intraperitoneally with CPYPP (8.4 mg each; 0.2 ml) or DMSO (vehicle; 0.2 ml). All hamsters were weighed daily. SARS-CoV-2-infected hamsters were euthanized at 3, 6 or 11 dpi (8 animals per group at 3 and 6 dpi, and 6 animals per group at 11 dpi), and then nasal swabs and tissues were collected. Lungs were dissected out from thoracic organs after euthanasia, and lung weights were measured at 0, 3, 6 and 11 dpi. Differences in body weight and lung weight between the SARS-CoV-2 + CPYPP group and the SARS-CoV-2 + vehicle group were evaluated using a two-sided Welch’s t-test. Hamsters were euthanized when reaching the humane endpoint or 11 days after inoculation with SARS-CoV-2. The humane endpoint (weight loss of > 25%) was based on a previous study34.
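For orientation, the Welch comparison and the humane endpoint check can be sketched as follows. This is a minimal illustration with hypothetical function names and toy numbers; it returns the test statistic and degrees of freedom without a P value, and it assumes that weight loss is measured relative to a baseline weight, such as the weight at inoculation.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)   # squared standard error of mean(a)
    vb = variance(b) / len(b)   # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def reached_humane_endpoint(baseline_weight, current_weight,
                            max_loss_pct=25.0):
    """True when weight loss relative to baseline exceeds the threshold."""
    loss_pct = (baseline_weight - current_weight) / baseline_weight * 100.0
    return loss_pct > max_loss_pct

# Toy measurements for two groups (arbitrary units)
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

Unlike Student's t-test, the Welch form does not assume equal variances in the two groups, which is why the degrees of freedom are generally not an integer.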
Infected Syrian hamsters treated with CPYPP or vehicle were euthanized at 3, 6 or 11 dpi for pathological examinations (n = 3). Histopathological examination of the lungs of the hamsters inoculated with SARS-CoV-2 and treated with CPYPP or vehicle was conducted by haematoxylin and eosin staining. Pathological severity scores in the infected hamsters were evaluated as described elsewhere34. In brief, lung tissue sections were scored based on the percentage of inflammation area of the maximum cut surface collected from each animal in each group by using the following scoring system: 0, no pathological change; 1, affected area (≤10%); 2, affected area (>10%, <50%); 3, affected area (≥50%, <90%); 4, affected area (≥90%); an additional point was added when pulmonary oedema and/or alveolar haemorrhage was observed. The total score is shown for individual animals. Immunohistochemistry for alveolar macrophages was performed according to standard procedures. In brief, FFPE lung tissue sections of infected Syrian hamsters were incubated with the anti-CD68 mouse polyclonal antibody diluted at 1:400 (Abcam ab125212). The EnVision kit (Dako) was used to detect the staining.
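The scoring scheme can be written as a short function. This is a sketch under one assumption: "no pathological change" is taken to mean an affected area of 0%. The function and argument names are hypothetical.

```python
def pathology_score(affected_pct, oedema_or_haemorrhage=False):
    """Lung pathology score from the percentage of inflamed area."""
    if affected_pct <= 0:
        base = 0            # no pathological change (assumed to mean 0%)
    elif affected_pct <= 10:
        base = 1
    elif affected_pct < 50:
        base = 2
    elif affected_pct < 90:
        base = 3
    else:
        base = 4
    # One extra point for pulmonary oedema and/or alveolar haemorrhage
    return base + (1 if oedema_or_haemorrhage else 0)

scores = [pathology_score(p) for p in (0, 10, 10.1, 50, 90)]
with_oedema = pathology_score(95, oedema_or_haemorrhage=True)
```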
Total RNA from nasal swabs was extracted using the QIAamp Viral RNA Mini kit (Qiagen) according to the manufacturer’s instructions. Each organ was homogenized by adding RLT buffer of the QIAamp Viral RNA Mini kit using a multi-bead shocker (Yasui Kikai). After centrifugation of 10% (w/v) tissue homogenate at 10,000 rpm for 10 min, RNA was extracted from the recovered supernatants using the kit described above. The nucleocapsid (N) gene of SARS-CoV-2 was detected using THUNDERBIRD Probe One-step qRT-PCR (Toyobo) and Primer/Probe N2 2019-nCoV (TaKaRa). To quantify SARS-CoV-2 N gene copies, a standard curve was generated using Positive Control RNA Mix 2019-nCoV (TaKaRa). Lung cytokine expression profiles (IFNs, Il6 and chemokines) were evaluated with modifications of the method of Ferren et al.59. In brief, 100 ng of RNA was converted to cDNA with the ReverTra Ace qPCR RT Master Mix (Toyobo). qPCR was performed with the THUNDERBIRD Probe qPCR Mix (Toyobo). The primers and probes used are listed in Supplementary Table 12. Reactions for all samples were performed in duplicate using the QuantStudio 1 Real-Time PCR System (Thermo Fisher Scientific), and the target mRNA expression levels were normalized with Gapdh as a reference gene. Relative expression levels (fold changes) of mRNA from infected hamsters compared with uninfected hamsters were calculated using the 2^−ΔΔCt method with QuantStudio Design and Analysis Software (Thermo Fisher Scientific). Differences in viral load and lung cytokine expression profile between the two groups were evaluated using a two-sided Wilcoxon rank sum test.
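The relative quantification can be illustrated with the standard ΔΔCt arithmetic. The sketch below uses hypothetical function names and Ct values; averaging duplicate wells before the calculation is an assumption about how replicates would be combined.

```python
from statistics import mean

def fold_change(ct_target_infected, ct_ref_infected,
                ct_target_control, ct_ref_control):
    """Relative expression by the 2^-ddCt method."""
    delta_infected = ct_target_infected - ct_ref_infected   # dCt, infected
    delta_control = ct_target_control - ct_ref_control      # dCt, uninfected
    delta_delta = delta_infected - delta_control            # ddCt
    return 2.0 ** (-delta_delta)

# Hypothetical duplicate Ct values for a target gene and the reference gene
target_infected = mean([22.1, 21.9])
ref_infected = mean([18.0, 18.0])
target_control = mean([25.0, 25.0])
ref_control = mean([18.0, 18.0])
fc = fold_change(target_infected, ref_infected, target_control, ref_control)
```

With these toy values the target is detected three cycles earlier in infected animals after normalization, which corresponds to an eightfold increase.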
Statistics and reproducibility
Figure 2m,n shows representative images of immunohistochemical analysis of DOCK2 in COVID-19 pneumonia and in a control without COVID-19 or pneumonia. Extended Data Fig. 9 shows all of the autopsy and surgical specimens examined in this study. For immunohistochemical analysis, all experiments were performed on at least three sections of lung and hilar lymph node in each sample, and similar results were confirmed.
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
GWAS summary statistics and processed count matrices with differential expression-identified metadata of bulk RNA-seq are deposited at the National Bioscience Database Center (NBDC) Human Database with the accession code hum0343 without restriction. Raw sequencing data of scRNA-seq are available under controlled access at the Japanese Genotype-phenotype Archive (JGA) with accession codes JGAS000543 and JGAD000662 for general research use, which can be accessed through application at the NBDC with the accession code hum0197. GWAS genotype data of the COVID-19 cases are available under controlled access at European Genome-Phenome Archive (EGA) with the accession code EGAS00001006284 for general research use. GWAS genotype data of the controls collected at Osaka University and the affiliated medical institutes are available under controlled access at EGA with the accession code EGAS00001006423 for use as controls. GWAS genotype data of the controls collected at University of Tsukuba cannot be deposited, since no consent was obtained for deposition in a public repository, but these data are available upon request (email@example.com) for use as controls in research of inflammatory lung disease. The GWAS summary statistics of COVID-19 HGI (release 5) were obtained from https://www.covid19hg.org/results/r5/. The reference for cell-type annotation of PBMC in scRNA-seq (pbmc_multimodal.h5seurat) was obtained from https://satijalab.org/seurat/articles/multimodal_reference_mapping.html.
The Severe Covid-19 GWAS Group. Genomewide association study of severe Covid-19 with respiratory failure. N. Engl. J. Med. 383, 1522–1534 (2020).
Zeberg, H. & Pääbo, S. The major genetic risk factor for severe COVID-19 is inherited from Neanderthals. Nature 587, 610–612 (2020).
Niemi, M. E. K. et al. Mapping the human genetic architecture of COVID-19. Nature 600, 472–477 (2021).
Kosmicki, J. A. et al. Pan-ancestry exome-wide association analyses of COVID-19 outcomes in 586,157 individuals. Am. J. Hum. Genet. 108, 1350–1355 (2021).
Pairo-Castineira, E. et al. Genetic mechanisms of critical illness in COVID-19. Nature 591, 92–98 (2021).
Zhu, N. et al. A novel coronavirus from patients with pneumonia in China, 2019. N. Engl. J. Med. 382, 727–733 (2020).
Walensky, R. P., Walke, H. T. & Fauci, A. S. SARS-CoV-2 variants of concern in the United States—challenges and opportunities. JAMA 325, 1037–1038 (2021).
Williamson, E. J. et al. Factors associated with COVID-19-related death using OpenSAFELY. Nature 584, 430–436 (2020).
Clinical Management of Patients with COVID-19: A Guide for Front-Line (Ministry of Health, Labour and Welfare of Japan, 2020).
Ishii, M. et al. Clinical characteristics of 345 patients with coronavirus disease 2019 in Japan: a multicenter retrospective study. J. Infect. 81, e3–e5 (2020).
Nguyen, A. et al. Human leukocyte antigen susceptibility map for severe acute respiratory syndrome coronavirus 2. J. Virol. 94, e00510-20 (2020).
Ben Shachar, S. et al. MHC haplotyping of SARS-CoV-2 patients: HLA subtypes are not associated with the presence and severity of COVID-19 in the Israeli population. J. Clin. Immunol. 41, 1154–1161 (2021).
Hirata, J. et al. Genetic and phenotypic landscape of the major histocompatibilty complex region in the Japanese population. Nat. Genet. 51, 470–480 (2019).
Naito, T. et al. A deep learning method for HLA imputation and trans-ethnic MHC fine-mapping of type 1 diabetes. Nat. Commun. 12, 1639 (2021).
Lane, W. J. et al. Automated typing of red blood cell and platelet antigens: a whole-genome sequencing study. Lancet Haematol. 5, e241–e251 (2018).
Liu, Y., Häussinger, L., Steinacker, J. M. & Dinse-Lambracht, A. Association between the dynamics of the COVID-19 epidemic and ABO blood type distribution. Epidemiol. Infect. 149, e19 (2021).
Holmes, M. V., Ala-Korpela, M. & Smith, G. D. Mendelian randomization in cardiometabolic disease: challenges in evaluating causality. Nat. Rev. Cardiol. 14, 577–599 (2017).
Freuer, D., Linseisen, J. & Meisinger, C. Impact of body composition on COVID-19 susceptibility and severity: a two-sample multivariable Mendelian randomization study. Metabolism. 118, 154732 (2021).
Nakanishi, T. et al. Age-dependent impact of the major common genetic risk factor for COVID-19 on severity and mortality. J. Clin. Invest. 131, e152386 (2021).
Roberts, G. H. L. et al. AncestryDNA COVID-19 host genetic study identifies three novel loci. Preprint at medRxiv https://doi.org/10.1101/2020.10.06.20205864 (2020).
Roberts, G. H. et al. Novel COVID-19 phenotype definitions reveal phenotypically distinct patterns of genetic association and protective effects. Nat. Genet. 54, 374–381 (2022).
Okada, Y. et al. Deep whole-genome sequencing reveals recent selection signatures linked to evolution and disease risk of Japanese. Nat. Commun. 9, 1631 (2018).
Hormozdiari, F. et al. Colocalization of GWAS and eQTL signals detects target genes. Am. J. Hum. Genet. 99, 1245–1260 (2016).
Fukui, Y. et al. Haematopoietic cell-specific CDM family protein DOCK2 is essential for lymphocyte migration. Nature 412, 826–831 (2001).
Nishikimi, A. et al. Sequential regulation of DOCK2 dynamics by two phospholipids during neutrophil chemotaxis. Science 324, 384–387 (2009).
Stephenson, E. et al. Single-cell multi-omics analysis of the immune response in COVID-19. Nat. Med. 27, 904–916 (2021).
Ahern, D. J. et al. A blood atlas of COVID-19 defines hallmarks of disease severity and specificity. Cell 185, 916–938 (2022).
Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinf. 9, 559 (2008).
Nishikimi, A. et al. Blockade of inflammatory responses by a small-molecule inhibitor of the Rac activator DOCK2. Chem. Biol. 19, 488–497 (2012).
Saichi, M. et al. Single-cell RNA sequencing of blood antigen-presenting cells in severe COVID-19 reveals multi-process defects in antiviral immunity. Nat. Cell Biol. 23, 538–551 (2021).
Zhang, Q. et al. Inborn errors of type I IFN immunity in patients with life-threatening COVID-19. Science 370, eabd4570 (2020).
Zhou, Z. et al. Heightened innate immune responses in the respiratory tract of COVID-19 patients. Cell Host Microbe 27, 883–890.e2 (2020).
Ebisudani, T. et al. Direct derivation of human alveolospheres for SARS-CoV-2 infection modeling and drug screening. Cell Rep. 35, 109218 (2021).
Imai, M. et al. Syrian hamsters as a small animal model for SARS-CoV-2 infection and countermeasure development. Proc. Natl Acad. Sci. USA 117, 16587–16595 (2020).
Yoshida, M. et al. Local and systemic responses to SARS-CoV-2 infection in children and adults. Nature 602, 321–327 (2021).
Loske, J. et al. Pre-activated antiviral innate immunity in the upper airways controls early SARS-CoV-2 infection in children. Nat. Biotechnol. 40, 319–324 (2021).
Dobbs, K. et al. Inherited DOCK2 deficiency in patients with early-onset invasive infections. N. Engl. J. Med. 372, 2409–2422 (2015).
Hirata, M. et al. Overview of BioBank Japan follow-up data in 32 diseases. J. Epidemiol. 27, S22–S28 (2017).
Sakaue, S. et al. Dimensionality reduction reveals fine-scale structure in the Japanese population with consequences for polygenic risk prediction. Nat. Commun. 11, 1569 (2020).
Yamamoto, K. et al. Genetic and phenotypic landscape of the mitochondrial genome in the Japanese population. Commun. Biol. 3, 104 (2020).
Yip, S. P. Sequence variation at the human ABO locus. Ann. Hum. Genet. 66, 1–27 (2002).
Ogawa, K. et al. A transethnic Mendelian randomization study identifies causality of obesity on risk of psoriasis. J. Invest. Dermatol. 139, 1397–1400 (2019).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-seq data with or without a reference genome. BMC Bioinf. 12, 323 (2011).
Petukhov, V. et al. dropEst: pipeline for accurate estimation of molecular counts in droplet-based single-cell RNA-seq experiments. Genome Biol. 19, 78 (2018).
Wolock, S. L., Lopez, R. & Klein, A. M. Scrublet: computational identification of cell doublets in single-cell transcriptomic data. Cell Syst. 8, 281–291.e9 (2019).
Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902.e21 (2019).
McInnes, L., Healy, J. & Melville, J. UMAP: uniform manifold approximation and projection for dimension reduction. Preprint at arXiv https://arxiv.org/abs/1802.03426 (2018).
Hao, Y. et al. Integrated analysis of multimodal single-cell data. Cell 184, 3573–3587.e29 (2021).
Alquicira-Hernandez, J. & Powell, J. E. Nebulosa recovers single-cell gene expression signals by kernel density estimation. Bioinformatics 37, 2485–2487 (2021).
Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2009).
Lun, A. T. L., Bach, K. & Marioni, J. C. Pooling across cells to normalize single-cell RNA sequencing data with many zero counts. Genome Biol. 17, 75 (2016).
Yu, G., Wang, L. G., Han, Y. & He, Q. Y. ClusterProfiler: an R package for comparing biological themes among gene clusters. Omics 16, 284–287 (2012).
Sanjana, N. E., Shalem, O. & Zhang, F. Improved vectors and genome-wide libraries for CRISPR screening. Nat. Methods 11, 783–784 (2014).
Doench, J. G. et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR–Cas9. Nat. Biotechnol. 34, 184–191 (2016).
Concordet, J. P. & Haeussler, M. CRISPOR: intuitive guide selection for CRISPR/Cas9 genome editing experiments and screens. Nucleic Acids Res. 46, W242–W245 (2018).
Nishihara, H. et al. Non-adherent cell-specific expression of DOCK2, a member of the human CDM-family proteins. Biochim. Biophys. Acta 1452, 179–187 (1999).
Ferren, M. et al. Hamster organotypic modeling of SARS-CoV-2 lung and brainstem infection. Nat. Commun. 12, 5809 (2021).
We thank all the participants involved in this study; all the members of JCTF for their support; J. Kitano and Ascend Corporation for voluntarily supporting JCTF; and COVID-19 Host Genetics Initiative for publicly sharing the GWAS summary statistics. This study was supported by AMED (JP20nk0101612, JP20fk0108415, JP21jk0210034, JP21km0405211, JP21km0405217, JP21fk0108469, JP21wm0325031, JP21gm4010006, JP22km0405211, JP22ek0410075, JP22km0405217, JP22ek0109594), JST CREST (JPMJCR20H2), JST PRESTO (JPMJPR21R7), JST Moonshot R&D (JPMJMS2021, JPMJMS2024), MHLW (20CA2054), JSPS KAKENHI (22H00476), Takeda Science Foundation, the Mitsubishi Foundation, the Team Osaka University Research Project in The Nippon Foundation–Osaka University Project for Infectious Disease Prevention, and Bioinformatics Initiative of Osaka University Graduate School of Medicine. The super-computing resource was provided by Human Genome Center at the University of Tokyo.