Trans-ethnic genome-wide association study of severe COVID-19

Wu, Peng; Ding, Lin; Li, Xiaodong; Liu, Siyang; Cheng, Fanjun; He, Qing; Xiao, Mingzhong; Wu, Ping; Hou, Hongyan; Jiang, Minghui; Long, Pinpin; Wang, Hao; Liu, Linlin; Qu, Minghan; Shi, Xian; Jiang, Qin; Mo, Tingting; Ding, Wencheng; Fu, Yu; Han, Shi; Huo, Xixiang; Zeng, Yingchun; Zhou, Yana; Zhang, Qing; Ke, Jia; Xu, Xi; Ni, Wei; Shao, Zuoyu; Wang, Jingzhi; Liu, Panhong; Li, Zilong; Jin, Yan; Zheng, Fang; Wang, Fang; Liu, Lei; Li, Wending; Liu, Kang; Peng, Rong; Xu, Xuedan; Lin, Yuhui; Gao, Hui; Shi, Limei; Geng, Ziyue; Mu, Xuanwen; Yan, Yu; Wang, Kai; Wu, Degang; Hao, Xingjie; Cheng, Shanshan; Qiu, Gaokun; Guo, Huan; Li, Kezhen; Chen, Gang; Sun, Ziyong; Lin, Xihong; Jin, Xin; Wang, Feng; Sun, Chaoyang; Wang, Chaolong

doi:10.1038/s42003-021-02549-5

Download PDF

Article
Open access
Published: 31 August 2021

Trans-ethnic genome-wide association study of severe COVID-19

Peng Wu^1,2^na1,
Lin Ding^3,4^na1,
Xiaodong Li^5,6^na1,
Siyang Liu⁷^na1,
Fanjun Cheng⁸^na1,
Qing He⁹^na1,
Mingzhong Xiao^5,6,
Ping Wu^1,2,
Hongyan Hou^2,10,
Minghui Jiang^3,4,
Pinpin Long^4,11,
Hao Wang^4,11,
Linlin Liu¹²,
Minghan Qu^3,4,
Xian Shi^3,4,
Qin Jiang^4,11,
Tingting Mo^4,11,
Wencheng Ding^1,2,
Yu Fu^1,2,
Shi Han¹²,
Xixiang Huo¹²,
Yingchun Zeng¹²,
Yana Zhou^5,6,
Qing Zhang^5,6,
Jia Ke^5,6,
Xi Xu^5,6,
Wei Ni^5,6,
Zuoyu Shao^5,6,
Jingzhi Wang^5,6,
Panhong Liu¹³,
Zilong Li ORCID: orcid.org/0000-0001-5859-2078¹³,
Yan Jin¹⁴,
Fang Zheng¹⁵,
Fang Wang⁹,
Lei Liu⁹,
Wending Li^4,11,
Kang Liu^4,11,
Rong Peng^4,11,
Xuedan Xu^4,11,
Yuhui Lin^4,11,
Hui Gao^4,11,
Limei Shi^4,11,
Ziyue Geng^4,11,
Xuanwen Mu^4,11,
Yu Yan^3,4,
Kai Wang^3,4,
Degang Wu^3,4,
Xingjie Hao^3,4,
Shanshan Cheng^3,4,
Gaokun Qiu^4,11,
Huan Guo^4,11,
Kezhen Li ORCID: orcid.org/0000-0001-7943-5389^1,2,
Gang Chen^1,2,
Ziyong Sun^2,10,
Xihong Lin^16,17,18,
Xin Jin ORCID: orcid.org/0000-0001-7554-4975¹⁹^na2,
Feng Wang^2,10^na2,
Chaoyang Sun ORCID: orcid.org/0000-0003-2469-1638^1,2^na2 &
…
Chaolong Wang ORCID: orcid.org/0000-0003-3945-1012^2,3,4^na2

Communications Biology volume 4, Article number: 1034 (2021) Cite this article

7293 Accesses
25 Citations
70 Altmetric
Metrics details

Subjects

Abstract

COVID-19 has caused numerous infections with diverse clinical symptoms. To identify human genetic variants contributing to the clinical development of COVID-19, we genotyped 1457 (598/859 with severe/mild symptoms) and sequenced 1141 (severe/mild: 474/667) patients of Chinese ancestry. We further incorporated 1401 genotyped and 948 sequenced ancestry-matched population controls, and tested genome-wide association on 1072 severe cases versus 3875 mild or population controls, followed by trans-ethnic meta-analysis with summary statistics of 3199 hospitalized cases and 897,488 population controls from the COVID-19 Host Genetics Initiative. We identified three significant signals outside the well-established 3p21.31 locus: an intronic variant in FOXP4-AS1 (rs1853837, odds ratio OR = 1.28, P = 2.51 × 10⁻¹⁰, allele frequencies in Chinese/European AF = 0.345/0.105), a frameshift insertion in ABO (rs8176719, OR = 1.19, P = 8.98 × 10⁻⁹, AF = 0.422/0.395) and a Chinese-specific intronic variant in MEF2B (rs74490654, OR = 8.73, P = 1.22 × 10⁻⁸, AF = 0.004/0). These findings highlight an important role of the adaptive immunity and the ABO blood-group system in protection from developing severe COVID-19.

Initial whole-genome sequencing and analysis of the host genetic contribution to COVID-19 severity and susceptibility

Article Open access 10 November 2020

Integrative genomic analyses identify susceptibility genes underlying COVID-19 hospitalization

Article Open access 27 July 2021

Host genetic factors of COVID-19 susceptibility and disease severity in a Thai population

Article 11 January 2022

Introduction

The coronavirus disease 2019 (COVID-19) is an ongoing pandemic caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Despite a huge number of cases have been diagnosed, both modeling studies and seroprevalence studies estimate the actual number of infections to be much larger, suggesting the majority of infected individuals might have mild or no symptoms^1,2,3,4. COVID-19 patients display a wide spectrum of clinical symptoms. Up to 5% of the confirmed cases would develop severe pneumonia with acute respiratory distress syndrome (ARDS)^5,6 and millions of deaths have been attributed to COVID-19⁷. While older age, male sex, and comorbidities, such as hypertension, diabetes, obesity, and cardiovascular diseases, were found to associate with severe COVID-19^6,8,9, many patients with no major risk factors were reported to develop severe symptoms¹⁰. Host genetic variation might contribute to the diverse clinical presentations of infectious diseases, potentially through the regulation of the immune system. Classic examples include the association between CCR5 gene and the human immunodeficiency virus (HIV) infection¹¹, ABO and malaria¹² and SARS¹³, and HLA-C and chronic hepatitis B virus infection¹⁴.

The first genome-wide association study (GWAS) of COVID-19 has reported two severity-associated loci in Italians and Spanish: the 3p21.31 locus containing several immune genes and the ABO (9q34.2) locus determining ABO blood groups¹⁵. The 3p21.31 locus has been replicated by several follow-up studies, including the COVID-19 Host Genetics Initiative (HGI)¹⁶ and a recent GWAS comparing COVID-19 patients from intensive care units (ICU) across the UK and ancestry-matched population controls¹⁷, which also reported three additional loci at 12q24.13, 19p13.3, and 21q22.1. Furthermore, whole-genome sequencing studies (WGS) have identified several rare putative loss-of-function (LOF) variants, which could impair type I and II interferon (IFN) immunity, in association with severe COVID-19^18,19.

Identification of host genetic variants associated with severe COVID-19 can help understand how our immune system interacts with SARS-CoV-2 and thus guide the development of effective prevention and therapeutic strategies, including prioritizing high-risk populations for vaccination in shortage of vaccine supply. Current genetic studies of COVID-19, like many other human genetic studies, are mainly based on European populations, which might lead to potential bias in translating findings to non-Europeans²⁰. A striking example is the 3p21.31 locus for COVID-19. The risk haplotype at this locus was found to inherit from Neanderthals, reaching a high frequency of 30% in South Asians and 8% in Europeans, but almost absent in Africans and East Asians^15,21. Thus, risk stratification based on this locus is not applicable to Africans and East Asians. A recent study of the host genetic contribution to COVID-19 severity in the Chinese population did not identify genome-wide significant association signal due to a small sample size of 332 patients²².

In this study, we bridge the gap by collecting and analyzing GWAS and WGS data of 1072 severe COVID-19 cases and 3875 controls (including 1526 patients with mild symptoms and 2349 population controls), all of the Chinese ancestry, and meta-analyzing with summary statistics from the HGI analysis (B2_release3) of 3199 hospitalized cases and 897,488 population controls of primarily European ancestry. We group population controls with mild patients because the vast majority of population controls would likely have COVID-19 with mild or no symptoms if they were exposed to the virus^1,2,3,4. Our analyses lead to three significant loci predisposing risk to severe COVID-19, including an intronic variant in FOXP4-AS1, a frameshift insertion in ABO, and a Chinese-specific rare intronic variant in MEF2B.

Results

We successfully genotyped 1626 COVID-19 patients from Tongji Hospital and Hubei Hospital of Traditional Chinese Medicine (TCM) in Wuhan using the Illumina Global Screening Array (GSA). After quality controls, we merged the COVID-19 dataset with 1459 population controls from the Coke Oven Worker (COW) cohort in Wuhan, who were genotyped using the same array, resulting in 369,072 autosomal and 8942 X chromosomal SNPs with minor allele frequency (MAF) > 0.005 in both datasets. The merged data were imputed to an East Asian WGS reference panel combining East Asians from the 1000 Genomes Project (1KGP)²³ and our in-house WGS data of Chinese, achieving good imputation quality for common variants with MAF > 0.01 (“Methods” and Supplementary Fig. 1). We kept 6,019,210 autosomal and 132,535 X chromosomal variants with imputation R² > 0.8 and MAF > 0.01 for downstream analyses.

After excluding second-degree and above relatedness and contaminated samples, we performed a GWAS of 598 severe cases versus 2260 controls (including 859 mild patients and 1401 population controls), correcting for the first two principal components (PCs) of population structure. The demographic characteristics of the samples were presented in Table 1. We detected a significant association signal located on 1p36.31, an intronic region of CHD5 (rs34308690, OR = 1.50, P = 4.52 × 10⁻⁸, AF = 0.386, 0.389, and 0.144 in Chinese GWAS, Chinese WGS, and 1KGP Europeans, respectively) (Supplementary Fig. 2). We then combined our results with GWAS summary statistics of 3199 hospitalized COVID-19 patients versus 897,488 population controls from the HGI study (B2_release3). Meta-analysis of our GWAS and the HGI results did not replicate the signal in CHD5, but led to three other significant loci (P < 5 × 10⁻⁸), including the previously reported 3p21.31 locus and the ABO locus¹⁵, and a novel locus at 6p21.1 within the FOXP4-AS1 gene (Supplementary Fig. 2). We noted that the signal of the 3p21.31 locus was solely from the HGI analysis because the risk variant at this locus was absent in our GWAS dataset^15,21.

Table 1 Demographic characteristics of Chinese GWAS and WGS datasets.

Full size table

We then pooled together WGS data of 474 severe and 667 mild COVID-19 patients from Wuhan Tongji Hospital, Wuhan Union Hospital, and the Third People’s Hospital of Shenzhen, and 948 ancestry-matched population controls from BGI-Shenzhen, all sequenced at BGI (Shenzhen). These WGS samples have no overlap with our GWAS samples (Table 1 and “Methods”). Samples from different sources were sequenced in batches, in which 467 COVID-19 patients from Union Hospital were sequenced using cell-free DNA (cfDNA) to ~17.8× , while the other samples were sequenced to >33× following standard WGS protocols. To minimize batch effects, we performed linkage disequilibrium (LD) based joint calling and stringent variant quality controls, followed by association tests correcting for the first two PCs and an indicator variable for the cfDNA batch. Summary statistics from these WGS samples were meta-analyzed with those from the previous two GWAS datasets (Fig. 1a–e). We observed no genomic inflation in all association analyses (genomic inflation factor λ_GC < 1.011, Supplementary Fig. 2).

**Fig. 1: Trans-ethnic meta-analysis results for severe COVID-19.**

Lead variants in both the 6p21.1 locus and the ABO locus have a consistent direction of effect in the WGS samples, and no heterogeneity in effect size was detected across cohorts (Table 2). The top association signal at the 6p21.1 locus is an intronic SNP (rs1853837, alleles: C/A) of the FOXP4-AS1 gene, which has P = 2.51 × 10⁻¹⁰ and OR = 1.28 (95% confidence interval [CI]: 1.19–1.39) after meta-analysis. The risk allele A is much more common in Chinese than in Europeans (allele frequency AF = 0.353 in our WGS Chinese and 0.105 in 1KGP Europeans). The top association signal at the ABO locus is a frameshift insertion in the ABO gene (rs8176719, alleles: T/TC, P = 8.98 × 10⁻⁹, OR = 1.19 [1.12–1.26]), which is the only variant passing the genome-wide significance level of P < 5 × 10⁻⁸ and is located about 6 kb away from the previously reported lead SNP rs657152 (LD in Chinese, r² = 0.95)¹⁵. The rs8176719 insertion is common in both Europeans and Chinese (AF > 0.39). Notably, this variant is one of the three major variants determining the haplotypes of the ABO blood group^15,24,25. Carriers of the T/T homozygote belong to blood group O, while carriers of the risk allele TC belong to the non-O group unless they also carry an extremely rare allele T at SNP rs41302905 (absence in our Chinese samples). Thus, consistent with previous studies¹⁵, our result suggests the blood group O is protective for COVID-19 severity. When further adjusting for age and sex (Supplementary Table 1 and Supplementary Fig. 3), the FOXP4-AS1 locus remained significant (P = 4.20 × 10⁻¹⁰), but the signal at the ABO locus diminished (P = 5.88 × 10⁻⁷), likely due to loss of power given that our population controls were younger and mostly males (Table 1).

Table 2 Significant loci associated with COVID-19 severity.

Full size table

To explore the potential mechanism of these two common variant association signals, we examined the association between SNP genotypes and nine inflammatory biomarkers in a subset of COVID-19 patients with detailed clinical data during their hospitalization. These serum biomarkers included interleukin 1 beta (IL-1β, sample size n = 804), interleukin 2 receptor (IL-2R, n = 802), interleukin 6 (IL-6, n = 846), interleukin 8 (IL-8, n = 790), interleukin 10 (IL-10, n = 799), tumor necrosis factor-alpha (TNF-α, n = 785), complements C3 (n = 273) and C4 (n = 272), and C-reactive protein (CRP, n = 768). Interestingly, we observed a significant association between genotypes of rs8176719 at the ABO locus and the serum level of IL-1β (Spearman’s correlation r_s = −0.274, P = 2.61 × 10⁻¹⁵, Fig. 2a), despite no association between IL-1β and COVID-19 severity in our samples (Table 3 and Fig. 2b). The association between rs8176719 and IL-1β remains significant after dichotomizing IL-1β as normal (≤12 pg/ml) and abnormal (> 12 pg/ml) and controlling for COVID-19 severity, age, and sex (OR = 0.22 [0.10–0.50], P = 2.28 × 10⁻⁴, Wald test based on logistic regression, Fig. 2b–d). We found no association between rs1853837 at FOXP4-AS and inflammatory biomarkers.

**Fig. 2: Comparison of serum level of IL-1β in different groups.**

Table 3 Serum levels of inflammatory biomarkers in Chinese COVID-19 patients.

Full size table

In addition, we identified a rare intronic SNP within the MEF2B gene reaching genome-wide significance (rs74490654 at 19q13.11, alleles: C/G, P = 1.22 × 10⁻⁸). This signal is contributed by our Chinese WGS data because the alternative allele is extremely rare with AF = 0.4% in 1KGP East Asians and 0 in other continental groups²³. The AF in our WGS controls (including mild patients) is 0.4%, the same as the 1KGP East Asians, but increases to 2.2% in our cases with severe COVID-19. Among 35 carriers of the alternative allele (all in heterozygote), there are 21 from 474 severe cases, 4 from 667 controls with mild COVID-19, and 10 from 948 population controls, translated into a large effect size of OR = 8.73 (95% CI: 4.14–18.41). Variant rs74490654 locates only 7 bp upstream of an ENCODE candidate cis-regulatory element (cCRE) E1945041, which is annotated with a distal enhancer-like signature in the B-cell lymphocyte lineage OCILY7²⁶. Hi-C data further suggest the cCRE E1945041 and the promoter of MEF2B locate in the same topologically associating domain (TAD)²⁷, indicating rs74490654 is likely to disrupt the transcriptional activities of MEF2B.

Finally, there are four suggestively significant loci (P < 10⁻⁶) reported by at least two datasets (Supplementary Table 2): 21q22.11 (rs1051393 in IFNAR2, OR = 1.17 [1.10–1.25], P = 4.33 × 10⁻⁷), 3p14.2 (rs672699 in PTPRG, OR = 1.18 [1.04–1.34], P = 5.58 × 10⁻⁷), 16q21 (rs7499679 in ADGRG1, OR = 0.85 [0.79–0.90], P = 8.09 × 10⁻⁷), and 1q44 (rs12130553 in HNRNPU, OR = 1.19 [1.11–1.27], P = 9.17 × 10⁻⁷). All four loci have a consistent direction of effects across datasets, but locus 3p14.2 (rs672699) has significant variation in the effect sizes (I² = 67.21%, P_het = 0.05; Supplementary Table 2).

Discussion

In this study, we tested host genetic association with COVID-19 severity based on the largest COVID-19 GWAS and WGS datasets of Chinese ancestry to date, including 1072 severe COVID-19 patients, 1526 mild patients, and 2349 population controls. We detected a Chinese-specific rare variant (rs74490654 in MEF2B 19q13.11) at genome-wide significance in the WGS samples of 474 cases and 1615 controls. This signal, however, was not replicated in the GWAS samples due to the low imputation quality of rare variants. Given the limited power to detect rare variant association with our small WGS sample size, further sequencing-based replication in large samples will be required to confirm the association signal at MEF2B. Two additional loci (FOXP4-AS1 at 6p21.1 and ABO at 9q34.2) were identified by trans-ethnic meta-analysis with summary statistics from the HGI (B2_release3) analysis of 3199 hospitalized patients and 897,488 population controls, most of which were European samples, highlighting that COVID-19, like many other complex diseases, requires a large sample size to reliably detect moderate effects of common variants.

In the HGI analysis, cases were hospitalized COVID-19 patients and controls were population controls with no information on COVID-19. Given that most hospitalized patients in western countries have severe symptoms and that most SARS-CoV-2 infections result in mild or no symptoms, the HGI data likely enrich for genetic associations with COVID-19 severity, despite different case definitions from our samples. The only significant locus in the HGI B2_release3 analysis was the 3p21.31 locus, which was first identified in Italians and Spanish¹⁵. Because the risk haplotype of this locus was almost absent in East Asians, our data provide no additional evidence.

The lead SNP of locus 6p21.1, rs1853837, is in the intron of the lncRNA forkhead box P4 antisense RNA 1 (FOXP4-AS1). The risk allele (A) at rs1853837 is an eQTL in positive association with the expression of FOXP4-AS1 in lung²⁸, and has been reported to associate with an increased risk for non-small cell lung cancer by GWAS²⁹. The GeneHancer database indicates rs1853837 is resided in an enhancer targeting FOXP4-AS1, forkhead box P4 (FOXP4), and natural cytotoxicity triggering receptor 2 (NCR2)³⁰.

During the revision of this paper, an updated HGI analysis with a much larger sample size (B2_release5, involving 13,641 cases and 2,070,709 controls) has identified FOXP4-AS1 as a significant locus associated with hospitalized COVID-19 (rs1886814, OR = 1.26, P = 1.11 × 10⁻⁹, LD r² = 0.64 with rs1853837)³¹, supporting the validity of our result. FOXP4 is a transcription factor expressed in thymocytes and peripheral CD4+ and CD8+ T cells, and knockout of FOXP4 can impair memory recall of T-cell cytokines in response to viral infections³². For COVID-19, SARS-CoV-2-specific T cells have been detected in many uninfected healthy individuals, likely due to an exposure history to common cold coronaviruses^33,34. Such cross-reactive T-cell immunity from other coronavirus has been speculated to affect COVID-19 severity. Furthermore, FOXP4 plays a key role in regulating lung secretory epithelial cell fate and regeneration and thus can affect the production of mucus to protect the lung against pathogens and pollution³⁵.

The association between blood group O with COVID-19 has been reported by both genetic and non-genetic studies^15,25,36,37. However, as discussed by Ellinghaus et al.¹⁵, their association signal at the ABO gene (9q34.2) might be subject to population stratification because of the inclusion of blood donors as controls, which might enrich for blood group O. Skepticism of this association was elevated when the association was not replicated in the HGI release 3 analysis with large sample size. Our result, in contrast, confirmed the association at a frameshift insertion rs8176719 of the ABO gene, which showed no heterogeneity across populations (OR = 1.28 [1.12–1.46] in Chinese GWAS, 1.17 [1.09–1.26] in HGI B2_release3, and 1.17 [0.98–1.38] in Chinese WGS; I² = 0.00%, P_het = 0.51). rs8176719 is the major variant determining blood group O and has been reported to associate with susceptibility to malaria¹². The ABO blood groups have also been implicated in the association with susceptibility to SARS¹³ and several immune diseases, such as allergy, amyotrophic lateral sclerosis, and asthma^38,39,40. We observed that carriers of T/T homozygote at rs8176719 (i.e., individuals of blood group O) tend to have an elevated serum level of IL-1β among COVID-19 patients (Fig. 2b). Nevertheless, unlike other inflammatory cytokines, such as IL-2R, IL-6, IL-8, IL-10, TNF-α, and CRP, which were elevated in severe patients due to strong immune response, we observed no association between IL-1β and COVID-19 severity (Table 3). Further investigations are needed to understand how the ABO gene affects susceptibility to severe COVID-19.

The top SNP rs74490654 was a rare variant located in the intron of myocyte enhancer factor-2B (MEF2B), one of the four MEF2 transcription factors involved in the regulation of muscle, neural crest, endothelial cell, and lymphocyte development⁴¹. Both epigenetic and Hi-C annotations suggest rs74490654 is likely to transcriptionally regulate the expression of MEF2B in lymphocytes. MEF2B could bind to its target DNA sites with a degenerate motif and act as a specific transcriptional regulator⁴². Importantly, MEF2B is overexpressed in lymphocytes (Fig. 3d)²⁸ and plays critical roles in anti-virus immune which would be associated with COVID-19 development and severity. First, MEF2B is critical for the formation of germinal centers and promoting early B-cell development^43,44, while B cells are essential in the defense of virus infection by producing protective antibodies. Second, MEF2 is necessary during peripheral T-cell activation by activating IL-2 and other cytokines^45,46. Moreover, Clark et al. revealed that MEF2 regulates susceptibility to infection and is associated with tolerance of pathology by regulating metabolism⁴⁷. Taking together, MEF2 plays essential roles in B-cells development, T-cells activation, and in immune-metabolic switch, which could at least partially explain the association between rs74490654 and COVID-19 severity.

**Fig. 3: RNA expression in multiple tissues for genes within significant loci.**

Several candidate genes within the four suggestive loci have been reported to associate with immune-related traits. IFNAR2 and IL10RB at 21q22.11 were associated with the susceptibility to hepatitis B virus infection⁴⁸, Crohn’s disease⁴⁹, type 2 diabetes⁵⁰, and immunodeficiency⁵¹. In particular, IFNAR2 encoded interferon alpha- and beta-receptor subunit 2, which is essential for antiviral immunity⁵². Both common and rare variants in IFNAR2 have been implied in the susceptibility to severe COVID-19^17,19. PTPRG at 3p14.2 was associated with pneumococcal bacteremia⁵³. While HNRNPU at 1q44 is famous for association with neurodevelopmental delays and epilepsy^54,55, it also plays a role in restricting HIV activity by blocking the cytoplasmic accumulation of viral mRNA transcripts⁵⁶.

Vigorous multi-faceted public health interventions have led to rapid control of the COVID-19 epidemic in China^3,57. Therefore, it is difficult for us to recruit more patients to increase the sample size and statistical power. We augmented our samples with population controls from two existing cohorts, who were primarily males (for the COW cohort) and under age 50 (for both). To avoid loss of power, we did not adjust for age and sex in our main analyses because age and sex were systematically different between cases and population controls due to data collection rather than biological effects. Given that age and sex are not associated with autosomal genotypes, we do not expect spurious genetic association signals due to confounding effects of unadjusted age and sex. We identified common variants in ABO and FOXP4-AS1 and a rare variant in MEF2B to be significantly associated with COVID-19 severity, likely through regulation of the adaptive immunity. Unlike the 3p21.31 locus, of which the risk haplotype is specific to Europeans and South Asians^15,21, risk variants in both ABO and FOXP4-AS1 loci are common in worldwide populations²³. The rare risk variant in MEF2B, on the other hand, is specific to East Asians and confers about the eightfold increase in the risk of severe COVID-19 among carriers. These findings, together with many more to discover through ongoing international collaborations¹⁶, have important implications on the biology underlying the clinical development of COVID-19, and thus might help develop targeted prevention and therapeutic strategies to combat the pandemic.

Methods

Ethics statement

This study was reviewed and approved by the Institutional Review Boards of Tongji Hospital (TJ-IRB20200405) and Union Hospital (UH-IRB20200075-1), Tongji Medical College, Huazhong University of Science and Technology, and the Third People’s Hospital of Shenzhen (SZ3H-2020-006-02). Informed consent was obtained from all enrolled patients. Blood samples were collected using the rest of the standard diagnostic tests, with no burden to the patients.

Phenotype definition

We classified COVID-19 patients into two groups: severe and mild. The severe group included those diagnosed as critical or severe following the guidelines for diagnosis and treatment of COVID-19 (Trial Version 7) released by the National Health Commission of the People’s Republic of China. Briefly, a patient was diagnosed as a critical illness if at least one of the following conditions was met: (1) acute respiratory distress syndrome (ARDS) requiring mechanical ventilation, (2) shock, (3) combining with other organ failure requiring ICU admission. Severe illness was defined by meeting at least one of the following conditions: (1) respiratory rate ≥30 times/min, (2) oxygen saturation ≤93% at resting state, (3) arterial partial pressure of oxygen (PaO₂)/fraction of inspired oxygen (FiO₂) ≤300 mmHg, (4) pulmonary imaging examination showed that the lesions significantly progressed by more than 50% within 24–48 h. The other patients, including those with no clinical symptoms, were defined as mild cases. For patients whose electronic medical records (EMR) were available, we examined the EMR to classify the disease severity. For those with no EMR, we extracted information on disease severity from the municipal Notifiable Disease Report System, which might be determined based on similar criteria in early versions of the guidelines for diagnosis and treatment of COVID-19.

GWAS data of COVID-19

Blood samples of 1626 COVID-19 patients were collected from Tongji Hospital and Hubei Hospital of Traditional Chinese Medicine (TCM) in Wuhan. For samples from Tongji Hospital, genomic DNA was extracted from thawed whole blood samples with the BioTeke Genomic DNA kit (BioTeke, Beijing, China). For samples from Hubei Hospital of TCM, genomic DNA was extracted from 200 µl of EDTA-treated whole blood using the KingFisher Flex Purification System with the KingFisher Pure DNA Blood Kit (Thermo Fisher Scientific). All extractions were performed under Level III protection in the biosafety III laboratories.

For each sample, 200 ng of DNA was loaded on the Infinium Global Screening Array (GSA, Illumina, San Diego, CA) for genotyping, following the manufacturer’s instructions. Genotype calling was performed using GenomeStudio (v 2.0). After filtering SNPs with a call rate < 0.95, we genotyped 650,630 autosomal SNPs and 27,495 SNPs on the X chromosome. We then performed QC using PLINK (v 2.0)⁵⁸, removing 29 samples with missing rate > 0.1, 3 samples with inbreeding coefficient < −0.1, 76 duplicated samples, and 22 samples with the discrepancy between the inferred sex and the recorded sex (Supplementary Fig. 4). Finally, we kept SNPs with Hardy-Weinberg equilibrium (HWE) P > 10⁻⁶, including 649,431 autosomal SNPs and 27,479 X chromosomal SNPs. HWE tests for the X chromosomal SNPs were based on females.

GWAS data of population controls

We used existing genotyping data of a Coke Oven Worker (COW) cohort in Wuhan as population controls, whose ancestry background is similar to our COVID-19 patient samples collected in Wuhan⁵⁹. In total, 1477 individuals were genotyped using the GSA array in 2018. After excluding 18 samples with call rate < 0.9, 9 potentially contaminated samples (inbreeding coefficient < −0.1), 49 second-degree and above related samples (kinship coefficient φ > 0.088), 1401 individuals (1207 males and 194 females) were included as population controls. We removed SNPs with call rate <0.95, HWE P < 10⁻⁶, and monomorphic variants, and lift over the remaining SNPs from human reference genome GRCh37 to GRCh38, leaving 476,578 autosomal SNPs and 11,082 SNPs on chromosome X.

Imputation

We merged our COVID-19 GSA dataset and the COW dataset by extracting 436,444 autosomal and 10,528 chromosome X SNPs genotyped in both datasets. We further excluded SNPs with MAF < 0.005 in either dataset, leaving 369,072 autosomal and 8942 chromosome X SNPs for imputation.

We constructed an imputation reference panel by combing the 1000 Genomes Project (1KGP) dataset²³ and our in-house WGS data of COVID-19 patients from Tongji Hospital (see the “WGS data and analysis” section below). We first extracted the intersecting variants of 1KGP and our WGS dataset and used EAGLE2 (v 2.3.5) to phase our WGS samples with haplotypes from 1KGP as the reference⁶⁰. We then combined the phased haplotypes of our WGS samples and the East Asian samples from 1KGP to form a reference panel to phase and impute our array genotyping datasets. Imputation was performed using Minimac4⁶¹. For the X chromosome, pseudo-autosomal regions (PARs) were excluded and non-PARs were imputed separately for males and females. After removing variants with MAF < 0.01 and imputation R² < 0.8, 6,019,210 autosomal variants and 132,535 on the X chromosome remained for downstream analyses.

Cryptic relatedness and population structure

We filtered variants with MAF < 0.05, and pruned the merged COVID-19 and COW genotyping data to have linkage disequilibrium (LD) r² < 0.5, resulting in 157,968 autosomal SNPs⁵⁸. Using this set of SNPs, we inferred genetic relatedness using KING (v 2.2.5) and determined relatedness types based on the estimated kinship coefficients φ and the probability of zero-IBD-sharing π₀⁶². We identified 31 first-degree and 7 second-degree related pairs in the COVID-19 samples (Supplementary Fig. 5). After excluding close relatedness up to the second degree (φ > 0.088), we performed principal components analysis (PCA) on the combined COVID-19 and COW datasets. No systematic ancestral or batch effect differences were observed in the top PCs between cases and controls from different sources (Supplementary Fig. 6).

Association tests and meta-analysis

We performed association analysis using the EPACTS software on the imputed dosage data, which accounts for the imputation uncertainty. After removing close relatedness and 5 patients with missing severity information, we tested for the single-variant association on 598 severe/critical COVID-19 cases versus 2260 controls. Effect sizes and P values were derived from Wald tests under a logistic model, adjusting for the first two PCs of population structure. For the analysis of chromosome X, we treated the non-PAR variants as homozygotes for males and included sex as an additional covariate. We also performed genome-wide association analysis further adjusting for age and sex.

We downloaded summary statistics of the COVID-19 HGI B2_release3 analysis of 3199 hospitalized COVID-19 patients versus 897,488 population controls, who were primarily Europeans. There were very few Asian samples in the B2_release3 (only 62 South Asian cases) and we thus did not request for Asian-specific statistics. We performed random-effects meta-analysis using the RE2 model implemented in METASOFT⁶³. We chose the B2 dataset rather than the A2 dataset from HGI because the cases in A2 were defined as very severe confirmed COVID-19 cases that required respiratory support more than simple supplementary oxygen, a much stronger case definition than the definition of severe cases in China. Considering that only patients with severe clinical symptoms were recommended for hospitalization in most western countries during the early phase of the pandemic, the case definition of hospitalized COVID-19 patients in B2 dataset aligned better with the severe patient definition in China. We reported meta-analysis results, as well as the I² index of heterogeneity across datasets and the corresponding P-value P_het. We visualized regional association results using the LocusZoom software with LD information based on Chinese WGS samples⁶⁴.

WGS data and analysis

Blood samples of 474 severe and 667 mild COVID-19 patients were sequenced at BGI, Shenzhen. These samples were from three sources, including 305 (mild/severe: 211/94) samples from Tongji Hospital in Wuhan, 467 (170/297) from Union Hospital in Wuhan, and 369 (286/83) from the Third People’s Hospital of Shenzhen after excluding related and duplicated samples²². For those from Tongji Hospital and the Third People’s Hospital of Shenzhen, genomic DNA was extracted from frozen blood samples using Magnetic Beads Blood Genomic DNA Extraction Kit (MGI, Shenzhen, China). Around 0.5 μg DNA was used for creating the WGS library for each patient. For those from Union Hospital, circulating cell-free DNA (cfDNA) was extracted from 200 μL plasma using MagPure Circulating DNA Mini KF Kit (MD5432-02) following the manufacturer’s instructions. The cfDNA was eluted by 200 μL TE buffer for QC and 40 μL for the rest. The extracted cfDNA was processed to library construction using MGIEasy Cell-free DNA Library Prep kit (MGI, cat. No.: AA00226). After library preparation, all samples were sequenced by the DNBSEQ platform (MGI, Shenzhen, China) to generate 100bp paired-end reads. The mean sequencing depth was 45.0× for those from the Third People’s Hospital of Shenzhen, 33.3× for those from Tongji Hospital, 17.8× for those from Union Hospital.

To minimize batch effects, we performed joint calling and quality controls for all three datasets together. We used Sentieon (sentieon-genomics-201911) for alignment and variant detection following the best practices (https://gatk.broadinstitute.org/hc/en-us/sections/360007226651-Best-Practices-Workflows)⁶⁵. Briefly, sequence reads were mapped to GRCh38 using BWA⁶⁶. For each sample, after duplication removal, INDEL realignment and base quality score recalibration, SNPs and INDELs were detected using the Sentieon Haplotyper algorithm with option “--emit_mode gvcf” to generate an individual GVCF file. Then the GVCF files for all samples were subjected to Sentieon GVCFtyper algorithm for joint variant calling. Variant Quality Score Recalibration (VQSR) was performed using GATK (v 4.1.2). Reference-free LD-based genotype refinement was performed using BEAGLE (v 4.0)⁶⁷, which took the genotype uncertainty into account with the -gl flag. Variants with DR2 < 0.8 or HWE P < 10^-6 were excluded from downstream analysis.

To boost statistical power, we searched for additional population controls from an existing WGS dataset in BGI, Shenzhen, consisting of 1872 unrelated individuals sequenced to a mean depth of 40.0×. Sequencing and genotype calling for this dataset follow the standard WGS protocol as described above except that no LD-based refinement was performed given the high sequencing depth. We combined two call sets by extracting 8,673,249 shared biallelic variants after excluding variants with minor allele counts MAC < 5 or missing rate > 0.05 in either set, or HWE P < 10⁻⁶ in the combined set. We then performed PCA using 539,603 autosomal biallelic SNPs with MAF > 0.05 and LD r² < 0.5⁵⁸. For each of our 474 severe COVID-19 cases, we identified two ancestry-matched population controls based on Euclidean distances in the first two PCs using the optmatch R package^68,69. Thus, the final WGS dataset consists of 474 cases with severe symptoms, 667 controls with mild symptoms, and 948 ancestry-matched populations controls. Again, no systematic differences between samples from different batches were found in the top PCs (Supplementary Fig. 7).

Despite the very stringent quality controls described above, residual batch effects might persist because samples from Union Hospital were sequenced at 17.8× using cfDNA. We, therefore, included an indicator variable for the cfDNA batch and the first two PCs as covariates in our logistic model for association tests between 474 severe cases and 1615 controls (mild patients and population controls). Genomic inflation factor was λ_GC = 1.001, indicating overall well-controlled batch effects and population stratification. Furthermore, we performed additional association tests on 667 mild patients versus 948 population controls, adjusting for the top two PCs and the cfDNA batch indicator, and identified 1869 variants with P < 10⁻⁶. We conservatively excluded these variants from our final results because they might be subject to residual batch effects.

Clinical measurements

We analyzed the serum level of inflammatory biomarkers, including interleukin 1 beta (IL-1β), interleukin 2 receptor (IL-2R), interleukin 6 (IL-6), interleukin 8 (IL-8), interleukin 10 (IL-10), tumor necrosis factor-alpha (TNF-α), complements C3 and C4, and C-reactive protein (CRP) for hospitalized patients from Tongji Hospital, including both mild and severe patients. For patients with measurements at multiple time points, we used the earliest measurement after hospitalization. The majority of the measurements were taken in the first week of hospitalization. We compared these clinical measurements in different groups of samples defined by COVID-19 severity, age, sex, and genotypes of GWAS top SNPs using the Wilcoxon rank-sum test and Spearman’s correlation.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

Summary statistics of the association tests in our Chinese samples have been deposited in the China National Genebank Sequence Archive (https://db.cngb.org/cnsa/) with accession number CNP0001981. Individual-level genotype data are not publicly available due to the protection of privacy and regulations. Source data for Fig. 2 is provided in Supplementary Data 1.

Code availability

Details regarding the packages and versions used are included in “Methods”.

References

Gudbjartsson, D. F. et al. Humoral immune response to SARS-CoV-2 in iceland. N. Engl. J. Med. 383, 1724–1734 (2020).
Article CAS PubMed Google Scholar
Pollan, M. et al. Prevalence of SARS-CoV-2 in Spain (ENE-COVID): a nationwide, population-based seroepidemiological study. Lancet 396, 535–544 (2020).
Article CAS PubMed PubMed Central Google Scholar
Hao, X. J. et al. Reconstruction of the full transmission dynamics of COVID-19 in Wuhan. Nature 584, 420–424 (2020).
Article CAS PubMed Google Scholar
Havers, F. P. et al. Seroprevalence of antibodies to SARS-CoV-2 in 10 sites in the United States, March 23-May 12, 2020. JAMA Intern. Med. https://doi.org/10.1001/jamainternmed.2020.4130 (2020).
Grasselli, G. et al. Baseline characteristics and outcomes of 1591 patients infected with SARS-CoV-2 admitted to ICUs of the Lombardy Region, Italy. J. Am. Med. Assoc. 323, 1574–1581 (2020).
Article CAS Google Scholar
Richardson, S., Hirsch, J. S. & Narasimhan, M. Presenting characteristics, comorbidities, and outcomes among 5700 patients hospitalized with COVID-19 in the New York City Area. J. Am. Med. Assoc. 323, 2098–2098 (2020).
Article CAS Google Scholar
Dong, E., Du, H. & Gardner, L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect. Dis. 20, 533–534 (2020).
Article CAS PubMed PubMed Central Google Scholar
Li, X. et al. Risk factors for severity and mortality in adult COVID-19 inpatients in Wuhan. J. Allergy Clin. Immunol. 146, 110–118 (2020).
Article CAS PubMed PubMed Central Google Scholar
Zhou, F. et al. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. Lancet 395, 1054–1062 (2020).
Article CAS PubMed PubMed Central Google Scholar
Cunningham, J. W. et al. Clinical outcomes in young US adults hospitalized with COVID-19. JAMA Intern. Med. e205313, https://doi.org/10.1001/jamainternmed.2020.5313 (2020).
Samson, M. et al. Resistance to HIV-1 infection in caucasian individuals bearing mutant alleles of the CCR-5 chemokine receptor gene. Nature 382, 722–725 (1996).
Article CAS PubMed Google Scholar
Timmann, C. et al. Genome-wide association study indicates two novel resistance loci for severe malaria. Nature 489, 443–446 (2012).
Article CAS PubMed Google Scholar
Cheng, Y. et al. ABO blood group and susceptibility to severe acute respiratory syndrome. J. Am. Med. Assoc. 293, 1450–1451 (2005).
CAS Google Scholar
Hu, Z. et al. New loci associated with chronic hepatitis B virus infection in Han Chinese. Nat. Genet. 45, 1499–1503 (2013).
Article CAS PubMed Google Scholar
The Severe Covid-19 GWAS Group. Genomewide association study of severe Covid-19 with respiratory failure. N. Engl. J. Med. 383, 1522–1534 (2020).
Article Google Scholar
The COVID-19 Host Genetics Initiative. The COVID-19 Host Genetics Initiative, a global initiative to elucidate the role of host genetic factors in susceptibility and severity of the SARS-CoV-2 virus pandemic. Eur. J. Hum. Genet. 28, 715–718 (2020).
Article PubMed Central CAS Google Scholar
Pairo-Castineira, E. et al. Genetic mechanisms of critical illness in Covid-19. Nature https://doi.org/10.1038/s41586-020-03065-y (2020).
Van der Made, C. I. et al. Presence of genetic variants among young men with severe COVID-19. J. Am. Med. Assoc. 324, 663–673 (2020).
Article CAS Google Scholar
Zhang, Q. et al. Inborn errors of type I IFN immunity in patients with life-threatening COVID-19. Science 370, eabd4570 (2020).
Article CAS PubMed PubMed Central Google Scholar
Martin, A. R. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat. Genet. 51, 584–591 (2019).
Article CAS PubMed PubMed Central Google Scholar
Zeberg, H. & Pääbo, S. The major genetic risk factor for severe COVID-19 is inherited from Neanderthals. Nature https://doi.org/10.1038/s41586-020-2818-3 (2020).
Wang, F. et al. Initial whole-genome sequencing and analysis of the host genetic contribution to COVID-19 severity and susceptibility. Cell Discov. 6, 83 (2020).
Article CAS PubMed PubMed Central Google Scholar
The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Article CAS Google Scholar
Bugert, P., Rink, G., Kemp, K. & Kluter, H. Blood group ABO genotyping in paternity testing. Transfus. Med. Hemoth 39, 182–186 (2012).
Article Google Scholar
Shelton, J. F. et al. Trans-ancestry analysis reveals genetic and nongenetic associations with COVID-19 susceptibility and severity. Nat. Genet. 53, 801–808 (2021).
Article CAS PubMed Google Scholar
Encode Project Consortium. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710 (2020).
Article CAS Google Scholar
Kerpedjiev, P. et al. HiGlass: web-based visual exploration and analysis of genome interaction maps. Genome Biol. 19, 125 (2018).
Article PubMed PubMed Central CAS Google Scholar
The GTEx Consortium. The genotype-tissue expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015).
Article CAS Google Scholar
Dai, J. et al. Identification of risk loci and a polygenic risk score for lung cancer: a large-scale prospective cohort study in Chinese populations. Lancet Respir. Med. 7, 881–891 (2019).
Article PubMed PubMed Central Google Scholar
Fishilevich, S. et al. GeneHancer: genome-wide integration of enhancers and target genes in GeneCards. Database (Oxf.) 2017, bax028 (2017).
Google Scholar
COVID-19 Host Genetics Initiative. Mapping the human genetic architecture of COVID-19. Nature https://doi.org/10.1038/s41586-021-03767-x (2021).
Wiehagen, K. R. et al. Foxp4 is dispensable for T cell development, but required for robust recall responses. PLoS One 7, e42273 (2012).
Article CAS PubMed PubMed Central Google Scholar
Braun, J. et al. SARS-CoV-2-reactive T cells in healthy donors and patients with COVID-19. Nature 587, 270–274 (2020).
Article CAS PubMed Google Scholar
Le Bert, N. et al. SARS-CoV-2-specific T cell immunity in cases of COVID-19 and SARS, and uninfected controls. Nature 584, 457–462 (2020).
Article CAS PubMed Google Scholar
Li, S. R. et al. Foxp1/4 control epithelial cell fate during lung development and regeneration through regulation of anterior gradient 2. Development 139, 2500–2509 (2012).
Article CAS PubMed PubMed Central Google Scholar
Li, J. Y. et al. Association between ABO blood groups and risk of SARS-CoV-2 pneumonia. Brit J. Haematol. 190, 24–27 (2020).
Article CAS Google Scholar
Zhao, J. et al. Relationship between the ABO Blood Group and the COVID-19 Susceptibility. Clin. Infect. Dis. ciaa1150, https://doi.org/10.1093/cid/ciaa1150 (2020).
Ferreira, M. A. R. et al. Eleven loci with new reproducible genetic associations with allergic disease risk. J. Allergy Clin. Immunol. 143, 691–699 (2019).
Article CAS PubMed Google Scholar
Pickrell, J. K. et al. Detection and interpretation of shared genetic influences on 42 human traits. Nat. Genet. 48, 709–717 (2016).
Article CAS PubMed PubMed Central Google Scholar
Schymick, J. C. et al. Genome-wide genotyping in amyotrophic lateral sclerosis and neurologically normal controls: first stage analysis and public release of data. Lancet Neurol. 6, 322–328 (2007).
Article CAS PubMed Google Scholar
Potthoff, M. J. & Olson, E. N. MEF2: a central regulator of diverse developmental programs. Development 134, 4131–4140 (2007).
Article CAS PubMed Google Scholar
Machado, A. C. D. et al. Landscape of DNA binding signatures of myocyte enhancer factor-2B reveals a unique interplay of base and shape readout. Nucleic Acids Res. 48, 8529–8544 (2020).
Article CAS Google Scholar
Herglotz, J. et al. Essential control of early B-cell development by Mef2 transcription factors. Blood 127, 572–581 (2016).
Article CAS PubMed Google Scholar
Brescia, P. et al. MEF2B instructs germinal center development and acts as an oncogene in B cell lymphomagenesis. Cancer Cell 34, 453–465 (2018).
Article CAS PubMed PubMed Central Google Scholar
Pan, F., Ye, Z., Cheng, L. & Liu, J. O. Myocyte enhancer factor 2 mediates calcium-dependent transcription of the interleukin-2 gene in T lymphocytes: a calcium signaling module that is distinct from but collaborates with the nuclear factor of activated T cells (NFAT). J. Biol. Chem. 279, 14477–14480 (2004).
Article CAS PubMed Google Scholar
Esau, C. et al. Deletion of calcineurin and myocyte enhancer factor 2 (MEF2) binding domain of Cabin1 results in enhanced cytokine gene expression in T cells. J. Exp. Med. 194, 1449–1459 (2001).
Article CAS PubMed PubMed Central Google Scholar
Clark, R. I. et al. MEF2 is an in vivo immune-metabolic switch. Cell 155, 435–447 (2013).
Article CAS PubMed PubMed Central Google Scholar
Frodsham, A. J. et al. Class II cytokine receptor gene cluster is a major locus for hepatitis B persistence. Proc. Natl Acad. Sci. USA 103, 9148–9153 (2006).
Article CAS PubMed PubMed Central Google Scholar
Jostins, L. et al. Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease. Nature 491, 119–124 (2012).
Article CAS PubMed PubMed Central Google Scholar
Dominguez-Cruz, M. G. et al. Pilot genome-wide association study identifying novel risk loci for type 2 diabetes in a Maya population. Gene 677, 324–331 (2018).
Article CAS PubMed Google Scholar
Bucciol, G. et al. Lessons learned from the study of human inborn errors of innate immunity. J. Allergy Clin. Immun. 143, 507–527 (2019).
Article CAS PubMed Google Scholar
Duncan, C. et al. Human IFNAR2 deficiency: lessons for antiviral immunity. Clin. Exp. Immunol. 182, 1–2 (2015).
Google Scholar
Kenyan Bacteraemia Study, G. et al. Polymorphism in a lincRNA associates with a doubled risk of pneumococcal bacteremia in kenyan children. Am. J. Hum. Genet. 98, 1092–1100 (2016).
Article CAS Google Scholar
Bramswig, N. C. et al. Heterozygous HNRNPU variants cause early onset epilepsy and severe intellectual disability. Hum. Genet. 136, 821–834 (2017).
Article CAS PubMed Google Scholar
Depienne, C. et al. Genetic and phenotypic dissection of 1q43q44 microdeletion syndrome and neurodevelopmental phenotypes associated with mutations in ZBTB18 and HNRNPU. Eur. J. Hum. Genet. 26, 324–325 (2018).
Article Google Scholar
Valente, S. T. & Goff, S. P. Inhibition of HIV-1 gene expression by a fragment of hnRNP U. Mol. Cell 23, 597–605 (2006).
Article CAS PubMed Google Scholar
Pan, A. et al. Association of public health interventions with the epidemiology of the COVID-19 outbreak in Wuhan, China. J. Am. Med. Assoc. 323, 1915–1923 (2020).
Article CAS Google Scholar
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).
Article PubMed PubMed Central CAS Google Scholar
Wei, W. et al. Lead exposure and its interactions with oxidative stress polymorphisms on lung function impairment: Results from a longitudinal population-based study. Environ. Res. 187, 109645 (2020).
Article CAS PubMed Google Scholar
Loh, P. R. et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat. Genet. 48, 1443–1448 (2016).
Article CAS PubMed PubMed Central Google Scholar
Das, S. et al. Next-generation genotype imputation service and methods. Nat. Genet. 48, 1284–1287 (2016).
Article CAS PubMed PubMed Central Google Scholar
Manichaikul, A. et al. Robust relationship inference in genome-wide association studies. Bioinformatics 26, 2867–2873 (2010).
Article CAS PubMed PubMed Central Google Scholar
Han, B. & Eskin, E. Random-effects model aimed at discovering associations in meta-analysis of genome-wide association studies. Am. J. Hum. Genet. 88, 586–598 (2011).
Article CAS PubMed PubMed Central Google Scholar
Pruim, R. J. et al. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics 26, 2336–2337 (2010).
Article CAS PubMed PubMed Central Google Scholar
Freed, D., Aldana, R., Weber, J. A. & Edwards, J. S. The Sentieon Genomics Tools—a fast and accurate solution to variant calling from next-generation sequence data. Preprint at bioRxiv https://doi.org/10.1101/115717 (2017).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Article CAS PubMed PubMed Central Google Scholar
Browning, B. L. & Yu, Z. X. Simultaneous genotype calling and haplotype phasing improves genotype accuracy and reduces false-positive associations for genome-wide association studies. Am. J. Hum. Genet. 85, 847–861 (2009).
Article CAS PubMed PubMed Central Google Scholar
Hansen, B. B. & Klopfer, S. O. Optimal full matching and related designs via network flows. J. Comput. Graph Stat. 15, 609–627 (2006).
Article Google Scholar
Wang, C. et al. Ancestry estimation and control of population stratification for sequence-based association studies. Nat. Genet. 46, 409–415 (2014).
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

We thank the COVID-19 Host Genetic Initiative for sharing the HGI B2_release3 summary statistics, and the China National GeneBank for computational support. This work was funded by Wuhan Municipal Health Commission (EG20B03), Natural Science Foundation of China (81973148, 82003561, 32000398, and 31900487), Natural Science Foundation of Guangdong Province (2017A030306026), and Guangdong-Hong Kong Joint Laboratory on Immunological and Genetic Kidney Diseases (2019B121205005).

Author information

These authors contributed equally: Peng Wu, Lin Ding, Xiaodong Li, Siyang Liu, Fanjun Cheng, Qing He.
These authors jointly supervised this work: Xin Jin, Feng Wang, Chaoyang Sun, Chaolong Wang.

Authors and Affiliations

Department of Obstetrics and Gynecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
Peng Wu, Ping Wu, Wencheng Ding, Yu Fu, Kezhen Li, Gang Chen & Chaoyang Sun
National Medical Center for Major Public Health Events, Huazhong University of Science and Technology, Wuhan, China
Peng Wu, Ping Wu, Hongyan Hou, Wencheng Ding, Yu Fu, Kezhen Li, Gang Chen, Ziyong Sun, Feng Wang, Chaoyang Sun & Chaolong Wang
Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
Lin Ding, Minghui Jiang, Minghan Qu, Xian Shi, Yu Yan, Kai Wang, Degang Wu, Xingjie Hao, Shanshan Cheng & Chaolong Wang
Ministry of Education Key Laboratory of Environment and Health, State Key Laboratory of Environmental Health (Incubating), School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
Lin Ding, Minghui Jiang, Pinpin Long, Hao Wang, Minghan Qu, Xian Shi, Qin Jiang, Tingting Mo, Wending Li, Kang Liu, Rong Peng, Xuedan Xu, Yuhui Lin, Hui Gao, Limei Shi, Ziyue Geng, Xuanwen Mu, Yu Yan, Kai Wang, Degang Wu, Xingjie Hao, Shanshan Cheng, Gaokun Qiu, Huan Guo & Chaolong Wang
Hepatic Disease Institute, Hubei Key Laboratory of Theoretical and Applied Research of Liver and Kidney in Traditional Chinese Medicine, Hubei Provincial Hospital of Traditional Chinese Medicine, Wuhan, China
Xiaodong Li, Mingzhong Xiao, Yana Zhou, Qing Zhang, Jia Ke, Xi Xu, Wei Ni, Zuoyu Shao & Jingzhi Wang
Hubei Provincial Academy of Traditional Chinese Medicine, Wuhan, China
Xiaodong Li, Mingzhong Xiao, Yana Zhou, Qing Zhang, Jia Ke, Xi Xu, Wei Ni, Zuoyu Shao & Jingzhi Wang
School of Public Health (Shenzhen), Sun Yat-sen University, Shenzhen, Guangdong, China
Siyang Liu
Department of Hematology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
Fanjun Cheng
The Third People’s Hospital of Shenzhen, National Clinical Research Center for Infectious Disease, The Second Affiliated Hospital of Southern University of Science and Technology, Shenzhen, China
Qing He, Fang Wang & Lei Liu
Department of Laboratory Medicine, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
Hongyan Hou, Ziyong Sun & Feng Wang
Department of Occupational and Environmental Health, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
Pinpin Long, Hao Wang, Qin Jiang, Tingting Mo, Wending Li, Kang Liu, Rong Peng, Xuedan Xu, Yuhui Lin, Hui Gao, Limei Shi, Ziyue Geng, Xuanwen Mu, Gaokun Qiu & Huan Guo
Hubei Provincial Center for Disease Control and Prevention, Wuhan, China
Linlin Liu, Shi Han, Xixiang Huo & Yingchun Zeng
College of Life Sciences, University of Chinese Academy of Sciences, Beijing, China
Panhong Liu & Zilong Li
Department of Emergency, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
Yan Jin
Department of Pediatrics, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
Fang Zheng
Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
Xihong Lin
Department of Statistics, Harvard University, Cambridge, MA, USA
Xihong Lin
Broad Institute of MIT and Harvard, Cambridge, MA, USA
Xihong Lin
School of Medicine, South China University of Technology, Guangzhou, China
Xin Jin

Authors

Peng Wu
View author publications
You can also search for this author in PubMed Google Scholar
Lin Ding
View author publications
You can also search for this author in PubMed Google Scholar
Xiaodong Li
View author publications
You can also search for this author in PubMed Google Scholar
Siyang Liu
View author publications
You can also search for this author in PubMed Google Scholar
Fanjun Cheng
View author publications
You can also search for this author in PubMed Google Scholar
Qing He
View author publications
You can also search for this author in PubMed Google Scholar
Mingzhong Xiao
View author publications
You can also search for this author in PubMed Google Scholar
Ping Wu
View author publications
You can also search for this author in PubMed Google Scholar
Hongyan Hou
View author publications
You can also search for this author in PubMed Google Scholar
Minghui Jiang
View author publications
You can also search for this author in PubMed Google Scholar
Pinpin Long
View author publications
You can also search for this author in PubMed Google Scholar
Hao Wang
View author publications
You can also search for this author in PubMed Google Scholar
Linlin Liu
View author publications
You can also search for this author in PubMed Google Scholar
Minghan Qu
View author publications
You can also search for this author in PubMed Google Scholar
Xian Shi
View author publications
You can also search for this author in PubMed Google Scholar
Qin Jiang
View author publications
You can also search for this author in PubMed Google Scholar
Tingting Mo
View author publications
You can also search for this author in PubMed Google Scholar
Wencheng Ding
View author publications
You can also search for this author in PubMed Google Scholar
Yu Fu
View author publications
You can also search for this author in PubMed Google Scholar
Shi Han
View author publications
You can also search for this author in PubMed Google Scholar
Xixiang Huo
View author publications
You can also search for this author in PubMed Google Scholar
Yingchun Zeng
View author publications
You can also search for this author in PubMed Google Scholar
Yana Zhou
View author publications
You can also search for this author in PubMed Google Scholar
Qing Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Jia Ke
View author publications
You can also search for this author in PubMed Google Scholar
Xi Xu
View author publications
You can also search for this author in PubMed Google Scholar
Wei Ni
View author publications
You can also search for this author in PubMed Google Scholar
Zuoyu Shao
View author publications
You can also search for this author in PubMed Google Scholar
Jingzhi Wang
View author publications
You can also search for this author in PubMed Google Scholar
Panhong Liu
View author publications
You can also search for this author in PubMed Google Scholar
Zilong Li
View author publications
You can also search for this author in PubMed Google Scholar
Yan Jin
View author publications
You can also search for this author in PubMed Google Scholar
Fang Zheng
View author publications
You can also search for this author in PubMed Google Scholar
Fang Wang
View author publications
You can also search for this author in PubMed Google Scholar
Lei Liu
View author publications
You can also search for this author in PubMed Google Scholar
Wending Li
View author publications
You can also search for this author in PubMed Google Scholar
Kang Liu
View author publications
You can also search for this author in PubMed Google Scholar
Rong Peng
View author publications
You can also search for this author in PubMed Google Scholar
Xuedan Xu
View author publications
You can also search for this author in PubMed Google Scholar
Yuhui Lin
View author publications
You can also search for this author in PubMed Google Scholar
Hui Gao
View author publications
You can also search for this author in PubMed Google Scholar
Limei Shi
View author publications
You can also search for this author in PubMed Google Scholar
Ziyue Geng
View author publications
You can also search for this author in PubMed Google Scholar
Xuanwen Mu
View author publications
You can also search for this author in PubMed Google Scholar
Yu Yan
View author publications
You can also search for this author in PubMed Google Scholar
Kai Wang
View author publications
You can also search for this author in PubMed Google Scholar
Degang Wu
View author publications
You can also search for this author in PubMed Google Scholar
Xingjie Hao
View author publications
You can also search for this author in PubMed Google Scholar
Shanshan Cheng
View author publications
You can also search for this author in PubMed Google Scholar
Gaokun Qiu
View author publications
You can also search for this author in PubMed Google Scholar
Huan Guo
View author publications
You can also search for this author in PubMed Google Scholar
Kezhen Li
View author publications
You can also search for this author in PubMed Google Scholar
Gang Chen
View author publications
You can also search for this author in PubMed Google Scholar
Ziyong Sun
View author publications
You can also search for this author in PubMed Google Scholar
Xihong Lin
View author publications
You can also search for this author in PubMed Google Scholar
Xin Jin
View author publications
You can also search for this author in PubMed Google Scholar
Feng Wang
View author publications
You can also search for this author in PubMed Google Scholar
Chaoyang Sun
View author publications
You can also search for this author in PubMed Google Scholar
Chaolong Wang
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

C.W., Peng W. and C.S. conceived, designed, and supervised the project. Peng W., X. Li, M.X., Ping W., H.H., H.W., Linlin L., W.D., Y.F., S.H., X. Huo, Y. Zeng, Y. Zhou., Q.Z., J.K., X. Xi., W.N., Z. Shao, J.W., K. Li., G.C., Z. Sun, Feng W. and C.S. contributed samples and clinical data. Peng W., S.L., F.C., Q.H., W.D., Y.F., P. Liu., Z.L., Y.J., F.Z., Fang W., Lei L., K. Li., G.C., X.J. and C.S. contributed WGS data. X.J. coordinated sequencing experiments and analysis. H. Guo contributed GWAS data of controls. L.D., Ping W., M.J., P. Long, M.Q., X.S., Q.J., T.M., S.H., W.L., K. Liu, R.P., Xuedan X., Y.L., H. Gao, L.S., Z.G., X.M. and G.Q. conducted DNA extraction and genotyping experiments. L.D., S.L., M.J., M.Q., X.S., Y.Y., K.W., D.W., X. Hao, S.C., X. Lin, and C.W. performed bioinformatic and statistical analyses. C.W., Peng W., C.S., and X. Hao wrote the first draft. All authors reviewed, revised, and approved the manuscript.

Corresponding authors

Correspondence to Xin Jin, Feng Wang, Chaoyang Sun or Chaolong Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Communications Biology thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editor: Brooke LaFlamme.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Description of Additional Supplementary File

Supplementary Data 1

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Wu, P., Ding, L., Li, X. et al. Trans-ethnic genome-wide association study of severe COVID-19. Commun Biol 4, 1034 (2021). https://doi.org/10.1038/s42003-021-02549-5

Download citation

Received: 13 May 2021
Accepted: 12 August 2021
Published: 31 August 2021
DOI: https://doi.org/10.1038/s42003-021-02549-5

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Subjects

Abstract

Similar content being viewed by others

Initial whole-genome sequencing and analysis of the host genetic contribution to COVID-19 severity and susceptibility

Integrative genomic analyses identify susceptibility genes underlying COVID-19 hospitalization

Host genetic factors of COVID-19 susceptibility and disease severity in a Thai population

Introduction

Results

Discussion

Methods

Ethics statement

Phenotype definition

GWAS data of COVID-19

GWAS data of population controls

Imputation

Cryptic relatedness and population structure

Association tests and meta-analysis

WGS data and analysis

Clinical measurements

Reporting summary

Data availability

Code availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Competing interests

Additional information

Supplementary information

Supplementary Information

Description of Additional Supplementary File

Supplementary Data 1

Reporting Summary

Rights and permissions

About this article

Cite this article

Share this article

Comments

Search

Quick links