Initial Study of Human Genetic Contribution to COVID-19 Severity and Susceptibility

The COVID-19 pandemic has accounted for more than five million infections and hundreds of thousands of deaths worldwide in the past six months. Patients show great diversity in clinical and laboratory manifestations and in disease severity. Nonetheless, little is known about the host genetic contribution to the observed inter-individual phenotypic variability. Here, we report the first host genetic study in China, based on deep sequencing and analysis of 332 COVID-19 patients from the Shenzhen Third People's Hospital categorized by varying levels of severity. Based on a total of 22.2 million genetic variants, we conducted both single-variant and gene-based association tests among the five severity groups (asymptomatic, mild, moderate, severe and critically ill patients) after correcting for potential confounding factors. The gene locus most significantly associated with severity is located in TMEM189-UBE2V1, which is involved in the IL-1 signaling pathway. The p.Val197Met missense variant, which affects the stability of the TMPRSS2 protein, shows a lower allele frequency among severe patients than among mild patients and the general population. We also found that the HLA-A*11:01, B*51:01 and C*14:02 alleles significantly predispose patients to the worst outcome. This initial study of Chinese patients provides a comprehensive view of the genetic differences among the COVID-19 patient groups and highlights genes and variants that may help guide targeted efforts in containing the outbreak. Limitations and advantages of the study are also reviewed to guide future international efforts to elucidate the genetic architecture of host-pathogen interaction for COVID-19 and other infectious and complex diseases.


It has been more than 100 years since the 1918 influenza outbreak killed at least fifty million people worldwide 1. Now we are facing another pandemic.

Since late December 2019, the 2019 novel coronavirus disease (COVID-19) has spread rapidly throughout the world, resulting in more than five million confirmed cases and hundreds of thousands of deaths in less than six months 2,3. The disease is caused by infection with a novel enveloped RNA betacoronavirus named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which is the seventh coronavirus species that causes respiratory disease in humans 4,5. The virus causes serious respiratory illnesses such as pneumonia, lung failure and even death 6. To date, no specific therapeutic or vaccine is available for its control. Continuing epidemiological and molecular biological study to better understand, treat and

respiratory failure [8][9][10][11]. Patients with severe disease had more prominent laboratory abnormalities, including lymphocytopenia and leukopenia, than those with non-severe disease 12,13. In addition, not all people exposed to SARS-CoV-

A genome-wide association test on array data from UK Biobank participants with positive and negative PCR tests also reveals a few suggestive genes 27.

The COVID-19 host genetics initiative was established to encourage generation,

Clinical and laboratory features of the 332 hospitalized COVID-19 patients

The 332 recruited patients with laboratory-confirmed SARS-CoV-2 infection were quarantined and treated in the Shenzhen Third People's Hospital.

We extracted and analyzed the clinical symptoms, laboratory assessment,

medRxiv preprint doi: https://doi.org/10.1101/2020.06.09.20126607; this version posted June 11, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Table S1.

time duration between the reported disease onset and the first laboratory-confirmed negative PCR test (N=233) (Figure 1D). Power analysis indicates that, at 80% statistical power and the current sample size, we are able to identify associations between genotypes and phenotypes for variants with a minor allele frequency greater than 0.2 and a relative genetic risk greater than 2 for the dichotomous trait, and similarly for the quantitative trait (Figure S12). Principal component analysis of the patients suggests little genetic differentiation (Figure S13-14).
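The power statement above can be explored with a rough calculation. The following is a minimal sketch, not the authors' procedure: it assumes an allelic two-proportion z-test with a normal approximation, interprets "relative genetic risk" as an allelic odds ratio, and uses hypothetical case and control counts, since the group sizes are not given in this passage. The function names are illustrative only.

```python
from statistics import NormalDist


def case_allele_freq(control_freq: float, odds_ratio: float) -> float:
    """Allele frequency in cases implied by an allelic odds ratio (assumption)."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def allelic_test_power(n_cases: int, n_controls: int, control_freq: float,
                       odds_ratio: float, alpha: float) -> float:
    """Approximate power of a two-sided allelic z-test.

    Each individual contributes two alleles. Only the tail in the direction
    of the effect is counted, which is the usual approximation.
    """
    norm = NormalDist()
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    se = (p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls)) ** 0.5
    z_effect = abs(p1 - p0) / se
    z_crit = norm.inv_cdf(1 - alpha / 2)
    return norm.cdf(z_effect - z_crit)


if __name__ == "__main__":
    # Hypothetical split of the 332 patients into 100 cases and 232 controls.
    for odds_ratio in (1.5, 2.0, 3.0):
        power = allelic_test_power(100, 232, 0.2, odds_ratio, alpha=5e-8)
        print(f"OR={odds_ratio}: power={power:.3f}")
```

Varying `alpha` between the genome-wide and suggestive thresholds shows how strongly the detectable effect size depends on the significance level at this sample size.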

We tested all 19.6 million QC-passed bi-allelic variants for association with each of the three traits in a logistic or linear regression model that

all the sixty-four laboratory assessments among the patients (Figure 4C).
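As an illustration of the single-variant testing described above, the following is a minimal sketch of a one-degree-of-freedom trend test on additive genotype dosages. It is a simplification and not the authors' pipeline: it omits all covariates, whereas the study corrects for confounding factors. For a dichotomous trait the statistic N·r² is the Cochran-Armitage trend test, which corresponds to the score test of the genotype term in a covariate-free logistic model. The function name is hypothetical.

```python
import math


def trend_test(genotypes, phenotypes):
    """Return (chi2, p) for a 1-df trend test of genotype dosage vs. phenotype.

    genotypes: alternate-allele counts (0, 1, 2) per individual.
    phenotypes: 0/1 for a dichotomous trait, or numeric values.
    """
    n = len(genotypes)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need matching inputs with at least 3 individuals")
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sgg = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((y - mean_y) ** 2 for y in phenotypes)
    sgy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    if sgg == 0 or syy == 0:
        raise ValueError("monomorphic variant or constant phenotype")
    chi2 = n * sgy ** 2 / (sgg * syy)      # N times squared Pearson correlation
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival, 1 df
    return chi2, p_value


if __name__ == "__main__":
    # Toy data, not from the study.
    dosages = [0, 0, 1, 1, 2, 2, 0, 1, 2, 0]
    severe = [0, 0, 0, 1, 1, 1, 0, 0, 1, 0]
    print(trend_test(dosages, severe))
```

In practice, covariate-adjusted models fitted by dedicated association software are needed; this sketch only shows the shape of the per-variant test.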

Therefore, the observed signal is unlikely to be confounded by individual variability in blood cell types. There is no strong genetic association with disease duration (Figure 3C).

We further performed the optimal SKAT gene-based association test on functional variants, comprising a total of 99,166 missense and loss-of-function variants predicted by the Variant Effect Predictor to have high or moderate impact among the patients. The NOA1 gene tends to have a higher mutation burden in the severe group (P = 8.1e-07) (Figure 3D). This gene encodes a GTPase that functions in the mitochondrion and has been associated with platelet count and leukocyte count 45. We did not identify other genes that are genome-wide

between people who are exposed or not exposed to the pathogen. This is

(Figure 6A) even though the inflation was seemingly adequately controlled (Figure S19). In the gene-based association test, we observed significantly different mutation burdens in the immunoglobulin loci (Figure 6B). However, this is not replicated when we compared the COVID-19 patients with the 665 CNRP individuals (Figure 6C-D). Therefore, we inferred that the association signals between the 1KGP and the COVID-19 patients were probably due to

In the single-variant association test between the COVID-19 patients and the CNPR individuals, who were sequenced using the same experimental protocol and tested negative by laboratory PCR, we identified genome-wide significant

We revealed that disease progression after SARS-CoV-2 infection is a complex event rather than one explained by a monogenic model.

The severe and critical patients do not carry causal monogenic variants related

globe, it will be important to identify and study the extreme asymptomatic patients to understand the host factors contributing to good control of the viral infection.

As we and others continue to recruit patients and collect data in China

This work is also an initial step to guide study design regarding the selection of samples, the genetic assay approach, and the bioinformatic and statistical genetic analysis for COVID-19 as well as other infectious and complex diseases.

The publicly available summary statistics will encourage international collaborative efforts to understand the host-pathogen interaction and to contain the COVID-19 outbreak.

COVID-19 in China and the Chinese CDC criteria 6, the patients were diagnosed as asymptomatic, mild, moderate, severe or critically severe according to the most severe stage they experienced during the disease course. The asymptomatic, mild and moderate groups of patients do not experience pneumonia. When meeting any one of the following criteria, 1) RR>30 2)

We have applied both the rvtest 70 and the SAIGE 71 approaches to carry out

between the 1KGP and the COVID-19 patients as age is not available for the 1KGP data set. Independent loci were defined as significant variants clustered in a 1 Mbp window. The lead SNP was defined as the SNP in the 1 Mbp window with the most significant, i.e., smallest, p value. The genomic inflation factor (GC lambda), attenuation ratio, LD score regression intercept and SNP heritability were estimated using the LD score regression approach 72. The qqman R package was used to generate the Manhattan and Q-Q plots. We defined genome-wide significance as 5e-8 and suggestive significance as 1e-5 for the single-variant association test, and genome-wide significance as 1e-6 for the gene-based association test.

variant genome-wide association test for the sixty-four laboratory assessments at the lead SNP rs6020298. The P-values of the three traits (Severity, Severity score and Disease Duration) in Figure 3 are also displayed.
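The locus definition given in the statistical analysis (significant variants clustered in a 1 Mbp window, with the lead SNP having the smallest p value) can be sketched as a greedy grouping. This is an illustration under stated assumptions rather than the authors' implementation: it assumes the window is measured as distance from the lead SNP on either side, and the `Variant` record, the `lead_snps` function and the example identifiers are hypothetical.

```python
from typing import NamedTuple

GENOME_WIDE = 5e-8      # single-variant genome-wide threshold from the text
WINDOW_BP = 1_000_000   # 1 Mbp window from the text


class Variant(NamedTuple):
    chrom: str
    pos: int
    rsid: str
    pvalue: float


def lead_snps(variants, threshold=GENOME_WIDE, window=WINDOW_BP):
    """Pick one lead SNP per locus among significant variants.

    Variants are visited from smallest to largest p value; a variant starts
    a new locus only if no already-chosen lead on the same chromosome lies
    within `window` base pairs of it.
    """
    significant = sorted((v for v in variants if v.pvalue < threshold),
                         key=lambda v: v.pvalue)
    leads = []
    for v in significant:
        if all(v.chrom != lead.chrom or abs(v.pos - lead.pos) > window
               for lead in leads):
            leads.append(v)
    return sorted(leads, key=lambda v: (v.chrom, v.pos))


if __name__ == "__main__":
    # Made-up summary statistics for illustration only.
    example = [
        Variant("20", 1_000_000, "rsA", 1e-9),
        Variant("20", 1_400_000, "rsB", 1e-10),
        Variant("20", 5_000_000, "rsC", 2e-8),
        Variant("1", 1_200_000, "rsD", 3e-8),
        Variant("1", 2_000_000, "rsE", 1e-6),
    ]
    for lead in lead_snps(example):
        print(lead)
```

Passing `threshold=1e-5` applies the same grouping at the suggestive significance level.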