Genetics of symptom remission in outpatients with COVID-19

We conducted a genome-wide association study of time to remission of COVID-19 symptoms in 1723 outpatients with at least one risk factor for disease severity from the COLCORONA clinical trial. We found a significant association at 5p13.3 (rs1173773; P = 4.94 × 10⁻⁸) near the natriuretic peptide receptor 3 gene (NPR3). By day 15 of the study, 44%, 54% and 59% of participants with 0, 1, or 2 copies of the effect allele, respectively, had symptom remission. In 851 participants not treated with colchicine (placebo), there was a significant association at 9q33.1 (rs62575331; P = 2.95 × 10⁻⁸) in interaction with colchicine (P = 1.19 × 10⁻⁵) without impact on risk of hospitalisations, highlighting a possibly shared mechanistic pathway. By day 15 of the study, 46%, 62% and 64% of those with 0, 1, or 2 copies of the effect allele, respectively, had symptom remission. The findings need to be replicated and could contribute to the biological understanding of COVID-19 symptom remission.


Definition of a set of credible risk variants
We extracted results from the regions significantly associated with time to COVID-19 symptom remission in our GWAS: region 9q33.1, a window extending 500 kb on either side of the top SNP rs62575331 (chr9:115147521-116147521), and region 5p13.3, a window extending 500 kb on either side of the top SNP rs1173773 (chr5:32250877-33250877). For the statistical prioritization in the GWAS for time to remission in placebo participants, we considered both 9q33.1 (Locus 1) and 5p13.3 (Locus 2) using the PAINTOR software. A single locus, 5p13.3 (Locus 3), was used for statistical prioritization in the GWAS for time to remission in both placebo and colchicine arms. We defined credible candidate variants (CCVs) as those located within 500 kb of the most significant SNP in each locus and with P values within two orders of magnitude of that SNP's P value. We selected 6, 16 and 31 CCVs for Locus 1, Locus 2 and Locus 3, respectively (Supplemental Table 2).
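The CCV definition above can be illustrated with a minimal Python sketch. It is not the code used in the study: the function name `select_ccvs`, the record layout, and the three `toy_snp_*` entries are hypothetical, and the lead SNP position is simply taken as the midpoint of the interval stated for 9q33.1. Only the two selection rules (distance of at most 500 kb and a P value within two orders of magnitude of the lead SNP) and the lead SNP's identifier and P value come from the text.

```python
import math


def select_ccvs(variants, window_bp=500_000, max_log10_gap=2.0):
    """Return IDs of variants within window_bp of the lead SNP whose P value
    is within max_log10_gap orders of magnitude of the lead SNP's P value."""
    # The lead SNP is the variant with the smallest P value in the locus.
    lead = min(variants, key=lambda v: v["p"])
    selected = []
    for v in variants:
        same_chrom = v["chrom"] == lead["chrom"]
        in_window = abs(v["pos"] - lead["pos"]) <= window_bp
        log_gap = math.log10(v["p"]) - math.log10(lead["p"])
        if same_chrom and in_window and log_gap <= max_log10_gap:
            selected.append(v["id"])
    return selected


# Toy locus. Only rs62575331 and its P value are from the text; its position
# is assumed to be the midpoint of chr9:115147521-116147521. The other three
# entries are invented for illustration.
toy_locus = [
    {"id": "rs62575331", "chrom": "9", "pos": 115_647_521, "p": 2.95e-8},
    {"id": "toy_snp_a", "chrom": "9", "pos": 115_700_000, "p": 1.0e-6},
    {"id": "toy_snp_b", "chrom": "9", "pos": 115_900_000, "p": 1.0e-5},
    {"id": "toy_snp_c", "chrom": "9", "pos": 116_300_000, "p": 5.0e-8},
]

ccvs = select_ccvs(toy_locus)
print(ccvs)
```

In this toy example, `toy_snp_b` is dropped because its P value is more than two orders of magnitude above the lead P value, and `toy_snp_c` is dropped because it lies more than 500 kb from the lead SNP.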

Conditional analysis
We used the GCTA-COJO software 2 to conduct conditional analysis at Locus 1 and Locus 3 (we considered Locus 2 and Locus 3 as equivalent). We performed stepwise model selection on the candidate SNPs and found a single independent signal in each locus. For Locus 1, rs62575331 was retained as associated with the outcome, with a P value of 5.01 × 10⁻⁸ from the joint analysis of all selected SNPs. For Locus 3, rs1173773 was associated with the outcome, with a P value of 6.37 × 10⁻⁸ from the joint analysis of all selected SNPs.

Statistical prioritization
We converted GRCh38 coordinates to the previous reference build, hg19, and mapped rsIDs from dbSNP 3 using the R packages liftover 4 and myvariant 5. Using PAINTOR (version 3.0), 6 we first performed a multi-locus prioritization with Locus 1 and Locus 2 to identify the most credible causal variants. We ran the PAINTOR model with the compiled library of functional annotations provided by PAINTOR's authors, which includes a large collection of annotations from Roadmap/ENCODE data as well as other regulatory and genic annotations. We also performed a prioritization using Locus 3 alone. Only the analysis with Locus 3 succeeded in identifying a variant with a high posterior probability of being causal: rs1173773 (pp = 0.86).
Quantification of the enrichment of causal variants within functional classes by PAINTOR estimated the baseline annotation effect at 3.39, which sets the baseline prior probability of any SNP in the fine-mapping dataset being causal at 0.033. The most enriched annotation was a chromatin state predicted to have weak transcription in aorta according to data from Bernstein et al. 7 (Table 3).
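The link between the baseline estimate (3.39) and the baseline prior probability (0.033) can be checked with a short calculation. The sketch below assumes that the prior follows the logistic form 1 / (1 + exp(Σ γ_k a_k)), with the baseline annotation equal to 1 for every SNP; this form reproduces the reported figure, but the function name `causal_prior` and its interface are hypothetical and are not part of the PAINTOR software.

```python
import math


def causal_prior(gammas, annotations):
    """Prior probability that a SNP is causal under the assumed logistic
    model: 1 / (1 + exp(sum_k gamma_k * a_k))."""
    linear_term = sum(g * a for g, a in zip(gammas, annotations))
    return 1.0 / (1.0 + math.exp(linear_term))


# Baseline effect reported in the text; the baseline annotation is 1 for all SNPs.
baseline_gamma = 3.39
baseline_prior = causal_prior([baseline_gamma], [1])
print(round(baseline_prior, 3))
```

Under this assumed parameterization, an annotation with a negative effect estimate raises the prior above the baseline for SNPs carrying it, which is how enrichment of an annotation would be expressed.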

Functional prioritization
We combined in silico functional annotation resources from RegulomeDB 9 and DSNetwork 10 for functional prioritization. According to the scoring scheme and the probability of being a regulatory variant 11 estimated by RegulomeDB, the best candidates are rs72764716 (rank = 6, prob = 0.69) for Locus 1 and rs1173733 (rank = 5, prob = 1.00) for Locus 3 (Supplemental Table 3). Regulatory variants were evaluated for potential deleteriousness using the in silico prediction tools provided by DSNetwork. Based on the DSNetwork ranking, the top markers are rs62575331 for Locus 1 (9q33.1) and rs7730564 for Locus 3 (5p13.3) (Supplemental Table 3).

Selection of the best candidate causal variants
Based on the statistical and functional prioritization results, we selected the best candidate variants for each of Locus 1 (9q33.1) and Locus 3 (5p13.3) [...] 17 This association is led by the marker rs7469817, whose C allele is associated with an increase in standing height (P = 2.6 × 10⁻¹⁴), risk of varicose veins (P = 2.6 × 10⁻¹⁰) and forced expiratory volume in 1 second (FEV1) (P = 2.2 × 10⁻⁹), among others. 18 Like our candidate variant, the intergenic variant rs7469817 has been associated with expression of the lncRNA AL355601.

The intronic variant rs1173773 is located in the NPR3 gene. Using gene expression correlation data from Ensembl, 19 we found that this variant was associated with the expression of several genes at P < 0.01, including MTMR12, SUB1, GOLPH3, TARS and NPR3. Most of those genes (4/5) have been associated with anthropometric traits such as height and lung function. 20 In the DIABHYCAR trial (Noninsulin-dependent Diabetes, Hypertension, Microalbuminuria or Proteinuria, Cardiovascular Events, and Ramipril; 3,126 French participants), rs1173773 was found to be an independent predictive factor for systolic blood pressure with evidence of modulation by BMI. 21

Based on functional annotations, we selected two putative target genes, PAPPA (Locus 1) and NPR3 (Locus 3), for network analysis with GeneMANIA, which highlighted their connectivity, supported by several common associations in GWAS. Both signals appear to involve similar or closely related phenotypes, including height, 6,18 systolic blood pressure, 18 and FEV1. 18 Using gene networks between the target genes and genes targeted by colchicine (TUBB, NLRP3 and NFKB1), we observed that PAPPA was directly connected to the NFKB1 gene by a co-expression relationship but also through the C1R and IGFBP4 genes. The PAPPA gene was also connected to the NLRP3 gene through the C1R gene.

Supplementary Figure 2. Manhattan plot for the GWAS of hospitalisation for COVID-19.
Results are from a logistic regression in 1855 subjects (58 events) from the colchicine and placebo arms of the COLCORONA study, controlling for study arm, sex, age, and 10 principal components for genetic ancestry, and testing 6,393,360 genetic variants with minor allele frequency ≥ 5%.

Supplementary Figure 3. Quantile-quantile (QQ) plots
a. QQ plot for the GWAS of time to remission of COVID-19 symptoms in 1723 subjects from the colchicine and placebo arms of the COLCORONA study (λ = 1.03).