Introduction

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was first reported in Wuhan, China, in 2019. Since then, coronavirus disease 2019 (COVID-19), caused by SARS-CoV-2 infection, has become a global pandemic. By the first week of June 2021, more than 173 million confirmed cases and nearly 4 million deaths had been reported worldwide [1]. Thailand reported the first confirmed case outside China in January 2020 [2, 3] and is currently (June–July 2021) dealing with another COVID-19 outbreak [4]. COVID-19 has a wide range of manifestations, from asymptomatic or mild disease to severe respiratory failure leading to death. Several clinical factors have been identified as risk factors for severe COVID-19, including older age, male sex, and comorbidities such as hypertension, diabetes, obesity, and other cardiovascular diseases [5,6,7,8,9,10,11,12].

Host genetic factors also play a role in SARS-CoV-2 pathophysiology, influencing an individual's susceptibility to infection, disease severity, and disease progression, as shown by several genome-wide association studies (GWAS) [13,14,15,16,17,18,19,20,21,22,23,24]. Early work suggested a role for human leukocyte antigen (HLA) and renin-angiotensin pathway genes (ACE1 and ACE2) in the pathophysiology of COVID-19 [13,14,15]. A large GWAS of COVID-19 patients with respiratory failure in Italy and Spain [17] reported associations at chromosome 3p21.31 (rs11385942), corresponding to a gene cluster including SLC6A20, LZTFL1, CCR9, FYCO1, and XCR1, and at chromosome 9q34.2 (rs657152), coinciding with the ABO blood group locus. Another GWAS in the UK [20], focusing on critically ill patients with COVID-19, showed associations at chromosomes 12q24.13 (rs10735079), 19p13.2 (rs74956615), 19p13.3 (rs2109069), and 21q22.1 (rs2236757), which corresponded to the antiviral restriction enzyme activator gene cluster (OAS1, OAS2, and OAS3), tyrosine kinase 2 (TYK2), dipeptidyl peptidase 9 (DPP9), and the interferon receptor gene (IFNAR2), respectively. In Europeans, an in-depth genetic analysis of chromosome 21 showed that five single nucleotide polymorphisms (SNPs) within TMPRSS2 and MX1 were correlated with severe COVID-19 [19]. Recently, the COVID-19 Host Genetics Initiative (COVID-19 HGI) [24], a global effort combining data from 46 studies in 19 countries, reported several genome-wide significant loci associated with SARS-CoV-2 infection on chromosomes 3 (RPL24), 5 (DNAH5), 9 (ABO), and 19 (PLEKHA4) and with COVID-19 severity on chromosomes 1 (THBS3), 2 (SCN1A), 3 (LZTFL1), 6 (FOXP4), 8 (TMEM65), 12 (OAS1), 17 (KANSL1), 19 (DPP9 and RAVER1), and 21 (IFNAR2).

Data on host genetic factors in COVID-19 infection and disease progression in Asia are limited, although one report from a Chinese group suggested an association between chromosome 21q22.11 (IFNAR2 and IL10RB) and COVID-19 susceptibility [18]. As genetic architecture differs among ethnic groups, we aimed to explore host genetic factors associated with COVID-19 susceptibility and severity specifically in the Thai population. This information may help identify risk groups that need special care or guide vaccination programs based on genetic risk of COVID-19.

Materials and methods

This study used biobank samples and clinical data of the Faculty of Medicine, Chulalongkorn University and King Chulalongkorn Memorial Hospital, Bangkok, Thailand, from the project entitled: Collection and management of COVID-19-related clinical data and biological specimens for researches (COA No. 464/2020). The study was also approved by the Institutional Review Board of the Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand (COA No. 691/2021).

Study participants

A total of 248 participants (cases/controls = 212/36; males/females = 91/157) were recruited into the study. Cases (n = 212) were COVID-19 patients diagnosed by polymerase chain reaction (PCR) testing of nasopharyngeal swabs and admitted to King Chulalongkorn Memorial Hospital between February 2020 and March 2021. Controls (n = 36) were exposed individuals who tested negative for SARS-CoV-2 by PCR.

The severity of COVID-19 was assessed by the attending medical staff of the Thai Red Cross Emerging Infectious Diseases Clinical Centre, King Chulalongkorn Memorial Hospital, Bangkok, Thailand, and classified into four categories using the following criteria: (1) mild: asymptomatic patients; (2) moderate: symptomatic patients without pneumonia, comorbidity, or risk factors for severe disease; (3) severe: symptomatic patients with mild pneumonia, comorbidity, or risk factors for severe disease (any of the following: age >60 years, obesity, chronic obstructive pulmonary disease, chronic kidney disease, cardiovascular disease, cerebrovascular disease, uncontrolled diabetes mellitus, liver cirrhosis, immunocompromised status, or lymphocyte count <1000 cells/mm³); and (4) critical: symptomatic patients with pneumonia together with resting O2 saturation <96% or exercise-induced hypoxemia.
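The four-level classification can be read as a simple decision rule. The sketch below is one interpretation of the criteria, not the clinical centre's actual procedure: it assumes the categories are checked from most to least severe, that "severe" applies to any symptomatic patient with pneumonia or at least one risk factor, and it collapses the listed comorbidities into a single flag. The record fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical record fields; the study's data schema is not described.
    symptomatic: bool
    pneumonia: bool
    age: int
    comorbidity: bool            # any listed comorbidity (obesity, COPD, CKD, ...)
    lymphocytes_per_mm3: float
    resting_spo2: float          # resting O2 saturation, %
    exercise_hypoxemia: bool


def has_risk_factor(p: Patient) -> bool:
    """Risk factors for severe disease as listed in the criteria."""
    return p.age > 60 or p.comorbidity or p.lymphocytes_per_mm3 < 1000


def classify_severity(p: Patient) -> str:
    """Assign mild/moderate/severe/critical, testing the most severe class first."""
    if not p.symptomatic:
        return "mild"
    if p.pneumonia and (p.resting_spo2 < 96 or p.exercise_hypoxemia):
        return "critical"
    if p.pneumonia or has_risk_factor(p):
        return "severe"
    return "moderate"


print(classify_severity(Patient(True, False, 65, False, 1500, 98, False)))
```

Checking "critical" before "severe" matters because every critical patient also has pneumonia and would otherwise be caught by the "severe" branch.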

Genomic DNA preparation and bioanalysis

Genomic DNA was extracted from 200 µL of ethylenediaminetetraacetic acid (EDTA)-anticoagulated peripheral blood using the QIAamp® DNA Blood Mini Kit (Qiagen, Germany) and adjusted to a concentration of 15 ng/µL in a total volume of 80 µL. DNA purity was evaluated by OD260/OD280 and OD260/OD230 ratios (NanoDrop™ One/OneC Microvolume UV-Vis Spectrophotometer, Thermo Fisher Scientific, Wilmington, DE, USA); the acceptance criteria were an OD260/OD280 ratio of 1.8–2.0 and an OD260/OD230 ratio >1.5. DNA degradation was assessed on a 1% agarose gel with an appropriate size standard. DNA concentration was also quantified with the Qubit™ dsDNA HS (High Sensitivity) Assay Kit (Thermo Fisher Scientific) before genotyping with the Axiom™ Human Genotyping SARS-CoV-2 Array (Thermo Fisher Scientific), which includes >800,000 SNPs, among them variants related to COVID-19 susceptibility, severity, and immune response.
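The purity check and the normalization to 15 ng/µL in 80 µL are simple arithmetic. The sketch below assumes the 1.8–2.0 range is inclusive and that each sample was diluted from a more concentrated stock using C1V1 = C2V2; the function names are illustrative only.

```python
def passes_purity(od260_280: float, od260_230: float) -> bool:
    """Acceptance criteria: OD260/OD280 within 1.8-2.0 (assumed inclusive) and OD260/OD230 > 1.5."""
    return 1.8 <= od260_280 <= 2.0 and od260_230 > 1.5


def dilution_volumes(stock_ng_per_ul: float, target_ng_per_ul: float = 15.0,
                     final_volume_ul: float = 80.0) -> tuple[float, float]:
    """Return (DNA stock volume, diluent volume) in µL from C1 * V1 = C2 * V2."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is more dilute than the target concentration")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


print(passes_purity(1.85, 2.1))
print(dilution_volumes(60.0))   # 60 ng/µL stock: 20 µL DNA + 60 µL diluent
```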

Genotype calling, quality control, and imputation

Genotypes were called from the intensity data files with Axiom Analysis Suite (AxAS) version 5.1.1 [25] using default parameters, yielding 847,384 SNPs in 248 participants. Quality control (QC) followed the Ricopili pipeline [26] with the following criteria. SNPs were removed if the call rate was <0.98, the difference in call rate between cases and controls was >0.02, or the Hardy–Weinberg equilibrium p value was <10⁻⁶ in controls or <10⁻¹⁰ in cases. Samples were removed if the call rate was <0.98, the absolute inbreeding coefficient was >0.2, or sex discordance was detected.
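The QC thresholds can be expressed as two filters. This sketch assumes the per-SNP and per-sample statistics (call rates, Hardy–Weinberg p values, inbreeding coefficient, sex check) have already been computed upstream; it does not reproduce Ricopili itself, and the record types are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    snp_id: str
    call_rate: float
    call_rate_cases: float
    call_rate_controls: float
    hwe_p_cases: float
    hwe_p_controls: float


@dataclass
class SampleStats:
    sample_id: str
    call_rate: float
    inbreeding_f: float
    sex_discordant: bool


def snp_passes(s: SnpStats) -> bool:
    """SNP QC: call rate, case/control call-rate difference, and HWE (stricter in controls)."""
    return (s.call_rate >= 0.98
            and abs(s.call_rate_cases - s.call_rate_controls) <= 0.02
            and s.hwe_p_controls >= 1e-6
            and s.hwe_p_cases >= 1e-10)


def sample_passes(s: SampleStats) -> bool:
    """Sample QC: call rate, |F| threshold, and no sex discordance."""
    return s.call_rate >= 0.98 and abs(s.inbreeding_f) <= 0.2 and not s.sex_discordant


snps = [SnpStats("rs1", 0.99, 0.99, 0.985, 1e-8, 0.5),
        SnpStats("rs2", 0.99, 0.99, 0.95, 0.5, 0.5)]
print([s.snp_id for s in snps if snp_passes(s)])
```

The HWE threshold is looser in cases (10⁻¹⁰) than in controls (10⁻⁶) because true disease associations can themselves produce departures from equilibrium in cases.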

After QC, 558,132 SNPs remained (cases/controls = 207/33; males/females = 88/152). Principal component analysis (PCA; Supplementary Fig. 1) of the remaining cases and controls was conducted with the Ricopili pipeline [26] using default parameters to assess relatedness between samples and population stratification. In brief, SNPs were pruned to minimize linkage disequilibrium (LD) between SNPs, retaining SNPs with R² < 0.2 within windows of 200 SNPs, and pruning was repeated until fewer than 100,000 SNPs remained. The pruned SNPs were used to assess recent common ancestry and population stratification, with an identity-by-descent (IBD) threshold of 0.2. After this step, 240 samples remained for further analysis. Based on inspection of PC1 versus PC2 (Supplementary Fig. 1A), we removed two additional samples, leaving a final 222 samples for analysis (Supplementary Fig. 1A–C; cases/controls = 191/31). Genotype imputation for chromosomes 1–22 was performed on the Michigan Imputation Server [27] using the GenomeAsia Pilot (GAsP) reference panel and reference genome build GRCh37/hg19.
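The idea of window-based LD pruning can be sketched as a single greedy pass over a genotype dosage matrix. This is a simplified illustration, not Ricopili's implementation (which relies on PLINK-style sliding windows and repeats pruning until fewer than 100,000 SNPs remain); the function name, the single pass, and the use of Pearson r² on 0/1/2 dosages are assumptions.

```python
import numpy as np


def ld_prune(genotypes: np.ndarray, window: int = 200, r2_max: float = 0.2) -> list[int]:
    """Greedy LD pruning over the columns of an (n_samples x n_snps) dosage matrix.

    A SNP is kept only if its squared correlation with every already-kept SNP
    among the preceding `window` SNPs is below `r2_max`.
    """
    kept: list[int] = []
    for j in range(genotypes.shape[1]):
        g = genotypes[:, j]
        if g.std() == 0:            # monomorphic SNPs carry no information
            continue
        keep_j = True
        for k in reversed(kept):
            if j - k >= window:
                break               # kept indices are sorted, so earlier ones are outside too
            r = np.corrcoef(g, genotypes[:, k])[0, 1]
            if r * r >= r2_max:
                keep_j = False
                break
        if keep_j:
            kept.append(j)
    return kept


rng = np.random.default_rng(0)
x = rng.integers(0, 3, size=(100, 2)).astype(float)
G = np.column_stack([x[:, 0], x[:, 0], x[:, 1]])   # column 1 duplicates column 0
print(ld_prune(G))
```

The duplicated column has r² = 1 with the first SNP and is therefore dropped.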

Association analysis

The Scalable and Accurate Implementation of GEneralized mixed model (SAIGE, https://github.com/weizhouUMICH/SAIGE) [28], a tool that efficiently controls for unbalanced case-control ratios, sample relatedness, and population stratification in GWAS, was used for association analysis. For binary traits, SAIGE fits a logistic mixed model and uses the saddlepoint approximation to calibrate score test statistics under case-control imbalance. All three models used the same covariates: age, age², sex, age × sex, 20 principal components (PCs), and the phase of the COVID-19 outbreak in Thailand. Samples without age information were removed.
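The covariate set (age, age², sex, age × sex, 20 PCs, outbreak phase) can be assembled into a design matrix as below. This is a sketch under stated assumptions: sex is coded 0/1, outbreak phases are hypothetical integer codes entered as dummy variables with the first phase as reference, and rows with missing age are dropped as described in the text. It does not show how the covariates were passed to SAIGE.

```python
import numpy as np


def build_covariates(age, sex, pcs, phase, n_phases):
    """Stack age, age^2, sex, age*sex, PCs and one-hot outbreak phase; drop rows without age."""
    age = np.asarray(age, dtype=float)
    sex = np.asarray(sex, dtype=float)             # assumed coding: 0 = female, 1 = male
    pcs = np.asarray(pcs, dtype=float)             # shape (n_samples, 20)
    phase = np.asarray(phase, dtype=int)           # hypothetical codes 0..n_phases-1
    phase_dummies = np.eye(n_phases)[phase][:, 1:]  # first phase is the reference level
    X = np.column_stack([age, age ** 2, sex, age * sex, pcs, phase_dummies])
    keep = ~np.isnan(age)                          # samples without age are removed
    return X[keep], keep


X, keep = build_covariates(age=[30, 45, float("nan"), 60, 70], sex=[0, 1, 1, 0, 1],
                           pcs=np.zeros((5, 20)), phase=[0, 1, 2, 1, 0], n_phases=3)
print(X.shape)   # 4 samples x (4 + 20 + 2) covariates
```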

Figure 1 summarizes the three GWAS models analyzed in this study. In Model 1: Susceptibility (total n = 222; cases/controls = 191/31), cases were patients of any severity and controls were exposed individuals who tested negative for SARS-CoV-2. Model 2: Severity I compared 66 patients with moderate, severe, or critical disease with 125 controls who were patients with mild (asymptomatic) disease. In Model 3: Severity II, severity was analyzed as a quantitative trait, coded 0 for exposed individuals with negative PCR results (n = 31) and 1 to 4 for patients with mild (n = 125), moderate (n = 30), severe (n = 25), and critical (n = 11) disease, respectively.
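The three phenotype definitions can be derived from a single severity label per participant. The sketch below uses hypothetical label strings; the group sizes are those given in the text, so the resulting case/control counts can be checked against Models 1 and 2.

```python
from collections import Counter

# Hypothetical labels; codes follow the Model 3 coding described above.
SEVERITY_CODE = {"negative": 0, "mild": 1, "moderate": 2, "severe": 3, "critical": 4}


def model_phenotypes(status: str):
    """Return (Model 1, Model 2, Model 3) phenotypes; None means excluded from that model."""
    code = SEVERITY_CODE[status]
    model1 = 0 if code == 0 else 1                   # susceptibility: any case vs PCR-negative
    model2 = None if code == 0 else int(code >= 2)   # severity I: PCR-negative excluded
    model3 = code                                    # severity II: quantitative 0-4
    return model1, model2, model3


counts = {"negative": 31, "mild": 125, "moderate": 30, "severe": 25, "critical": 11}
statuses = [s for s, n in counts.items() for _ in range(n)]
phenos = [model_phenotypes(s) for s in statuses]
m1 = Counter(p[0] for p in phenos)
m2 = Counter(p[1] for p in phenos if p[1] is not None)
print(m1[1], m1[0])   # Model 1 cases, controls
print(m2[1], m2[0])   # Model 2 cases, controls
```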

Fig. 1 Study flow chart. GWAS, genome-wide association study; PCR, polymerase chain reaction

Linkage disequilibrium pattern

LD blocks were defined with LDBlockShow [29], and genes residing in blocks that contained statistically significant SNPs were retrieved from the University of California, Santa Cruz (UCSC) Genome Browser [30].

Results

Baseline characteristics

Table 1 shows the baseline characteristics of the COVID-19 patients and of the exposed individuals with negative PCR results recruited in this study.

Table 1 Characteristics of participants

Genome-wide association analysis

No SNPs passed the genome-wide significance threshold in Model 1: Susceptibility. However, at a suggestive threshold of p < 1 × 10⁻⁵, loci on chromosome 5q32 (position 148710242–148768047; p = 6.8745 × 10⁻⁶ to 6.8755 × 10⁻⁶; odds ratio 0.02; Fig. 2, Table 2, and Supplementary Table 1) and chromosome 9q21.13 (position 77748151–77762449; p = 2.3197 × 10⁻⁶ to 9.5083 × 10⁻⁶; odds ratio 0.11–0.13; Fig. 2, Table 2, and Supplementary Table 2) showed suggestive association with COVID-19 susceptibility. The quantile–quantile (Q–Q) plot showed overall deflation (genomic inflation factor λ = 0.805), with departure from the expected distribution in the tail (Supplementary Fig. 2A). The LD blocks containing the associated SNPs (significant blocks) contained four genes on chromosome 5 (AFAP1L1, GRPEL2, PCYOX1L, and IL17B) and one gene on chromosome 9 (OSTF1) (Table 2 and Supplementary Fig. 2B, C). When each significant block was extended by 200 kilobase pairs (kbp) in both directions from its boundaries, ABLIM3, BX640700, CSNK1A1, L26953, and MIR143 were also identified on chromosome 5, and C9orf41, BC043649, and NMRK1 on chromosome 9 (Table 2).
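The λ values quoted for each Q–Q plot follow the usual definition of the genomic inflation factor: the median observed 1-df χ² statistic divided by its expected median under the null (about 0.455). Values below 1 indicate deflation and values above 1 indicate inflation. The sketch below computes λ from a list of p values, assuming two-sided tests with p in (0, 1]; it is not necessarily how the values in this study were computed.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Median observed 1-df chi-square divided by the expected median under the null."""
    std_normal = NormalDist()
    # A two-sided p value maps to a 1-df chi-square statistic z^2 with z = Phi^-1(p/2).
    chisq = [std_normal.inv_cdf(p / 2) ** 2 for p in p_values]
    expected_median = std_normal.inv_cdf(0.25) ** 2   # median of chi-square(1), about 0.4549
    return median(chisq) / expected_median


uniform_p = [(i + 0.5) / 101 for i in range(101)]        # evenly spaced null p values
print(genomic_inflation(uniform_p))                      # 1.0 under the null
print(genomic_inflation([min(1.0, p * 1.5) for p in uniform_p]))   # deflated, below 1
```

Using Φ⁻¹(p/2) rather than Φ⁻¹(1 − p/2) avoids losing precision for very small p values.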

Fig. 2 Manhattan plot of Model 1: susceptibility. Observed −log₁₀ p values (y-axis) are shown for all SNPs on each autosome (x-axis). The blue line indicates the suggestive significance threshold (p = 1 × 10⁻⁵).

Table 2 Summary of the loci and genes of the three models

As in Model 1, no SNPs passed the genome-wide significance threshold in Model 2: Severity I. Nevertheless, at a threshold of p < 1 × 10⁻⁵, one locus on chromosome 12q22 (position 93446082–93456633) showed suggestive association with COVID-19 severity (p = 1.3490 × 10⁻⁶ to 4.3527 × 10⁻⁶; odds ratio 0.28–0.31; Fig. 3, Table 2, and Supplementary Table 3). The Q–Q plot was slightly deflated (λ = 0.997; Supplementary Fig. 3A). The associated region contains LOC643339 (Table 2 and Supplementary Fig. 3B); no other gene was found in the significant block. EEA1 and LINC02412 were found within ±200 kbp of the significant block (Table 2).
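Listing genes inside a significant block and within its ±200 kbp extension reduces to an interval-overlap query. In the sketch below the block coordinates are the chromosome 12q22 block reported above, but the gene names and coordinates are made up for illustration; real annotations would come from the UCSC Genome Browser on the same genome build (GRCh37/hg19), and closed intervals are assumed.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int
    end: int


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if closed intervals [a_start, a_end] and [b_start, b_end] overlap."""
    return a_start <= b_end and b_start <= a_end


def genes_near_block(block_start: int, block_end: int, genes, flank: int = 200_000):
    """Split genes into those overlapping the LD block and those only in the +/- flank."""
    in_block = [g.name for g in genes if overlaps(g.start, g.end, block_start, block_end)]
    in_flank = [g.name for g in genes
                if g.name not in in_block
                and overlaps(g.start, g.end, max(0, block_start - flank), block_end + flank)]
    return in_block, in_flank


# Hypothetical genes; coordinates are invented, not real annotations.
genes = [Gene("GENE_A", 93_440_000, 93_450_000),
         Gene("GENE_B", 93_600_000, 93_620_000),
         Gene("GENE_C", 93_900_000, 93_950_000)]
print(genes_near_block(93_446_082, 93_456_633, genes))
```

Here GENE_A overlaps the block, GENE_B lies within the extension, and GENE_C falls outside both.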

Fig. 3 Manhattan plot of Model 2: severity I. Observed −log₁₀ p values (y-axis) are shown for all SNPs on each autosome (x-axis). The blue line indicates the suggestive significance threshold (p = 1 × 10⁻⁵).

For Model 3: Severity II, no SNPs passed the genome-wide significance threshold. However, at a threshold of p < 5 × 10⁻⁵, one locus on chromosome 3p24.3 (position 21141028–21235640) showed suggestive association with COVID-19 severity level (p = 5.0649 × 10⁻⁷ to 2.5343 × 10⁻⁶; Fig. 4, Table 2, and Supplementary Table 4). The Q–Q plot was slightly deflated in the middle of the distribution but inflated in the tail (λ = 0.931; Supplementary Fig. 4A). No gene was found in the significant region or the significant block (Table 2 and Supplementary Fig. 4B). However, VENTXP7 and ZNF385D were identified within ±200 kbp of the block boundaries (Table 2).

Fig. 4 Manhattan plot of Model 3: severity II. Observed −log₁₀ p values (y-axis) are shown for all SNPs on each autosome (x-axis). The blue line indicates the suggestive significance threshold (p = 1 × 10⁻⁵).

Discussion

Populations of different ancestry differ in genetic composition, so studying diverse human populations can provide better biological insight. We performed a GWAS in a Thai population to identify genetic loci associated with COVID-19 susceptibility and severity. We found two suggestive loci associated with COVID-19 susceptibility, on chromosomes 5q32 and 9q21.13, and two suggestive loci associated with disease severity, on chromosomes 12q22 and 3p24.3.

Three GWAS have reported associations between loci on chromosome 12 and COVID-19 severity [20, 24, 31]. Nelson et al. [31] reported three loci associated with plasma angiotensin-2 concentration in men, one of which was on chromosome 12 around the HNF1α gene, which encodes a transcription factor that regulates ACE2 expression. The Genetics Of Mortality in Critical Care (GenOMICC) study of 2244 critically ill COVID-19 patients in intensive care units across the UK identified associations at chromosomes 12q24.13, 19p13.2, 19p13.3, and 21q22.1, implicating two biological mechanisms: innate antiviral immunity (IFNAR2 and OAS) and host-driven inflammatory lung injury (DPP9, TYK2, and CCR2) [20]. Likewise, the most recent GWAS from the COVID-19 HGI also reported an association between a chromosome 12 locus (OAS1) and COVID-19 severity [24].

Here, we found two suggestive associations, on chromosomes 5q32 and 9q21.13, in our disease susceptibility model. Because the controls were recruited from the same hospital and the same ethnic group as the cases, stratification bias is unlikely. In the Thai samples, the allele frequencies at the associated loci were 0.03 in cases and 0.15 in controls on chromosome 5, and 0.11 in cases and 0.31–0.32 in controls on chromosome 9 (Supplementary Tables 1, 2). The association signal on chromosome 5q32 coincided with IL17B, which encodes a T cell-derived cytokine, interleukin-17B (IL-17B). Immunohistochemical analysis of several tissues indicated that IL-17B is primarily localized to chondrocytes and neurons [32, 33]. Furthermore, IL-17B has been reported to act as a proinflammatory inducer in inflammatory disease, stimulating the release of tumor necrosis factor-α (TNF-α) and interleukin-1β (IL-1β) from a monocytic cell line and resulting in neutrophil infiltration [32, 33]. Evidence suggests that IL-17B may participate in host defense against infection, but the data remain conflicting [33,34,35]. IL-17B concentrations are increased in patients with community-acquired pneumonia, which has been linked to evidence that IL-17B induces interleukin-8 gene and protein expression in bronchial epithelial cells [34]. However, a study of colitis in mice infected with Citrobacter rodentium, a murine-specific pathogen used to model human Gram-negative E. coli infection, reported IL-17B as a protective factor against infection [35]. In COVID-19, cytokines are central to the pathophysiology of SARS-CoV-2 infection. Some cytokines, including TNF-α and IL-1β, appear detrimental, particularly during the cytokine storm, and hematopoietic changes during infection, including neutrophilia and lymphopenia, are significant prognostic factors [36]. Hence, further studies of the role of IL17B as a host factor in susceptibility to SARS-CoV-2 infection are warranted.
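To relate the allele frequencies above to the direction of effect, a crude allelic odds ratio can be computed from case and control frequencies. This sketch assumes the reported frequencies refer to the tested (effect) allele; the estimate ignores covariates and relatedness, so it is not directly comparable with the covariate-adjusted SAIGE estimates reported in the Results.

```python
def allelic_odds_ratio(freq_cases: float, freq_controls: float) -> float:
    """Crude allelic odds ratio: allele odds in cases divided by allele odds in controls."""
    odds_cases = freq_cases / (1 - freq_cases)
    odds_controls = freq_controls / (1 - freq_controls)
    return odds_cases / odds_controls


print(round(allelic_odds_ratio(0.03, 0.15), 3))   # chromosome 5 locus, about 0.175
print(round(allelic_odds_ratio(0.11, 0.31), 3))   # chromosome 9 locus, about 0.275
```

An odds ratio below 1 means the allele is less common in cases than in controls, i.e., a protective direction of effect.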

Our severity models further suggested two loci, on chromosomes 12q22 and 3p24.3, linked to disease severity. The UK GWAS identified a gene cluster on chromosome 12q24.13 (OAS1, OAS2, and OAS3), encoding activators of an antiviral restriction enzyme, that was associated with severe COVID-19 [20]. Additionally, the COVID-19 HGI reported loci on chromosome 3 associated with disease susceptibility (rs11919389: RPL24) and disease severity (rs10490770: LZTFL1) [24]. In our results, the extended LD block on chromosome 12q22 contains EEA1, LOC643339, and LINC02412. Early endosome antigen 1, encoded by EEA1, is found in the cytosol, endosomes, and plasma membrane of cells in various organs, including the lungs [37]. A recent study of microscopic changes in the small airways and lung parenchyma of smokers and patients with chronic obstructive pulmonary disease (COPD) showed increased expression of the SARS-CoV-2 receptor ACE2 and of proteins involved in viral entry, including EEA1 [38]. SARS-CoV-2, an enveloped coronavirus, requires binding to the cellular ACE2 receptor and membrane fusion to enter the host cell and release its RNA. The virus can exploit the host endocytic pathway, entering via endosomes that proceed to lysosomes, where the viral and lysosomal membranes fuse [39,40,41]. The presence of EEA1 in a locus associated with COVID-19 severity suggests that an overactive endosomal response may facilitate viral entry and processing, contributing to severe disease. In addition, LOC643339 encodes a long non-coding RNA (lncRNA) of largely unknown function; however, some lncRNAs play essential roles in pathogenic infection [42]. Further studies of the role of LOC643339 in regulating SARS-CoV-2 propagation therefore seem justified.

The main limitation of this study is its relatively small sample size. Nevertheless, we believe that mitigating the widespread devastation caused by the COVID-19 pandemic requires scientific contributions from every part of the world. The number of exposed individuals with negative PCR tests was also smaller than the number of cases. However, because using the general population as controls carries a risk of misclassification bias, given that the extent to which these individuals would develop COVID-19 is unknown [43], we decided not to include the general population in our control group. A small sample size reduces statistical power and therefore lowers the probability that a detected association is real (fewer true positives without a corresponding reduction in false positives), although the associations may still be genuine [44]. In addition, reported effect estimates (e.g., odds ratios) can be inflated by the greater sampling variability of small samples. Further studies with larger samples are needed to verify these findings. We are also participating in an international collaborative effort to uncover host factors associated with COVID-19, and this report provides data from the Thai population to complement that global initiative.
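The effect of power on the credibility of a significant finding can be made concrete with the standard expression for the positive predictive value (PPV) of a claimed association, PPV = (1 − β)R / ((1 − β)R + α), where 1 − β is power, R is the pre-study odds that a tested association is real, and α is the significance threshold. The values below are hypothetical and are not estimates for this study.

```python
def positive_predictive_value(power: float, prior_odds: float, alpha: float) -> float:
    """Probability that a significant finding is true: (1 - beta) * R / ((1 - beta) * R + alpha)."""
    true_positives = power * prior_odds
    return true_positives / (true_positives + alpha)


# Hypothetical inputs: pre-study odds R = 0.25 and alpha = 0.05.
print(positive_predictive_value(0.8, 0.25, 0.05))   # well-powered study
print(positive_predictive_value(0.2, 0.25, 0.05))   # small, underpowered study
```

With the same α, dropping power from 0.8 to 0.2 lowers the PPV from 0.8 to 0.5: the false-positive rate is unchanged while true positives become rarer.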

In conclusion, our GWAS in Thai COVID-19 patients suggests disease susceptibility loci on chromosomes 5q32 (containing IL17B) and 9q21.13. In addition, disease severity was suggestively linked to loci on chromosomes 12q22 (containing LOC643339 and EEA1) and 3p24.3.