Novel susceptibility loci for A(H7N9) infection identified by next generation sequencing and functional analysis

The A(H7N9) virus strain that emerged in 2013 was associated with a high fatality rate and may become a long-term threat to public health. A(H7N9) disease incidence is disproportionate to viral exposure, suggesting that host genetic factors may significantly influence susceptibility to A(H7N9) infection. Human genome variation in conferring risk for A(H7N9) infection in Chinese populations was identified by a two-stage investigation involving 121 A(H7N9) patients and 187 healthy controls using next generation sequencing followed by functional analysis. As a result, a low frequency variant (rs189256251; P = 0.0303, OR = 3.45, 95% CI 1.05–11.35, chi-square test) and three HLA alleles (DQB1*06:01, DQA1*05:05 and C*12:02) were identified in A(H7N9) infected volunteers. In an A549 cell line carrying the rs189256251 variant CT genotype, A(H7N9) infection incidence was elevated 6.665-fold over control cells carrying the CC genotype. Serum levels of interferon alpha were significantly lower in patients with the CT genotype compared to the CC genotype (P = 0.01). The study findings of genetic predisposition to A(H7N9) in the Chinese population may be valuable in systematic investigations of A(H7N9) disease etiology.

Scientific RepoRtS | (2020) 10:11768 | https://doi.org/10.1038/s41598-020-68675-y www.nature.com/scientificreports/ to these pathogens 7 . In 2011 the World Health Organization (WHO) highlighted the importance of host genetic predisposition to infection with influenza virus 8 and several host genetic variants have been reported to contribute to disease severity and the immune response following influenza virus infection 9,10 . Because A(H7N9) disease is contracted at rates disproportionate with virus load under circumstances of exposure, it has been suggested that host genetic effects have a major influence on susceptibility to A(H7N9) infection. Studies have investigated single nucleotide polymorphisms (SNPs) as risk factors for predisposition to infection with A(H7N9) virus in influenza patients [11][12][13][14] . Other investigations attempted to identify differential gene expression in case-controlled studies [15][16][17] . For studies involving small numbers of A(H7N9) patients and a possibility that a rare variant might have a moderate to large effect on susceptibility, a next generation sequencing (NGS) strategy might be superior to a genome wide association study (GWAS). Importantly, a stringent GWAS significance threshold and a modest genetic effect size might mask real associations. Therefore, comprehensive analysis using a high-throughput method is required to define additional genetic causes and risk variants for A(H7N9) infection in a relatively large cohort.
Here, we describe a two-stage study involving NGS of Chinese volunteers including 112 A(H7N9) patients and 167 healthy controls followed by functional validation of the effects of genetic variants. We also looked at which human leukocyte antigen (HLA) alleles closely associated with A(H7N9) infection, using HLA target sequencing in a cohort that included 117 A(H7N9) patients and 131 healthy controls. Altogether, our findings may aid systematic understanding of the role of host genetics in A(H7N9) infection.

Results
Study design. We conducted a two-stage study and the design is shown in Fig. 1. In the discovery stage, whole-exome sequencing (WES) method was used to identify candidate variants in 17 A(H7N9) patients and 100 Chinese Han populations in the 1,000 Genomes Project. In the verification stage, the risk variants were validated in 24 A(H7N9) patients (17 were same with the discovery stage and together with 7 new patients collecting during year 2015-2016) and 103 healthy controls. Then, one chosen risk variant was further replicated in a population including 112 A(H7N9) patients (included the 24 cases in the previous stage) and 167 healthy controls (included the 103 healthy controls in previous stage). HLA gene regions were chosen to do target sequencing in a cohort that included 117 A(H7N9) patients and 131 healthy controls. Clinical and demographic information on the A(H7N9) patients and controls are presented in Supplementary Table S1 online.
Identification of SNPs. Genomic DNA was collected from 17 A(H7N9) patients. An average of ~ 110.5 million purity-filtered reads were generated for each of the 17 samples. 93% of the target bases in each individual were covered by at least 10 independent sequence reads, with an average sequence depth of 76.2× in each sample (see Supplementary Table S2 online). These data were compared with data from 100 Southern Han Chinese individuals in the 1,000 Genomes phase 1 release. A total of 29,484 markers from 17 patients and 100 control subjects passed the quality filter. After genotyping pruning (removing minor allele frequency (MAF) > 0.05 variants) a total of 20,816 variants remained for association analysis. We used two approaches to select a subset of SNPs for further analysis. In one approach, we identified novel SNPs (not reported in the 1,000 Genomes data-  Verification of risk associated SNP rs189256251. The 140 candidate SNPs described above plus 6 previously reported SNPs (see Supplementary Table S3 online) were verified in analysis of 24 A(H7N9) patients and 103 healthy controls. Primers used in verification are shown in Supplementary Table S4 online. After quality control, 120 candidate SNPs and 6 known SNPs from 22 A(H7N9) patients and 103 healthy controls were tested by chi-square and 17 SNPs at 11 loci were identified as risk markers with an unadjusted P value below 0.05 (Table 1). Following Bonferroni correction only rs189256251 and rs116914994 showed significant association with A(H7N9) infection (P = 0.00803, P = 0.0338, respectively). Significant association of missense mutation rs189256251 in the UBX Domain Protein 11 (UBXN11) gene was still evident following logistic regression analysis (P = 0.00089, OR = 24.92, 95% CI 3.742-165.9) but none of the 6 known SNPs showed an association with A(H7N9) infection at this stage of analysis. Further validation of SNP rs189256251 in 112 A(H7N9) patients and 167 healthy controls continued to indicate significant association (P = 0.0303, OR = 3.45, 95% CI 1.05-11.35, chi-square test). All patients or controls with the rs189256251 (C>T) mutation were detected again by Sanger sequencing and all results showed the mutation occurring in both alleles.
Pathogenic potential of rs189256251. Ingenuity pathway analysis (IPA) showed that UBXN11 interacts with EIF4E and RNF213, homologs of which (EIF4B and RNF5) were previously reported to be associated with influenza virus infection through modulation of related signaling pathways 18,19 . The Genecard database shows UBXN11 encoding a member of the ubiquitin binding factor X (UBX) family and having a divergent C-terminal UBX domain. The rs189256251 (a missense mutation NP_892120.2:p.Arg400His) SNP introduces a striking change in the 3-D structure of the C-terminal active domain of UBXN11 (see Supplementary Fig. S1 online). In addition, it has been reported that another member of the UBX family, UBXN1, plays an important role during influenza virus infection by regulating the host antiviral immune response 20 . We therefore performed further functional verification studies on SNP rs189256251.

Rs189256251 CT genotype related to infection. The A549 OE and OE-NC cell lines harbor
UBXN11 with CT genotype and empty lentivirus vectors, respectively. The transcription level of UBXN11 (CT) in OE cells was 5.565 fold increased compared to OE-NC cells. As shown in Fig. 2A and Supplementary  Table S5  Rs189256251 CT genotype associated with reduced serum interferon alpha. Serum was collected from five patients with the CT genotype and twenty-one patients with the CC genotype. Cytokine levels  Table S6 online). Significantly decreased levels of interferon alpha (type I interferon, P = 0.01) and increased levels of interferon gamma (type III interferon, P = 0.043) and IL-28B (type II interferon, P = 0.034) were found in patients with the CT genotype compared to those of genotype CC (Fig. 3).
Classical HLA types linked to A(H7N9) susceptibility. Six classical HLA loci were target-sequenced in one cohort. The primers used are shown in Supplementary Table S4 online. The overall average sequencing These three HLA alleles also showed significant differences compared with 10,689 previously published control subjects with effects in the same direction (Table 2).

Discussion
Multiple studies have demonstrated that host susceptibility to influenza virus is determined by host genetics 21 .
Most studies validated previously reported variants and only two were designed to search for new gene variants associated with disease susceptibility. Chen et al. reported functional variants in Galectin 1 affecting susceptibility to influenza A(H7N9) using a GWAS approach 22 . However, given the small number of A(H7N9) cases involved in the study, GWAS likely not the optimal approach 21 . A WES approach yielded 21 genes related to A(H7N9) infection 13 , but here again the sample size was relatively small. In this study, we performed a two-stage study of host genetic predisposition using next generation sequencing based techniques to analyze a Chinese population that included 121 laboratory-confirmed A(H7N9) patients. We identified one low frequency SNP and three HLA alleles showing significant association with A(H7N9) infection. For all 112 patients, the allele frequency of the T allele of SNP rs18925625 was 4.02% (9/224), significantly higher than 0.5% (1/199) in the Southern Han Chinese population in the 1,000 Genomes phase 1 release, and significantly higher than 0.77% (67/8,622) in the East Asian population in the ExAC database. This was followed by functional verification of the SNP in an in vitro cell infection model. These studies showed that overexpression of UBXN11 (CT) rendered cells significantly more susceptible to A(H7N9) virus infection. The cells were also more susceptible to infection by A(H9N2), A(H5N6) and pandemic H1N1 2009 viruses. We also found reduced serum levels of interferon alpha in patients carrying the CT genotype compared to CC genotype patients. It is well known that interferon alpha is linked to innate immunity, which is critical for eliminating virus soon after  (2) an interval between disease onset and sample collection not exceeding 12 days. IFN-γ levels were determined using the cytokine bead array (CBA) and flow cytometry (FACSAria BD Biosciences, USA). IFN-α and IL-28B were determined using sandwich commercial enzymelinked immunosorbent assay kits (LifeSpan, USA). Significantly decreased serum levels of interferon alpha (type I interferon, P = 0.01) and increased levels of interferon gamma (type III interferon, P = 0.043) and IL-28B (type II interferon, P = 0.034) were associated with the CT genotype of rs189256251.  20 . The HLA locus is involved in regulation of viral immunity. A previous study reported preferential binding of HLA alleles to conserved regions of viral proteomes and suggested that this preference may provide improved clearance of infection in different viral infections 24 . Other studies have suggested that HLA-DR levels in CD14 + cells may be a biomarker predicting H7N9 disease progression 25 and reported divergent differentiation pathways of CD38 + HLA-DR + CD8 + T cells during fatal H7N9 disease 26 . These results support the argument that the HLA alleles identified in the current study may be useful markers for assessing risk of A(H7N9) infection.
There were also some limitations of this study. First, compared with the sequencing analysis of other diseases, our sample size is not large. But considering the overall number of A(H7N9) disease our sample is considered sufficient. Cytokines were measured in serum from only five patients with the CT genotype and 21 patients with the CC genotype. Future studies should explore in larger study populations that have a bigger sample size of cases. In addition, HLA gene alleles are highly polymorphic, and some alleles have low frequencies.
Testing on a small sample size will cause some errors, which also needs to be verified on a larger sample size.
In summary, we identified several genetic variants associated with A(H7N9) infection. The two-stage study design improved confidence in the results from a relatively small cohort, beneficial when the disease frequency is low. All of the identified variants are linked to antiviral immunity, highlighting the importance of innate immunity in genetic predisposition to A(H7N9) infection. Fine mapping and functional studies of the identified loci may provide further insights into the etiology of A(H7N9) disease.

Materials and methods
Study subjects. All subjects were recruited and anonymized by staff of the Shanghai Municipal Center for Disease Control and Prevention (SCDC) or the Chinese National Influenza Center (CNIC) in the course of an epidemiological survey. A total of 121 A(H7N9) patients and 187 healthy controls were recruited. A(H7N9) infection was in each case confirmed by real-time reverse-transcription polymerase chain reaction and/or virus culture. Heavily exposed poultry workers or close contacts were recruited as healthy controls. Venous blood samples and nasal swabs were collected from subjects. Genomic DNA was extracted from venous blood using a Qiagen Blood Kit (Qiagen, Germany) according to the manufacturer's instructions.
All procedures performed in studies involving human participants were in accordance with the ethical standards of Bio-X Institutes of Shanghai Jiao Tong University and National Institute for Viral Disease Control and Prevention (approval number IVDC2014-020) and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Informed consent was obtained from all individual participants included in the study.
WES and data quality control. 17 A(H7N9) patients' DNA exomes were 'captured' using Illumina TruSeq Exome enrichment technology (Illumina, USA). Libraries were indexed, pooled and sequenced on an Illumina HiSeq2000 platform (paired-end, 100-bp reads, 5 or 6 libraries per lane). Raw sequencing reads were aligned to the reference genome hg19 using the Burrows-Wheeler Aligner (BWA, version 0.7.12). Duplicate reads and reads mapping to more than one location were excluded. The Genome Analysis Toolkit (GATK, version 3.5) was used for SNP calling. All SNPs were filtered using the GATK variant filtration module with a hard filter setting for initial filtering. Variants were annotated as a stop gain, stop loss, synonymous, nonsynonymous or splicing mutation using ANNOVAR 27 .
A uniform quality control protocol was used to filter the samples and SNPs 28 . Samples that failed to reach a genotype call rate of 95% and those with sex discrepancies were excluded. SNPs were excluded based on (i) not mapping on an autosomal chromosome, (ii) displaying a low call rate (< 90%) in all subjects and (iii) violating Hardy-Weinberg equilibrium (HWE) (P ≤ 0.00001) in the controls. We used two methods to select SNP sites for subsequent verification. One method was to identify novel, potentially harmful mutations that are not reported in the 1,000 Genomes database (novel SNPs) but were detected in at least 80% of patients. The other method was to compare allele frequencies between cases and controls and generate a P-value using the Pearson chi-squared test. Due to the relatively small sample size in the WES stage we set P < 1 × 10 -4 as the significance threshold in order to select more SNPs. We then focused on nonsynonymous and LOF (including stop gain, stop loss and splicing) mutations in the verification stage.
Validation of candidate SNPs. In order to validate discoveries, candidate SNPs were detected by Agena iPLEX MassARRAY assay, multiplex SNaPshot genotyping or Sanger sequencing to achieve highest confidence results in a verification stage involving 24 A(H7N9) patients (including 17 of the same patients studied in the discovery stage and 7 new patients) and 103 healthy controls. Moreover, 6 known SNPs associated with influenza infection based on previous studies 16,[29][30][31][32] were evaluated. For quality control of genotyping, blinded duplicate samples from two subjects and two negative control (water) samples were included in each method. Samples that failed to reach a call rate of 90% and SNPs displaying a call rate below 95% or HWE P ≤ 0.00001 were excluded. Logistic regression was performed to subtract effects of age and sex on results. Lastly, the SNP with the most significant association with A(H7N9) disease was confirmed in a cohort including 112 A(H7N9) patients and 167 healthy controls. At this stage, our samples are completely covered the previous stage.
In silico analysis. Interactions between the UBXN11 gene and other genes were evaluated using commercial IPA software (Qiagen, Redwood City, CA, USA) based on pathways previously defined in the literature. The gene was characterized based on Genecard database (https ://www.genec ards.org/) and PubMed search Scientific RepoRtS | (2020) 10:11768 | https://doi.org/10.1038/s41598-020-68675-y www.nature.com/scientificreports/ results. Because UBXN11 encodes a 520 amino acid protein with a divergent C-terminal UBX domain we modeled the 3-D structure of the C-terminal active domain of UBXN11 with and without the rs189256251-T mutation using Swiss-Model online software (https ://www.swiss model .expas y.org/) 33

Cell lines and virus infection assays. Adenocarcinoma derived human alveolar basal epithelial (A549)
cells were used in functional validation assays of the CT genotype of rs189256251. A549 cells were purchased from the cell bank of the Chinese Academy of Science. Before assay, the genotype of rs189256251 in the original A549 cell were sequenced by Sanger sequencing and the result showed rs189256251 was the CC genotype. The stable A549 overexpression cell line (A549 OE) and negative control (OE-NC) were constructed using lentivirus vectors with or without the target CT genotype of rs189256251 (Shanghai Genechem Co. Ltd). The transcription levels of UBXN11 in the OE and OE-NC cell lines were measured by real time PCR in triplicate assays and values calculated relative to a GAPDH internal control. Both OE and OE-NC cell lines also expressed EGFP.

Serum cytokine levels.
For serum cytokine detection, sample acceptance criteria were first serum sample collected from A(H7N9) patient and interval between disease onset and sample collection of less than 12 days. Serum levels of IL-1β, IL-2, IL-4, IL-5, IL-6, IL-8, IL-10, IL-12P70, IL-13, IL-17A, IL-21, IL-23, MIP-1a, MIP-1β, MCP-1, MIG, IP-10, RANTES, and IFN-γ were evaluated using the cytokine bead array (CBA) and the Human Th1/Th2 Cytokine CBA kit (BD Biosciences, San Jose, CA, USA) in a flow cytometer (FACSAria BD Biosciences, USA) as previously described 34 . Data were analyzed using BD CBA analysis software. The individual cytokine concentrations in each test sample were determined using a standard curve. Cytokine levels were measured in duplicate, averaged and compared to a standard curve. IFN-α and IL-28B were detected by sandwich commercial enzyme-linked immunosorbent assay kits (LifeSpan, USA). Data analysis was performed according to kit manufacturer's instructions. were chosen for targeted deep sequencing in a cohort that included 117 A(H7N9) patients and 131 healthy controls. HLA sequences were amplified by multiplex PCR and sequenced in PE 250 sequencing mode on the Illumina MiSeq sequencer according to manufacturer's instructions. Low quality reads and those contaminated with adaptor sequences were filtered out to yield clean reads. HLA allele typing was performed using Omixon Target software (version 1.8.1). The HLA genotyping data from 117 A(H7N9) patients were also compared to previously published data from 10,689 Han Chinese control subjects 35 . Statistical analysis. Plink software (version 1.9) was used to evaluate association between each variant and A(H7N9) infection risk. Male vs. female ratio and age distribution were compared by Mann-Whitney U test and two-tailed Student's t-test, where appropriate, using SPSS software (version 19.0.0). Fisher's exact test was used to compare HLA frequency between infected and healthy controls. P < 0.05 was considered statistically significant.

Data availability
The samples' sequences generated during the current study are available in SRA repository: SRR7263353-SRR7263369.