Multiple gene mutations identified in patients infected with influenza A (H7N9) virus

Influenza A (H7N9) virus has caused high mortality since 2013. It is important to elucidate the genetic variations that contribute to susceptibility to infection. To identify genetic mutations that might increase host susceptibility to infection, we performed exome sequencing and validated the SNPs by Sanger sequencing in 18 H7N9 patients. Blood samples were collected from 18 confirmed H7N9 patients. Genomic DNA was captured with the Agilent SureSelect Human All Exon kit and sequenced on the Illumina HiSeq 2000, and the resulting data were processed and annotated with the Genome Analysis Toolkit. SNPs were verified by independent Sanger sequencing. The DAVID and DAPPLE databases were used for bioinformatics analysis. Through exome sequencing and Sanger sequencing, we identified 21 genes that were highly associated with H7N9 influenza infection. Protein-protein interaction analysis showed that the number of direct interactions among the gene products was significantly higher than expected (p = 0.004), and DAVID analysis confirmed the defense-related functions of these genes. Gene mutation profiles of surviving and non-surviving patients were similar, suggesting that some of the genes identified in this study may be associated with H7N9 influenza susceptibility. Host-specific genetic determinants of disease severity identified by this approach may provide new targets for the treatment of H7N9 influenza.

Sample Collection. After informed consent was obtained from all subjects, blood samples were collected from 18 H7N9-infected patients, and exome sequencing was performed on 8 of them. Patients were considered to have H7N9 pneumonia if the following criteria were fulfilled: (1) nasopharyngeal swab positive for H7N9; (2) chest X-ray or CT showing pulmonary infiltrates; and (3) clinical symptoms of fever and cough. This study was approved by the ethics committee of Shanghai Public Health Clinical Center.
Blood collection and DNA extraction. After consent, 10 ml venous blood was drawn from each patient in an EDTA-containing tube. Samples were immediately centrifuged at 500 g for 10 min and plasma was removed and stored for future measurement. Genomic DNA was extracted from the remaining cell pellet using the SQ Blood DNA Kit II (omegabiotek D0714-250). Briefly, cells were lysed and then cell nuclei and mitochondria were separated by centrifugation. The isolated nuclei were resuspended in XL Buffer (supplied by omegabiotek) which contains chaotropic salt and proteinase to remove contamination. Lastly, genomic DNA was purified by isopropanol precipitation.
Exome capture, library preparation and sequencing. The isolated genomic DNA from 8 patients was fragmented into DNA strands with lengths of 150 to 200 bp by Covaris technology, and adapters were then ligated to both ends of the resulting fragments. The adapter-ligated templates were purified with Agencourt AMPure SPRI beads, and fragments with an insert size of about 200 bp were excised. Extracted DNA was amplified by ligation-mediated polymerase chain reaction (LM-PCR), purified, and hybridized to the Agilent SureSelect Human All Exon (50 M) human exome array for enrichment. Hybridized fragments were bound to streptavidin beads, whereas non-hybridized fragments were washed out after 24 h. Captured LM-PCR products were analyzed on the Agilent 2100 Bioanalyzer to estimate the magnitude of enrichment. Each captured library was then loaded on the HiSeq 2000 platform for high-throughput sequencing. Raw image files were processed by Illumina base calling Software 1.7 with default parameters, and the sequences of each individual were generated as 90 bp paired-end reads (Table 2).

Read mapping and variation detection.
After removing reads containing sequencing adapters and low-quality reads, high-quality reads were aligned to the NCBI human reference genome (hg19/GRCh37) using BWA (Burrows-Wheeler Aligner, v0.5.9-r16) with default parameters. A read was defined as low-quality if more than half of its bases were low-quality (base quality less than or equal to 5) or if more than 10% of its bases were unknown. Picard (v1.54) (http://picard.sourceforge.net/) was used to mark duplicates. Subsequently, SAM files (sequence alignment/map format) were compressed to BAM files (the binary form of SAM files). SNPs (single-nucleotide polymorphisms) and InDels (small insertions/deletions) were detected with the UnifiedGenotyper module of GATK (Genome Analysis Toolkit v1.0.6076). ANNOVAR was then used to annotate and classify the SNPs and InDels separately. Variants were checked against the dbSNP database (http://www.ncbi.nlm.nih.gov/projects/SNP/snp_summary.cgi), the 1000 Genomes database (www.1000genomes.org/) and BGI's in-house control database. Most of the in-house controls came from a published whole exome sequencing study of genetic risk for psoriasis 9 , and the controls comprised 800 healthy individuals from across the country. These in-house controls were assembled specifically for the analysis of rare diseases. Because only 33 patients had confirmed H7N9 infection in Shanghai despite a large population being exposed to risk factors, H7N9 infection was a low-probability event and could be treated as a rare disease, so these control data could be used in this study. We collected 40 genes associated with avian influenza from HuGE Navigator by keyword search for "influenza" and extracted 89 exonic SNVs (single nucleotide variations) (Supplementary Dataset 1) located in 27 genes (  bioinformatics tool that can identify the biological processes, in which a group of genes are involved, were used for functional annotation.
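The low-quality read definition given above (more than half of the bases with quality at most 5, or more than 10% unknown bases) can be sketched as a small filter. This is a minimal illustration and not the pipeline actually used in the study: the function name `is_low_quality`, the representation of qualities as a list of integer Phred scores, and the use of "N" for unknown bases are assumptions made for the example.

```python
def is_low_quality(seq: str, quals: list[int], max_low_q: int = 5) -> bool:
    """Return True if a read should be discarded under the stated definition.

    Assumptions (not from the source): `quals` holds one integer base quality
    per base, and unknown bases are written as "N".
    """
    n = len(seq)
    if n == 0 or n != len(quals):
        # Empty or inconsistent records are treated as unusable.
        return True
    low_quality_bases = sum(1 for q in quals if q <= max_low_q)
    unknown_bases = sum(1 for base in seq if base.upper() == "N")
    # Integer comparisons avoid floating-point edge cases at the thresholds.
    more_than_half_low = low_quality_bases * 2 > n
    more_than_ten_percent_unknown = unknown_bases * 10 > n
    return more_than_half_low or more_than_ten_percent_unknown


# Hypothetical 10-base reads used only to exercise the thresholds.
reads = [
    ("ACGTACGTAC", [30] * 10),            # kept
    ("ACGTACGTAC", [2] * 6 + [30] * 4),   # discarded: 6 of 10 bases are low-quality
    ("NNACGTACGT", [30] * 10),            # discarded: 20% unknown bases
]
kept = [seq for seq, quals in reads if not is_low_quality(seq, quals)]
```

Both thresholds are strict ("more than"), so a read with exactly half low-quality bases or exactly 10% unknown bases is retained.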

Mutational Analysis of Genes from 18 H7N9 Infected patients.
We admitted 18 H7N9-infected patients around 10 days after disease onset, and clinical manifestations, laboratory examinations and prognosis were followed for the subsequent 15 weeks. Six of the 18 patients died, and we found that increased plasma CRP (C-reactive protein), PCT (procalcitonin) and the number of virus-positive days were associated with mortality 11 . After exome sequencing of 8 survivors, 64 exonic SNPs, located in 21 genes, were found to be enriched in the H7N9 patients compared to controls from the NCBI human genome (hg19) (Supplementary Dataset 1 and Table 4). These mutations were found in genes encoding proteins responsible for multiple key host defense mechanisms, including cytokine production, airway epithelial barrier function and the pathogen-associated molecular pattern signaling pathway, suggesting biological plausibility (Table 2).

Bioinformatics analysis.
The resulting genes with exonic SNPs were uploaded to the online tool DAPPLE for PPI network analysis. The results indicate that the PPI network was statistically significant. Five disease proteins participated in the direct network, with 3 direct interactions in total (expected direct interactions = 0.347, p = 0.004) (Fig. 1, Table 5). Moreover, 13 genes participated in the indirect network under the same conditions (Fig. 2). We further confirmed the functions of these candidate genes using the online tool DAVID. The genes were significantly enriched for defense-related processes such as response to stimulus (p = 1.81 × 10 −8 ), immune response (p = 8.85 × 10 −7 ), immune system process (p = 1.16 × 10 −6 ), response to biotic stimulus (p = 5.48 × 10 −6 ) and modulation by symbiont of host immune response (p = 1.53 × 10 −5 ) (Table 6).
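To illustrate what an enrichment p-value of this kind measures, the following sketch computes a plain hypergeometric upper tail: the probability of seeing at least k annotated genes among n candidates drawn from a background of N genes, K of which carry the annotation. This is a common over-representation statistic and is not necessarily the exact statistic DAVID reports. The background size (20000) and the number of annotated background genes (1000) are hypothetical values chosen for the example; only the 21 candidate genes come from the text.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n genes are drawn from N genes, K of which are annotated."""
    upper = min(n, K)
    # Count the draws with at least k annotated genes using exact integers.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Hypothetical inputs: 21 candidate genes, 9 of them annotated with a term,
# against an assumed background of 20000 genes with 1000 annotated.
p_value = hypergeom_upper_tail(k=9, n=21, K=1000, N=20000)
print(f"enrichment p-value (hypothetical background): {p_value:.3g}")
```

With about one annotated gene expected by chance among 21 under these assumed background numbers, observing nine yields a very small tail probability, which is the sense in which a term is called enriched.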
Gene mutation distribution between different groups. Whole exome sequencing was performed on 8 H7N9 patients and 89 exonic SNPs were identified. These SNPs were subjected to Sanger sequencing in all 18 patients, and 64 exonic SNVs were verified. We compared the mutation rates of the cases and the in-house controls using Fisher's exact test and found statistically significant differences (Supplementary Dataset 1 and Table 4). Seventeen SNVs differed significantly between the cases and the in-house controls, and we validated 16 of them by Sanger sequencing (Table 7). The 16 validated SNVs were located in 12 genes, and the protein-protein interactions among them (Fig. 3) were consistent with those found earlier among the 21 genes (Fig. 2). It is therefore likely that both the genes whose mutation frequency differed significantly between patients and controls and the genes with similar mutation rates in the two groups participate in the pathogenesis of H7N9 virus infection. We also performed the Mann-Whitney U test between the first 8 patients and the additional 10 patients, and none of the P-values was significant (Table 4), which suggests that the in-house control data did not introduce false signals. Moreover, we compared the mutation rates of the death group and the survival group using the Mann-Whitney U test and found no significant difference between them (Table 4), suggesting that some of the genes identified in this study may be associated with H7N9 influenza susceptibility.
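The case-versus-control comparison described above can be sketched as a two-sided Fisher's exact test on a 2 × 2 table of variant and reference allele counts. The implementation below is a minimal standard-library sketch, not the software used in the study, and the allele counts in the usage example are hypothetical. It sums the probabilities of all tables with the same margins that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Rows are groups (cases, controls); columns are (variant, reference) counts.
    """
    row1, row2, col1 = a + b, c + d, a + c

    def weight(x: int) -> int:
        # Unnormalised hypergeometric probability of x variants in row 1.
        return comb(row1, x) * comb(row2, col1 - x)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = weight(a)
    # Exact integer comparison: keep tables at most as likely as the observed one.
    extreme = sum(w for w in map(weight, range(lowest, highest + 1)) if w <= observed)
    return extreme / comb(row1 + row2, col1)


# Hypothetical allele counts for one SNV: 18 cases (36 alleles) and
# 800 controls (1600 alleles).
p = fisher_exact_two_sided(10, 26, 100, 1500)
print(f"two-sided p-value (hypothetical counts): {p:.3g}")
```

Working with integer weights until the final division keeps the comparison between tables exact, which matters when several tables share the same probability as the observed one.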

Discussion
The 2013 Chinese H7N9 influenza outbreak led to an estimated 48 deaths with 33% mortality and significant morbidity in patients who survived the virus. An important observation during the recent H7N9 outbreak in China was the wide variation in host response to infection, with some patients developing only mild upper respiratory tract infections, while other patients developed severe ARDS and died. Although several determinants of the host response to infection have been identified, many important genetic factors that dampen or exacerbate the host response to H7N9 infection likely remain undiscovered. Previous studies suggested that genetic mutations in the protein machinery that comprises key host defense mechanisms could affect outcomes of influenza infection 12 . Differential susceptibility to influenza A(H7N9) was affected by functional variants of LGALS1 that cause variation in its expression 13 . The H7N9 influenza outbreak in China provides a unique opportunity to study mutations in this machinery, because many poultry workers were exposed to the virus, yet comparatively    14 . Mutations in these genes may lead to increased host susceptibility to infection or to a heightened, and potentially deleterious, host response to infection. We hypothesized that exome sequencing of these patients may reveal genetic mutations that increase susceptibility to viral infection, and that in the future these mutations could provide information regarding risk of infection, especially for poultry workers or family members of infected patients. Using a variety of computational genetic techniques, we identified 21 genes that showed a high rate of mutation in patients infected with H7N9 when compared to the general population. Some of these genes have been identified in prior studies of H7N9 susceptibility genes 14 . For example, Wang et al. reported that IFITM3 dysfunction is associated with increased cytokine production during H7N9 infection and is correlated with mortality 14 . IFITM3 (chr11, 320772, A > G) was reported to be enriched in patients hospitalized due to H1N1/09 infection 15 . Polymorphisms of CPT2, which encodes carnitine palmitoyltransferase 2, were found in patients suffering from influenza-associated encephalopathy; results of overexpression of CPT2 variants in vitro suggested that the variants were heat-labile and failed to perform optimally during fever 16,17 . Four disease outcome-associated SNPs were identified on chromosome 17 (RPAIN and C1QBP), chromosome 1 (FCGR2A), and chromosome 3 (unknown gene). C1QBP and FCGR2A play roles in the formation of immune complexes and complement activation, suggesting that the severe disease outcome of H1N1 infection may result from an enhanced host immune response 12,16 .
Table 6 (partially recovered). Columns: GO term; gene count; % of genes; p-value; genes.
[...]; [...]; [...]; [...]; [...], CES1, HLA-DRB1, IFITM3, TLR2, TLR3, ABCB1, TLR4, HLA-DQA1, IFNAR1, LEP, RPAIN, IL10RB, NOS3, MX1
GO:0006955: immune response; 9; 42.85714286; 8.85E-07; HLA-DQB1, MBL2, HLA-DRB1, IL10RB, IFITM3, TLR2, TLR3, TLR4, HLA-DQA1
GO:0002376: immune system process; 10; 47.61904762; 1.16E-06; HLA-DQB1, MBL2, HLA-DRB1, IL10RB, IFITM3, TLR2, TLR3, TLR4 [...]
For the 21 genes we identified, we used the online tool DAPPLE to perform a PPI analysis and found that 5 proteins participate directly in the PPI network: LEP, IFNAR1, IL10RB, HLA-DQA1 and HLA-DQB1. The PPI analysis suggests that these proteins play a significant role in influenza infection and may provide targets for interventional therapy.
The primary limitation of this study is the relatively small sample size. Only 18 patients were enrolled, and confirmation of these findings in subsequent studies will be needed. We are planning to collect more samples for further sequencing.

Conclusion
Using comparative genetic analysis in 18 patients with confirmed H7N9 viral infection in China, we identified 21 genes in which mutations occurred at a higher rate in infected patients than in the general population. Many of the identified genes are involved in key host defense mechanisms, which lends biological plausibility to a role for these genes in both host susceptibility to infection and host immune response-related pathology. Further investigation into the function of these genes in host susceptibility may help identify individuals who are at high risk for infection. In addition, translational research into the function of the genes identified in this study may provide new potential therapeutic targets for influenza virus infection.