High-throughput profiling of influenza A virus hemagglutinin gene at single-nucleotide resolution

Wu, Nicholas C.; Young, Arthur P.; Al-Mawsawi, Laith Q.; Olson, C. Anders; Feng, Jun; Qi, Hangfei; Chen, Shu-Hwa; Lu, I.-Hsuan; Lin, Chung-Yen; Chin, Robert G.; Luan, Harding H.; Nguyen, Nguyen; Nelson, Stanley F.; Li, Xinmin; Wu, Ting-Ting; Sun, Ren

doi:10.1038/srep04942

Published: 13 May 2014


Scientific Reports volume 4, Article number: 4942 (2014)

Abstract

Genetic research on influenza virus biology has been informed in large part by nucleotide variants present in seasonal or pandemic samples, or individual mutants generated in the laboratory, leaving a substantial part of the genome uncharacterized. Here, we have developed a single-nucleotide resolution genetic approach to interrogate the fitness effect of point mutations in 98% of the amino acid positions in the influenza A virus hemagglutinin (HA) gene. Our HA fitness map provides a reference to identify indispensable regions to aid in drug and vaccine design as targeting these regions will increase the genetic barrier for the emergence of escape mutations. This study offers a new platform for studying genome dynamics, structure-function relationships, virus-host interactions and can further rational drug and vaccine design. Our approach can also be applied to any virus that can be genetically manipulated.

Introduction

The broad field of systems biology was significantly advanced in the past decade by many technological improvements, such as the invention of DNA microarrays, next-generation sequencing, mass spectrometry and other applications permitting high-throughput screening^1,2. These technical advancements have enabled large-scale studies including interactomics, proteomics, transcriptomics, genomics, epigenomics and metagenomics, which have revolutionized biomedical research^3,4,5,6,7,8. These studies contain a wealth of structure-function information that is valuable for rational drug and vaccine design. In addition, the continued development of in silico approaches to protein structural modeling, prediction and design further complements the impact of high-throughput biological data^9,10,11,12.

High-throughput tools have also influenced the advancement of genetic approaches. Traditional genetic methods focus on a single genotype-phenotype relationship at a time and have been extensively employed to analyze individual mutations. In contrast, high-throughput genetic methods examine the phenotypic outcomes of multiple mutations simultaneously. Genome-wide insertional mutagenesis is a common high-throughput genetic approach. It has been employed to characterize bacterial genomes at single-gene resolution^13,14. A higher resolution has been achieved in two medically important RNA viruses, HCV and influenza^15,16. However, the maximum resolution of the insertional mutagenesis approach is limited to the protein subdomain level and is thus insufficient to identify critical amino acid residues. Therefore, there is a demand for a high-throughput genetic platform at single-residue resolution.

In this study, we developed a single-nucleotide resolution genetic approach using a large mutant library and a sensitive deep sequencing technique to annotate the influenza A virus hemagglutinin (HA) gene, which plays critical roles in receptor binding, viral entry, host shifts and immune escape mechanisms. Here, we probe for fitness effects of individual substitutions in 98% of all amino acid positions across HA. Our results provide a comprehensive structure-function description of HA and offer a reference to identify potential vaccine epitopes. More importantly, the high-throughput profiling platform established in this study can be applied to any genetically manipulable viral gene or genome to probe mutational fitness effects under any specified growth condition.

Results

High-throughput genetic approach at single-nucleotide resolution

The conceptual basis of our high-throughput genetic platform is to randomly mutagenize each position of the genome, monitor the enrichment or depletion of each point mutation under a specified growth condition and perform deep sequencing to determine which mutations are associated with negative, neutral, or positive fitness outcomes under the given growth condition. The mutant library was created on the influenza A/WSN/1933 (H1N1) hemagglutinin (HA) gene by performing error-prone PCR on the eight-plasmid reverse genetics system¹⁷ (see Methods). Subsequently, the viral mutant library was generated by transfection and passaged for two 24-hour replication selection rounds in A549 cells (human lung epithelial carcinoma cells) (Fig. 1A). The plasmid library and the passaged viral library were each sequenced by Illumina HiSeq 2000. Every mutant in the pool experiences the same selection pressure during the course of transfection and infection. Therefore, comparing the genetic compositions of the plasmid library and the passaged viral library reflects the variation in replication rates for each mutation. Here, we use a relative fitness index (RF index) as a proxy for the fitness effect of individual mutations. The RF index is calculated as:

$$\text{RF index}_i = \frac{\text{occurrence frequency of mutation } i \text{ in the passaged viral library}}{\text{occurrence frequency of mutation } i \text{ in the plasmid library}}$$

The occurrence frequency of individual mutations was largely expected to be lower than the sequencing error rate of 0.1% in Illumina next generation sequencing (NGS). Therefore, we utilized a two-step PCR approach for library preparation to distinguish true mutations from sequencing errors (Fig. 1B). In the first PCR, the HA gene was divided into 12 amplicons for amplification, with a unique tag assigned to individual molecules. In the second PCR, multiple identical copies of individual tagged molecules were generated. The input copy number for the second PCR was controlled such that, after a sub-saturation PCR, individual tagged molecules would be sequenced ~10 times. True mutations would exist in most, if not all, sequencing reads sharing the same tag, whereas sequencing errors would not. This error-correction approach rests on the assumption that the occurrence of a sequencing error is independent of the identity of the nucleotide tag¹⁸, which allows sequencing errors to be distinguished from true mutations. Individual molecules, each carrying a unique tag, have an average copy number of ~10 (median = 10) in the sequencing data, which verified the sequencing library preparation design.
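The tag-based error correction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function name, the input format (a tag paired with the set of mutations seen in one read) and the mutation labels are assumptions made for the example. The two thresholds follow the values given in the Methods (clusters of fewer than three reads are discarded; a mutation is called when it appears in at least 90% of the reads in a cluster).

```python
from collections import Counter, defaultdict


def call_cluster_mutations(reads, min_cluster_size=3, min_fraction=0.9):
    """Group reads by molecular tag and keep mutations shared across a cluster.

    reads: iterable of (tag, mutations) pairs, where mutations is a set of
    labels such as "A100G" (hypothetical input format for this sketch).
    Returns {tag: set of mutations called for that tagged molecule}.
    """
    clusters = defaultdict(list)
    for tag, mutations in reads:
        clusters[tag].append(frozenset(mutations))

    called = {}
    for tag, members in clusters.items():
        # Small clusters give too little evidence to separate errors from mutations.
        if len(members) < min_cluster_size:
            continue
        counts = Counter(m for muts in members for m in muts)
        # A true mutation is present in (nearly) every read derived from the
        # same tagged molecule; a sequencing error appears only sporadically.
        called[tag] = {m for m, c in counts.items()
                       if c / len(members) >= min_fraction}
    return called
```

With ten reads sharing one tag, a change seen in all ten reads is kept, while a change seen in a single read is treated as a sequencing error and dropped.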

Point mutation fitness profiling of hemagglutinin

The RF indices of individual point mutations were profiled across 98% of amino acid positions of HA in biological duplicate (Spearman correlation = 0.78) (Fig. 2A). The remaining 2% of amino acid positions that were not observed lie at the termini of HA, where the first and last amplicon primers are located. Silent mutations and nonsense mutations provided an internal control to assess the data quality. In principle, silent mutations, which alter the nucleotide sequence but not the amino acid sequence, rarely impose a fitness cost. On the other hand, nonsense mutations, which result in a truncated protein product, are lethal to the virus. Indeed, our data are consistent with this notion. Silent mutations have a significantly higher RF index than nonsense mutations (P < 2 × 10⁻¹⁶, two-tailed Student's t-test) (Fig. 2B). In addition, the RF index distributions of silent mutations and nonsense mutations are well separated, which validates the reliability of our approach. However, several silent mutations with a low RF index were observed, which may indicate roles in codon usage, RNA structure or other functions beyond protein coding.
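The silent/nonsense control relies on sorting each point mutation by its effect on the encoded amino acid. A minimal sketch of that classification, assuming the standard genetic code and a codon-level input (the function name and argument layout are illustrative, not taken from the study):

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_point_mutation(codon, position, new_base):
    """Classify a single-nucleotide change within a sense codon.

    codon: wild-type codon (DNA alphabet), position: 0-based index in the
    codon, new_base: substituted nucleotide.
    Returns "silent", "nonsense" or "missense".
    """
    mutant = codon[:position] + new_base + codon[position + 1:]
    wild_type_aa = CODON_TABLE[codon]
    mutant_aa = CODON_TABLE[mutant]
    if mutant_aa == wild_type_aa:
        return "silent"
    if mutant_aa == "*":
        return "nonsense"
    return "missense"
```

For a tyrosine codon TAT, changing the third base to C leaves tyrosine unchanged (silent), changing it to A creates the stop codon TAA (nonsense), and changing the first base to C gives histidine (missense).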

Furthermore, the fitness data are consistent with the reported phenotypes of mutants that have been previously characterized in the literature. Examples include a temperature-sensitive substitution (Y174H)¹⁹, a host-switching substitution (D238G)²⁰, two thermodynamically stabilizing substitutions (D111E and Q299R)²¹ and four HA cleavage site substitutions (Y342H, Y342C, Y342N and Y342F)²² (Table 1). Y174H, D238G, Y342H, Y342C and Y342N, which are expected to be deleterious under our experimental condition (see footnote in Table 1), have a relatively low RF index (ranging from 0.04 to 0.23). On the other hand, D111E, Q299R and Y342F, which are expected to be neutral under our experimental condition, have a relatively high RF index (ranging from 0.37 to 1.03). These comparisons show the consistency between our dataset and the experimental results reported in the literature.

Table 1 Comparison with phenotype reported in the literature

Independent experimental validation also confirmed our dataset. Six randomly selected point mutations were individually reconstructed and analyzed. RF indices of each mutation have a positive correlation with the TCID₅₀ value measured from a rescue experiment (Fig. 3A–B). Overall, these analyses verified the reliability of the fitness profiling data and demonstrated our platform to be comprehensive and at high resolution. The RF indices of all profiled HA amino acid substitutions are presented in Table S1.

Structural analysis of hemagglutinin

Our platform has a high sensitivity for monitoring negative selection in addition to positive selection and therefore enables the identification of deleterious mutations that disappear during viral passaging. The availability of the influenza HA crystal structure allowed us to further extrapolate structural insights from our dataset. A weak, yet significant, Spearman correlation of 0.30 was observed between the RF index and the relative solvent accessible surface area (SASA) of HA (P < 2 × 10⁻¹⁶). This indicates that surface residues are more tolerant of substitutions than core residues, which is consistent with observations in cellular proteins^23,24. We also analyzed the fitness effects of mutations in different types of structural elements, namely α-helices (mean log₁₀ RF index = −1.19), β-strands (mean log₁₀ RF index = −0.97), turns (mean log₁₀ RF index = −0.98) and coils (mean log₁₀ RF index = −1.01). Interestingly, mutations in α-helices are more deleterious than mutations in β-strands (P = 1 × 10⁻⁴), turns (P = 1 × 10⁻³) and coils (P = 2 × 10⁻³). In contrast, the fitness effects of mutations in β-strands, turns and coils are not significantly different from each other (P > 0.4). This result suggests that most functional elements in HA are contained within α-helices.
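For readers who want to reproduce a rank correlation of this kind, the following is a minimal standard-library sketch of the Spearman coefficient (Pearson correlation computed on average ranks). It illustrates the statistic only; the study does not say which software was used, and the significance test is not implemented here.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, assigning tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted index i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

In this setting `x` would hold the relative SASA of each residue and `y` the RF index of the corresponding substitutions.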

We further investigated each α-helix by computing its mean log₁₀ RF index (Fig. 4A). As expected from the SASA analysis, the α-helices located at the core of HA₁ are the least tolerant of mutations (red and pink, mean log₁₀ RF index = −1.52 and −1.40 respectively). The other α-helix in HA₁ is also relatively intolerant of mutations (orange, mean log₁₀ RF index = −1.11), which is consistent with its role in receptor binding for viral entry²⁵. In HA₂, the two α-helices located at the stem-loop region are relatively intolerant of mutations (green and cyan, mean log₁₀ RF index = −1.11 and −1.22 respectively), which can be attributed to their functional role in membrane fusion during viral entry²⁶. In fact, all of the mean log₁₀ RF indices reported above are lower than that of the entire HA (mean log₁₀ RF index = −1.04). Together, these findings demonstrate that α-helices in HA are important for different functional mechanisms.
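The per-element summary used above (a mean log₁₀ RF index for each helix or structural class) amounts to a grouped average. A minimal sketch, assuming the input is a list of (label, RF index) pairs with strictly positive RF indices; how mutations with an RF index of zero were handled is not stated in the text, so they are simply not supported here:

```python
from collections import defaultdict
from math import log10
from statistics import mean


def mean_log10_rf(records):
    """Average log10 RF index per group.

    records: iterable of (group_label, rf_index) pairs, e.g. a helix name or
    a secondary-structure class paired with the RF index of one mutation.
    rf_index must be > 0 because log10(0) is undefined.
    """
    grouped = defaultdict(list)
    for label, rf in records:
        grouped[label].append(log10(rf))
    return {label: mean(values) for label, values in grouped.items()}
```

A group holding RF indices of 0.1 and 0.01 has a mean log₁₀ RF index of −1.5, which on the scale reported above would mark it as poorly tolerant of mutations.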

Interestingly, the non-structural loop region (blue) that interspaces the aforementioned helices (green and cyan) is more tolerant to mutations compared to its neighboring α-helices (mean log₁₀ RF index = −0.76) (Fig. 4A). This region undergoes a transition from a non-structural loop to an α-helix during membrane fusion. Nonetheless, the relatively high RF index in this region suggests that the structural requirement for this transition is not stringent. This is further evidenced by a proline substitution analysis (Fig. 4B). Among all 20 standard amino acids, proline has the poorest α-helix formation propensity as its presence would result in a break or a kink of an α-helix²⁷. Therefore, it is expected that proline substitutions in an α-helix would carry a low RF index (deleterious). Indeed, all proline substitutions in the HA α-helices have a log₁₀ RF index < −1. In contrast, two out of three proline substitutions in the non-structural loop have a log₁₀ RF index > −1 (−0.81 and −0.19 respectively). This result suggests that the formation of a continuous α-helix in this region is not a strict requirement during membrane fusion.

We also performed an in-depth analysis of the α-helix that is important for homotrimer formation (colored in cyan in Fig. 4A). Helical wheel projection showed that high hydrophobicity was critical at heptad position d (Fig. 4C). We further investigated the RF indices of the amino acid substitutions at heptad position d (Fig. 4D). The silent mutation at G430 had the lowest RF index (0.24) among all silent mutations at this heptad position. This RF index was employed as a reference to identify substitutions that have a relatively neutral fitness effect. Only three out of 27 amino acid substitutions at this heptad position have an RF index ≥0.24, namely Y437F (RF index = 0.35), V465I (RF index = 0.40) and V465A (RF index = 0.30). These three substitutions conserve volume and hydrophobicity, which suggests that residues at heptad position d are under a stringent structural constraint on side chain conformation and hydrophobicity for homotrimer formation.

Identification of essential regions

Our profiling also provides information to identify possible essential protein surfaces and indispensable regions useful as vaccine epitopes. Our genetic platform provides the relative fitness effects of an average of five substitutions per amino acid residue. The RF indices of the most destructive substitutions in our dataset can be projected onto the HA structure to identify putative functional regions that cannot tolerate certain amino acid substitutions (Fig. 5A–B), whereas the RF indices of the least destructive substitutions can be projected onto the HA structure to identify essential regions that are intolerant of any substitution (Fig. 5C). As expected, the trimer formation surface (Fig. 5A) and the stem domain (Fig. 5B–C), which is the major functional component of the membrane fusion machinery in HA, appear as essential regions in our profiling data. In addition, our dataset identified the cross-subtype conserved influenza HA stalk region as an indispensable region (Fig. 5C–D), which is at the binding site of the proposed influenza universal antibody, CR6261^28,29. The side-chain interactions at this site are important for CR6261 recognition. Although several missense substitutions in the binding site are tolerated, they are conservative substitutions (N389D and T392S) that are unlikely to disrupt antibody recognition (Fig. 5C–D). This supports the promise of the proposed universal antibody²⁹. In addition, the main antigenic sites on the globular head of HA were largely tolerant of substitutions (Fig. 5C). This observation suggests a functional basis for the tendency of this domain to rapidly undergo genetic drift, which adversely affects both natural and vaccine-induced immunity³⁰. Overall, our work details the genetic cost of individual point mutations across HA, the primary target of anti-influenza neutralizing antibodies^28,29,30,31,32. This dataset therefore provides a valuable reference for rational vaccine design.
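The two projections described above reduce each residue to the RF index of its most destructive and its least destructive substitution. A minimal sketch of that reduction, assuming substitutions are labeled in the usual form (wild-type residue, position, mutant residue, e.g. N389D); the function names and the cutoff parameter are hypothetical, since the text does not give a numeric threshold for calling a region essential:

```python
from collections import defaultdict


def per_residue_extremes(rf_by_substitution):
    """Map each residue position to (lowest RF index, highest RF index).

    rf_by_substitution: dict such as {"N389D": rf, "T392S": rf, ...}.
    The lowest value is the most destructive substitution at that position,
    the highest value the least destructive one.
    """
    grouped = defaultdict(list)
    for substitution, rf in rf_by_substitution.items():
        position = int(substitution[1:-1])  # strip the two residue letters
        grouped[position].append(rf)
    return {pos: (min(values), max(values)) for pos, values in grouped.items()}


def positions_intolerant_of_all(extremes, cutoff):
    """Positions where even the least destructive substitution is below cutoff.

    cutoff is an illustrative parameter, not a value from the study.
    """
    return sorted(pos for pos, (_, best) in extremes.items() if best < cutoff)
```

Positions returned by `positions_intolerant_of_all` correspond to the candidates for essential regions, while the lowest value per position highlights residues that cannot tolerate at least one specific substitution.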

Discussion

Traditionally, critical residues in a viral genome are discovered by testing individual mutants, which requires multiple assays to dissect the associated biological functions. The low-throughput nature of this process limits the number of mutants tested. In this study, we have developed a comprehensive strategy using the influenza A virus as a model system to profile the fitness effects of individual point mutations and to identify essential residues throughout the HA gene in a high-throughput manner.

Recently, two studies that describe the development of a deep sequencing-based high-throughput genetic platform at single-nucleotide resolution have been reported in the literature^33,34. Robins et al. probed for essential residues in T7 bacteriophage and T7-like virus JSF7 of Vibrio cholerae using mutant libraries constructed by chemical-induced transition of a GC base pair to an AT base pair³³. Acevedo et al., on the other hand, interrogated the fitness effects of individual point mutations that naturally emerged in an evolving poliovirus population which has a high mutation rate, rather than employing any engineering strategy of introducing mutations³⁴. In this study, we have developed a novel strategy which utilizes a saturated point mutation library together with a sensitive sequencing approach. When compared to the two aforementioned approaches, our method is more comprehensive and unbiased due to the mutant library construction strategy, which is independent of spontaneous mutations. This application can be extended to other influenza genes and to other genetically manipulable viruses under any applied selection condition at a single-nucleotide resolution level.

Residues essential for viral replication are often inferred from sequence conservation. Observed sequence conservation derives from the viral sequences that initiated the endemic and is influenced by the host genetic background and the specific immune responses associated with the host. Conservation is not equivalent to essentiality for viral replication in cells. Mutational analysis of conserved amino acid residues in influenza A virus has revealed that a significant fraction of conserved residues are dispensable for viral replication^35,36,37. In addition, new mutations emerge every flu season, implying that a certain portion of currently conserved residues can still mutate in the natural environment and may provide a fitness advantage under future unforeseen selection pressures. This also suggests that a conserved amino acid may not necessarily be essential to viral replication. Additionally, analyses of conserved sequences provide information on viral genetic elements that survived in the selected human population in recent history, but do not provide much information on viral genetic elements that were unable to survive the selection process, or on which host factor was responsible for exerting the selection. Our approach provides a complementary, yet more direct, way to identify amino acid residues that are critical for viral replication in a defined cellular environment. Nonetheless, to be more comprehensive, similar studies should be performed with strains across subtypes and should include different selection conditions.

In summary, the platform described here enabled the simultaneous functional profiling of point mutations across the entire influenza HA at single-nucleotide resolution to determine their roles in viral replication. Our platform provides an efficient tool to address several important biomedical questions. The fitness profiling data allow the study of structure-function relationships at single-amino acid resolution. They enable the search for essential protein surfaces on available structures and thus offer a reference for drug design approaches that aim to increase the genetic barrier to the emergence of escape mutations^38,39,40. Essential peptide stretches could also provide potential targets for drug and vaccine development⁴¹. Our genetic platform can be applied to study viral genome dynamics and to identify critical residues for virus-host interactions in specific cellular responses (such as apoptosis, autophagy, inflammasome induction, ER stress, etc.) and immune responses (such as NK cells, T cells, antibodies, macrophages, cytokines, etc.)^42,43. The current development of a live attenuated influenza vaccine has been based on the modification of NS1 to increase interferon sensitivity⁴⁴. This study, however, provides a platform to explore alternative strategies. Comparing the in vitro fitness profile with an in vivo profile could also permit the identification of mutants that replicate efficiently in vitro but not in vivo. The resultant information, when coupled with known mutants that are sensitive to a specified immune response, could help identify vaccine strains that achieve a higher titer during vaccine production but exhibit an attenuated phenotype after injection into the human body, where an intact immune system is present. Most importantly, our platform is applicable to other viral or microbial genomes for which genetic manipulation is available in the laboratory. As NGS technology continues to develop, we foresee that the sensitivity of our platform will increase and that it can be applied at a much lower cost.

Methods

Viral mutant library and point mutations

The plasmid mutant library was created by performing error-prone PCR on the HA segment of the eight-plasmid reverse genetics system of influenza A/WSN/1933 (H1N1)¹⁷. We PCR-amplified the HA gene insert with error-prone polymerase Mutazyme II (Stratagene, La Jolla, CA). The mutation rate of the error-prone PCR was optimized by adjusting the input template amount to avoid the accumulation of deleterious mutations. The restriction enzyme site BsmBI was present in the PCR primers and used to clone into a BsmBI-digested parental vector pHW2000. Ligations were carried out with high concentration T4 ligase (Life Technologies, Carlsbad, CA). Transformations were carried out with electrocompetent MegaX DH10B T1R cells (Life Technologies) and >200,000 colonies were scraped and directly processed for plasmid DNA purification (Qiagen Sciences, Germantown, MD). As extensive trans-complementation was expected during the transfection step, >35 million cells were used for transfection to average out any bias or artifact generated from possible trans-complementation. Point mutants for the validation experiment were constructed using the QuikChange XL Mutagenesis kit (Stratagene) according to the manufacturer's instructions.

Transfections, infections and titering

C227 cells, a cell line derived from human embryonic kidney (293T) cells that stably expresses a dominant negative IRF-3, were transfected with Lipofectamine 2000 (Life Technologies) using the HA mutant library plasmid plus the 7 other wild-type plasmids. Supernatant was replaced with fresh cell growth medium at 24 hrs and 48 hrs post-transfection. At 72 hrs post-transfection, supernatant containing infectious virus was harvested, filtered through a 0.45 μm MCE filter and stored at −80 °C. The TCID₅₀ was measured on A549 cells (human lung carcinoma cells).

Virus from the C227 transfection was used to infect A549 cells at an MOI of 0.05. Infected cells were washed three times with PBS followed by the addition of fresh cell growth medium at 2 hrs post-infection. Virus was harvested at 24 hrs post-infection. For the mutant library profiling, the HA mutant library was passaged for two 24-hour rounds in A549 cells. Our pilot experiments as well as our previous study revealed that two rounds of passaging were sufficient for profiling⁴⁵. The biological duplicate was generated from an independently transfected viral library, followed by two rounds of passaging as described above.

Sequencing library preparation

Viral RNA was extracted from the passaged viral mutant library using QIAamp Viral RNA Mini Kit (Qiagen Sciences) and was reverse transcribed to cDNA using Superscript III reverse transcriptase (Life Technologies). DNA from the plasmid library or cDNA from the passaged viral mutant library were amplified with both forward and reverse primers each flanked with a 6 “N” tag and the Illumina flow cell adapter region. Flanking region for 5′ primer: 5′-CTA CAC GAC GCT CTT CCG ATC TNN NNN N-3′, Flanking region for 3′ primer: 5′-TGC TGA ACC GCT CTT CCG ATC TNN NNN N-3′. Following PCR, 12 amplicon products were pooled together. 1.5 million copies of the pooled product were used as the input for the second PCR, which was equivalent to 10 paired-end reads per molecule if 15 million paired-end reads were sequenced. 5′-AAT GAT ACG GCG ACC ACC GAG ATC TA CAC TCT TTC CCT ACA CGA CGC TCT TCC G-3′ and 5′-CAA GCA GAA GAC GGC ATA CGA GAT CGG TCT CGG CAT TCC TGC TGA ACC GCT CTT CCG-3′ were used as the primers for the second PCR. Products of the second PCR were submitted for next generation sequencing. The error-correction technique described in this study shared the same philosophy as described for detecting rare mutations in human cells¹⁸. However, this study included the fine restraint of limiting the input tagged template copy number and PCR efficiency during the second step PCR to accurately control the distribution of cluster size in the sequencing output to a median of 10. Raw sequencing data have been submitted to the NIH Short Read Archive under accession number: BioProject PRJNA243038.
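The read-depth design above is simple arithmetic, sketched here for convenience. The first function reproduces the stated relationship (15 million paired-end reads over 1.5 million input molecules gives 10 reads per molecule). The second counts the tag sequences available if the six random bases on the forward and reverse primers are read together as one 12-base tag; treating the two tags as a combined identifier is an assumption of this sketch, not something the text states.

```python
def expected_reads_per_molecule(total_read_pairs, input_molecules):
    """Average number of paired-end reads per tagged input molecule."""
    return total_read_pairs / input_molecules


def possible_tags(random_bases_per_primer, primers=2):
    """Number of distinct tags from N random bases on each of `primers` primers."""
    return 4 ** (random_bases_per_primer * primers)


# Values taken from the text: 15 million read pairs, 1.5 million input copies.
reads_per_molecule = expected_reads_per_molecule(15_000_000, 1_500_000)  # 10.0

# Six random bases on each of two primers (combined-tag assumption).
tag_space = possible_tags(6)  # 4**12 = 16,777,216
```

Keeping the number of input molecules well below the number of possible tags reduces the chance that two different template molecules end up with the same tag.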

Data analysis

Sequencing reads were mapped by BWA with a maximum of six mismatches and no gaps⁴⁶. Amplicons with the same tag were collected to generate a read cluster. Since each read cluster originated from the same template, true mutations were called only if the mutations occurred in ≥90% of the reads within a read cluster. We acknowledge that this error-correction approach only corrects errors that occurred during the deep sequencing process and not those that were introduced during the reverse transcription process. Read clusters with a size below three reads were filtered out. Read clusters were further conflated into “error-free” reads. Average coverages in terms of “error-free” reads were 177,028 per nucleotide in the plasmid mutant library, 112,355 per nucleotide in replicate 1 of the passaged viral mutant library and 161,773 per nucleotide in replicate 2 of the passaged viral mutant library (Fig. S1A). The relative fitness index (RF index) for individual point mutations was computed by:

$$\text{RF index}_i = \frac{\text{occurrence frequency of mutation } i \text{ in the passaged viral library}}{\text{occurrence frequency of mutation } i \text{ in the plasmid library}}$$

For all the downstream analysis, only point mutations covered with ≥30 tag-conflated reads (“error-free” reads) in the plasmid library were included. This arbitrary cutoff filtered out mutants with low statistical confidence, which is ~16% of all possible point mutations (Fig. S1B). In addition, all C → A and G → T mutations are not included in the reported dataset due to an observed DNA oxidative damage during library preparation⁴⁷. The RF index presented in Table S1 was calculated by averaging all RF indices available for a given amino acid substitution.
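The filtering and averaging steps in this paragraph can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the RF index is taken to be the ratio of a mutation's occurrence frequency in the passaged viral library to its frequency in the plasmid library, the ≥30-read cutoff is interpreted as the number of "error-free" reads carrying the mutation in the plasmid library, and the record fields (`ref`, `alt`, `aa_change`, counts and depths) are hypothetical names.

```python
from collections import defaultdict
from statistics import mean

# Nucleotide changes excluded because of oxidative damage during library prep.
EXCLUDED_CHANGES = {("C", "A"), ("G", "T")}


def rf_index(plasmid_count, plasmid_depth, passaged_count, passaged_depth):
    """Ratio of occurrence frequencies (passaged library over plasmid library)."""
    plasmid_freq = plasmid_count / plasmid_depth
    passaged_freq = passaged_count / passaged_depth
    return passaged_freq / plasmid_freq


def rf_by_substitution(records, min_plasmid_reads=30):
    """Filter point mutations, then average RF indices per amino acid change.

    records: iterable of dicts with keys ref, alt, aa_change, plasmid_count,
    plasmid_depth, passaged_count, passaged_depth (hypothetical layout).
    """
    per_substitution = defaultdict(list)
    for r in records:
        if (r["ref"], r["alt"]) in EXCLUDED_CHANGES:
            continue  # C->A and G->T are not reported
        if r["plasmid_count"] < min_plasmid_reads:
            continue  # too few reads in the plasmid library for confidence
        per_substitution[r["aa_change"]].append(
            rf_index(r["plasmid_count"], r["plasmid_depth"],
                     r["passaged_count"], r["passaged_depth"]))
    # Several nucleotide changes can encode the same amino acid substitution.
    return {aa: mean(values) for aa, values in per_substitution.items()}
```

Because the cutoff guarantees a non-zero plasmid count, the ratio is always defined for the mutations that pass the filter.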

Structural analysis

The solvent accessible surface area (SASA) for individual residues was computed in PyMOL using the “get_area” function with default settings. SASA obtained from the folded structure was then normalized to the SASA calculated from an unfolded structure to obtain the relative SASA. Secondary structure assignment was performed by STRIDE⁴⁸. The structural analysis was based on PDB: 1RUZ⁴⁹. A two-tailed Student's t-test was employed to compare the log₁₀ RF indices in different types of structural elements. Only missense mutations are included in the analysis unless otherwise stated.
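The normalization step is a per-residue ratio. A minimal sketch, assuming the folded and unfolded areas have already been exported from the structure software into dictionaries keyed by residue number (the function name and data layout are illustrative):

```python
def relative_sasa(folded_area, unfolded_area):
    """Relative SASA per residue: folded-state area over unfolded-state area.

    folded_area, unfolded_area: dicts mapping residue number to SASA.
    Residues with a missing or zero unfolded reference are skipped.
    """
    relative = {}
    for residue, area in folded_area.items():
        reference = unfolded_area.get(residue)
        if reference:
            relative[residue] = area / reference
    return relative
```

Values near 0 indicate buried core residues and values near 1 indicate fully exposed residues, which is the scale on which the correlation with the RF index is interpreted.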

References

1. Mardis, E. R. Next-generation DNA sequencing methods. Annu Rev Genomics Hum Genet 9, 387–402 (2008).
2. Schena, M., Shalon, D., Davis, R. W. & Brown, P. O. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270, 467–470 (1995).
3. Barski, A. et al. High-resolution profiling of histone methylations in the human genome. Cell 129, 823–837 (2007).
4. Chen, K. & Pachter, L. Bioinformatics for whole-genome shotgun sequencing of microbial communities. PLoS Comput Biol 1, 106–112 (2005).
5. Mavromatis, K. et al. The fast changing landscape of sequencing technologies and their impact on microbial genome assemblies and annotation. PLoS One 7, e48837 (2012).
6. Wang, Z., Gerstein, M. & Snyder, M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10, 57–63 (2009).
7. Hann, M. M. & Oprea, T. I. Pursuing the leadlikeness concept in pharmaceutical research. Curr Opin Chem Biol 8, 255–263 (2004).
8. Sanchez, C. et al. Grasping at molecular interactions and genetic networks in Drosophila melanogaster using FlyNets, an internet database. Nucleic Acids Res 27, 89–94 (1999).
9. Brooks, B. R. et al. CHARMM: the biomolecular simulation program. J Comput Chem 30, 1545–1614 (2009).
10. Kellogg, E. H., Leaver-Fay, A. & Baker, D. Role of conformational sampling in computing mutation-induced changes in protein structure and stability. Proteins 79, 830–838 (2011).
11. Koga, N. et al. Principles for designing ideal protein structures. Nature 491, 222–227 (2012).
12. Whitehead, T. A. et al. Optimization of affinity, specificity and function of designed influenza inhibitors using deep sequencing. Nat Biotechnol 30, 543–548 (2012).
13. Christen, B. et al. The essential genome of a bacterium. Mol Syst Biol 7, 528 (2011).
14. van Opijnen, T. & Camilli, A. Genome-wide fitness and genetic interactions determined by Tn-seq, a high-throughput massively parallel sequencing method for microorganisms. Curr Protoc Microbiol Chapter 1, Unit1E.3 (2010).
15. Arumugaswami, V. et al. High-resolution functional profiling of hepatitis C virus genome. PLoS Pathog 4, e1000182 (2008).
16. Heaton, N. S., Sachs, D., Chen, C.-J., Hai, R. & Palese, P. Genome-wide mutagenesis of influenza virus reveals unique plasticity of the hemagglutinin and NS1 proteins. Proc Natl Acad Sci U S A 110, 20248–20253 (2013).
17. Neumann, G. et al. Generation of influenza A viruses entirely from cloned cDNAs. Proc Natl Acad Sci U S A 96, 9345–9350 (1999).
18. Kinde, I., Wu, J., Papadopoulos, N., Kinzler, K. W. & Vogelstein, B. Detection and quantification of rare mutations with massively parallel sequencing. Proc Natl Acad Sci U S A 108, 9530–9535 (2011).
19. Nakajima, S. et al. Identification of the defects in the hemagglutinin gene of two temperature-sensitive mutants of A/WSN/33 influenza virus. Virology 154, 279–285 (1986).
20. Leung, H. S. Y. et al. Entry of influenza A virus with a 2,6-linked sialic acid binding preference requires host fibronectin. J Virol 86, 10704–10713 (2012).
21. Bloom, J. D. & Glassman, M. J. Inferring stabilizing mutations from protein phylogenies: application to influenza hemagglutinin. PLoS Comput Biol 5, e1000349 (2009).
22. Sun, X., Tse, L. V., Ferguson, A. D. & Whittaker, G. R. Modifications to the hemagglutinin cleavage site control the virulence of a neurotropic H1N1 influenza virus. J Virol 84, 8683–8690 (2010).
23. Tokuriki, N., Stricher, F., Schymkowitz, J., Serrano, L. & Tawfik, D. S. The stability effects of protein mutations appear to be universally distributed. J Mol Biol 369, 1318–1332 (2007).
24. Guo, H. H., Choe, J. & Loeb, L. A. Protein tolerance to random amino acid change. Proc Natl Acad Sci U S A 101, 9205–9210 (2004).
25. White, C. L. et al. A sialic acid-derived phosphonate analog inhibits different strains of influenza virus neuraminidase with different efficiencies. J Mol Biol 245, 623–634 (1995).
26. Bullough, P. A., Hughson, F. M., Skehel, J. J. & Wiley, D. C. Structure of influenza haemagglutinin at the pH of membrane fusion. Nature 371, 37–43 (1994).
27. Pace, C. N. & Scholtz, J. M. A helix propensity scale based on experimental studies of peptides and proteins. Biophys J 75, 422–427 (1998).
28. Ekiert, D. C. et al. Antibody recognition of a highly conserved influenza virus epitope. Science 324, 246–251 (2009).
29. Throsby, M. et al. Heterosubtypic neutralizing monoclonal antibodies cross-protective against H5N1 and H1N1 recovered from human IgM+ memory B cells. PLoS One 3, e3942 (2008).
30. Chen, J.-R., Ma, C. & Wong, C.-H. Vaccine design of hemagglutinin glycoprotein against influenza. Trends Biotechnol 29, 426–434 (2011).
Article CAS PubMed Google Scholar
Sui, J. et al. Structural and functional bases for broad-spectrum neutralization of avian and human influenza a viruses. Nat Struct Mol Biol 16, 265–273 (2009).
Article CAS PubMed PubMed Central MathSciNet Google Scholar
Corti, D. et al. A neutralizing antibody selected from plasma cells that binds to group 1 and group 2 influenza a hemagglutinins. Science 333, 850–856 (2011).
Article ADS CAS PubMed Google Scholar
Robins, W. P., Faruque, S. M. & Mekalanos, J. J. Coupling mutagenesis and parallel deep sequencing to probe essential residues in a genome or gene. Proc Natl Acad Sci U S A 110, E848–E857 (2013).
Article ADS CAS PubMed PubMed Central Google Scholar
Acevedo, A., Brodsky, L. & Andino, R. Mutational and fitness landscapes of an rna virus revealed through population sequencing. Nature 505, 686–690 (2014).
Article ADS CAS PubMed Google Scholar
Chu, C. et al. Functional analysis of conserved motifs in influenza virus pb1 protein. PLoS One 7, e36113 (2012).
Article ADS CAS PubMed PubMed Central Google Scholar
Li, Z. et al. Mutational analysis of conserved amino acids in the influenza a virus nucleoprotein. J Virol 83, 4153–4162 (2009).
Article CAS PubMed PubMed Central Google Scholar
Stewart, S. M. & Pekosz, A. Mutations in the membrane-proximal region of the influenza a virus m2 protein cytoplasmic tail have modest effects on virus replication. J Virol 85, 12179–12187 (2011).
Article CAS PubMed PubMed Central Google Scholar
Boltz, D. A., Aldridge, J. R., Webster, R. G. & Govorkova, E. A. Drugs in development for influenza. Drugs 70, 1349–1362 (2010).
Article CAS PubMed PubMed Central Google Scholar
Memoli, M. J., Morens, D. M. & Taubenberger, J. K. Pandemic and seasonal influenza: therapeutic challenges. Drug Discov Today 13, 590–595 (2008).
Article CAS PubMed PubMed Central Google Scholar
Pinto, L. H. & Lamb, R. A. Controlling influenza virus replication by inhibiting its proton channel. Mol Biosyst 3, 18–23 (2007).
Article CAS PubMed Google Scholar
Tan, P. T., Khan, A. M. & August, J. T. Highly conserved influenza a sequences as t cell epitopes-based vaccine targets to address the viral variability. Hum Vaccin 7, 402–409 (2011).
Article CAS PubMed Google Scholar
Ehrhardt, C. et al. Interplay between influenza a virus and the innate immune signaling. Microbes Infect 12, 81–87 (2010).
Article CAS PubMed Google Scholar
Rossman, J. S. & Lamb, R. A. Autophagy, apoptosis and the influenza virus m2 protein. Cell Host Microbe 6, 299–300 (2009).
Article CAS PubMed Google Scholar
Richt, J. A. & Garca-Sastre, A. Attenuated influenza virus vaccines with modified ns1 proteins. Curr Top Microbiol Immunol 333, 177–195 (2009).
CAS PubMed Google Scholar
Wu, N. C. et al. Systematic identification of h274y compensatory mutations in influenza a virus neuraminidase by high-throughput screening. J Virol 87, 1193–1199 (2013).
Article CAS PubMed PubMed Central Google Scholar
Li, H. & Durbin, R. Fast and accurate short read alignment with burrows-wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Article CAS PubMed PubMed Central Google Scholar
Lou, D. I. et al. High-throughput dna sequencing errors are reduced by orders of magnitude using circle sequencing. Proc Natl Acad Sci U S A 110, 19872–19877 (2013).
Article ADS CAS PubMed PubMed Central Google Scholar
Heinig, M. & Frishman, D. Stride: a web server for secondary structure assignment from known atomic coordinates of proteins. Nucleic Acids Res 32, W500–W502 (2004).
Article CAS PubMed PubMed Central Google Scholar
Gamblin, S. J. et al. The structure and receptor binding properties of the 1918 influenza hemagglutinin. Science 303, 1838–1842 (2004).
Article ADS CAS PubMed Google Scholar
Marsh, G. A., Hatami, R. & Palese, P. Specific residues of the influenza a virus hemagglutinin viral rna are important for efficient packaging into budding virions. J Virol 81, 9727–9736 (2007).
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

We thank J. Zhou, J. Yoshizawa, T. Toy and Z. Chen for performing the high-throughput sequencing experiment. This work was supported by the National Institutes of Health (reference R01-EB-009764), the UCLA Molecular Biology Whitcome Pre-Doctoral Fellowship, Oppenheimer Endowment Awards, Clinical Translational Seed Grants and the UCLA Jonsson Comprehensive Cancer Center.

Author information

Nicholas C. Wu and Arthur P. Young contributed equally to this work.

Authors and Affiliations

Department of Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
Nicholas C. Wu, Arthur P. Young, Laith Q. Al-Mawsawi, C. Anders Olson, Jun Feng, Hangfei Qi, Harding H. Luan, Nguyen Nguyen, Ting-Ting Wu & Ren Sun
Molecular Biology Institute, University of California, Los Angeles, CA, 90095, USA
Nicholas C. Wu, Stanley F. Nelson & Ren Sun
Institute of Information Science, Academia Sinica, Taipei, Taiwan
Shu-Hwa Chen, I.-Hsuan Lu & Chung-Yen Lin
Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
Robert G. Chin & Stanley F. Nelson
Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, CA, 90095, USA
Xinmin Li
AIDS Institute, University of California, Los Angeles, CA, 90095, USA
Ren Sun

Contributions

N.C.W., A.P.Y. and R.S. designed the experiment; A.P.Y. created the plasmid library; N.C.W. conducted the experiments; R.G.C., S.F.N. and X.L. performed the sequencing; N.C.W. performed the data analysis; S.C., I.L. and C.L. assisted with sequence mapping; L.Q.A., J.F., H.H.L. and N.N. provided experimental support; and C.A.O., H.Q. and T.W. provided intellectual input. N.C.W., A.P.Y. and R.S. supervised the project, and N.C.W. and R.S. wrote the text.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Rights and permissions

This work is licensed under a Creative Commons Attribution 3.0 Unported License. The images in this article are included in the article's Creative Commons license, unless indicated otherwise in the image credit; if the image is not included under the Creative Commons license, users will need to obtain permission from the license holder in order to reproduce the image. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/

About this article

Cite this article

Wu, N., Young, A., Al-Mawsawi, L. et al. High-throughput profiling of influenza A virus hemagglutinin gene at single-nucleotide resolution. Sci Rep 4, 4942 (2014). https://doi.org/10.1038/srep04942

Received: 10 March 2014
Accepted: 16 April 2014
Published: 13 May 2014
DOI: https://doi.org/10.1038/srep04942
