Structures and functions linked to genome-wide adaptation of human influenza A viruses

Human influenza A viruses elicit short-term respiratory infections with considerable mortality and morbidity. While H3N2 viruses have circulated for more than 50 years, the recent introduction of pH1N1 viruses presents an excellent opportunity for a comparative analysis of the genome-wide evolutionary forces acting on both subtypes. Here, we inferred patches of sites relevant for adaptation, i.e. under positive selection, on eleven viral protein structures from all available data since 1968 and correlated these with known functional properties. Overall, pH1N1 viruses have more patches than H3N2 viruses, especially in the viral polymerase complex, while antigenic evolution is more apparent for H3N2 viruses. In both subtypes, NS1 has the highest patch and patch site frequency, indicating that NS1-mediated viral attenuation of host inflammatory responses is a continuously intensifying process, elevated even in the long-circulating subtype H3N2. We confirmed the resistance-causing effects of two pH1N1 changes against oseltamivir in NA activity assays, demonstrating the value of the resource for discovering functionally relevant changes. Our results represent an atlas of protein regions and sites with links to host adaptation, antiviral drug resistance and immune evasion for both subtypes for further study.


Introduction
Infections with influenza A viruses cause short-term respiratory infections with drastic health burdens and economic losses. Recurrent epidemics cause up to 650,000 deaths and 3 to 5 million cases of severe illness per year worldwide 1,2. Two influenza subtypes are currently endemic in the human population: H3N2 influenza A (H3N2) viruses have circulated since 1968 and H1N1 influenza A (pH1N1) viruses since 2009 3-6. Both lineages descend from pandemic outbreaks of reassortant viruses with segments of human and zoonotic origins 5. The influenza genome consists of eight segments 3 and encodes fourteen known proteins 7-10.
Reassortant viruses inherit viral segments from different viruses co-infecting the same cell. If a novel HA is introduced, they meet a largely naïve human population and can thus spread widely, causing a pandemic 3. For H3N2 viruses, the polymerase basic protein 1 (PB1) and hemagglutinin (HA) segments are of avian origin, with evidence of adaptive processes, and the remaining ones originate from the previously circulating human H2N2 subtype 6,11. The pH1N1 virus arose from a reassortment of three porcine viruses 5, creating a virus with some segments of prior avian origins (NA, M; and another lineage with PB2 and PA), some distantly related to a human lineage circulating until 1998 (HA, NP, NS) and some descending more recently from humans (PB1) 5.
Human influenza A viruses are the prime example of a rapidly evolving pathogen engaging in a co-evolutionary arms race with the host adaptive immune defenses. Due to their rapid evolution, the antigenic change of the major antigen to the humoral (B-cell) immune response, the surface protein HA, is monitored across the years 2,12-16 . Every couple of years, as susceptibility of the population becomes low due to prior infection or vaccinations, antigenically novel strains emerge and predominate in seasonal epidemics 17 . An antigenic match of vaccine viruses to the circulating viral population is achieved by regular updates of the vaccine composition 17,18 .
Conversely, on the host side, upon infection or vaccination, specific antibodies are produced, recognizing B-cell epitopes (BCE) on HA, neuraminidase (NA) and matrix protein 2 (M2) and interfering with their function. In addition, cell-mediated immunity evokes T-cell responses against small peptides of internalized viral proteins, representing T-cell epitopes (TCE), which are digested and presented on the cell surface by the major histocompatibility complex (MHC). The innate immune system also recognizes pathogen-associated molecular patterns (PAMPs) as determinants of viruses and activates cellular antiviral responses, like activation of interferon (IFN)-induced proteins, to counteract viral replication 19,20 . In addition, human influenza viruses evolve towards resistance against antivirals 21 and further adaptations to their host after initial establishment occur 5 .
Evolutionary processes such as adaptation to specific environmental conditions can be studied on the level of genes, individual sites or even protein regions 15,16,22-29. A widely used method determines the ratio of the rates of non-synonymous to synonymous changes (dN/dS ratio) in a phylogeny 13,30. A significant excess of dN over dS, or dN/dS > 1, provides evidence for positive selection, assuming that synonymous changes are neutral. This indicates that adaptation is taking place and that the changes of the respective genetic elements have led to a more favorable phenotype 31. Methods measuring dN/dS are best applied to analyze selection across, not within, populations, and temporal influenza data may be considered as a series of independently evolving populations 32.
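The ratio of non-synonymous to synonymous rates described above can be illustrated with a minimal Python sketch. It assumes that the numbers of changes and of available sites have already been counted along a phylogeny, and it applies no correction for multiple substitutions, so it is a simplification and not the estimation procedure used in this study. The function name dn_ds and the example counts are hypothetical.

```python
def dn_ds(nonsyn_changes, syn_changes, nonsyn_sites, syn_sites):
    """Return a naive dN/dS ratio from raw counts (hypothetical helper).

    dN = non-synonymous changes per non-synonymous site
    dS = synonymous changes per synonymous site
    Returns None when dS is zero, because the ratio is then undefined.
    """
    if nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("site counts must be positive")
    dn = nonsyn_changes / nonsyn_sites
    ds = syn_changes / syn_sites
    if ds == 0:
        return None
    return dn / ds


# Illustrative counts only (not data from this study):
# 12 non-synonymous changes over 300 sites, 3 synonymous changes over 100 sites.
ratio = dn_ds(12, 3, 300, 100)
# A ratio above 1 would be read as a hint of positive selection,
# subject to a significance test that this sketch does not perform.
is_candidate = ratio is not None and ratio > 1
```

In practice a statistical test on the ratio is required before any site or region is called positively selected.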
While H3N2 viruses have been circulating for almost 50 years and presumably do not require more adaptation to the human host, the recent introduction of pH1N1 viruses presents an excellent opportunity for a comparative analysis of the genome-wide evolutionary forces acting on both subtypes. We here sought to determine protein regions with signs of positive selection in both subtypes and to correlate them with known functional properties, to gain a better understanding of the forces shaping genome-wide viral evolution. We studied all available longitudinal sequence data for H3N2 and pH1N1 viruses collected since 1968 and 2009, respectively, across all eleven available protein structures. Though dN/dS values have been analysed already for sites 33 or entire proteins of H3N2 34 and pH1N1 35, to our knowledge this is the first comparative and comprehensive study of the structures and functions linked to the genome-wide adaptation of both circulating influenza A subtypes.

Structure modeling
We analyzed the polymerase proteins PB2, PB1 and PA, the HA subunits 1 and 2, NP, NA and the splice-variants M1 and M2, as well as NS1 and NS2 for influenza subtypes H3N2 and

Phylogenetic analysis
For each protein, we downloaded the amino acid sequences and the corresponding coding sequences from the NCBI flu database 38. We removed identical coding sequences, which would accumulate on the same leaf in a phylogenetic tree, because they contribute no additional information and their removal significantly speeds up the calculation of alignments. The following steps were applied to each protein individually: We generated a multiple sequence alignment for amino acid sequences using MUSCLE 39  of . The edges are weighted by a distance function, where calculates the Euclidean distance between the -atoms of a residue pair and . Nodes in close spatial proximity therefore have a strong connection. We add two additional nodes to the graph, the positive selection node ( ) and the negative selection node ( ). Each is connected to with and to with ̅ . Sites are separated into a positive or a negative selection set by a minimum graph cut approach. The cut divides the graph into two halves by minimizing the sum of weights of the edges connecting these halves with a cost function: In a final step, sites in the positive selection set are merged into patches together with neighboring sites within a distance of . reflects how stable a patch appears in the samples. When the value approaches zero, the patch tends to disappear; a value of one indicates that the patch is stable, and a value larger than one indicates an unstable patch that breaks apart in the subsamples. In addition, we provide the average value per patch.
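The minimum graph cut separation described above can be sketched in Python as follows. The exact edge weights and cost function of the method are not recoverable from the text here, so this sketch makes its own assumptions: residue pairs within a hypothetical cutoff are linked with weight 1/distance, each site is linked to a positive selection node "POS" with a per-site score in [0, 1] and to a negative selection node "NEG" with one minus that score. The function names, the cutoff, the coordinates and the scores are all illustrative, and the final merging of sites into patches is omitted.

```python
import math
from collections import defaultdict, deque


def min_cut_source_side(capacity, source, sink):
    """Edmonds-Karp max flow; return nodes on the source side of a minimum cut."""
    residual = defaultdict(lambda: defaultdict(float))
    for (u, v), c in capacity.items():
        residual[u][v] += c
        residual[v][u] += 0.0  # make sure the reverse edge exists
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return set(parent)  # everything still reachable from the source
        # Find the bottleneck along the path, then push that much flow.
        bottleneck, v = math.inf, sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u


def positive_selection_set(coords, score, cutoff=8.0):
    """Assumed weighting: 1/distance between close residues, score to POS, 1-score to NEG."""
    capacity = {}
    sites = list(coords)
    for i, a in enumerate(sites):
        for b in sites[i + 1:]:
            d = math.dist(coords[a], coords[b])  # Euclidean distance, Python >= 3.8
            if 0 < d <= cutoff:
                capacity[(a, b)] = capacity[(b, a)] = 1.0 / d
    for s in sites:
        capacity[("POS", s)] = score[s]
        capacity[(s, "NEG")] = 1.0 - score[s]
    return min_cut_source_side(capacity, "POS", "NEG") - {"POS"}


# Toy example with invented coordinates (in angstroms) and scores.
coords = {1: (0, 0, 0), 2: (3, 0, 0), 3: (30, 0, 0), 4: (33, 0, 0)}
score = {1: 0.9, 2: 0.4, 3: 0.1, 4: 0.45}
selected = positive_selection_set(coords, score)
```

In this toy example, site 2 has a weak individual score but is pulled into the positive set by its close neighbor site 1, whereas sites 3 and 4 stay in the negative set together. This is the smoothing effect that a spatially weighted cut is meant to provide.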

NA activity assays
The

Global trends
We calculated dN/dS values for all proteins of pH1N1 and H3N2 viruses (Supplementary Figure 1). Notably, the HA1 subunit, historically considered the main driver of adaptation of seasonal influenza viruses 12,13, did not show the most evidence for positive selection. It was ranked fifth for both subtypes when jointly analyzing all time periods, as indicated in analyses on smaller data sets 34,35. Instead, NA had the highest mean dN/dS value for both subtypes, followed by M2, NS1 and NS2. NP and the PB1 proteins had the lowest mean dN/dS value in both subtypes (Supplementary Table 3). We confirmed this finding by re-calculating the dN/dS statistics using the Suzuki-Gojobori counting approach implemented in HyPhy SLAC 48 (Supplementary Table 3). The Kolmogorov-Smirnov test (KS-test; distribution of pH1N1 protein is smaller than distribution of H3N2 protein) showed that the global dN/dS distributions of pH1N1 proteins were significantly larger than those of H3N2 proteins, except for HA2, M2 and NS1 (Supplementary Table 4). This was especially apparent for the polymerase proteins PB2, PB1 and PA, which are of known importance for host adaptation 49. We hypothesize that positive selection has recently acted on substantially more regions in pH1N1 than in H3N2 viruses, owing to the need of pH1N1 viruses to further adapt to the new host population. H3N2 viruses likely underwent this adaptation in the early years of their circulation after the 1968 pandemic, for which few data are available.
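A one-sided two-sample Kolmogorov-Smirnov comparison of the kind referred to above can be sketched in plain Python. The statistic is the largest amount by which the empirical distribution function of the second sample exceeds that of the first, which becomes large when values in the first sample tend to be larger. The p-value uses the standard large-sample approximation exp(-2 D^2 nm/(n+m)) and is therefore only rough for small samples. The function names and the toy values are assumptions for illustration, not data or code from this study.

```python
import math
from bisect import bisect_right


def ks_one_sided(x, y):
    """Return sup_t [F_y(t) - F_x(t)] for the empirical distribution functions.

    A large value suggests that x tends to take larger values than y.
    """
    xs, ys = sorted(x), sorted(y)
    best = 0.0
    for t in xs + ys:
        f_x = bisect_right(xs, t) / len(xs)  # fraction of x values <= t
        f_y = bisect_right(ys, t) / len(ys)  # fraction of y values <= t
        best = max(best, f_y - f_x)
    return best


def ks_one_sided_pvalue(d, n, m):
    """Large-sample approximation of the one-sided p-value."""
    return math.exp(-2.0 * d * d * n * m / (n + m))


# Invented per-site values for two hypothetical proteins.
sample_a = [0.5, 0.6, 0.7, 0.8]
sample_b = [0.1, 0.2, 0.3, 0.4]
d_stat = ks_one_sided(sample_a, sample_b)
p_approx = ks_one_sided_pvalue(d_stat, len(sample_a), len(sample_b))
```

For real analyses, an established statistics library with exact small-sample p-values would be preferable to this approximation.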

Adaptive immune evolution
We investigated which patches are located in regions that act as BCE or TCE and thus play a role in evading the host's immune response. In addition to extracellular regions of HA and NA, the M2 ion channel can be partly recognized as a BCE, too 33  of adapatch sites overlap with the patch sites of this study. In turn, thirty ( ) sites were newly clustered into patches, mostly due to inclusion of both exposed and, newly, also  Figure 1C). In total, percent of pH1N1/HA1 sites are located in patches.
Fifteen of twenty-five "adapatch" sites overlap with novel patch sites (60%) and forty of fifty-five novel patch sites are not adapatch sites (73%). In comparison to H3N2/HA, this is a larger discrepancy that cannot be explained solely by the inclusion of buried sites. A second reason is the longer time period for which we could include pH1N1 viruses in this study. of sweep-related sites are also patch sites, which is a larger overlap compared to H3N2/HA1 data.  Figure 2B). For pH1N1/HA1 we observe almost identical overlaps of patch sites with TCE sites ( ) and BCE sites ( ). The low percentage of TCE sites in patches for pH1N1/HA1 is similar to H3N2/HA1 results, but the overlap with BCE sites is lower compared to H3N2 data. This likely also indicates that pH1N1 viruses do not yet have to evade pre-existing immunity as much as H3N2 viruses and that most positive selection observed for HA does not result from humoral immune evasion.
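Overlaps between two site sets, such as the ones reported above, can be summarized with precision, recall and F-score. The following Python sketch shows one way to compute them. The site numbers are placeholders chosen only to reproduce the counts stated in the text (25 reference sites, 55 patch sites, 15 shared), and the function name overlap_stats is hypothetical. Which set serves as the reference is an assumption of this sketch.

```python
def overlap_stats(patch_sites, reference_sites):
    """Return (precision, recall, F-score) of patch sites against a reference set."""
    patch_sites, reference_sites = set(patch_sites), set(reference_sites)
    shared = len(patch_sites & reference_sites)
    precision = shared / len(patch_sites) if patch_sites else 0.0
    recall = shared / len(reference_sites) if reference_sites else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Placeholder site numbers, not real HA1 positions.
reference = set(range(1, 26))    # 25 previously reported sites
patches = set(range(11, 66))     # 55 patch sites, 15 of them shared
precision, recall, f_score = overlap_stats(patches, reference)
novel_fraction = len(patches - reference) / len(patches)
```

With these counts, recall is 15/25 and the fraction of patch sites outside the reference set is 40/55, matching the proportions given in the text.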
Patch sites of H3N2/NA match reported BCE sites 60 . While the overlap of patch sites with TCE for H3N2/NA is , there is no overlap for pH1N1/NA with known TCE sites, indicating that there was little immune evasion of TCR of NA ( Figure 1B&D). We could not investigate for H3N2/M2 and overlap of BCE and TCE sites with patch sites, because the former are located in regions not covered by our homology model (Supplementary Table 2).

Resistance evolution
The ion channel of M2 is targeted by the antivirals amantadine and rimantadine, which are used to treat influenza infections 21 . We found a strong link between patch sites and antiviral  Figure 2C). Resistance sites 198 and 246 are next to patch 4 and patch 5 in pH1N1/NA, respectively (Figure 2D). To assess whether the predicted sites confer resistance against NA inhibitors such as oseltamivir, we experimentally investigated sites 199 and 247 of patch 4 and patch 5, respectively, for a role in generating oseltamivir resistance in NA activity assays. We show that amino acid substitutions in NA from D to N at position 199 and from S to N at position 247 considerably reduced the inhibitory effect of oseltamivir, with up to 30% neuraminidase activity upon treatment with 100 nM oseltamivir (Figure 4). The ability of these mutations to confer oseltamivir resistance was further increased upon treatment with 1000 nM oseltamivir, further reducing NA activity to 5% and 12%, respectively.
In pH1N1/NA, patches 2 and 7, which also include the sweep-related sites 117 and 248 16, respectively, and patch 9 overlap with or are vicinal to the active center of NA 75-77. For H3N2/NA, conversely, this was the case for patch 2, which includes active site 151. The patches in NA without links to resistance or immune responses could be relevant for host adaptation, i.e. for establishing its catalytic function in the human host or for establishing cleavage activity on an altered substrate range, as pH1N1 viruses had many more patches than H3N2 viruses. We also found two patches close to known mammalian adaptation sites: site 100 in patch 2 16,80,81 and site 319 next to patch 6 82. Site 319 is also in proximity to sites 373 and 100 implicated in selective sweeps (Figure 5D). The region including patch 2 and patch 6 thus may well be implicated in human adaptation of influenza A viruses (Figure 2E), especially because they have the largest average value ( ) among all NP patches (Supplementary Table 6). Further, of three NP sites (100, 283, 313) relevant for escaping recognition by the intracellular restriction factor MxA, 313 is located in a beta-sheet adjacent to patch 6 and in vicinity of patch and 362 (proximal to patch 5) (Figure 2F&3C) 16,85. In addition, patch 10, which excludes a sweep-related site, is vicinal to site 336 and has an average value of (Figure 2F).

Host adaptation of the RNP proteins
Unlike for H3N2 viruses, for which we detected a single patch next to 336 (patch 2), an intensified signal of positive selection in this region in pH1N1 viruses emphasizes its relevance for further viral adaptation to the human population. For pH1N1, patch 11 suggests further adaptation of polymerase activity, as it is located next to sweep-related site 321, which is known to increase viral polymerase activity 16,86. The linker region of PA (257-277) is crucial for the interaction of PA and PB1 49,87 and includes patch 7 in pH1N1 viruses, but none in H3N2 viruses. This patch has a very strong signal with a value of (Supplementary Table 6 H3N2 includes site 375, while no patch of pH1N1 included these sites, which include typical mammalian adaptation residues 89,90. As pH1N1/PB1 previously circulated in the human population for 30 years and re-entered the human population after circulating for 10 years in swine, the segment is likely already adapted to mammals and humans 5. The centrally located RNA-dependent RNA polymerase activity motifs of PB1 are highly conserved 49, with patch 3 in H3N2 viruses and in pH1N1 viruses located in proximity (Figure 5B).

Discussion
It is well established that the surface antigens of human influenza A viruses evolve under positive selection to escape immune recognition, in an ongoing co-evolutionary arms race with the human adaptive immune response. However, much less is known about the relevance of other proteins in this co-evolutionary arms race and about which other factors shape the evolutionary trajectories of these viruses. There is emerging evidence that the pH1N1 virus has also acquired changes since 2009 that improve its adaptation to the human host 15,81,86. The recent introduction of the pH1N1 virus into the human population allows the evolutionary forces acting on these viruses to be studied in comparison to the H3N2 viruses, which have circulated for almost five decades.
To analyze these effects, we

Software & Data Availability
The

Competing interests
The authors declare no competing interests.

Overview of all patches and patch sites for each protein of pH1N1 and H3N2 viruses. Sweep-related sites are marked in bold.

This is a detailed overview providing the precision, recall, F-score, stability and the average of all patches.