The dynamic proteome of influenza A virus infection identifies M segment splicing as a host range determinant.

Pandemic influenza A virus (IAV) outbreaks occur when strains from animal reservoirs acquire the ability to infect and spread among humans. The molecular basis of this species barrier is incompletely understood. Here we combine metabolic pulse labeling and quantitative proteomics to monitor protein synthesis upon infection of human cells with a human- and a bird-adapted IAV strain and observe striking differences in viral protein synthesis. Most importantly, the matrix protein M1 is inefficiently produced by the bird-adapted strain. We show that impaired production of M1 from bird-adapted strains is caused by increased splicing of the M segment RNA to alternative isoforms. Strain-specific M segment splicing is controlled by the 3' splice site and functionally important for permissive infection. In silico and biochemical evidence shows that avian-adapted M segments have evolved different conserved RNA structure features than human-adapted sequences. Thus, we identify M segment RNA splicing as a viral host range determinant.


A century ago, influenza A virus (IAV) infection caused the 1918 flu pandemic and killed an estimated 20-40 million people. Pandemic IAV outbreaks occur when strains from animal reservoirs acquire the ability to infect and spread among humans. The molecular details of this species barrier are incompletely understood. We combined metabolic pulse labeling and quantitative shotgun proteomics to globally monitor protein synthesis upon infection of human cells with a human- and a bird-adapted IAV strain. While production of host proteins was remarkably similar, we observed striking differences in the kinetics of viral protein synthesis over the course of infection. Most importantly, the matrix protein M1 was inefficiently produced by the bird-adapted strain at later stages. We show that impaired production of M1 from bird-adapted strains is caused by increased splicing of the M segment RNA to alternative isoforms. Experiments with reporter constructs and recombinant influenza viruses revealed that strain-specific M segment splicing is controlled by the 3' splice site and functionally important for permissive infection. Independent in silico evidence shows that avian-adapted M segments have evolved different conserved RNA structure features than human-adapted sequences. Thus, our data identify M segment RNA splicing as a viral determinant of host range.

INTRODUCTION
Influenza A viruses (IAVs) are negative-sense, single-stranded RNA viruses with a segmented genome. IAV infection causes seasonal epidemics and sporadically pandemic outbreaks in the human population with significant morbidity, mortality and economic burden. IAVs can infect both mammals (e.g. humans, pigs, horses) and birds (e.g. chicken, waterfowl). However, strains that are replicating in birds typically do not infect mammals and vice versa. Pandemics occur when influenza strains of avian origin with novel antigenicity acquire the ability to transmit among humans 1 . Understanding the molecular basis of host specificity is therefore of high medical relevance.
The species barriers that hinder most avian IAVs from successfully infecting humans are effective at several steps in the viral life cycle. For example, the avian virus receptor hemagglutinin (HA) recognizes oligosaccharides containing terminal sialic acid (SA) that are linked to galactose by α2,3 2 . In the human upper respiratory airway epithelium the dominant linkage is of α2,6 type, to which human-adapted hemagglutinin binds. Despite these differences in receptor binding, many avian viruses are internalized by human cells and initiate expression of the viral genome. Such infections typically lead to an abortive, nonproductive outcome in human cell lines [3][4][5][6] . Our understanding of this intracellular restriction is still incomplete. One well-established factor is the influenza RNA dependent RNA polymerase (RdRp): This enzyme catalyzes replication of the viral genome and transcription of viral mRNAs 7 . Polymerases from avian strains are considerably less active in mammalian cells than their counterparts from mammalian-adapted strains 8,9 . A wealth of experimental data described adaptive mutations that alter receptor specificity or fusion activity of HA (reviewed in 10 ) and polymerase activity (reviewed in 11 ). However, relatively little is known about the contribution of other Influenza A virus genes for permissive versus non-permissive infection 12 .
A crucial aspect for permissive infection is the correct timing of viral gene expression: IAV proteins are produced at the specific phase of infection when they are needed 13 . One example is the M gene, which encodes predominantly two polypeptides: The larger protein, M1, is produced from a collinear transcript. The smaller one, M2, is encoded by a differentially spliced transcript 14 . M1 is the matrix protein with multiple functions that encapsulates the viral genome and also mediates nuclear export 15,16 . M2 is a proton-selective channel that is an integral part of the viral envelope 17,18 . The ratio of spliced to unspliced products increases during infection 19 , which reflects the changing demands required for optimal viral replication.
Systems-level approaches have provided important insights into the molecular details of host-virus interaction 20 . For example, RNAi screens identified host factors required for IAV replication [21][22][23] . Also, interaction proteomics experiments identified many cellular binding partners of IAV proteins [24][25][26] . A number of studies also quantified changes in protein abundance [27][28][29][30][31][32] . However, these steady-state measurements cannot reveal the dynamic changes in protein synthesis during different phases of infection. Early studies used radioactive pulse labeling to monitor protein synthesis in IAV-infected cells 6,33 . However, radioactive pulse labelling cannot provide kinetic profiles for individual proteins.
More recently, stable isotope labelling by amino acids in cell culture (SILAC) emerged as a powerful means to study the dynamic proteome 34 . SILAC-based pulse labelling methods such as pulse SILAC (pSILAC) and dynamic SILAC can quantify protein synthesis and degradation on a proteome-wide scale 35,36 . Moreover, metabolic incorporation of bioorthogonal amino acids such as azidohomoalanine (AHA) provides a means to biochemically enrich for newly synthesized proteins 37 . In combination with SILAC, AHA labeling can be used to quantify proteome dynamics with high temporal resolution [38][39][40][41] .
Here, we used metabolic pulse labeling and quantitative mass spectrometry to compare proteome dynamics upon infection of human cells with a human-adapted and a bird-adapted IAV strain. We found that host proteins behaved surprisingly similarly but observed striking differences in the production of viral proteins, especially for the matrix protein M1. Follow-up experiments with reporter constructs, in silico studies and reverse genetics identified an evolutionarily conserved cis-regulatory element in the M segment as a novel host range determinant.

Quantifying the dynamic proteome of permissive and non-permissive IAV infection
To assess species specificity of influenza A viruses (IAVs) we used a model system comparing a low-pathogenic avian H3N2 IAV (A/Mallard/439/2004 -Mal) to a seasonal human IAV isolate of the same subtype (A/Panama/2007/1999 -Pan). While the avian virus is not adapted to efficient growth in cultured human cells and causes a nonpermissive infection, the seasonal human virus replicates efficiently. We demonstrated previously that the Pan virus produces >1,000 fold more infectious viral progeny than the non-adapted virus, even though both strains efficiently enter human cells and initiate their gene expression program 32 .
We reasoned that comparing the kinetics of protein synthesis upon infection with both strains might reveal determinants of species specificity. To this end we performed proteome-wide comparative pulse-labeling experiments by combining labeling with azidohomoalanine (AHA) and SILAC (Stable Isotope Labeling of Amino acids in Cell culture) ( Figure 1A): Cells incorporate AHA instead of methionine into newly synthesized proteins when the cell culture medium is supplemented with this bioorthogonal amino acid 42 . AHA contains an azido-group which can be used to covalently couple AHA-containing proteins to alkyne beads via click-chemistry. In this manner, newly synthesized proteins can be selectively enriched from the total cellular proteome. Combining AHA labeling with SILAC reveals the kinetics of protein synthesis with high temporal resolution 39,41 . First, we fully labeled human lung adenocarcinoma cells (A549) using SILAC. Second, individual cell populations were infected with either Pan or Mal virus or left uninfected. Third, all cells were pulse labeled with AHA for four hours during different time intervals post infection (0-4, 4-8, 8-12 and 12-16 hrs). The three cell populations for every time interval were then combined, lysed, and AHA-containing proteins were enriched from the mixed lysate using click chemistry ( Figure 1B). After on-bead digestion, peptide samples were analyzed by high resolution shotgun proteomics.
We quantified proteins using two readouts: (i) SILAC-based relative quantification to assess differences in de novo protein synthesis and (ii) intensity-based absolute quantification (iBAQ) to quantify absolute amounts of newly synthesized proteins 43 . Our data thus provides kinetic profiles for relative and absolute differences in de novo protein synthesis across the course of infection (Supplementary Table S1). In total, we identified 7,189 host and 10 viral proteins and quantified 6,019 proteins in at least two biological replicates with overall good reproducibility (Supplementary Figure 1).
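The iBAQ readout mentioned above can be illustrated with a minimal sketch. It assumes the common definition of iBAQ (summed peptide intensities divided by the number of theoretically observable tryptic peptides) and a 6-30 residue length window, which is a widespread default rather than a setting stated in this text. The function names, the toy sequence and the intensities are hypothetical.

```python
def tryptic_peptides(sequence, min_len=6, max_len=30):
    """In silico trypsin digest: cleave after K or R unless the next residue is P.

    Only peptides inside the length window count as theoretically observable
    (the 6-30 window is an assumed default, not taken from the paper).
    """
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return [p for p in peptides if min_len <= len(p) <= max_len]


def ibaq(peptide_intensities, sequence):
    """Summed peptide intensity divided by the number of observable peptides."""
    n_observable = len(tryptic_peptides(sequence))
    if n_observable == 0:
        raise ValueError("protein has no theoretically observable peptides")
    return sum(peptide_intensities) / n_observable


# Hypothetical toy protein and intensities, for illustration only.
toy_sequence = "AAAAAAKGGGGGGRPGGKDD"
toy_ibaq = ibaq([3.0e6, 1.0e6], toy_sequence)
```

Normalizing by the number of observable peptides is what makes values comparable between proteins of different length, which is why iBAQ can serve as a proxy for absolute amounts.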

The dynamic host proteome
It is well established that IAV induces a global reduction in the production of host proteins.
This host shutoff was attributed to a plethora of viral effector functions 44 . To assess the host shutoff in our proteomic data, we investigated iBAQ values for viral and host proteins.
As expected, viral proteins were potently induced while the production of host proteins decreased over time ( Figure 1C-D). The difference between host and viral protein synthesis reached several orders of magnitude and was highest during the 8-12 h pulse interval. Moreover, the total cellular protein output dropped to ~24 % (Pan) or ~30 % (Mal) at later stages of infection, of which ~20-40 % was of viral origin ( Figure 1E-F). At this level of detail, we observed no major differences between both strains. Thus, both strains initiate viral protein synthesis and induce the shutoff of host protein synthesis to an overall similar extent.
Next, we investigated the profiles of individual host proteins across the course of infection.
For this, we directly looked at SILAC ratios comparing infected and non-infected cells ( Figure 2A). As expected, synthesis of the vast majority of host proteins markedly decreased over time. However, some proteins were less affected by the host shutoff and displayed only a mildly decreased or even increased production. To assess this observation more systematically, we selected the proteins that were least affected by the shutoff at different pulse periods and performed gene ontology (GO) analysis. The heatmap of enriched GO terms provides a global overview of biological processes as the infection progresses ( Figure 2B and Supplementary Table S2). For example, many well-known interferon-induced antiviral defense proteins (e.g. MX1, several IFIT proteins, several oligoadenylate synthase proteins) were relatively strongly produced at late stages of infection. Also, many ribosomal proteins (GO term "peptide chain elongation") largely escaped the host shutoff. Interestingly, we also observed significant enrichment of proteins involved in steroid metabolism and mitochondrial proteins (mito-ribosomal, respiratory chain proteins) at early and intermediate stages of infection, respectively. Cellular responses to infection with the Pan and Mal strain were overall similar. To assess potential differences between permissive and non-permissive infection we compared protein log2 fold changes between both viruses directly ( Figure 2C). Interestingly, type I interferon response proteins were first preferentially produced during non-permissive infection. At later stages, however, infection with the Pan virus elicited a stronger interferon response.
Several different hypotheses were made to explain the IAV-induced host shutoff. Traditionally, IAV is thought to prioritize the translation of viral over host mRNAs 48,49 , but more recent experimental and computational analyses challenge this view 47,50 . We investigated this question by calculating protein synthesis efficiencies (i.e. the amount of protein made per mRNA). To this end, we divided iBAQ values by corresponding RPKM values ( Figure 2F). Infection with both strains reduced host protein synthesis efficiencies compared to uninfected controls. Importantly, we did not observe preferential translation of viral transcripts. Instead, viral proteins were even less efficiently synthesized than host proteins in both strains. This suggests that mRNAs from human and avian Influenza virus strains access the translational machinery with comparable efficiency, which argues against the idea that modulation of translation efficiency affects species specificity.
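The efficiency calculation described above (iBAQ divided by RPKM) can be sketched as follows. The log2 transform, the handling of zero values and all gene names and numbers are assumptions made for illustration; the text only states that iBAQ values were divided by the corresponding RPKM values.

```python
import math


def synthesis_efficiency(ibaq, rpkm):
    """Return log2(iBAQ / RPKM) per gene.

    Genes missing from either table, or with a non-positive value in either,
    are skipped because the ratio (or its logarithm) is undefined.
    """
    efficiency = {}
    for gene, protein_amount in ibaq.items():
        mrna_level = rpkm.get(gene)
        if mrna_level is None or mrna_level <= 0 or protein_amount <= 0:
            continue
        efficiency[gene] = math.log2(protein_amount / mrna_level)
    return efficiency


# Hypothetical toy values, not measurements from this study.
ibaq = {"HOST_A": 8.0e6, "HOST_B": 2.0e6, "M1": 4.0e7, "UNDETECTED": 0.0}
rpkm = {"HOST_A": 10.0, "HOST_B": 5.0, "M1": 400.0, "UNDETECTED": 3.0}
efficiency = synthesis_efficiency(ibaq, rpkm)
```

In this toy example the viral entry has the highest protein output but the lowest output per mRNA, which is the kind of pattern the paragraph describes for viral versus host transcripts.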

Dysregulated synthesis of viral proteins
Since the observed differences in host protein synthesis were surprisingly subtle we focused our attention on the dynamics of viral protein synthesis. Production of most viral proteins peaked in the 8-12 hour period (see Supplementary Figure 2). The kinetics, such as the early production of NS1 and NP and the delayed synthesis of M1, are consistent with classical radioactive pulse labeling experiments 33 . We then used SILAC ratios of shared peptides (that is, peptides with sequence identity between both strains) to precisely compare the kinetics of viral protein synthesis ( Figure 3A-B). We found that the avian strain produced higher amounts of all viral proteins at the beginning, confirming that the Mal virus successfully enters cells and initiates its gene expression program. Later on, during mid to late phases, the human Pan virus produced most proteins more abundantly than the avian strain. Note that NS1 and M2 are excluded in this analysis because no identical peptides were identified.
It is well-established that the RNA dependent RNA polymerase (RdRp) from avian-adapted IAV strains is less active in mammalian cells 8,9 . Thus, we would have expected the production of all viral proteins in the bird-adapted strain to be reduced to a similar extent. In contrast, we observed striking differences in the synthesis of individual proteins: Hemagglutinin (HA) was more abundantly produced by the avian strain throughout infection. In contrast, neuraminidase (NA) and particularly matrix protein M1 were more strongly produced by the human strain at later stages. These differences in the production of individual viral proteins cannot be explained by the global difference in RdRp activity between strains. Thus, the avian strain displays dysregulated protein production relative to its human counterpart ( Figure 3A).
We focused our attention on the M1 protein since it showed the largest difference between both strains. The protein is highly conserved between Pan and Mal (~96% amino acid identity) and the most abundant protein in virions 51 . Moreover, M1 is known to mediate export of the viral genome across the nuclear membrane -an essential step during permissive infection 15,16 . Thus, accumulation of M1 at late stages of infection is required for the appearance of viral ribonucleoproteins (vRNPs) in the cytoplasm of infected cells.
Interestingly, when investigating the subcellular distribution of the viral nucleoprotein (NP) by immunofluorescence microscopy, we observed efficient export during infection with the Pan strain ( Figure 3C). In contrast, NP was inefficiently exported and accumulated in the nucleus upon Mal infection. These microscopy data are also corroborated by the increased interferon response induced by the Pan strain at later stages of infection ( Figure 2C), which is stimulated by cytosolic viral RNA sensors 52 . We conclude that non-permissive infection correlates with reduced M1 production and impaired nuclear export of NP.

Non-permissive infection is characterized by increased M1 mRNA splicing
We next sought to investigate the mechanism for the impaired M1 production. To this end, we first quantified the levels of viral mRNAs from our RNA-seq data. In total, the avian virus produced ~⅔ of the mRNA of the human strain with the single largest difference observed for M1 ( Figure 4A). The strain-specific differences in M1 mRNA levels were very similar to the observed differences in M1 protein production ( Figure 4B). Hence, the impaired M1 protein production during non-permissive infection can largely be explained by reduced M1 mRNA levels.
M1 is encoded on segment 7 (that is, the M segment), which is the most conserved segment between Pan and Mal (~89 % nucleotide identity). The M1 protein is produced from a collinear transcript that can be alternatively spliced into three additional isoforms which all use a common 3' splice site 53,54 : the M2 mRNA, which encodes the ion channel M2 17 , RNA 3 which is not known to encode a peptide, and M4 mRNA that is proposed to be translated to an isoform of the M2 ion channel in certain strains 55 . We investigated the relative proportion of these isoforms in the RNA-seq data via splice junction reads.
We detected all known isoforms plus a novel transcript of the avian M segment, which we call RNA 5. This transcript results from splicing at a 5' donor GG site (pos 520/521) and the common 3' acceptor site and contains an ORF in-frame with M1 with a missing internal region ( Figure 4C).
While only a few percent of the M1 mRNA was alternatively spliced during permissive infection, ~⅓ was spliced upon infection with the avian strain ( Figure 4D). Thus, the reduced level of M1 mRNA during non-permissive infection is at least partially due to increased splicing of the M1 mRNA to alternative isoforms. To validate these data we assessed the kinetics of M1 and M2 mRNA levels during infection via qRT-PCR ( Figure 4E). This confirmed the reduced levels of M1 mRNA in the avian strain, especially at later stages. In contrast, M2 mRNA levels were overall similar throughout infection. We note that the comparable M2 mRNA level during non-permissive infection results from two opposing processes: the increased splicing of the primary transcript to the M2 mRNA, and the global reduction in viral transcripts, which is probably due to the impaired polymerase activity 8,9 . We conclude that M1 splicing is markedly different in permissive versus non-permissive infection.
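The proportion of spliced M segment transcripts discussed above can be sketched as a simple share of read counts. This assumes that each isoform, including the unspliced M1 mRNA, has already been assigned a read count; how unspliced reads were counted is not described in this text, so that part is an assumption. The counts below are hypothetical.

```python
def isoform_fractions(read_counts):
    """Convert per-isoform read counts into fractions of the total."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads assigned to any isoform")
    return {isoform: count / total for isoform, count in read_counts.items()}


def spliced_fraction(read_counts, unspliced="M1"):
    """Fraction of M segment transcripts that are not the collinear M1 mRNA."""
    return 1.0 - isoform_fractions(read_counts)[unspliced]


# Hypothetical toy counts, not the RNA-seq data from this study.
toy_counts = {"M1": 650, "M2": 250, "mRNA3": 50, "M4": 20, "RNA5": 30}
toy_spliced = spliced_fraction(toy_counts)
```

Reporting the spliced share rather than absolute counts separates the splicing effect from the global reduction in viral transcripts noted in the paragraph above.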

Difference in M1 mRNA splicing is determined by a cis regulatory element at the 3' splice site
The differences in M1 mRNA splicing can be due to (i) cis-regulatory elements (that is, specific signals encoded in the M segment), (ii) trans-acting factors (that is, other viral or host factors that interact with M1 mRNA) or (iii) a combination of both. To assess whether cis regulatory elements are involved, we sought to investigate M1 splicing outside the context of infection. We therefore designed a splicing reporter system ( Figure 5A-B). To this end, we cloned the coding region of the M segment (nt 29-1007) into a eukaryotic expression vector and fused it to an N-terminal Flag/HA tag. Importantly, this construct avoids the strong 5' splice site of mRNA 3 56 and enabled us to assess the relative levels of M1 to M2 proteins and mRNAs. When we transfected human A549 cells with these reporter constructs we found that M2 was produced to high levels with the construct containing the Mal M sequence but was barely detectable when the Pan M sequence was transfected ( Figure 5C). Thus, our reporter system recapitulates splicing differences observed during infection. We conclude that cis-regulatory elements in the M segment cause excessive splicing of the avian variant.
To determine the sequence responsible for the strain-specific splicing we made chimeric reporter constructs ( Figure 5B). When swapping the entire intron sequence of the M2 splice variant (nucleotides 52-739, corresponding to ~70% of the CDS), we did not observe major changes in the relative amount of M1 to M2. In contrast, integrating the human 3' splice site region (nucleotides 707-779, 73 nucleotides) into the avian construct strongly impaired splicing down to the levels of the human wild type construct.
Conversely, when we integrated the avian 3' splice site region into the human construct we observed a strong increase in splicing, similar to the avian wild type construct ( Figure 5C). To validate these results at the mRNA level we used qRT-PCR. Again, we found that the splice site region alone is sufficient to switch the species-specific splicing phenotype ( Figure 5D). Interestingly, this region has been reported to contain an RNA secondary structure 57 and a binding site for the splicing factor SRSF1 58 . We conclude that a cis regulatory element in the splice site region determines the strain-specific splicing pattern.

An RNA hairpin that spans the 3' splice site is evolutionarily conserved in avian but not in human-adapted M segments
We next wanted to assess whether our findings are also relevant for other human- or bird-adapted IAVs. Specifically, we sought to identify functionally relevant RNA secondary structures that have been conserved during evolution of avian- and human-adapted IAVs. To this end, we analysed multiple sequence alignments from hundreds of recent human and avian H3N2 isolates using the RNA structure prediction program RNA-Decoder 59 (see Material and Methods). This program is capable of disentangling overlapping evolutionary constraints due to encoded amino acids and RNA structure features and has been shown to successfully identify evolutionarily conserved RNA structures overlapping protein-coding regions, e.g. in viral genomes such as hepatitis C and HIV 59,60 . Importantly, RNA-Decoder captures evidence on conserved RNA structure based on the evolutionary signals encoded in the sequences of the input alignment. This is a key advantage over computational methods that identify RNA structures based on their thermodynamic stability in vitro, as these methods assume that the RNA has no interactions with other molecules (e.g. proteins and other RNAs) in vivo. Also, RNA-Decoder employs a probabilistic framework which is capable of estimating the reliability of its predictions.
The RNA secondary structure that is best supported by the evolutionary signals in the two multiple sequence alignments (the so-called maximum-likelihood structure) markedly differs between human and avian strains, particularly in the region around the 3' splice site ( Figure 5E): The avian region encodes a hairpin-like structure ( Figure 5F reported here leaves the GC-motif immediately downstream of the AG consensus at the 3' splice site unpaired, making it potentially more accessible to splicing. We conclude that the M segments of avian and human-adapted H3N2 isolates contain evolutionarily conserved RNA secondary structures that markedly differ in exactly the region that is critical for strain-specific splicing.

In addition to these computational analyses, we also wanted to test the relevance of our findings for other IAV isolates experimentally. The M segment of the seasonal Pan strain originates from the M segment of the A/Brevig Mission/1/1918 (p1918) virus, which is at the evolutionary root of human strains and caused the 1918 "Spanish flu" pandemic 61 . Therefore, we cloned the M segment of p1918 into our reporter vector (Supplementary Figure 4). Again, we observed inefficient splicing of the p1918 M gene, consistent with our data for the Pan strain and previous reports 62 . Moreover, integration of the Mal 3' splice site region into the p1918 gene increased splicing. Thus, inefficient splicing of the M gene in human-adapted IAVs occurs in a seasonal (Pan) and a pandemic (p1918) strain.

The 3' splice site is a host range determinant
The experiments with reporter constructs described above are advantageous because they allow us to study the impact of M segment sequence features in isolation.
Nevertheless, it is also important to assess the relevance of these findings during infection. We therefore mutated eight nucleotides in the splice site region of the Pan wild type strain to the corresponding nucleotides in the Mal strain using reverse genetics ("Pan-Av" for a Pan strain with an avian splice site region, see Figure 6A and Supplementary Table S3).
We first compared the kinetics of viral protein synthesis upon infection of A549 cells with both strains using pSILAC 35 . M1 synthesis was selectively impaired during Pan-Av infection during both the 6-12 and 12-18 hpi time intervals. At later stages, the Pan-Av strain also showed impaired production of other essential viral proteins ( Figure 6B), suggesting that viral replication is also impaired. Next, we quantified M1 and M2 protein ( Figure 6C) and mRNA levels ( Figure 6D). The Pan-Av strain displayed decreased M1 protein and mRNA levels, mimicking the behaviour of the Mal strain (compare also Figure 4).
To assess the impact of M segment splicing on IAV replication in human cells, we assessed the growth characteristics of the different viruses ( Figure 6E). As expected, the Pan strain reached ~1,000 fold higher titers than the Mal strain. Exchanging the entire M segment of the Pan strain with the M segment of the Mal strain (Pan + Mal M) reduced titers about 10-fold. Importantly, a similar ~10 fold attenuation was also seen in the Pan-Av strain that only differs from the Pan strain by 8 nucleotides. We conclude that the 3' splice site of the IAV M segment is indeed an important host range determinant.

DISCUSSION
Advances in high-throughput sequencing have provided insights into the extraordinary diversity of viruses and their genomic determinants of host adaptation. However, the mechanisms by which these adaptive mutations enable replication in a given host are less well understood. Our proteomic pulse labelling data allowed us to take an unbiased look at protein synthesis upon permissive and non-permissive infection. We found that the synthesis profiles of host cell proteins were remarkably similar. Hence, the outcome of infection does not appear to depend on a specific host response. In contrast, we observed striking differences in viral protein synthesis.

Our global assessment of protein de novo synthesis upon infection revealed a global reduction in overall protein output during infection with both strains, probably reflecting a global stress response. Also, we observed the well-known shutoff of host protein synthesis 44,[46][47][48] . Specific classes such as interferon-related, ribosomal and mitochondrial proteins escaped the shutoff. We observed that the amount of protein synthesis upon infection primarily depends on mRNA levels. Thus, altered translation does not play a major role for the host shut-off, consistent with recent findings 47 . Surprisingly, we also found that viral transcripts were not more efficiently translated than host transcripts. Instead, their translation efficiency (that is, the amount of protein made per mRNA) was even lower than for host proteins. This contrasts with early studies based on reporter systems 48 but corroborates recent ribosome profiling data 47 . Our finding is also consistent with the fact that the codon usage of IAV genes is not optimized to reflect the codon usage of the host 50 . It is also interesting that the translation efficiency of the bird- and the human-adapted strains was similarly poor. Thus, adaptation towards high translational efficiency does not seem to be required for crossing the species barrier.
Our unbiased proteomic analysis indicates that the differences between permissive and non-permissive infection depend on differences in viral rather than host protein synthesis.
Hence, the orchestrated synthesis of the viral proteome appears to be critically important for permissive infection. This supports the emerging view that modulation of viral protein synthesis underpins host adaptation 63 . Specifically, we find that the strain-specific differences in M1 protein synthesis critically depend on a conserved cis-regulatory element, which controls M-segment mRNA splicing. M1 is particularly important for the nuclear export of the viral genome to the cytoplasm 15,16 . Consistently, we observed that the genome of the bird-adapted strain was inefficiently exported ( Figure 3C).
We found that exchanging only eight nucleotides of the human-adapted M segment to the bird-adapted sequences markedly impaired viral replication. Hence, the cis-regulatory element described here plays an important role for host adaptation. However, it is critical to also emphasize that this is not the only relevant factor for IAV host range. For example, despite the overall similar host response, we and others have previously described host factors affecting human and avian virus infections [20][21][22][23]25,32 . It is also well-established that the RdRp of avian-adapted strains is less active in human cells 8,9 . Moreover, differences in the binding specificity of viral hemagglutinins (HA) are known to play an important role for host adaptation 10 . Lastly, M-segment splicing does not only depend on cis-regulatory elements but also on trans-acting factors, such as NS1, RdRp, NS1-BP or HNRNPK 56,[64][65][66] . Indeed, while M1 production was clearly impaired in our mutant strain ( Figure 6B), the wild-type bird-adapted strain produced even less ( Figure 3A). It is therefore important to interpret our findings in the broader context of viral and host factors that jointly determine the success of IAV replication.
Splice junction reads for various M transcripts were counted with customized Perl scripts. Splice isoforms with >500 read counts in both replicates were accepted.

These two input alignments (including the combined annotation of the known protein-coding M1 and M2 regions) and the corresponding evolutionary trees were then used as input to RNA-Decoder 59 . We used RNA-Decoder to identify the RNA secondary structure that is best supported by the evolutionary signals contained in the two input alignments (the so-called maximum-likelihood structure). The predictions by RNA-Decoder also included the posterior base-pairing probabilities for each base-pair of the predicted RNA structure. Predicted base-pairs with a base-pairing probability smaller than 25% were omitted from the RNA structure visualization. Note that each multiple sequence alignment was analysed by RNA-Decoder in one chunk, i.e. without partitioning it artificially into subalignments.
Finally, the predicted RNA structure element nt 733-766 was plotted with the sequence of the Mal strain using the VARNA tool 84 . The RNA structures predicted for the two alignments of avian- and human-adapted sequences were visualized using R-chie 85 , including information on the pairing probability of each base-pair.
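The two thresholds stated in the methods text above (more than 500 junction reads in both replicates, and a base-pairing probability of at least 25% for visualization) can be sketched as simple filters. The original counting used custom Perl scripts that are not shown here, so this Python version is only an illustration; the function names, isoform labels, positions and probabilities are hypothetical.

```python
def accepted_isoforms(replicate_1, replicate_2, min_reads=500):
    """Keep isoforms with more than `min_reads` junction reads in both replicates."""
    shared = replicate_1.keys() & replicate_2.keys()
    return sorted(
        isoform
        for isoform in shared
        if replicate_1[isoform] > min_reads and replicate_2[isoform] > min_reads
    )


def base_pairs_to_draw(base_pairs, min_probability=0.25):
    """Drop predicted base pairs whose pairing probability is below the cutoff.

    Each base pair is a tuple (position_i, position_j, probability).
    """
    return [bp for bp in base_pairs if bp[2] >= min_probability]


# Hypothetical toy inputs, for illustration only.
rep1 = {"M2": 1200, "RNA5": 640, "CANDIDATE_X": 900}
rep2 = {"M2": 1100, "RNA5": 510, "CANDIDATE_X": 120}
kept_isoforms = accepted_isoforms(rep1, rep2)

toy_pairs = [(733, 766, 0.90), (734, 765, 0.20), (735, 764, 0.25)]
kept_pairs = base_pairs_to_draw(toy_pairs)
```

Requiring the read threshold in both replicates guards against isoforms that are supported by a single library only.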
Nucleotide polymorphisms. For calculating percentages of nucleotide identities at the 3' splice site we used the avian- and human-adapted sequences as described above.
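One way to compute such percentages is per alignment column, as the share of sequences carrying the most common nucleotide. This is an assumed reading of "percentages of nucleotide identities"; the exact procedure is not given in the surviving text. Gap handling and the toy alignment are likewise assumptions.

```python
from collections import Counter


def column_identity(alignment):
    """Return (most common base, percent of non-gap sequences carrying it) per column."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    result = []
    for column in zip(*alignment):
        bases = [base for base in column if base != "-"]  # gaps are ignored
        if not bases:
            result.append((None, 0.0))
            continue
        base, count = Counter(bases).most_common(1)[0]
        result.append((base, 100.0 * count / len(bases)))
    return result


# Hypothetical toy alignment, not sequences from this study.
toy_alignment = ["CAGGT", "CAGGT", "CAGAT", "CAG-T"]
toy_identity = column_identity(toy_alignment)
```

Running this separately on the avian-adapted and the human-adapted alignments would give one conservation profile per host group that can be compared position by position.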