Influenza virus can escape most antibodies with single mutations. However, rare antibodies broadly neutralize many viral strains. It is unclear how easily influenza virus might escape such antibodies if there were strong pressure to do so. Here, we map all single amino-acid mutations that increase resistance to broad antibodies to H1 hemagglutinin. Our approach not only identifies antigenic mutations but also quantifies their effect sizes. All antibodies select mutations, but the effect sizes vary widely. The virus can escape a broad antibody to hemagglutinin’s receptor-binding site the same way it escapes narrow strain-specific antibodies: via single mutations with huge effects. In contrast, broad antibodies to hemagglutinin’s stalk only select mutations with small effects. Therefore, among the antibodies we examine, breadth is an imperfect indicator of the potential for viral escape via single mutations. Antibodies targeting the H1 hemagglutinin stalk are quantifiably harder to escape than the other antibodies tested here.
Nearly all viruses show some antigenic variation. However, the extent of this variation ranges widely. For instance, although both measles virus1,2 and polio virus3,4,5 exhibit antigenic variation, the magnitude of this variation is small. Therefore, immunity to these viruses is lifelong6,7. In contrast, human influenza virus exhibits much more antigenic variation. So although infection with an influenza virus strain provides long-term immunity to that exact strain8,9,10, the virus’s rapid antigenic evolution erodes the effectiveness of this immunity to that strain’s descendants within ∼5 years11,12.
One possible reason that viruses exhibit different amounts of antigenic variation is that they have disparate evolutionary capacities to escape the immunodominant antibodies generated by natural immune responses13,14,15. According to this explanation, human influenza virus undergoes rapid antigenic drift because most neutralizing antibodies target epitopes on the viral hemagglutinin (HA) protein that are highly tolerant of mutational change. This explanation is supported by classic experiments showing that it is easy to select viral mutants that escape most antibodies16,17, as well as by the observation that mutations that alter antigenicity arise frequently during influenza’s evolution globally18,19,20,21,22 and within individual humans with long-term infections23. A corollary of this explanation is that influenza virus’s capacity for antigenic drift would be reduced if most antibodies instead targeted epitopes that were less mutationally tolerant.
Verifying this corollary has become of practical importance with the discovery of broadly neutralizing antibodies against influenza virus. These antibodies typically target conserved epitopes in HA’s stalk24,25,26 or receptor-binding site27,28,29, and neutralize a wide range of viral strains. Broad antibodies are usually less abundant in human serum than antibodies to antigenically variable epitopes on the head of HA30,31. However, major efforts are underway to elicit broad antibodies by vaccination or administer them directly as therapeutics32,33.
If these efforts succeed, the epitopes of broad antibodies could come under stronger antigenic selection in human influenza virus. Might such selection then drive antigenic variation in these epitopes? There is precedent for the idea that the immune status of the host population can shape influenza virus evolution: the virus undergoes faster antigenic drift in long-lived humans that accumulate immune memory than in short-lived swine that are mostly naive34,35, and poultry vaccination may accelerate antigenic drift of avian influenza36,37. But alternatively, perhaps broad antibodies are broad because the virus has difficulty escaping them regardless of selection from host immunity.
So far, there is limited data to distinguish between these possibilities. Several studies have shown that the head domain of HA is more mutationally tolerant than the stalk domain where many broad antibodies bind38,39,40. However, these studies did not select for antibody escape, so it is difficult to relate their measurements to the virus’s evolutionary capacity under immune selection. Other work has shown that it is possible to select antigenic mutants with broad antibodies41,42,43,44,45,46, demonstrating that these epitopes are not entirely refractory to change. But given that antibodies can select some antigenic variation even in measles virus1,2 and polio virus3,4, the existence of selectable mutations does not necessarily imply that influenza virus can escape broad antibodies as easily as it drifts away from narrow strain-specific ones. The fundamental problem is that existing studies have not quantified the ease of viral escape in a way that can be compared across antibodies in an apples-to-apples fashion.
Here we systematically quantify the results of selecting all single amino-acid mutations to an H1 HA with several broad and narrow antibodies. Critically, our approach quantifies the magnitude of the antigenic effect of every mutation in a way that can be directly compared across antibodies. We find that even the broadest antibodies select antigenic mutations. However, the magnitudes of the antigenic effects vary greatly across antibodies. Single mutations make the virus completely resistant to both narrow strain-specific antibodies and a broad antibody that targets residues in HA’s receptor-binding site. But no single mutation does more than modestly increase the virus’s resistance to two broad antibodies against the HA stalk. Therefore, broad anti-stalk antibodies are quantifiably more resistant to viral escape via single amino-acid mutations than the other antibodies tested here.
Fraction of each viral mutant that escapes neutralization
We can visualize the outcome of antibody selection on viral populations containing antigenic mutations as in Fig. 1. If a mutation strongly escapes neutralization, then all virions with this mutation survive antibody treatment at a concentration where other virions are mostly neutralized (Fig. 1a). This escape is manifested by a large shift in the neutralization curve for the mutant (Fig. 1b). If we draw vertical lines through the overlaid neutralization curves, we can calculate the fraction of virions with each mutation that survive neutralization at each antibody concentration. These fractions can be represented using logo plots, where the height of each letter is proportional to the fraction of virions with that amino acid at a site that survive (Fig. 1c). Large letters correspond to strong escape mutations.
Now consider the case where a mutation has just a small antigenic effect, and slightly increases the fraction of virions that survive neutralization (Fig. 1d). In this scenario, the neutralization curve shifts only slightly (Fig. 1e). In the logo plot representation, the antigenic mutation is slightly larger than other amino acids (Fig. 1f), since possessing the mutation modestly increases the chance that a virion survives antibody treatment. These logo plots therefore provide a way to both identify antigenic mutations and quantify the magnitudes of their effects in a way that is directly comparable across antibodies.
Our goal is to determine the fraction of mutant virions that survive antibody neutralization for all mutations to HA. One way to do this would be to measure individual neutralization curves for each of the 19 × 565 = 10,735 single amino-acid mutants of the 565-residue HA protein. However, individually creating and assaying that many mutants would be exceedingly time-consuming and expensive. Fortunately, we have shown that antibody selection on all viral mutations can be assayed in a single experiment using mutational antigenic profiling47,48. This approach involves generating viral libraries containing all mutations to the protein of interest, selecting these viruses with or without antibody, and using an accurate deep-sequencing method to determine the relative frequencies of each mutation.
These frequencies can be analyzed to calculate the fraction of virions with each mutation that survive antibody treatment. Specifically, the deep sequencing determines the frequencies of virions carrying amino-acid a at site r in the antibody-selected and mock-selected conditions, which we denote as $\rho_{r,a}^{\rm selected}$ and $\rho_{r,a}^{\rm mock}$, respectively. We can also measure the total fraction of the viral library that survives the antibody, which we denote as γ. The fraction of variants with amino-acid a at site r that survive antibody selection is then simply

$$F_{r,a} = \gamma \times \frac{\rho_{r,a}^{\rm selected}}{\rho_{r,a}^{\rm mock}}. \qquad (1)$$
For instance, in Fig. 1a, the frequency of virions with the orange mutation is in the antibody selection and in the mock selection. The overall fraction of virions that survive the antibody in Fig. 1a is . Therefore, we use Eq. 1 to calculate that the fraction of variants with the orange mutation that survive is . Performing the analogous calculation for Fig. 1d correctly determines that the fraction of virions with the orange mutation that survive the antibody is only 0.5 for the scenario in that figure panel. In the analyses of real data below, we will plot the excess fraction surviving above the overall library average, which is

$$F_{r,a}^{\rm excess} = \max\left(0, F_{r,a} - \gamma\right), \qquad (2)$$

where $F_{r,a}$ is the fraction surviving given by Eq. 1 and γ is the overall fraction of the library that survives.
Importantly, because Eqs. 1 and 2 normalize by the mock-selected control, they correct for effects of mutations on viral growth. They therefore measure only antigenicity and not viral growth, provided that the virus at least grows well enough to be present in the library. Details of how the calculations are extended to account for sequencing errors and sampling statistics are in the Methods section. Open-source software that performs all steps in the analysis beginning with the deep sequencing data is available at https://jbloomlab.github.io/dms_tools2/.
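The calculation described above can be illustrated with a minimal Python sketch. It is not the dms_tools2 implementation: the function names are hypothetical, the input numbers are invented for illustration, and the corrections for sequencing errors and sampling statistics described in the Methods are omitted. The excess is floored at zero so that it can be drawn as a non-negative letter height.

```python
def fraction_surviving(rho_selected, rho_mock, gamma):
    """Fraction of virions carrying a mutation that survive antibody selection.

    rho_selected: frequency of the mutation in the antibody-selected sample
    rho_mock:     frequency of the mutation in the mock-selected sample
    gamma:        overall fraction of the library that survives the antibody
    """
    if rho_mock <= 0:
        # A mutation absent from the mock-selected library cannot be assessed.
        raise ValueError("rho_mock must be positive")
    return gamma * rho_selected / rho_mock


def excess_fraction_surviving(rho_selected, rho_mock, gamma):
    """Fraction surviving above the library average, floored at zero."""
    return max(0.0, fraction_surviving(rho_selected, rho_mock, gamma) - gamma)


# Invented example: 1% of the library survives; a mutation rises from a
# frequency of 0.5% in the mock selection to 50% after antibody selection.
strong = fraction_surviving(rho_selected=0.5, rho_mock=0.005, gamma=0.01)

# Invented example: a mutation that is depleted by the antibody has no excess.
depleted = excess_fraction_surviving(rho_selected=0.002, rho_mock=0.004, gamma=0.01)
```

In the first invented case the mutation is enriched 100-fold in a library of which 1% survives, so essentially all virions carrying it survive. In the second case the mutation does worse than the library average, and its excess fraction surviving is zero.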
Broad and narrow antibodies that neutralize influenza virus
We applied this approach to anti-HA antibodies with a range of breadths and epitopes. The crystal structures or sites of escape mutations selected by these antibodies are shown in Fig. 2a. We chose two broad antibodies, FI6v3 and C179, that target the stalk of HA26,49,50. FI6v3 is extremely broad, and neutralizes both group 1 and group 2 HAs (Fig. 2b). C179 is less broad, and neutralizes only some group 1 HAs (Fig. 2b). We also chose a broad antibody, S139/1, that crystallographic studies have shown binds to residues in HA’s receptor-binding pocket27, and which can neutralize both group 1 and group 2 HAs27,41. Finally, we re-analyzed deep sequencing data from prior mutational antigenic profiling of three narrow strain-specific antibodies, H17-L19, H17-L10, and H17-L747. These narrow antibodies bind the Ca2, Ca1, and Cb antigenic regions on HA’s globular head51, and only neutralize a narrow slice of H1 viruses.
We performed our experiments using the lab-adapted A/WSN/1933 (H1N1) strain of influenza. This strain is derived from an early seasonal H1N1 that was extensively passaged in the lab, where it adapted to become neurotropic and trypsin independent52. But despite these unusual properties, the virus is neutralized by most broad antibodies that target other H1 viruses, including those used in this study (Fig. 3). Our experiments utilize fully infectious influenza virus rather than pseudovirus, which is important since the accessibility of some epitopes can vary with HA density, which differs between fully infectious virus and pseudovirus26,53.
The wildtype virus is neutralized by all the antibodies, with IC50s between 0.01 and 1 μg/ml (Fig. 3). However, our selections are performed on mutant virus libraries, not wildtype virus. Because these libraries have different capacities to escape each antibody, the fraction of each library that survives high antibody concentrations will vary among antibodies. For instance, at concentrations that neutralize 99% of the wildtype virus, we expect a larger fraction of a library to survive an antibody for which there are many HA escape mutations than an antibody with few HA escape mutations. Therefore, rather than using the same concentration for all antibodies, we selected concentrations for each antibody where between 2% and 0.1% of the libraries survived in order to strongly select for escape mutations (Fig. 3). Slight differences among antibodies in the fraction surviving within this range should not strongly affect our results, since Eqs. 1 and 2 account for such differences via the γ term. However, to confirm the robustness of our results, we used several concentrations of each broad antibody (Fig. 3).
The effects of all mutations on antibody neutralization
We performed mutational antigenic profiling using the three broad antibodies at the concentrations indicated in Fig. 3 (the fraction of each library neutralized at each of these concentrations is listed in Supplementary Table 1). All experiments were performed in full biological triplicate using three independently generated virus libraries carrying single amino-acid mutations to HA54. Importantly, as described previously54, these virus libraries were generated by mutagenizing HA at the codon level rather than at the nucleotide level. Performing codon mutagenesis is important, because single-nucleotide mutations access only about a third of the possible amino-acid mutations from a given codon, whereas codon mutations access all possible amino-acid mutations.
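The advantage of codon-level mutagenesis can be checked with a short Python sketch that enumerates, for each sense codon in the standard genetic code, the amino-acid changes reachable by a single-nucleotide substitution. The function names are hypothetical, and the average is taken uniformly over the 61 sense codons rather than over the codons actually used in the A/WSN/1933 HA, which is a simplifying assumption.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_nt_accessible(codon):
    """Amino acids (excluding wildtype and stop) one nucleotide change away."""
    wildtype = CODON_TABLE[codon]
    reachable = set()
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            aa = CODON_TABLE[mutant]
            if aa != "*" and aa != wildtype:
                reachable.add(aa)
    return reachable


def mean_accessible_fraction():
    """Average fraction of the 19 possible amino-acid mutations reachable
    by single-nucleotide changes, taken uniformly over sense codons."""
    sense = [codon for codon, aa in CODON_TABLE.items() if aa != "*"]
    total = sum(len(single_nt_accessible(codon)) for codon in sense)
    return total / (19 * len(sense))


print(sorted(single_nt_accessible("GGG")))
print(round(mean_accessible_fraction(), 2))
```

For example, the glycine codon GGG can reach only five other amino acids by a single-nucleotide change, whereas codon mutagenesis can introduce any of the 19.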
The correlations among replicates of the mutational antigenic profiling, in terms of the measured fraction-surviving above average for each possible amino-acid mutation, are shown in Supplementary Fig. 1. For the remainder of this paper, we will refer to the median antigenic effect of each mutation across replicates.
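Summarizing each mutation by its median across replicates can be sketched in a few lines of Python. The data layout (one dictionary per replicate, keyed by site and amino acid), the function name, and the numbers are all hypothetical, and restricting the summary to mutations measured in every replicate is an assumption of this sketch.

```python
from statistics import median


def median_effects(replicates):
    """Median excess fraction surviving per mutation across replicates.

    replicates: list of dicts mapping (site, amino_acid) to a measured value.
    Only mutations present in every replicate are summarized.
    """
    if not replicates:
        raise ValueError("at least one replicate is required")
    shared = set.intersection(*(set(rep) for rep in replicates))
    return {mut: median(rep[mut] for rep in replicates) for mut in shared}


# Invented measurements for three replicate libraries.
rep1 = {("158", "N"): 0.40, ("38", "S"): 0.02}
rep2 = {("158", "N"): 0.35, ("38", "S"): 0.00}
rep3 = {("158", "N"): 0.52, ("38", "S"): 0.03}

summary = median_effects([rep1, rep2, rep3])
```

The median is less sensitive than the mean to a single replicate in which a mutation happens to be over- or under-sampled.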
It is immediately obvious that the narrow strain-specific antibodies and the antibody targeting residues in HA’s receptor-binding pocket (S139/1) select mutations with large antigenic effects. For all four of these antibodies, there are multiple sites in HA where mutations enable a substantial fraction of virions to survive high antibody concentrations (Fig. 4). Specifically, there are mutations that enable over a third of virions to survive at concentrations where virtually all wildtype virions are neutralized (Supplementary Fig. 2). Therefore, the virus can escape these four antibodies with the sort of large-effect single amino-acid mutations that characterize traditional influenza antigenic drift16,17,18,19,20,21.
In contrast, the stalk-targeting antibodies C179 and FI6v3 select no strong escape mutants. If we look at the results for these antibodies on the same scale as the other antibodies, we see only a few small bumps in the fraction of virions surviving (Fig. 4 and Supplementary Fig. 2). Only if we zoom in can we see that there are actually a few sites where mutations slightly increase the fraction of virions surviving C179 and FI6v3 (Supplementary Figs. 3, 4, 5, 6, 7, and 8). But the effect sizes of these antigenic mutations are tiny compared to the other antibodies, especially for FI6v3. Therefore, the HA of A/WSN/1933 influenza virus is far less capable of escaping these anti-stalk antibodies by single mutations than it is of escaping the other four antibodies.
Selected mutations are near antibody binding footprints
Antigenic mutations selected by narrow strain-specific antibodies against HA are thought to occur at residues in or near the physical binding footprint of the antibody16,17,51. We examined whether this was the case for the broad antibodies used in our experiments. Figure 5a shows a zoomed-in view of the sites of mutations selected by each antibody, as well as their locations on HA’s structure. It is immediately clear that the selected mutations are nearly all in or close to the antibody-binding footprint.
For the S139/1 antibody that targets residues in the HA receptor-binding pocket, there are strong escape mutations at sites 156, 158, and 193 (Fig. 5a; sites are in H3 numbering). These three sites fall directly in the physical binding footprint of the antibody27, and are the same three sites where previous work has selected escape mutants in H1, H2, and H3 HAs41. Our data show that numerous different amino-acid mutations at each site confer neutralization resistance. The mutation with the largest effect, G158N, introduces an N-linked glycosylation motif.
Although the anti-stalk antibodies C179 and FI6v3 select mutations with small effects only, these mutations almost all fall in or near the physical binding footprints of the antibodies (Fig. 5a). The two antibodies have similar epitopes and angles of approach49, and they select identical mutations at several sites (Fig. 5b). The three largest-effect mutations for FI6v3 (K280S, K280T, and N291S) all introduce glycosylation motifs near the epitope, and all three mutations have similar magnitude antigenic effects in both FI6v3 and C179.
However, C179 selects several mutations that do not have any apparent effect on FI6v3 (Fig. 5a, Supplementary Fig. 6). The most notable of these C179-specific mutations are at site 38. The additional breadth of FI6v3 over antibodies such as C179 that neutralize only group 1 HAs is because FI6v3 can accommodate a glycan on the asparagine at site 38 that is present in group 2 HAs24,25,26. However, the H38S mutation that has the largest effect on C179 resistance in our experiments does not introduce a glycosylation motif, showing that there are also other ways to escape anti-stalk antibodies at this site. Interestingly, group 1 HA subtypes that are susceptible to C179 tend to possess a histidine at site 38, but subtypes that are not bound or neutralized by C179 often possess a serine (Fig. 5c).
The FI6v3 antibody also weakly selects several mutations at residue -8, which is part of HA’s signal peptide (Fig. 5a). This signal peptide is cleaved from the mature HA protein55,56, although mutations at this site can affect HA’s expression level57, which might conceivably affect HA density on virions and subsequently antibody neutralization26,53.
Validation by neutralization assays
Do the mutations identified in our mutational antigenic profiling actually have the expected effect on antibody neutralization? We have previously validated many of the large-effect antigenic mutations selected by the narrow antibodies H17-L19, H17-L10, and H17-L747. However, the mutations selected by the broad anti-stalk antibodies have much smaller effects in our mutational antigenic profiling, especially for the broadest antibody, FI6v3. We therefore tested some of these FI6v3-selected mutations using neutralization assays on individual viral mutants.
Figure 6 shows that the mutational antigenic profiling is highly predictive of the results of the neutralization assays, even for small-effect mutations. As discussed in the previous section, the three mutations most strongly selected by FI6v3 introduce glycosylation motifs at sites 278–280 or 289–291 (Fig. 5a,b). We created viruses carrying each of these mutations (K280S, K280T, and N291S) and validated that all three modestly but significantly increased resistance to FI6v3 (Fig. 6a, Supplementary Fig. 9). As a control, we also validated that a mutation at one of these sites (K280A) that does not have an effect in our mutational antigenic profiling does not significantly shift the neutralization curve (Fig. 6a, Supplementary Fig. 9).
Our mutational antigenic profiling also identified several non-glycosylation-motif mutations that were selected by FI6v3. We validated that one of these mutations, G47R in the HA2 chain, significantly increased neutralization resistance (Fig. 6a, Supplementary Fig. 9), although as predicted by the mutational antigenic profiling, the magnitude of the effect was small. The most unexpected mutations identified in our mutational antigenic profiling were at site -8 in the signal peptide. We tested one of these mutations, K(-8)T, and it led to a very slight increase in neutralization resistance (Fig. 6a, Supplementary Fig. 9), although despite the significance testing in Supplementary Fig. 9, we remain circumspect about the magnitude of this effect relative to the noise in our neutralization assays. As controls, we also tested three mutations (P80D and V135T, which are escape mutations for H17-L7 and H17-L19, and M17L in HA2) that did not have substantial effects in the mutational antigenic profiling, and confirmed that none of them significantly affected neutralization resistance (Fig. 6a, Supplementary Fig. 9).
A notable aspect of these validation experiments is the very small effect sizes of the identified mutations on neutralization by FI6v3. Antigenic mutations selected by strain-specific antibodies to HA generally increase the concentration of antibody needed to neutralize the virus by orders of magnitude. Neutralization curves for such large-effect escape mutants are in Fig. 6b,c. Although there are no such large-effect single mutations that escape FI6v3 or C179, the results in Fig. 6a show that we can still use mutational antigenic profiling to identify mutations that have small but measurable effects on resistance to these antibodies.
HA mutational tolerance and antibody escape
Why are there no large-effect escape mutations from the anti-stalk antibodies? One possibility is that all HA sites in the antibody-binding footprint are intolerant of mutations, meaning that viruses with mutations at these sites cannot replicate and so are not present in our mutant virus libraries. Another possibility is that mutations are tolerated at some HA sites in the antibody footprint, but that the binding energetics are distributed across sites in such a way that none of these tolerated mutations strongly affect neutralization.
We can examine these possibilities using deep mutational scanning data that measures the tolerance of HA for each possible amino-acid mutation. Specifically, we have previously selected our A/WSN/1933 virus HA mutant libraries for variants that can replicate in cell culture, and then used deep sequencing to estimate the preference of each site in HA for each possible amino acid54. Figure 7 shows these amino-acid preferences for all sites in HA within 4 Å of each broad antibody, with the antigenic effects of the mutations overlaid. Although some HA sites in the antibody footprints strongly prefer a single amino acid, for all antibodies there are also footprint sites that tolerate a fairly wide range of amino acids. In most cases the mutations selected by the antibodies occur at these mutationally tolerant sites. However, there are exceptions—for instance, the H38S mutation selected by C179 is rather disfavored with respect to viral growth, but has a large enough antigenic effect to still be detected in our mutational antigenic profiling.
The data in Fig. 7 show that the lack of large-effect escape mutants from FI6v3 and C179 is not entirely due to the mutational intolerance of HA sites in the antibody-binding footprints. Some HA sites in each antibody footprint are fairly mutationally tolerant, and contain a range of mutations in the viral libraries used in our antibody selections. However, our mutational antigenic profiling shows that only a fraction of mutations at a fraction of these sites actually affect antibody neutralization. This finding is reminiscent of prior work showing that the binding energetics at protein–protein interfaces can be asymmetrically distributed across sites58,59,60. The broad anti-stalk antibodies therefore appear to both mostly target mutationally intolerant sites and distribute their binding energetics in such a way that altering the mutationally tolerant HA sites has relatively little effect on neutralization.
We have quantified how all single amino-acid mutations to an H1 influenza virus HA affect neutralization by a collection of broad and narrow antibodies. Our results show that the virus’s inherent evolutionary capacity for escape via point mutations differs across antibodies. Interestingly, antibody breadth is not always an indicator of the difficulty of viral escape. As expected, single amino-acid mutations can make the virus completely resistant to narrow strain-specific antibodies against HA’s globular head. However, such mutations can also enable the virus to escape the broad S139/1 antibody targeting residues in HA’s receptor-binding pocket, despite the fact that this antibody neutralizes multiple subtypes. But no single mutation has a comparably large effect on neutralization by two broad antibodies targeting HA’s stalk, FI6v3 and C179. Therefore, these anti-stalk antibodies are quantifiably more difficult for the virus to escape.
Although there are no large-effect escape mutations from the broad anti-stalk antibodies, there are mutations that more modestly affect neutralization. This finding emphasizes the importance of identifying antigenic mutations in a way that accounts for effect sizes. The classic approach for selecting escape mutations involves treating a virus stock with antibody at a concentration that completely neutralizes wildtype, and looking for viral mutants that survive this treatment16,17. There are no such single mutations for the H1 HA and broad anti-stalk antibodies tested here, since no mutations shift the neutralization curve enough to enable survival at antibody concentrations that fully neutralize wildtype. However, our approach shows that there are mutations that have more modest (<10-fold) effects on neutralization by even the broadest antibody. Interestingly, most previous studies42,43,44,45 that have reported selecting single mutations with large effects (>10-fold) on neutralization by anti-stalk antibodies have used group 2 (e.g., H3 or H7) HAs rather than group 1 HAs like the one used in our work, although at least one study has selected a large-effect escape mutation to a broad anti-stalk antibody in an H5 group 1 HA61. In addition, when interpreting the magnitude of the effects measured in our experiments, it is important to note that we are only assessing how mutations affect neutralization, and not how they affect Fc-mediated functions that are responsible for much of the in vivo protection afforded by anti-stalk antibodies62,63.
Another important caveat is that our experiments examine single amino-acid mutations to the HA from one influenza virus strain. The protein evolution literature is full of examples of epistatic interactions that enable multiple mutations to access phenotypes not accessible by single mutations64,65,66. Such epistasis is relevant to HA’s evolution. For instance, work by Das et al.67 suggests that the sequential accumulation of mutations can shift the spectrum of available antibody-escape mutations. Wu et al.68 have used deep mutational scanning to directly demonstrate that rampant epistasis enables HA’s receptor-binding pocket to accommodate combinations of individually deleterious mutations, some of which affect sensitivity to antibodies. Therefore, our work does not imply any absolute limits on the possibilities for antibody escape when evolution is given sufficient time to explore combinations of mutations. However, single mutations are the most accessible form of genetic variation, and much of influenza virus’s natural antigenic drift involves individual mutations that reduce sensitivity to immunodominant antibody specificities16,17,18,19,20,21. Quantifying the antigenic effects of all such mutations therefore provides a relevant measure of ease of viral antibody escape.
A major rationale for studying broadly neutralizing antibodies is that they are hoped to be more resistant to viral evolutionary escape than the antibodies that dominate natural immune responses to influenza virus32,33. We have used a new approach to quantify the extent to which this is actually true, and shown that neutralization of an H1 virus by broad anti-stalk antibodies is indeed more, although certainly not completely, resistant to erosion by viral point mutations. Going forward, we suggest that completely mapping viral escape mutations will be a useful complement to more traditional techniques that simply characterize the breadth of anti-viral antibodies against circulating strains.
C179 IgG was purchased from Takara Bio Inc (Catalog #M145). FI6v3 was purified from 293F cells (ThermoFisher R79007) transduced with a lentiviral vector encoding a commercially synthesized gene for the IgG form of the antibody, with the heavy and light chains reverse-translated from the protein sequence in the PDB structure 3ZTN26 as described previously69. Genes encoding S139/1 in IgG form were reverse-translated from the protein sequence in PDB structure 4GMS27, and used to express and purify protein by the Fred Hutchinson Cancer Research Center protein expression core.
We performed neutralization assays using influenza viruses that carried GFP in the PB1 segment. These PB1flank-eGFP viruses were generated in co-cultures of 293T-CMV-PB1 and MDCK-SIAT1-CMV-PB1 cells as described previously70, using the standard bi-directional pHW181-PB2, pHW182-PB1, pHW183-PA, pHW184-HA, pHW185-NP, pHW186-NA, pHW187-M, and pHW188-NS reverse-genetics plasmids71 for all genes except PB1, plus the pHH-PB1flank-eGFP plasmid70. Each mutant was generated by repeating this process using a version of the pHW184-HA plasmid that had been engineered by site-directed mutagenesis to carry the indicated mutation. The neutralization assays themselves were performed by using a plate reader to quantify the GFP signal produced by MDCK-SIAT1-CMV-PB1 cells infected by PB1flank-eGFP virus that had been incubated with the indicated antibody concentration as described previously72. All neutralization curves in Fig. 6a represent the mean and standard deviation of three measurements, with the individual replicates shown in Supplementary Fig. 9. All the neutralization assays for FI6v3 were performed on the same day to eliminate batch effects, with each replicate involving independent serial dilution of the antibody in a separate column of a 96-well plate.
H3 sequence numbering
Unless otherwise indicated, all residues are numbered in the H3 numbering scheme, with the signal peptide in negative numbers, the HA1 subunit as plain numbers, and the HA2 subunit denoted with “HA2.” The conversion between sequential numbering of the A/WSN/1933 HA and the H3 numbering scheme was performed using the Python script available at https://github.com/jbloomlab/HA_numbering. Supplementary Data 1 gives the numbering conversion.
Inference of HA phylogenetic tree
To infer the phylogenetic tree in Fig. 2, we downloaded one HA sequence per subtype from the Influenza Research Database73, inferred the phylogenetic tree using RAxML74 with a GTR model, and visualized the tree using FigTree (http://tree.bio.ed.ac.uk/software/figtree/). The HA sequences used are in Supplementary Data 2. In Fig. 2, we indicate which HAs each antibody has been reported to bind or neutralize26,27,41,49,50. Among broad antibodies, S139/1 has not been tested against H8 and H11; C179 has not been tested against H8 and H11; and no antibodies have been tested against H17 and H18. The narrow H17-L19, H17-L10, and H17-L7 antibodies have not been tested against any other subtypes; however, since these antibodies have a very limited range even among H1 HAs51, we assume that they do not bind other subtypes.
For the cladogram in Fig. 5c, the amino-acid identities at site 38 are from the strains tested against C179 by Dreyfus et al.49. For subtypes not tested, the amino-acid identity reported is that in the strain for that subtype in Supplementary Data 2.
Mutant virus libraries
The mutant virus libraries are those described in Doud and Bloom54, and were produced in full biological triplicate. Briefly, these libraries were generated by using codon mutagenesis75 to introduce random codon mutations into plasmid-encoded HA, and then using a helper-virus strategy that avoids the bottlenecks associated with standard influenza reverse genetics to create the virus libraries. Although a helper virus is used to generate the libraries from plasmids, the viruses in the resulting library carry the full complement of genes and are fully infectious and replication-competent54. This fact is important, since the accessibility of HA epitopes can depend on virion HA density, which is often lower in pseudovirus than in fully infectious virus26,53. Full details of the library generation and sequencing statistics that quantify how completely each of the triplicate libraries covers the possible amino-acid mutations have been described previously54.
Mutational antigenic profiling
The mutational antigenic profiling was performed as described previously47. Briefly, we diluted each of the virus libraries to a concentration of $10^6$ TCID$_{50}$ per ml and incubated the virus dilutions with an equal volume of antibody at the intended concentration at 37 °C for 1.5 h. The final antibody concentrations in these mixtures are shown in Fig. 3. We performed three fully independent replicates of each selection using the three replicate mutant virus libraries. In addition, we performed technical replicates (independent neutralization experiments on the same virus library) in some cases as indicated in Supplementary Fig. 1. The virus-antibody mixtures were used to infect cells, and viral RNA was extracted, reverse-transcribed, and PCR amplified as described previously47. In order to obtain high accuracy in the Illumina deep sequencing, we used the barcoded-subamplicon sequencing strategy described by Doud and Bloom54, which is a slight modification of the strategy of Wu et al.39.
We also estimated the overall fraction of virions surviving each antibody selection. These fractions are denoted by γ in this paper. The average of these fractions across libraries is reported in Fig. 3, and the values for each individual replicate are in Supplementary Table 1. The fractions were estimated using qRT-PCR against the viral NP and canine GAPDH as described previously47. Briefly, we made duplicate 10-fold serial dilutions of each of the virus libraries to use as a standard curve of infectivity. We also performed qPCR on the cells infected with the virus-antibody mix. To estimate the fractions, we used linear regression to fit a line relating the logarithm of the viral infectious dose in the standard curve to the difference in Ct values between NP and GAPDH, and then interpolated the fraction surviving for each selection from this regression.
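The standard-curve interpolation can be sketched as follows. This is a minimal illustration and not the code used in the study: the function names and the toy Ct values are hypothetical, the dose is expressed as a fraction of the undiluted library (so the interpolated value is directly the fraction surviving), and the sign convention ΔCt = Ct(NP) − Ct(GAPDH) is an assumption.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def interpolate_fraction_surviving(delta_ct, slope, intercept):
    """Read the surviving fraction off the standard curve.

    The curve relates delta Ct to log10(fraction of undiluted library),
    so exponentiating the fitted value gives the fraction itself.
    """
    return 10 ** (slope * delta_ct + intercept)


# Hypothetical standard curve: 10-fold serial dilutions of the library.
log10_fraction = [0.0, -1.0, -2.0, -3.0]   # undiluted, 1:10, 1:100, 1:1000
delta_ct_curve = [-2.0, 1.3, 4.6, 7.9]     # assumed Ct(NP) - Ct(GAPDH)

slope, intercept = fit_line(delta_ct_curve, log10_fraction)

# Hypothetical antibody-selected sample.
gamma = interpolate_fraction_surviving(4.6, slope, intercept)
print(f"estimated fraction surviving: {gamma:.4f}")
```

Normalizing NP to GAPDH corrects for differences in the amount of cellular RNA recovered from each sample, which is why the regression is fit to the Ct difference rather than to the NP Ct alone.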
Analysis of deep sequencing data
The deep sequencing data were analyzed using version 2.2.1 of the dms_tools2 software package76, which is available at http://jbloomlab.github.io/dms_tools2. Supplementary Data 3 contains a Jupyter notebook that performs all steps of the analysis beginning with downloading the FASTQ files from the Sequence Read Archive. Detailed statistics about the sequencing depth and error rates are shown in this Jupyter notebook and its HTML rendering in Supplementary Data 4.
Calculating fraction of mutants that survive neutralization
In prior mutational antigenic profiling work47,48, we calculated the differential selection on each mutation as the logarithm of its enrichment relative to wildtype in an antibody-selected sample versus a mock-selected control. These mutation differential selection values are useful for the analysis of individual experiments. However, there is no natural way to compare these values across experiments with different antibodies at different concentrations, since the strength of differential selection depends on details of how the pressure is imposed. We therefore developed the new approach in this paper to quantify the antigenic effect of a mutation in units that can be compared across antibodies and concentrations.
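For reference, the differential selection described above can be sketched as below. This is an illustration under stated assumptions rather than the published implementation: the function name and the counts are hypothetical, base-2 logarithms are assumed, and the optional pseudocount argument is added here only to avoid division by zero.

```python
import math


def differential_selection(n_sel_mut, n_sel_wt, n_mock_mut, n_mock_wt,
                           pseudocount=0.0):
    """Log2 enrichment of a mutation relative to wildtype,
    antibody-selected versus mock-selected (assumed log base 2)."""
    sel_ratio = (n_sel_mut + pseudocount) / (n_sel_wt + pseudocount)
    mock_ratio = (n_mock_mut + pseudocount) / (n_mock_wt + pseudocount)
    return math.log2(sel_ratio / mock_ratio)


# Hypothetical counts: the mutation is 8-fold enriched relative to wildtype.
diffsel = differential_selection(80, 1000, 10, 1000)
print(f"differential selection: {diffsel:.2f}")
```

Because this quantity is a ratio of ratios within one experiment, it carries no information about what fraction of the whole library survived, which is why it cannot be compared directly across antibodies or concentrations.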
The general principle of the calculations is illustrated in Fig. 1 and discussed in the first section of the Results. Here we provide details on how these calculations are performed. The deep sequencing measures the number of times that codon x is observed at site r in both the antibody-selected and mock-selected conditions. Denote these counts as $n_{r,x}^{\rm selected}$ and $n_{r,x}^{\rm mock}$, respectively. We also perform deep sequencing of a control (in this case, plasmid DNA encoding the wildtype HA gene) to estimate the sequencing error rate. Denote the counts of codon x at site r in this control as $n_{r,x}^{\rm err}$. Also denote the total reads at each site r in each sample as $N_r^{\rm selected}$, $N_r^{\rm mock}$, and $N_r^{\rm err}$.
We first estimate the rate of sequencing errors at site r as
$$\epsilon_{r,x} = \frac{n_{r,x}^{\rm err}}{N_r^{\rm err}}.$$
For the wildtype identity at site r, which we denote as wt(r), the value of $\epsilon_{r,{\rm wt}(r)}$ is the fraction of times we correctly observe the wildtype identity wt(r) at site r versus observing some spurious mutation. For all mutant identities at site r, $\epsilon_{r,x}$ is the fraction of times we observe the mutation x at site r when the identity is really wildtype. We ignore second-order terms where we incorrectly read one mutation as another, as such errors will be very rare because mutations themselves are rare (most codons are wildtype in most sequences).
We next adjust all of the deep sequencing codon counts in the antibody-selected and mock-selected conditions by the error control. Specifically, the error-adjusted counts for the antibody-selected sample are
$$\hat{n}_{r,x}^{\rm selected} = \begin{cases} N_r^{\rm selected} \times \max\left[0, \dfrac{n_{r,x}^{\rm selected}}{N_r^{\rm selected}} - \epsilon_{r,x}\right] & \text{if } x \ne {\rm wt}(r), \\ n_{r,x}^{\rm selected} / \epsilon_{r,x} & \text{if } x = {\rm wt}(r). \end{cases}$$
An equivalent equation is used to calculate $\hat{n}_{r,x}^{\rm mock}$. We then sum the error-adjusted codon counts for each amino acid a:
$$\hat{n}_{r,a}^{\rm selected} = \sum_{x \,\mid\, \mathcal{A}(x) = a} \hat{n}_{r,x}^{\rm selected},$$
so that $\hat{n}_{r,a}^{\rm selected}$ are the error-adjusted counts for the antibody-selected condition summed across all codons x where the encoded amino acid $\mathcal{A}(x)$ is a. An equivalent equation is used to calculate $\hat{n}_{r,a}^{\rm mock}$.
Finally, we use these error-adjusted amino-acid counts to estimate the mutation frequencies $\rho_{r,a}^{\rm selected}$ and $\rho_{r,a}^{\rm mock}$ that are used in Eq. 1 to calculate the fraction $F_{r,a}$ of virions with amino acid a at site r that survive the selection. When estimating these mutation frequencies, we add a pseudocount of P=5 to the lower-depth sample, and a depth-adjusted pseudocount to the higher-depth sample. The rationale for adding a pseudocount is to regularize the estimates in the case of low counts. Specifically, we estimate the mutation frequencies as
$$\rho_{r,a}^{\rm selected} = \frac{\hat{n}_{r,a}^{\rm selected} + f_r^{\rm selected} \times P}{\sum_{a'} \hat{n}_{r,a'}^{\rm selected} + f_r^{\rm selected} \times A \times P}, \qquad \rho_{r,a}^{\rm mock} = \frac{\hat{n}_{r,a}^{\rm mock} + f_r^{\rm mock} \times P}{\sum_{a'} \hat{n}_{r,a'}^{\rm mock} + f_r^{\rm mock} \times A \times P},$$
where A is the number of characters (e.g., 20 for amino acids), and $f_r^{\rm selected}$ and $f_r^{\rm mock}$ are the pseudocount adjustment factors defined as:
$$f_r^{\rm selected} = \max\left[1, \frac{N_r^{\rm selected}}{N_r^{\rm mock}}\right], \qquad f_r^{\rm mock} = \max\left[1, \frac{N_r^{\rm mock}}{N_r^{\rm selected}}\right].$$
The pseudocount adjustment factors ensure that P is added to the counts for the lower-depth sample, and a proportionally scaled-up pseudocount is added to the higher-depth sample. The depth scaling is necessary to avoid systematically biasing towards higher mutation frequencies in the lower-depth sample. It is these estimated mutation frequencies that are used in conjunction with γ (the qPCR-estimated overall fraction of virions that survive selection) to compute the fraction surviving ($F_{r,a}$) and the excess fraction surviving above the library average via Eqs. 1 and 2.
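The calculation described in this section can be sketched for a single site as follows. This is a minimal sketch and not the dms_tools2 implementation: all function names and the toy counts are hypothetical. It assumes that the error rate is the control count divided by the control depth, that mutant counts are corrected by subtracting the error rate (floored at zero) while wildtype counts are divided by the wildtype error rate, that Eq. 1 has the form F = γ × ρ(selected) / ρ(mock), and that Eq. 2 subtracts the library average γ and floors the result at zero.

```python
def error_rates(err_counts):
    """epsilon[x]: fraction of control reads at this site that are codon x."""
    total = sum(err_counts.values())
    return {x: n / total for x, n in err_counts.items()}


def adjust_counts(counts, eps, wt):
    """Error-adjusted codon counts at one site (assumed correction form)."""
    total = sum(counts.values())
    adjusted = {}
    for x, n in counts.items():
        if x == wt:
            adjusted[x] = n / eps[x]  # scale up for wildtype reads lost to errors
        else:
            adjusted[x] = total * max(0.0, n / total - eps.get(x, 0.0))
    return adjusted


def sum_by_amino_acid(codon_counts, translate):
    """Sum codon counts over all codons encoding the same amino acid."""
    aa_counts = {}
    for x, n in codon_counts.items():
        aa = translate[x]
        aa_counts[aa] = aa_counts.get(aa, 0.0) + n
    return aa_counts


def mutation_frequencies(aa_sel, aa_mock, depth_sel, depth_mock, P=5.0, A=20):
    """Pseudocount-regularized frequencies; the deeper sample gets a larger pseudocount."""
    f_sel = max(1.0, depth_sel / depth_mock)
    f_mock = max(1.0, depth_mock / depth_sel)

    def freqs(aa_counts, f):
        denom = sum(aa_counts.values()) + f * A * P
        return {a: (n + f * P) / denom for a, n in aa_counts.items()}

    return freqs(aa_sel, f_sel), freqs(aa_mock, f_mock)


def fraction_surviving(rho_sel, rho_mock, gamma):
    """Assumed forms of Eq. 1 (fraction surviving) and Eq. 2 (excess over gamma)."""
    frac = {a: gamma * rho_sel[a] / rho_mock[a] for a in rho_sel}
    excess = {a: max(0.0, F - gamma) for a, F in frac.items()}
    return frac, excess


# Toy site: wildtype codon AAA (Lys), a synonymous codon, and one Glu mutant.
translate = {"AAA": "K", "AAG": "K", "GAA": "E"}
control = {"AAA": 9990, "AAG": 5, "GAA": 5}
mock = {"AAA": 99000, "AAG": 500, "GAA": 500}
selected = {"AAA": 90000, "AAG": 450, "GAA": 9550}

eps = error_rates(control)
aa_sel = sum_by_amino_acid(adjust_counts(selected, eps, "AAA"), translate)
aa_mock = sum_by_amino_acid(adjust_counts(mock, eps, "AAA"), translate)
rho_sel, rho_mock = mutation_frequencies(
    aa_sel, aa_mock, sum(selected.values()), sum(mock.values()))
frac, excess = fraction_surviving(rho_sel, rho_mock, gamma=0.01)
print(frac, excess)
```

In this toy example the Glu mutant is strongly enriched after selection, so its fraction surviving is well above the library average of 0.01, whereas the wildtype amino acid has an excess fraction surviving of zero. A real analysis would carry counts for every codon at every site rather than the three shown here.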
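In some cases, we need to summarize the excess fraction of mutations surviving into a single number for each site, such as for plotting as a function of the site number or displaying on the crystal structure. Denote the excess fraction surviving (Eq. 2) for amino acid a at site r as $F_{r,a}^{\rm excess}$. There are 19 different $F_{r,a}^{\rm excess}$ values for the non-wildtype amino acids at each site. One summary statistic is the fraction surviving above the library average, averaged over all 19 amino-acid mutations at site r:
$$F_{r}^{\rm avg\,excess} = \frac{1}{19} \sum_{a \ne {\rm wt}(r)} F_{r,a}^{\rm excess}.$$
Another summary statistic is the maximum fraction surviving above average among all 19 amino-acid mutations at site r:
$$F_{r}^{\rm max\,excess} = \max_{a \ne {\rm wt}(r)} F_{r,a}^{\rm excess}.$$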
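The two site-level summaries can be sketched as below. The function name and the example values are hypothetical; for brevity the example lists only three mutant amino acids rather than all 19.

```python
def site_summaries(excess_by_aa, wildtype):
    """Return (mean, max) of the excess fraction surviving over
    all non-wildtype amino acids at one site."""
    mutants = [v for aa, v in excess_by_aa.items() if aa != wildtype]
    return sum(mutants) / len(mutants), max(mutants)


# Hypothetical excess-fraction-surviving values at one site (wildtype K).
excess_at_site = {"K": 0.0, "E": 0.2, "R": 0.0, "Q": 0.1}
avg_excess, max_excess = site_summaries(excess_at_site, wildtype="K")
print(avg_excess, max_excess)
```

The mean highlights sites where many different mutations confer some escape, while the maximum highlights sites where even a single mutation has a large effect.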
In this paper, Fig. 4 and Supplementary Fig. 2 show the median of excess fraction surviving taken across all biological and technical replicates at a given antibody concentration (Eq. 2). The subsequent logo plots show the medians of these values taken across all concentrations for each antibody. The numerical values plotted in these logo plots are in Supplementary Data 5. The fraction surviving values not adjusted to be in excess of the library average (Eq. 1) are in Supplementary Data 6.
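The two-level median described above (across replicates at each concentration, then across concentrations) can be sketched for one mutation as follows. The function name, the data layout, and the numbers are hypothetical.

```python
from statistics import median


def median_across_concentrations(values_by_concentration):
    """Median over replicates at each concentration, then median of
    those per-concentration medians."""
    per_concentration = {
        conc: median(replicates)
        for conc, replicates in values_by_concentration.items()
    }
    return per_concentration, median(per_concentration.values())


# Hypothetical excess fraction surviving for one mutation, keyed by
# antibody concentration, with one value per replicate.
replicate_values = {100: [0.1, 0.3, 0.2], 200: [0.4, 0.6]}
per_conc, overall = median_across_concentrations(replicate_values)
print(per_conc, overall)
```

Taking the median within each concentration first prevents a concentration that happens to have more replicates from dominating the final summary.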
Deep sequencing data are available from the Sequence Read Archive under BioSample accession SAMN05789126 at https://www.ncbi.nlm.nih.gov/sra/?term=SAMN05789126. Computer code that analyzes these data to generate all the results described in this paper is in Supplementary Data 3, and an HTML version of the analysis notebook is in Supplementary Data 4. In addition, all of this code as well as the manuscript itself and other data are available on GitHub at https://github.com/jbloomlab/HA_antibody_ease_of_escape. Finally, the dms_tools2 software76 that performs most of the analysis is available at https://jbloomlab.github.io/dms_tools2/. The authors declare that all other data supporting the findings of this study are available within the article and its Supplementary Information files, or are available from the authors upon request.
We thank Adam Dingens, Sarah Hilton, Katherine Xue, Lauren Gentles, and Jeremy Roop for helpful comments on the project and manuscript. We thank the Fred Hutchinson Cancer Research Center genomics core for performing the Illumina deep sequencing, and the protein expression core for expressing and purifying the S139/1 antibody. This work was supported by grant R01AI127893 from the NIAID of the NIH. M.B.D. was supported in part by training grant T32AI083203 from the NIAID of the NIH. J.M.L. was supported in part by the Center for Inference and Dynamics of Infectious Diseases (CIDID), which is funded by grant U54GM111274 from the NIGMS of the NIH. The research of J.D.B. is supported in part by a Faculty Scholar Grant from the Howard Hughes Medical Institute and the Simons Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.