Quantifying the ease of viral escape from broad and narrow antibodies to influenza hemagglutinin

Influenza virus can completely escape most antibodies with single mutations. However, rare antibodies broadly neutralize many viral strains. It is unclear how easily influenza virus might escape such antibodies if they became widespread due to therapeutic use or vaccination. Here we map all single amino-acid mutations that increase resistance to broad antibodies targeting an H1 hemagglutinin. Crucially, our approach not only identifies antigenic mutations but also quantifies their effect sizes. All antibodies select mutations, but the effect sizes vary widely. The virus can escape a broad antibody targeting hemagglutinin’s receptor-binding site the same way it escapes narrow strain-specific antibodies: via single mutations with huge effects. In contrast, broad antibodies targeting hemagglutinin’s stalk only select mutations with small effects. Therefore, antibody breadth is not necessarily an indicator of the difficulty of viral escape. Broadly neutralizing antibodies targeting hemagglutinin’s stalk are quantifiably harder to escape than the other antibodies tested here.


INTRODUCTION
Nearly all viruses show some antigenic variation. However, the extent of this variation ranges widely. For instance, although both measles virus (Birrer et al., 1981;Ter Meulen et al., 1981) and polio virus (Crainic et al., 1983;Diamond et al., 1985;Drexler et al., 2014) exhibit antigenic variation, the magnitude of this variation is small. Therefore, immunity to these viruses is lifelong (Panum, 1847;Salk, 1984). In contrast, human influenza virus exhibits much more antigenic variation. So although infection with an influenza virus strain provides long-term immunity to that exact strain (British Medical Journal, 1978;Davies et al., 1982;Yu et al., 2008), the virus's rapid antigenic evolution erodes the effectiveness of this immunity to that strain's descendants within four to seven years (Couch and Kasel, 1983).
One possible reason that viruses exhibit different amounts of antigenic variation is that they have disparate evolutionary capacities to escape the immunodominant antibodies generated by natural immune responses (Lipsitch and O'Hagan, 2007;Cobey, 2014;Fulton et al., 2015). According to this explanation, human influenza virus undergoes rapid antigenic drift because most neutralizing antibodies target epitopes on the viral hemagglutinin (HA) protein that are highly tolerant of mutational change.
This explanation is supported by classic experiments showing that it is easy to select viral mutants that escape most antibodies (Yewdell et al., 1979;Webster and Laver, 1980), as well as by the observation that mutations that alter antigenicity arise frequently during influenza's evolution globally (Koel et al., 2013;Chambers et al., 2015;Petrie et al., 2016;Neher et al., 2016) and within individual humans with long-term infections (Xue et al., 2017). A corollary of this explanation is that influenza virus's capacity for antigenic drift would be reduced if most antibodies instead targeted epitopes that were less mutationally tolerant.
Verifying this corollary has become of practical importance with the discovery of broadly neutralizing antibodies against influenza virus. These antibodies typically target conserved epitopes in HA's stalk (Sui et al., 2009;Ekiert et al., 2009;Corti et al., 2011) or receptor-binding site (Lee et al., 2012;Ekiert et al., 2012;Schmidt et al., 2015), and neutralize a wide range of viral strains. Broad antibodies are usually less abundant in human serum than antibodies to antigenically variable epitopes on the head of HA (Ellebedy et al., 2014;Andrews et al., 2015). However, major efforts are underway to elicit broad antibodies by vaccination or administer them directly as therapeutics (Krammer and Palese, 2015;Corti et al., 2017).
If these efforts succeed, the epitopes of broad antibodies will be under strong antigenic selection in human influenza virus. Might such selection then drive antigenic variation in these epitopes? There is precedent for the idea that the immune status of the host population can shape influenza virus evolution: the virus undergoes faster antigenic drift in long-lived humans that accumulate immune memory than in short-lived swine that are mostly naive (Sheerar et al., 1989;Luoh et al., 1992), and poultry vaccination may accelerate antigenic drift of avian influenza (Lee et al., 2004;Cattoli et al., 2011). But alternatively, perhaps broad antibodies are broad because the virus has difficulty escaping them regardless of selection from host immunity.
So far, there is limited data to distinguish between these possibilities. Several studies have shown that the head domain of HA is more mutationally tolerant than the stalk domain where many broad antibodies bind (Thyagarajan and Bloom, 2014;Wu et al., 2014;Heaton et al., 2013). However, these studies did not select for antibody escape, so it is difficult to relate their measurements to the virus's evolutionary capacity under immune selection. Other work has shown that it is possible to select antigenic mutants with broad antibodies (Yoshida et al., 2009;Chai et al., 2016), demonstrating that these epitopes are not entirely refractory to change. But given that antibodies can select some antigenic variation even in measles virus (Birrer et al., 1981;Ter Meulen et al., 1981) and polio virus (Crainic et al., 1983;Diamond et al., 1985), the existence of selectable mutations does not necessarily imply that influenza virus can escape broad antibodies as easily as it drifts away from narrow strain-specific ones. The fundamental problem is that existing studies have not quantified the ease of viral escape in a way that can be compared across antibodies in an apples-to-apples fashion.
Here we systematically quantify the results of selecting all single amino-acid mutations to an H1 HA with several broad and narrow antibodies. Critically, our approach quantifies the magnitude of the antigenic effect of every mutation in a way that can be directly compared across antibodies. We find that even the broadest antibodies select antigenic mutations. However, the magnitudes of the antigenic effects vary greatly across antibodies. Single mutations make the virus completely resistant to both narrow strain-specific antibodies and a broad antibody against the receptor-binding site. But no single mutation does more than modestly increase the virus's resistance to two broad antibodies against the HA stalk. Therefore, broad anti-stalk antibodies are quantifiably more resistant to viral escape via single amino-acid mutations than the other antibodies tested here.

RESULTS
An approach to quantify the fraction of virions with each mutation that escape antibody neutralization
Figure 1 (caption, partially recovered): ... we calculate the fraction of virions with that mutation that survive the antibody, and indicate this fraction by the height of the letter corresponding to that amino acid at that site. (D-F) Similar data to the first three panels, but now V1K has only a small antigenic effect, and so only modestly increases the fraction of virions that survive antibody treatment.
through the overlaid neutralization curves, we can calculate the fraction of virions with each mutation that survive neutralization at each antibody concentration. These fractions can be represented using logo plots, where the height of each letter is proportional to the fraction of virions with that amino acid at a site that survive (Figure 1C).
Large letters correspond to strong escape mutations. Now consider the case where a mutation has just a small antigenic effect, and so only slightly increases the fraction of virions that survive neutralization (Figure 1D).
In this scenario, the neutralization curve shifts only slightly (Figure 1E). In the logo plot representation, the antigenic mutation is only slightly larger than other amino acids (Figure 1F), since possessing the mutation only modestly increases the chance that a virion survives antibody treatment. These logo plots therefore provide a way to both identify antigenic mutations and quantify the magnitudes of their effects in a way that is directly comparable across antibodies.
Our goal is to determine the fraction of mutant virions that survive antibody neutralization for all mutations to HA. One way to do this would be to measure individual neutralization curves for each of the 19 × 565 = 10,735 single amino-acid mutants of the 565-residue HA protein. However, individually creating and assaying that many mutants would be exceedingly time-consuming and expensive. Fortunately, we have shown that antibody selection on all viral mutations can be assayed in a single experiment using mutational antigenic profiling (Doud et al., 2017;Dingens et al., 2017).
This approach involves generating viral libraries containing all mutations to the protein of interest, selecting these viruses with or without antibody, and using an accurate deep-sequencing method to determine the relative frequencies of each mutation.
These frequencies can be analyzed to calculate the fraction of virions with each mutation that survive antibody treatment. Specifically, the deep sequencing determines the frequencies of virions carrying amino acid $a$ at site $r$ in the antibody-selected and mock-selected conditions, which we denote as $\rho^{\text{selected}}_{r,a}$ and $\rho^{\text{mock}}_{r,a}$, respectively. We can also measure the total fraction of the viral library that survives the antibody, which we denote as $\gamma$. The fraction of variants with amino acid $a$ at site $r$ that survive antibody selection is then simply
$$F_{r,a} = \gamma \times \frac{\rho^{\text{selected}}_{r,a}}{\rho^{\text{mock}}_{r,a}}. \qquad (1)$$
For instance, in Figure 1A, the frequency of virions with the orange mutation is $\rho^{\text{selected}}_{r,a} = \frac{4}{7}$ in the antibody selection and $\rho^{\text{mock}}_{r,a} = \frac{4}{16}$ in the mock selection. The overall fraction of virions that survive the antibody in Figure 1A is $\gamma = \frac{7}{16}$. Therefore, we use Equation 1 to calculate that the fraction of variants with the orange mutation that survive is $F_{r,a} = \frac{7}{16} \times \frac{4/7}{4/16} = 1$. Performing the analogous calculation for Figure 1D correctly determines that the fraction of virions with the orange mutation that survive the antibody is only 0.5 for the scenario in that figure panel. In the analyses of real data below, we will plot the excess fraction surviving above the overall library average, which is
$$F^{\text{excess}}_{r,a} = \max\left(0, F_{r,a} - \gamma\right). \qquad (2)$$
Figure 2 (caption, partially recovered): ... Gamblin et al., 2004). S139/1 (PDB 4GMS; Lee et al., 2012) targets the receptor-binding site; C179 (PDB 4HLZ; Dreyfus et al., 2013) and FI6v3 (PDB 3ZTN; Corti et al., 2011) target the stalk. The sites of escape mutations for H17-L19, H17-L10, and H17-L7 are those mapped by Doud et al. (2017). (B) A phylogenetic tree of HA subtypes. Circles (broad antibodies) and squares (narrow antibodies) denote reported antibody binding or neutralization activity against that subtype. Not all antibodies have been tested against all subtypes.
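The arithmetic of the Figure 1A example can be checked with a few lines of Python. This is a minimal sketch rather than the dms_tools2 implementation: the function names are illustrative, and the excess fraction is assumed to be floored at zero, so that a mutation surviving less often than the library average is assigned no excess.

```python
from fractions import Fraction


def fraction_surviving(gamma, rho_selected, rho_mock):
    """Fraction of virions carrying a mutation that survive the antibody.

    gamma        -- overall fraction of the library that survives
    rho_selected -- frequency of the mutation after antibody selection
    rho_mock     -- frequency of the mutation in the mock selection
    """
    return gamma * rho_selected / rho_mock


def excess_fraction_surviving(gamma, rho_selected, rho_mock):
    """Fraction surviving above the library average (assumed floored at 0)."""
    excess = fraction_surviving(gamma, rho_selected, rho_mock) - gamma
    return max(Fraction(0), excess)


# Values read from the Figure 1A example in the text.
gamma = Fraction(7, 16)
f_orange = fraction_surviving(gamma, Fraction(4, 7), Fraction(4, 16))
f_excess = excess_fraction_surviving(gamma, Fraction(4, 7), Fraction(4, 16))
print(f_orange)  # 1
print(f_excess)  # 9/16
```

Exact rational arithmetic is used only so that the worked example reproduces the value of 1 given in the text without rounding; real data would use floating-point frequencies.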

Broad and narrow antibodies that neutralize influenza virus
We applied this approach to anti-HA antibodies with a range of breadths and epitopes.
The crystal structures or sites of escape mutations selected by these antibodies are shown in Figure 2A. We chose two broad antibodies, FI6v3 and C179, that target the stalk of HA (Corti et al., 2011;Okuno et al., 1993;Dreyfus et al., 2013). FI6v3 is extremely broad, and neutralizes both group 1 and group 2 HAs (Figure 2B). C179 is less broad, and neutralizes only some group 1 HAs (Figure 2B). We also selected a broad antibody, S139/1, that targets HA's receptor-binding site and can neutralize both group 1 and group 2 HAs (Yoshida et al., 2009;Lee et al., 2012). Finally, we re-analyzed deep sequencing data from prior mutational antigenic profiling of three narrow strain-specific antibodies, H17-L19, H17-L10, and H17-L7 (Doud et al., 2017).
These narrow antibodies bind the Ca2, Ca1, and Cb antigenic regions on HA's globular head (Caton et al., 1982), and only neutralize a narrow slice of H1 viruses.
We performed our experiments using the A/WSN/1933 (H1N1) strain of influenza, which is neutralized by all six antibodies (Figure 3). We used antibody concentrations sufficient for nearly complete neutralization of wildtype virus, allowing us to detect antigenic mutations that increase the fraction of mutant virions surviving beyond the baseline of wildtype (Figure 3). We used several concentrations of each broad antibody to increase sensitivity to antigenic mutations of both large and small effect.

Quantifying the antigenic effects of all mutations selected by each antibody
We performed mutational antigenic profiling using the three broad antibodies at the concentrations indicated in Figure 3, and calculated the fraction of virions carrying each mutation that survived neutralization. All experiments were performed in full biological triplicate using three independently generated virus libraries carrying single amino-acid mutations to HA (Doud and Bloom, 2016). The correlations among replicates are shown in Figure S1. For the remainder of this paper, we will refer to the median antigenic effect of each mutation across replicates.
Figure 4 (caption, partially recovered): ... There are multiple sites of large-effect mutations for H17-L19, H17-L10, H17-L7, and S139/1, but none for FI6v3 and C179. Figure S2 shows the excess fraction surviving for the largest-effect mutation at each site. Figures S3, S4, S5, S6, S7, and S8 show all mutations using logo plots. Sites are labeled in H3 numbering.
It is immediately obvious that the narrow strain-specific antibodies and the antibody targeting HA's receptor-binding pocket (S139/1) select mutations with large antigenic effects. For all four of these antibodies, there are multiple sites in HA where mutations enable a substantial fraction of virions to survive high antibody concentrations (Figure 4). Specifically, there are mutations that enable over a third of virions to survive at concentrations where virtually all wildtype virions are neutralized (Figure S2). Therefore, the virus can escape these four antibodies with the sort of large-effect single amino-acid mutations that characterize traditional influenza antigenic drift (Yewdell et al., 1979;Webster and Laver, 1980;Koel et al., 2013;Chambers et al., 2015;Petrie et al., 2016;Neher et al., 2016).
In contrast, the stalk-targeting antibodies C179 and FI6v3 select no strong escape mutants. If we look at the results for these antibodies on the same scale as the other antibodies, we see only a few small bumps in the fraction of virions surviving.

The selected mutations are near the binding footprints of the antibodies
Antigenic mutations selected by narrow strain-specific antibodies against HA are thought to occur at residues in or near the physical binding footprint of the antibody (Yewdell et al., 1979;Webster and Laver, 1980;Caton et al., 1982). We examined whether this was the case for the broad antibodies used in our experiments. Figure 5A shows a zoomed-in view of the sites of mutations selected by each antibody, as well as their locations on HA's structure. It is immediately clear that the selected mutations are nearly all in or close to the antibody-binding footprint.
For the broad anti-receptor binding pocket antibody S139/1, there are strong escape mutations at sites 156, 158, and 193 (Figure 5A; sites are in H3 numbering). Although the anti-stalk antibodies C179 and FI6v3 only select mutations with small effects, these mutations almost all fall in or near the physical binding footprints of the antibodies (Figure 5A). The two antibodies have similar epitopes and angles of approach (Dreyfus et al., 2013), and they select identical mutations at several sites (Figure 5B). The three largest-effect mutations for FI6v3 (K280S, K280T, and N291S) all introduce glycosylation motifs near the epitope, and all three mutations have antigenic effects of similar magnitude for both FI6v3 and C179.
However, C179 selects several mutations that do not have any apparent effect on FI6v3 (Figure 5A, Figure S6). The most notable of these C179-specific mutations are at site 38. The additional breadth of FI6v3 over antibodies such as C179 that neutralize only group 1 HAs is because FI6v3 can accommodate a glycan on the asparagine at site 38 that is present in group 2 HAs (Corti et al., 2011;Sui et al., 2009;Ekiert et al., 2009). However, the H38S mutation that has the largest effect on C179 resistance in our experiments does not introduce a glycosylation motif, showing that there are also other ways to escape anti-stalk antibodies at this site. Interestingly, group 1 HA subtypes that are susceptible to C179 tend to possess a histidine at site 38, but subtypes that are not bound or neutralized by C179 often possess a serine (Figure 5C).
The FI6v3 antibody also weakly selects several mutations at residue -8, which is part of HA's signal peptide (Figure 5A). This signal peptide is cleaved from the mature HA protein (Daniels et al., 2003;Burke and Smith, 2014), although mutations at this site can affect HA's expression level (Nordholm et al., 2017).

Neutralization assays validate the mutational antigenic profiling
Do the mutations identified in our mutational antigenic profiling actually have the expected effect on antibody neutralization? We have previously validated many of the large-effect antigenic mutations selected by the narrow antibodies H17-L19, H17-L10, and H17-L7 (Doud et al., 2017). However, the mutations selected by the broad anti-stalk antibodies have much smaller effects, especially for the broadest antibody, FI6v3. We therefore tested some of these FI6v3-selected mutations using neutralization assays on individual viral mutants. Figure 6 shows that the mutational antigenic profiling is highly predictive of the results of the neutralization assays, even for small-effect mutations. As discussed in the previous section, the three mutations most strongly selected by FI6v3 introduce glycosylation motifs at sites 278-280 or 289-291 (Figure 5A,B). We created viruses carrying each of these mutations (K280S, K280T, and N291S) and validated that all three modestly increased resistance to FI6v3 (Figure 6A). As a control, we also validated that a mutation at one of these sites (K280A) that does not have an effect in our mutational antigenic profiling does not shift the neutralization curve (Figure 6A).
Our mutational antigenic profiling also identified several non-glycosylation-motif mutations that were selected by FI6v3. We validated that one of these mutations, G47R in the HA2 chain, increased neutralization resistance (Figure 6A), although as predicted by the mutational antigenic profiling, the magnitude of the effect was small. The most unexpected mutations identified in our mutational antigenic profiling were at site -8 in the signal peptide. We tested one of these mutations, K(-8)T, and found that it did appear to lead to a very slight increase in neutralization resistance (Figure 6A). As controls, we also tested three mutations (P80D and V135T, which are escape mutations for H17-L7 and H17-L19, and M17L in HA2) that did not have substantial effects in the mutational antigenic profiling, and confirmed that none of them affected neutralization resistance.
A notable aspect of these validation experiments is the very small effect sizes of the identified mutations on neutralization by FI6v3. Antigenic mutations selected by strain-specific antibodies to HA generally increase the concentration of antibody needed to neutralize the virus by orders of magnitude. Neutralization curves for such large-effect escape mutants are in Figure 6B,C. Although there are no such large-effect single mutations that escape FI6v3 or C179, the results in Figure 6A show that we can still use mutational antigenic profiling to identify mutations that have small but measurable effects on resistance to these antibodies.

Limited HA mutational tolerance does not fully explain the lack of strong escape mutations from the anti-stalk antibodies
Why are there no large-effect escape mutations from the anti-stalk antibodies? One possibility is that all HA sites in the antibody-binding footprint are intolerant of mutations, meaning that viruses with mutations at these sites cannot replicate and so are not present in our mutant virus libraries. Another possibility is that mutations are tolerated at some HA sites in the antibody footprint, but that the binding energetics are distributed across sites in such a way that none of these tolerated mutations strongly affect neutralization.
We can examine these possibilities using deep mutational scanning data that mea-
Figure 7 (caption, partially recovered): ... Doud and Bloom (2016). For instance, site 153 only tolerates tryptophan, so W occupies the entire height of the preference logo stack. In contrast, site 156 tolerates many amino acids, all of which contribute to the height of the preference logo stack. Above the preference logo stacks are logo plots showing the excess fraction surviving antibody treatment as measured in the current study. Note that the scale for these antigenic effects is 10× smaller for FI6v3 and C179 than for S139/1.
are also footprint sites that tolerate a fairly wide range of amino acids. In most cases the mutations selected by the antibodies occur at these mutationally tolerant sites.
However, there are exceptions: for instance, the H38S mutation selected by C179 is rather disfavored with respect to viral growth, but has a large enough antigenic effect to still be detected in our mutational antigenic profiling.
The data in Figure 7 show that the lack of large-effect escape mutants from FI6v3 and C179 is not due to complete mutational intolerance of HA sites in the antibody-binding footprints. Some HA sites in each antibody footprint are fairly mutationally tolerant, and contain a range of mutations in the viral libraries used in our antibody selections. However, our mutational antigenic profiling shows that only a fraction of mutations at a fraction of these sites actually affect antibody neutralization. This finding is reminiscent of prior work showing that the binding energetics at protein-protein interfaces can be asymmetrically distributed across sites (Jin et al., 1992;Cunningham and Wells, 1993;Dall'Acqua et al., 1998). The broad anti-stalk antibodies appear to distribute their binding energetics in such a way that altering the mutationally tolerant HA sites has relatively little effect on neutralization.

DISCUSSION
... to immunodominant antibody specificities (Yewdell et al., 1979;Webster and Laver, 1980;Koel et al., 2013;Chambers et al., 2015;Petrie et al., 2016;Neher et al., 2016). Quantifying the antigenic effects of all such mutations therefore provides a relevant measure of the ease of viral antibody escape.
A major rationale for studying broadly neutralizing antibodies is the hope that they will be more resistant to viral evolutionary escape than the antibodies that dominate natural immune responses to influenza virus (Krammer and Palese, 2015;Corti et al., 2017). We have used a new approach to quantify the extent to which this is actually true, and shown that the anti-viral activity of broad anti-stalk antibodies is indeed more resistant (although certainly not completely resistant) to erosion by viral point mutations. Going forward, we suggest that quantifying the ease of viral escape will be a useful complement to more traditional techniques that characterize the structural epitopes and strain breadth of anti-viral antibodies.

Antibodies
C179 IgG was purchased from Takara Bio Inc (Catalog #M145). FI6v3 was purified from 293F cells transduced with a lentiviral vector encoding a commercially synthesized gene for the IgG form of the antibody, with the heavy and light chains reverse-translated from the protein sequence in the PDB structure 3ZTN (Corti et al., 2011) as described in Balazs et al. (2013). Genes encoding S139/1 in IgG form were reverse-translated from the protein sequence in PDB structure 4GMS (Lee et al., 2012), and used to express and purify protein by the Fred Hutchinson Cancer Research Center protein expression core.

Neutralization assays
We performed neutralization assays using influenza viruses that carried GFP in the PB1 segment. These PB1flank-eGFP viruses were generated in co-cultures of 293T-CMV-PB1 and MDCK-SIAT1-CMV-PB1 cells as described in Bloom et al. (2010), using the standard bi-directional pHW181-PB2, ..., pHW188-NS reverse-genetics plasmids (Hoffmann et al., 2000) for all genes except PB1, plus the pHH-PB1flank-eGFP plasmid (Bloom et al., 2010). Each mutant was generated by repeating this process using a version of the pHW184-HA plasmid that had been engineered by site-directed mutagenesis to carry the indicated mutation. The neutralization assays themselves were performed by using a plate reader to quantify the GFP signal produced by MDCK-SIAT1-CMV-PB1 cells infected by PB1flank-eGFP virus that had been incubated with the indicated antibody concentration as described in Hooper and Bloom (2013). All neutralization curves represent the mean and standard deviation of three measurements.

H3 sequence numbering
Unless otherwise indicated, all residues are numbered in the H3 numbering scheme, with the signal peptide in negative numbers, the HA1 subunit as plain numbers, and the HA2 subunit denoted with "(HA2)". The conversion between sequential numbering of the A/WSN/1933 HA and the H3 numbering scheme was performed using the Python script available at https://github.com/jbloomlab/HA_numbering. File S1 gives the numbering conversion.

Inference of HA phylogenetic tree
In the phylogenetic tree in Figure 2, we indicate which HAs each antibody has been reported to bind or neutralize (Yoshida et al., 2009;Lee et al., 2012;Okuno et al., 1993;Dreyfus et al., 2013;Corti et al., 2011). Among broad antibodies, S139/1 has not been tested against H8, H9, and H11; C179 has not been tested against H8 and H11; and no antibodies have been tested against H17 and H18. The narrow H17-L19, H17-L10, and H17-L7 antibodies have not been tested against any other subtypes; however, since these antibodies have a very limited range even among H1 HAs (Caton et al., 1982), we assume that they do not bind other subtypes. For the cladogram in Figure 5C, the amino-acid identities at site 38 are from the strains tested against C179 by Dreyfus et al. (2013). For subtypes not tested, the amino-acid identity reported is that in the strain for that subtype in File S2.

Mutant virus libraries
The mutant virus libraries are those described in Doud and Bloom (2016), and were produced in full biological triplicate. Briefly, these libraries were generated by using codon mutagenesis (Bloom, 2014) to introduce random codon mutations into plasmid-encoded HA, and then using a helper-virus strategy that avoids the bottlenecks associated with standard influenza reverse genetics to create the virus libraries. Full details of the library generation and sequencing statistics that quantify how completely each of the triplicate libraries covers the possible amino-acid mutations are in Doud and Bloom (2016).

Mutational antigenic profiling
The mutational antigenic profiling was performed as described in Doud et al. (2017). Briefly, we diluted each of the virus libraries to a concentration of 10^6 TCID50 per ml and incubated the virus dilutions with an equal volume of antibody at the intended concentration at 37 °C for 1.5 hours. The final antibody concentrations in these mixtures are shown in Figure 3. We performed three fully independent replicates of each selection using the three replicate mutant virus libraries. In addition, we performed technical replicates (independent neutralization experiments on the same virus library) in some cases as indicated in Figure S1. The virus-antibody mixtures were used to infect cells, and viral RNA was extracted, reverse-transcribed, and PCR amplified as described in Doud et al. (2017). In order to obtain high accuracy in the Illumina deep sequencing, we used the barcoded-subamplicon sequencing strategy described in Doud et al. (2017), which is a slight modification of the strategy of Wu et al. (2014).
We also estimated the overall fraction of virions surviving each antibody selection. These fractions are denoted by γ in this paper. The averages of these fractions across libraries are reported in Figure 3, and the values for each individual replicate are in Table S1. The fractions were estimated using qRT-PCR against the viral NP and canine GAPDH as described in Doud et al. (2017). Briefly, we made duplicate 10-fold serial dilutions of each of the virus libraries to use as a standard curve of infectivity. We also performed qPCR on the cells infected with the virus-antibody mix. To estimate the fractions, we used linear regression to fit a line relating the logarithm of the viral infectious dose in the standard curve to the difference in Ct values between NP and GAPDH, and then interpolated the fraction surviving for each selection from this regression.
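The standard-curve interpolation can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: the function names and the numbers in the standard curve are hypothetical, the dose of each standard is expressed as a fraction of the undiluted library, and the regression is fit with the logarithm of dose as the dependent variable so that a sample's ΔCt can be converted directly to a fraction.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fraction_surviving_from_qpcr(standard_fractions, standard_delta_ct,
                                 sample_delta_ct):
    """Interpolate the fraction of infectious virus from a delta-Ct value.

    standard_fractions -- dose of each standard relative to the undiluted
                          library (e.g. 1, 0.1, 0.01)
    standard_delta_ct  -- Ct(NP) - Ct(GAPDH) measured for each standard
    sample_delta_ct    -- Ct(NP) - Ct(GAPDH) for the antibody-selected sample
    """
    log_dose = [math.log10(f) for f in standard_fractions]
    # Regress log10(dose) on delta-Ct, then read the sample off the line.
    slope, intercept = fit_line(standard_delta_ct, log_dose)
    return 10 ** (slope * sample_delta_ct + intercept)


# Hypothetical standard curve: each 10-fold dilution adds 3.3 cycles.
standards = [1.0, 0.1, 0.01, 0.001]
delta_ct = [2.0, 5.3, 8.6, 11.9]
gamma = fraction_surviving_from_qpcr(standards, delta_ct, sample_delta_ct=8.6)
print(round(gamma, 4))
```

In this toy curve a sample with the same ΔCt as the 100-fold dilution is assigned a surviving fraction of about 0.01.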

Analysis of deep sequencing data
The deep sequencing data were analyzed using version 2.1.0 of the dms_tools2 software package (Bloom, 2015), which is available at http://jbloomlab.github.io/dms_tools2. File S3 contains a Jupyter notebook that performs all steps of the analysis beginning with downloading the FASTQ files from the Sequence Read Archive. Detailed statistics about the sequencing depth and error rates are shown in this Jupyter notebook.

Calculating the fraction of virions with each mutation that escapes antibody neutralization
In prior mutational antigenic profiling work (Doud et al., 2017;Dingens et al., 2017), we calculated the differential selection on each mutation as the logarithm of its enrichment relative to wildtype in an antibody-selected sample versus a mock-selected control. These mutation differential selection values are useful for the analysis of individual experiments. However, there is no natural way to compare these values across experiments with different antibodies at different concentrations, since the strength of differential selection depends on details of how the pressure is imposed. We therefore developed the new approach in this paper to quantify the antigenic effect of a mutation in units that can be compared across antibodies and concentrations.
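For comparison, the earlier differential-selection statistic can be sketched as below. This is a simplified illustration: the base-2 logarithm, the absence of pseudocounts and error correction, the function name, and the counts are all assumptions made for the example rather than details taken from the cited work.

```python
import math


def differential_selection(n_sel_mut, n_sel_wt, n_mock_mut, n_mock_wt):
    """Log2 enrichment of a mutation relative to wildtype, selected vs mock."""
    enrichment_selected = n_sel_mut / n_sel_wt
    enrichment_mock = n_mock_mut / n_mock_wt
    return math.log2(enrichment_selected / enrichment_mock)


# Hypothetical counts: the mutant goes from 1:4 to 4:1 relative to wildtype.
diffsel = differential_selection(n_sel_mut=40, n_sel_wt=10,
                                 n_mock_mut=10, n_mock_wt=40)
print(diffsel)  # 4.0
```

Because the value is a ratio relative to wildtype, it grows as wildtype is neutralized more completely, so the same mutation yields different values at different antibody concentrations. The fraction surviving avoids this by anchoring the estimate to the measured overall survival γ.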
The general principle of the calculations is illustrated in Figure 1 and discussed in the first section of the RESULTS. Here we provide details on how these calculations are performed. The deep sequencing measures the number of times that codon $x$ is observed at site $r$ in both the antibody-selected and mock-selected conditions. Denote these counts as $n^{\text{selected}}_{r,x}$ and $n^{\text{mock}}_{r,x}$, respectively. We also perform deep sequencing of a control (in this case, plasmid DNA encoding the wildtype HA gene) to estimate the sequencing error rate. Denote the counts of codon $x$ at site $r$ in this control as $n^{\text{err}}_{r,x}$. Also denote the total reads at each site $r$ in each sample as $N^{\text{selected}}_{r} = \sum_x n^{\text{selected}}_{r,x}$, $N^{\text{mock}}_{r} = \sum_x n^{\text{mock}}_{r,x}$, and $N^{\text{err}}_{r} = \sum_x n^{\text{err}}_{r,x}$. We first estimate the rate of sequencing errors at site $r$ as
$$\epsilon_{r,x} = \frac{n^{\text{err}}_{r,x}}{N^{\text{err}}_{r}}.$$
For the wildtype identity at site $r$, which we denote as $\operatorname{wt}(r)$, the value of $\epsilon_{r,\operatorname{wt}(r)}$ is the fraction of times we correctly observe the wildtype identity $\operatorname{wt}(r)$ at site $r$ versus observing some spurious mutation. For all mutant identities $x \ne \operatorname{wt}(r)$ at site $r$, $\epsilon_{r,x}$ is the fraction of times we observe the mutation $x$ at site $r$ when the identity is really wildtype. We ignore second-order terms where we incorrectly read one mutation as another, as such errors will be very rare as mutations themselves are rare (most codons are wildtype in most sequences). We next adjust all of the deep sequencing codon counts in the antibody-selected and mock-selected conditions by the error control. Specifically, the error-adjusted counts for the antibody-selected sample are
$$\hat{n}^{\text{selected}}_{r,x} = \begin{cases} \max\left[N^{\text{selected}}_{r} \left(\dfrac{n^{\text{selected}}_{r,x}}{N^{\text{selected}}_{r}} - \epsilon_{r,x}\right), 0\right] & \text{if } x \ne \operatorname{wt}(r), \\ n^{\text{selected}}_{r,x} / \epsilon_{r,x} & \text{if } x = \operatorname{wt}(r). \end{cases}$$
An equivalent equation is used to calculate $\hat{n}^{\text{mock}}_{r,x}$. We then sum the error-adjusted codon counts for each amino acid $a$:
$$\hat{n}^{\text{selected}}_{r,a} = \sum_{x \,:\, \mathcal{A}(x) = a} \hat{n}^{\text{selected}}_{r,x},$$
so that $\hat{n}^{\text{selected}}_{r,a}$ are the error-adjusted counts for the antibody-selected condition summed across all codons $x$ where the encoded amino acid $\mathcal{A}(x)$ is $a$. An equivalent equation is used to calculate $\hat{n}^{\text{mock}}_{r,a}$.
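The error correction and codon-to-amino-acid summation can be sketched in Python for a single site. This is a minimal sketch, not the dms_tools2 code. The exact adjustment rule is an assumption here: expected erroneous reads are subtracted from each mutant codon (floored at zero), and the wildtype count is divided by the fraction of wildtype reads that are read correctly. The function names, the three-codon toy site, and all counts are hypothetical.

```python
def error_rates(err_counts):
    """Per-codon observation rates at one site in the wildtype control."""
    total = sum(err_counts.values())
    return {codon: n / total for codon, n in err_counts.items()}


def error_adjust(counts, eps, wildtype):
    """Error-adjusted codon counts at one site (assumed adjustment rule)."""
    total = sum(counts.values())
    adjusted = {}
    for codon, n in counts.items():
        if codon == wildtype:
            # Scale wildtype up by the fraction of wildtype reads read correctly.
            adjusted[codon] = n / eps[codon]
        else:
            # Remove the reads expected from sequencing error alone.
            adjusted[codon] = max(total * (n / total - eps.get(codon, 0.0)), 0.0)
    return adjusted


def sum_by_amino_acid(codon_counts, translate):
    """Sum codon counts over all codons encoding the same amino acid."""
    aa_counts = {}
    for codon, n in codon_counts.items():
        aa = translate[codon]
        aa_counts[aa] = aa_counts.get(aa, 0.0) + n
    return aa_counts


# Toy site with wildtype codon AAA (Lys); only three codons are listed.
translate = {"AAA": "K", "AAG": "K", "GAA": "E"}
eps = error_rates({"AAA": 9980, "AAG": 10, "GAA": 10})
selected = {"AAA": 900, "AAG": 1, "GAA": 99}
adjusted = error_adjust(selected, eps, wildtype="AAA")
aa_counts = sum_by_amino_acid(adjusted, translate)
print(adjusted["AAG"], round(aa_counts["E"], 6))
```

In this toy example the single AAG read is fully explained by the control error rate and is removed, while nearly all of the GAA reads are retained as genuine mutations.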
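Finally, we use these error-adjusted amino-acid counts to estimate the mutation frequencies $\rho^{\mathrm{selected}}_{r,a}$ and $\rho^{\mathrm{mock}}_{r,a}$ that are used in Equation 1 to calculate the fraction $F_{r,a}$ of virions with amino acid $a$ at site $r$ that survive the selection. When estimating these mutation frequencies, we add a pseudocount of $P = 5$ to the lower-depth sample, and a depth-adjusted pseudocount to the higher-depth sample. The rationale for adding a pseudocount is to regularize the estimates in the case of low counts. Specifically, we estimate the mutation frequencies as
$$\rho^{\mathrm{selected}}_{r,a} = \frac{\hat{n}^{\mathrm{selected}}_{r,a} + f_{r,\mathrm{selected}} \times P}{\sum_{a'} \hat{n}^{\mathrm{selected}}_{r,a'} + f_{r,\mathrm{selected}} \times P \times A} \tag{6}$$
$$\rho^{\mathrm{mock}}_{r,a} = \frac{\hat{n}^{\mathrm{mock}}_{r,a} + f_{r,\mathrm{mock}} \times P}{\sum_{a'} \hat{n}^{\mathrm{mock}}_{r,a'} + f_{r,\mathrm{mock}} \times P \times A} \tag{7}$$
where $A$ is the number of characters (e.g., 20 for amino acids), and $f_{r,\mathrm{selected}}$ and $f_{r,\mathrm{mock}}$ are the pseudocount adjustment factors defined as:
$$f_{r,\mathrm{selected}} = \max\left[1, \frac{\sum_a \hat{n}^{\mathrm{selected}}_{r,a}}{\sum_a \hat{n}^{\mathrm{mock}}_{r,a}}\right] \tag{8}$$
$$f_{r,\mathrm{mock}} = \max\left[1, \frac{\sum_a \hat{n}^{\mathrm{mock}}_{r,a}}{\sum_a \hat{n}^{\mathrm{selected}}_{r,a}}\right] \tag{9}$$
The pseudocount adjustment factors ensure that $P$ is added to the counts for the lower-depth sample, and a proportionally scaled-up pseudocount is added to the higher-depth sample. The depth scaling is necessary to avoid systematically biasing towards higher mutation frequencies in the lower-depth sample. It is these estimated mutation frequencies that are used in conjunction with $\gamma$ (the qPCR-estimated overall fraction of virions that survive selection) to compute the fraction surviving ($F_{r,a}$) and the excess fraction surviving above the library average ($F^{\mathrm{excess}}_{r,a}$) via Equations 1 and 2.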
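The depth-scaled pseudocount and the resulting fraction surviving can be sketched as below. The function names and toy counts are hypothetical. Equations 1 and 2 are given in the RESULTS and are not reproduced in this section, so the sketch assumes that Equation 1 has the form $F_{r,a} = \gamma \, \rho^{\mathrm{selected}}_{r,a} / \rho^{\mathrm{mock}}_{r,a}$ and that Equation 2 subtracts the library average $\gamma$ and floors the result at zero; both assumptions should be checked against those equations before reuse.

```python
# Sketch of the pseudocount-regularized frequencies for one site.
# Names and numbers are illustrative assumptions, not dms_tools2 code.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def pseudocount_factors(aa_selected, aa_mock):
    """Scaling factors so the deeper sample gets a proportionally larger pseudocount."""
    depth_selected = sum(aa_selected.values())
    depth_mock = sum(aa_mock.values())
    return (max(1.0, depth_selected / depth_mock),
            max(1.0, depth_mock / depth_selected))


def mutation_freqs(aa_counts, f, pseudocount=5.0, alphabet=AMINO_ACIDS):
    """Frequency of each character after adding f * pseudocount to every count."""
    denom = sum(aa_counts.values()) + f * pseudocount * len(alphabet)
    return {a: (aa_counts.get(a, 0.0) + f * pseudocount) / denom for a in alphabet}


def fraction_surviving(rho_selected, rho_mock, gamma):
    """Fraction surviving and excess above the library average.

    ASSUMPTION: F = gamma * rho_selected / rho_mock (assumed form of Equation 1)
    and F_excess = max(0, F - gamma) (assumed form of Equation 2).
    """
    frac = {a: gamma * rho_selected[a] / rho_mock[a] for a in rho_selected}
    excess = {a: max(0.0, value - gamma) for a, value in frac.items()}
    return frac, excess


# Toy error-adjusted amino-acid counts; the mock sample is twice as deep.
aa_selected = {"A": 9000.0, "D": 900.0, "G": 100.0}
aa_mock = {"A": 19800.0, "D": 100.0, "G": 100.0}
gamma = 0.01  # made-up overall fraction of the library surviving

f_selected, f_mock = pseudocount_factors(aa_selected, aa_mock)
rho_selected = mutation_freqs(aa_selected, f_selected)
rho_mock = mutation_freqs(aa_mock, f_mock)
frac, excess = fraction_surviving(rho_selected, rho_mock, gamma)
```

In this toy case the selected sample receives a pseudocount of 5 per amino acid and the twice-as-deep mock sample receives 10, so the pseudocount shifts both frequency estimates by the same relative amount.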
In some cases, we need to summarize the excess fraction of mutations surviving into a single number for each site, such as for plotting as a function of the site number or displaying on the crystal structure. There are 19 different $F^{\mathrm{excess}}_{r,a}$ values for non-wildtype amino acids for each site. One summary statistic is the fraction surviving above the library average, averaged over all 19 amino-acid mutations at site $r$:
$$F^{\mathrm{excess}}_{r,\mathrm{avg}} = \frac{1}{19} \sum_{a \neq \mathrm{wt}(r)} F^{\mathrm{excess}}_{r,a}. \tag{10}$$
Another summary statistic is the maximum fraction surviving above average among all 19 amino-acid mutations at site $r$:
$$F^{\mathrm{excess}}_{r,\mathrm{max}} = \max_{a \neq \mathrm{wt}(r)} F^{\mathrm{excess}}_{r,a}. \tag{11}$$
In this paper, Figures 4 and S2 show the median excess fraction surviving, taken across all biological and technical replicates at a given antibody concentration. The subsequent logo plots show the medians of these values taken across all concentrations for each antibody. The numerical values plotted in these logo plots are in File S4.
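The site-level summaries and the two-stage medians can be illustrated with a short sketch. The helper names and all numbers below are hypothetical. For a full amino-acid alphabet the average runs over 19 mutant values, whereas the toy site here has only three mutants.

```python
# Sketch of the per-site summaries and the median across replicates and
# concentrations. Names and values are illustrative assumptions.
from statistics import median


def site_summaries(excess, wildtype):
    """Average and maximum excess fraction surviving over non-wildtype amino acids."""
    mutant_values = [v for a, v in excess.items() if a != wildtype]
    return sum(mutant_values) / len(mutant_values), max(mutant_values)


def median_by_mutation(samples):
    """Mutation-wise median across a list of {mutation: value} dictionaries."""
    return {a: median(s[a] for s in samples) for a in samples[0]}


# Toy replicates at two antibody concentrations for two mutations at one site.
replicates_low = [{"D": 0.10, "G": 0.00}, {"D": 0.30, "G": 0.02}, {"D": 0.20, "G": 0.01}]
replicates_high = [{"D": 0.05, "G": 0.00}, {"D": 0.07, "G": 0.00}]

# First take the median across replicates at each concentration,
# then the median of those values across concentrations.
per_concentration = [median_by_mutation(replicates_low),
                     median_by_mutation(replicates_high)]
across_concentrations = median_by_mutation(per_concentration)

site_avg, site_max = site_summaries(
    {"A": 0.0, "D": 0.3, "G": 0.0, "S": 0.1}, wildtype="A")
```

Taking medians in two stages means that a concentration with many replicates does not outweigh a concentration with few.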
Code that performs these fraction surviving analyses has been added to version 2.1.0 of the dms_tools2 software package (Bloom, 2015), which is available at http://jbloomlab.github.io/dms_tools2.

Data availability and source code
Deep sequencing data are available from the Sequence Read Archive under BioSample accession SAMN05789126. Computer code that analyzes these data to generate the results described in this paper is in File S3.

ACKNOWLEDGMENTS
We thank Adam Dingens, Sarah Hilton, Katherine Xue, Lauren Gentles, and Jeremy Roop for helpful comments on the project and manuscript. We thank the Fred Hutchinson Cancer Research Center genomics core for performing the Illumina deep sequencing, and the protein expression core for expressing and purifying the C179 antibody. This work was supported by grant R01AI127893 from the NIAID of the NIH. MBD was supported in part by training grant T32AI083203 from the NIAID of the NIH. JML was supported in part by the Center for Inference and Dynamics of Infectious Diseases (CIDID), which is funded by grant U54GM111274 from the NIGMS of the NIH. The research of JDB is supported in part by a Faculty Scholar Grant from the Howard Hughes Medical Institute and the Simons Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

AUTHOR CONTRIBUTIONS
MBD and JML performed the experiments. All three authors designed the project, contributed to the computer code, analyzed the data, and wrote the paper.

Figure S1: Correlations across experimental replicates. Each point represents one site in HA, and gives the fraction surviving above average across all amino-acid mutations at that site, as calculated using Equation 10. The replicates are highly correlated for antibodies with strong escape mutations (S139/1, H17-L19, H17-L10, and H17-L7), and reasonably correlated for antibodies with only weak escape mutations (FI6v3 and C179).
Figure S2: The excess fraction surviving for the single strongest escape mutation at each site. This plot differs from Figure 4 in that the height of the line indicates the excess fraction of virions that survive the antibody selection for the single strongest escape mutation at that site, rather than the average across all amino-acid mutations at that site.
Figure S3: The excess fraction surviving selection with antibody H17-L19 for all amino-acid mutations. The excess fraction surviving for each replicate was computed using Equation 2, then we took the median across all technical and biological replicates for each antibody concentration, and then took the medians of those values across concentrations. The height of each letter is proportional to the excess fraction surviving of virions with that mutation. The scale bar at the top of the plot relates the letter heights to the actual fractions. The sites are labeled using H3 numbering.
Figure S4: The excess fraction surviving selection with antibody H17-L10 for all amino-acid mutations. The excess fraction surviving for each replicate was computed using Equation 2, then we took the median across all technical and biological replicates for each antibody concentration, and then took the medians of those values across concentrations. The height of each letter is proportional to the excess fraction surviving of virions with that mutation. The scale bar at the top of the plot relates the letter heights to the actual fractions. The sites are labeled using H3 numbering.
Figure S5: The excess fraction surviving selection with antibody H17-L7 for all amino-acid mutations. The excess fraction surviving for each replicate was computed using Equation 2, then we took the median across all technical and biological replicates for each antibody concentration, and then took the medians of those values across concentrations. The height of each letter is proportional to the excess fraction surviving of virions with that mutation. The scale bar at the top of the plot relates the letter heights to the actual fractions. The sites are labeled using H3 numbering.
Figure S8: The excess fraction surviving selection with antibody S139/1 for all amino-acid mutations. The excess fraction surviving for each replicate was computed using Equation 2, then we took the median across all technical and biological replicates for each antibody concentration, and then took the medians of those values across concentrations.
Table S1: The total fraction of virions surviving each antibody treatment at each concentration as estimated by qPCR. These are the quantities referred to as γ. This table shows the values for the broad antibodies; values for the narrow H17-L19, H17-L10, and H17-L7 antibodies are reported in Doud et al. (2017).

File S1: Conversion from sequential numbering of the A/WSN/1933 HA to H3 numbering.
In this CSV file, the original column gives the residue number in sequential (1, 2, ...) numbering of the A/WSN/1933 HA, and the new column gives the residue number in H3 numbering.
File S2: Sequences used to infer the tree for all HA subtypes. This FASTA file gives the HA sequences used to infer the tree of subtypes in Figure 2.
File S3: Computer code and data for the analysis of the mutational antigenic profiling data. The code in this ZIP file performs the entire computational analysis beginning with downloading the FASTQ files from the Sequence Read Archive. The ZIP file contains a README file that explains the contents in detail. The actual analysis is performed by the Jupyter notebook analysis_notebook.ipynb, which includes embedded plots summarizing key statistics and results. An HTML version of this notebook is also included as analysis_notebook.html.
File S4: The excess fraction surviving for each mutation for each antibody. This file is a ZIP of CSV files giving the numerical values plotted in the logo plots. These values are the median excess fraction surviving, taken first across replicates and then across antibody concentrations.