Incomplete influenza A virus genomes are abundant but readily complemented during spatially structured viral spread

Viral genomes comprising multiple distinct RNA segments can undergo genetic exchange through reassortment, a process that facilitates viral evolution and can have major epidemiological consequences. Segmentation also allows the replication of incomplete viral genomes (IVGs), however, and evidence suggests that IVGs occur frequently for influenza A viruses. Here we quantified the frequency of IVGs using a novel single cell assay and then examined their implications for viral fitness. We found that each segment of influenza A/Panama/2007/99 (H3N2) virus has only a 58% probability of being present in a cell infected with a single virion. These observed frequencies accurately account for the abundant reassortment seen in co-infection, and suggest that an average of 3.7 particles are required for replication of a full viral genome in a cell. This dependence on multiple infection is predicted to decrease infectivity and to slow viral propagation in a well-mixed system. Importantly, however, modeling of spatially structured viral growth predicted that the need for complementation is met more readily when secondary spread occurs locally. This expectation was supported by experimental infections in which the level of spatial structure was manipulated. Furthermore, a virus engineered to be entirely dependent on co-infection to replicate in vivo was found to grow robustly in guinea pigs, suggesting that co-infection is sufficiently common in vivo to support propagation of IVGs. The infectivity of this mutant virus was, however, reduced 815-fold relative to wild-type, and the mutant virus did not transmit to contacts. Thus, while incomplete genomes augment reassortment and contribute to within-host spread, the existence of rare complete IAV genomes may be critical for transmission to new hosts.



Main Text
Pathogen evolution poses a continued threat to public health by reducing the effectiveness of antimicrobial drugs and adaptive immunity. In the case of the influenza A virus (IAV), this evolution results in seasonal outbreaks as new viruses emerge to which pre-existing immunity is weak. Each year requires a new vaccine as a consequence, and keeping pace with IAV evolution is challenging: unexpected emergence of new strains could render the vaccine obsolete before the flu season starts. IAV populations evolve rapidly in part because their mutation rates are high, on the order of 10^-4 substitutions per nucleotide per genome copied [1]. The segmentation of the viral genome provides a second source of genetic diversity. The IAV genome is composed of eight single-stranded RNA segments, and so cells co-infected with two different IAV virions can produce chimeric progeny with a mix of segments from these two viruses. This process, termed reassortment, carries costs and benefits analogous to those of sexual reproduction in eukaryotes [2]. Reassortment can join beneficial mutations from different backgrounds to alleviate clonal interference [3], and purge deleterious mutations to mitigate the effects of Muller's ratchet [4,5]. This combinatorial shuffling of mutations may accelerate adaptation to new environments such as a novel host [6]. But free mixing of genes through reassortment may also reduce viral fitness by separating beneficial segment pairings, a cost that sexual reproduction also carries in eukaryotes [7]. Previous work has shown that reassortment occurs readily between closely related variants [8], but is limited between divergent lineages due to molecular barriers [9,10] or reduced fitness of progeny [11,12]. Nevertheless, the contribution of reassortment to emergence of novel epidemic and pandemic IAVs has been documented repeatedly [13][14][15][16].
Factors that affect [...] infections initiated with randomly distributed inocula contained more IVGs than those generated by secondary spread from low MOIs, in which spatial structure is inherent. To determine the potential for complementation to occur in vivo, we generated a mutant virus that was fully dependent on cellular co-infection for viral replication, and found that it was able to grow within guinea pigs, but unable to transmit to cagemates. Taken together, these results suggest that the abundance of incomplete genomes and the potential for complementation are important factors in the replication and transmission of IAV.

Results

Measurement of P_P
To better evaluate the implications of genome incompleteness for IAV fitness and reassortment, we sought to quantify the probability of successful replication (P_P = probability present) for each of the eight IAV genome segments within single cells infected with single virus particles. To ensure accurate detection of IVGs, we devised a system that would allow their replication to high copy number. We applied our approach to the human seasonal isolate, influenza A/Panama/2007/99 (H3N2) virus. In this assay, MDCK cells are inoculated with a virus of interest, referred to herein as "Pan/99-WT" or "WT", and a genetically tagged helper virus ("Pan/99-Helper" or "Helper"). This Helper virus differs from the WT strain only by silent mutations on each segment that provide distinct primer-binding sites. For example, qPCR primers targeting WT PB2 will not anneal to cDNA of Helper PB2, and vice versa. By co-inoculating cells with a low MOI of WT virus and a high MOI of Helper virus, we ensure that each cell is productively infected, but is unlikely to receive more than one WT virion. Following infection, one cell per well is sorted into a 96-well plate containing MDCK cell monolayers. The initially infected cell produces progeny which then infect neighboring cells, effectively amplifying the vRNA segments present in the first cell. The presence or absence of WT segments in each well can then be measured by performing segment-specific RT-qPCR. As detailed in the Methods, a correction factor was applied to account for multiple infection, the probability of which could be estimated based on the observed number of cells infected with each virus.

Using this assay, the P_P values for each segment of Pan/99 virus were quantified (Fig. 1A). We observed that each segment was present at an intermediate frequency between 0.5 and 0.6, indicating that incomplete genomes may arise from loss of any segment(s). The mean frequency across all segments was 0.58. When used to parameterize a model that estimates the frequency of reassortment, which we published previously [24], these P_P values generated predicted levels of reassortment that align closely with experimental data (Fig. 1B). This match between observed and predicted reassortment is important because i) it offers a validation of the measured P_P values and ii) it indicates that IVGs fully account for the levels of reassortment observed, which are much higher than predicted for viruses with only complete genomes [24].

Interactions between vRNP segments are thought to play an important role in the assembly of new virions [10,26-29]. To determine whether similar interactions exist that could mediate the co-delivery of segments to the cell, the patterns of segment co-occurrence were analyzed. In performing this analysis, it was again important to take into account the known probability of multiple infection in our single cell assay. As shown in Supplementary Figure 1, cells containing more segments were likely to have been infected with multiple virions. Because such cells are less informative for this analysis, we applied a weighting factor to ensure that results relied more strongly on data from cells with fewer WT segments. Namely, we determined the probability that a given cell acquired its gene constellation by infection with a single virion and weighted data according to this probability to calculate the pairwise correlation between segments. While some significant interactions were observed (HA-NA, HA-M, M-NS), they were relatively weak, with r^2 values below 0.15 (Fig. 1C). Thus, our data suggest that associations among specific vRNPs do not play a major role during the establishment of infection within a cell.

Predicted costs of incomplete genomes for cellular infectivity
If singular infections often result in replication of fewer than eight viral gene segments, the infectious unit would be expected to comprise multiple particles. To evaluate the relationship between the frequency of IVGs and the number of particles required to infect a cell, we developed a probabilistic model in which the likelihood of segment delivery is governed by the parameter P_P. In Figure 2A we examine how P_P affects the frequency with which a single virion delivers a given number of segments. If P_P is low, singular infections typically yield few segments per cell. Even at the intermediate P_P that characterizes Pan/99 virus, the vast majority of singular infections give rise to IVGs within the cell. When P_P is high, however, most cells receive the full complement of eight segments. In Figure 2B we plot the relationship between P_P and the percentage of cells that are expected to be productively infected following singular infection. If only a single virus infects a cell, then the probability that all eight segments are present will be P_P^8. For Pan/99 virus, the frequency with which eight segments are present is approximately 0.58^8 [...]

Importantly, however, if more than one virus particle infects the cell, then the probability that all eight segments are present will be considerably higher. This effect is demonstrated in Figure 2C, where the percentage of cells containing all eight IAV segments is plotted as a function of the number of virions that have entered the cell. Here we see that, even for low P_P, a high probability of productive infection is reached at high multiplicities of infection. Finally, in Figure 2D, the relationship between P_P and the average number of virions required to productively infect a cell is examined. We see that the number of virions comprising an infectious unit is inversely related to P_P. Based on our experimentally determined values of P_P for Pan/99 virus, we estimate that an average of 3.7 virions must enter a cell to render it productively infected (Fig. 2D). Thus, as a result of stochastic loss of gene segments, the likelihood that a full viral genome will be replicated within a singularly infected cell is low. The fitness implications of this inefficiency may be offset, however, by complementation of IVGs in multiply infected cells.
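
The per-cell quantities discussed in this section can be illustrated with a short calculation. The sketch below is not the authors' code: it assumes a single uniform P_P for all eight segments and independent segment delivery, and the function names are hypothetical.

```python
def p_complete(p_p: float, n_virions: int = 1, n_segments: int = 8) -> float:
    """Probability that every segment is present after n_virions enter a cell.

    Assumes each virion delivers each segment independently with probability p_p.
    """
    p_segment_present = 1.0 - (1.0 - p_p) ** n_virions
    return p_segment_present ** n_segments


def expected_virions(p_p: float, n_segments: int = 8, tol: float = 1e-12) -> float:
    """Mean number of virions needed before all segments are present.

    Uses E[N] = sum over n >= 0 of P(N > n), where P(N > n) is the
    probability that the genome is still incomplete after n virions.
    """
    if not 0.0 < p_p <= 1.0:
        raise ValueError("p_p must be in (0, 1]")
    total = 0.0
    n = 0
    while True:
        still_incomplete = 1.0 - p_complete(p_p, n, n_segments)
        if still_incomplete < tol:
            break
        total += still_incomplete
        n += 1
    return total
```

Under these simplifying assumptions, `p_complete(0.58)` is about 0.013 and `expected_virions(0.58)` is roughly 3.6, which is close to the estimate of 3.7 quoted above from the measured P_P values.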

Predicted costs of incomplete genomes for population infectivity
The potential for multiple infection to mitigate the costs of inefficient genome delivery will, of course, depend on the frequency of multiple infection. To evaluate the theoretical impact of IVGs on viral fitness, we therefore modeled the process of infection at a population level. A population of computational virions was randomly distributed across a population of computational cells over a range of MOIs, such that the likelihood of multiple infection was dictated by Poisson statistics. The frequency with which each cell acquired segments was again governed by P_P. For each combination of P_P and MOI, we calculated the percentage of populations in which at least one cell contained eight segments (Fig. 3A). This plot shows that viruses with lower P_P require markedly higher MOIs to ensure productive infection within a population of cells. Indeed, when we estimated the MOI required for a virus of a given P_P to infect 50% of populations, we observed that the ID50 increases exponentially as P_P decreases (Fig. 3B). Thus, a reliance on multiple infection in a well-mixed system is predicted to bear a substantial fitness cost.
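
The well-mixed scenario can also be approximated analytically rather than by simulation. The following sketch is an illustration under stated assumptions, not the model used in the paper: it assumes a uniform P_P, treats the number of virions entering each cell as an independent Poisson draw with mean equal to the MOI, and uses hypothetical function names and an arbitrary population size.

```python
import math


def p_cell_complete(p_p: float, moi: float, n_segments: int = 8, n_max: int = 200) -> float:
    """Probability that a cell holds all segments when virion number ~ Poisson(moi).

    Averages the per-cell completion probability over the Poisson distribution
    of virions per cell. n_max truncates the sum and is adequate for modest MOIs.
    """
    total = 0.0
    pmf = math.exp(-moi)  # Poisson probability of zero virions
    for n in range(n_max + 1):
        total += pmf * (1.0 - (1.0 - p_p) ** n) ** n_segments
        pmf *= moi / (n + 1)  # recurrence for the next Poisson term
    return total


def p_population_infected(p_p: float, moi: float, n_cells: int) -> float:
    """Probability that at least one of n_cells cells contains a complete genome."""
    return 1.0 - (1.0 - p_cell_complete(p_p, moi)) ** n_cells
```

Scanning `moi` for the value at which `p_population_infected` crosses 0.5 gives an ID50-like quantity for any chosen `p_p` and `n_cells`. Note that the Poisson sum is needed here: the eight segments are not independent across cells, because they share the same random number of incoming virions.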

Model of spatially structured viral spread
The estimates of viral infectivity made above assume that virus is distributed randomly over a population of cells. Following the initial infection event, however, viruses spread with spatial structure. We hypothesized that this structure may be very important for reducing the costs of genome incompleteness once infection is established. To test this idea, we developed a model of viral spread in which the extent of spatial structure could be varied.

The system comprises a spatially explicit grid of cells that can become infected with virus. The number and type of segments delivered upon infection is dictated by the parameter P_P and, if all eight segments are present, a cell produces virus particles. These particles can then diffuse in a random direction, with the distance traveled governed by the diffusion coefficient (D). D was varied in the model to modulate the spatial structure of viral spread: higher D corresponds to greater dispersal of virus and therefore lower spatial structure. We simulated replication of two [...]

The model allows the potential costs of incomplete genomes to be evaluated by comparing results obtained for a virus with P_P=1.0 to those obtained for a virus with a lower P_P. In particular, we focused on P_P=0.58 based on the measured values for Pan/99 virus. In Figure 4B [...] growth analysis indicated that a maximum of 11.5 PFU per cell was produced during Pan/99 virus infection of MDCK cells. Based on measured P_P values, these data estimate that a single productively infected cell produces 962 virions, and this value was used as the burst size in our [...]

Impact of MOI and spatial structure on IVG complementation in cell culture
Our models indicate that, for a virus of a given P_P, the frequency of infected cells containing IVGs is reduced i) at higher MOIs and ii) under conditions of high spatial structure. These predictions can be seen in Figure 6, where we examined how the proportion of infected cells that are semi-infected varies with MOI (Fig. 6A) and with the diffusion coefficient (Fig. 6B). We tested these predictions of the models experimentally by modulating MOI and spatial spread in IAV-infected cell cultures and gauging the impact of each manipulation on levels of IVGs.

To monitor levels of IVGs, we used flow cytometry to measure the potential for complementation, that is, the benefit provided by the addition of Pan/99-Helper virus. We hypothesized that, under single-cycle conditions, the potential for complementation would decrease with increasing WT virus MOI, since complementation between co-infecting WT viruses would occur frequently at high MOIs. In addition, under multicycle conditions initiated from low MOI, we predicted that the potential for complementation would be greatest at the beginning of infection, due to the random distribution of viral particles, and reduced by secondary spread. We hypothesized that the combination of local dispersal and high particle production during secondary spread would support co-infection in neighboring cells. To test our hypotheses, we [...] Figure 6C revealed that the potential for complementation at the outset of infection was high at low MOIs, but decreased with increasing MOI. This result was as expected, since complementation between WT virus particles was predicted to reduce the need for Helper virus (Fig. 6A).

To evaluate the impact of spatially structured secondary spread on IVG prevalence, cells were inoculated with Pan/99-WT virus at low MOI (0.002 or 0.01 PFU/cell) and then multicycle replication was allowed to proceed over a 12 h period. After this period, cells were inoculated [...] These data agree with our theoretical results (Fig. 6B) and indicate that the spatial structure of secondary spread facilitates complementation between WT particles as they infect neighboring cells at locally high MOIs.

Generation of a virus with absolute dependence on multiple infection
To evaluate the potential for complementation in vivo, we generated a virus that is fully [...] expression of M1 and M2 by flow cytometry. We observed that, as dilution increased, cells expressing M1 were less likely to express M2, and vice versa (Fig. 7C). This result would be expected if expression of both proteins from the same cell required co-infection with M1.Only and M2.Only encoding virions. As a control, we monitored the effect of dilution on co-expression of HA and M1 or M2. Here, we found that co-expression of M1 or M2 and HA was much less sensitive to dilution, consistent with co-delivery of M and HA segments by single virions.

To test the hypothesis that a given number of Pan/99-M.STOP virus particles would be less infectious than a comparable number of Pan/99-WT virus particles, we characterized both viruses using a series of titration methods that vary in their dependence on infectivity and M protein expression. We first used ddPCR to quantify NS copy numbers of the WT and M.STOP viruses and then normalized all other comparisons to this ratio to account for the difference in virus concentration. As shown above, total M copy numbers were roughly equivalent when normalized to NS (Fig. 7B and 7D). Using immunotitration, in which cells are infected under single-cycle conditions with serial dilutions of virus and then stained for HA expression [34], we observed equivalent titers of both viruses (Fig. 7D). This was expected, as HA expression under single-cycle conditions is not dependent on M1 or M2 proteins. When titration relied upon multi-cycle replication, however, the WT virus had a higher titer than the M.STOP virus. This difference was moderate in cell culture-based measurements, with PFU and TCID50 titers 24- and 51-fold higher, respectively, likely because of the reduced importance of M2 in this environment. The full cost for infectivity of separating the M1 and M2 ORFs onto distinct segments was apparent in vivo, where 815-fold as much M.STOP virus was required to infect 50% of guinea pigs compared to WT virus (Fig. 7D). Thus, although the M.STOP virus differs from a virus with very low P_P in that complementation can only occur when viruses carrying M1.Only and M2.Only segments co-infect, the prediction shown in Figure 3 that increased dependence on multiple infection decreases infectivity held true in this system.

Potential for complementation in vivo
Having determined that the dependence of Pan/99-M.STOP virus on complementation impairs viral infectivity, we next sought to evaluate the potential for complementation to occur [...] guinea pigs, following similar kinetics to Pan/99-WT virus. Average peak virus production, measured as NS vRNA copies, was reduced by only 9-fold relative to WT (Fig. 8A). [...] was observed in nasal washings collected from contacts (Fig. 8B). These results suggest that the spatial structure inherent to multi-cycle replication mitigates the cost of incomplete genomes in an individual host, but dependence on complementation is costly for transmission.

Discussion
Using a novel single-cell approach that enables robust detection of incomplete IAV genomes, we show that ~99% of Pan/99 virus infections led to replication of fewer than eight segments. The theoretical models we describe predict that the existence of IVGs presents a need for cellular co-infection, and that this need has a high probability of being met when spread occurs in a spatially structured manner. Use of silent genetic tags allowed us to experimentally interrogate cooperation at the cellular level to test these predictions. In agreement with our models, experiments in cell culture showed that co-infection and complementation occur readily when multiple rounds of infection are allowed to proceed with spatial structure. The high potential for complementation to occur in vivo was furthermore revealed by the robust within-[...]

Probabilistic model to estimate costs of incomplete genomes for cellular infectivity
To define the impact of incomplete viral genomes on viral infectivity, we considered how the infectious dose varies with P_P, the probability that an individual genome segment from an infectious virus is successfully delivered and replicated within the infected cell. This model assumes that a single particle can deliver each of the 8 segments that a host cell does not already contain. Furthermore, delivery of each segment is independent, making the action of segment delivery by a single virion a binomial process (p = P_P, N = # of missing segments). [...] Table 1.
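
The binomial process described here can be written as a small Markov chain over the number of segments a cell already contains. The sketch below is an illustrative reading of that description, not the authors' implementation; it assumes a uniform P_P, and the function names are hypothetical.

```python
from math import comb


def add_virion(dist: list, p_p: float) -> list:
    """Update the distribution over segments present after one more virion enters.

    dist[k] is the probability that the cell currently holds k distinct segments.
    The virion supplies each missing segment independently with probability p_p,
    so the number gained is Binomial(N = missing segments, p = p_p).
    """
    n_segments = len(dist) - 1
    updated = [0.0] * len(dist)
    for have, prob in enumerate(dist):
        missing = n_segments - have
        for gained in range(missing + 1):
            p_gain = comb(missing, gained) * p_p ** gained * (1.0 - p_p) ** (missing - gained)
            updated[have + gained] += prob * p_gain
    return updated


def segment_distribution(p_p: float, n_virions: int, n_segments: int = 8) -> list:
    """Distribution over the number of segments present after n_virions have entered."""
    dist = [1.0] + [0.0] * n_segments  # an uninfected cell holds no segments
    for _ in range(n_virions):
        dist = add_virion(dist, p_p)
    return dist
```

With `n_virions=1` the result is the Binomial(8, P_P) distribution of segments delivered by a single virion, and the last entry of the list gives the probability that the cell holds a complete genome.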

6) Infected cells (1-8 segments) may become refractory to super-infection.
To generate the data shown in Figure 4 and Figure 6B, these events were iterated over multiple [...] viruses were cultured in 9-11 day old embryonated hens' eggs unless otherwise noted below. To limit propagation of defective interfering viral genomes, virus stocks were generated either from a plaque isolate or directly from 293T cells transfected with reverse genetics plasmids. The only genetic modification made to the Pan/99-WT virus was the addition of sequence encoding a 6-His tag plus GGGGS linker following the signal peptide of the HA protein as previously described [8]. A genetically distinct but phenotypically similar virus, referred to herein as "Pan/99-Helper", was generated by the introduction of six or seven silent mutations on each segment, as well as the addition of the HA-tag (sequence: YPYDVPDYA) instead of the 6-His tag. The silent mutations are listed in Supplementary Table 1 and were designed to introduce strain-specific primer binding sites, allowing the presence or absence of each segment to be measured by qRT-PCR. Epitope tags in HA allowed identification of infected cells by flow cytometry. [...] inoculate a plaque assay, and after 48 h a plaque isolate was used to inoculate a 75 cm^2 flask of MDCK cells. Following 48 h of growth, this stock was aliquoted and used to inoculate a plaque assay. One plaque isolate was diluted and used to inoculate 10-day-old embryonated chickens' eggs for a third passage. Experiments were conducted with this egg passage stock.

Infections
6-well dishes (Corning) were seeded with 4 x 10^5 MDCK cells in 2 mL MEM, then incubated for 24 h. Prior to inoculation, MEM was removed and cells were washed twice with 1 mL PBS per wash. Inocula containing virus in 200 µL PBS were added to cells, which were incubated on ice (to permit attachment but not viral entry) for 45 minutes. After inoculation, the monolayer was washed with PBS to remove unbound virus before 2 mL virus medium was added and plates were incubated at 33°C. For multi-cycle replication, TPCK-treated trypsin was added to virus medium to a final concentration of 1 µg/mL. When single-cycle conditions were required, virus medium was removed after 3 h and replaced with 2 mL virus medium containing NH4Cl and HEPES.

Flow cytometry
At 12 h post-inoculation, virus medium was aspirated from infected cells, and monolayers were washed with PBS. The monolayer was disrupted using 0.05% trypsin + 0.53 mM EDTA in Hank's Balanced Salt Solution (HBSS). After 15 minutes at 37°C, plates were washed with 1 mL FACS buffer (PBS + 1% FCS + 5 mM EDTA) to collect cells and transfer them to 1.7 mL tubes. Cells were spun at 2,500 rpm for 5 minutes, then resuspended in 200 µL FACS buffer and transferred to 96-well V-bottom plates (Corning). The plate was spun at 2,500 rpm and supernatant discarded. Cells were resuspended in 50 µL FACS buffer containing antibodies at the following concentrations, then incubated at 4°C for 30 minutes:
1.) His Tag-Alexa 647 (5 µg/mL) (Qiagen, catalog no. 35370)
2.) HA Tag-FITC (7 µg/mL) (Sigma, clone HA-7)
After staining, cells were washed three times by centrifugation and resuspension in FACS buffer. After the final wash, cells were resuspended in 200 µL FACS buffer containing 7-AAD (12.5 µg/mL) and analyzed by flow cytometry using a BD Fortessa.

This approach was modified slightly when staining for M1 and M2. After staining for His and HA (where indicated), cells were washed once with 200 µL FACS buffer, then resuspended in 100 µL BD Cytofix/Cytoperm buffer and incubated at 4°C for 20 minutes. BD Cytoperm/Cytowash (perm/wash) buffer was added to each well, and cells were spun at 2,500 rpm for 5 minutes. After a second wash, cells were resuspended in 50 µL perm/wash buffer containing antibodies at the following concentrations:
1.) Anti-M1 GA2B conjugated to Pacific Blue (4 µg/mL) (ThermoFisher)
2.) Anti-M2 14C2 conjugated to PE (4 µg/mL) (Santa Cruz)
Following another 30 minutes of staining at 4°C, cells were washed three times (as described above) with perm/wash buffer, then resuspended in FACS buffer without 7-AAD just prior to analysis on the BD Fortessa.

Quantification of P_P values
A single cell sorting assay was used to measure the frequency with which individual genome segments are delivered to an infected cell. 4 x 10^5 MDCK cells were seeded into a 6-well dish, then counted the next day just before inoculation. Cells were then washed 3x with PBS and [...]

[...] Colored points correspond to the average P_P value of each experimental replicate in Fig. 1 and therefore show the prediction for Pan/99 virus. (C) The probability that a cell will be productively infected following infection with a given number of virions was calculated for the same P_P values as in (A). (D) The expected number of virions required to make a cell productively infected is plotted as a function of P_P. As in (B), colored points correspond to the average P_P value of each Pan/99 experimental replicate in Fig. 1. [...] Curves were generated by local regression. Shading represents 95% CI.