Analysis By Deep Sequencing of Discontinued Neurotropic Yellow Fever Vaccine Strains

Deep sequencing of live-attenuated viral vaccines has focused on vaccines in current use. Here we report characterization of a discontinued live yellow fever (YF) vaccine associated with severe adverse events. The French neurotropic vaccine (FNV) strain of YF virus was derived empirically in 1930 by 260 passages of wild-type French viscerotropic virus (FVV) in mouse brain. The vaccine was administered extensively in French-speaking Africa until discontinuation in 1982, due to high rates of post-vaccination encephalitis in children. Using rare archive strains of FNV, viral RNAs were sequenced and analyzed by massively parallel, in silico methods. Diversity and specific population structures were compared in reference to the wild-type parental strain FVV, and between the vaccine strains themselves. Lower abundance of polymorphism content was observed for FNV strains relative to FVV. Although the vaccines were of lower diversity than FVV, heterogeneity between the vaccines was observed. Reversion to wild-type identity was variably observed in the FNV strains. Specific population structures were recovered from vaccines with neurotropic properties; loss of neurotropism in mice was associated with abundance of wild-type RNA populations. The analysis provides novel sequence evidence that FNV is genetically unstable, and that adaptation of FNV contributed to the neurotropic adverse phenotype.

encephalitis in children, constituting a rationale for restricting administration of the vaccine to those above 10 years of age in 1960, and later discontinuance of the vaccine in 1982 due to the availability of the safer 17D strain (which is routinely given to those aged 9 months and older) 9 .
Mechanisms of neurotropism for YFV vaccines are poorly understood, and information on the FNV is particularly obscure. Reference material for FNV is of very limited availability, and passage histories are not generally accessible. YFV strains are variably neurotropic when directly introduced to central nervous tissues of mice, a property that is exploited in lethality and serological protection assays [10][11][12][13] . Both rhesus and cynomolgus macaques are in some cases susceptible to neurotropic disease when challenged intracerebrally with YFV vaccine strains, while wild-type strains are not neurotropic and cause viscerotropic disease even when introduced to the brain; this property forms the basis of the current World Health Organization standard assessment of neurotropism for YFV 17D vaccine seed lots 14 .
There are multiple, competing hypotheses available to explain the adverse neurotropic events attributed to administration of FNV. First, it is plausible that the adaptation of the virus to mouse brain conferred an enhanced capacity of the vaccine to infect mammalian central nervous tissues. Second, it is possible that FNV in some cases accumulates wild-type sequence content once administered. We have previously shown that the 17D vaccine genome is genetically highly stable, accumulating nucleotide substitutions very infrequently 15 , whereas the FNV strain genome is highly unstable under laboratory conditions. This instability of FNV was previously observed after few cell culture passages, with concomitant alterations to mouse neurotropism, appearance of wild-type genotypes, and greater consensus sequence variation than the 17D vaccine 16 . Only one example of FNV has been fully sequenced in reference to the parental wild-type FVV strain, revealing 77 nucleotide changes and 33 amino acid substitutions 17 . Other FNV strains, which were unequally neurotropic in mice and monkeys, were partially sequenced in the study of Wang et al. In comparison, studies of 17D vaccine produced from different substrains (17D-204, 17DD and 17D-213) with different passage histories all show very limited genetic variation 18 . Thus, the co-observation of variable neurotropic phenotypes in mice and monkeys with sequence instability in FNV strains leads to a reasonable hypothesis that incompletely fixed attenuation determinants are present in the population structures of these vaccines. This study investigates these population structures using in silico techniques.

Results
Four examples of the FNV (FNV-IP, FNV-FC, FNV-Yale and FNV-NT), 17D-204 vaccine and wild-type parental strains Asibi and FVV were all subjected to next-generation sequencing (Table 1). All viruses were passaged once in Vero cells prior to sequencing, save 17D-204 which was amplified directly from reconstituted vaccine.
Analysis of Consensus Sequences. Read alignments were analyzed to identify the consensus sequences for each virus. All alignments were of very deep coverage (Fig. 1A), and so the recovery of consensus (Table 2) and variant information is provided at high confidence.
Putative Consensus Genotype of FNV. For FNV-IP, 60 nucleotide differences were observed compared to the wild-type parent FVV, including 27 amino acid substitutions. For FNV-Yale, 63 nucleotide differences were observed, including 28 amino acid substitutions. For FNV-FC, 60 nucleotide differences were observed, including 27 amino acid substitutions. For FNV-NT, 122 nucleotide differences were observed, including 49 amino acid substitutions. Overall, 19 amino acid substitutions are conserved between all four FNV strains when compared to wild-type FVV (Table 2); however, 26 substitutions are conserved in at least three of the four FNV strains (75 percent). Since further data in this study report sequence instability in some of the vaccine strains, these 26 sites were presumed to describe a consensus genotype of FNV relative to FVV and were analyzed as such.
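The selection of sites shared by a minimum number of strains can be illustrated with a minimal sketch. The strain names and substitution labels below are placeholders, not data from this study, and the function name is hypothetical.

```python
from collections import Counter

def shared_substitutions(substitutions_by_strain, min_strains):
    """Return substitutions present in at least `min_strains` strains."""
    counts = Counter()
    for subs in substitutions_by_strain.values():
        counts.update(set(subs))  # count each substitution once per strain
    return sorted(s for s, n in counts.items() if n >= min_strains)

# Placeholder input (assumption): four strains with made-up labels.
toy = {
    "strain_1": {"sub_a", "sub_b", "sub_c"},
    "strain_2": {"sub_a", "sub_b", "sub_c"},
    "strain_3": {"sub_a", "sub_b"},
    "strain_4": {"sub_a", "sub_d"},
}

in_all_four = shared_substitutions(toy, 4)
in_three_or_more = shared_substitutions(toy, 3)
```

With a threshold of 4 the result corresponds to substitutions conserved in every strain; with a threshold of 3 it corresponds to the broader "at least 3 of 4" set used to define a putative genotype.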
Comparison of FVV to Asibi. For comparison of the wild-type FVV to the Asibi strain, 11 nucleotide differences were observed, with 2 amino acid substitutions located at FVV positions 1572 (E-K200T) and 5408 (NS3-V279I),
For 26 Sites Composing the Putative FNV Genotype. NSE was calculated along the 26 nucleotide positions encoding the putative FNV genotype (Table 2), and is shown in Table 3A and Fig. 1E. A single-factor Kruskal-Wallis test showed a significant difference for NSE within the aggregate group (χ² = 67.348, df = 6, p < 0.001). A pairwise Wilcoxon signed-rank test revealed significant differences in NSE between all pairs considered (n
Table 2B and Fig. 1F. A single-factor Kruskal-Wallis test showed a significant difference of RMSD between the strains at these sites (χ² = 131.87, df = 6, p < 0.001).
Diversity Loss for 17D-204 from Asibi at Known Genotype. NSE of Asibi and 17D-204 was compared using the 20 amino acid substitution sites defined by international standards to be encoded by all substrains of 17D-204. NSE of Asibi at these sites (mean = 0.118, median = 0.133, s.d. = 0.0738) was greater than that of 17D-204 (mean = 0.029, median = 0.015, s.d. = 0.046), and this difference was statistically significant by Wilcoxon test (W = 361, p < 0.001) (Fig. 1B). The loss of diversity at these sites is expected, and consistent with previous reports 15 .
Error Rates for Open Reading Frames. Error rates for the alignments were compared using quantile-quantile plots and bootstrapped Kolmogorov-Smirnov (KS) tests; this procedure revealed significant divergence between the distribution of error rate for all pairings of strains, except between pairs FNV-IP/FNV-NT and FNV-Yale/FNV-FC (Fig. 2).
Analysis of Variant Structure. Subpopulation, single-nucleotide variants were recovered in all of the read alignments analyzed in the study, with differences observed between the vaccine strains for raw counts of variants and overall distribution of the frequency of the variants. The results of this test support the observation by consensus sequence that a number of nucleotide positions in the read alignments encode reversions to wild-type identity, as defined by the FVV consensus sequence used here (Fig. 3A,B). For the highly neurotropic strains FNV-Yale and FNV-IP, the number of observed variants was equivalent (n = 91, both strains); for the low-neurotropism strains FNV-NT and FNV-FC, a greater abundance of variants was recovered from the read alignments, and these were at greater frequency than in the high-neurotropism strains (Table 4). Vaccine strains (FNV, 17D) contained fewer variants than their parental strains (FVV, Asibi).
Overt Reversion and Instability. Eight nucleotide positions were observed to be especially unstable between the FNV strains, showing both high diversity and high genetic distance from FVV for at least one of the FNV strains sequenced (Fig. 1E,F). These were, respectively, 1572 (E-T200K), 4607 (NS3-I13V), 6942 (NS4B-L19S), 7178 (NS4B-V97I), 7380 (NS4B-A165V), 7641 (NS5-2RT), 8409 (NS5-I258T), 8640 (NS5-R335K) (Table 3C). For these positions, the presence of low-frequency revertants was most evident in the highly neurotropic strains FNV-Yale and FNV-IP; these reversion events in some cases proceed to consensus levels in the low-neurotropism strains FNV-FC and FNV-NT (Table 2), and so were not detected by the model when the revertant nucleotide was of majority identity (Table 5).

Discussion
Before the molecular era and the viral quasispecies paradigm, population diversity had been hypothesized to influence the properties of YFV vaccines. Some baseline of population diversity has been observed in YFV vaccine strains, for which evidence includes the recovery by plaque purification of not only rare clonal neurotropic variants, but also the detection of low-level antigenic variants of 17D by monoclonal immunoassay 19,20 . In 1935, the presence of a selectable population of virus alleles was proposed to explain the rapid adaptation of FNV to serial intrahepatic passage in rhesus macaques, in which the vaccine strain gained a wild-type viscerotropic phenotype 21 . Efficient phenotype selection was also shown for 17D in a study in which monkey neurotropism was rescued by passage of the 17D vaccine strain in mouse brain, showing increased morbidity after a single passage cycle 22 . These studies and others suggest a paradigm by which adverse variability in YF vaccines may result from poorly controlled empiric adaptation.
We have previously reported a massively parallel (deep sequencing) approach to analyze population structure of the commercial YFV vaccine strain 17D-204 (YF-Vax ® ), showing that the vaccine was of lower diversity than the wild-type parental strain Asibi 15 . From these previous results, it was postulated that low population diversity would contribute to the considerable safety record of 17D vaccine derivatives. The current study confirmed the previous results for Asibi and 17D (YF-Vax ® ), in which the vaccine was observed to be of significantly lower diversity than the parental strain. This pattern is recapitulated for comparisons of FVV and FNV. Furthermore, the pattern is overt for the 26 nucleotide substitutions that form the putative consensus genotype for FNV. While the general pattern of low diversity is repeated here for FNV, there is considerable population diversity that is not shared between the vaccine strains. This supports previous data suggesting that certain collection strains of FNV may not have been stably fixed. Unfortunately, the relationship of the four FNVs in this study to the original "Dakar" strain of FNV is not known. Historical documentation attests that FNV lots were produced from seeds of restricted passage level, controlling for sterility, potency, and immunogenicity in monkeys 6 . Laboratory passage histories are not available for the strains that were handled in this study from any of the source institutions; however, the previously observed differences in mouse and monkey neurovirulence for these strains provide a framework by which the RNA population structure of the viruses may be associated with the adverse phenotype of the vaccine.
A comparison of FNV neurovirulence in mice showed that FNV-Yale produced a shorter average survival time (AST) following both intracerebral and intranasal inoculation when compared to FNV-IP and FNV-NT, while FNV-FC had an extended AST compared to the three other strains 17 Table 2), suggesting that these are the particular amino acids involved in the derivation of FNV from FVV; however, we cannot exclude that other mutations in the original FNV have been lost on passage of the four FNVs studied here. For example, several amino acid substitutions are conserved across three of the vaccines, but not in the most divergent strain, FNV-NT, which, notably, had lost the monkey neurovirulence phenotype (Table 1). By RMSD, the genetic distance from parental consensus identity is bifurcated (Table 3); a number of sites harbor particularly overt reversion to wild-type identity. This finding suggests that these particular mutations in the vaccine are selected to revert under laboratory conditions, and that maintenance of this small set of mutations (n = 8) may influence the neurotropic phenotype of FNV (Table 2).
Notably, the substitutions M-L36F, E-K200T, E-K331R, and NS4B-I95M were observed in common for the attenuation processes from FVV to FNV and from Asibi to 17D (Table 2). Flavivirus M protein contains a pro-apoptotic domain that, upon transfection and overexpression, is attenuated in phenotype when phenylalanine is substituted at residue 36 23 . Since the substitution at E-K200T is shared by Asibi and all FNV strains, it is likely either a cell culture adaptation or compensatory, and not consequential to the attenuated phenotype. Despite being associated with mouse neuroinvasion, the residue did not singularly affect the phenotype of a cloned virus 24 .
Table 4. Descriptive statistics for single-nucleotide variants observed in the read alignments of each virus. Although raw counts of polymorphic sites are irregular, the frequencies of these SNPs are lower for the vaccine strains than for the parental viruses; frequencies of SNPs are higher for FNV strains with low neurotropism (FNV-FC, FNV-NT).
Scientific Reports | (2018) 8:13408 | DOI:10.1038/s41598-018-31085-2
The residue E-331R was associated with both attenuation of Asibi to yield 17D and experimental adaptation of the Asibi strain to hamster liver, so the contribution to vaccine attenuation is complicated by the paucity of animal models and the difficulty of parsing neurotropic from viscerotropic determinants of pathogenicity 25 . The function of NS4B mutations in attenuation of the vaccine is not understood; however, evidence has been put forth to suggest roles for NS4B in both interferon antagonism and replication complex associations 26,27 . Reversion to wild-type is observed for FNV strains not only at consensus, but also for small population variants. For raw variant counts and revertants, the strains varied in their divergence from the original vaccine; e.g. FNV-IP and FNV-Yale alignments revealed relatively low counts of both variants and revertants. Conversely, counts of variants and revertants were higher in FNV-FC and FNV-NT, which are hypothesized to be the most phenotypically divergent from the original FNV strain. Genome-scale estimations of diversity are similar between all FNV strains and 17D-204; as expected, diversity of FVV was greater than for the vaccines. Processing and alignment methods used in these analyses were more stringent than previously used for the Asibi strain, reducing observable differences in estimated diversity between Asibi and 17D-204 15 . However, density comparison shows both that 17D-204 variants are fixed to lower aggregate frequencies than those of Asibi, FVV, or the FNV vaccines, and that the wild-type strains contain a population of higher frequencies for coding variants (Fig. 3A,B). This pattern may contribute to the superior safety record of 17D vaccine compared to FNV, and supports a hypothesis that diversity influencing the vaccine phenotype arises from a limited number of sites, and that some level of low-frequency variants is tolerated in fixed vaccine preparations.
An alignment of reference sequences for Asibi [gbAY640589.1] and FVV [gbYFU21056.1] shows 21 nucleotide differences, of which 8 are coding changes (not shown). The present study indicates lesser divergence of the parental strains than was previously described by Sanger methods undertaken 20 years ago, revealing instead only 11 consensus nucleotide differences and two amino acid substitutions 17 . FVV and Asibi were both isolated in 1927 at considerable geographic separation (Senegal and Ghana, respectively); however, the limited divergence observed in this study supports a conclusion that both parental strains originated from the same YF epidemic.

Viruses. Wild-type French viscerotropic (FVV) and Asibi strains were obtained from the World Reference Center for Emerging Viruses and Arboviruses (Galveston, TX, USA) as lyophilized cell culture supernatant. "FNV-IP" was obtained from the Institut Pasteur (Paris, France). "FNV-Yale" was obtained from the Yale Arbovirus Collection (New Haven, CT, USA). "FNV-FC" was obtained from the Centers for Disease Control and Prevention (Fort Collins, CO). "FNV-NT" was obtained from what is now called Public Health England, Porton Down (Salisbury, UK) 12 . Vaccine strain 17D-204 was obtained from a commercial ampoule of YF-Vax ® lot UH356AA (Sanofi-Pasteur, Swiftwater, PA) (Table 1). Asibi and 17D-204 were included in the study as method controls; they were previously analyzed for their diversity profiles and provide an expected baseline of the comparative relationship of vaccine to wild-type parental strain 15 . The passage history of all four FNVs from originating FNV vaccine is unknown; their in vivo phenotypic properties have been described in mouse and monkey and are cited in Table 1 17 . All vaccine strains were passaged once only in Vero cells in this study to produce working stocks using Eagle's minimal essential media, supplemented with 2% fetal bovine serum, L-glutamine, and non-essential amino acids (5% CO 2 , 37 °C). Wild-type strains FVV and Asibi were handled at BSL-3 containment; vaccine strains were handled at BSL-2.

RT-PCR Amplification Strategy. Wild-type FVV and Asibi strains were reconstituted in sterile, molecular-grade water. Vaccine strain 17D-204 was reconstituted using the provided injection diluent. RNA was isolated from reconstituted seed stocks of FVV and all working stocks of FNV by column isolation using the QIAamp kit (Qiagen, Gaithersburg, MD), and amplified by RT-PCR to produce six overlapping amplicons as previously described 15 .
Comparison of Diversity and Genetic Distances. Read alignments were analyzed for the population diversity of viral RNAs, which is quantitated by index measurements that derive from the relative frequencies of nucleotide identities that align to a position in the reference sequence. For each virus sequenced, nucleotide counts for all strain alignments were parsed from BAM files using the R library deepSNV v.1.8.0, and relative frequencies were computed for the possible nucleotide set {A, C, G, U, −}, which includes a gap character that may be introduced by the alignment algorithm. The error rate ER of the alignment X at nucleotide position i is defined as the sum of the nucleotide frequencies that remain after subtracting the maximum frequency in the set, which is the frequency of the presumed consensus nucleotide.
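As an illustration of the error rate definition, the following minimal Python sketch computes relative frequencies and ER at one position, i.e. ER = 1 minus the largest relative frequency. The analysis described here was performed in R; the function names and the example counts below are illustrative assumptions and are not taken from the study.

```python
# Sketch only: symbol set follows the text; counts are made-up examples.
SYMBOLS = ("A", "C", "G", "U", "-")

def relative_frequencies(counts):
    """Convert raw symbol counts at one position into relative frequencies."""
    total = sum(counts.get(s, 0) for s in SYMBOLS)
    if total == 0:
        raise ValueError("position has no coverage")
    return {s: counts.get(s, 0) / total for s in SYMBOLS}

def error_rate(counts):
    """ER: summed frequency of everything except the most common symbol."""
    freqs = relative_frequencies(counts)
    return 1.0 - max(freqs.values())

# Hypothetical position with 10,000 aligned reads (assumption).
example_counts = {"A": 9900, "C": 40, "G": 50, "U": 10, "-": 0}
er = error_rate(example_counts)
```

A position at which every read agrees yields ER = 0, and ER grows as non-consensus symbols accumulate.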
In the same manner, the diversity of nucleotides at each position i was estimated from the relative frequencies using Shannon's entropy, normalized to 1.61, the maximum value of the index (normalized Shannon's entropy, NSE). The probability P refers to the frequency of each nucleotide in the set p; each frequency is multiplied by its logarithm, and the products are summed.
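A minimal sketch of the normalized entropy calculation follows, assuming the natural logarithm and a five-symbol set, so that the normalizing constant is ln 5 ≈ 1.61 as stated in the text. The function name is hypothetical and the inputs are illustrative.

```python
import math

N_SYMBOLS = 5  # A, C, G, U and the gap character

def normalized_shannon_entropy(freqs):
    """Shannon entropy of one position divided by its maximum, ln(5)."""
    if len(freqs) != N_SYMBOLS:
        raise ValueError("expected one frequency per symbol")
    # Terms with zero frequency contribute nothing (limit of p*ln(p) is 0).
    entropy = -sum(p * math.log(p) for p in freqs if p > 0)
    return entropy / math.log(N_SYMBOLS)

fixed_site = normalized_shannon_entropy([1.0, 0.0, 0.0, 0.0, 0.0])
uniform_site = normalized_shannon_entropy([0.2, 0.2, 0.2, 0.2, 0.2])
```

Under these assumptions a fully fixed position scores 0 and a position with all five symbols at equal frequency scores 1.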
Additionally, genetic distance from wild-type was estimated by root-mean-square distance (RMSD) at each nucleotide position in two read alignments X (wild-type) and Y (vaccine). The difference in nucleotide frequency between alignments X and Y is obtained, squared, summed, and normalized. RMSD values for a nucleotide position may range from 0 (no change) to 0.632 (complete selection of an alternate consensus nucleotide). The simultaneous use of both diversity and distance measurements is necessitated by the possibility that a nucleotide position may be of both low diversity and high genetic distance from the reference alignment; this would be a case of complete selection of an alternate consensus nucleotide. Changes in diversity and relative genetic distance measurements were estimated along scales of (1) the single open reading frame of the virus and (2) sets of nucleotide positions which define the consensus genotype differences between wild-type and vaccine. For this study, the "FNV" nucleotide set consists of the 26 positions encoding amino acid substitutions in at least 3 of 4 FNV strains, relative to the parental strain FVV. The "17D" nucleotide set consists of the 20 positions encoding amino acid substitutions in 17D strain vaccines, relative to the parental Asibi strain 14 .
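The per-position distance can be sketched as follows. The normalization by the number of symbols (five) is an assumption, chosen because it reproduces the stated maximum of 0.632, which equals the square root of 2/5; the function name is hypothetical.

```python
import math

def rmsd(freqs_x, freqs_y):
    """Root-mean-square distance between two frequency vectors at one site.

    Assumption: the sum of squared differences is divided by the number
    of symbols (5) before taking the square root.
    """
    if len(freqs_x) != len(freqs_y):
        raise ValueError("frequency vectors must have equal length")
    squared = sum((x - y) ** 2 for x, y in zip(freqs_x, freqs_y))
    return math.sqrt(squared / len(freqs_x))

wild_type = [1.0, 0.0, 0.0, 0.0, 0.0]   # consensus A, fully fixed
vaccine = [0.0, 1.0, 0.0, 0.0, 0.0]     # consensus C, fully fixed
max_distance = rmsd(wild_type, vaccine)
no_change = rmsd(wild_type, wild_type)
```

The example of two fully fixed but different consensus nucleotides is the case of low diversity with high genetic distance: NSE is 0 in both alignments while RMSD is at its maximum.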
Analysis of Variant Structure. For each alignment, variants were modeled using V-Phaser v.2.0, with default settings and a significance cutoff of 0.05 30 . Variants were classified against the consensus codon as silent or coding using SNPdat.pl 31 , and then were inspected for reversion to wild-type identity. Identified variant sets were inspected for the presence of coding polymorphisms that arose from nucleotide positions with large shifts in diversity between the strains, under a hypothesis that these would be most likely to influence previously identified differences of in vivo neurotropism.
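The inspection for reversion can be sketched as a simple comparison of each variant nucleotide with the vaccine and wild-type consensus nucleotides at the same position. This is an illustrative assumption about the logic and not the implementation of V-Phaser or SNPdat; the function name and labels are hypothetical.

```python
def classify_variant(variant_nt, vaccine_consensus_nt, wildtype_consensus_nt):
    """Label a subpopulation variant relative to the two consensus sequences."""
    if variant_nt == vaccine_consensus_nt:
        # Not a variant of this alignment: it is the majority nucleotide.
        return "consensus"
    if variant_nt == wildtype_consensus_nt:
        # Minority nucleotide that matches the wild-type parent.
        return "revertant"
    return "novel"

# Hypothetical site: vaccine consensus G, wild-type consensus A.
labels = [classify_variant(nt, "G", "A") for nt in ("A", "G", "U")]
```

Because a variant caller reports only minority nucleotides, a reversion that has reached majority identity in the vaccine alignment falls into the "consensus" branch and must be found by comparing consensus sequences instead.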
Statistics. Error rate distributions were compared for each pair of alignments using quantile-quantile plots, followed by bootstrapped Kolmogorov-Smirnov tests. NSE and RMSD were compared nonparametrically across the scale noted using single-factor Kruskal-Wallis followed by pairwise Wilcoxon signed-rank tests, using an alpha of 0.05 and Bonferroni correction. Variants were modeled from paired-end Illumina read alignments using the method of Yang 30 , using default false discovery rate settings. All statistical tests were performed in base R v.3.4.4, with graphs generated using ggplot2.
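For illustration, a bootstrapped two-sample Kolmogorov-Smirnov comparison can be sketched in plain Python. The resampling scheme shown (drawing both samples with replacement from the pooled values) is an assumption about how such a test may be set up, not a description of the R routine used in this study; the function names, number of resamples, and seed are hypothetical.

```python
import random
from bisect import bisect_right

def ks_statistic(a, b):
    """Largest absolute difference between the two empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for v in sa + sb:  # the supremum is reached at an observed value
        fa = bisect_right(sa, v) / len(sa)
        fb = bisect_right(sb, v) / len(sb)
        d = max(d, abs(fa - fb))
    return d

def bootstrap_ks_pvalue(a, b, n_boot=1000, seed=0):
    """Share of pooled resamples whose KS statistic reaches the observed one."""
    rng = random.Random(seed)
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_boot):
        resampled_a = [rng.choice(pooled) for _ in a]
        resampled_b = [rng.choice(pooled) for _ in b]
        if ks_statistic(resampled_a, resampled_b) >= observed:
            hits += 1
    return observed, hits / n_boot
```

A bootstrap of this kind is commonly preferred over the analytic KS p-value when the data contain ties, which is expected for per-position error rates.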