Comparative analysis of pentavalent rotavirus vaccine strains and G8 rotaviruses identified during vaccine trial in Africa

RotaTeq™ is a pentavalent rotavirus vaccine based on a bovine rotavirus genetic backbone reassorted in vitro with human outer capsid genes. During clinical trials of RotaTeq™ in Sub-Saharan Africa, the vaccine efficacy over a 2-year follow-up was lower against the genotypes contained in the vaccine than against the heterotypic G8P[6] and G8P[1] rotavirus strains, of which the former is highly prevalent in Africa. Analyses of 43 complete rotavirus genomes collected during phase III clinical trials of RotaTeq™ in Sub-Saharan Africa were conducted to gain insight into the high level of cross-protection afforded by RotaTeq™ against these G8 strains. Phylogenetic analysis revealed the presence of a high number of bovine rotavirus gene segments in these human G8 strains. In addition, we performed an in-depth analysis at the individual amino acid level, which showed that G8 rotaviruses were more similar to the RotaTeq™ vaccine than non-G8 strains were. Because RotaTeq™ possesses a bovine genetic backbone, the high vaccine efficacy against G8 strains might be partially explained by the fact that all these strains contain a complete or partial bovine-like backbone. Altogether, this study supports the hypothesis that gene segments other than VP7 and VP4 play a role in vaccine-induced immunity.

For VP4, a clear West-East African division exists within the P[6] RVA strains; however, the strain from Kenya still shared 95.6-95.9% nt identity with the P[6] strains from Ghana and Mali. In this study, all P[6] strains were more closely related to human P[6] strains than to porcine P[6] strains, represented by strain RVA/Pig-tc/USA/Gottfried/1983/G4P[6] in Fig. 2. Strain Ghan-059 clusters within the P[1] genotype, and was most closely related (97.5% on the nt level) to strain RVA/Human-tc/NIG/HMG035/1999/G8P[1], a human strain believed to be of animal origin, isolated from a predominantly rural livestock-producing area in Nigeria 23. The non-G8 strains were also heterotypic to the genotypes present in RV5, clustering within the P[6] and P[4] genotypes.
Phylogenetic analyses of the genes forming the genetic backbone of the West-African G8P[6] RVAs. The 9 RVA gene segments encoding VP1-3, VP6 and NSP1-5/6 of the six characterized West-African G8P[6] strains can be divided into 3 'tree-groups', based on their clustering in the phylogenetic trees. The first 'tree-group' contains the VP1, VP6, VP3 and NSP2 segments of the West-African strains. Figure 3 shows that strains Ghan-113, Ghan-149, Mali-039, Mali-048, Mali-119 and Mali-135 are only distantly related to typical human DS-1-like RVA strains, which are indicated by red bars in the phylogenetic dendrograms. Instead, these genes fall within clusters containing bovine or bovine-like human RVAs, indicated by blue bars in the phylogenetic trees. For example, for VP6, the strains from Mali are closely related to RVAs isolated from an antelope and a cow, to the South-African strain RVA/Human-wt/ZAF/2371WC/2008/G9P[8], which has several gene segments with close similarity to artiodactyl RVAs 34, and to RVA/Pig-wt/IND/HP140/1987/G6P[13], a porcine strain with a VP6 gene of bovine origin 35. The VP1 genes of the West-African RVA strains formed 3 distinct subclusters. The first and second subclusters contained the Ghanaian and Malian G8P[6] strains, which were most closely related to RVA/Human-wt/GHA/GH018-08/2008/G8P[6], described as a human-bovine reassortant virus 31. The third subcluster contained the G8P[1] strain, clustering most closely with strain RVA/Human-wt/BEL/B1711/2002/G6P[6], an atypical human strain believed to be a bovine/human reassortant that infected a Belgian child during a trip to Mali 36. For VP3, the strains from Mali are most closely related to strain RVA/Human-wt/HUN/BP1062/2004/G8P[14], another atypical human strain that is the result of an interspecies transmission event 37.
During 2008, at least three different lineages of the M2 genotype circulated in Ghana, as indicated by three distinct Ghanaian clusters within the VP3 tree, showing only 83.0-88.4% nt identity. One cluster contained only non-G8 RVAs, the second contained strain Ghan-059 and the third contained both strains Ghan-149 and Ghan-113. The latter two strains are very closely related to the VP3 sequences of GH018-08 and RVA/Human-wt/GHA/GH019-08/2008/G8P[6], showing 99.9% nt similarity 31. The tree of NSP2 also showed three distinct Ghanaian clusters. Two of them, which contain the Ghanaian G8 RVAs (Ghan-059 vs. Ghan-149 and Ghan-113), show 87.5% similarity on the nt level and are both subclusters of the N2 genotype not belonging to the typical human N2 cluster (indicated in red in Fig. 3). The four Malian strains are identical to 2371WC (100% nt similarity) and cluster together with unusual human P[14] RVAs, indicating a shared ancestry with animal RVAs.
The third 'tree-group' includes the remaining genes, VP2, NSP4 and NSP5. For these genes, the most likely host origin of the strains is either animal or human depending on their country of origin. Figure 5 shows that the G8P[6] strains from Ghana are more closely related to RVAs of animal origin (blue bar) than to typical human RVAs (red bar). The VP2 genes of the Ghanaian strains are divided into two clusters sharing only 85.9% nt similarity with each other and showing high similarity to the VP2 sequences of GH018-08 and GH019-08. For NSP5, the Ghanaian strains belong to genotype H3, clustering together with RVAs isolated from animals such as cow, goat and sheep. For NSP4, the three Ghanaian strains belonged to genotype E2. Both G8P[6] strains are identical and show 90.9% nt similarity to the G8P[1] strain Ghan-059. For all three genes (VP2, NSP4 and NSP5), the strains from Mali and the non-G8 strains from Ghana formed a monophyletic cluster within the C2, E2 and H2 genotypes, clustering together with typical human RVA strains isolated in different parts of the world. Ghan-059, as well as the East-African G8P[6] strain Keny-078, often showed a different evolutionary pattern than the other West-African G8P[6] strains. The G8P[1] RVA strain Ghan-059 appears to possess 11 gene segments which are more closely related to bovine-like RVA strains than to typical human DS-1-like strains, including the A11 (NSP1) genotype and the T6 (NSP3) genotype, two genotypes typically found in RVAs isolated from animals belonging to the order Artiodactyla (cows, antelope, sheep and goat), suggesting a complete animal origin of this strain. The only G8 strain characterized from Kenya possesses the same genotype constellation as the G8P[6] strains from Mali (G8-P[6]-I2-R2-C2-M2-A2-N2-T2-E2-H2). However, the majority of its genes were more closely related to gene segments of typical human DS-1-like RVAs.
The only exception was NSP4, showing a high similarity (99.6%) to G8P[6] strains isolated from Iraq, which were suggested to have NSP4 genes of animal origin, and to the bovine RVA strain RVA/Cow-wt/DNK/DK12011/2007/G6P[5] (97.5%) 38,39.
Genetic distances to RotaTeq™. The results of the phylogenetic analyses are summarized in Fig. 1, showing the most likely host origin of each segment (genes of animal origin in blue and of human origin in red) and revealing that at least 4 distinct variants of G8 RVA strains co-circulated among infants in sub-Saharan Africa during 2008. The fact that all 8 G8 strains possessed a partial or complete bovine-like genetic backbone led to the hypothesis that the observed high VE of RV5 against G8 RVAs might be explained by the fact that RV5 also possesses a bovine RVA genetic backbone. To investigate this hypothesis in more detail, we compared the genetic distances (on the amino acid level) between RV5 and G8 RVA strains on the one hand, and between RV5 and human non-G8 DS-1-like strains circulating in the same area during the same period on the other hand. Therefore, 35 human DS-1-like strains, representing the identified diversity of DS-1-like strains during the study period, were selected and completely sequenced. In total, 17 G2P[6], 11 G2P[4], 6 G3P[6] and 1 G1P[6] strains with a complete DS-1-like background were completely sequenced (Table S1). Figure 6 plots all the genetic distances between vaccine and non-vaccine strains. RV5 strain RVA/Vaccine/USA/RotaTeq-WI78-8/1992/G3P7[5] represents the X-axis of the plot, except for the genotypes not present in strain WI78-8, which are represented by strains WI79-9 (for G1 and M1), SC2-9 (for G2), BrB-9 (for G4) or WI79-4 (for P[8]). The genetic distances between G8 strains and RV5 are indicated by red symbols, while those between the non-G8 DS-1-like strains and RV5 are visualized by blue symbols.
A large fraction of the symbols are plotted in the 15-45% aa difference range, as could be expected for the segments which belong to genotypes heterotypic compared to those of RV5 strains. More specifically, the VP7 (G8 instead of G1-G4 or G6), VP4 (P[6] or P[1] instead of P[8] or P[5]), VP3 (M2 instead of M1), NSP1 (A2 or A11 instead of A3), NSP3 (T2 instead of T6) and NSP5 (H2 instead of H3) genes possess heterotypic genotypes. Some of these G8 RVA genes show smaller genetic distances to RV5 than the corresponding genes of DS-1-like RVAs (Fig. 6 and Table S2). Firstly, the VP7 genotype G8 (together with the G1 and G3 genotypes) is more closely related to the G6 genotype present in the vaccine than the G2 genotype often associated with DS-1-like human strains (yellow color in Table S2, column VP7-G6). Secondly, the NSP1 genotype A11 (present in strain Ghan-059) showed a smaller genetic distance to the A3 genotype present in RV5 than the A2 genotype did (74.5% aa similarity to A3 for A11 versus 58.5-59.1% for A2). Thirdly, Ghan-059 contained a T6 NSP3 gene, which shared 97.1% aa similarity with the T6 present in the vaccine. Finally, all 3 Ghanaian strains possessed the same NSP5 genotype as RV5, H3, instead of H2 present in the Malian strains and the other DS-1-like RVAs. Focusing on the genotypes of the G8 strains that were homotypic with the genotypes of the RV5 vaccine, minor differences between RV5/G8 RVAs and RV5/non-G8 DS-1-like RVAs were found. For example, although all VP6 genes possessed the I2 genotype and show small genetic distances to RV5, the VP6 protein of the G8 strains shows only 0.3-0.5% aa differences to RV5 while the selected DS-1-like RVAs show 1.0-2.0% aa differences. The same observation can be made for VP1, showing aa differences to RV5 of 2.1-2.5% and 2.9-3.5% for the G8 and non-G8 RVA strains, respectively.
For VP2, VP3-M2 and NSP4, differences in genetic distances to RV5 exist between the Ghanaian and Malian G8 RVAs (as indicated by the difference in red dots and rectangles in Fig. 6), pointing to a closer relationship between the VP2, VP3 and NSP4 of RV5 and those of G8 RVAs isolated in Ghana than those of G8 RVAs isolated in Mali.
Two interesting observations could be made regarding the non-G8 RVAs. First, the NSP2 of the non-G8 strains from Mali was more closely related to the NSP2 of RV5 than that of all other RVAs. Second, the NSP4 of eight Ghanaian non-G8 strains clustered together with the NSP4 of the G8 strains, which were suggested to be of animal origin.

In-depth analysis on the individual amino acid level. To evaluate whether certain individual amino acid positions could contribute to the increased VE of RV5 against G8 strains, we plotted all 5771 amino acids (aa) for which data were available on a scale from -1 to 1, with a score of 1 corresponding to an aa of the G8 strains which was identical to one of those present in RV5, but was not present in the non-G8 DS-1-like RVAs (Fig. 7). In 86.33% of the aa positions the similarity score was 0 (these positions were omitted in Fig. 7), indicating no difference between the G8 and non-G8 strains versus RV5 strains. Of the investigated sites, 7.82% showed a positive score compared to 5.85% with a negative score, indicating that more aa of the G8 RVAs than of the non-G8 RVAs were similar to RV5. Of particular interest were the aa with a score ranging between 0.5 and 1, as these positions in the G8 strains were much more similar to the RV5 strains, compared to the non-G8 RVAs. In total we identified 49 amino acids with a score ranging between 0.57 and 1. Of these, 8 aa were located in VP7, 4 in VP6, 10 in VP1, 21 within VP3, 3 in NSP2 and 3 in NSP5 (Table S3).
For VP7, none of the identified residues were located within the two structurally defined antigenic epitopes (epitopes 7-1 and 7-2) 2. However, aa 35 was previously described to be part of a cytotoxic T cell epitope. Franco and colleagues identified aa 31-40 as an immunodominant region in the response against VP7, containing a Kb allele-specific motif, XXXX(Y,F)XX(I,L,M,V,T) (where the one-letter code for aa is used, X represents any aa and parentheses are used to indicate the anchor positions) 40. This octapeptide requires a tyrosine (Y) or phenylalanine (F) at position 5, which corresponds to aa 35 of VP7 and serves as a potential major histocompatibility complex class I anchor residue 41. However, all analysed RVAs showed either a Y or F at aa 35 of VP7, keeping the Kb motif intact and not affecting its potential role in T-cell mediated immunity. In addition to VP7, VP6, VP1 and VP3 were also described to contain multiple CTL epitopes, indicating their role in T-cell mediated immunity 42-44. The non-structural proteins NSP2 and NSP5 are known to play a role in the formation of viroplasms, which are the sites of genome replication and viral particle packaging 45. Donker and colleagues identified an NSP2 monoclonal antibody binding epitope using phage display, modelling the antibody binding region on the NSP2 protein with a motif spanning aa 244 to aa 252 46. However, this region does not include any of the aa identified in this study. More specifically, none of the aa residues identified in this study have been described thus far to play a role in immunity against RVA.
Figure 7. Comparison of the similarity of the G8 strains to RV5 and the non-G8 strains to RV5 per amino acid position. Each amino acid position with a score different from zero was indicated with a dot, color-coded ranging from red (1) to blue (-1). Positive similarity scores represent positions for which G8 strains were more similar to RV5 than non-G8 strains. Negative scores indicate amino acid positions where non-G8 strains are more similar to RV5 strains than G8 strains.
Scientific Reports | 5:14658 | DOI: 10.1038/srep14658

Discussion
Despite extensive research, the immunologic mechanisms and effectors responsible for protection against rotavirus after either natural infection or vaccination are still incompletely understood 47,48. The recognition that multiple human rotavirus genotypes can co-circulate has long raised the critical question whether protective immunity is mainly homotypic (same G- or P-type) or rather heterotypic (different G- or P-type). The finding that heterotypic protection against severe rotavirus gastroenteritis caused by G8P[6] and G8P[1] rotavirus strains was high (87.5%, 95% CI 6.5-99.7) and statistically significant over the 2-year follow-up period of the African clinical trial indicates that high VE of RV5 in sub-Saharan Africa is possible against heterotypic genotypes 14.
This study aimed to investigate the hypothesis that the observed high level of protection against G8 strains by RV5 could potentially be explained by the genetic backbone of the G8 strains used to estimate the VE during the clinical trial of RV5 conducted in Africa. In general, the G8 strains that drove the G8 serotype-specific VE were samples isolated in Ghana and Mali, not Kenya. Despite the high prevalence of G8 RVAs in Kenya (23%), the majority of these G8 samples were not classified as cases since they did not meet the criterion of having a Vesikari severity score of ≥ 11. The G8 serotype-specific analysis performed by Tapia et al. was based on one vaccine case and eight placebo cases (cases had to show severe rotavirus gastroenteritis, regardless of serotype, that occurred 14 days post-dose 3), warranting caution in drawing general conclusions. More specifically, in Ghana, four G8 cases (one of which was associated with P[1]) were found among placebo recipients, three of which were completely sequenced in this study (Table S4). In Mali, four G8 cases (one in a vaccine recipient and three among placebo recipients) were detected; all four were included in this study (Table S4). The remaining G8 case used for the G8 serotype-specific analysis was reported in a placebo subject from Kenya. Unfortunately, this case did not meet the selection criteria per the study protocol that would have made it eligible for further investigation. However, we did have the opportunity to analyse another G8 strain from Kenya which was also isolated in 2008, Keny-078. This case (Vesikari score 13) was not included by Tapia et al. to estimate the G8 serotype-specific VE because the G-genotype of the strain could not be determined during the initial study, but was confirmed to be G8 in a later analysis 14.
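As a rough arithmetic check on how so few cases yield the reported point estimate, the sketch below computes vaccine efficacy as one minus the ratio of case rates. It assumes equal person-time of follow-up in the vaccine and placebo arms, which is an assumption made here for illustration and not a figure taken from the trial; the function name `vaccine_efficacy` and its `person_time_ratio` parameter are hypothetical. It does not reproduce the confidence interval.

```python
def vaccine_efficacy(cases_vaccine, cases_placebo, person_time_ratio=1.0):
    """Point estimate of vaccine efficacy in percent.

    person_time_ratio is follow-up time in the vaccine arm divided by
    follow-up time in the placebo arm. The default of 1.0 (equal
    follow-up in both arms) is an assumption for this illustration.
    """
    if cases_placebo <= 0 or person_time_ratio <= 0:
        raise ValueError("placebo cases and person-time ratio must be positive")
    rate_ratio = (cases_vaccine / person_time_ratio) / cases_placebo
    return (1.0 - rate_ratio) * 100.0


# One case in a vaccine recipient and eight cases in placebo recipients,
# as stated for the G8 serotype-specific analysis.
print(vaccine_efficacy(1, 8))
```

Under the equal-follow-up assumption, one vaccine case against eight placebo cases gives 1 - 1/8, which matches the 87.5% point estimate and also shows why the interval around it is so wide.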
RVA strains with at least four distinct G8 genotype constellations were identified in Ghana, Mali and Kenya with different levels of relatedness to bovine-like RV strains. Surprisingly, seven out of eight completely characterized G8 RVA strains possessed RVA genotype constellations unusual for human DS-1-like strains. More specifically, the genotype constellations of the West-African strains were shown to be largely (5 or 8 out of 11 segments) or fully of bovine or bovine-like origin. These findings suggest multiple independent interspecies transmission events followed by several reassortment events for the G8P[6] strains. However, the limited number of animal RVA sequences, especially compared to the number of human RVA sequences in GenBank, makes it difficult to determine whether the G8 strains characterized in this study are the result of a recent or a more historical crossing of the species boundary. More efforts are needed to sequence animal RVA strains, especially in low-income country settings. Although no reassortment between the vaccine strains and circulating human RVA strains was found in this study, previous studies reported that reassortment between different RV5 strains and/or human RVAs is possible and can cause gastroenteritis 49,50.
The fact that the G8P[1] strain possessed 11 gene segments of animal origin, including the P[1], T6 and A11 genotypes (typical for RVAs isolated from members of the order Artiodactyla), strengthens the hypothesis that this strain was the result of a direct interspecies transmission event, able to cause gastroenteritis in a human child. However, the fact that this genotype was detected in only one case suggests that the G8P[1] strain was not adapted to spread among humans, resulting in a dead-end infection. In contrast, the close genetic relationship between some of the West-African G8P[6] RVA strains suggests that these bovine-human reassortant RVA strains are able to spread efficiently from one human to another. This highlights the possibility that human and animal rotavirus strains reassort, resulting in progeny viruses capable of spreading in humans, and underlines the need for continuous surveillance of rotavirus strain diversity.
The finding that the genetic backbone of the African G8 RVA strains was (partially) bovine-like led to the hypothesis that the significant protection conferred by RV5 against the heterotypic G8P[6] and G8P[1] human RVA strains could potentially be partly explained by the bovine RVA genetic backbone present in RV5. Therefore, we compared genetic distances between RV5 and G8 or non-G8 RVA strains. Minor differences between the distances of G8 and non-G8 strains were observed, often showing smaller distances between G8 RVAs and RV5 than between non-G8 RVAs and RV5. The VP6, VP1 and the Ghanaian VP2, VP3 and NSP4 segments are of particular interest since these genes are more closely related to RV5 than the corresponding genes of typical human DS-1-like strains. VP6 and NSP4 in particular have previously been suggested to play a role in vaccine-induced immunity 51-55. The observation that several G8 strains contained the H3 NSP5 genotype could also be part of the explanation for the observed high VE against G8 strains.
In-depth analysis on the individual amino acid level identified several amino acids in VP7, VP6, VP1, VP3, NSP2 and NSP5 where the G8 strains showed higher similarity to RV5 than the analysed non-G8 RVAs. Currently, none of these sites have been described to be part of rotaviral epitopes, although further studies are needed to determine if any of the identified aa belong to additional epitopes potentially present within the VP7, VP6, VP1, VP3, NSP2 or NSP5 proteins. Although several of the observed differences are based on small numbers of samples, which does not permit robust conclusions, our study may provide additional information on the possible role of other rotavirus proteins or specific amino acid residues that might play an important role in the induction of protection after natural infection or vaccination. Overall, the results of this study contribute to the understanding of why the point estimate of the vaccine efficacy against severe rotavirus gastroenteritis caused by G8 RV strains (associated with P[6] or P[1]) was higher (87.5%, 95% CI 6.5-99.7) than the efficacy against each of the individual genotypes contained in the vaccine and detected during the study: G1 (in association with any P-type), 32.3% (95% CI <0-55.4); G2 (in association with any P-type), 27.1% (95% CI <0-52.2); and G3 (in association with any P-type), 62.3% (95% CI <0-93.6). The demonstrated heterotypic protection lends further support to the hypothesis that RVA proteins other than VP7 and VP4 play a significant role in vaccine-induced immunity.

Methods
Study design. The samples characterized in this study were collected during a randomized, placebo-controlled phase III trial (registered with ClinicalTrials.gov, number NCT00362648 on August 8, 2006) conducted between 28 April 2007 and 31 March 2009 in three sites in Africa. More details about the study design of this clinical trial were reported previously 8,14. Stool samples were collected with each diarrheal episode, if possible, and screened for the presence of RVA antigen by an enzyme immunoassay (EIA). Positive samples were G- and P-genotyped at the Merck Research Laboratories using short amplicon sequencing for VP7 and multiplex RT-PCR for VP4 56. For this study, samples were selected by Merck/PATH based on the following criteria: (i) samples had to contribute to the per-protocol efficacy analysis, (ii) availability of sufficient stool sample (approximately 500-1500 μL of 20% raw stool suspensions), (iii) exclusion of weakly EIA-positive samples, and (iv) permission from central and local institutional review boards and local authorities to carry out the analyses. Samples were shipped to the Rega Institute for Medical Research in Leuven, Belgium for complete genome analysis.
Sequencing and data analysis. Double stranded RNA was extracted using the QIAamp Viral RNA mini-kit (Qiagen/Westburg, Leusden, The Netherlands) according to the manufacturer's instructions. Subsequently, reverse transcription-polymerase chain reaction (RT-PCR) was carried out on denatured RNA extracts (95 °C, 2 min followed by cooling on ice) using the Qiagen One Step RT-PCR kit (Qiagen/Westburg) with an initial RT step at 50 °C for 30 min; PCR activation was at 95 °C for 15 min, followed by 35 cycles of amplification (30 s at 94 °C, 30 s at 50 °C, and 1.5 or 6 min at 72 °C for the six shortest RVA segments and the five longest RVA segments, respectively), with a final extension of 10 min at 72 °C using the Biometra T3000 thermocycler (Biometra, Westburg BV, Netherlands). Primers used to amplify all gene segments are shown in Table S5. For each sample, all 11 amplicons were pooled and sequenced on the 454 Roche GS-FLX sequencing platform (Penzberg, Germany). The sequence reads obtained from the 454 Roche GS-FLX were mapped against VP7 and VP4 reference sequences matching the previously determined G- and P-genotype of the sample, complemented with a Wa-like or DS-1-like background, in Mira 3.4.0 or using the CLC Genomics Workbench 7.0. In cases where insufficient reads were mapped, a de novo assembly was carried out. Sites with insufficient sequence read coverage after combining the reference mapping and the de novo assembly results were resequenced using the traditional Sanger sequencing method. The obtained consensus sequences were submitted to GenBank (accession numbers: Table S1), aligned with a reference set of RVA genomes and manually edited for insertions and deletions in homopolymer regions. Using the rotavirus classification tool, RotaC (http://rotac.regatools.be), genotypes were assigned to each of the 11 gene segments 57. In addition, phylogenetic analyses were performed to investigate the most likely host origin (human or animal) of the different gene segments.
More specifically, phylogenetic trees were constructed using the maximum likelihood method with the general time reversible model in MEGA 6.0 58. P-distances on the amino acid (aa) level were calculated and plotted in a linear fashion.
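To make the p-distance concrete, the following minimal Python sketch computes the proportion of differing sites between two aligned amino acid sequences. It is an illustration only and not the MEGA implementation: the function name `p_distance`, the gap symbol and the choice to skip gapped positions pair by pair are assumptions made here, and the toy sequences are invented.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of compared sites at which two aligned sequences differ.

    Both sequences must come from the same alignment (equal length).
    Positions where either sequence has a gap are skipped; this
    pairwise-deletion choice is an assumption of this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # gapped site: not counted in either total
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


# Invented toy peptides: one substitution in five sites.
print(p_distance("MKTAY", "MKSAY"))
```

Multiplying the result by 100 gives the percentage aa difference of the kind quoted in the Results (for example 0.3-0.5% for VP6).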
To evaluate the similarity between G8 strains and RV5 strains on the one hand, and between non-G8 strains and RV5 strains on the other hand, a scoring system was developed. For each amino acid position, the proportion of G8 strains and the proportion of non-G8 strains carrying an amino acid identical to RV5 were determined, and the difference between these proportions was used to calculate a score ranging from -1 to 1. An amino acid position with a score of -1 means that all the non-G8 strains were identical to RV5, while none of the G8 strains shared the same amino acid with RV5. A score of 0 means that an equal proportion of G8 strains and non-G8 strains were identical to RV5, whereas a score of 1 was defined as all G8 strains being identical to RV5, while none of the non-G8 strains were identical to RV5.
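The scoring system described above can be sketched in a few lines of Python. This is a hedged illustration of the described calculation, not the authors' code: the function name `similarity_scores` is hypothetical, the sequences are assumed to be pre-aligned, gap-free and of equal length, and a residue is counted as identical to RV5 when it matches the residue of at least one RV5 strain at that position (following the wording "identical to one of those present in RV5"). The toy sequences are invented.

```python
def similarity_scores(g8, non_g8, rv5):
    """Per-position score in [-1, 1].

    score = (fraction of G8 strains matching RV5)
          - (fraction of non-G8 strains matching RV5)

    g8, non_g8 and rv5 are lists of aligned, equal-length sequences.
    A residue matches RV5 if any RV5 strain carries it at that position.
    """
    length = len(rv5[0])
    if any(len(s) != length for s in g8 + non_g8 + rv5):
        raise ValueError("all sequences must have the same aligned length")
    scores = []
    for pos in range(length):
        vaccine_residues = {s[pos] for s in rv5}
        frac_g8 = sum(s[pos] in vaccine_residues for s in g8) / len(g8)
        frac_non_g8 = sum(s[pos] in vaccine_residues for s in non_g8) / len(non_g8)
        scores.append(frac_g8 - frac_non_g8)
    return scores


# Invented toy alignment of three positions.
rv5 = ["MKT"]
g8 = ["MKT", "MKS"]
non_g8 = ["MRT", "MRT"]
print(similarity_scores(g8, non_g8, rv5))
```

In this toy example the first position scores 0 (all strains match RV5), the second scores 1 (only the G8 strains match) and the third scores -0.5 (half of the G8 strains match, all non-G8 strains match).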