Respiratory syncytial, parainfluenza and influenza virus infection in young children with acute lower respiratory infection in rural Gambia

Respiratory viral infections contribute significantly to morbidity and mortality worldwide, but representative data from sub-Saharan Africa are needed to inform vaccination strategies. We conducted population-based surveillance in rural Gambia using standardized criteria to identify and investigate children with acute lower respiratory infection (ALRI). Naso- and oropharyngeal swabs were collected. Each month from February through December 2015, specimens from 50 children aged 2–23 months were randomly selected to test for respiratory syncytial virus (RSV), parainfluenza virus (PIV) and influenza viruses. The expected number of viral-associated ALRI cases in the population was estimated using statistical simulation that accounted for the sampling design. RSV G- and F-protein genes and influenza hemagglutinin genes were sequenced. Of 2385 children enrolled with ALRI, 519 were randomly selected for viral testing. One or more viruses were detected in 303/519 children (58.4%). RSV-A was detected in 237 and RSV-B in seven. The expected incidence of ALRI associated with RSV, PIV or influenza was 140 cases (95% CI, 131–149) per 1000 person-years; RSV incidence was 112 cases (95% CI, 102–122) per 1000 person-years. Multiple strains of RSV and influenza circulated during the year. RSV circulated throughout most of the year and was associated with eight times as many ALRI cases as PIV or influenza. Gambian RSV strains were closely related to viruses detected on other continents. An effective RSV vaccination strategy could have a major impact on the burden of ALRI in this setting.

Screening criteria for referral of out- and in-patients for clinician assessment (if one or more criteria are present for 14 days or less)
Patients aged 2 to 23 months
- History of cough or difficulty breathing, plus raised respiratory rate for age
- Lower chest wall indrawing, nasal flaring, or grunting
- Oxygen saturation less than 92%
- History of convulsion
- Impaired consciousness *
- Bulging fontanelle
- Stiff neck
- Axillary temperature at least 38°C, or less than 36°C, in a patient admitted or being admitted
- Prostration †
- Weight below -3 z score for age
- Local musculoskeletal swelling or tenderness
- Irrespective of age or residential location, any child with possible meningitis

Suspected pneumonia is defined if there is a history of cough or difficulty breathing of less than 14 days' duration, accompanied by one or more of:
1. Raised respiratory rate for age *
2. Lower chest wall indrawing, nasal flaring or grunting
3. Oxygen saturation less than 92%
4. Focal chest signs (dull percussion note, coarse crackles, bronchial breathing)

Suspected meningitis
Suspected meningitis will be defined according to clinical judgement and is to be considered if any of the following are present:
1. Neck stiffness
2. Impaired consciousness †
3. Prostration ‡
4. History of convulsion
5. Bulging fontanelle

Suspected septicaemia
Suspected septicaemia will be defined as one or more of:
1. Clinician diagnosis of focal sepsis (including but not limited to: septic arthritis, osteomyelitis, endocarditis, peritonitis, liver abscess, soft tissue abscess, cellulitis)
2. For a patient admitted, or being admitted, axillary temperature is <36°C or ≥38°C and no obvious cause of fever
3. For a patient admitted, or being admitted, the clinical impression is of severe malnutrition §

* Raised respiratory rate for age is defined as ≥ 50 breaths per minute for children at least 2 months but less than 12 months, and as ≥ 40 breaths per minute for children at least 12 months but less than 60 months.
† Impaired consciousness is defined as V, P, or U on the AVPU score, where A is if the patient is alert, V if responsive to verbal stimulus, P if responsive to pain stimulus, and U if unresponsive.
‡ Prostration is defined as inability to drink or breast feed, or to remain in a seated position in a child otherwise able to do so.
§ Severe malnutrition is defined according to the WHO definition (WHO. Guideline: Updates on the management of severe acute malnutrition in infants and children. Geneva: World Health Organization; 2013).

c. Patients with suspected pneumonia are to have chest X-ray.
d. Chest X-ray should also be considered in patients with meningitis or septicaemia if the clinician's impression is of co-existing pneumonia or if it is judged that a chest X-ray will assist in management.
e. Patients with suspected pneumonia, septicaemia, or meningitis are to have nasopharyngeal and oropharyngeal swabs.
f. Lung aspirate should be considered for a patient if peripheral consolidation has been demonstrated, preferably by chest X-ray.
ii. Serum collection for antibiotic activity detection if the surveillance number ends in '0' or '5' and the patient is enrolled in Basse.
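
The suspected pneumonia definition and the age-specific respiratory rate thresholds above can be expressed as a short decision rule. The following Python sketch is only an illustration of the written criteria; the function and parameter names are hypothetical and were not part of the surveillance system.

```python
def raised_respiratory_rate(age_months, breaths_per_min):
    """Footnote *: >= 50/min at 2 to <12 months, >= 40/min at 12 to <60 months."""
    if 2 <= age_months < 12:
        return breaths_per_min >= 50
    if 12 <= age_months < 60:
        return breaths_per_min >= 40
    return False  # the footnote defines no threshold outside these age bands


def suspected_pneumonia(age_months, cough_or_difficulty_breathing, symptom_days,
                        breaths_per_min, indrawing_flaring_or_grunting=False,
                        oxygen_saturation=100.0, focal_chest_signs=False):
    """Cough or difficulty breathing for < 14 days plus at least one listed sign."""
    if not (cough_or_difficulty_breathing and symptom_days < 14):
        return False
    return (
        raised_respiratory_rate(age_months, breaths_per_min)
        or indrawing_flaring_or_grunting
        or oxygen_saturation < 92
        or focal_chest_signs
    )
```

For example, an 8-month-old with three days of cough and a respiratory rate of 55 breaths per minute meets the definition, whereas the same child with a rate of 45 and no other sign does not.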

Influenza viruses
We extracted RNA from clinical specimens as described for the diagnostic RT-PCRs in the main manuscript.
Influenza virus type A cDNA was synthesized from extracted RNA using the following assay setup:
Prior to sequencing, the amplicon was cleaned using ExoSAP-IT reagent (Thermo Fisher). Four µl of ExoSAP-IT reagent was mixed with 10 µl of PCR product, incubated for 15 minutes at 37°C and then for 15 minutes at 80°C, cooled, and stored at 4°C until further processing.
For sequencing, cleaned PCR product was mixed at 10 ng per 100 nucleotides of product length with 20 pmol (4 µl) of the forward or reverse sequencing primer (Tables S6 and S7) and made up to 20 µl with H2O.
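
The premix arithmetic can be made explicit with a small calculation. The sketch below assumes that "10 ng/100 nucleotides" scales with the length of the PCR product and that the concentration of the cleaned product has been measured; the function name, its parameters and the example values are hypothetical.

```python
def premix_volumes(amplicon_length_nt, dna_conc_ng_per_ul,
                   primer_ul=4.0, total_ul=20.0):
    """Return DNA mass and component volumes for one sequencing premix well.

    Assumes 10 ng of cleaned PCR product per 100 nucleotides of amplicon,
    a fixed 4 µl of primer (20 pmol, i.e. a 5 pmol/µl stock) and water up
    to the total volume of 20 µl.
    """
    dna_ng = 10.0 * amplicon_length_nt / 100.0
    dna_ul = dna_ng / dna_conc_ng_per_ul
    water_ul = total_ul - primer_ul - dna_ul
    if water_ul < 0:
        raise ValueError("PCR product too dilute to fit in the premix volume")
    return {"dna_ng": dna_ng, "dna_ul": dna_ul,
            "primer_ul": primer_ul, "water_ul": water_ul}


# Hypothetical example: a 1000-nucleotide amplicon at 20 ng/µl
mix = premix_volumes(1000, 20.0)
```

In this hypothetical example the well would contain 100 ng of product in 5 µl, 4 µl of primer and 11 µl of H2O.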
The sealed microtiter plate was submitted to the sequencing service provider BaseClear for Sanger sequencing using the 96-WELL PREMIX SEQUENCING SERVICE (http://www.baseclear.com/genomics/sanger-sequencing/96-well-plates). BaseClear returned the raw sequencing trace files for further processing. The trace files were analysed and assembled into one consensus sequence per specimen in BioNumerics version 7.6.2.

RSV A/B G-protein gene sequencing
Prior to sequencing, the amplicon was cleaned using ExoSAP-IT reagent (Thermo Fisher). Four µl of ExoSAP-IT reagent was mixed with 10 µl of PCR product, incubated for 15 minutes at 37°C and then for 15 minutes at 80°C, cooled, and stored at 4°C until further processing.
For sequencing, cleaned PCR product was mixed at 10 ng per 100 nucleotides of product length with 20 pmol (4 µl) of the forward or reverse sequencing primer (Table S8) and made up to 20 µl with H2O. The sealed microtiter plate was submitted to BaseClear for Sanger sequencing using the 96-WELL PREMIX SEQUENCING SERVICE (http://www.baseclear.com/genomics/sanger-sequencing/96-well-plates). BaseClear returned the raw sequencing trace files for further processing. The trace files were analysed and assembled into one consensus sequence per specimen in BioNumerics version 7.6.2. The partial sequence ranges from amino acid position 54, halfway through the transmembrane region, across the whole external part of the G protein up to the stop codon. A number of sequences are extended due to the loss of one or more stop codons. These are indicated in the phylogenetic trees with the number of nucleotides in the extension.

RSV A/B F-protein gene sequencing
cDNA synthesis was performed using the following assay setup:
Prior to sequencing, the amplicon was cleaned using ExoSAP-IT reagent (Thermo Fisher). Four µl of ExoSAP-IT reagent was mixed with 10 µl of PCR product, incubated for 15 minutes at 37°C and then for 15 minutes at 80°C, cooled, and stored at 4°C until further processing.
For sequencing, cleaned PCR product was mixed at 10 ng per 100 nucleotides of product length with 20 pmol (4 µl) of the forward or reverse sequencing primer (Table S9) and made up to 20 µl with H2O. The sealed microtiter plate was submitted to BaseClear for Sanger sequencing using the 96-WELL PREMIX SEQUENCING SERVICE (http://www.baseclear.com/genomics/sanger-sequencing/96-well-plates). BaseClear returned the raw sequencing trace files for further processing. The trace files were analysed and assembled into one consensus sequence per specimen in BioNumerics version 7.6.2.

Simulation of the expected number of cases in the population taking into account the monthly sampling scheme
An example of one simulation (PIV1, 2-11 mo age group; Table S12) is given in the R code below.
nsamp is the number of times (1,000,000 in the study) that the sampling exercise is run in order to generate the estimated number of test-positives in the untested cohort with ALRI.
p represents the number of observed cases per month divided by the number of patients randomly selected for viral testing per month.
n represents the number of randomly selected patients per month.
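
The sampling exercise described by nsamp, p and n can be illustrated as follows. This is a minimal Python sketch and not the study's R code: it assumes that, for each month, the number of test-positives among the untested children with ALRI is drawn from a binomial distribution with the monthly proportion p, and that a percentile interval is taken over the nsamp runs. The argument n_untested (enrolled minus tested per month) and all example numbers are hypothetical.

```python
import random
import statistics


def simulate_expected_cases(observed_pos, n_tested, n_untested,
                            nsamp=1000, seed=1):
    """Simulate the total number of virus-positive ALRI cases in the cohort.

    observed_pos: observed positives per month among tested children
    n_tested:     randomly selected (tested) children per month (n)
    n_untested:   enrolled but untested children per month (assumed input)
    Returns (mean, 2.5th percentile, 97.5th percentile) of the simulated totals.
    """
    rng = random.Random(seed)
    # Monthly positivity proportion p = observed cases / tested children
    p = [x / n if n else 0.0 for x, n in zip(observed_pos, n_tested)]
    totals = []
    for _ in range(nsamp):
        total = sum(observed_pos)
        for p_m, u_m in zip(p, n_untested):
            # Binomial draw: each untested child is positive with probability p_m
            total += sum(rng.random() < p_m for _ in range(u_m))
        totals.append(total)
    totals.sort()
    lower = totals[int(0.025 * (nsamp - 1))]
    upper = totals[int(0.975 * (nsamp - 1))]
    return statistics.mean(totals), lower, upper


# Hypothetical three-month example (not study data)
mean, lower, upper = simulate_expected_cases(
    observed_pos=[2, 0, 5], n_tested=[50, 50, 50], n_untested=[120, 80, 200])
```

Running the draw separately for each month keeps the month-specific sampling fraction in the estimate, which is the purpose of accounting for the monthly sampling scheme. The study used 1,000,000 runs; the default here is smaller only to keep the pure-Python example fast.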

Phylogenetic analysis
Phylogenetic trees annotated with amino acid substitutions were inferred from the generated influenza virus hemagglutinin and RSV G-protein gene sequences using the protein-coding parts, devoid of signal peptide and stop codon. First, the sequences were aligned using BioEdit software version 7.2.5 1 with the ClustalW algorithm 2, manually refined taking into account the correct reading frame for translation, and subsequently trimmed to remove signal peptide and stop codon. The annotated phylogenetic analysis was executed using the treesub package (https://github.com/tamuri/treesub), which combines the maximum-likelihood methodology of RAxML 8.2.10 3 with the annotated tree inference and branch length estimation of the baseml function in PAML 4.9 4. A maximum-likelihood analysis was first performed by fitting to the GTRGAMMA model. Subsequently, bootstrapping was done using the autoMRE (majority-rule extended) function of RAxML, which performs bootstraps until convergence 5. The optimal tree, as indicated by RAxML, was annotated with the respective amino acid substitutions by baseml as called by treesub. FigTree 1.4.3 6 was used to visualise the trees, prepare them for publication and export them in PDF format. Adobe Illustrator CC 2017 software was used to add the bootstrap values, taking them from the bootstrapped tree visualised in FigTree. For influenza virus, the Gambian viruses were set in the context of recent vaccine strains and viruses from around the world, specifically from the African continent and from the Netherlands from the same sampling period, representing the recent genetic groups circulating. For RSV, the Gambian viruses were set in the context of viruses from around the world, specifically from the Netherlands from the same sampling period, representing the recent diversity of genetic groups circulating.
Sequences covering the stretch of the RSV G-protein gene used in our study were not available for viruses from the African continent sampled in the same period as the Gambian RSV when sequences for our analysis were downloaded from GenBank in January 2018. Nevertheless, we used shorter sequences of Kenyan RSV, available in January 2018, that cover most of the ectodomain-coding part of the G-protein gene and were sampled in the same time period as our Gambian RSV. We conducted a preliminary phylogenetic analysis (Neighbor-Joining with the Jukes-Cantor model, pairwise deletion of missing sequence parts and 1000 bootstraps in MEGA version 7) to see whether Gambian and Kenyan RSV from 2015 segregate similarly with RSV from other continents (Figure S1). Kenyan sequences for this analysis were retrieved from GenBank PopSet 1153279097 for RSV-A 7 and 1071930237 for RSV-B 8. This preliminary analysis used the same set of other Kenyan RSV sequences from before 2014 and RSV from other continents with more complete G-protein sequences that we used for our full analysis. Accession numbers of all sequences downloaded from GenBank and GISAID databases and used in the phylogenetic analysis are listed in Table S14. Sequences of partial G-protein genes of RSV and of the full-length hemagglutinin gene segment of influenza viruses from patients in The Gambia generated in this study, and sequences of partial G-protein genes of RSV from patients in the Netherlands not previously reported, have the following accession numbers: RSV G-protein genes from Gambian patients, GenBank MG971446-MG971464; RSV G-protein genes from Dutch patients, GenBank MG971421-MG971445; influenza viruses from The Gambia, GISAID EPI1180580-EPI1180595. The names of viruses from The Gambia have been color coded in the phylogenetic trees in Figures S2-S5 to indicate the month of sampling in 2015.
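
To illustrate how the Jukes-Cantor model and pairwise deletion interact when sequences of unequal coverage are compared, the following Python sketch computes the distance between two aligned nucleotide sequences. It is not the MEGA implementation; the function name is hypothetical, and any character other than A, C, G or T is treated as missing for that pair only.

```python
import math


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned sequences, pairwise deletion.

    Alignment columns in which either sequence has a gap or ambiguous
    character are skipped for this pair only.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # pairwise deletion of missing sequence parts
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = differences / compared  # proportion of differing sites
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

Because missing columns are dropped per pair rather than across the whole alignment, the shorter Kenyan sequences can be compared over the region they cover without truncating the longer sequences in all other comparisons.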
Amino acid substitutions relative to the oldest vaccine virus included (for influenza viruses) or the oldest strain included (for RSV) are indicated on or below the branch leading to the viruses that share the specified substitution. The amino acid numbering follows that of the consensus sequence after alignment of the included sequences, taking into account the start of the coding sequence for partial sequences, all insertions and deletions, and extensions due to loss of stop codons.
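
The substitution notation used on the trees (for example K134I) follows directly from comparing an aligned sequence with the reference, position by position. The Python sketch below shows that bookkeeping under simple assumptions; it does not reproduce the treesub or baseml ancestral reconstruction, and the function name and the start parameter (the alignment position of the first residue of a partial sequence) are hypothetical.

```python
def list_substitutions(reference, query, start=1):
    """List substitutions such as 'K134I' between two aligned protein sequences.

    Positions are alignment columns counted from `start`, so that partial
    sequences keep the numbering of the full alignment. Columns with a gap
    ('-') or unknown residue ('X') in either sequence are not reported.
    """
    substitutions = []
    for position, (ref_aa, qry_aa) in enumerate(zip(reference, query), start=start):
        if ref_aa in "-X" or qry_aa in "-X":
            continue
        if ref_aa != qry_aa:
            substitutions.append(f"{ref_aa}{position}{qry_aa}")
    return substitutions
```

For instance, comparing the hypothetical fragments "TKPT" and "TIPT" with start=133 yields ["K134I"].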
Potential N-glycosylation sites on the HA protein of influenza virus and the G protein of RSV and potential O-glycosylation sites on the G protein of RSV were predicted using the NetNGlyc server version 1.0 (http://www.cbs.dtu.dk/services/NetNGlyc/) and the NetOGlyc server version 4.0 (http://www.cbs.dtu.dk/services/NetOGlyc/) respectively. Predicted sites are superimposed on the phylogenetic tree: -N or +N indicate the loss or gain of an N-glycosylation site respectively; -O or +O indicate the loss or gain of an O-glycosylation site respectively. Insertions are indicated with , deletions with . The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test is shown on or below the branches for values ≥ 70%.
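
The -N/+N annotation can be illustrated with the N-glycosylation sequon N-X-S/T (X being any residue except proline), which is the motif that candidate N-glycosylation sites are based on. The Python sketch below only scans for that motif and compares two sequences; it does not reproduce the NetNGlyc scoring, it has no counterpart for O-glycosylation (which has no simple motif), and the function names are hypothetical.

```python
import re

# Lookahead so that overlapping sequons (e.g. "NNTS") are all found
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glyc_sequons(protein, start=1):
    """Return the set of positions of N in N-X-S/T sequons (X is not proline)."""
    return {match.start() + start for match in SEQUON.finditer(protein.upper())}


def gained_and_lost(reference, query, start=1):
    """Compare sequon positions of two equally numbered, ungapped sequences.

    Returns (gained, lost): positions present only in the query (+N) and
    positions present only in the reference (-N).
    """
    ref_sites = n_glyc_sequons(reference, start)
    qry_sites = n_glyc_sequons(query, start)
    return sorted(qry_sites - ref_sites), sorted(ref_sites - qry_sites)
```

A substitution can remove a site without touching the asparagine itself, for example by changing the serine or threonine two residues downstream, which is why site gain and loss are tracked separately from the substitutions.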

RSV F-protein amino acid composition analysis
The first 280 amino acids of the F-protein of 15 RSV-A and 2 RSV-B from The Gambia are shown in Figure S6 in alignment with F-protein sequences of RSV-A strain A2 and RSV-B strain B-1 (GenBank accession numbers KJ155694 and AF013254, respectively), strains frequently used in vaccine development 1. The corresponding Gambian partial RSV F-protein gene sequences are available from GenBank under accession numbers MH399208-MH399224, MH686533 and MH686534. Potential N-glycosylation sites on the F-protein were predicted using the NetNGlyc server version 1.0.

Results of RSV and influenza virus sequencing
Based on G-protein gene sequencing, all Gambian RSV-A viruses clustered in a recent clade designated ON1, characterized by a 72-nucleotide duplication insert. 1,2 All but one of these viruses clustered in a subgroup within this clade, together with 2014 and 2015 viruses from The Netherlands and the USA, in one group characterized by amino acid substitutions K134I, I243S (gain of O-glycosylation site) and E262K in the G-protein. Some further diversification was seen in these Gambian RSV-A, with two groups of two viruses each characterized by additional amino acid substitutions K197E and S294F (loss of O-glycosylation site) or L274P, respectively. The one other Gambian RSV-A clustered within clade ON1 with 2015 and 2016 viruses from The Netherlands and New Zealand characterized by S102F (loss of O-glycosylation site), K216N and E271K. These results suggest multiple introductions and local expansion of RSV-A in The Gambia. Based on G-protein gene sequencing, all Gambian RSV-B viruses clustered in a recent clade previously designated BA and characterized by a 60-nucleotide duplication insert and a 6-nucleotide deletion. 1 The Gambian RSV-B viruses clustered with 2015 RSV-B from other continents in a subgroup of the BA clade characterized by amino acid substitutions T107A (loss of O-glycosylation site), R136T (gain of O-glycosylation site) and T303I. Additionally, one of the Gambian viruses lost two stop codons, increasing the length of the protein. Similar to RSV-A, the results for Gambian RSV-B suggest a single introduction and local expansion. Although introduction followed by local expansion in a country has been shown before, e.g. for Kenya 3, a recent analysis of global RSV-A from all six continents strongly suggests intra- and inter-continent circulation. 2 Similar to what has been described for Kenya 2,4, clustering of Gambian RSV-A and RSV-B with viruses from Europe suggests a transmission link between these continents, although Kenyan as well as Gambian RSV-A and RSV-B also clustered with viruses from other continents. In addition, the preliminary phylogenetic analysis with less complete Kenyan G-protein gene sequences showed that none of the Gambian RSV-A clustered with 2014 or 2015 Kenyan RSV-A. The vast majority of the Kenyan RSV-A segregated completely from the Gambian RSV-A and RSV-A from other countries in a separate large group within clade ON1. In contrast, Gambian RSV-B clustered with a small proportion of 2015 and all 2016 Kenyan RSV-B in one bigger subgroup of the BA clade with RSV-B from other continents. These results suggest that there is no strong link between temporal circulation of RSV-A and RSV-B in West and East Africa.
O- and N-glycosylation of the G-protein is important for antigenicity 5 and changes have been associated with repeat infection. 6 Clades RSV-A ON1 and RSV-B BA and subclades have been associated with specific glycosylation consensus patterns. 7 However, we showed that it is also important to analyze the effect of subgroup-defining amino acid substitutions on the gain and loss of potential O- and N-glycosylation sites, as that is the basis for the analysis of (sub)cluster-specific phenotypic properties of RSV.
Alignment of F-protein sequences showed that the antigenic sites Ø, II and VIII of Gambian RSV were highly conserved, although some amino acid differences were detected in antigenic sites Ø and VIII between RSV-A and RSV-B. The Gambian F-proteins of RSV-A all had the same potential N-glycosylation sites, at amino acids 27, 70, 116, 120 and 126. The Gambian F-proteins of RSV-B had one site fewer, lacking the site at amino acid 126. Amino acid 70 is close to antigenic site Ø in the pre-fusion conformation, and a loss of N-glycosylation at this site might affect the antigenicity of pre-F. Furthermore, the N-glycosylation pattern of the F-protein affects its fusion activity, for which the site at amino acid 500, which was not included in our analysis, seems most important. 8 Nevertheless, N-glycosylation among the Gambian RSV was conserved for the other sites.