Genomic analysis of respiratory syncytial virus infections in households and utility in inferring who infects the infant

Agoti, Charles N.; Phan, My V. T.; Munywoki, Patrick K.; Githinji, George; Medley, Graham F.; Cane, Patricia A.; Kellam, Paul; Cotten, Matthew; Nokes, D. James

doi:10.1038/s41598-019-46509-w

Published: 11 July 2019

Scientific Reports volume 9, Article number: 10076 (2019)

Abstract

Infants (under 1-year-old) are at greatest risk of life-threatening respiratory syncytial virus (RSV) disease. RSV epidemiological data alone have been insufficient in defining who acquires infection from whom (WAIFW) within households. We investigated RSV genomic variation within and between infected individuals and assessed its potential utility in tracking transmission in households. Over an entire single RSV season in coastal Kenya, nasal swabs were collected from members of 20 households every 3–4 days regardless of symptom status and screened for RSV nucleic acid. Next generation sequencing was used to generate >90% RSV full-length genomes for 51.1% of positive samples (191/374). Single nucleotide polymorphisms (SNPs) observed during household infection outbreaks ranged from 0–21 (median: 3) while SNPs observed during single-host infection episodes ranged from 0–17 (median: 1). Using the viral genomic data alone there was insufficient resolution to fully reconstruct within-household transmission chains. For households with clear index cases, the most likely source of infant infection was via a toddler (aged 1 to <3 years-old) or school-aged (aged 6 to <12 years-old) co-occupant. However, for best resolution of WAIFW within households, we suggest an integrated analysis of RSV genomic and epidemiological data.

Introduction

Respiratory syncytial virus (RSV) is a leading viral cause of bronchiolitis and pneumonia during infancy¹. Global estimates in 2015 indicated that RSV causes ~33.1 million episodes of acute lower respiratory tract illness annually, ~3.2 million of which lead to hospital admissions and ~60,000 deaths in hospitalized children aged under 5 years¹. Despite this burden, our understanding of RSV transmission patterns during epidemics, including who infects the vulnerable infant population, remains incomplete². Defining the patterns of RSV transmission during epidemics, and specifically Who Acquires Infection From Whom (WAIFW), has the potential to inform control strategies^3,4.

RSV transmission occurs during contact with an infectious person or contaminated environmental surfaces⁵. Households are considered an important setting for RSV spread due to likely close person-to-person contacts^6,7. A family study in the United States in the 1970s showed that up to 46% of family members and 62% of infants in the household become infected once the virus is introduced into a household⁸. Since this study, important advances have been made in diagnostic sensitivity and characterisation of infection sources for household cases, in contact mapping tools and in statistical methods to infer epidemiologically linked case pairs^9,10. Furthermore, household demographic characteristics may differ between developed and developing settings¹¹.

Currently, there is no licensed RSV vaccine, although there are 19 vaccine, prophylactic or monoclonal antibody candidate products in clinical trials^12,13. Impediments to RSV vaccine discovery have been the need to immunize in the first weeks of life, when infant immune responses are still sub-optimal, and the enhanced disease observed during a formalin inactivated vaccine trial in the late 1960s¹⁴. Live attenuated vaccines given intranasally have proved difficult to attenuate sufficiently to limit upper airway congestion during vaccination while still maintaining immunogenicity^15,16. As a result, alternative approaches are being considered, including boosting infant antibody levels through maternal sub-unit vaccine immunization, pre-season delivery of high titre extended half-life immunoglobulin, reducing virus circulation in the community by vaccination of older babies and children, or cocoon vaccination to interrupt chains of transmission leading to infant infection^4,17,18. To advance the cocoon vaccination strategy, a better understanding of RSV transmission in household settings, where most transmissions appear to occur, is required¹⁸.

Currently, little is known about the sequence change patterns during individual RSV infection episodes, or during intra-household and inter-household transmission events¹⁰. It is unclear if the pace of RSV genomic change is sufficient to allow tracking of transmission during epidemics. We have previously shown that partial RSV nucleotide sequences from the highly variable attachment (G) encoding gene (~900 nt) provide insufficient discriminatory power to delineate RSV transmission chains^19,20,21. However, our initial analysis of RSVA full genome sequences (~15,200 nt) showed significant promise in providing phylogenetic resolution of viruses circulating in different households¹⁰, and similar applications of these methods have been shown for norovirus²², foot and mouth disease virus²³, influenza A virus²⁴, MERS-CoV²⁵, and Ebola virus^26,27. In this study, we aimed to determine if RSV transmission in households is trackable using viral genomic data and if it is possible to identify who is the likely infector of the under 1-year-old infant.

Materials and Methods

Study location, design and samples

The study was undertaken within Kilifi County, which is located in coastal Kenya. A detailed description of the study location and study design was provided elsewhere²⁸. Briefly, 47 households scattered across an area of approximately 21 km² were followed up over a 6-month period beginning December 2009 and ending June 2010 coinciding with the RSV peak activity months in the area²⁹. Households were defined as a group of people sharing a compound and eating from the same kitchen²⁰. The selected households (abbreviated HHs) were given designated identifiers from 1 to 57 (HH01 to HH57). Twice weekly throughout the study period, a nasopharyngeal-flocked swab (NPS) was obtained from each member regardless of symptom status. The NPS samples were screened for RSV using a multiplex real-time RT-PCR method which subtyped RSV positives into RSVA and RSVB³⁰. For whole genome sequencing (WGS), we targeted 20 select households that documented RSV infection of ≥2 members. A geographical map showing the distribution of the study households is provided in the Additional File: Fig. S1.

RNA extraction, amplification and whole nucleotide sequencing

Viral RNAs from the positive samples of selected households were obtained using the QIAamp viral RNA extraction Kit (QIAGEN, Hilden, Germany) following the manufacturer’s instructions. Complementary DNA (cDNA) synthesis and RSV whole genome amplification was achieved using a six-overlapping PCR fragments strategy (each ~2.5 kb) as previously described²¹. Sequencing libraries were prepared using Nextera DNA Library Prep kits and nucleotide sequencing performed using Illumina MiSeq platform multiplexing 15–20 samples per run to generate approximately 1 million paired-end reads (150 bp × 2) per sample²¹.

Whole genome sequence assembly and multiple sequence alignments

The short reads from the MiSeq instrument were de-multiplexed, quality checked (median read Phred score of ≥35) and trimmed using QUASR v6.08³¹. Reads passing quality checks were de novo assembled into longer contigs using SPAdes v3.5.0³². RSV contigs were identified by matching to a database of RSV sequences using the USEARCH program³³, examined for completeness of the expected open reading frames (ORFs) using Geneious v8.1.6 (https://www.geneious.com) and, where necessary, partial contigs were further combined into longer ones using Sequencher v5.0.1³⁴. These were subsequently checked for the presence of intact ORFs, sorted by household and re-aligned, and positions of nucleotide variation were double-checked to confirm that they were supported by the majority of the raw reads associated with that sample¹⁰. Multiple sequence alignments were prepared in MAFFT v7.220³⁵.

Phylogenetic analysis

Sequence phylogenies were inferred using Maximum Likelihood (ML) methods in MEGA7³⁶ and RAxML v8.2.12³⁷. The best-fitting models of nucleotide substitution for each alignment were determined in IQ-TREE v1.4.3³⁸. Best tree search was performed by Nearest Neighbor Interchange (NNI). Branch support was evaluated by bootstrapping with 1,000 replicates. Pairwise genetic distances were calculated in pairsnp 0.0.6³⁹. The phylogenetic relatedness of the household RSVA and RSVB genomes was assessed at three levels: (i) in combination with global sequences deposited in GenBank (RSVA, n = 657 collected between 1977–2015 while RSVB, n = 416 collected between 1978–2016), (ii) among the household viruses alone and (iii) among viruses collected from the same household only. The potential transmission networks within and between households for each group were inferred in the PopART package v1.7.2 using the median joining tree (MJT) method with an epsilon of zero⁴⁰. Evolutionary analyses were performed in the maximum-likelihood-based TreeTime program⁴¹. Phylogenetic trees were visualized and annotated in FigTree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/).
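To make the notion of a pairwise SNP distance concrete, the following is a minimal Python sketch of counting differences between sequences taken from the same alignment. It is not the pairsnp implementation; the function names, the toy alignment and the choice to skip gaps and ambiguous bases are assumptions made for illustration only.

```python
from itertools import combinations

# Only unambiguous bases are compared; skipping gaps and ambiguity codes
# is an assumption of this sketch.
VALID_BASES = set("ACGT")


def snp_distance(seq_a, seq_b):
    """Count aligned positions where both sequences have an unambiguous base and differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in VALID_BASES and b in VALID_BASES and a != b
    )


def pairwise_snps(alignment):
    """Return {(name1, name2): SNP count} for every pair in a dict of aligned sequences."""
    return {
        (n1, n2): snp_distance(alignment[n1], alignment[n2])
        for n1, n2 in combinations(sorted(alignment), 2)
    }


# Toy alignment; names and sequences are invented for illustration.
toy_alignment = {
    "s1": "ACGTACGTAC",
    "s2": "ACGTACGTAT",
    "s3": "ACG-ACNTGT",
}
distances = pairwise_snps(toy_alignment)
```

In this toy example the gap and the `N` in `s3` are ignored, so only positions where both genomes carry a called base contribute to the count.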

Identifying who infected the household infant(s)

Infants were defined as the participants aged <1-year-old during the study²⁰. We grouped the other participants into 5 age-groups: (i) toddlers (1 to <3 years), (ii) pre-schoolers (3 to <6 years), (iii) school-aged (6 to <12 years), (iv) adolescents (12 to <18 years) and (v) adults (≥18 years). We attempted to identify who among these other age-groups were the most likely infectors of the infants by examining the relatedness of the virus genome(s) obtained from the infant to all the viral genomes obtained from the other members in the same household. Information on the dates of the sampling of the sequenced samples was taken into account to position the infant in the transmission network/chain.
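The age-group boundaries and the idea of combining sampling dates with genome relatedness can be sketched in Python as follows. This is only an illustration: the ranking rule (keep household members sampled positive before the infant, then order by fewest SNPs and earliest date), the member identifiers and the toy data are all assumptions, and the study itself relied on examining alignments, phylogenies and networks rather than an automated rule. Treating an age of exactly 18 years as adult is also an assumption.

```python
from datetime import date

# Upper age bounds (exclusive, in years) for each group described in the text.
AGE_GROUPS = [
    (1, "infant"),
    (3, "toddler"),
    (6, "pre-schooler"),
    (12, "school-aged"),
    (18, "adolescent"),
]


def age_group(age_years):
    """Map an age in years to the age-group label used in the text."""
    for upper, label in AGE_GROUPS:
        if age_years < upper:
            return label
    return "adult"  # assumption: 18 years and above


def rank_candidate_infectors(infant_first_positive, samples):
    """Rank household members as hypothetical infectors of the infant.

    samples: list of (member_id, age_years, sample_date, snps_to_infant_genome).
    Only samples collected before the infant's first positive are kept.
    """
    earlier = [s for s in samples if s[2] < infant_first_positive]
    ranked = sorted(earlier, key=lambda s: (s[3], s[2]))
    return [(member, age_group(age), snps) for member, age, _, snps in ranked]


# Invented household data for illustration only.
samples = [
    ("m02", 2.5, date(2010, 2, 19), 0),
    ("m03", 8, date(2010, 2, 22), 1),
    ("m04", 30, date(2010, 3, 5), 0),
]
ranked = rank_candidate_infectors(date(2010, 2, 26), samples)
```

Here the adult `m04` is excluded because the sample postdates the infant's first positive, leaving the toddler and the school-aged child as candidates.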

Sequence nomenclature and accession numbers

The sequence nomenclature of the household samples has four digits that include the household identifier (first two digits) and subject identifier (the last two digits). All the new 112 full or partial RSVB genome sequences from this study were deposited in GenBank under the accession numbers MH594350 – MH594461. The RSVA genomes are deposited in GenBank under accession numbers KX510136-KX510266.
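As a small illustration of this naming scheme, the four-digit identifier can be split into its household and subject parts. The function name and the returned dictionary layout are hypothetical; only the two-plus-two digit convention comes from the text.

```python
def parse_sample_id(sample_id):
    """Split a four-digit identifier into household (first two digits) and subject (last two)."""
    if len(sample_id) != 4 or not (sample_id.isascii() and sample_id.isdigit()):
        raise ValueError(f"expected four digits, got {sample_id!r}")
    return {"household": f"HH{sample_id[:2]}", "subject": sample_id[2:]}


parsed = parse_sample_id("3803")
```

For example, identifier 3803 resolves to subject 03 of household HH38.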

Ethical approval

The samples were collected after obtaining informed written consent from each participant if aged ≥18 years or through a guardian or parent if aged <18 years. In addition, children aged above 5 years were asked for assent. The study protocol was approved by both the Scientific and Ethics Review Unit (SERU) of the Kenya Medical Research Institute (KEMRI), Nairobi, and the Coventry Research Ethics Committee, UK²⁰. All study procedures were performed in accordance with the approved protocol guidelines and in compliance with the relevant regulations.

Results

RSV infections and whole genome sequencing

We targeted 20 households with a total of 226 occupants (range 4–37 persons per household) for WGS. Details of the demographic characteristics of the analysed households, total specimens collected, diagnostic results, genome sequencing success and the observed phylogenetic clades (defined later) are summarized in Table 1. Over the six-month period (December 2009 – June 2010), a total of 7,695 nasopharyngeal-flocked swabs (NPS) were collected from the 20 HHs, 415 (5.4%) of which were determined to be RSV real-time RT-PCR positive (cycle threshold (Ct) value of <35.0; 189 RSVA, 214 RSVB and 12 RSVA/B co-infections). These positive specimens originated from 130 participants. Of the 415 positive specimens, 374 (90.1%) were processed for WGS²¹, with successful amplification and assembly of RSV contigs of >1000 nucleotides in length in 246 samples (65.8%). Of these, 191 samples (51.1% of those processed) yielded contigs of >14000 nucleotides (nt), i.e. >90% of the RSV full-length genome (103 RSVA and 88 RSVB), hereafter referred to as genomes. In eight and 14 HHs, two or more RSVA or RSVB genomes were recovered, respectively, allowing our investigation into within-household RSV transmission and variation (Table 1). Genome sequencing success was negatively correlated with the diagnostic RT-PCR Ct value. These results, together with details on the metadata of the sequenced RSVB viruses, GenBank and Sequence Read Archive accession numbers and assembly metrics, are provided in the Additional File and Supplementary Dataset, respectively.

Table 1 Demographic details of the 20 households, number of positive samples, number of sequenced samples and clade assignment.

Diversity of the viruses isolated in the study

From G gene phylogeny, all RSVA and RSVB viruses sequenced were genotypes GA2 and BA, respectively (results not shown). The genome-based maximum likelihood (ML) phylogenetic trees are shown in Fig. 1. The RSVA genomes formed a single monophyletic cluster on the global phylogeny while household RSVB genomes formed 5 distinct phylogenetic clusters interspersed with sequences from other global locations, Fig. 1, panel a. On their own, both RSVA and RSVB household genomes formed multiple phylogenetic clusters (several apparently genetically distinct and supported by >60% bootstrap values and we later assigned these into clades and sub-clades – see below). On the household genomes only ML tree, these clusters appeared to be mostly household specific with a few exceptions, Fig. 1, panel b.

The time-resolved ML trees and temporal signal in nucleotide divergence of the household RSVA and RSVB viruses are shown in Fig. 2. The time to Most Recent Common Ancestor (tMRCA) estimate of all the sampled RSVB viruses was estimated to be December 2004 (Lower and Upper boundaries for 90% Highest Posterior Density (HPD) of October 1997 and September 2008), which is much earlier than the equivalent estimate for RSVA viruses of December 2008 (Lower and Upper boundaries for 90% HPD of January 1987 and December 2009), with both point estimates showing a wide uncertainty interval.

We quantified the genetic diversity observed within the two RSV groups by calculating the number of pairwise single nucleotide polymorphisms (SNPs) (pairwise distance) of viruses within the same group, Fig. 3, panel a. We found this value to range from 0–35 (median: 19, mean: 16.6) for RSVA and 0–177 (median: 134, mean: 99.7) for RSVB. Overall, within-group pairwise distances among RSVB viruses were approximately 5.7 times higher than those of RSVA (mean distance of 0.006094 vs 0.001065). The distribution of the number of pairwise SNPs within clusters of the household viruses observed on the global phylogeny is shown in Fig. 3, panels b–f.
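The summary statistics reported for each group (range, median and mean of pairwise SNP counts) and the fold difference between the two quoted mean distances can be recomputed with a few lines of Python. The helper name and the toy list of SNP counts are assumptions for illustration; only the two mean distances are taken from the text.

```python
from statistics import mean, median


def summarise_snps(snp_counts):
    """Return the range, median and mean of a list of pairwise SNP counts."""
    return {
        "min": min(snp_counts),
        "max": max(snp_counts),
        "median": median(snp_counts),
        "mean": mean(snp_counts),
    }


# Toy pairwise SNP counts, invented for illustration.
toy_summary = summarise_snps([0, 3, 5, 12])

# Mean within-group distances quoted in the text (RSVB vs RSVA).
mean_distance_rsvb = 0.006094
mean_distance_rsva = 0.001065
fold_difference = mean_distance_rsvb / mean_distance_rsva
```

With the two quoted means, the fold difference evaluates to roughly 5.7.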

To facilitate further analysis, we assigned the household viruses into “clades” and “sub-clades” defined by three criteria: their clustering patterns on global phylogenies (Fig. 1, panel a), the inferred divergence dates of the strains (Fig. 2, panel a) and the number of pairwise SNPs (Fig. 3). We grouped viruses in the same clade if they occurred as a monophyletic group on the global phylogeny, had <60 pairwise SNPs across the genome with every other member of that clade and diverged more than a year prior to their date of collection. Viruses within the same clade were further assigned into sub-clades if they showed >10 pairwise SNP differences across the genome and were estimated to have diverged more than six months prior to their date of collection (Figs 2 and 3). Using these criteria, we assigned all household RSVA strains into a single clade named RSVA/I while household RSVB strains were assigned into 5 clades named RSVB/I through RSVB/V. Viruses within clade RSVA/I were assigned into five sub-clades, RSVA/Ia through RSVA/Ie; viruses within clade RSVB/I were assigned into two sub-clades, RSVB/Ia and RSVB/Ib; and viruses within clade RSVB/II were assigned into two sub-clades, RSVB/IIa and RSVB/IIb.

Virus transmission within and between households

We investigated the genomics and temporal and spatial patterns of RSVA and RSVB virus clades observed within and between households. An analysis using minimum spanning network which depicts shared differences without regard to an evolutionary model was used to detect patterns in the RSVA and RSVB genomes and examine potential intra- and inter-household transmission patterns (Fig. 4, panel a). Similar to the ML phylogenies, the majority of the viruses clustered by household with the major clusters corresponding to the clades and sub-clades observed in the ML trees. Notably clades/sub-clades RSVA/Ia, RSVA/Ie, RSVB/Ia, RSVB/Ib, RSVB/IIa, RSVB/IIb, and RSVB/IV were observed in multiple HHs indicating potential transmission linkage of the involved HHs during the epidemic. In the timeline of viruses identified (Fig. 4, panel b), all except five households (HH06, HH26, HH38, HH41 and HH42) had a single RSV clade sequenced. The exceptional households had two virus clades infecting members but mostly one of the two clades predominated e.g. in HH06, HH41 and HH42. On the other hand, in the remaining two households distinct RSVA and RSVB outbreaks occurred: HH38 in which the first outbreak was RSVB/I and at a later date a second outbreak of RSVA/I, and HH26 with concurrent RSVA/I and RSVB/IV.

The relationship between the geographical distance between the households and the RSVA and RSVB clades that circulated in these households is shown in Fig. 4, panel c. Paradoxically, some households in very close proximity experienced infections with viruses from different clades or sub-clades, e.g. HH41 and HH42 were <30 meters apart, yet none of the virus clades circulating in these 2 households were shared (Fig. 4, panel c). In contrast, HH35 and HH38, separated by a distance of ~3 kilometres, shared the same virus clade (RSVB/Ia). There was no apparent correlation between inter-HH distance and genetic relatedness or between sampling dates and virus transmission, i.e. no clear spatial or temporal pattern of virus transmission within and between households.

Intra-host, inter-host and inter-house virus variation

The SNP abundance in samples collected from same host during repeat visits and in presumed single household outbreaks are shown in Figs 5 and 6. Overall, intra-host SNPs ranged from 0–17 (median: 1, mean: 1.75, Fig. 5) while intra-household SNPs ranged from 0 to 21 (median: 3, mean: 6.2, Fig. 6). Nucleotide changes were, in general, rare intra-host during the shedding period of a presumed single episode. When changes were evident, they were usually multiple SNPs occurring simultaneously and mostly affecting the last few positive samples collected from the subject. For nine subjects who remained virus positive for more than 21 days, we compared the recovered genome sequences to determine if these represented more than one infection (Fig. 7, panel a and b). Four of these individuals showed zero change despite the sequenced samples spanning a period of over a month. For the individuals that showed SNPs, these were few (<6 SNPs). In the intra-household analysis, it appeared that the households with a higher number of SNPs (>5 i.e. falling in the upper quartile) may have experienced multiple introductions of viruses from the same clade or sub-clade e.g. in HH26 for RSVA (see Additional File: Fig. S10 sample from 2605 collected on 26-Mar-2010), HH38 for RSVB (see Fig. 8, sample from 3803 collected on 19-Feb-2010).

To track independent viruses that were either introduced from elsewhere into the study area during the epidemic or were local but diverged outside the 2009/10 season, we coined the term “epidemiological strain”. Genetically, viruses referred to as the same epidemiological strain had <10 SNPs across their genomes and belonged to the same clade and sub-clade (where assigned). In total, we identified 12 epidemiological strains (five within RSVA and seven within RSVB) that occurred in the study area during the six-month surveillance, eight (66.7%) of which were observed in multiple households while four were found in a single household. For the epidemiological strains that occurred in multiple households, between 5–33 (median: 12, mean: 15.3) SNPs were observed across their genomes. A comparison of SNP abundance intra-host, inter-host and inter-household is provided in Fig. 7, panel c. SNP abundance appeared to increase linearly across these three levels.
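One way to picture the grouping of genomes into epidemiological strains is as a clustering problem. The Python sketch below links two genomes when they share the same clade and sub-clade and differ by fewer than 10 SNPs, then takes connected groups as strains. The use of single-linkage grouping, the function name and the toy data are assumptions made for illustration; the text does not specify how the grouping was computed.

```python
def epidemiological_strains(genomes, snp_counts, threshold=10):
    """Group genomes into hypothetical strains by single-linkage clustering.

    genomes: {name: (clade, sub_clade)}
    snp_counts: {(name_a, name_b): pairwise SNP count}
    Two genomes are linked if they share clade and sub-clade and have
    fewer than `threshold` SNPs between them.
    """
    parent = {name: name for name in genomes}

    def find(x):
        # Follow parent pointers to the group representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), snps in snp_counts.items():
        if snps < threshold and genomes[a] == genomes[b]:
            parent[find(a)] = find(b)

    groups = {}
    for name in genomes:
        groups.setdefault(find(name), set()).add(name)
    return sorted(sorted(members) for members in groups.values())


# Invented genomes and SNP counts for illustration only.
toy_genomes = {
    "g1": ("RSVB/I", "a"),
    "g2": ("RSVB/I", "a"),
    "g3": ("RSVB/I", "a"),
    "g4": ("RSVB/I", "b"),
    "g5": ("RSVA/I", "a"),
}
toy_snps = {
    ("g1", "g2"): 2,
    ("g2", "g3"): 8,
    ("g1", "g3"): 10,
    ("g3", "g4"): 4,
    ("g4", "g5"): 150,
}
strains = epidemiological_strains(toy_genomes, toy_snps)
```

Under single linkage, `g1` and `g3` end up in the same group through `g2` even though they differ by 10 SNPs, which is one way a single strain could span more than 10 SNPs between its most distant members. A stricter rule requiring every pair to fall below the threshold would split such groups.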

Who infected the infant(s) in the study households?

There were 22 infants from the 20 HHs. By our diagnostics, the infant in HH18 did not get RSV infected during our surveillance period. The household-by-household time-resolved infection patterns, genome alignments, phylogenies and minimum spanning sequence networks are provided in the Additional Files S3–21. We present the infection and genomic patterns of HH38 as an example in Fig. 8. Patterns of RSVA infection in HH14 and HH38 can be found in our previous publication¹⁰. Following examination of the patterns from all the 20 households, the summary of our deductions on who most likely infected the infant is provided in Table 2. Overall, we could infer the single most likely individual to have infected the study infant for only 19% (4/21) of the infants (Table 2). For a further 19%, we identified the top two individuals who were most likely to have infected the infant. Note that in HH38, the infant was infected in both the RSVA and RSVB outbreaks that occurred in this household. All except one of the suspected infant infectors were aged <12 years-old.

Table 2 Inferring who most likely infected the infant in the household.

Discussion

The origin of this work was a study of who introduces RSV into the household and who infects the infant²⁰. This was motivated by the lack of a successful vaccine for young infants and by the fact that evaluation of other options (family cocooning, school age vaccination^20,42) requires an improved understanding of WAIFW. Our earlier work, based on temporal case observations, clearly suggested that the older children (siblings or cousins aged <15 years), particularly those attending school, played an important role in introducing the virus into the household leading to infant infection, but was not able to resolve within-household transmission chains²⁰. We have subsequently formalised the epidemiological analysis of RSV transmission in the household using an individual-based statistical approach to quantify the risk of infection from a range of host, pathogen and environmental factors⁷. The present study takes an alternative perspective on the problem, by focusing on the temporal patterns of genomic sequence variation to elucidate who infects whom in the household. This work extends a smaller study based on genomes of RSVA from 9 households¹⁰ to the current study of genomes of RSVA and RSVB in 20 households.

Our key observation from the present analysis is that RSV consensus genomes incur zero to just a few nucleotide substitutions within infected individuals (median: one SNP per episode) or between infected individuals of the same household (median: three SNPs). This limited variation, combined with the rapid spread of RSV within households and incomplete sequencing (~50%) of the positive samples, challenges the reconstruction of transmission chains using genomic data alone. For six households (32%) where the infant was infected (n = 8), we could identify the 1–2 most likely individuals who infected them. The suspected infant infectors were mostly household co-occupants <12 years of age (7/8), specifically in the toddler (43%) and school-aged (57%) age-groups. Only in a single instance was an adult co-occupant (mother) suspected to be the infant infector. In the remaining households (13/19, 68%), the infant was identified as either the household index case or a co-index case, or the sequencing of key samples failed, making it difficult to infer their infection source.

Elsewhere we attempted to utilise shared minor variants identified from deep sequencing data for RSV in these same households to draw out patterns of transmission⁴³. The conclusion of the work was that shared minor variants provide little additional resolving power to discern chains of transmission beyond that possible through consensus sequences.

Previously, only two other studies focused on transmission of RSV infections within households^8,44. In these studies, notably, it was assumed that a single infection source was responsible for the cases occurring in the same household, whereas temporally it can be difficult to fully establish this. Furthermore, without virus genotyping and, ideally, full-genome sequence data, the composition of outbreaks cannot be definitively established; as we have seen, multiple concurrent virus introductions into households are not uncommon. In addition, in the study of Heikkinen et al.⁴⁴, the investigators followed up the household only after the index infant had been admitted to hospital, which limits the possibility of observing preceding transmission events, including who infected the infant.

Our study involved sampling irrespective of symptom status, coupled to sensitive molecular diagnostics and genomic sequencing, which has given a clear indication that households are indeed a common space for RSV transmission⁷. Similar to previous studies based solely on epidemiological (not sequence) data^8,44, we highlight the importance of the infant’s elder siblings especially those under 12 years of age as a source of the infant infection. Adults in the households played only a minor role when considered either as household RSV infection introducers or as infant infectors. Furthermore, by analysis of RSVA viruses from nine households, we had previously shown that most (6/9, 67%) RSV infections in a household outbreak result from a single introduction of the virus¹⁰. Here we have extended the analysis to RSVB, confirming a closely similar pattern to RSVA.

The unique household study design here allowed us to compare the phylodynamics of RSVA and RSVB viruses. Overall, the sequenced RSVB viruses showed ~6 times greater genomic diversity compared to RSVA. It is likely that the observed difference reflects annual stochasticity in the number of introduced strains rather than an inherent biological difference, although a few previous reports indicated the existence of subtle differences between the two groups in transmissibility and local persistence^21,45. Despite the close genetic relatedness of RSVA viruses detected in the study, our analysis showed that the 9 infected households were invaded by up to 5 distinct RSVA “epidemiological strains” that diverged at least 6 months before their collection date. For RSVB we determined that the 14 infected households were invaded by up to 7 distinct RSVB epidemiological strains. Highly similar intra-household and intra-host genomic variation patterns were observed between the two groups.

Due to the intense logistics involved in undertaking such a study, only 47 households from one administrative unit (14,998 persons in 1,835 homesteads) within Kilifi County were recruited²⁰. The genome sequencing work targeted 20 households where ≥2 members were found to be RSV infected. Despite these households occurring in a small geographical area (~20 km²), it was surprising to see up to 12 epidemiological strains in circulation. Most of the sampled viruses clustered by household. Some households shared the infecting strain with other households, suggesting a shared infection source, although direct transmission between these households was unlikely given the large fraction of non-sampled households. Four out of the 12 identified epidemiological strains occurred only in one household each. Notably, households in close physical proximity did not necessarily end up being infected with similar virus clades or sub-clades, implying that other unobserved epidemiological factors, rather than physical proximity, may be more important in determining WAIFW in this community⁴⁶.

Our earlier epidemiological analysis suggested school-going house-members are the sub-population (39%) most likely to introduce the infection into the household²⁰. Perplexingly, the study infants were the second most frequent index cases (32%) and were co-index in a further 14% of the household episodes²⁰. It is possible that some of the infant co-index cases were the infectors of infants in the household, but our diagnostic method (nasopharyngeal swab combined with RT-PCR) failed to detect the virus in the preceding samples. This may occur perhaps due to limited virus replication in older individuals or our 3–4 days sampling interval may have been too wide to capture index cases before onward transmission. By our diagnostic method, a parent was the index case only in one household.

It was surprising to find few to no SNPs in RSV genomes from individuals appearing to shed RSV for up to 2 months. These individuals may have been true prolonged shedders of the virus or may have been re-infected. If they were prolonged shedders, then it is perplexing that in some individuals there were one or more negative samples separating the positive samples. These could be false negative assay results, which may have arisen due to the limited sensitivity of our sampling or diagnostic method, or the virus may have been temporarily absent from the upper respiratory tract while still present elsewhere in the individual’s respiratory tract. Prolonged shedding of RSV of up to 2 months has been previously reported, especially in immune-compromised populations^47,48. Alternatively, if these were indeed reinfections, then this observation calls for an interrogation of protective RSV immune responses and has implications for the development of effective RSV vaccines^49,50.

Our study illustrates both the value and the limitations of RSV genomic data in tracking transmission of this rapidly spreading infection in a household setting. The pace of RSV substitutions was demonstrated to be insufficiently fast to enable the full inference of within-household RSV transmission trees. Additionally, we have previously shown that patterns of sharing of minor variants do not add insight beyond the consensus sequence approach⁴³. Since in close to half of the study households the infant participant was the infection index or co-index case, for future studies we recommend sampling protocols that also consider, in addition to households, other potential RSV transmission settings in the community, e.g. child-care centres, post-natal clinics, schools, school transportation and sporting events. Contact data should be collected to reinforce the viral sequence data and epidemiological data to support robust inferences of transmission pairs⁴⁶. The protocols for genomic sequencing also need to be optimised to obtain virus sequences even from samples with diminishing virus titres. Given the imperfections of analyses of epidemiological data or genomic data in isolation, there is a clear need to undertake the joint analysis of both sources of information using a probabilistic framework⁷ that will allow inference of events not directly observable with inevitably imperfect data.

Data Availability

The sequence data from this study have been deposited in both the GenBank and Short Read Archive databases (see accession details in the Supplementary Dataset). For more detailed information beyond the metadata used in the paper, there is a managed-access process that requires submission of a request form for consideration by our Data Governance Committee (http://kemri-wellcome.org/about-us/#ChildVerticalTab_15).

References

Shi, T. et al. Global, regional, and national disease burden estimates of acute lower respiratory infections due to respiratory syncytial virus in young children in 2015: a systematic review and modelling study. Lancet 390, 946–958, https://doi.org/10.1016/s0140-6736(17)30938-8 (2017).
Agoti, C. N. et al. Successive Respiratory Syncytial Virus Epidemics in Local Populations Arise from Multiple Variant Introductions, Providing Insights into Virus Persistence. J Virol 89, 11630–11642, https://doi.org/10.1128/jvi.03105-15 (2015).
Kinyanjui, T. M. et al. Vaccine Induced Herd Immunity for Control of Respiratory Syncytial Virus Disease in a Low-Income Country Setting. PLoS One 10, e0138018 (2015).
Pan-Ngum, W. et al. Predicting the relative impacts of maternal and neonatal respiratory syncytial virus (RSV) vaccine target product profiles: A consensus modelling approach. Vaccine 35, 403–409 (2017).
Hall, C. B. Respiratory syncytial virus: its transmission in the hospital environment. Yale J Biol Med 55, 219–223 (1982).
La Rosa, G., Fratini, M., Della Libera, S., Iaconelli, M. & Muscillo, M. Viral infections acquired indoors through airborne, droplet or contact transmission. Ann Ist Super Sanita 49, 124–132, https://doi.org/10.4415/ann_13_02_03 (2013).
Kombe, I. K., Munywoki, P. K., Baguelin, M., Nokes, D. J. & Medley, G. F. Model-based estimates of transmission of respiratory syncytial virus within households. Epidemics In Press, Accepted Manuscript, https://doi.org/10.1016/j.epidem.2018.12.001 (2018).
Hall, C. B. et al. Respiratory syncytial virus infections within families. N Engl J Med 294, 414–419, https://doi.org/10.1056/nejm197602192940803 (1976).
Kraemer, M. U. G. et al. Reconstruction and prediction of viral disease epidemics. Epidemiol Infect, 1–7, https://doi.org/10.1017/s0950268818002881 (2018).
Agoti, C. N. et al. Transmission patterns and evolution of respiratory syncytial virus in a community outbreak identified by genomic analysis. Virus Evol 3, vex006 (2017).
Prem, K., Cook, A. R. & Jit, M. Projecting social contact matrices in 152 countries using contact surveys and demographic data. PLoS Comput Biol 13, e1005697 (2017).
PATH. RSV Vaccine and mAb Snapshot, https://www.path.org/resources/rsv-vaccine-and-mab-snapshot/ (2018).
Mazur, N. I. et al. The respiratory syncytial virus vaccine landscape: lessons from the graveyard and promising candidates. Lancet Infect Dis. https://doi.org/10.1016/s1473-3099(18)30292-5 (2018).
Fulginiti, V. A. et al. Respiratory virus immunization. I. A field trial of two inactivated respiratory virus vaccines; an aqueous trivalent parainfluenza virus vaccine and an alum-precipitated respiratory syncytial virus vaccine. Am J Epidemiol 89, 435–448 (1969).
Karron, R. A. et al. Identification of a recombinant live attenuated respiratory syncytial virus vaccine candidate that is highly attenuated in infants. J Infect Dis 191, 1093–1104, https://doi.org/10.1086/427813 (2005).
Buchholz, U. J. et al. Live Respiratory Syncytial Virus (RSV) Vaccine Candidate Containing Stabilized Temperature-Sensitivity Mutations Is Highly Attenuated in RSV-Seronegative Infants and Children. J Infect Dis 217, 1338–1346 (2018).
WHO. RSV Vaccine Research and Development Technology Roadmap. Priority activities for development, testing, licensure and global use of RSV vaccines, with a specific focus on the medical need for young children in low- and middle-income countries (Catalogue No. 28-Nov-2018, 2017).
Nokes, J. D. & Cane, P. A. New strategies for control of respiratory syncytial virus infection. Curr Opin Infect Dis 21, 639–643, https://doi.org/10.1097/QCO.0b013e3283184245 (2008).
Cane, P. A. In Respiratory Syncytial Virus Perspectives in Medical Virology (ed. Patricia Cane) Ch. 3, 89–114 (Elsevier, 2007).
Munywoki, P. K. et al. The source of respiratory syncytial virus infection in infants: a household cohort study in rural Kenya. J Infect Dis 209, 1685–1692 (2014).
Agoti, C. N. et al. Local evolutionary patterns of human respiratory syncytial virus derived from whole-genome sequencing. J Virol 89, 3444–3454 (2015).
Kundu, S. et al. Next-generation whole genome sequencing identifies the direction of norovirus transmission in linked patients. Clin Infect Dis 57, 407–414 (2013).
Cottam, E. M. et al. Molecular epidemiology of the foot-and-mouth disease virus outbreak in the United Kingdom in 2001. J Virol 80, 11274–11282 (2006).
Meinel, D. M. et al. Whole genome sequencing identifies influenza A H3N2 transmission and offers superior resolution to classical typing methods. Infection 46, 69–76, https://doi.org/10.1007/s15010-017-1091-3 (2018).
Cotten, M. et al. Transmission and evolution of the Middle East respiratory syndrome coronavirus in Saudi Arabia: a descriptive genomic study. Lancet 382, 1993–2002 (2013).
Arias, A. et al. Rapid outbreak sequencing of Ebola virus in Sierra Leone identifies transmission chains linked to sporadic cases. Virus Evol 2, vew016 (2016).
Dudas, G. et al. Virus genomes reveal factors that spread and sustained the Ebola epidemic. Nature 544, 309–315 (2017).
Scott, J. A. et al. Profile: The Kilifi Health and Demographic Surveillance System (KHDSS). Int J Epidemiol 41, 650–657 (2012).
Nokes, D. J. et al. Incidence and severity of respiratory syncytial virus pneumonia in rural Kenyan children identified through hospital surveillance. Clin Infect Dis 49, 1341–1349 (2009).
Gunson, R. N., Collins, T. C. & Carman, W. F. Real-time RT-PCR detection of 12 respiratory viral infections in four triplex reactions. J Clin Virol 33, 341–344, https://doi.org/10.1016/j.jcv.2004.11.025 (2005).
Watson, S. J. et al. Viral population analysis and minority-variant detection using short read next-generation sequencing. Philos Trans R Soc Lond B Biol Sci 368, 20120205 (2013).
Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol 19, 455–477 (2012).
Edgar, R. C. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26, 2460–2461, https://doi.org/10.1093/bioinformatics/btq461 (2010).
Kearse, M. et al. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28, 1647–1649 (2012).
Katoh, K. & Standley, D. M. MAFFT: iterative refinement and additional methods. Methods Mol Biol 1079, 131–146, https://doi.org/10.1007/978-1-62703-646-7_8 (2014).
Kumar, S., Stecher, G. & Tamura, K. MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets. Mol Biol Evol 33, 1870–1874, https://doi.org/10.1093/molbev/msw054 (2016).
Kozlov, A. M., Darriba, D., Flouri, T., Morel, B. & Stamatakis, A. RAxML-NG: A fast, scalable, and user-friendly tool for maximum likelihood phylogenetic inference. Bioinformatics. https://doi.org/10.1093/bioinformatics/btz305 (2019).
Nguyen, L. T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol 32, 268–274 (2015).
pairsnp v. 0.0.6 (GitHub, https://github.com/gtonkinhill/pairsnp/, 2018).
Leigh, J. W. & Bryant, D. POPART: full-feature software for haplotype network construction. Methods in Ecology and Evolution, https://doi.org/10.1111/2041-210X.12410 (2015).
Sagulenko, P., Puller, V. & Neher, R. A. TreeTime: Maximum-likelihood phylodynamic analysis. Virus Evol 4, vex042 (2018).
Graham, B. S. Protecting the family to protect the child: vaccination strategy guided by RSV transmission dynamics. J Infect Dis 209, 1679–1681 (2014).
Githinji, G. et al. Assessing the utility of minority variant composition in elucidating RSV transmission pathways. bioRxiv 411512, https://doi.org/10.1101/411512 (2018).
Heikkinen, T., Valkonen, H., Waris, M. & Ruuskanen, O. Transmission of respiratory syncytial virus infection within families. Open Forum Infect Dis 2, ofu118 (2015).
White, L. J., Waris, M., Cane, P. A., Nokes, D. J. & Medley, G. F. The transmission dynamics of groups A and B human respiratory syncytial virus (hRSV) in England & Wales and Finland: seasonality and cross-protection. Epidemiol Infect 133, 279–289 (2005).
Kiti, M. C. et al. Quantifying social contacts in a household setting of rural Kenya using wearable proximity sensors. EPJ Data Sci 5, 21 (2016).
Hall, C. B. et al. Respiratory syncytial viral infection in children with compromised immune function. N Engl J Med 315, 77–81, https://doi.org/10.1056/nejm198607103150201 (1986).
Madhi, S. A., Schoub, B., Simmank, K., Blackburn, N. & Klugman, K. P. Increased burden of respiratory viral associated severe lower respiratory tract infections in children infected with human immunodeficiency virus type-1. J Pediatr 137, 78–84, https://doi.org/10.1067/mpd.2000.105350 (2000).
Sande, C. J., Mutunga, M. N., Medley, G. F., Cane, P. A. & Nokes, D. J. Group- and genotype-specific neutralizing antibody responses against respiratory syncytial virus in infants and young children with severe pneumonia. J Infect Dis 207, 489–492 (2013).
Agoti, C. N. et al. Genetic relatedness of infecting and reinfecting respiratory syncytial virus strains identified in a birth cohort from rural Kenya. J Infect Dis 206, 1532–1541 (2012).


Acknowledgements

We thank the study participants for providing the study samples. We thank members of the Virus Epidemiology and Control (VEC) Research Group in Kilifi who were involved in this study, especially in sample and data collection and laboratory screening for RSV. We thank the Illumina C team at the Wellcome Trust Sanger Institute (Hinxton, Cambridge, UK) for their help in deep sequencing. This work was funded by the Wellcome Trust (grant refs: 090853, 102975 and 203077/Z/16/Z). Dr Agoti is supported through the DELTAS Africa Initiative [DEL-15-003]. The DELTAS Africa Initiative is an independent funding scheme of the African Academy of Sciences (AAS)’s Alliance for Accelerating Excellence in Science in Africa (AESA) and is supported by the New Partnership for Africa’s Development Planning and Coordinating Agency (NEPAD Agency) with funding from the Wellcome Trust [107769/Z/10/Z] and the UK government. The views expressed in this publication are those of the author(s) and not necessarily those of AAS, NEPAD Agency, Wellcome Trust or the UK government.

Author information

My V. T. Phan
Present address: Erasmus Medical Center, Department of Viroscience, Rotterdam, The Netherlands
Patrick K. Munywoki
Present address: Centers for Disease Control and Prevention, Division of Global Health Protection, Nairobi, Kenya
Matthew Cotten
Present address: MRC/UVRI & LSHTM Uganda Research Unit, Entebbe, Uganda
Matthew Cotten
Present address: MRC-University of Glasgow Centre for Virus Research, Glasgow, UK

Authors and Affiliations

Kenya Medical Research Institute (KEMRI)—Wellcome Trust Research Programme, Epidemiology and Demography Department, Kilifi, Kenya
Charles N. Agoti, Patrick K. Munywoki, George Githinji & D. James Nokes
Pwani University, School of Health and Human Sciences, Kilifi, Kenya
Charles N. Agoti & D. James Nokes
Wellcome Trust Sanger Institute, Cambridge, United Kingdom
My V. T. Phan, Paul Kellam & Matthew Cotten
London School of Hygiene and Tropical Medicine (LSHTM), Department of Global Health and Development and Centre for Mathematical Modeling of Infectious Disease, London, United Kingdom
Graham F. Medley
Public Health England, Porton Down, Salisbury, United Kingdom
Patricia A. Cane
Imperial College London, Department of Infection, London, United Kingdom
Paul Kellam
University of Warwick, School of Life Sciences and Zeeman Institute, Coventry, United Kingdom
D. James Nokes


Contributions

C.N.A.: designed sequencing and analysis protocols, laboratory work, phylogenetic analysis and first manuscript draft. M.V.T.P.: short-read data assembly, geo-temporal-spatial analysis and manuscript revision. P.K.M.: study design, applied for funds, field work and manuscript revision. G.G.: sequence data analysis and manuscript revision. G.F.M.: study design, applied for funds and manuscript revision. P.A.C.: study design, applied for funds and manuscript revision. P.K.: helped design sequencing and analysis protocols, contributed sequencing funds and manuscript revision. M.C.: helped design sequencing and analysis protocols, assembly of short-read data, sequence analysis and manuscript revision. D.J.N.: study design, applied for funds and manuscript revision.

Corresponding author

Correspondence to Charles N. Agoti.

Ethics declarations

Competing Interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional File

Dataset 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Agoti, C.N., Phan, M.V.T., Munywoki, P.K. et al. Genomic analysis of respiratory syncytial virus infections in households and utility in inferring who infects the infant. Sci Rep 9, 10076 (2019). https://doi.org/10.1038/s41598-019-46509-w


Received: 12 February 2019
Accepted: 26 June 2019
Published: 11 July 2019
DOI: https://doi.org/10.1038/s41598-019-46509-w
