Distinct patterns of within-host virus populations between two subgroups of human respiratory syncytial virus

Human respiratory syncytial virus (RSV) is a major cause of lower respiratory tract infection in young children globally, but little is known about within-host RSV diversity. Here, we characterised within-host RSV populations using deep-sequencing data from 319 nasopharyngeal swabs collected during 2017–2020. RSV-B had lower consensus diversity than RSV-A at the population level, while exhibiting greater within-host diversity. Two RSV-B consensus sequences had an amino acid alteration (K68N) in the fusion (F) protein, which has been associated with reduced susceptibility to nirsevimab (MEDI8897), a novel RSV monoclonal antibody under development. In addition, several minor variants were identified in the antigenic sites of the F protein, one of which may confer resistance to palivizumab, the only licensed RSV monoclonal antibody. The differences in within-host virus populations emphasise the importance of monitoring for vaccine efficacy and may help to explain the different prevalences of monoclonal antibody-escape mutants between the two subgroups.

For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.

The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
The statistical test(s) used AND whether they are one- or two-sided (only common tests should be described solely by name; describe more complex techniques in the Methods section)
A description of all covariates tested
A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)
For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted (give P values as exact values whenever suitable)
For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated

Our web collection on statistics for biologists contains articles on many of the points above.

Software and code
Policy information about availability of computer code
Data collection

Data analysis
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.

Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:

Sample size
All RSV-positive samples (N = 858) collected by the REspiratory Syncytial virus Consortium in EUrope (RESCEU) project during the 2017–20 RSV seasons were sequenced for this study. Of these, 322 samples fulfilled the inclusion criteria of the within-host virus analysis. After three outlier samples were removed, within-host genetic diversity of RSV was characterised from a total of 319 samples (44% RSV-A and 56% RSV-B), which represent the most comprehensive dataset of within-host RSV diversity to date.
Data exclusions
One RSV-A sample and two RSV-B samples had a significantly higher mean cumulative minor allele frequency (MAF) per sample (0.52%, 0.17%, and 0.19%, respectively) than all other samples (mean 0.039%; range 0.025%–0.068%). These three samples were excluded from the within-host diversity analysis because they presumably represented a real or artefactual mixture of genetically distinct strains of the same RSV subgroup. In addition, genomic positions with a read depth below 200 were excluded from the calculations of mean cumulative MAF per sample, nucleotide diversity, and Manhattan distances, because the small sampling fraction at low-depth sites gives minor-variant frequencies a greater variance there than at high-depth sites. These exclusions were based on a pre-established analysis plan.
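The depth filter and the per-sample summaries described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes per-position nucleotide counts are already available and aligned to a common reference, defines cumulative MAF at a position as the summed frequency of all non-major alleles, uses the standard unbiased within-sample nucleotide diversity n/(n-1) * (1 - sum of squared allele frequencies), and takes the Manhattan distance between two samples as the sum of absolute allele-frequency differences over positions passing the depth filter in both. The names `SiteCounts`, `MIN_DEPTH`, and all function names are hypothetical.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical input format: one dict per genomic position (aligned to a
# common reference), mapping nucleotide -> read count, e.g. {"A": 990, "G": 10}.
SiteCounts = Dict[str, int]

MIN_DEPTH = 200  # read-depth threshold stated in the text


def depth(site: SiteCounts) -> int:
    """Total read depth at one position."""
    return sum(site.values())


def cumulative_maf(site: SiteCounts) -> float:
    """Summed frequency of all non-major alleles at one position (assumed definition)."""
    return 1.0 - max(site.values()) / depth(site)


def nucleotide_diversity(site: SiteCounts) -> float:
    """Unbiased within-sample diversity: n/(n-1) * (1 - sum of squared allele frequencies)."""
    n = depth(site)
    homozygosity = sum((count / n) ** 2 for count in site.values())
    return n / (n - 1) * (1.0 - homozygosity)


def mean_over_passing_sites(
    sites: List[SiteCounts], metric: Callable[[SiteCounts], float]
) -> Optional[float]:
    """Average a per-site metric over positions with depth >= MIN_DEPTH."""
    kept = [site for site in sites if depth(site) >= MIN_DEPTH]
    if not kept:
        return None  # no position passes the depth filter
    return sum(metric(site) for site in kept) / len(kept)


def mean_cumulative_maf(sites: List[SiteCounts]) -> Optional[float]:
    return mean_over_passing_sites(sites, cumulative_maf)


def mean_nucleotide_diversity(sites: List[SiteCounts]) -> Optional[float]:
    return mean_over_passing_sites(sites, nucleotide_diversity)


def manhattan_distance(sample_a: List[SiteCounts], sample_b: List[SiteCounts]) -> float:
    """Sum of |p_a - p_b| over nucleotides, at positions passing the depth filter in both samples."""
    total = 0.0
    for site_a, site_b in zip(sample_a, sample_b):
        depth_a, depth_b = depth(site_a), depth(site_b)
        if depth_a < MIN_DEPTH or depth_b < MIN_DEPTH:
            continue  # skip positions that are low-depth in either sample
        for nucleotide in set(site_a) | set(site_b):
            freq_a = site_a.get(nucleotide, 0) / depth_a
            freq_b = site_b.get(nucleotide, 0) / depth_b
            total += abs(freq_a - freq_b)
    return total
```

For example, with positions `[{"A": 990, "G": 10}, {"C": 1000}, {"T": 50, "C": 50}]`, the third position (depth 100) is dropped, so the mean cumulative MAF is about (0.01 + 0) / 2 = 0.005, i.e. 0.5%. Whether the published Manhattan distances were normalised by the number of positions is not stated here, so the sketch reports the raw sum.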
Replication
Our study was based on 858 RSV samples (i.e., biological replicates). Of these, 319 samples that yielded sufficient RSV reads from a single RSV subgroup were included in the analyses, after excluding three samples with a significantly higher mean cumulative minor allele frequency per sample. The findings presented in the manuscript are summarised from these 319 replicates.
Randomization
All available samples were sequenced and analysed, so no randomisation was performed. Because of the large number of samples, sequencing was done in four batches; multiple linear regression and z-score standardisation were applied to control for batch as a covariate.
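One way the batch adjustment mentioned above could be implemented is sketched below. The text does not specify the model, so this is an assumption: z-scores are computed separately within each sequencing batch, and batch enters an ordinary least-squares model as indicator variables alongside an RSV-B indicator, so that the subgroup coefficient is estimated with batch held fixed. The function names are hypothetical; `numpy.linalg.lstsq` is the only library routine used for the fit.

```python
import numpy as np


def zscore_within_batch(values, batches):
    """Standardise a metric to mean 0 and SD 1 separately within each batch."""
    values = np.asarray(values, dtype=float)
    batches = np.asarray(batches)
    standardised = np.empty_like(values)
    for batch in np.unique(batches):
        mask = batches == batch
        group = values[mask]
        sd = group.std(ddof=1) if group.size > 1 else 0.0
        # A batch with zero spread carries no within-batch information; map it to 0.
        standardised[mask] = (group - group.mean()) / sd if sd > 0 else 0.0
    return standardised


def subgroup_effect_adjusted_for_batch(values, is_rsv_b, batches):
    """OLS fit of value ~ intercept + RSV-B indicator + batch indicators.

    Returns the RSV-B coefficient, i.e. the subgroup difference with batch held fixed.
    """
    values = np.asarray(values, dtype=float)
    batches = np.asarray(batches)
    columns = [np.ones_like(values), np.asarray(is_rsv_b, dtype=float)]
    # One indicator per batch, with the first batch level as the reference category.
    for level in np.unique(batches)[1:]:
        columns.append((batches == level).astype(float))
    design = np.column_stack(columns)
    coefficients, *_ = np.linalg.lstsq(design, values, rcond=None)
    return float(coefficients[1])
```

Standardising within batch removes batch-level differences in both location and scale, whereas the regression removes only additive batch offsets; the regression also requires both subgroups to be represented within batches, otherwise subgroup and batch cannot be separated.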
Blinding
Investigators were not blinded to group allocation during data collection and analysis. Blinding was not relevant to this study because it involved no group allocation.