Rectal swabs are potentially a valuable method for monitoring the gut microbiome in research and clinical settings, where it is important to adhere to strict timing, or where acute sampling is needed. It is currently unknown whether rectal swabs give comparable results to faecal samples regarding microbiota community composition in neonates and infants. To study how well the two sampling methods correlate in infants, we compared the 16S-rRNA-based sequencing results of 131 paired rectal swabs and faecal samples collected from 116 infants at two timepoints in early life. The paired samples were highly comparable regarding both diversity and overall community composition, and strongly correlated on taxonomical level. In a multivariable model, we observed no significant or relevant contribution of sampling method to the variation in overall gut microbiota community composition. Our study provides evidence supporting the use of rectal swabs as a reliable proxy for faecal samples in infant gut microbiota research.
The interest in studying the gut microbiome in relation to health and disease is rapidly growing1. With continuing advances in rapid sequencing technology, monitoring of the gut microbiome in a clinical setting and conducting longitudinal microbiota studies into cause-effect relationships are becoming more feasible2. Currently, the routine sampling method for gut microbiota analysis is the collection of faeces3. However, it can be problematic to collect faeces within a narrow timeframe, as stool is not always readily available, especially in early life when the frequency of defaecation varies greatly4. Rectal swabs, on the other hand, can be collected easily and at any time, allowing flexible and consistent sampling between individuals and in relation to interventions. The collection of rectal swabs is already being applied for the screening of specific pathogens and multi-resistant organisms in the clinical setting5. Previous studies have shown that faecal samples and rectal swabs show satisfactory concordance when studying the gut microbiota in adults6,7. In the paediatric population, rectal swabs have been compared to faecal samples with respect to the detection of specific pathogens, such as rotavirus8, but not yet for their performance in analysing the overall microbiota composition. The infant gut microbiota is very dynamic in the first weeks of life9,10, so it would be valuable to verify whether rectal swabs give a reliable representation of its composition in this period. If so, rectal swabs could be an ideal sampling method for the monitoring of treatment or study interventions on the neonatal ward or paediatric intensive care unit or in (longitudinal) population-based studies.
In our study, we compared the alpha and beta diversity between rectal swabs and faecal samples collected at the same time from the same individual at two sampling moments, and studied how well the sampling methods correlate on taxonomical level. The objective of our study was to determine whether rectal swabs are a good proxy for faecal samples in infant gut microbiota research.
After quality filtering, there were 131 closely paired faecal and rectal swab samples available from a subset of 116 neonates with suspected early onset neonatal sepsis participating in the ZEBRA study. A closely paired sample is defined as a matched faecal sample and rectal swab collected within a timeframe of 24 hours from an individual participant. Paired samples were collected at two timepoints: before start (timepoint 1) and after cessation (timepoint 2) of antibiotic treatment. For ethical reasons the start of antibiotic treatment could not be delayed whilst waiting for a faecal sample. Per our study protocol, we therefore collected rectal swabs from all neonates before the start of antibiotic treatment and, to avoid treatment bias, additionally collected a faecal sample only if the neonate had also defaecated before the start of treatment. Consequently, both the faecal sample and the rectal swab were collected before the start of antibiotic administration in only 21/116 infants. At the second timepoint, we managed to obtain closely paired faecal samples and rectal swabs in 110/116 infants (median time difference between paired samples: 0 hours, range 0–24 hours). In a total of 23 cases the rectal swab was collected before the faecal sample (range 5 minutes to 24 hours) and in 20 cases the rectal swab was collected after the faecal sample (range 15 minutes to 22.5 hours). In the remaining 88 cases the rectal swab and faecal sample were collected at almost the same time: in practice, a trained physician collected the rectal swab first and a nurse collected the faeces shortly thereafter, the rectal swab being a stimulatory trigger for defaecation. The median age at sampling was 1 day for timepoint 1 and 6 days for timepoint 2. Further sample characteristics are detailed in Supplementary Table S1. Of the 116 infants included in this study, 15 had a paired sample available for both timepoint 1 and 2.
Therefore, for analyses comparing sampling methods overall, we stratified per timepoint to take repeated measures into account. In order to answer our primary research question, namely whether a rectal swab reliably reflects the faecal composition of an individual, we included all 131 paired samples, regardless of timepoint, for correlation analyses.
Paired faecal samples and rectal swabs are comparable in alpha and beta diversity
The 262 samples analysed in this study represented 13,026,207 high-quality Illumina MiSeq sequences with a median Good’s coverage of 99.99% (range 99.72–100%). These sequences were annotated to 372 unique taxa. Of these, 270 taxa, representing 99.99% of all sequences, were present in both the faecal samples and rectal swabs. The combined relative abundance of the 52 taxa found only in faeces and the 50 taxa found only in rectal swabs was only 0.01%.
Overall, we found no differences in alpha diversity between the faecal samples and rectal swabs as measured by observed species richness and Shannon diversity (Fig. 1a,b). Since alpha diversity differed between timepoints (Wilcoxon test p = 0.004), we also performed the analyses stratified per timepoint (Fig. 1c). At timepoint 1, we did not find a significant difference in observed species richness and Shannon diversity (Fig. 1d) between the sampling methods, though at timepoint 2 species richness was significantly higher in the rectal swabs compared to the faecal samples (Wilcoxon test, median 23 [range 7–42] versus 16 [range 4–58] Operational Taxonomic Units [OTUs], p < 0.001). When analysing the paired data, however, we found a strong correlation between paired faecal samples and rectal swabs for both species richness and Shannon diversity indices (Fig. 1e,f; Pearson’s r = 0.641 and 0.697, respectively; both p < 0.001).
Regarding overall community composition, the two sampling methods did not differ, either in an overall analysis or after stratification per timepoint (permutational multivariate analysis of variance [PERMANOVA]-test, R2 0.006 with p = 0.994 at timepoint 1 and R2 0.002 with p = 0.897 at timepoint 2; Supplementary Fig. 1). Furthermore, the composition of paired samples was significantly more similar than that of unpaired samples (Fig. 2; median Bray-Curtis [BC] similarity 0.866 versus median inter-individual similarity 0.007, p < 0.001). Neither the time between the collection of a paired faecal sample and rectal swab, nor the difference in reads between the sample types, was correlated with community composition similarity as measured by BC.
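To make the within-pair versus between-infant comparison concrete, the following is a minimal Python sketch of how BC similarity (1 – BC dissimilarity) could be computed and summarised for paired and unpaired samples. It is an illustration only: the analyses in this study were run in R, and the function names, the `(infant, method)` keys and the toy relative abundances are hypothetical and not taken from the study data.

```python
from statistics import median


def bray_curtis_similarity(a, b):
    """Return 1 - Bray-Curtis dissimilarity for two abundance vectors."""
    if len(a) != len(b):
        raise ValueError("abundance vectors must have the same length")
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("both samples are empty")
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / total


def paired_vs_unpaired(profiles):
    """Median BC similarity within infants (paired) and between infants (unpaired).

    profiles maps (infant id, sampling method) -> abundance vector.
    """
    paired, unpaired = [], []
    keys = sorted(profiles)
    for i, first in enumerate(keys):
        for second in keys[i + 1:]:
            sim = bray_curtis_similarity(profiles[first], profiles[second])
            # Same infant id means the two samples form a pair.
            (paired if first[0] == second[0] else unpaired).append(sim)
    return median(paired), median(unpaired)


# Hypothetical relative abundances of three taxa in two infants.
toy_profiles = {
    ("infant1", "faeces"): [0.7, 0.2, 0.1],
    ("infant1", "swab"): [0.6, 0.3, 0.1],
    ("infant2", "faeces"): [0.1, 0.1, 0.8],
    ("infant2", "swab"): [0.0, 0.2, 0.8],
}
median_paired, median_unpaired = paired_vs_unpaired(toy_profiles)
```

In this toy example the paired samples are far more similar to each other than to samples from the other infant, which is the pattern a Wilcoxon test on the two sets of similarities would then assess.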
Effect of sampling method compared to other clinical variables on microbial community composition
To evaluate the importance of sampling method for the overall observed variation in microbial community composition, we performed a multivariable PERMANOVA-test. First, we tested variables known to be associated with microbiota composition (age, delivery mode, feeding type) univariately and only included variables that showed a significant association in our study (age, delivery mode) in the multivariable model, along with sampling method. The contribution of sampling method to the variation in community composition was minimal and not significant (Fig. 3; R2 0.002 p = 0.835), while participant’s age was the most important explanatory variable (R2 0.016, p = 0.002).
Paired faecal samples and rectal swabs correlate strongly on taxonomical level
The paired faecal samples and rectal swabs had a similar taxonomical composition with respect to the most abundant OTUs (Fig. 4). We performed Pearson correlation tests between the faecal samples and rectal swabs for all 198 testable OTUs, meaning that the 102 OTUs that were uniquely present in either the faecal samples or rectal swabs were excluded. The correlations of the top 15 most abundant taxa are shown in Table 1, all showing strong (Pearson’s r > 0.60) to very strong (Pearson’s r > 0.80) correlations. A total of 150 out of 198 testable OTUs correlated strongly and significantly in relative abundance between the paired samples with a median Pearson’s r of 0.92 (IQR 0.90–1.00, adjusted p-value < 0.05). Together, these 150 taxa had a combined relative abundance of 98.5% of all sequences observed in our dataset. The comprehensive results of the correlations of all OTUs can be found in Supplementary Table 2.
Rectal swabs are more flexible to collect than faecal samples for research and clinical purposes, and could therefore be a valuable tool in cases where it is important to adhere to strict sampling timeframes, as when sampling around time sensitive interventions. In this study, we showed that paired infants’ rectal swabs and faecal samples correlate well on alpha diversity, are comparable in overall community composition and correlate strongly on taxonomical level.
The observed species richness did not differ significantly between the two sampling methods at the earliest timepoint and correlated strongly between the paired faecal samples and rectal swabs. At the second timepoint, the observed species richness differed significantly between faecal samples and rectal swabs, even after filtering our OTU table of possible contaminants and also when repeating this analysis with rarefied data. However, this difference in diversity was contrary to what we expected: if at all different, we expected the rectal swabs, due to the smaller amount of material collected with this method, to pick up fewer species, but the opposite was the case, showing at least that no taxa were missed with this type of sampling. Importantly, no differences in Shannon diversity were found between the sampling methods overall, or at either timepoint, and a clear correlation between paired samples was found. We also found a nearly equal number of taxa that were unique to either sampling method (52 in faecal samples and 50 in rectal swabs), though the combined relative abundance of these was extremely low (<0.01%) and therefore less relevant for overall community structure. As a result, we found no difference in overall microbial community composition between the two sampling methods. Also, the concordance between the microbiota of a paired faecal sample and rectal swab collected from the same participant was high, confirming previous findings in adults7.
In a broader clinical context, we found it interesting to establish that sampling method did not significantly explain variation in microbiota composition, as opposed to known drivers such as age11, further supporting that rectal swabs are an appropriate proxy for faecal samples in infant gut microbiota studies. Unfortunately, we could not study the effect of antibiotics on composition in the multivariable model, because antibiotic treatment was collinear with timepoint in our study (sampling moments were before the start and after cessation of antibiotic treatment), and therefore with age.
With respect to individual taxa, we found a high abundance of facultative anaerobic taxa, such as Escherichia coli and Staphylococcus epidermidis, in the earliest samples, consistent with the description of normal early life gut microbiota development in previous studies12,13. The paired faecal samples and rectal swabs showed a strong correlation for most bacteria abundantly present in the infant microbiota. The combined relative abundance of the taxa with a strong to very strong correlation (Pearson’s r > 0.60) between the sampling methods was above 98%, including predominant and clinically relevant taxa such as Klebsiella and Enterococcus faecium, which are known reservoirs for antibiotic resistance genes14,15,16, as well as taxa like Bifidobacterium which are associated with various beneficial functions17. Altogether, this underlines that rectal swabs are a very good proxy for faecal samples in microbiota analyses.
Study population and sample collection
To study whether rectal swabs are a good proxy for faecal samples in infant gut microbiota studies, we used a subset of 131 paired faecal and rectal swab samples from 116 children participating in the Dutch randomised controlled ZEBRA study. The ZEBRA study aims to evaluate the effects of antibiotic treatment indicated for (suspected) neonatal sepsis in the first week of life on the developing gut microbiota. Written informed consent was obtained from both parents. Ethical approval was granted by the national ethics committee in the Netherlands, METC Noord-Holland (committee on research involving human subjects, M014-024, NTR5119). The study was conducted in accordance with the European Statements for Good Clinical Practice.
Rectal swabs were collected using FaecalSwab™ kits (Copan Diagnostics, CA, USA) by trained physicians or research personnel before the start of antibiotic treatment (timepoint 1) and 24–48 hours after cessation of antibiotic therapy (timepoint 2). Faecal samples were obtained at the same timepoints, usually directly after the rectal swab, this being a stimulatory trigger for defaecation, and stored in sterile faecal containers by a nurse during hospital stay, or by the parents if the participant was already discharged at the later timepoint. All material was directly stored at −20 °C before being transferred (<2 weeks) to a −80 °C freezer until further laboratory processing. We only analysed paired samples that were obtained within 24 hours of one another, and in the case of timepoint 1 were also both obtained strictly before the start of antibiotic treatment.
DNA isolation and sequencing
Bacterial DNA was isolated from faecal samples as previously described18. We used approximately 100 μl of faeces, 300 μl of lysis buffer, 500 μl zirconium beads and 500 μl of phenol, and performed an extra phenol/chloroform step. Samples collected on day 1 were presumed to have low bacterial abundance. Therefore, further adaptations were applied as described previously19, with the additional changes of using 150 μl instead of 100 μl of faeces (or 100 μl of material in the case of rectal swabs) and implementing an extra step with wash buffer 1. DNA blanks and a positive control consisting of a mix of up to three random faecal samples were used for quality control. The amount of bacterial DNA was determined by quantitative polymerase chain reaction (qPCR) as previously described19.
After amplifying the V4 hypervariable region of the 16S rRNA, quantification of the amount of amplified DNA per sample was executed with the dsDNA 910 Reagent Kit on the Fragment Analyzer (Advanced Analytical, IA, USA). Samples yielding insufficient DNA after amplification, defined as <0.5 ng/μl, were repeated with a higher concentration of template DNA. A mock control and three PCR blanks were included in each PCR plate. 16S rRNA sequencing was performed on the Illumina MiSeq platform (Illumina, Eindhoven, the Netherlands).
The samples and their sequences described in this manuscript are part of a larger dataset consisting of 2176 samples and controls, and together were processed using our in-house bioinformatics pipeline20. In short, we applied an adaptive, window-based trimming algorithm (Sickle, version 1.33) to filter out low quality reads, maintaining a Phred score threshold of 30 and a length threshold of 150 nucleotides21. Error correction was performed with BayesHammer (SPAdes genome assembler toolkit, version 3.5.0)22. Each set of paired-end sequence reads was assembled using PANDAseq (version 2.10) and demultiplexed (QIIME, version 1.9.1)23,24. Singleton and chimeric reads (UCHIME) were removed. OTU picking was conducted with VSEARCH abundance-based greedy clustering with a 97% identity threshold25. OTUs were annotated using the Naïve Bayesian RDP classifier (version 2.2) and the SILVA reference database26,27. This resulted in an OTU-table containing 18,951 taxa in total. We created an abundance-filtered dataset selecting OTUs present at a confident level of detection (0.1% relative abundance) in at least two samples28, hereafter referred to as our raw OTU-table. The raw OTU-table consisted of 730 taxa in total (0.49% of sequences excluded with filtering). Next, we used both the prevalence and frequency methods of the decontam package29 to exclude possible contaminants, discarding 35 taxa, and thus retaining 695 taxa in total. The subset of paired samples studied here contained only 372 of these taxa.
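The abundance filter described above (an OTU is retained when it reaches 0.1% relative abundance in at least two samples) can be sketched in a few lines of Python. This is a hedged illustration rather than the in-house pipeline: the function name, the nested-dictionary layout and the toy counts are hypothetical.

```python
def abundance_filter(otu_counts, min_rel_abundance=0.001, min_samples=2):
    """Keep OTUs reaching min_rel_abundance in at least min_samples samples.

    otu_counts maps sample id -> {OTU id: read count}.
    """
    samples_passing = {}
    for counts in otu_counts.values():
        total = sum(counts.values())
        if total == 0:
            continue  # an empty sample cannot support any OTU
        for otu, reads in counts.items():
            if reads / total >= min_rel_abundance:
                samples_passing[otu] = samples_passing.get(otu, 0) + 1
    keep = {otu for otu, n in samples_passing.items() if n >= min_samples}
    # Return the same table restricted to the retained OTUs.
    return {
        sample: {otu: reads for otu, reads in counts.items() if otu in keep}
        for sample, counts in otu_counts.items()
    }


# Hypothetical read counts: otu_c exceeds 0.1% in one sample only.
toy_counts = {
    "s1": {"otu_a": 9000, "otu_b": 995, "otu_c": 5},
    "s2": {"otu_a": 5000, "otu_b": 4995, "otu_c": 5},
    "s3": {"otu_a": 0, "otu_b": 1900, "otu_c": 100},
}
filtered = abundance_filter(toy_counts)
```

Here `otu_c` is dropped because it passes the threshold in a single sample, whereas `otu_a` and `otu_b` are retained in every sample, including samples where their own count is low.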
All analyses were performed in R version 3.4.330 within RStudio version 1.1.38331 and figures were made using packages ggplot232 and ggpubr33. The alpha diversity of the two sampling methods was compared using the observed species richness and Shannon diversity. When rarefying to a sequencing depth of 25,000 reads after filtering and decontamination (lowest quartile), or even 15,000 reads, the differences found using the raw data remained, so from here the raw (unrarefied) data was used for the comparisons and correlations in alpha diversity. Group differences were tested for using Wilcoxon tests. The correlation in alpha diversity between paired samples was calculated with Pearson.
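For readers less familiar with these measures, the sketch below shows one way to compute observed species richness, Shannon diversity and a rarefied subsample in plain Python. It is an assumption-laden illustration and not the R code used in this study: the function names are hypothetical, the Shannon index uses the natural logarithm, and rarefaction is done by drawing reads without replacement.

```python
import math
import random


def observed_richness(counts):
    """Number of OTUs with at least one read."""
    return sum(1 for reads in counts if reads > 0)


def shannon_diversity(counts):
    """Shannon index H = -sum(p * ln p) over OTUs with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for reads in counts:
        if reads > 0:
            p = reads / total
            h -= p * math.log(p)
    return h


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement; returns new counts."""
    reads = [otu for otu, n in enumerate(counts) for _ in range(n)]
    if depth > len(reads):
        raise ValueError("sample has fewer reads than the requested depth")
    subsample = random.Random(seed).sample(reads, depth)
    rarefied = [0] * len(counts)
    for otu in subsample:
        rarefied[otu] += 1
    return rarefied


# Hypothetical read counts for one sample with five OTUs.
sample_counts = [50, 30, 20, 0, 0]
richness = observed_richness(sample_counts)
shannon = shannon_diversity(sample_counts)
rarefied_counts = rarefy(sample_counts, depth=40)
```

Richness counts every detected OTU equally, whereas the Shannon index also weighs evenness, which may be why the two measures can behave differently between sampling methods.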
The effect of sampling method on composition was analysed univariately with PERMANOVA-tests with 1999 permutations (adonis function; vegan package34) for all samples, and also stratified per timepoint, to prevent confounding by repeated measures. To visualise differences in composition we generated non-metric multidimensional scaling plots (nMDS; vegan package34). Ordinations were based on the Bray-Curtis (BC) dissimilarity matrix of relative abundance data with parameter trymax 10,000. To test whether paired faecal samples and rectal swabs (within child comparison) were more similar in microbiota composition than unpaired samples (between children comparison), we calculated the BC similarities (1 – BC dissimilarity) between all samples and compared the level of similarity between the paired and unpaired samples using Wilcoxon. For the paired samples, we also tested the correlation between composition and the difference in collection time and reads between the two sampling methods.
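The logic of a univariate PERMANOVA can be illustrated with a minimal, single-factor Python sketch on a Bray-Curtis dissimilarity matrix. This is not the adonis implementation from vegan and it omits strata and covariates; the function names, the default seed and the toy profiles are hypothetical, and the pseudo-F and R2 follow the standard sums-of-squared-distances partitioning.

```python
import random


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (sum(a) + sum(b))


def _ss_within(dist, labels):
    """Within-group sum of squares and the number of groups."""
    groups = {}
    for idx, label in enumerate(labels):
        groups.setdefault(label, []).append(idx)
    ss = 0.0
    for members in groups.values():
        squared = sum(dist[i][j] ** 2
                      for k, i in enumerate(members) for j in members[k + 1:])
        ss += squared / len(members)
    return ss, len(groups)


def permanova(dist, labels, permutations=1999, seed=1):
    """One-factor PERMANOVA: returns pseudo-F, R2 and permutation p-value."""
    n = len(labels)
    ss_total = sum(dist[i][j] ** 2
                   for i in range(n) for j in range(i + 1, n)) / n

    def statistics_for(current):
        ss_w, n_groups = _ss_within(dist, current)
        if n_groups < 2:
            raise ValueError("need at least two groups")
        ss_a = ss_total - ss_w
        return (ss_a / (n_groups - 1)) / (ss_w / (n - n_groups)), ss_a / ss_total

    f_obs, r2 = statistics_for(labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties.
        if statistics_for(shuffled)[0] >= f_obs - 1e-12:
            hits += 1
    return f_obs, r2, (hits + 1) / (permutations + 1)


# Hypothetical profiles: two clearly separated groups of three samples.
toy = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.75, 0.15, 0.1],
       [0.1, 0.1, 0.8], [0.2, 0.1, 0.7], [0.1, 0.2, 0.7]]
toy_labels = ["A", "A", "A", "B", "B", "B"]
toy_dist = [[bray_curtis(a, b) for b in toy] for a in toy]
f_stat, r_squared, p_value = permanova(toy_dist, toy_labels)
```

R2 is the share of total variation in the dissimilarities attributable to the grouping, so a value close to zero, as reported for sampling method in this study, indicates that the grouping explains almost none of the variation.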
We performed a temporal, multivariable PERMANOVA-test (adonis2 function, vegan package34, 1999 permutations) to test whether sampling method contributes to the variation in overall gut microbiota community composition and how this relates to other known drivers of community composition. First, we tested covariates known to be associated with gut microbiota composition (age, delivery mode, feeding type) univariately with PERMANOVA-tests as described above. Only covariates that showed a significant association in the univariate analysis (age, delivery mode), were included in a multivariable model along with sampling method, whilst setting the strata parameter to individual.
Finally, we evaluated whether the sampling methods correlated on taxonomical level by calculating the Pearson correlation coefficient for all individual taxa based on their relative abundance. We also calculated the delta mean relative abundance between the two sampling methods to show the differences found.
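As an illustration of the per-taxon comparison, the following Python sketch computes the Pearson correlation coefficient and the delta mean relative abundance for a single taxon across paired samples. The function names and the toy relative abundances are hypothetical, and the sign convention for the delta (faeces minus rectal swab) is an assumption made here for the example.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant series
    return sxy / math.sqrt(sxx * syy)


def delta_mean_abundance(faeces, swabs):
    """Mean relative abundance in faeces minus that in rectal swabs."""
    return sum(faeces) / len(faeces) - sum(swabs) / len(swabs)


# Hypothetical relative abundances of one taxon in four sample pairs,
# listed in the same pair order for both sampling methods.
taxon_faeces = [0.1, 0.2, 0.3, 0.4]
taxon_swabs = [0.2, 0.4, 0.6, 0.8]
r_taxon = pearson_r(taxon_faeces, taxon_swabs)
delta_taxon = delta_mean_abundance(taxon_faeces, taxon_swabs)
```

The toy taxon shows why both quantities are reported: the two methods correlate perfectly across pairs, yet the mean abundance differs, which the correlation coefficient alone would not reveal.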
P-values <0.05 or, where applicable, adjusted p-values <0.05 calculated using the Benjamini-Hochberg method35 were deemed significant.
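The Benjamini-Hochberg adjustment can be written out in a few lines. The sketch below is a plain Python illustration of the step-up procedure, not the R code used in this study; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four per-taxon correlation tests.
raw_p = [0.01, 0.04, 0.03, 0.20]
adjusted_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adjusted_p]
```

In this example only the first test remains below 0.05 after adjustment, although three of the four raw p-values were below that threshold.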
Sequence data that support the findings of this study have been deposited in the NCBI Sequence Read Archive (SRA) database with BioProject ID PRJNA524461. The last 3 digits of the sample.id variable of the attributes are unique to a sample pair (with two exceptions: 387.006 is paired with 382.138 and 384.464 is paired with 383.572). The time variable refers to timepoint (1 = Timepoint 1, 2 = Timepoint 2). The mat variable refers to sampling method (2 = faeces, 5 = rectal swab).
Kundu, P., Blacher, E., Elinav, E. & Pettersson, S. Our Gut Microbiome: The Evolving Inner Self. Cell 171, 1481–1493 (2017).
Leggett, R. M. et al. Rapid MinION metagenomic profiling of the preterm infant gut microbiota to aid in pathogen diagnostics. bioRxiv 180406, https://doi.org/10.1101/180406 (2017).
Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature 486, 207–214 (2012).
den Hertog, J. et al. The defecation pattern of healthy term infants up to the age of 3 months. Arch. Dis. Child. Fetal Neonatal Ed. 97, F465–70 (2012).
Siegel, J. D., Rhinehart, E., Jackson, M., Chiarello, L. & Healthcare Infection Control Practices Advisory Committee. Management of multidrug-resistant organisms in health care settings, 2006. Am. J. Infect. Control 35, S165–93 (2007).
Budding, A. E. et al. Rectal swabs for analysis of the intestinal microbiota. PLoS One 9, e101344 (2014).
Bassis, C. M. et al. Comparison of stool versus rectal swab samples and storage conditions on bacterial community profiles. BMC Microbiol. 17, 78 (2017).
Gibory, M. et al. Rotavirus detection in bulk stool and rectal swab specimens in children with acute gastroenteritis in Norway. J. Clin. Virol. 97, 50–53 (2017).
Wampach, L. et al. Colonization and Succession within the Human Gut Microbiome by Archaea, Bacteria, and Microeukaryotes during the First Year of Life. Front. Microbiol. 8, 738 (2017).
Arrieta, M.-C., Stiemsma, L. T., Amenyogbe, N., Brown, E. M. & Finlay, B. The intestinal microbiome in early life: health and disease. Front. Immunol. 5, 427 (2014).
Lozupone, C. A. et al. Meta-analyses of studies of the human microbiota. Genome Res. 23, 1704–14 (2013).
Yatsunenko, T. et al. Human gut microbiome viewed across age and geography. Nature 486, 222–7 (2012).
Bokulich, N. A. et al. Antibiotics, birth mode, and diet shape microbiome maturation during early life. Sci. Transl. Med. 8, 343ra82 (2016).
Gibson, M. K. et al. Developmental dynamics of the preterm infant gut microbiota and antibiotic resistome. Nat. Microbiol. 1, 16024 (2016).
Boucher, H. W. et al. Bad bugs, no drugs: no ESKAPE! An update from the Infectious Diseases Society of America. Clin. Infect. Dis. 48, 1–12 (2009).
Peterson, L. R. Bad bugs, no drugs: no ESCAPE revisited. Clin. Infect. Dis. 49, 992–3 (2009).
Kwak, M.-J. et al. Evolutionary architecture of the infant-adapted group of Bifidobacterium species associated with the probiotic function. Syst. Appl. Microbiol. 39, 429–439 (2016).
Zaura, E., Keijser, B. J. F., Huse, S. M. & Crielaard, W. Defining the healthy core microbiome of oral microbial communities. BMC Microbiol. 9, 259 (2009).
Biesbroek, G. et al. Deep sequencing analyses of low density microbial communities: working at the boundary of accurate microbiota detection. PLoS One 7, e32942 (2012).
Bosch, A. A. T. M. et al. Maturation of the Infant Respiratory Microbiota, Environmental Drivers, and Health Consequences. A Prospective Cohort Study. Am. J. Respir. Crit. Care Med. 196, 1582–1590 (2017).
Joshi, N. A. & Fass, J. N. Sickle: A sliding-window, adaptive, quality-based trimming tool for FastQ files (Version 1.33). Available at, https://github.com/najoshi/sickle (2011).
Nikolenko, S. I., Korobeynikov, A. I. & Alekseyev, M. A. BayesHammer: Bayesian clustering for error correction in single-cell sequencing. BMC Genomics 14, S7 (2013).
Masella, A. P., Bartram, A. K., Truszkowski, J. M., Brown, D. G. & Neufeld, J. D. PANDAseq: paired-end assembler for illumina sequences. BMC Bioinformatics 13, 31 (2012).
Caporaso, J. G. et al. QIIME allows analysis of high-throughput community sequencing data. Nat. Methods 7, 335–6 (2010).
Rognes, T., Flouri, T., Nichols, B., Quince, C. & Mahé, F. VSEARCH: a versatile open source tool for metagenomics. PeerJ 4, e2584 (2016).
Wang, Q., Garrity, G. M., Tiedje, J. M. & Cole, J. R. Naive Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy. Appl. Environ. Microbiol. 73, 5261–5267 (2007).
Quast, C. et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 41, D590–D596 (2012).
Subramanian, S. et al. Persistent gut microbiota immaturity in malnourished Bangladeshi children. Nature 510, 417–21 (2014).
Davis, N. M., Proctor, D., Holmes, S. P., Relman, D. A. & Callahan, B. J. Simple statistical identification and removal of contaminant sequences in marker-gene and metagenomics data. bioRxiv 221499, https://doi.org/10.1101/221499 (2017).
R Core Team. R: A language and environment for statistical computing. https://www.r-project.org/ (2017).
RStudio Team. RStudio: Integrated Development for R. http://www.rstudio.com/ (2016).
Wickham, H. ggplot2: Elegant Graphics for Data Analysis. (Springer-Verlag New York, 2009).
Kassambara, A. ggpubr: ‘ggplot2’ Based Publication Ready Plots. https://CRAN.R-project.org/package=ggpubr (2018).
Oksanen, J. et al. vegan: Community Ecology Package. https://CRAN.R-project.org/package=vegan (2017).
Benjamini, Y. & Hochberg, Y. Controlling The False Discovery Rate - A Practical And Powerful Approach To Multiple Testing. J. R. Stat. Soc. Ser. B 57, 289–300 (1995).
The authors are most grateful for the participation of all the children and their families. We would like to acknowledge all the members of the research team of the Spaarne Gasthuis Academy, the laboratory staff of the University Medical Center Utrecht and the Netherlands Organisation for Applied Scientific Research, and the Neonatology Departments of the Spaarne Gasthuis Hoofddorp and Haarlem, Diakonessenhuis Utrecht and Tergooiziekenhuis Blaricum for the help in participant recruitment. The ZEBRA research was financed by ZonMw Priority Medicines grant 205300001 and CSO grant SCAF/16/03.
E.A.M.S. declares to have received unrestricted research support from Pfizer, grant support for vaccine studies from Pfizer and GSK and fees paid to the institution for advisory boards or participation in independent data monitoring committees for Pfizer and GSK. D.B. declares to have received unrestricted fees paid to the institution for advisory work for Friesland Campina as well as research support from Nutricia. None of the fees or grants listed here was received for the research described in this paper. No other authors report financial disclosures. None of the other authors report competing interests.
Cite this article
Reyman, M., van Houten, M.A., Arp, K. et al. Rectal swabs are a reliable proxy for faecal samples in infant gut microbiota research based on 16S-rRNA sequencing. Sci Rep 9, 16072 (2019). https://doi.org/10.1038/s41598-019-52549-z